OSError: Unable to open file, failed import

Hi,

I have been having some trouble running PyFR on a cluster.

When I run the following series of commands in my bash shell

conda init bash
source .bashrc
conda activate pyFR
ml gnu8 metis
cd PyFR-Test-Cases/2d-euler-vortex
pyfr import 2d-euler-vortex.msh 2d-euler-vortex.pyfrm
pyfr partition 2 2d-euler-vortex.pyfrm .
mpiexec -n 2 pyfr run -b openmp -p 2d-euler-vortex.pyfrm 2d-euler-vortex.ini

everything works just fine and the simulation ends correctly.

When I run the same workflow via an sbatch submission I get these error lines:

  fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 1760, sblock->base_addr = 0, stored_eof = 45284)

The step which fails when running in sbatch is

pyfr import 2d-euler-vortex.msh 2d-euler-vortex.pyfrm
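For context, my submission script looks roughly like this (the SBATCH directive values are placeholders; the actual account, partition and walltime settings are specific to my cluster):

```shell
#!/bin/bash
#SBATCH --job-name=euler-vortex
#SBATCH --ntasks=2
# (account/partition/walltime directives omitted; they are cluster-specific)

source ~/.bashrc
conda activate pyFR
ml gnu8 metis

cd PyFR-Test-Cases/2d-euler-vortex
pyfr import 2d-euler-vortex.msh 2d-euler-vortex.pyfrm
pyfr partition 2 2d-euler-vortex.pyfrm .
mpiexec -n 2 pyfr run -b openmp -p 2d-euler-vortex.pyfrm 2d-euler-vortex.ini
```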

Moreover, I have an issue with the CUDA backend, and I'm not sure if it is related only to the CUDA version (11.1, which is lower than the required >= 11.4 and which I cannot update at the moment):

func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /lib64/libcuda.so: undefined symbol: cuDeviceGetUuid_v2

Any idea on how to solve?

For the file issue can you please try running h5dump on the file in question (from sbatch) and seeing if it works?
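For example, dumping just the header is enough to tell whether the HDF5 superblock is intact (the file name here matches your case; -H avoids printing the data itself):

```shell
h5dump -H 2d-euler-vortex.pyfrm
```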

The version requirements for various components are not chosen arbitrarily. We need 11.4 because some functions used by PyFR were only added in 11.4.
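As a quick sanity check (a sketch, not part of PyFR), you can probe whether your driver library exports the symbol from the traceback; the path is the one your error message shows:

```shell
# Report whether a shared library exports a given symbol.
# Prints "present" if found, "missing" if the symbol (or the
# library itself, or nm) is not available.
check_cuda_symbol() {
    if [ -e "$1" ] && nm -D "$1" 2>/dev/null | grep -q "$2"; then
        echo present
    else
        echo missing
    fi
}

check_cuda_symbol /lib64/libcuda.so cuDeviceGetUuid_v2
```

If this prints "missing", your driver is too old for the CUDA >= 11.4 functions PyFR uses.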

Regards, Freddie.

Running from sbatch

h5dump -d 2d-euler-vortex.msh -o dataset.txt

and I get

h5dump error: unable to open file "dataset.txt"

Regarding the CUDA error: do you think it is related to the CUDA version, or could it be caused by something else?

This suggests that the error is file system related, and not to do with PyFR. You will need to follow up with your system administrator to debug what might be the cause of this.

If you are using a CUDA version below 11.4 you will see errors.

Regards, Freddie.

I’m not sure about that. If I run from sbatch:

srun -n 1 test.sh

it works just fine (with the openmp backend), but I’m limited to 1 process (see Writing MPS to .h5 file in a multi-core process - ITensor Support Q&A), so it should not be file-system related.

Can you confirm if you are getting the error when running PyFR or when trying to import a mesh? The latter should only be run on a single rank.

Regards, Freddie.

The error only appears when I try to import the mesh with n > 1.

In case this could solve the problem, how can I switch from -n 1 for the mesh import to, say, -n 100 when running the simulation?

Every command except for running a simulation should be performed on a single rank. Import the mesh, partition it, and then submit your batch job on as many ranks as you have used to partition the domain and have this execute pyfr run ....
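As a sketch (the rank count of 100 is illustrative), the split between preprocessing and the batch job would look like this:

```shell
# On a single rank (login node or a one-task job) -- run once:
pyfr import 2d-euler-vortex.msh 2d-euler-vortex.pyfrm
pyfr partition 100 2d-euler-vortex.pyfrm .

# The batch job then only runs the solver, on matching ranks:
#   #SBATCH --ntasks=100
#   mpiexec -n 100 pyfr run -b openmp -p 2d-euler-vortex.pyfrm 2d-euler-vortex.ini
```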

Regards, Freddie.


I’ll try and let you know

Regards, Frank