Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.

Discussion Closed: This discussion was created more than 6 months ago and has been closed.

Cluster job not saving the output model file


Dear All,

I am facing a problem when running my model on a cluster.

I am using a distributed parametric study for a time-dependent problem. The model has about 450,000 DOFs. It runs smoothly on a single computer with a single parameter value.

However, when I run it as a cluster batch job (Linux/Fedora), the calculation runs perfectly to the end, but the output file is not saved!

Sometimes I get an error message at the end, which looks like an MPI issue:
{
Assertion failed in file ../../socksm.c at line 2576: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
}

At other times nothing happens: the CPUs keep running after progress reaches 100%, and no output file is saved even after 48 h.

It seems that COMSOL on the cluster cannot handle huge output files; a single-parameter study alone generates an output file of about 3.5 GB.

I have even reduced the number of parameters in the study (from 6 to 3) and the result was the same: no output file saved at the end!

Moreover, reducing the time range (running just 1 period, which reduces the output file size) does produce the output file at the end.
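
To put rough numbers on the size argument above, here is a back-of-the-envelope sketch. It assumes the combined output grows linearly with the number of parameter values, which is only an assumption (COMSOL may store shared data such as the mesh only once). The function names are hypothetical; the 3.5 GB figure is the one quoted above.

```python
import shutil

GB = 10**9  # decimal gigabyte, used only for this rough estimate


def estimated_output_gb(per_parameter_gb: float, n_parameters: int) -> float:
    """Rough size of the combined output, assuming linear growth (an assumption)."""
    return per_parameter_gb * n_parameters


def has_room(path: str, needed_gb: float) -> bool:
    """True if the file system holding `path` has at least `needed_gb` free."""
    return shutil.disk_usage(path).free >= needed_gb * GB


if __name__ == "__main__":
    # Compare the two study sizes mentioned in this post (3 and 6 parameters).
    for n in (3, 6):
        need = estimated_output_gb(3.5, n)
        print(f"{n} parameters: ~{need:.1f} GB, fits in '.': {has_room('.', need)}")
```

Running it in the output or scratch directory of the job gives a quick sanity check that free disk space is not the limiting factor before looking further at MPI.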

However, other, simpler calculations run smoothly on the cluster and finish by saving the output file.
I mention this to show that the problem does not come from our cluster itself!

I wonder if anybody has faced such a problem!
Your comments and suggestions are welcome!!

Cheers



3 Replies Last Post Sep 30, 2011, 9:12 a.m. EDT
Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist


Posted: 1 decade ago Sep 28, 2011, 3:59 p.m. EDT
I have not experienced a problem like this, but that may be because I have not generated large output files.

What version of COMSOL are you using? The current version, 4.2.0.228, corrected several problems I was having while running in cluster mode.

Have you asked COMSOL tech support about this problem? They may have some insight into why this happens and how to control it, and whether others are having a similar problem.


Ivar KJELBERG COMSOL Multiphysics(r) fan, retired, former "Senior Expert" at CSEM SA (CH)


Posted: 1 decade ago Sep 28, 2011, 4:10 p.m. EDT
Hi

I just rediscovered a file size limitation when saving a large model file to my shared Win7/Linux disk, which happens to be FAT32.

The file was too large; I had forgotten about the maximum file size on FAT32-formatted disks ;)

This might also cause an issue if your scratch disk is FAT32 and the intermediate files become too large (in my case a file of > 5 GB).
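
A quick way to check for this on a Linux node is sketched below. The FAT32 limit itself is a known fact (a single file must be smaller than 4 GiB); the helper names are hypothetical, and the mount lookup assumes a Linux `/proc/mounts` file with no spaces in the mount points. On Linux, FAT32 disks usually show up as type `vfat`.

```python
import os

FAT32_MAX_FILE_BYTES = 2**32 - 1  # FAT32 cannot store a single file of 4 GiB or more


def exceeds_fat32_limit(size_bytes: int) -> bool:
    """True if a file of this size cannot be stored on a FAT32 disk."""
    return size_bytes > FAT32_MAX_FILE_BYTES


def filesystem_type(path: str, mounts_file: str = "/proc/mounts") -> str:
    """Return the file system type of the mount point holding `path` (Linux only)."""
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3:
                continue
            mount_point, fs_type = fields[1], fields[2]
            # Match on whole path components and keep the longest mount point.
            prefix = mount_point.rstrip("/") + "/"
            if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type
```

For example, `filesystem_type("/path/to/scratch")` returning `vfat` together with `exceeds_fat32_limit(5 * 10**9)` being true would reproduce the situation described above.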

--
Good luck
Ivar


Posted: 1 decade ago Sep 30, 2011, 9:12 a.m. EDT
Hi,

Thanks for your replies!
I am using the latest version, 4.2.0.228! I have even tried different platforms and systems (Fedora and openSUSE).

I have asked COMSOL support and the first reply was: "No known solution for this problem exists!"
I have been advised to set the environment variable I_MPI_DEBUG=5. But since nothing happens after reaching 100% progress, it was impossible to get a debug log!
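
One possible way to still get something on disk when the job hangs is to send the output straight to a log file and stop the job after a time limit. The sketch below is only an illustration: `run_with_mpi_debug` is a hypothetical helper, the command to launch is left to the caller because it is site-specific, and whether Intel MPI prints anything useful at that stage is not guaranteed.

```python
import os
import subprocess


def run_with_mpi_debug(cmd, log_path, level=5, timeout_s=None):
    """Run `cmd` with I_MPI_DEBUG set, writing stdout/stderr straight to `log_path`.

    Returns the exit code, or None if the job was killed after `timeout_s` seconds.
    """
    # Copy the current environment and add the Intel MPI debug level.
    env = dict(os.environ, I_MPI_DEBUG=str(level))
    with open(log_path, "w") as log:
        try:
            result = subprocess.run(
                cmd,
                env=env,
                stdout=log,                # the child writes directly to the file
                stderr=subprocess.STDOUT,  # keep errors in the same log
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Only the direct child is killed; MPI ranks started by a launcher
            # may need to be cleaned up separately.
            return None
    return result.returncode
```

It would be called with the usual launch line as a list of strings, for example `run_with_mpi_debug(launch_cmd, "mpi_debug.log", timeout_s=3600)`, where `launch_cmd` is whatever is normally submitted to the cluster. Whatever was printed before the hang stays in the log file.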

Personally, I think it's an Intel MPI problem (because of the socksm.c issue). So I decided to explore the MPICH2 world and its connection to the hidden world of COMSOL :-), i.e., using the MPICH2 library instead of Intel MPI.
I have not been able to make it work so far because of some library linking problems (even with the help of COMSOL support). It's not that easy!

So I wonder if somebody has already tried to do that!

P.S.: If I reduce the DOFs from 450,000 to 200,000, COMSOL does save the file for 4 parameters!

I am still investigating!
Thanks in advance for any comments or suggestions!

Cheers!
