Bugzilla – Bug 3442
GRAM should use 'mpiexec' before 'mpirun'
Last modified: 2005-08-03 16:46:49
Newer MPI implementations provide mpiexec instead of mpirun as their command for launching processes. In its simplest form, "mpiexec -n <N> <pgm> [args]" launches <N> processes of <pgm>. This is similar to "mpirun -np <N> <pgm> [args]", the mechanism used by earlier MPI implementations. During configuration, GRAM should look for mpiexec first, and only consider mpirun if mpiexec does not exist. Details on mpiexec can be found in Section 6 of http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-doc-user.pdf. Further information can be found in Section 4.1 of the MPI-2 standard (http://www.mpi-forum.org/docs/mpi2-report.pdf). This feature is critical for GT 4.0.1, as it is required for Globus to work with certain builds of MPICH2.
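The configure-time preference described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual GRAM job manager code (which this report does not show); the names `find_mpi_launcher` and `build_mpi_command` are hypothetical. It searches PATH for mpiexec first, falls back to mpirun, and swaps the process-count flag between `-n` and `-np` accordingly.

```python
import shutil
from typing import List, Optional, Tuple

# Preference order from the bug report: mpiexec first, mpirun only as a fallback.
CANDIDATES = ("mpiexec", "mpirun")


def find_mpi_launcher(path: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (name, full_path) of the first launcher found, or None.

    `path` is a PATH-style search string; None means the current PATH.
    """
    for name in CANDIDATES:
        found = shutil.which(name, path=path)
        if found:
            return name, found
    return None


def build_mpi_command(launcher: str, launcher_path: str, count: int,
                      program: str, args: List[str]) -> List[str]:
    """Build the argv that starts `count` copies of `program`."""
    # mpiexec spells the process-count flag "-n"; classic mpirun uses "-np".
    flag = "-n" if launcher == "mpiexec" else "-np"
    return [launcher_path, flag, str(count), program, *args]
```

Returning the full path lets the configure step record exactly which launcher was found, so job submission does not depend on the job's own PATH.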
The default device for mpich2 requires a few setup steps that we don't currently perform:
- $HOME/.mpd.conf must be created with a secret word
- mpd must be started
- python2 and the MPIHOME/bin directory must be in the PATH of the script running mpiexec
Are these requirements standard, and should the job manager scripts be doing these things? Or are we to assume that the environment is set properly outside of GRAM, via the RSL, default attribute values for environment, etc.?
joe
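A job manager that wanted to at least diagnose the three requirements listed above before calling mpiexec might check them along these lines. This is a hypothetical Python sketch: `mpd_setup_problems` is an illustrative name, the accepted secret-word keys and the mode-600 rule are assumptions about MPD rather than facts stated in this report, and it does not check whether an mpd daemon is actually running.

```python
import os
import shutil
import stat
from typing import List, Optional


def mpd_setup_problems(home: str, path: Optional[str] = None) -> List[str]:
    """List reasons the default (MPD) MPICH2 device would likely fail.

    `path` is a PATH-style string to search; None means the current PATH.
    """
    problems: List[str] = []
    conf = os.path.join(home, ".mpd.conf")
    if not os.path.isfile(conf):
        problems.append(f"{conf} is missing")
    else:
        # Assumption: mpd refuses a .mpd.conf readable by group or others.
        mode = stat.S_IMODE(os.stat(conf).st_mode)
        if mode & 0o077:
            problems.append(f"{conf} has mode {oct(mode)}, expected 0o600")
        with open(conf) as fh:
            keys = {line.split("=", 1)[0].strip().lower()
                    for line in fh if "=" in line}
        # Assumption: older releases read "secretword=", newer "MPD_SECRETWORD=".
        if not keys & {"secretword", "mpd_secretword"}:
            problems.append(f"{conf} has no secret word entry")
    # MPIHOME/bin is detected indirectly, by looking for mpiexec on PATH.
    if shutil.which("mpiexec", path=path) is None:
        problems.append("mpiexec (MPIHOME/bin) is not on PATH")
    # Any python2/python binary is accepted; its version is not checked.
    if (shutil.which("python2", path=path) is None
            and shutil.which("python", path=path) is None):
        problems.append("no python2/python interpreter on PATH")
    return problems
```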
Subject: RE: GRAM should use 'mpiexec' before 'mpirun'

|The default device for mpich2 requires a few things to be done by default which
|aren't things we currently do:

More specifically, the requirements depend on the process manager that was selected at configure time (see the --with-pm option).

|- $HOME/.mpd.conf must be created with a secret word
|- mpd must be started

These are only true if the user runs his own instance of MPD, similar in nature to a personal gatekeeper in Globus. If a system-wide MPD is running as root, then the user need not perform these actions, although the system administrator is responsible for performing similar tasks.

My feeling is that the job manager scripts should not be responsible for starting MPD. If a system-wide MPD is not already running, then it is the user's responsibility to start one. Furthermore, the installation of MPICH2 may not be using MPD, and thus would not have the MPD executables installed, making it inappropriate for the job manager scripts to blindly attempt to start MPD. So, unless the scripts are going to attempt to handle different types of MPICH2 installations, it seems best to leave the startup of any process management environment to the user / system administrator.

|- python2 and the MPIHOME/bin directory must be in the PATH of the script
|running mpiexec

In the past, I never added MPIHOME/bin to my path; so, assuming you are using the 1.0.1 release, that should not be a requirement. The job manager scripts will still need to know where to find mpiexec, but that seems like a given. As for python, I'm quite certain it was in my default path (supplied by softenv), so I wouldn't be surprised if having python in one's path were a requirement, but I can't say for certain that it is necessary. In addition, if MPICH2 is built with shared libraries, LD_LIBRARY_PATH (or equivalent) will need to be set to MPIHOME/lib, but that same problem exists with Globus.
As I understand things, the user is required to add $GLOBUS_LOCATION/lib to LD_LIBRARY_PATH in the RSL. If that is true, I suppose it's not unreasonable to require the user to do the same for MPICH2, but it would certainly be nice if that weren't necessary. --brian
WAIT WAIT WAIT ... We are all heading down the WRONG path. I think the *only* thing that should be done in GRAM is something similar to what is already done when using mpirun. If mpirun is found with 'which mpirun', then when (jobtype=mpi) appears in the RSL, GRAM launches the job with mpirun. We did not worry about setting up the user's environment to make mpirun run correctly ... that was the user's responsibility. For mpiexec we should do essentially the same thing. If mpiexec is in the path of the person configuring Globus, then when (jobtype=mpi) is in the RSL we should use mpiexec, adjusting the command line from mpirun's syntax to mpiexec's. If mpiexec is not in the path, then we should fall back to mpirun. As with mpirun, we should put the responsibility for establishing the environment mpiexec needs on the user ... we should NOT be pushing any of that stuff into GRAM. Now, that said, in order to *test* what Joe is doing, he will need to know this information so that he can set up his environment.
I disagree with your comments, Nick (and not because I've already written the code to deal with this). If we leave these things out of the scripts, there's not really much point in having the mpi jobtype: the user will probably need to know the path to mpiexec and whatever else it depends on, and add those to their RSL. If they already know that, why don't they just shove /path/to/mpiexec into their RSL and avoid this stuff altogether? It seems pretty reasonable for the job script to make its best effort to get the job running on the system, using whatever knowledge we have about the job types we support.
joe
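The "best effort" approach argued for here could, for instance, extend the environment passed to mpiexec with the MPICH2 directories discussed earlier in the thread. This is a hypothetical Python sketch, assuming the job manager knows the MPICH2 install prefix (called `mpi_home` here, standing in for MPIHOME); the function names are illustrative, and it deliberately does not create .mpd.conf or start mpd.

```python
import os
from typing import Dict, Mapping


def prepend_path(env: Dict[str, str], var: str, directory: str) -> None:
    """Put `directory` at the front of a PATH-style variable, without duplicates."""
    parts = [p for p in env.get(var, "").split(os.pathsep)
             if p and p != directory]
    env[var] = os.pathsep.join([directory, *parts])


def best_effort_mpi_env(user_env: Mapping[str, str],
                        mpi_home: str) -> Dict[str, str]:
    """Copy the user's environment and add the MPICH2 bin and lib directories."""
    env = dict(user_env)
    # MPIHOME/bin so mpiexec (and, for MPD builds, mpd) can be found.
    prepend_path(env, "PATH", os.path.join(mpi_home, "bin"))
    # MPIHOME/lib for MPICH2 builds that use shared libraries.
    prepend_path(env, "LD_LIBRARY_PATH", os.path.join(mpi_home, "lib"))
    return env
```

Prepending means the MPICH2 directories win over entries the user supplied; appending instead would let the user's RSL settings take precedence.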
committed to 4.0 branch and trunk