Bug 3442 - GRAM should use 'mpiexec' before 'mpirun'
Status: RESOLVED FIXED
Product: GRAM
Component: general
Version: 4.0.0
Hardware/OS: All All
Priority/Severity: P3 blocker
Target Milestone: 4.0.1
Assigned To:
Reported: 2005-06-01 06:39
Modified: 2005-08-03 16:46


Description From 2005-06-01 06:39:48
Newer MPI implementations provide mpiexec instead of mpirun as their command 
for launching processes.  In its simplest form, 

"mpiexec -n <N> <pgm> [args]" 

is used to launch <N> processes of <pgm>.  This is similar to 

"mpirun -np <N> <pgm>[args]", 

which is the mechanism used by earlier implementations of MPI.  
During configuration, GRAM should look for mpiexec first, and only 
consider mpirun if mpiexec does not exist.
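
A minimal sketch of the intended configure-time probe (POSIX shell; the
variable names are hypothetical, not the actual configure macros):

    # prefer mpiexec; fall back to mpirun only if mpiexec is absent
    MPIEXEC=`which mpiexec 2>/dev/null`
    if test -n "$MPIEXEC" ; then
        MPI_COMMAND="$MPIEXEC -n"
    else
        MPIRUN=`which mpirun 2>/dev/null`
        test -n "$MPIRUN" && MPI_COMMAND="$MPIRUN -np"
    fi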

Details on mpiexec can be found in Section 6 of
http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-doc-user.pdf.  
Further information can be found in Section 4.1 of the MPI-2 standard
(http://www.mpi-forum.org/docs/mpi2-report.pdf).

This feature is critical for GT 4.0.1 as it is required for Globus to work with
certain builds of MPICH2.
------- Comment #1 From 2005-06-03 10:23:29 -------
The default device for mpich2 requires a few setup steps that we don't
currently perform:
- $HOME/.mpd.conf must be created with a secret word
- mpd must be started
- python2 and the MPIHOME/bin directory must be in the PATH of the script
running mpiexec
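
For reference, the manual setup looks roughly like this (a sketch; the
exact secret-word syntax may vary with the MPICH2 version, and the
install path is hypothetical):

    # create the MPD configuration file with a secret word
    # (key name per the MPICH2 1.0.x docs; check your version)
    echo "secretword=changeme" > $HOME/.mpd.conf
    chmod 600 $HOME/.mpd.conf      # mpd refuses a readable config file

    # put python2 and the MPICH2 bin directory on PATH
    export PATH=/usr/local/mpich2/bin:$PATH

    # start a personal MPD daemon
    mpd &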

Are these requirements standard, and should the job manager scripts be doing
these things?  Or are we to assume that the environment is set properly outside
of GRAM via the RSL, default attribute values for environment, etc.?

joe
------- Comment #2 From 2005-06-03 12:10:02 -------
Subject: RE:  GRAM should use 'mpiexec' before 'mpirun'

|The default device for mpich2 requires a few setup steps that we don't
|currently perform:

More specifically, the requirements depend on the process manager that was
selected at configure time (see --with-pm option).

|- $HOME/.mpd.conf must be created with a secret word
|- mpd must be started

These are only true if the user runs his own instantiation of MPD, similar in
nature to a personal gatekeeper in Globus.  If a system-wide MPD is running as
root, then these actions need not be performed by the user, although the system
administrator is responsible for performing similar tasks.

My feeling is that the job manager scripts should not be responsible for
starting MPD.  If a system-wide MPD is not already running, then it is the
user's responsibility to start one.  Furthermore, the installation of MPICH2 may
not be using MPD, and thus would not have the MPD executables installed, making
it inappropriate for the job manager scripts to blindly attempt to start MPD.
So, unless the scripts are going to attempt to handle different types of MPICH2
installations, it seems best to leave the startup of any process management
environment to the user / system administrator.

|- python2 and the MPIHOME/bin directory must be in the PATH of the script
|running mpiexec

In the past, I never added MPIHOME/bin to my path; so, assuming you are using
the 1.0.1 release, that should not be a requirement.  The job manager scripts
will, of course, need to know where to find mpiexec, but that seems like a given.

As for python, I'm quite certain it was in my default path (supplied by
softenv), so I wouldn't be surprised if having python in one's path was a
requirement, but I can't say for certain that it is necessary.

In addition, if MPICH2 is built with shared libraries, LD_LIBRARY_PATH (or
equivalent) will need to be set to MPIHOME/lib, but that same problem exists
with Globus.  As I understand things, the user is required to add
$GLOBUS_LOCATION/lib to LD_LIBRARY_PATH in the RSL.  If that is true, I suppose
it's not unreasonable to force the user to do the same for MPICH2, but it would
certainly be nice if that wasn't necessary.
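
For illustration, forcing it through the RSL would look something like this
(pre-WS GRAM RSL; the executable and library paths are hypothetical):

    & (executable=/home/user/my_mpi_app)
      (jobtype=mpi)
      (count=4)
      (environment=(LD_LIBRARY_PATH /usr/local/mpich2/lib))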

--brian
 

------- Comment #3 From 2005-06-03 15:20:02 -------
WAIT WAIT WAIT ... We are all heading down the WRONG path.

I think the *only* thing that should be done in GRAM is something similar to what
is already done in GRAM when using mpirun.  If mpirun is found with 'which mpirun',
then when (jobtype=mpi) appears in the RSL, GRAM launches the job with mpirun.
We did not worry about setting up the user's environment to make mpirun run
correctly ... that was the user's responsibility.

For mpiexec we should do essentially the same thing.  If mpiexec is in the path
of the person who is configuring Globus, then when (jobtype=mpi) is in the RSL
we should use mpiexec, with its syntax change from mpirun.  If mpiexec is
not in the path, then we should fall back to mpirun.
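
In other words, the launch step would be roughly this (a sketch, not the
actual job manager code; the variable names are hypothetical):

    # launch for (jobtype=mpi): use whichever launcher configure found
    if test -n "$MPIEXEC" ; then
        $MPIEXEC -n "$COUNT" "$EXECUTABLE" $ARGS
    else
        $MPIRUN -np "$COUNT" "$EXECUTABLE" $ARGS
    fi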

As with mpirun, we should put the responsibility for establishing the proper
environment for mpiexec on the user ... we should NOT be pushing
any of that stuff into GRAM.

Now, that said, in order to *test* what Joe is doing he will need to know this 
information so that he can set up his env.
------- Comment #4 From 2005-06-03 15:39:43 -------
I disagree with your comments, Nick (and not because I've already written the
code to deal with this).  If we leave these things out of the scripts, there's
not really much point in having the mpi jobtype: the user will probably need to
know the path to mpiexec and whatever else it uses, and add those to
their RSL.  If they already know that, why not just shove /path/to/mpiexec
into their RSL and avoid this stuff altogether?  It seems pretty reasonable for
the job script to put forth its best effort to get the job running on the
system, using whatever knowledge we have about the job types we support.
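
For example, the do-it-yourself alternative would be something like this
(paths hypothetical):

    & (executable=/usr/local/mpich2/bin/mpiexec)
      (arguments=-n 4 /home/user/my_mpi_app)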

joe
------- Comment #5 From 2005-06-09 14:41:12 -------
committed to 4.0 branch and trunk