Bugzilla – Bug 5525
Add support for OSC mpiexec (not MPICH2 command!) to PBS adapter
Last modified: 2012-09-05 11:43:37
First of all, I am not referring to the mpiexec command of MPICH2 here. If you
enter 'mpiexec' in Google, the following page comes up first, and this is what
we need: http://www.osc.edu/~pw/mpiexec/index.php
mpiexec is the recommended/comfortable way to run MPI jobs within PBS/TORQUE.
It implements many MPI-version-specific protocols for starting MPI tasks
(making it easy to switch from one MPI implementation to another) and supports
correct accounting of CPU time for parallel jobs. Its mode of use is that you
submit a PBS job, specifying your multi-node requirements as usual, and then
within the job script you run mpiexec as shown below:
/opt/mpiexec/bin/mpiexec -np 4 -comm mpich-p4 /path/to/your/mpiapp
If you want to switch from MPICH/Gigabit Ethernet to MVAPICH/InfiniBand:
/opt/mpiexec/bin/mpiexec -np 4 -comm mpich-ib /path/to/your/mpiapp
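Putting the pieces together, a complete job script might look like the sketch
below. The resource requests, paths, and the application name are illustrative
assumptions, not part of any actual site configuration:

```shell
#!/bin/sh
# Illustrative PBS job script using OSC mpiexec; node counts, walltime,
# and paths are example values only.
#PBS -l nodes=4:ppn=1
#PBS -l walltime=00:10:00
#PBS -N mpiapp-test

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# No machinefile and no rsh setup needed: mpiexec talks to the
# PBS task manager directly to start one task per allocated slot.
/opt/mpiexec/bin/mpiexec -np 4 -comm mpich-p4 /path/to/your/mpiapp
```

Submitting it is then just `qsub jobscript.sh`, with no per-job launcher
plumbing beyond what PBS already knows about the allocation.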
This is a lot simpler than the rsh/mpirun approach used by the current (4.0.5)
PBS adapter. There is no machinefile to care about. Also, you don't need to
enable password-less rsh access from the front machine to the execution nodes
or between nodes, provided that they share a file system with the front machine
(as is required by GRAM's file staging anyway). Finally, the interface exposed
by mpiexec (as can be seen above) is much simpler than the mpirun/rsh
interface, which would simplify the implementation of the PBS adapter.
This request for enhancement should probably be generalized. The main current
weakness is that the mpirun mechanism in pbs.pm is not configurable enough. In
addition to the already mentioned desirable "out-of-the-box" support for OSC's
'mpiexec', there should be a way to choose among multiple MPI versions
installed at a particular site - without the need to hack pbs.pm or create
a custom jobmanager script based on pbs.pm.
Having multiple MPI versions is not unrealistic today: in our scenario the
application code has to be built with a supported commercial compiler and the
(Open)MPI library providing the runtime support must be compiled with the same
compiler. However, other users of the same site may wish to compile their
programs with gcc and accordingly use the gcc-compiled version of MPI.
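One way such site-level configurability could look is a small lookup from an
MPI "flavor" name to the matching launcher command, which a configurable
adapter could consult instead of a hard-coded mpirun path. This is only a
sketch; the flavor names and install paths below are invented for
illustration:

```shell
#!/bin/sh
# Hypothetical per-site MPI selection table. A configurable PBS adapter
# could read something like this rather than hard-coding one launcher.
# All flavor names and paths are illustrative assumptions.
select_launcher() {
    case "$1" in
        mpich-p4)       echo "/opt/mpiexec/bin/mpiexec -comm mpich-p4" ;;
        mpich-ib)       echo "/opt/mpiexec/bin/mpiexec -comm mpich-ib" ;;
        openmpi-intel)  echo "/opt/openmpi-intel/bin/mpirun" ;;
        openmpi-gcc)    echo "/opt/openmpi-gcc/bin/mpirun" ;;
        *) echo "unknown MPI flavor: $1" >&2; return 1 ;;
    esac
}

# A job (or the jobmanager) picks the flavor; the adapter resolves it.
select_launcher openmpi-gcc
```

This would let the gcc and commercial-compiler builds of (Open)MPI coexist at
one site, with users selecting a flavor per job instead of editing pbs.pm.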
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in JIRA. Any new issues should be added here: