Bugzilla – Bug 5525
Add support for OSC mpiexec (not MPICH2 command!) to PBS adapter
Last modified: 2012-09-05 11:43:37
First of all, I am not referring to the mpiexec command of MPICH2 here. If you enter 'mpiexec' in Google, the following page comes up first, and this is what we need: http://www.osc.edu/~pw/mpiexec/index.php

mpiexec is the recommended, convenient way to run MPI jobs within PBS/TORQUE. It implements many MPI-version-specific protocols for starting MPI tasks (making it easy to switch from one MPI implementation to another) and supports correct accounting of CPU time for parallel jobs. To use it, you submit a PBS job, specifying your multi-node requirements as usual, and then run mpiexec within the job script:

    /opt/mpiexec/bin/mpiexec -np 4 -comm mpich-p4 /path/to/your/mpiapp

If you want to switch from MPICH/Gigabit Ethernet to MVAPICH/InfiniBand:

    /opt/mpiexec/bin/mpiexec -np 4 -comm mpich-ib /path/to/your/mpiapp

This is a lot simpler than the rsh/mpirun approach used by the current (4.0.5) PBS adapter. There is no machinefile to care about. Also, you don't need to enable password-less rsh access from the front machine to the execution nodes or between nodes, provided that they share a file system with the front machine (as is required by GRAM's file staging anyway). Finally, the interface exposed by mpiexec (as the commands above show) is much simpler than the mpirun/rsh interface, which would simplify the implementation of the PBS adapter.
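To make the requested mode of use concrete, here is a minimal sketch of a job script that the PBS adapter could generate. The mpiexec path, the -comm values and the application path come from the commands in this report; the node request (nodes=2:ppn=2), the job name, the helper name mpiexec_cmd and the LAUNCH switch are assumptions made only for illustration. By default the script just prints the launch line; it runs mpiexec only when LAUNCH=1 is set inside a real job.

```shell
#!/bin/sh
#PBS -N mpiapp
#PBS -l nodes=2:ppn=2
# The resource request above is an assumed example (2 nodes x 2 processors = 4 tasks).

# Values taken from the report; adjust them for your site.
MPIEXEC=/opt/mpiexec/bin/mpiexec
NP=4
COMM=mpich-p4          # use mpich-ib for MVAPICH/InfiniBand
APP=/path/to/your/mpiapp

# Hypothetical helper: print the launch line for <np> <comm> <app>.
mpiexec_cmd() {
    printf '%s -np %s -comm %s %s\n' "$MPIEXEC" "$1" "$2" "$3"
}

if [ "${LAUNCH:-0}" = 1 ]; then
    # Inside a real PBS job: start the tasks from the submission directory.
    cd "${PBS_O_WORKDIR:-.}" && exec "$MPIEXEC" -np "$NP" -comm "$COMM" "$APP"
else
    # Dry run: only show what would be executed.
    mpiexec_cmd "$NP" "$COMM" "$APP"
fi
```

Note that no machinefile and no rsh setup appear anywhere in the script, which is the simplification argued for above.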
This request for enhancement should probably be generalized. The main current weakness is that the mpirun mechanism in pbs.pm is not configurable enough. In addition to the already mentioned desirable "out-of-the-box" support for OSC's 'mpiexec', there should be a way to choose among multiple MPI versions installed at a particular site, without the need to hack pbs.pm or create a custom jobmanager script based on pbs.pm. Having multiple MPI versions is not unrealistic today: in our scenario the application code has to be built with a supported commercial compiler, and the (Open)MPI library providing the runtime support must be compiled with the same compiler. However, other users of the same site may wish to compile their programs with gcc and accordingly use the gcc-compiled version of MPI.
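One way to picture the requested configurability is a site-level table that maps an MPI "flavor" name to a launch command, so that the choice is made per job rather than hard-coded in pbs.pm. The sketch below is only an illustration of that idea, not a proposal for the actual adapter code: the function name mpi_launcher, the flavor names and the two Open MPI install prefixes are all hypothetical.

```shell
#!/bin/sh
# Hypothetical site table: map an MPI flavor name to its launch command.
# Usage: mpi_launcher <flavor> <number-of-tasks>
mpi_launcher() {
    case "$1" in
        osc-mpiexec)   echo "/opt/mpiexec/bin/mpiexec -np $2" ;;
        openmpi-gcc)   echo "/opt/openmpi-gcc/bin/mpirun -np $2" ;;    # assumed path
        openmpi-intel) echo "/opt/openmpi-intel/bin/mpirun -np $2" ;;  # assumed path
        *)
            echo "unknown MPI flavor: $1" >&2
            return 1
            ;;
    esac
}

# Example: a user who built with gcc selects the gcc-compiled Open MPI.
mpi_launcher openmpi-gcc 4
```

With a table like this, adding another MPI installation is a one-line site configuration change, and an unknown flavor fails early instead of silently falling back to a default launcher.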
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in Jira. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363