Bugzilla – Bug 3185
pbs job manager exit code
Last modified: 2006-02-20 15:04:42
The exit code from PBS jobs is passed from the SEG to the GRAM service, but the scripts generated by the PBS perl module do not propagate the exit code for multiple jobs. Applying a technique similar to the one used in the lsf script will catch exit codes for multiple jobs which are not run on a cluster. For the cluster case, the exit code needs to be passed from the script run via rsh on the machines in the PBS node file back to the script started by PBS.
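For illustration only, here is a minimal shell sketch of one way to catch exit codes for the non-cluster multiple case: start each instance in the background, remember its pid, and wait on each pid in turn. This is not taken from the lsf script or from the attached patch. The function name run_multiple and the policy of reporting the first non-zero exit code are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: run_multiple is an illustrative name, not part of
# the generated PBS script.
# Usage: run_multiple COUNT COMMAND [ARGS...]
# Starts COUNT copies of COMMAND in the background, waits for each one,
# and returns the first non-zero exit code (0 if every copy succeeded).
run_multiple() {
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done

    rc=0
    for pid in $pids; do
        status=0
        # wait PID yields the exit status of that particular child.
        wait "$pid" || status=$?
        if [ "$rc" -eq 0 ] && [ "$status" -ne 0 ]; then
            rc=$status
        fi
    done
    return "$rc"
}
```

Waiting on each pid individually matters here: a bare wait with no arguments waits for all children but does not report their individual exit codes.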
Created an attachment (id=581): pbs-exit.diff. Here's a patch for the first bit (jobtype=multiple && !cluster). The cluster case will probably involve updating the script run on the execution nodes to write the instance exit codes to a file, which the script started by PBS will then read to determine the exit code (as rsh does not propagate exit codes).
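A rough sketch of the file-based approach described above for the cluster case might look like the following. It assumes a directory on a file system shared by all compute nodes. The function names node_wrapper and collect_exit_codes, the exit.N file naming, and the rule of returning the first non-zero code are all hypothetical and not taken from the actual patch.

```shell
#!/bin/sh
# Hypothetical sketch; function and file names are illustrative only.

# Would run on an execution node (in a real script, started via rsh).
# Runs the command and records its exit code in DIR/exit.INDEX, where DIR
# is assumed to live on a file system shared by all nodes.
# Usage: node_wrapper DIR INDEX COMMAND [ARGS...]
node_wrapper() {
    dir=$1
    index=$2
    shift 2
    status=0
    "$@" || status=$?
    echo "$status" > "$dir/exit.$index"
}

# Would run in the script started by PBS once all instances have finished.
# Returns the first non-zero recorded exit code. A missing or unreadable
# file is treated as a failure (exit code 1).
# Usage: collect_exit_codes DIR COUNT
collect_exit_codes() {
    dir=$1
    count=$2
    rc=0
    i=0
    while [ "$i" -lt "$count" ]; do
        status=1
        if [ -r "$dir/exit.$i" ]; then
            read -r status < "$dir/exit.$i" || status=1
        fi
        if [ "$rc" -eq 0 ] && [ "$status" -ne 0 ]; then
            rc=$status
        fi
        i=$((i + 1))
    done
    return "$rc"
}
```

In a real job script, node_wrapper would be invoked through rsh once per entry in the PBS node file, and collect_exit_codes would be called after all of those rsh commands return.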
Please test exit codes with jobtype=mpi jobs too.
Exit code handling is in the 4.0 branch now. I've modified the GRAM scheduler tests to check that it works as well. The script relies on a shared file system for all compute nodes (but that seems to be the case for other parts of the script now anyway). joe
The fix is merged into the trunk as well.