Bugzilla – Bug 3185
pbs job manager exit code
Last modified: 2006-02-20 15:04:42
The exit code from PBS jobs is passed from the SEG to the GRAM service, but
the scripts generated by the PBS perl module do not propagate the exit code for
multiple jobs.
Applying the same technique that was used in the LSF script will catch exit
codes for multiple jobs which are not run on a cluster. For the cluster case,
the exit code needs to be passed from the script run via rsh on the machines
in the PBS node file back to the script started by PBS.
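A minimal sketch of the non-cluster technique, assuming the job script starts
`count` copies of the executable in the background and then waits for each one.
It is written in Python for illustration only and is not the code generated by
the PBS perl module or the LSF script; the function name `run_instances` and the
rule "report the first non-zero exit code" are hypothetical choices.

```python
import subprocess
import sys
from typing import List, Sequence


def run_instances(command: Sequence[str], count: int) -> int:
    """Start `count` copies of `command` concurrently; return an aggregate exit code.

    Hypothetical aggregation rule: 0 if every instance exited 0, otherwise the
    exit code of the first failing instance in start order. On POSIX a negative
    value means that instance was killed by a signal.
    """
    # Launch every instance before waiting, so they run in parallel
    # (the equivalent of backgrounding each one with '&' in a shell script).
    procs: List[subprocess.Popen] = [subprocess.Popen(list(command)) for _ in range(count)]

    aggregate = 0
    for proc in procs:
        # wait() blocks until this instance exits and returns its status.
        code = proc.wait()
        if code != 0 and aggregate == 0:
            aggregate = code
    return aggregate


if __name__ == "__main__":
    ok = run_instances([sys.executable, "-c", "pass"], 3)
    bad = run_instances([sys.executable, "-c", "import sys; sys.exit(3)"], 3)
    print(ok, bad)  # prints "0 3"
```

Waiting on every instance, rather than stopping at the first failure, keeps the
job script from exiting while other instances are still running.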
Created an attachment (id=581)
Here's a patch for the first part (jobtype=multiple && !cluster). The cluster
case will probably involve updating the script run on the execution nodes to
write each instance's exit code to a file, which the script started by PBS will
then read to determine the job's exit code (rsh does not propagate exit codes).
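A minimal sketch of that file-based approach, assuming a directory visible to
every execution node (a shared file system) and a hypothetical naming scheme
`exit.<rank>`. The remote launch via rsh is replaced by local calls so the sketch
runs on one machine; it is written in Python for illustration and is not the
code in the attached patch. The helper names `node_wrapper` and
`collect_exit_code` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def node_wrapper(command: Sequence[str], status_dir: str, rank: int) -> None:
    """Run one instance and record its exit code in a per-instance file.

    Stands in for the script started via rsh on each execution node: rsh does
    not return the remote exit code, so the code goes into a shared file.
    """
    code = subprocess.call(list(command))
    final_path = os.path.join(status_dir, f"exit.{rank}")
    tmp_path = final_path + ".tmp"
    # Write under a temporary name, then rename, so the reading script
    # does not pick up a half-written file.
    with open(tmp_path, "w") as f:
        f.write(f"{code}\n")
    os.replace(tmp_path, final_path)


def collect_exit_code(status_dir: str, count: int, missing_code: int = 1) -> int:
    """Read each instance's exit code file and return an aggregate exit code.

    Returns 0 if all instances succeeded, otherwise the first non-zero code by
    rank. A missing or unreadable file counts as failure with `missing_code`
    (a hypothetical choice).
    """
    for rank in range(count):
        path = os.path.join(status_dir, f"exit.{rank}")
        try:
            with open(path) as f:
                code = int(f.read().strip())
        except (FileNotFoundError, ValueError):
            code = missing_code
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as status_dir:
        # Simulate three nodes locally; rank 1 fails with exit code 7.
        for rank in range(3):
            exit_code = 7 if rank == 1 else 0
            script = f"import sys; sys.exit({exit_code})"
            node_wrapper([sys.executable, "-c", script], status_dir, rank)
        print(collect_exit_code(status_dir, 3))  # prints 7
```

Treating a missing file as a failure means an instance that never started (for
example because rsh to that node failed) cannot be mistaken for a success.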
Please also test exit codes for jobs with jobtype MPI.
Exit code handling is now in the 4.0 branch. I've also modified the GRAM
scheduler tests to check that it works. The script relies on a file system
shared by all compute nodes (though other parts of the script already seem to
assume that anyway).
The fix has been merged into the trunk as well.