Bugzilla – Bug 5405
custom extensions in WS GRAM
Last modified: 2008-11-20 18:32:51
We are experiencing unusual WS GRAM behavior with a custom extension added to
the SPRUCE JobManager on the UC/ANL TeraGrid cluster. With SPRUCE and WS GRAM,
we are adding an extension called <urgency> to the XML job descriptions. For
MPI-based jobs that have np greater than 1, the <count> parameter is not being
honored and the np value on the mpirun line for these jobs is set to 1.
The behavior can be reproduced by adding an <extensions/> tag or an empty
extension block (<extensions></extensions>) to the XML job descriptions.
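
A minimal job description that should reproduce this might look like the
sketch below. The element names follow the GT4 WS GRAM job description format;
the executable path and the count of 4 are hypothetical placeholders, not
values taken from an actual run.

```xml
<job>
    <!-- hypothetical MPI program; any MPI executable should do -->
    <executable>/path/to/mpi_program</executable>
    <!-- requested process count; should end up as the np value for mpirun -->
    <count>4</count>
    <jobType>mpi</jobType>
    <!-- an empty extensions element is enough to trigger the problem -->
    <extensions/>
</job>
```

With the <extensions/> element present, this would be expected to show the
symptom described above: np on the mpirun line set to 1 rather than 4.
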
Tracing the code (the globus-wsrf 4.0.1-r3 distribution in the TeraGrid CTSS
v3), we found that lib/perl/Globus/GRAM/ExtensionHandler.pm adds a
totalprocesses field to the JobDescription and that this parameter overrides
the count parameter. We are not using the PBS
extensions, but I think the extension handler assumes that we are using these
extensions if it parses an <extensions> block when we are using the PBS
resource manager (the SPRUCE JobManager for the UC/ANL cluster is derived from
the PBS JobManager). We've removed the use of totalprocesses in the SPRUCE
JobManager, and it now runs the expected number of processes (but this makes
the PBS extensions unavailable or may keep them from working correctly).
If needed, we can supply the XML job description we are using, various output
files from a sample run, and some modifications we made to the
ExtensionHandler.pm file that attempt to correct the issue.
Is the behavior we are seeing expected?
I've fixed this in CVS for the 4.0 branch. The 4.2.x branch and trunk are
unaffected by this.