Bug 5405 - custom extensions in WS GRAM
Summary: custom extensions in WS GRAM
Version: 4.0.7
Platform: TeraGrid Linux
Importance: P3 normal
Target Milestone: 4.0.9
Assigned To:
Reported: 2007-06-25 11:05 by
Modified: 2008-11-20 18:32 (History)




Description From 2007-06-25 11:05:47
We are experiencing unusual WS GRAM behavior with a custom extension added to
the SPRUCE JobManager on the UC/ANL TeraGrid cluster. With SPRUCE and WS GRAM,
we add an extension called <urgency> to the XML job descriptions. For
MPI-based jobs with np greater than 1, the <count> parameter is not honored,
and the np value on the mpirun line for these jobs is set to 1. The behavior
can be reproduced by adding an <extensions/> tag or an empty extension block
(<extensions></extensions>) to the XML job descriptions.
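
To make the reproduction concrete, here is a minimal Python sketch that generates three job descriptions for comparison: one without extensions, one with an empty <extensions/> element, and one with an <urgency> extension. The executable path and the urgency value are placeholders, and the element layout is assumed to follow the usual WS GRAM job description shape (<job>, <executable>, <count>, <jobType>); this is not the exact file we submit.

```python
import xml.etree.ElementTree as ET


def build_job(count, extensions=None):
    """Build a minimal WS GRAM-style job description as an XML string.

    extensions=None  -> no <extensions> element at all
    extensions={}    -> an empty <extensions /> element
    extensions={...} -> one child element per key
    """
    job = ET.Element("job")
    # Placeholder executable path, not a real application.
    ET.SubElement(job, "executable").text = "/path/to/mpi_app"
    ET.SubElement(job, "count").text = str(count)
    ET.SubElement(job, "jobType").text = "mpi"
    if extensions is not None:
        ext = ET.SubElement(job, "extensions")
        for name, value in extensions.items():
            ET.SubElement(ext, name).text = str(value)
    return ET.tostring(job, encoding="unicode")


baseline = build_job(4)                                     # count expected to be honored
empty_ext = build_job(4, extensions={})                     # empty extensions block
with_urgency = build_job(4, extensions={"urgency": "red"})  # placeholder urgency value

for label, xml_text in [("baseline", baseline),
                        ("empty extensions", empty_ext),
                        ("urgency extension", with_urgency)]:
    print(label + ":", xml_text)
```

Submitting the three variants with the same <count> and comparing the resulting mpirun lines should isolate whether the presence of the extensions block alone triggers the problem.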

Tracing the code (the globus-wsrf 4.0.1-r3 distribution in the TeraGrid CTSS
v3), we found that lib/perl/Globus/GRAM/ExtensionHandler.pm adds a
totalprocesses field to the JobDescription, and this field overrides the count
parameter. We are not using the PBS extensions, but I think the extension
handler assumes that we are whenever it parses an <extensions> block under the
PBS resource manager (the SPRUCE JobManager for the UC/ANL cluster is derived
from the PBS JobManager). We've removed the use of totalprocesses from the
SPRUCE JobManager, and it now runs the expected number of processes. However,
this change may make the PBS extensions unavailable or keep them from
functioning correctly.
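
The suspected override can be modeled in a few lines of Python. This is only a sketch of the behavior we think we are seeing, not the actual Perl in ExtensionHandler.pm: the function names, the dictionary representation of the job, and the "processCount" key standing in for a PBS-specific extension element are all assumptions made for illustration.

```python
def naive_handler(job, extensions):
    """Model of the suspected behavior: any extensions block, even an
    empty one, causes totalprocesses to be set (defaulting to 1)."""
    job = dict(job)
    if extensions is not None:
        # "processCount" is a hypothetical stand-in for a PBS extension.
        job["totalprocesses"] = int(extensions.get("processCount", 1))
    return job


def guarded_handler(job, extensions):
    """Possible alternative: set totalprocesses only when a PBS-specific
    extension is actually present in the block."""
    job = dict(job)
    if extensions is not None and "processCount" in extensions:
        job["totalprocesses"] = int(extensions["processCount"])
    return job


def mpirun_np(job):
    """Model of the job manager: totalprocesses wins over count if set."""
    return job.get("totalprocesses", job["count"])


job = {"count": 4}
print(mpirun_np(naive_handler(job, None)))                   # no extensions block
print(mpirun_np(naive_handler(job, {"urgency": "red"})))     # count is overridden
print(mpirun_np(guarded_handler(job, {"urgency": "red"})))   # count is kept
print(mpirun_np(guarded_handler(job, {"processCount": 8})))  # PBS value is used
```

A guard of this kind would keep the PBS extensions usable while leaving <count> alone for jobs that carry only unrelated extensions such as <urgency>.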

If needed, we can supply the XML job description we are using, various output
files from a sample run, and some modifications we made to the
ExtensionHandler.pm file that attempt to correct the issue. 

Is the behavior we are seeing expected?
------- Comment #1 From 2008-11-20 18:32:51 -------
I've fixed this in CVS for the 4.0 branch. The 4.2.x branch and trunk are
unaffected by this.