Bugzilla – Bug 5405
custom extensions in WS GRAM
Last modified: 2008-11-20 18:32:51
We are experiencing unusual WS GRAM behavior with a custom extension added to the SPRUCE JobManager on the UC/ANL TeraGrid cluster. With SPRUCE and WS GRAM, we add an extension called <urgency> to the XML job descriptions. For MPI-based jobs with np greater than 1, the <count> parameter is not honored, and the np value on the mpirun line for these jobs is set to 1. The behavior can be reproduced by adding an <extensions/> tag or an empty extension block (<extensions></extensions>) to the XML job description.

Tracing the code (from the globus-wsrf 4.0.1-r3 distribution in TeraGrid CTSS v3), we found that lib/perl/Globus/GRAM/ExtensionHandler.pm adds a totalprocesses field to the JobDescription, and this field overrides the count parameter. We are not using the PBS extensions, but it appears that whenever the extension handler parses an <extensions> block while the PBS resource manager is in use, it assumes the PBS extensions are present (the SPRUCE JobManager for the UC/ANL cluster is derived from the PBS JobManager).

We have removed the use of totalprocesses from the SPRUCE JobManager, and it now runs the expected number of processes, but this change makes the PBS extensions unavailable or may keep them from functioning correctly. If needed, we can supply the XML job description we are using, various output files from a sample run, and the modifications we made to ExtensionHandler.pm in an attempt to correct the issue. Is the behavior we are seeing expected?
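The precedence problem described in the report can be modelled with a short sketch. This is a hypothetical Python model, not the actual Perl in ExtensionHandler.pm or the PBS JobManager: the `nodes` element, its `processes` attribute, the fallback of 1, and the function names `handle_extensions` and `mpirun_np` are all assumptions chosen only to reproduce the reported symptom (np=1 whenever any <extensions> block is present). The `fixed=True` path shows one possible guard: set totalprocesses only when a PBS-specific element is actually present, so <count> stays authoritative otherwise.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag for a PBS node specification inside <extensions>;
# the real element names used by ExtensionHandler.pm are not given in the report.
PBS_NODE_TAG = "nodes"


def handle_extensions(job_xml: str, fixed: bool) -> dict:
    """Build a simplified job description from WS GRAM-style XML.

    fixed=False models the reported bug: any <extensions> block causes
    totalprocesses to be set. fixed=True only sets it when PBS node specs exist.
    """
    root = ET.fromstring(job_xml)
    desc = {"count": int(root.findtext("count", default="1"))}

    ext = root.find("extensions")
    if ext is None:  # no extensions block at all: count is used as-is
        return desc

    node_specs = ext.findall(PBS_NODE_TAG)
    if fixed and not node_specs:
        # Guard: a non-PBS extension such as <urgency> (or an empty block)
        # must not produce a totalprocesses value.
        return desc

    # Assumed behaviour: sum per-node process counts, falling back to 1.
    total = sum(int(n.get("processes", "1")) for n in node_specs)
    desc["totalprocesses"] = max(total, 1)
    return desc


def mpirun_np(desc: dict) -> int:
    # Assumed JobManager precedence: totalprocesses, when present, wins over count.
    return desc.get("totalprocesses", desc["count"])


if __name__ == "__main__":
    job = "<job><count>8</count><jobType>mpi</jobType><extensions/></job>"
    print(mpirun_np(handle_extensions(job, fixed=False)))  # unguarded handler
    print(mpirun_np(handle_extensions(job, fixed=True)))   # guarded handler
```

With an empty <extensions/> block and a count of 8, the unguarded path yields np=1, matching the report, while the guarded path keeps np=8 and still lets a genuine PBS node specification override the count.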
I've fixed this in CVS for the 4.0 branch. The 4.2.x branch and trunk are unaffected by this.