Bugzilla – Bug 6084
pbs.pm does not recognize "C" state
Last modified: 2009-01-23 15:13:30
We had to make a minor change to pbs.pm to accommodate the PBS install at Vanderbilt. qstat can report a state "C", meaning the job is completed. Apparently, with a regular installation of PBS, this state almost never shows up. However, there is an option to allow jobs to remain in "C" for some period of time. This option is used at Vanderbilt with a setting of one hour. This means that, from a GRAM point of view, the job is not allowed to finish until an hour after its nominal end time.

The patch to pbs.pm's poll() subroutine is simple enough: just return DONE if the job is seen in the "C" state.

    elsif(/R|E/)
    {
        $state = Globus::GRAM::JobState::ACTIVE;
    }
    elsif(/C/)
    {
        $state = Globus::GRAM::JobState::DONE;
    }
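The mapping described in the comment above can be sketched as a standalone Perl snippet. This is only an illustration, not the actual pbs.pm code: the helper name map_pbs_state is hypothetical, the plain strings stand in for Globus::GRAM::JobState::ACTIVE and Globus::GRAM::JobState::DONE so that it runs without a Globus installation, and only the three state letters mentioned in this report are handled.

```perl
use strict;
use warnings;

# Stand-in labels (assumption) for Globus::GRAM::JobState::ACTIVE and
# Globus::GRAM::JobState::DONE, so the sketch runs without Globus.
my %STATE = (ACTIVE => 'ACTIVE', DONE => 'DONE');

# Hypothetical helper: map a single qstat state letter to a job state.
sub map_pbs_state {
    my ($letter) = @_;
    return $STATE{ACTIVE} if $letter =~ /^[RE]$/;   # running or exiting
    return $STATE{DONE}   if $letter eq 'C';        # completed, still listed
    return;                                         # other letters: not covered here
}

# Show the mapping for the three letters discussed in this report.
for my $letter (qw(R E C)) {
    printf "%s -> %s\n", $letter, map_pbs_state($letter);
}
```

With the extra branch, a job that lingers in "C" is reported as done as soon as qstat shows it, rather than only after it disappears from the qstat output.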
This change could also be relevant to the scheduler event generator, which parses the PBS logs. Do you know how to configure PBS to allow jobs to remain in state "C" for a while?
Found a webpage about how to configure PBS to keep jobs in state C ...
It seems that PBS logs a job as done when it switches the status from R to C, and the PBS-SEG treats such a job as "Done". So in GRAM4, a PBS configuration that keeps jobs in state C for a while is not a problem. I don't see a problem with this patch and suggest applying it. Joe can't see any immediate problems either. It probably can't make it into 4.2.0 at this point, but 4.0.8 and 4.2.1 seem OK.
Hi Eric,

I am not sure if you saw JP's questions below. Can you confirm that a job's stdout/err has been transferred by PBS before the "C" state is detectable? If this is going to be put in 4.0.8, this will need to be confirmed.

-Stu

Begin forwarded message:

From: JP Navarro <navarro@mcs.anl.gov>
Date: July 18, 2008 4:51:06 PM CDT
To: lrma-dev@globus.org
Cc: gram-dev <gram-dev@globus.org>, Charles Bacon <bacon@mcs.anl.gov>
Subject: [lrma-dev] Re: [gram-dev] Fwd: GRAM PBS Job Manager patch

Since they've only been running the patch for a day, should we wait several weeks before integrating it so it's more thoroughly tested? Maybe put it on the TODO before the next GT 4.0.x release?

Are there other aspects besides STDOUT/STDERR delivery and exit status that we might want to verify still work? Is job exit status currently fully reported to PreWS and WS GRAM clients?

JP
No, I did not get the message (I'm not on the lists). Yes, STDERR and STDOUT are delivered just fine with this patch. As I understand it, the "C" state is just a placeholder so that users know their job did run and exit.
TeraGrid would like this fix. Let's get this committed for the next point release.
Committed to 4.0 branch (already in 4.2 and trunk)