| Summary: | Job description variable fault lost | ||
|---|---|---|---|
| Product: | GRAM | Reporter: | Joe Bester <bester@mcs.anl.gov> |
| Component: | wsrf managed execution job service | Assignee: | Peter Lane <lane@mcs.anl.gov> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | lane@mcs.anl.gov, madduri@mcs.anl.gov, smartin@mcs.anl.gov |
| Priority: | P3 | ||
| Version: | 3.9.5 | ||
| Target Milestone: | 4.0.1 | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Bug Depends on: | |||
| Bug Blocks: | 3348 | ||
Fixed in gram_bug_2730_branch.
The fix for this does not quite work in the 4.0 branch. See the failures of test case submit205 in globus_wsrf_gram_scheduler_test on fork and most other schedulers. Right now it looks like the resource's fault is set, and then the state machine is run in the systemCancel state. This causes some strange behavior because the job was never submitted or registered with the JSM. If streaming is being used, the fault trickles back to the user in a strange form; otherwise it is lost. I think it should instead just throw the ServiceLevelException fault to the client that called createManagedJob, and not create the resource or use the state machine at all.
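A minimal sketch of the control flow proposed above, assuming nothing about the real GRAM code: the description is validated first, and a failure is thrown straight back to the caller before any resource is created. Every name here is a hypothetical stand-in (`CreateJobSketch`, the `resources` map, the `validate` rule that rejects an unresolved `${...}` variable); only `ServiceLevelException` and `createManagedJob` are names taken from the comment, and their real signatures are not shown in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these types are the actual GRAM classes.
class CreateJobSketch {

    // Hypothetical stand-in for the fault named in the comment.
    static class ServiceLevelException extends Exception {
        ServiceLevelException(String message) {
            super(message);
        }
    }

    // Hypothetical in-memory registry standing in for created job resources.
    static final Map<String, String> resources = new HashMap<>();

    // Assumed validation rule: reject a description that still contains
    // an unresolved ${...} substitution variable.
    static void validate(String jobDescription) throws ServiceLevelException {
        int start = jobDescription.indexOf("${");
        if (start >= 0) {
            throw new ServiceLevelException(
                "unresolved variable in job description: "
                    + jobDescription.substring(start));
        }
    }

    // Validate first; the resource is created only if validation passes,
    // so a fault reaches the caller directly and no state machine runs.
    static String createManagedJob(String jobDescription)
            throws ServiceLevelException {
        validate(jobDescription);
        String id = "job-" + (resources.size() + 1);
        resources.put(id, jobDescription);
        return id;
    }

    public static void main(String[] args) {
        try {
            createManagedJob("${UNDEFINED_VARIABLE}/bin/true");
        } catch (ServiceLevelException e) {
            // The caller sees the fault; no resource was registered.
            System.out.println("fault: " + e.getMessage());
        }
    }
}
```

With this ordering, the fault never depends on streaming or on the systemCancel state to reach the client. Whether throwing at this point is workable in the real service is exactly the open question in this report.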
If I'm remembering this bug correctly, there are problems with just throwing an exception, but I can't remember what exactly. That is why I fixed it the way I did. I don't understand why the JSM has anything to do with this if the fault occurs before the job is submitted to the scheduler.
Fixed in trunk and globus_4_0_branch. There was a bug in the cancel() method whereby the system cancel service data wasn't being set.