Bugzilla – Bug 4684
Loading persisted jobs with expired delegation resources causes stacktraces
Last modified: 2012-09-05 11:43:14
We've gotten messages like this a few times:
The trace looks like this:
2005-09-09 15:59:57,628 INFO exec.RunQueue [Thread-3,<clinit>:54] Starting
state machine with 16 run queues.
2005-09-09 15:59:59,274 ERROR delegation.DelegationUtil
[Thread-11,getDelegationResource:253] Error getting delegation resource
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2005-09-09 15:59:59,278 ERROR exec.RunQueue [Thread-11,run:162] Unable to
process state transition.
2005-09-09 16:00:00,777 INFO exec.ManagedExecutableJobHome
[Thread-3,recover:163] Recovered resource with ID
I contend that this should be an error message more like "Error recovering job
<UUID> because the delegated credential associated with it has expired. Either
refresh the credential at <EPR> or remove the file <wherever the persisted
resource is>." and there should be no delegation service stack trace shown.
Refreshing an expired delegation EPR is not feasible. The lifetime of the
delegation resource (to which the EPR points) is set to the lifetime of the
delegated credential. So if the credential is not refreshed prior to
expiration, the resource is deleted and the EPR is no longer valid.
Also, when the resource is deleted, the persisted file is removed. So the user
can neither refresh the EPR nor clean up a file. If GRAM still needs
the delegated credential to finish the job, it is going to fail.
I agree about not printing the trace and suggest we print a warning that the
credential associated with that recovered job has expired and attempt to finish
Stu and I have been talking about adding a section to GRAM documentation about
solutions to renewing the delegated credentials automatically, but not much we
can do at this point if the credential expires when the container is down.
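The handling suggested above (log a warning without a stack trace when a recovered job's credential has expired, then try to finish the job) could look roughly like the following. This is a minimal sketch in Python for brevity, not GRAM code: `PersistedJob`, `DelegationExpiredError`, `lookup_credential`, and `recover_job` are hypothetical names, and the expiry check stands in for whatever the delegation service actually does when the resource is gone.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

log = logging.getLogger("recovery_sketch")


class DelegationExpiredError(Exception):
    """Hypothetical: the job's delegation resource no longer exists."""


@dataclass
class PersistedJob:
    # Hypothetical stand-in for a persisted job resource.
    job_id: str
    credential_expiry: datetime


def lookup_credential(job: PersistedJob, now: datetime) -> str:
    # The delegation resource lives exactly as long as the credential,
    # so an expired credential means there is nothing left to look up.
    if job.credential_expiry <= now:
        raise DelegationExpiredError(job.job_id)
    return f"credential-for-{job.job_id}"


def recover_job(job: PersistedJob, now: datetime) -> Optional[str]:
    """Recover a job; return its credential, or None if it has expired."""
    try:
        credential = lookup_credential(job, now)
    except DelegationExpiredError:
        # Anticipated condition: one meaningful line, no stack trace.
        log.warning(
            "Recovered job %s, but its delegated credential has expired; "
            "attempting to finish the job without it.",
            job.job_id,
        )
        return None
    log.info("Recovered job %s", job.job_id)
    return credential
```

Returning `None` rather than raising leaves it to the caller to decide whether the remaining job steps can run without the credential.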
Fair enough regarding the refresh - I think the main issue here is that GRAM
should be handling what to do in the "expired deleg resource + not expired job"
and not throwing scare-inducing ERROR stacktraces for a situation that comes up
My 2 cents. I have seen this one occasionally, but cannot remember what effect
this has on the submitted job. If this causes the job to not complete as
intended, then it should be an ERROR rather than a WARNING. If it has no
effect, then who cares and I agree that GRAM should handle it internally. As
for the stack traces, they should never occur unless it is a problem that was
NEVER anticipated. A meaningful ERROR message for anticipated conditions is
best with emphasis on "meaningful".
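The policy described in this comment can be sketched as a small reporting helper: anticipated conditions get a meaningful message with no trace, at ERROR if the job is affected and WARNING otherwise, while only unanticipated failures carry a stack trace. This is an illustrative Python sketch under those assumptions; `AnticipatedError` and `report` are hypothetical names, not part of GRAM.

```python
import logging

log = logging.getLogger("severity_sketch")


class AnticipatedError(Exception):
    """Hypothetical: a failure the service knows how to describe."""

    def __init__(self, message: str, job_affected: bool):
        super().__init__(message)
        self.job_affected = job_affected


def report(exc: Exception) -> int:
    """Log exc according to the policy above and return the level used."""
    if isinstance(exc, AnticipatedError):
        # Meaningful message only; severity depends on the effect on the job.
        level = logging.ERROR if exc.job_affected else logging.WARNING
        log.log(level, "%s", exc)
        return level
    # Never anticipated: this is the one case where a trace is useful.
    log.error("Unexpected failure", exc_info=exc)
    return logging.ERROR
```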
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in JIRA. Any new issues should be added here: