Bugzilla – Bug 4684
Loading persisted jobs with expired delegation resources causes stacktraces
Last modified: 2012-09-05 11:43:14
We've gotten messages like this a few times:

http://www-unix.globus.org/mail_archive/gt4-friends/2005/09/msg00058.html

The trace looks like this:

2005-09-09 15:59:57,628 INFO exec.RunQueue [Thread-3,<clinit>:54] Starting state machine with 16 run queues.
2005-09-09 15:59:59,274 ERROR delegation.DelegationUtil [Thread-11,getDelegationResource:253] Error getting delegation resource
org.globus.wsrf.NoSuchResourceException
    at org.globus.delegation.service.DelegationResource.load(DelegationResource.java:395)
    at org.globus.delegation.service.DelegationHome.find(DelegationHome.java:53)
    at org.globus.delegation.DelegationUtil.getDelegationResource(DelegationUtil.java:251)
    at org.globus.delegation.DelegationUtil.registerDelegationListener(DelegationUtil.java:166)
    at org.globus.exec.service.utils.DelegatedCredential.getDelegatedCredential(DelegatedCredential.java:178)
    at org.globus.exec.service.utils.DelegatedCredential.getDelegatedCredential(DelegatedCredential.java:79)
    at org.globus.exec.service.job.ManagedJobResourceImpl.getStagingCredential(ManagedJobResourceImpl.java:476)
    at org.globus.exec.service.exec.StateMachine.processRestartState(StateMachine.java:682)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:364)
    at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:158)
2005-09-09 15:59:59,278 ERROR exec.RunQueue [Thread-11,run:162] Unable to process state transition.
2005-09-09 16:00:00,777 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID b8958da2-20bc-11da-b66f-00301bab9d00.

I contend that this should be an error message more like "Error recovering job <UUID> because the delegated credential associated with it has expired. Either refresh the credential at <EPR> or remove the file <wherever the persisted resource is>." and there should be no delegation service stack trace shown.
Refreshing an expired delegation EPR is not feasible. The lifetime of the delegation resource (to which the EPR points) is set to the lifetime of the delegated credential. So if the credential is not refreshed prior to expiration, the resource is deleted and the EPR is no longer valid. Also, when the resource is deleted, the persisted file is removed. So the user can neither refresh the EPR, nor is there a file to clean up. If GRAM still needs the delegated credential to finish the job, it is going to fail. I agree about not printing the trace and suggest we print a warning that the credential associated with that recovered job has expired and attempt to finish the job. Stu and I have been talking about adding a section to the GRAM documentation about solutions for renewing the delegated credentials automatically, but there is not much we can do at this point if the credential expires while the container is down.
Fair enough regarding the refresh - I think the main issue here is that GRAM should be handling the "expired deleg resource + not expired job" case itself, and not throwing scare-inducing ERROR stacktraces for a situation that comes up pretty regularly.
My 2 cents. I have seen this one occasionally, but cannot remember what effect this has on the submitted job. If this causes the job to not complete as intended, then it should be an ERROR rather than a WARNING. If it has no effect, then who cares, and I agree that GRAM should handle it internally. As for the stack traces, they should never occur unless it is a problem that was NEVER anticipated. A meaningful ERROR message for anticipated conditions is best, with emphasis on "meaningful".
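For illustration, the handling discussed above (catch the anticipated "delegation resource is gone" condition, log one meaningful message without a stack trace, and let the job try to continue) could be sketched roughly as follows. This is not the GRAM code: RecoverySketch, CredentialLookup, recoverCredential and describeMissingCredential are hypothetical names, the nested NoSuchResourceException is only a stand-in for org.globus.wsrf.NoSuchResourceException, and java.util.logging is used just to keep the sketch self-contained.

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the GRAM sources.
class RecoverySketch {

    // Stand-in for org.globus.wsrf.NoSuchResourceException (assumption).
    static class NoSuchResourceException extends Exception {
    }

    // Hypothetical lookup of the delegated credential for a persisted job.
    interface CredentialLookup {
        String find(String jobId) throws NoSuchResourceException;
    }

    private static final Logger LOG = Logger.getLogger(RecoverySketch.class.getName());

    // Builds the single-line message shown instead of a stack trace.
    static String describeMissingCredential(String jobId) {
        return "Delegated credential for recovered job " + jobId
                + " has expired; attempting to finish the job without it.";
    }

    // Returns the credential if it still exists, otherwise logs a
    // warning (message only, no throwable) and returns empty so the
    // caller can decide whether the job can still finish.
    static Optional<String> recoverCredential(String jobId, CredentialLookup lookup) {
        try {
            return Optional.of(lookup.find(jobId));
        } catch (NoSuchResourceException e) {
            // Anticipated condition: do not pass 'e' to the logger.
            LOG.warning(describeMissingCredential(jobId));
            return Optional.empty();
        }
    }
}
```

A caller in the restart path would check the returned Optional and fail the job with a clear error only if the credential is actually still needed. Whether the message is logged as WARNING or ERROR is the open question in this thread; the sketch uses a warning as suggested above.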
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in Jira. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363