Bugzilla – Bug 5383
GRAM recovery: jobs with staging fail if they already passed the staging state.
Last modified: 2007-06-16 11:48:19
The following explains what happens for stageIn, but it applies equally to stageOut and fileCleanUp.

Jobs with file staging fail during recovery if they are reloaded after they have already passed the StageIn state.

Reason: StateMachine.processRestartSate() checks whether a transferEndpoint is set in the job resource or is null. If one is set, the transfer resource is started. (A start that is called repeatedly is caught and ignored.) With audit logging in 4.0.5 we don't want to lose the transfer endpoints, because we want to have them in the audit records, so we don't set them to null after a transfer is done. As a result, when a resource is restarted there is still a transfer endpoint even if the job is already in state Submit, and so the transfer is started. But since the job is already in state Submit, the RFT resource has already been deleted in state StageInResponse, and we get a NoSuchResource-Exception, which causes the job to fail.

Fix: add a new check in StateMachine.restart(): check that transferEndpoint is not null AND the job resource is in state StageIn (and accordingly for stageOut and fileCleanUp).
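To illustrate the fix described above, here is a minimal sketch of the old and new restart conditions. It is not the actual GRAM code: the class RestartCheckSketch, the JobState enum, both method names, and the use of a plain String for the endpoint are all hypothetical stand-ins chosen for this example.

```java
// Hypothetical sketch only; none of these names are taken from the GRAM sources.
final class RestartCheckSketch {

    // Simplified stand-in for the job states mentioned in this report.
    enum JobState { STAGE_IN, STAGE_IN_RESPONSE, SUBMIT, STAGE_OUT, FILE_CLEAN_UP }

    // Old condition: restart the transfer whenever an endpoint is present.
    // Once endpoints are kept for audit logging, this is also true for jobs
    // that finished staging long ago.
    static boolean shouldRestartTransferOld(String transferEndpoint) {
        return transferEndpoint != null;
    }

    // New condition: restart the transfer only if an endpoint is present AND
    // the job is still in the staging state that owns this transfer
    // (STAGE_IN here; the same pattern would apply to STAGE_OUT and FILE_CLEAN_UP).
    static boolean shouldRestartTransfer(String transferEndpoint,
                                         JobState currentState,
                                         JobState stagingState) {
        return transferEndpoint != null && currentState == stagingState;
    }
}
```

Under these assumptions, a reloaded job in state SUBMIT that still carries its stageIn endpoint passes the old check but fails the new one, so no start is attempted against the already deleted RFT resource.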
Committed the fix to TRUNK and the 4.0 branch.