Bugzilla – Bug 5383
GRAM recovery: jobs with staging fail if they already passed the staging state.
Last modified: 2007-06-16 11:48:19
The following explains what happens for stageIn, but is also valid
for stageOut and fileCleanUp.
Jobs with file staging fail in recovery if they are reloaded after
they have already passed the state StageIn.
Reason: In StateMachine.processRestartState() a check is done whether
a transferEndpoint is set in the job resource or is null. If one is
set, then the transfer resource will be started.
(A potentially repeated call to start is caught and ignored.)
With audit logging in 4.0.5 we don't want to lose the transfer
endpoints because we want to have them in the audit records, so we
don't nullify them after a transfer is done. As a result, when a
resource is restarted there will be a transfer endpoint even if the
job is already in state Submit, and thus the transfer will be started.
But since the job is already in state Submit the RFT resource has
already been deleted in state StageInResponse and we get a
NoSuchResource-Exception which causes the job to fail.
Fix: add a new check in StateMachine.restart():
start the transfer only if transferEndpoint is not equal to null AND
the job resource is in state StageIn
(accordingly for stageOut, fileCleanUp).
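
The difference between the old and the new restart check can be sketched as follows. This is only an illustration under stated assumptions, not the committed GRAM code: the class RestartCheckSketch, the JobState enum, both method names, and the endpoint string are hypothetical stand-ins for the job resource and its state.

```java
// Hypothetical sketch of the restart check described above.
// None of these names are taken from the GRAM source tree.
class RestartCheckSketch {

    // Simplified stand-in for the job states mentioned in this bug.
    enum JobState { STAGE_IN, STAGE_IN_RESPONSE, SUBMIT, STAGE_OUT, FILE_CLEAN_UP }

    // Old behaviour as described: any non-null endpoint triggers a start.
    static boolean oldShouldStartTransfer(String transferEndpoint) {
        return transferEndpoint != null;
    }

    // New behaviour: the endpoint must be set AND the job must still be
    // in the staging state that the endpoint belongs to.
    static boolean shouldStartTransfer(String transferEndpoint,
                                       JobState currentState,
                                       JobState stagingState) {
        return transferEndpoint != null && currentState == stagingState;
    }

    public static void main(String[] args) {
        // The endpoint is kept for audit logging, so it is still set
        // after the job has moved on to SUBMIT.
        String stageInEndpoint = "hypothetical-stage-in-endpoint";

        // Old check: the transfer would be started although the RFT
        // resource is already gone.
        System.out.println("old check, job in SUBMIT:   "
                + oldShouldStartTransfer(stageInEndpoint));

        // New check: no start, because the job already passed STAGE_IN.
        System.out.println("new check, job in SUBMIT:   "
                + shouldStartTransfer(stageInEndpoint,
                        JobState.SUBMIT, JobState.STAGE_IN));

        // New check: the transfer is still started for a job reloaded
        // while it is in STAGE_IN.
        System.out.println("new check, job in STAGE_IN: "
                + shouldStartTransfer(stageInEndpoint,
                        JobState.STAGE_IN, JobState.STAGE_IN));
    }
}
```

The same predicate would apply to stageOut and fileCleanUp by passing the corresponding staging state, so the endpoints can stay in the job resource for the audit records without triggering a transfer start on recovery.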
Committed the fix to TRUNK and the 4.0 branch.