Bugzilla – Bug 3599
Job can restart with internal state "Restart"
Last modified: 2005-07-27 09:42:11
You need to
before you can comment on or make changes to this bug.
There is a minor logic bug in the service code that can cause a job that is in
the process of restarting when the container crashes to endlessly try to
restart. Specifically, if the container crashes when restartInternalState is
set to something and internalState is set to "Restart", the code will copy
"Restart" to restartInternalState, obliterating the checkpointed internal state
and causing the state machine to continue to restart the job since the
checkpointed state appears to be "Restart".
My fix worked (hour long recoverability test completed successfully), awaiting
Fix in globus_4_0_branch. It seems this fix was already in the trunk, so no