Bugzilla – Bug 3599
Job can restart with internal state "Restart"
Last modified: 2005-07-27 09:42:11
You need to log in before you can comment on or make changes to this bug.
There is a minor logic bug in the service code that can cause a job that is in the process of restarting when the container crashes to endlessly try to restart. Specifically, if the container crashes when restartInternalState is set to something and internalState is set to "Restart", the code will copy "Restart" to restartInternalState, obliterating the checkpointed internal state and causing the state machine to continue to restart the job since the checkpointed state appears to be "Restart".
My fix worked (hour long recoverability test completed successfully), awaiting commit approval.
Fix in globus_4_0_branch. It seems this fix was already in the trunk, so no change there.