Bugzilla – Bug 5308
gwd doesn't recover AIDs and TIDs
Last modified: 2008-12-22 04:40:33
If gwd crashes and is then asked to recover past jobs, it loses all information about Array IDs (AIDs) and Task IDs (TIDs).
More problems with job array recovery, received on the mailing list from Emir Imamagic <eimamagi@srce.hr>:
----
We're using GridWay 5.3 in a multiuser environment. We find the job array functionality with the PARAM variable extremely useful. However, we noticed several problems with the recovery of job arrays.

All other jobs recovered fine, but in the case of job arrays the gwd service simply hangs. The last messages in the log were:
Recovering job 0.
Recovering job 97.
The transfer and execution MADs for the users are started, but they also just hang. There is no useful message in job.logs. Also, gwd doesn't react to the TERM signal and has to be stopped with KILL. The only way to make GridWay start again is to delete the jobs from the array.

Also, in one case, after we removed all jobs from a job array which had previously been rescheduled, GridWay managed to recover jobs from a second job array which had not been rescheduled. However, we didn't have a chance to reproduce this later, so I can't confirm whether this is a rule or just luck.

A bigger problem is that when GridWay did recover jobs from a job array, the ID and PARAM values of all jobs were set to 0. So even when recovered they were useless, and we had to remove them.
------
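To make the expected behaviour concrete, here is a minimal sketch of the round trip that recovery should preserve: array ID, task ID and PARAM written to persistent state, then read back unchanged after a restart. It is not GridWay code. The `JobRecord` type, its field names, the JSON state file and the `save_jobs`/`recover_jobs` helpers are all hypothetical and only illustrate the check a fix would need to pass.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class JobRecord:
    # Hypothetical record; field names are illustrative, not GridWay's.
    job_id: int
    array_id: int  # AID: the array this job belongs to
    task_id: int   # TID: position of the job inside its array
    param: int     # value the job received for PARAM


def save_jobs(path, jobs):
    """Persist every field of each job, including array_id and task_id."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump([asdict(job) for job in jobs], fh)
    # Rename over the old file so a crash mid-write does not
    # corrupt the previously saved state.
    os.replace(tmp, path)


def recover_jobs(path):
    """Rebuild the job list from the state file after a restart."""
    with open(path) as fh:
        return [JobRecord(**rec) for rec in json.load(fh)]
```

A regression test for this bug would submit an array, restart the daemon, and assert that each recovered job still carries its original AID, TID and PARAM rather than 0.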