Bugzilla – Bug 3342
Reliable Job Create isn't entirely reliable
Last modified: 2005-08-03 17:11:24
I committed a reliable create test to the CVS trunk which attempts to submit a job with the same Job Description and Job ID multiple times. The test sometimes fails against the GT4 ManagedJobFactoryService. It looks like the single-creation logic is not thread-safe. I'll attach a patch I have which seems to fix this problem. I'd appreciate feedback from Peter before committing this. joe
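To illustrate the kind of race being described, here is a minimal sketch in Java. It is not the attached patch and not the ManagedJobFactoryService code; the class SingleCreateSketch, the method createOnce, and the in-memory map are all hypothetical stand-ins. The assumption is that creation is a "look up the Job ID, create if absent" sequence, which is only safe if the lookup and the insert happen under one lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a job factory; names are illustrative only.
class SingleCreateSketch {
    private final Map<String, String> jobsById = new HashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    // Without "synchronized", two threads could both see the ID as absent
    // and both create a job. Holding one lock across the lookup and the
    // insert makes the check-then-create step atomic.
    synchronized String createOnce(String jobId, String jobDescription) {
        String existing = jobsById.get(jobId);
        if (existing != null) {
            return existing; // duplicate submission: reuse the first job
        }
        creations.incrementAndGet();
        jobsById.put(jobId, jobDescription);
        return jobDescription;
    }

    int creationCount() {
        return creations.get();
    }

    // Submits the same Job ID from several threads at once and reports
    // how many jobs were actually created.
    static int creationsUnderContention(int threadCount) {
        SingleCreateSketch factory = new SingleCreateSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    factory.createOnce("job-1", "/bin/true");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return factory.creationCount();
    }
}
```

Calling SingleCreateSketch.creationsUnderContention(8) submits the same ID from eight threads. The trade-off is that everything inside the method, including any slow setup, is serialized behind the lock.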
Created an attachment (id=607): reliable-create.diff. Patch to fix reliable creation based on JobID.
Hmm. This sucks because I think it's going to degrade throughput tremendously. I believe this is where I previously removed a synchronized block to improve throughput, because it didn't seem necessary. That's primarily how we met our performance goal for throughput. I guess we'll have to think of something else for 4.2.
Can you run the performance test with and without this patch for comparison?
Sure. I'll put up a set of web pages and paste the link here when I'm done.
An alternative would be to have the code stick the job ID into the resource home before initializing the resource and then do resource initialization outside of the lock. I'm not sure if that would create other troubles.
There would need to be some way of mapping the job ID to the job description object, to avoid the seriously bad race condition where my resource might get assigned to someone else's job ID if a simple queue were used. So instead of a queue of new job IDs, have a hashtable of JD->ID entries.
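The alternative discussed in the two comments above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: ReserveThenInitSketch, createOnce, and initialize are hypothetical names, the "resource home" is modelled as a plain concurrent map, and the reservation is keyed by job ID with the job description passed along to the initializer (a variation on the JD->ID hashtable suggested above, chosen so each initializer only ever sees its own ID and description). The job ID is reserved atomically first, and the slow initialization then runs outside any lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the GT4 resource home API.
class ReserveThenInitSketch {
    private final ConcurrentMap<String, CompletableFuture<String>> resourcesByJobId =
            new ConcurrentHashMap<>();
    private final AtomicInteger initializations = new AtomicInteger();

    String createOnce(String jobId, String jobDescription) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        // Atomic reservation: only one caller per job ID gets null back.
        CompletableFuture<String> winner = resourcesByJobId.putIfAbsent(jobId, mine);
        if (winner != null) {
            return winner.join(); // duplicate: wait for the first caller's result
        }
        try {
            // Slow work happens here, with no lock held.
            String resource = initialize(jobId, jobDescription);
            mine.complete(resource);
            return resource;
        } catch (RuntimeException e) {
            resourcesByJobId.remove(jobId, mine); // release the reservation
            mine.completeExceptionally(e);
            throw e;
        }
    }

    // Placeholder for real resource initialization.
    private String initialize(String jobId, String jobDescription) {
        initializations.incrementAndGet();
        return jobId + ":" + jobDescription;
    }

    // Submits the same job ID from several threads and reports how many
    // initializations actually ran.
    static int initializationsUnderContention(int threadCount) {
        ReserveThenInitSketch home = new ReserveThenInitSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    home.createOnce("job-1", "/bin/true");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return home.initializations.get();
    }
}
```

Submissions for different job IDs do not wait on each other in this sketch; only duplicates of the same ID block until the first caller finishes. Whether this creates other troubles in the real service, as raised above, is not something the sketch answers.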
Here are the throughput stats for 4.0.0: http://www-unix.mcs.anl.gov/~lane/Test-reports/Throughput/4.0.0/ Here is a sampling of throughput stats after applying the patch: http://www-unix.mcs.anl.gov/~lane/Test-reports/Throughput/bug_3342/ Fortunately the patch didn't seem to affect throughput at all, so I'm fine with seeing it committed. Joe, I'll reassign this back to you for you to close when it's committed.
Patch committed to trunk and globus_4_0_branch.