Bugzilla – Bug 3342
Reliable Job Create isn't entirely reliable
Last modified: 2005-08-03 17:11:24
I committed a reliable create test to the CVS trunk which attempts to submit a
job with the same Job Description and Job ID multiple times.
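A rough sketch of what such a test can look like, not the test that was committed: several threads submit the same Job ID and Job Description at once and the harness collects the distinct resource keys that come back. The `create` function and the class name are hypothetical stand-ins for the real factory call. A reliable create should yield exactly one distinct key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;

class ReliableCreateTestSketch {
    // create is a hypothetical stand-in: (jobId, jobDescription) -> resource key.
    static Set<String> submitConcurrently(BiFunction<String, String, String> create,
                                          String jobId, String description, int attempts) {
        Set<String> keys = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(attempts);
        for (int i = 0; i < attempts; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // hold every thread until all are queued
                    keys.add(create.apply(jobId, description));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown(); // release all submissions at once
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("submissions did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return keys; // more than one key means a duplicate job was created
    }
}
```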
The test fails with the GT4 ManagedJobFactoryService sometimes. It looks like
the single-creation logic is not threadsafe. I'll attach a patch I have which
seems to fix this problem. I'd appreciate feedback from Peter before committing.
Created an attachment (id=607) [details]
Patch to fix reliable creation based on JobID.
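The attachment itself is not reproduced here. As a generic illustration of the kind of race being described, and not the contents of the patch, here is a sketch in which the check for an existing Job ID and the creation of the resource happen under one lock. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class SingleCreateSketch {
    private final Map<String, String> resourceByJobId = new HashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // The lookup and the creation share one lock, so two concurrent requests
    // with the same job ID cannot both pass the "not found" check.
    synchronized String createOrGet(String jobId) {
        String existing = resourceByJobId.get(jobId);
        if (existing != null) {
            return existing; // repeat submission: hand back the first resource
        }
        String resource = "resource-" + created.incrementAndGet();
        resourceByJobId.put(jobId, resource);
        return resource;
    }

    int createdCount() {
        return created.get();
    }
}
```

The cost of this shape is that every creation is serialized behind the same lock, including any slow initialization done inside it.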
Hmm. This sucks because I think it's going to degrade throughput tremendously.
I believe this is where I removed a synch block before to improve throughput because it didn't seem
primarily how we met our performance goal for throughput. I guess we'll have
to think of something else for 4.2.
Can you run the performance test with and without this patch for comparison?
Sure. I'll put up a set of web pages and paste the link here when I'm done.
An alternative would be to have the code stick the job id into the resource
before initializing the resource and then do resource initialization outside of
the lock. I'm not sure if that would create other troubles.
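One way to picture this alternative, as a sketch only and with hypothetical names: the Job ID is reserved atomically with a placeholder, and the slow initialization runs afterwards without holding any map-wide lock. Duplicate submitters wait on the placeholder instead of creating a second resource. The `slowInit` function stands in for whatever resource initialization actually does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class ReserveThenInitSketch {
    private final ConcurrentMap<String, FutureTask<String>> byJobId = new ConcurrentHashMap<>();
    private final Function<String, String> slowInit; // hypothetical initializer

    ReserveThenInitSketch(Function<String, String> slowInit) {
        this.slowInit = slowInit;
    }

    String createOrGet(String jobId) {
        FutureTask<String> mine = new FutureTask<String>(() -> slowInit.apply(jobId));
        // Reserving the ID is the only atomic step.
        FutureTask<String> winner = byJobId.putIfAbsent(jobId, mine);
        if (winner == null) {
            winner = mine;
            mine.run(); // initialization happens outside any map-wide lock
        }
        try {
            return winner.get(); // duplicates block here until the first create finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

In this sketch a failed initialization stays cached under the Job ID, so a real version would need to decide whether to evict the entry and allow a retry.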
There would need to be some way of mapping the job ID to the job description
object to avoid the seriously bad race condition where I might get my resource
assigned to someone else's job ID if a simple queue were used. So instead of a
queue of new job IDs, have a hashtable of JD->ID entries.
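A sketch of the JD->ID table idea, with hypothetical names throughout: each pending job ID is filed under its own job description object, so the code that later initializes a resource looks up the ID by the description it holds rather than taking whatever ID is next in a queue. The sketch assumes the description is matched by object identity, since `JobDescription` here does not override `equals`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PendingJobIdsSketch {
    // Hypothetical job description type; no equals/hashCode override,
    // so map lookups match the exact object that was registered.
    static final class JobDescription {
        final String executable;

        JobDescription(String executable) {
            this.executable = executable;
        }
    }

    private final Map<JobDescription, String> idByDescription = new ConcurrentHashMap<>();

    // Record the ID for this description; returns false if one is already pending.
    boolean reserve(JobDescription jd, String jobId) {
        return idByDescription.putIfAbsent(jd, jobId) == null;
    }

    // Fetch and clear the ID for this description; null if none is pending.
    String claim(JobDescription jd) {
        return idByDescription.remove(jd);
    }
}
```

With a plain queue, two initializers finishing in a different order than they were enqueued would swap IDs. Keying by the description removes the dependence on ordering.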
Here are the throughput stats for 4.0.0:
Here is a sampling of throughput stats after applying the patch:
Fortunately the patch didn't seem to affect throughput at all, so I'm fine with
seeing it committed. Joe, I'll reassign this back to you for you to close when
Patch committed to trunk and globus_4_0_branch.