Bugzilla – Bug 4452
job submission response is affected by Java 1.5 thread processing
Last modified: 2008-02-04 11:24:29
The details of this problem can be found in this gram-user email thread.
There are suggested solutions to the problem that should be considered.
I'm just going to paste the last email on this thread since it's where all the
good meat is:
Apparently Java does not mandate any scheduling order for threads waiting
for monitor entrance. Hence, this order may vary between different JVM
implementations and operating systems.
In my environment, Java 1.5 queues up threads waiting for monitor entrance
in stack-order, which makes the implementation starvation-prone (if a
steady stream of threads is attempting to enter the monitor, the first
waiting thread will never be granted monitor entry and will hence be
starved out). In Java 1.4, however, threads seem to be queued in ("fair")
FIFO order.
Consequently I restarted my container with Java 1.4 and I haven't (yet :)
observed any of the abnormal response times seen previously.
As a work-around Sun recommends the use of 'one of the excellent
"fair" lock constructs in java.util.concurrent.'
Maybe this approach should be taken by Globus developers in the future
since the synchronized construct, apparently, cannot be relied upon.
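For reference, a minimal sketch of the "fair" lock idea using java.util.concurrent.locks.ReentrantLock, which accepts a fairness flag in its constructor. The class FairJobFactory and its createJob() method are hypothetical stand-ins, not Globus code; the sketch only shows the lock pattern that could replace a synchronized block.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a service that guards job creation with one lock.
class FairJobFactory {
    // Passing 'true' requests the fair ordering policy: under contention the
    // lock favors the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int created = 0;

    int createJob() {
        lock.lock();
        try {
            // Placeholder for the real work done while holding the lock.
            return ++created;
        } finally {
            // Always release in finally so an exception cannot leak the lock.
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }
}
```

Fair locks usually have lower throughput than the default non-fair mode, so whether this helps overall would need to be measured.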
I have this problem also with Java 1.5 .... tried the following entropy fix:
but long delays still occur, even with simple jobs.
In general, non-FIFO ordering gives better performance than FIFO ordering, but
it can lead to starvation, as mentioned. Using 'fair' locks also reduces
performance, although it is hard to say what overall effect a 'fair' lock would
have in this case.
But, looking at the createManageJob() code, I think the synchronization there
is unnecessary. That is, home.create() does not need to be called under a
global (service) lock. It only needs to be called under a job-specific lock. If
the code were switched to a job-specific lock, this problem would disappear
and the throughput of creating jobs would increase.
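A rough sketch of what a job-specific lock could look like, assuming jobs are identified by a string ID. PerJobLocks, lockFor() and withJobLock() are hypothetical names for illustration only; the actual createManageJob() code is not shown in this thread and may need a different structure.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one lock object per job ID instead of one service-wide lock.
class PerJobLocks {
    private final ConcurrentMap<String, Object> locks =
            new ConcurrentHashMap<String, Object>();

    // Returns the same lock object to every caller that passes the same job ID.
    Object lockFor(String jobId) {
        Object fresh = new Object();
        Object existing = locks.putIfAbsent(jobId, fresh);
        return existing != null ? existing : fresh;
    }

    // Runs 'work' while holding only this job's lock, so work for other
    // job IDs is not blocked.
    void withJobLock(String jobId, Runnable work) {
        synchronized (lockFor(jobId)) {
            work.run();
        }
    }
}
```

In this sketch lock entries are never removed, so a real implementation would also need a safe way to discard the entry once a job is gone.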
We have this targeting 4.2, but this may be important and significant enough to
look at in the 4.0.4 timeframe. What do you think? Can we try Jarek's
suggestion and see if it solves the problem?
Sure, it seems to be a small change. But it should be done with care and tested
like all threading issues.
Alan: So that I can check whether things improve, could you describe in a bit
more detail what you mean by "abnormal response times" and "long delays":
* Do you mean response time between job submission and getting the EPR back
or the time it takes for a job to be completely processed?
* Does this also happen when destroying a job or querying for resource
* Did this also occur with jobs without staging?
* Did it occur consistently or only sometimes?
* Did it occur during more or less sequential job submission too, or only
during concurrent job submission?
* What was the container load when it occurred: only under heavy load with many
jobs or also with just a few jobs running?
(In reply to comment #5)
> Sure, it seems to be a small change. But it should be done with care and tested
> like all threading issues.
> Alan: So that I can check whether things improve, could you describe in a bit
> more detail what you mean by "abnormal response times" and "long delays":
> * Do you mean response time between job submission and getting the EPR back
> or the time it takes for a job to be completely processed?
> * Does this also happen when destroying a job or querying for resource
I'd have to have a specific scenario. As far as I can tell, destroying a job
happens fairly quickly, but my tests in this area have not been exhaustive.
After encountering problems in Java 1.5, I went back to 1.4.2 in production
> * Did this also occur with jobs without staging?
> * Did it occur consistently or only sometimes?
Consistently, in that any particular job was likely to encounter delays.
> * Did it occur during more or less sequential job submission too, or only
> during concurrent job submission?
I don't believe that concurrency was an issue.
> * What was the container load when it occurred: only under heavy load with many
> jobs or also with just a few jobs running?
Occurred even with a light load on the host machine.
Agree with your comments on testing. How can we set up an organized test?
Note I have torn down the 1.5 test I was mounting, so would need to recreate
Alan, sorry for the delay. Before changing anything I wanted to see that
behaviour myself. Could you please confirm that the numbers are roughly
comparable to what you experienced?
Here's what I did:
1. Built GT 4.0.4 two times: one with Sun's Java 1.4.2_13, one with
2. Started the GT container and ran the stability test against it, i.e. created
a steady load of 5 jobs being processed by the container all the time (in
reality this may vary a bit). The jobs included file stage in and file stage
out; the executable was /bin/date.
3. Then I submitted 50 simple jobs sequentially to the GT container and
measured the time each one of them took. The values are measured in seconds.
Steps 2 and 3 were done with builds from Java 1.4 and 1.5 and with runtimes 1.4 and 1.5.
My execution environment is my (quite old) notebook:
Processor: mobile AMD Athlon(tm) XP-M 2500+
RAM: 512 MB
Find the script that measures the time of 50 jobs and the results attached.
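As an illustration of the measurement idea only (this is not the attached script, whose contents are not reproduced here), a sequential timing loop can be sketched as follows. SequentialTimer and timeSequential() are hypothetical names, the submission step is passed in as a placeholder Runnable, and this sketch records milliseconds rather than the seconds used in the attachment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: times 'count' runs of a task, one after another.
class SequentialTimer {
    // Returns the elapsed wall-clock time of each run in milliseconds.
    static List<Long> timeSequential(int count, Runnable task) {
        List<Long> millis = new ArrayList<Long>();
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            task.run(); // placeholder for "submit one job and wait for it"
            millis.add((System.nanoTime() - start) / 1000000L);
        }
        return millis;
    }

    // Largest value in the list, useful for spotting outliers ("long delays").
    static long max(List<Long> values) {
        long best = Long.MIN_VALUE;
        for (long v : values) {
            best = Math.max(best, v);
        }
        return best;
    }
}
```

Comparing the maximum against the typical value for each Java version is one simple way to see whether individual submissions are being delayed.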
All in all I experienced that jobs are processed faster when Java 1.4 was used
for compilation of the GT and as runtime, and faster when 1.5 is used. Some
jobs sometimes take a bit longer than others.
If you look at the attached overview table: Is this what you mean by
"long delays"? Can you confirm that these are approximately the timings you
I also noticed a larger dispersion in the values when other
(resource-consuming) applications like Firefox or Thunderbird were running.
Created an attachment (id=1203) [details]
Time measurements of 50 sequential job submissions
Created an attachment (id=1204) [details]
script to submit 50 jobs and measure time for each job in seconds
In case that was not clear: I submitted the 50 sequential jobs while the
stability test, which created the steady load, was running.
We do not see a problem with threads between Java 1.4 and 1.5. But there have
been other changes that removed synchronization that may have been the