Bug 4204 - WorkManagerImpl incorrectly implements the WorkManager.schedule(...) method
: Java WS Core
: unspecified
: PC All
: P3 major
: ---
Assigned To:
: 4203
Reported: 2006-02-07 15:55
Modified: 2006-04-21 13:48

Patch against 4.0.2 source code (2.43 KB, patch)
2006-04-21 13:48, Jarek Gawor



Description From 2006-02-07 15:55:00
WorkManager.schedule(...) must be implemented as a non-blocking method per the
interface specification.

Per the WorkManager.schedule(...) javadoc from the BEA website, the first two
lines read:

"Dispatches a Work asynchronously. The work is dispatched and the method
returns immediately."

The WorkManagerImpl.schedule(...) implementation, however, blocks on the call
whenever the underlying thread pool is maxed. Code snippet:

public class WorkManagerImpl implements WorkManager ... {
    private PooledExecutor pool;

    public WorkManagerImpl(int maxPoolSize) {
        this.pool = new PooledExecutor(maxPoolSize);
    }

    public synchronized WorkItem schedule(Work work, WorkListener listener) {
        ...
        Thread thread = new Thread(new WorkWrapper(work, listener));
        thread.setName("Work" + thread.getName());
        ...
        this.pool.execute(new WorkWrapper(work, listener));
        ...
    }
}

From PooledExecutor javadoc,

 Blocked execution policy
    If the maximum pool size or queue size is bounded, then it is possible for
incoming execute requests to block. There are four supported policies for
handling this problem, and mechanics (based on the Strategy Object pattern) to
allow others in subclasses:

    Run (the default)
        The thread making the execute request runs the task itself. This policy
helps guard against lockup.

Since WorkManagerImpl uses PooledExecutor's default Blocked execution policy,
when max threads is reached, the next invocation leads to a blocking call -- not
only that, but the Runnable passed to execute is executed on the calling thread.

With a service like DRS this results in a very long sequence of Work.run(...)
methods being invoked on the ServiceThread of the container, thus the entire
SOAP rpc blocks until X number of files are replicated across the Grid.
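[Editorial note] The caller-runs behaviour described above can be reproduced with the modern java.util.concurrent API, which absorbed Doug Lea's util.concurrent library; this is a hypothetical sketch (the class name CallerRunsDemo is illustrative, not Java WS Core code). ThreadPoolExecutor.CallerRunsPolicy is the direct descendant of PooledExecutor's default "Run" policy:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallerRunsDemo {
    public static void main(String[] args) throws Exception {
        // One worker thread, no queue capacity, and the caller-runs
        // overflow policy: the analogue of PooledExecutor's default
        // "Run" blocked-execution policy.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        final Thread caller = Thread.currentThread();
        final AtomicBoolean ranOnCaller = new AtomicBoolean(false);
        final CountDownLatch block = new CountDownLatch(1);

        // Occupy the single worker thread.
        pool.execute(() -> {
            try { block.await(); } catch (InterruptedException ignored) {}
        });

        // With the worker busy and no free queue slot, this task is
        // rejected by the pool and run on the submitting thread: the
        // behaviour the bug report describes on the ServiceThread.
        pool.execute(() -> {
            if (Thread.currentThread() == caller) ranOnCaller.set(true);
        });

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("ran on caller thread: " + ranOnCaller.get());
    }
}
```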
------- Comment #1 From 2006-02-07 16:03:27 -------
Off the top of my head, my suggestion for fixing this is to run a daemon thread
in the background with the sole purpose of waiting on a pending queue of Work;
when notified, it wakes up and attempts this.pool.execute(nextWork). Change the
BlockedExecutionHandler to a Wait policy so that the daemon waits until the next
thread is available from the pool. The WorkManagerImpl.schedule(...) method
(when the work is not a daemon) just adds the Work to the pending queue, notifies
the schedule daemon, and returns immediately.

I'm sure there are other more creative ways to implement it also.
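[Editorial note] The suggested dispatcher-daemon approach could be sketched roughly as follows, using java.util.concurrent rather than util.concurrent. All names here (NonBlockingScheduler, freeThreads) are illustrative, and a Semaphore stands in for the proposed Wait policy, making the dispatcher (not the caller) wait for a free pool thread:

```java
import java.util.concurrent.*;

public class NonBlockingScheduler {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private final ExecutorService pool;
    private final Semaphore freeThreads;

    public NonBlockingScheduler(int maxPoolSize) {
        this.pool = Executors.newFixedThreadPool(maxPoolSize);
        this.freeThreads = new Semaphore(maxPoolSize);
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    Runnable work = pending.take();
                    // Emulates the Wait policy: block here, in the
                    // daemon, until a pool thread is free.
                    freeThreads.acquire();
                    pool.execute(() -> {
                        try { work.run(); } finally { freeThreads.release(); }
                    });
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    // Non-blocking: enqueue, notify the dispatcher (implicit in the
    // blocking queue), and return immediately.
    public void schedule(Runnable work) {
        pending.add(work);
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```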
------- Comment #2 From 2006-02-13 17:14:34 -------
I committed a fix for this problem to trunk. Please test it out and let me 
know. The PooledExecutor is now configured with an internal queue to queue up the 
requests, so the schedule method will not block anymore, but the queue can build 
up now.
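[Editorial note] The described fix (giving the pool an internal queue so that execute() enqueues instead of blocking) can be illustrated with a java.util.concurrent analogue; the committed fix configures util.concurrent's PooledExecutor, so this sketch is not the actual code:

```java
import java.util.concurrent.*;

public class QueuedPoolDemo {
    public static void main(String[] args) throws Exception {
        // A single-thread pool backed by an unbounded internal queue:
        // execute() enqueues and returns instead of blocking when all
        // threads are busy. The trade-off is that the queue can grow.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());

        CountDownLatch busy = new CountDownLatch(1);
        // Occupy the only worker thread.
        pool.execute(() -> {
            try { busy.await(); } catch (InterruptedException ignored) {}
        });

        long t0 = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {});   // queued, never blocks the caller
        }
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("schedule blocked: " + (elapsedMs > 100));
        System.out.println("queued: " + pool.getQueue().size());

        busy.countDown();
        pool.shutdown();
    }
}
```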
------- Comment #3 From 2006-02-15 17:36:10 -------
I am not sure this is fixed. I updated my Java WS Core with the fix (just updated
the org.globus.wsrf.impl.work package). I attempted a concurrency test and a
standalone test. The DRS code was not changed. When the DRS is started it
schedules the DiscoverWork, but with the changed Java WS Core, the discover
works never get executed. It seems that the PooledExecutor just queues the work up
but never starts it. *REOPENING*
------- Comment #4 From 2006-02-15 22:19:55 -------
I do not see this behaviour in the tests. Can you do kill -QUIT to see the thread 
dump?
------- Comment #5 From 2006-02-16 19:05:26 -------
I see what's happening. I have workers that schedule more workers and then 
wait for the 'child' (if you will) workers to finish by doing wm.waitForAll
(childWorkers). So, since my worker does not relinquish its thread, I suspect 
the PooledExecutor can never schedule the 'child' worker.

Does that sound plausible based on your knowledge of how PooledExecutor should 
behave?

If so, what's the advice to container users? "Don't do that!" maybe? Or 
perhaps there's a way to configure the WorkManager to grow indefinitely to 
avoid such conditions?

Here's the kill -QUIT output of one (of 10) such workers trapped.

"Thread-28" daemon prio=1 tid=0x8c411578 nid=0x3510 in Object.wait() 
        at java.lang.Object.wait(Native Method)
        - waiting on <0x4bacbf48> (a org.globus.wsrf.impl.work.WorkItemImpl)
        at java.lang.Object.wait(Object.java:474)
        at org.globus.wsrf.impl.work.WorkManagerImpl.waitForAll
        - locked <0x4bacbf48> (a org.globus.wsrf.impl.work.WorkItemImpl)
        at org.globus.replica.replicator.impl.DiscoverWork._queryCatalogs
        at org.globus.replica.replicator.impl.DiscoverWork.run
        at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:595)
------- Comment #6 From 2006-02-17 11:09:25 -------
I would view Work instances as independent processes, and I don't recommend doing 
synchronization between different Work instances. That can block the whole 
WorkManager, as you experienced. If you need to do some synchronization between 
Work instances, implement the Work that waits for other Work instances as a 
daemon Work (return true in isDaemon(); a new thread will be started 
specifically for that Work) and let the other independent Work instances queue 
up as normal. 
By default the WorkManager is configured to use a maximum of 5 threads. It can be 
configured to use more, but I do think that there should be some limit on it.
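[Editorial note] The daemon-Work advice can be sketched as follows. The Work interface below is a simplified stand-in for the real commonj/Java WS Core interface (which also declares release()), and ParentWork is a hypothetical example, not code from the container:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Simplified stand-in for the Work interface; illustrative only.
interface Work extends Runnable {
    boolean isDaemon();
}

// A parent Work that blocks waiting on its child Work instances.
// Returning true from isDaemon() asks the WorkManager to run it on a
// dedicated thread, so its blocking wait cannot tie up a pool thread
// and starve the children it is waiting for.
class ParentWork implements Work {
    private final List<CountDownLatch> children;

    ParentWork(List<CountDownLatch> children) {
        this.children = children;
    }

    public boolean isDaemon() {
        return true; // dedicated thread, not a shared pool thread
    }

    public void run() {
        try {
            for (CountDownLatch child : children) {
                child.await(); // safe: this thread is outside the pool
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```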
------- Comment #7 From 2006-02-17 12:13:21 -------
This is closed to my satisfaction. Thanks for the quick fix.
------- Comment #8 From 2006-04-19 17:46:25 -------
So this fix wasn't committed to globus_4_0_branch?

If not, we're going to need a patch for our 4.0.2 users of DRS.
------- Comment #9 From 2006-04-21 13:46:49 -------
Yes, it looks like I forgot to check in the fix into globus_4_0_branch. Sorry.
I just committed the fix to globus_4_0_branch.
------- Comment #10 From 2006-04-21 13:48:18 -------
Created an attachment (id=930)
Patch against 4.0.2 source code