Bugzilla – Bug 6139
Deadlock situation in job termination
Last modified: 2008-06-09 12:16:15
Job termination can currently deadlock when a job uses staging. If a job is terminated while in a staging response state (stageInResponse, stageOutResponse, fileCleanUpResponse), the currently running transfer has to be cancelled. During cancellation the job resource is locked; the job is then unregistered from the StagingListener, the subscription resource is destroyed (which locks the subscription resource behind the scenes), and finally the RFT resource is destroyed.

However, if the StagingListener receives a notification message from this transfer at the same time, the subscription resource is locked behind the scenes, and the delivering thread then waits to lock the job resource in StagingListener.deliver(). In rare cases the notification-sending thread holds the lock on the subscription resource and tries to acquire the lock on the job resource, while the transfer-cancellation thread holds the lock on the job resource and tries to acquire the lock on the subscription resource. Neither thread can then proceed.

Solution: The thread that delivers the notification from RFT to GRAM must not try to lock the job resource corresponding to the transfer. I added a single-threaded ExecutorService that puts those jobs back into the processing cycle once a notification from RFT reports that a transfer has finished. This removes the deadlock.
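A minimal sketch of the hand-off idea described in the solution, not the actual GRAM code. The class name, method names, and the two ReentrantLock objects are hypothetical stand-ins for the job and subscription resource locks; in the real service the subscription lock is taken behind the scenes by the container. The point illustrated is that the notification thread only submits work to a single-threaded executor and never waits for the job lock itself, so the two lock orders can no longer form a cycle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; none of these names come from the GRAM sources.
class StagingHandoffSketch {
    // Stand-ins for the locks on the job resource and the subscription resource.
    static final ReentrantLock jobLock = new ReentrantLock();
    static final ReentrantLock subscriptionLock = new ReentrantLock();

    // Single worker thread that puts jobs back into the processing cycle.
    // A daemon thread is used here only so the sketch never blocks JVM exit.
    static final ExecutorService requeuer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "staging-requeue");
        t.setDaemon(true);
        return t;
    });

    // Notification path: runs while the subscription lock is held.
    // It never acquires the job lock; it only hands the work off.
    static void deliver(Runnable requeueJob) {
        subscriptionLock.lock();
        try {
            requeuer.execute(() -> {
                // The worker takes the job lock without holding the subscription lock.
                jobLock.lock();
                try {
                    requeueJob.run();
                } finally {
                    jobLock.unlock();
                }
            });
        } finally {
            subscriptionLock.unlock();
        }
    }

    // Termination path: job lock first, then subscription lock, as in the report.
    static void cancelTransfer(Runnable destroySubscription) {
        jobLock.lock();
        try {
            subscriptionLock.lock();
            try {
                destroySubscription.run();
            } finally {
                subscriptionLock.unlock();
            }
        } finally {
            jobLock.unlock();
        }
    }

    // Runs both paths concurrently; returns true if both complete within the timeout.
    static boolean runBothPaths() {
        CountDownLatch done = new CountDownLatch(2);
        Thread notifier = new Thread(() -> deliver(done::countDown));
        Thread canceller = new Thread(() -> cancelTransfer(done::countDown));
        notifier.start();
        canceller.start();
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("both paths completed: " + runBothPaths());
    }
}
```

Because the executor has a single worker, finished-transfer notifications are processed in the order they arrive, which keeps the re-queueing step simple at the cost of serializing it.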