Bugzilla – Bug 5397
GRAM4 recovery of persisted job resources needs to be reviewed
Last modified: 2012-09-05 11:50:19
In certain situations the connection to resources is lost and actions are
repeated when persisted job resources are recovered during a container
restart.
The following describes what may happen to resources in state stageIn or
stageInResponse. The same is true for all states where RFT interactions
are involved (stageOut/stageOutResponse, fileCleanUp/fileCleanUpResponse).
A job was in state stageIn when the container was shut down. The transfer
resource had been created, the subscription for state changes had been made,
and the transfer had been started. The only thing that did not happen before
the shutdown was the change of the internal state to stageInResponse.
When this resource is recovered, it is started again, which causes RFT to
throw a RepeatedlyStartedFaultType that GRAM4 ignores.
Then the resource is processed from scratch in state stageIn, i.e. a new
transfer resource will be created, a subscription takes place and the
new transfer will be started. Connection to the old resources is lost and
the transfer will be executed again.
This case may not happen too often, but it is not optimal.
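The sequence above can be illustrated with a small simulation. This is only a sketch of one reading of the report: every class, field and method name below is a hypothetical stand-in, not the real GRAM4 or RFT API, and the subscription step is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these names are the real GRAM4 or RFT API.
final class StageInRecoverySketch {

    /** Stand-in for the fault the report says RFT throws on a repeated start. */
    static final class RepeatedlyStartedFault extends RuntimeException {}

    /** Stand-in for an RFT transfer resource. */
    static final class Transfer {
        boolean started;

        void start() {
            if (started) {
                throw new RepeatedlyStartedFault();
            }
            started = true;
        }
    }

    /** Every transfer resource ever created, as a fake RFT service would see them. */
    final List<Transfer> allTransfers = new ArrayList<>();

    /** The transfer the job currently refers to. */
    Transfer current;

    /** Normal processing of state stageIn: create the transfer and start it. */
    void processStageIn() {
        current = new Transfer();
        allTransfers.add(current);
        // Subscription for state changes is omitted in this sketch.
        current.start();
    }

    /** Recovery as described in the report: restart, ignore the fault, redo stageIn. */
    void recoverInStageIn() {
        try {
            current.start();
        } catch (RepeatedlyStartedFault ignored) {
            // Ignored, as the report says GRAM4 does.
        }
        processStageIn(); // overwrites the reference to the old transfer
    }
}
```

Calling processStageIn() and then recoverInStageIn() leaves two started transfers in allTransfers, while the job refers only to the second one.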
Similar things happen under certain circumstances when the container gets
stopped and restarted and resources are in state stageInResponse.
The plan is to make recovery more fine-grained. All of the following
actions happen when a transfer request is submitted:
* create a transfer resource
* subscribe for state changes
* start the transfer
* create a notification consumer resource
In a recovery situation optimally only those actions that did not
get executed because the container went down should be repeated,
but not the submission of the transfer request as a whole.
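A minimal sketch of that idea, assuming the set of completed actions is persisted together with the job resource. The names are hypothetical and not taken from GRAM4, and the step order simply follows the list above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is not the GRAM4 or RFT API.
final class TransferSubmissionSketch {

    /** The actions of submitting a transfer request, in the order listed above. */
    enum Step {
        CREATE_TRANSFER_RESOURCE,
        SUBSCRIBE_FOR_STATE_CHANGES,
        START_TRANSFER,
        CREATE_NOTIFICATION_CONSUMER
    }

    /** Steps recorded as completed; assumed to be persisted with the job resource. */
    private final Set<Step> completed = EnumSet.noneOf(Step.class);

    /** Log of the steps actually executed, kept only for illustration. */
    final List<Step> executed = new ArrayList<>();

    /** Simulates loading a completed step from the persisted job resource. */
    void markCompleted(Step step) {
        completed.add(step);
    }

    /** Runs every step that is not yet recorded as completed. */
    void submitOrResume() {
        for (Step step : Step.values()) {
            if (completed.contains(step)) {
                continue; // finished before the shutdown, do not repeat
            }
            execute(step);
            completed.add(step); // a real implementation would persist this here
        }
    }

    /** Placeholder for the real action; here it only records the step. */
    private void execute(Step step) {
        executed.add(step);
    }
}
```

If the first two steps are marked as completed before submitOrResume() is called, only the remaining two are executed. A step that finished but whose completion was not persisted before the shutdown would still be repeated, so this sketch narrows the window but does not close it.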
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in Jira. Any new issues should be added here: