Bugzilla – Bug 5803
Change of lifetime policy in gram4's job resources.
Last modified: 2008-04-04 07:53:51
Currently a job resource never expires if the client does not specify a lifetime. Combined with the fact that not all clients delete their jobs, this leads to growing persistence directories, and admins cannot determine which data may be wiped and which may still be needed. TeraGrid admins and OSG would like to see this changed.

There will be 2 new parameters in Gram4’s JNDI configuration:

maxJobLifetime: max lifetime a client can specify in the initial job submission and in subsequent setTerminationTime calls.

timeToLiveAfterCompletion: amount of time a job resource continues to exist after the job has been fully processed, in case the client did not specify a job lifetime.

Values are specified in seconds. A negative value means that there’s no limit. The parameters are exposed as resource properties of the factory resources so that a client can query the values of the configuration parameters.

Scenarios for clients:

1. The client does not specify a lifetime: The job does not expire until it is fully processed. After that, the lifetime is set to (now + timeToLiveAfterCompletion). This guarantees that a job runs to completion (including fileStageOut and fileCleanUp) and that the client can query the status of the job for a while before it is removed.

2. The client specifies a lifetime: The job will definitely be killed and removed when the lifetime expires, regardless of the status of the job. A client can however extend the lifetime (restricted by maxJobLifetime). To restrict clients that want to set the lifetime until the year 2040 or so, an admin can (but need not) specify a limit in the configuration parameter maxJobLifetime.

What we ignore in this approach: Jobs that were submitted without a lifetime and stay in a hold state forever, i.e. never finish processing, will never expire. This seems to be an exceptional case. An admin will however be able to identify these jobs from the persistence data.
For now we don't assume that this is a real problem. Implementation is almost complete.
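The two scenarios above can be sketched in a few lines of Java. This is only an illustration of the described policy, not the actual GRAM4 implementation: the class JobLifetimePolicy and its method names are hypothetical, and it assumes that maxJobLifetime is measured from the time of the request, which the description does not state explicitly.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical sketch of the lifetime policy; not taken from the GRAM4 sources. */
final class JobLifetimePolicy {
    // Both values are in seconds; a negative value means "no limit".
    private final long maxJobLifetime;
    private final long timeToLiveAfterCompletion;

    JobLifetimePolicy(long maxJobLifetime, long timeToLiveAfterCompletion) {
        this.maxJobLifetime = maxJobLifetime;
        this.timeToLiveAfterCompletion = timeToLiveAfterCompletion;
    }

    /**
     * Checks a termination time requested at submission or in a later
     * setTerminationTime call. Assumption: the limit counts from "now".
     */
    boolean isAllowed(Instant now, Instant requested) {
        if (maxJobLifetime < 0) {
            return true; // no limit configured
        }
        return !requested.isAfter(now.plusSeconds(maxJobLifetime));
    }

    /**
     * Termination time to apply once the job is fully processed.
     * Empty means "leave the termination time unchanged": either the client
     * chose its own lifetime (scenario 2) or no time-to-live is configured.
     */
    Optional<Instant> onFullyProcessed(Instant now, boolean clientSpecifiedLifetime) {
        if (clientSpecifiedLifetime || timeToLiveAfterCompletion < 0) {
            return Optional.empty();
        }
        return Optional.of(now.plusSeconds(timeToLiveAfterCompletion));
    }
}
```

With maxJobLifetime=3600 and timeToLiveAfterCompletion=600, a request for now + 2 hours would be rejected, and a job submitted without a lifetime would be removed 10 minutes after it has been fully processed.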
If the client does not set a lifetime, the ideal solution would be to set the RP terminationTime to null, which Java WS Core interprets as infinite. Unfortunately this causes problems with Axis in a recovery situation: the xsd:dateTime field cannot be deserialized if it is null. From the Axis CalendarDeserializer:

    // validate fixed portion of format
    if (source == null || source.length() == 0) {
        throw new NumberFormatException(
            Messages.getMessage("badDateTime00"));
    }

For now I set the lifetime to (now + 1000 years), which is maybe a bit hackish, but it works.
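A minimal sketch of that workaround, assuming the termination time is held as a java.util.Calendar (the type the Axis deserializer produces for xsd:dateTime). The class TerminationTimes and the method effectiveTerminationTime are hypothetical names for illustration, not the code that was committed.

```java
import java.util.Calendar;

/** Hypothetical helper; illustrates the "now + 1000 years" workaround only. */
final class TerminationTimes {
    // Stand-in for "never expires" that still serializes as a valid xsd:dateTime.
    static final int NEVER_EXPIRES_YEARS = 1000;

    /**
     * Returns the client's termination time if one was given; otherwise a
     * far-future time instead of null, so the value can be deserialized
     * again during recovery.
     */
    static Calendar effectiveTerminationTime(Calendar requested, Calendar now) {
        if (requested != null) {
            return requested;
        }
        Calendar farFuture = (Calendar) now.clone(); // do not mutate the caller's "now"
        farFuture.add(Calendar.YEAR, NEVER_EXPIRES_YEARS);
        return farFuture;
    }
}
```

The value is never null, so the persisted resource always contains a non-empty dateTime string and the check in CalendarDeserializer is not triggered.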