Bugzilla – Bug 5611
GramJob API changes to improve performance and efficiency
Last modified: 2012-09-05 13:38:57
GramJob API Changes

1. Motivation

The current GramJob API has limitations when a client wants to efficiently submit a large number of jobs to WS-GRAM. These limitations are all related to subscribing for job state changes.

In a simple scenario where the client leaves subscribing for job state changes to GramJob, a NotificationConsumerManager (NCM) is started up per job. For large job submissions this approach can become inefficient. A more efficient approach, followed for example by Condor-G, is to start up just one NCM per application and manage the EPRs of the subscription resources outside of GramJob instances.

However, there is currently no way for a client to use the GramJob API to subscribe on the create call and get the EPR of the subscription resource after job creation. Because of this, clients have to subscribe for job state changes in an extra WS call and cannot leverage the "subscribeOnCreate" capability in the GramJob API. Small API changes are needed to enable a more efficient use of GramJob in large job submissions.

2. Proposed changes for GramJob

2.1 New getter method getNotificationProducerEPR()

This method enables a client to get the EPR of the subscription resource if a GramJob instance subscribed for job state changes on the create call.

2.2 New setter method setNotificationProducerEPR()

This method enables a client to set the EPR of a subscription resource. With this, a client could use GramJob's cancel method to also destroy the corresponding subscription resource.

2.3 Behavior change in destroy()

Currently a GramJob instance destroys the corresponding subscription resource only if it listens for notifications of the job itself and started up its own NCM. The destruction of a subscription resource should not depend on whether an NCM has been started up internally. The new pattern for job destruction would be:

   1. Destroy the subscription resource if an EPR of that resource had been set in the GramJob instance.
   2. Remove the NotificationConsumer resource from the NCM if the GramJob instance had caused the creation of one and if the NCM is up.
   3. Stop the NCM from listening for notifications if it is listening.
   4. Shut down the GramJob-instance-specific NCM if it is up.

If the client sets the EPR of a notificationProducer in a GramJob instance, the client is responsible for ensuring that this EPR really corresponds to the job. Otherwise there is a risk that a subscription resource that does not belong to the job will be deleted. In the worst case one could set an EPR of a subscription resource that does not exist or belongs to a different user.

3. GramJob API test client

Currently, the GramJob API is tested using the throughput tester and Condor-G. A separate test client is needed to verify these new changes. This test client should be well documented and serve as an example for clients using the GramJob API. The WS GRAM user guide should add a description of and a link to this new test client.

4. Deliverables

   - Updated GramJob API in the 4.0 branch
   - New GramJob API test client
   - Updated GRAM user guide online documentation
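
The destruction order proposed in section 2.3 can be sketched in plain Java as follows. This is only an illustration under assumptions: JobHandle and ConsumerManager are hypothetical stand-ins, not the real GramJob or Java WS Core classes, and the remote calls are replaced by log entries so the order of the steps is visible.

```java
import java.util.ArrayList;
import java.util.List;

class DestroySketch {

    /** Hypothetical stand-in for a job-specific NotificationConsumerManager. */
    static class ConsumerManager {
        boolean up;                  // the manager has been started
        boolean listening;           // the manager is listening for notifications
        boolean hasConsumerResource; // this job caused creation of a consumer resource
    }

    /** Hypothetical stand-in for a GramJob instance; not the real class. */
    static class JobHandle {
        String subscriptionEpr;      // null if no subscription EPR was set
        ConsumerManager ownNcm;      // null if the client manages notifications itself
        final List<String> log = new ArrayList<>();

        void destroy() {
            // Step 1: destroy the subscription resource whenever its EPR is known,
            // independent of whether an internal NCM was ever started.
            if (subscriptionEpr != null) {
                log.add("destroy subscription " + subscriptionEpr);
                subscriptionEpr = null;
            }
            if (ownNcm != null && ownNcm.up) {
                // Step 2: remove the consumer resource this job created, if any.
                if (ownNcm.hasConsumerResource) {
                    log.add("remove consumer resource");
                    ownNcm.hasConsumerResource = false;
                }
                // Step 3: stop listening for notifications.
                if (ownNcm.listening) {
                    log.add("stop listening");
                    ownNcm.listening = false;
                }
                // Step 4: shut down the job-specific NCM.
                log.add("shut down NCM");
                ownNcm.up = false;
            }
        }
    }

    public static void main(String[] args) {
        // A client that runs its own shared NCM and only hands the job a subscription EPR.
        JobHandle job = new JobHandle();
        job.subscriptionEpr = "epr-1"; // placeholder value, not a real EPR
        job.destroy();
        System.out.println(job.log);
    }
}
```

The point of the sketch is that step 1 sits outside the NCM check, so a client that manages its own NCM still gets the subscription resource cleaned up.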
Things changed a bit for 4.2:

A client can subscribe on create by setting the EPR of a notification consumer resource it created separately, by calling GramJob.setNotificationConsumerEPR(). This EPR is then used when creating the subscribe request as part of the job creation input.

However, a client does not need to get the EPR of the subscription resource, because subscription resources don't need to be explicitly destroyed, neither by GramJob nor by any other means on the client side. All subscription resources for a job are destroyed automatically on the server side when the corresponding job resource goes away. Thus the methods [gs]etNotificationProducerEPR() and the proposed change in destroy() are obsolete for 4.2. However, we should try to put this change into 4.0.8 for the 4.0 series.

Also: I learned in the meantime that only one NCM is started up, and not one NCM per job, as I originally thought. One NCM per job had been the case in older versions of Java WS Core, but this has since been improved.
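
To illustrate the 4.2 behavior described above, here is a rough, self-contained Java model. It is a sketch under assumptions, not WS-GRAM code: FakeService is a hypothetical in-memory stand-in for the server side, and the consumer EPR is just a placeholder string. It shows one shared consumer EPR being passed at job creation, and the subscriptions disappearing together with the job resource, with no client-side cleanup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubscribeOnCreateSketch {

    /** Hypothetical in-memory stand-in for the server side. */
    static class FakeService {
        private final Map<String, List<String>> subscriptionsByJob = new HashMap<>();
        private int nextId = 1;

        /** Creates a job; a non-null consumer EPR means "subscribe on create". */
        String createJob(String consumerEpr) {
            String jobId = "job-" + nextId++;
            List<String> subs = new ArrayList<>();
            if (consumerEpr != null) {
                subs.add(consumerEpr);
            }
            subscriptionsByJob.put(jobId, subs);
            return jobId;
        }

        /** Destroying the job resource also drops all of its subscriptions. */
        void destroyJob(String jobId) {
            subscriptionsByJob.remove(jobId);
        }

        int subscriptionCount() {
            int n = 0;
            for (List<String> subs : subscriptionsByJob.values()) {
                n += subs.size();
            }
            return n;
        }
    }

    public static void main(String[] args) {
        FakeService service = new FakeService();
        String sharedConsumerEpr = "consumer-epr"; // placeholder, created once per application
        String first = service.createJob(sharedConsumerEpr);
        service.createJob(sharedConsumerEpr);
        System.out.println("subscriptions after create: " + service.subscriptionCount());
        service.destroyJob(first); // no separate subscription cleanup by the client
        System.out.println("subscriptions after destroy: " + service.subscriptionCount());
    }
}
```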
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in Jira. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363