Bugzilla – Bug 5611
GramJob API changes to improve performance and efficiency
Last modified: 2012-09-05 13:38:57
GramJob API Changes
The current GramJob API has limitations when a client wants to efficiently
submit a large number of jobs to WS-GRAM. These limitations all relate to
subscribing for job state changes. In the simple scenario where the client
leaves subscribing for job state changes to GramJob, a
NotificationConsumerManager (NCM) is started up per job. For large job
submissions this approach becomes inefficient. A more efficient approach,
followed e.g. by Condor-G, is to start up just one NCM per application
and manage the EPRs of the subscription
resources outside of GramJob instances. However, there is currently no way for a
client to use the GramJob API to subscribe on the create call and then obtain
the EPR of the subscription resource after job creation. Because of this, clients
have to subscribe for job state changes in an extra WS call and cannot leverage
the "subscribeOnCreate" capability of the GramJob API. Small API changes are
needed to enable more efficient use of GramJob in large job submissions.
2. Proposed changes for GramJob
2.1 New getter method getNotificationProducerEPR()
This method enables a client to get the EPR of the subscription resource if a
GramJob instance subscribed for job state changes on the create call.
2.2 New setter method setNotificationProducerEPR()
This method enables a client to set the EPR of a subscription resource. This
lets a client use GramJob's cancel method to also destroy the corresponding
subscription resource.
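A minimal sketch of the two accessors proposed in 2.1 and 2.2, written against a hypothetical stand-in class rather than the real GramJob. EPRs are modelled as plain strings for brevity (an assumption; the real API works with WS-Addressing EPR objects), and `SketchGramJob`, `simulateCreate()` and the `urn:example:` EPR are illustrative names only. It shows a client reading the subscription EPR after a subscribe-on-create call and later handing it to a fresh instance.

```java
/**
 * Hypothetical stand-in for GramJob showing only the accessors proposed in 2.1 and 2.2.
 * EPRs are plain strings here (an assumption); the real API uses WS-Addressing EPR objects.
 */
public class ProducerEprAccessorSketch {

    static final class SketchGramJob {
        // EPR of the subscription resource created on the create call, or null if none.
        private String notificationProducerEPR;

        /** 2.1: the subscription resource EPR, or null if the job did not subscribe on create. */
        String getNotificationProducerEPR() {
            return notificationProducerEPR;
        }

        /** 2.2: hand a subscription resource EPR (e.g. one saved earlier) to this instance. */
        void setNotificationProducerEPR(String epr) {
            this.notificationProducerEPR = epr;
        }

        /** Hypothetical stand-in for the create call: records the returned subscription EPR. */
        void simulateCreate(boolean subscribeOnCreate, String returnedSubscriptionEPR) {
            if (subscribeOnCreate) {
                this.notificationProducerEPR = returnedSubscriptionEPR;
            }
        }
    }

    public static void main(String[] args) {
        SketchGramJob job = new SketchGramJob();
        job.simulateCreate(true, "urn:example:subscription:42"); // placeholder EPR
        String saved = job.getNotificationProducerEPR();          // client keeps this outside GramJob

        // Later, a fresh instance (e.g. after a client restart) is given the saved EPR
        // so that its cancel/destroy path can also clean up the subscription resource.
        SketchGramJob restored = new SketchGramJob();
        restored.setNotificationProducerEPR(saved);
        System.out.println(restored.getNotificationProducerEPR());
    }
}
```

The getter is only meaningful after a create call that subscribed; before that it returns null, which a client can use to tell the two cases apart.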
2.3 Behavior change in destroy()
Currently a corresponding subscription resource is only destroyed by a
GramJob instance if that instance listens for notifications of the job itself
and has started up its own NCM. The destruction of a subscription resource
should not depend on whether an NCM has been started up internally.
The new pattern for job destruction would be:
1. Destroy the subscription resource if an EPR of the resource has been set
in the GramJob instance.
2. Remove the NotificationConsumer resource from the NCM if the GramJob
instance caused the creation of one and if the NCM is up.
3. Stop the NCM from listening for notifications if it is listening.
4. Shut down the GramJob-instance-specific NCM if it is up.
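The destruction order above can be sketched as follows. This is not the Java WS Core or GramJob code: `ConsumerManager`, `SubscriptionDestroyer`, `SketchGramJob` and `RecordingManager` are hypothetical stand-ins, and EPRs are plain strings. The point it illustrates is that step 1 runs whether or not the instance ever started its own NCM, which is the behavior change proposed in 2.3.

```java
import java.util.ArrayList;
import java.util.List;

public class DestroyOrderSketch {

    /** Hypothetical view of a NotificationConsumerManager; not the Java WS Core API. */
    interface ConsumerManager {
        boolean isUp();
        boolean isListening();
        void removeConsumer(String consumerEPR);
        void stopListening();
        void shutdown();
    }

    /** Hypothetical remote call that destroys a subscription resource by its EPR. */
    interface SubscriptionDestroyer {
        void destroy(String subscriptionEPR);
    }

    static final class SketchGramJob {
        String notificationProducerEPR; // subscription resource EPR, or null
        String consumerEPR;             // consumer resource this instance created, or null
        ConsumerManager ownManager;     // instance-specific NCM, or null

        void destroy(SubscriptionDestroyer destroyer) {
            // 1. Independent of any NCM: destroy the subscription resource if its EPR is set.
            if (notificationProducerEPR != null) {
                destroyer.destroy(notificationProducerEPR);
                notificationProducerEPR = null;
            }
            if (ownManager == null) {
                return; // no instance-specific NCM, so steps 2-4 do not apply
            }
            // 2. Remove the consumer resource this instance created, if the NCM is up.
            if (consumerEPR != null && ownManager.isUp()) {
                ownManager.removeConsumer(consumerEPR);
                consumerEPR = null;
            }
            // 3. Stop listening for notifications if the NCM is listening.
            if (ownManager.isListening()) {
                ownManager.stopListening();
            }
            // 4. Shut down the instance-specific NCM if it is up.
            if (ownManager.isUp()) {
                ownManager.shutdown();
            }
        }
    }

    /** Recording fake so the order of calls can be inspected. */
    static final class RecordingManager implements ConsumerManager {
        final List<String> calls;
        boolean up = true;
        boolean listening = true;

        RecordingManager(List<String> calls) { this.calls = calls; }
        public boolean isUp() { return up; }
        public boolean isListening() { return listening; }
        public void removeConsumer(String epr) { calls.add("removeConsumer " + epr); }
        public void stopListening() { listening = false; calls.add("stopListening"); }
        public void shutdown() { up = false; calls.add("shutdown"); }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        SketchGramJob job = new SketchGramJob();
        job.notificationProducerEPR = "urn:example:sub:7";    // placeholder EPRs
        job.consumerEPR = "urn:example:consumer:7";
        job.ownManager = new RecordingManager(calls);
        job.destroy(epr -> calls.add("destroySubscription " + epr));
        calls.forEach(System.out::println);
    }
}
```

Running `main` prints the four steps in order; a job with only `notificationProducerEPR` set performs step 1 and nothing else.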
If the client sets the NotificationProducer EPR in a GramJob instance, it is
responsible for ensuring that this EPR really corresponds to the job.
Otherwise a subscription resource that does not belong to the job may be
deleted. In the worst case, a client could set the EPR of a subscription
resource that does not exist or that belongs to a different user.
3. GramJob API test client
Currently, the GramJob API is tested using the throughput tester and Condor-G.
A separate test client is needed to verify these new changes. This test client
should be well documented and serve as an example for clients using the GramJob
API. The WS GRAM user guide should add a description of, and a link to, this
new test client.
Deliverables:
- Updated GramJob API in the 4.0 branch
- New GramJob API test client
- Updated GRAM user guide online documentation
Things changed a bit for 4.2:
A client can subscribe on create by creating a notification consumer
resource separately and passing its EPR to
GramJob.setNotificationConsumerEPR(). This EPR is then used when creating
the subscribe request as part of the job creation input.
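A minimal sketch of the 4.2 flow just described, using hypothetical stand-in classes: the client creates one consumer resource up front (not shown) and passes its EPR to every job via setNotificationConsumerEPR(), so each create request carries its own subscribe request and no separate subscribe call is needed. Only the method name setNotificationConsumerEPR() comes from this report; `CreateInput`, `buildCreateInput()`, `SketchGramJob` and the string EPRs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SubscribeOnCreateSketch {

    /** Hypothetical create-call payload: job description plus an optional subscribe request. */
    static final class CreateInput {
        final String jobDescription;
        final String subscribeConsumerEPR; // null means: no subscribe request in the create call

        CreateInput(String jobDescription, String subscribeConsumerEPR) {
            this.jobDescription = jobDescription;
            this.subscribeConsumerEPR = subscribeConsumerEPR;
        }
    }

    /** Hypothetical stand-in for the 4.2 GramJob, reduced to the subscribe-on-create part. */
    static final class SketchGramJob {
        private final String jobDescription;
        private String notificationConsumerEPR;

        SketchGramJob(String jobDescription) { this.jobDescription = jobDescription; }

        void setNotificationConsumerEPR(String epr) { this.notificationConsumerEPR = epr; }

        /** Builds the create input; the consumer EPR, if set, goes into the subscribe request. */
        CreateInput buildCreateInput() {
            return new CreateInput(jobDescription, notificationConsumerEPR);
        }
    }

    public static void main(String[] args) {
        // One consumer resource, created once by the client and shared by every job.
        String sharedConsumerEPR = "urn:example:consumer:shared"; // placeholder EPR
        List<CreateInput> requests = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            SketchGramJob job = new SketchGramJob("job-" + i);
            job.setNotificationConsumerEPR(sharedConsumerEPR);
            requests.add(job.buildCreateInput());
        }
        for (CreateInput in : requests) {
            System.out.println(in.jobDescription + " -> " + in.subscribeConsumerEPR);
        }
    }
}
```

Nothing in the sketch keeps the EPR of the resulting subscription resource, since in 4.2 the server destroys subscription resources together with the job.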
However, a client does not need to get the EPR of the subscription resource,
because subscription resources don't need to be destroyed explicitly, either
by GramJob or by any other means on the client side. All subscription
resources for a job are destroyed automatically on the server side when the
corresponding job resource goes away.
Thus the methods [gs]etNotificationProducerEPR() and the proposed change
in destroy() are obsolete for 4.2.
However, we should try to put this change into 4.0.8 for the 4.0 series.
Also: I learned in the meantime that only one NCM is started up, not
one NCM per job as I originally thought. That was the case in
older versions of Java WS Core, but it has since been improved.
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in Jira. Any new issues should be added here: