Bugzilla – Bug 6043
Confusing JobID in timeout message
Last modified: 2012-09-05 11:43:47
globusrun-ws prints a job id after a job is submitted, but then if the job
times out, the error message contains a different job id value than that used
as the JobID element in the createManagedJob input message. This makes for odd
reading. Also, it's more difficult to look in container logs to figure out what
is going on with a particular job.
% globusrun-ws -submit -term +00:01 -c /bin/sleep 121
Job ID: uuid:f3f77c82-12c9-11dd-8881-0017f2cb7d49
Termination time: 04/25/2008 13:18 GMT
Current job state: Active
Current job state: Failed
globusrun-ws: Job failed: The lifetime of job
f4460960-12c9-11dd-a497-c1270f93a800 expired. Resource will be in read-only
mode and will be removed after all cleanup steps are completed.
What globusrun-ws prints seems to be the client-side generated uuid.
The EPR that comes back from MJFS.createManagedJob contains the uuid
created by GRAM, which is the one used in subsequent server-side logging
and messages.
Maybe globusrun-ws should print the uuid that comes back from the service,
since the client-side generated one is only used for reliable job
submission: a client can resubmit a job with the same client-side generated
uuid if no answer came back on the createManagedJob() call and be sure that
no new job is started if the first one is still active.
The element is called JobID though in the XML schema.
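To make the two ids concrete, here is a minimal sketch of the reliable-submission idea described above. It is not the GRAM4 implementation: the JobService class, create_managed_job method and started counter are hypothetical names used only for illustration. The client-generated uuid acts as a deduplication key, while the service generates its own uuid for the job, which is the one that would show up in server-side logs and fault messages.

```python
import uuid


class JobService:
    """Toy stand-in for a job service (hypothetical, not GRAM4 code)."""

    def __init__(self):
        # Maps client-supplied submission id -> service-generated job id.
        self._by_submission_id = {}
        self.started = 0  # number of jobs actually started

    def create_managed_job(self, submission_id, command):
        # A repeated submission id returns the existing job id
        # instead of starting a second job.
        existing = self._by_submission_id.get(submission_id)
        if existing is not None:
            return existing
        job_id = str(uuid.uuid4())  # service-side id, used in server logs
        self._by_submission_id[submission_id] = job_id
        self.started += 1
        return job_id


service = JobService()
submission_id = "uuid:" + str(uuid.uuid4())  # client-side generated id

first = service.create_managed_job(submission_id, ["/bin/sleep", "121"])
# Simulate a retry after the first reply was lost.
retry = service.create_managed_job(submission_id, ["/bin/sleep", "121"])

print("Submission id (client):", submission_id)
print("Job id (service):      ", first)
```

In this sketch the retry returns the same service-side id and only one job is started. A client that printed the service-side id alongside its own would show the value that later appears in the server's messages.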
In my opinion, the client isn't really supposed to know about the internals of
the service implementation to be able to make sense out of the fault message.
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in JIRA. Any new issues should be added here: