Bug 6004 - adapt globusrun-ws to handle new faults and change of RP fault to array
Status: RESOLVED FIXED
Product: GRAM
Component: wsrf gram clients
Version: alpha
Platform: Macintosh All
Importance: P3 blocker
Target Milestone: 4.2
Assigned To:
Reported: 2008-04-10 22:48 by
Modified: 2008-04-14 16:08


Description From 2008-04-10 22:48:23
[Code/schemas can be found in CVS in branch ws-gram-faultArray]

1. RP fault is an array now

Reason: several errors that do not depend on each other, and thus should
not be chained, can occur during job processing (e.g. an invalid
executable and an invalid fileCleanUp element).
Currently we either overwrite one fault with the other in that case or
skip the second one. With an array of faults we can provide all of the
information to the client.

It would be good if globusrun-ws printed the messages of all faults,
and printed all fault causes when the -pft option is specified.

The change to an array affects queries of the RP fault as well as
notification message handling (the RP fault is part of the notification
message in case of an error).
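A minimal sketch of the client-side printing requested above, in C (the language the globusrun-ws client code is written in). The `job_fault_t` struct, the `print_job_faults` function and its output format are hypothetical simplifications, not the real globusrun-ws or WSRF binding API; the sketch only shows looping over the whole fault array so every message is printed, and walking each fault's chain of causes when the -pft case is requested.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of one entry in the RP fault array.
 * The real client works on generated WSRF bindings; these names are
 * illustrative only. */
typedef struct job_fault {
    const char *type;              /* fault QName, e.g. "{ns}StagingFaultType" */
    const char *description;       /* human-readable error message */
    const struct job_fault *cause; /* chained cause, or NULL */
} job_fault_t;

/* Print the message of every fault in the array instead of only one.
 * If print_types is non-zero (the -pft case), also print each fault's type
 * and walk its chain of causes. Returns the number of lines written. */
size_t print_job_faults(FILE *out, const job_fault_t *faults, size_t count,
                        int print_types)
{
    size_t lines = 0;
    for (size_t i = 0; i < count; i++) {
        const job_fault_t *f = &faults[i];
        if (print_types) {
            fprintf(out, "Fault Type: %s\n", f->type);
            lines++;
        }
        fprintf(out, "Job failed: %s\n", f->description);
        lines++;
        if (print_types) {
            for (const job_fault_t *c = f->cause; c != NULL; c = c->cause) {
                fprintf(out, "  Caused by: %s: %s\n", c->type, c->description);
                lines++;
            }
        }
    }
    return lines;
}
```

Called with a two-element array (invalid executable plus a staging fault), this prints two messages without -pft, and both types, both messages and the staging fault's cause with it.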

2. New fault types (managed_job_faults.xsd):

JobResourceExpiredFaultType
StagingTerminateFaultType
LocalResourceManagerJobTerminateFaultType
DelegatedCredentialDestroyFaultType
(all in namespace http://www.globus.org/namespaces/2008/03/gram/job/faults)

These fault types are stored in the RP fault if they occur and they
can be part of a notification message.
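The -pft output later in this report prints fault types as `{namespace}localName`. A minimal C sketch, assuming the client has the fault type in that string form, of splitting such a QName and recognizing the four new types listed above. `split_qname`, `is_new_gram_fault` and `GRAM_FAULTS_NS` are hypothetical names for illustration only; the namespace and type names are taken from the list above.

```c
#include <stddef.h>
#include <string.h>

#define GRAM_FAULTS_NS "http://www.globus.org/namespaces/2008/03/gram/job/faults"

/* Split a QName written as "{namespace}localName". On success, *ns and
 * *ns_len describe the namespace inside the input string, *local points at
 * the local name, and 0 is returned. Returns -1 if the string is malformed. */
int split_qname(const char *qname, const char **ns, size_t *ns_len,
                const char **local)
{
    if (qname == NULL || qname[0] != '{') return -1;
    const char *close = strchr(qname, '}');
    if (close == NULL || close[1] == '\0') return -1; /* no '}' or empty local name */
    *ns = qname + 1;
    *ns_len = (size_t)(close - (qname + 1));
    *local = close + 1;
    return 0;
}

/* Return non-zero if qname is one of the fault types added to
 * managed_job_faults.xsd, in the GRAM job faults namespace. */
int is_new_gram_fault(const char *qname)
{
    static const char *const new_types[] = {
        "JobResourceExpiredFaultType",
        "StagingTerminateFaultType",
        "LocalResourceManagerJobTerminateFaultType",
        "DelegatedCredentialDestroyFaultType",
    };
    const char *ns, *local;
    size_t ns_len;

    if (split_qname(qname, &ns, &ns_len, &local) != 0) return 0;
    if (ns_len != strlen(GRAM_FAULTS_NS) ||
        strncmp(ns, GRAM_FAULTS_NS, ns_len) != 0) return 0;
    for (size_t i = 0; i < sizeof new_types / sizeof new_types[0]; i++) {
        if (strcmp(local, new_types[i]) == 0) return 1;
    }
    return 0;
}
```

Checking the namespace as well as the local name avoids matching an unrelated type that happens to share a local name.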

Special handling in globusrun-ws may only be needed for
JobResourceExpiredFaultType: globusrun-ws should probably not try
to terminate the job (as it normally does) when it receives a notification
message containing that fault, because the job resource has already been
destroyed on the server side.
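A minimal C sketch of the special case described above, assuming the client has already collected the fault types from a notification's fault array as `{namespace}localName` strings. `should_terminate_job` is a hypothetical helper, not part of globusrun-ws; it scans every entry of the array, since with the RP fault now an array the expired fault need not be the first one.

```c
#include <stddef.h>
#include <string.h>

/* QName of the fault signalling that the job resource has already been
 * destroyed on the server side. */
static const char JOB_RESOURCE_EXPIRED_FAULT[] =
    "{http://www.globus.org/namespaces/2008/03/gram/job/faults}"
    "JobResourceExpiredFaultType";

/* Decide whether the client should still send its usual terminate request
 * after a failure notification. fault_types holds the QNames of every fault
 * in the notification. Returns 0 (skip terminate) if any entry is a
 * JobResourceExpiredFaultType, 1 otherwise. */
int should_terminate_job(const char *const *fault_types, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (fault_types[i] != NULL &&
            strcmp(fault_types[i], JOB_RESOURCE_EXPIRED_FAULT) == 0) {
            return 0;
        }
    }
    return 1;
}
```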
------- Comment #1 From 2008-04-11 00:44:54 -------
Typo in the CVS branch name: the correct name is ws-gram-faultarray.
------- Comment #2 From 2008-04-11 04:11:07 -------
I forgot to mention that the new fault type
DelegatedCredentialDestroyFaultType can also be thrown
by the terminate method (job termination)
(see terminate_managed_job_provider_port_type.wsdl)
------- Comment #3 From 2008-04-11 15:01:06 -------
Committed code to that branch to handle multiple faults. Can you try this with
the tests?

Joe
------- Comment #4 From 2008-04-11 15:47:56 -------
will test it later.
------- Comment #5 From 2008-04-11 23:34:41 -------
I ran a few commands with a job description that causes two faults:
invalid executable and an invalid fileCleanUp element.

1) normal submission
#####################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml 
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:8a6419f4-083f-11dd-b093-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element failureFileCleanUp. 

It would be good if both error messages were reported (including the one
for the invalid executable).


2) print the fault types on the command-line
#############################################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml -pft
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:943051e6-083f-11dd-9508-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
Fault Type:
{http://www.globus.org/namespaces/2008/03/gram/job/faults}InvalidPathFaultType
Entity: line 95: parser error : Opening and ending tag mismatch: init line 0
and ns01:ErrorCode
</ns01:ErrorCode><ns01:Description
ns03:type="ns01:DescriptionType">org.globus.r
                 ^
xmlTextWriterWriteDocCallback : XML error 76 !
I/O error : flush error
xmlTextWriterWriteDocCallback : XML error 76 !
Fault Type:
{http://www.globus.org/namespaces/2008/03/gram/job/faults}StagingFaultType
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element failureFileCleanUp. 

Both fault types are printed, but only one error message is.


3) Submit in batch mode and query for status
#############################################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml -b -o epr
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:9b30affe-083f-11dd-b3d4-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
[martin@osg-test1 tmp]$ globusrun-ws -status -j epr
Assertion !gram_fault && "Unexpected duplicate entry" failed in file
globus_i_query.c at line 193
Aborted

Looks like the query code does not handle an array yet.
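The abort comes from an assertion in globus_i_query.c that allows at most one fault entry ("Unexpected duplicate entry"). A minimal C sketch of the alternative, appending each fault entry to a growable list instead of asserting there is only one. `fault_list_t`, `fault_list_append` and `fault_list_free` are hypothetical names, and entries are stored as plain strings for illustration; the real query code deals with parsed fault structures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of fault entries found in an RP query response. */
typedef struct fault_list {
    char **items;    /* owned copies of the fault entries */
    size_t count;
    size_t capacity;
} fault_list_t;

/* Append a copy of entry, doubling the capacity when full.
 * Returns 0 on success, -1 on allocation failure. */
int fault_list_append(fault_list_t *list, const char *entry)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        char **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL) return -1;
        list->items = grown;
        list->capacity = new_cap;
    }
    size_t len = strlen(entry) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, entry, len);
    list->items[list->count++] = copy;
    return 0;
}

/* Free every stored entry and reset the list to empty. */
void fault_list_free(fault_list_t *list)
{
    for (size_t i = 0; i < list->count; i++) free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

A second fault element in the response then simply becomes the second list entry rather than a failed assertion.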
------- Comment #6 From 2008-04-12 23:26:33 -------
Maybe worth mentioning that globusrun-ws handles jobs that have just one
fault OK.
------- Comment #7 From 2008-04-13 10:53:12 -------
Shoot, no, I was wrong in comment 6: StagingFaultTypes seem to cause problems.

When I submit a job with an invalid fileCleanUp and no other fault,
using the -pft option, I get

[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithStaging.xml -pft
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:a0dff124-0968-11dd-9207-0013d4c3b957
Termination time: 04/13/3008 14:48 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Entity: line 94: parser error : Opening and ending tag mismatch: init line 0
and ns01:ErrorCode
</ns01:ErrorCode><ns01:Description
ns03:type="ns01:DescriptionType">org.globus.r
                 ^
xmlTextWriterWriteDocCallback : XML error 76 !
I/O error : flush error
xmlTextWriterWriteDocCallback : XML error 76 !
Fault Type:
{http://www.globus.org/namespaces/2008/03/gram/job/faults}StagingFaultType
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.


Looks like there's a problem with either the StagingFaultType itself or the
way globusrun-ws parses that fault type.
------- Comment #8 From 2008-04-14 14:50:26 -------
Committed a patch to wsrf/c/messaging/source to fix the XML errors you mention.
Added support for multiple faults to the -query and -monitor modes. Also
made the output print all faults when they occur.
------- Comment #9 From 2008-04-14 16:08:57 -------
works fine for me. closing