Bugzilla – Bug 6004
adapt globusrun-ws to handle new faults and change of RP fault to array
Last modified: 2008-04-14 16:08:57
[Code/schemas can be found in CVS in branch ws-gram-faultArray]

1. RP fault is an array now

Reason: during job processing, more than one error can occur where the errors don't depend on each other and thus should not be chained (e.g. invalid executable, invalid fileCleanUp). Currently we either overwrite one fault with the other or skip the second one. With an array of faults we can provide all the information to the client.

It would be good if globusrun-ws could print the messages of all faults, and print all fault causes if the -pft option was specified.

The change to an array should affect RP queries of the RP fault and notification message handling (the RP fault is part of the notification message in case of an error).

2. New fault types (managed_job_faults.xsd):

JobResourceExpiredFaultType
StagingTerminateFaultType
LocalResourceManagerJobTerminateFaultType
DelegatedCredentialDestroyFaultType

(all in namespace http://www.globus.org/namespaces/2008/03/gram/job/faults)

These fault types are stored in the RP fault if they occur, and they can be part of a notification message. Special handling in globusrun-ws is probably only needed for JobResourceExpiredFaultType: globusrun-ws should probably not try to terminate the job (as it normally does) when it gets a notification message with that fault, because the job resource has already been destroyed on the server side.
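A minimal C sketch of the client-side handling requested above, assuming a simplified in-memory view of the RP fault array. The `job_fault` struct, `print_faults` and `job_resource_expired` are hypothetical names, not the actual globusrun-ws or GRAM client API, and the output format is only illustrative. It prints the message of every fault, walks each cause chain only in the -pft case, and checks for JobResourceExpiredFaultType so the caller can skip the terminate call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one entry in the RP fault array.
 * These are NOT the real globusrun-ws / GRAM C bindings. */
typedef struct job_fault {
    const char *ns;                 /* namespace of the fault type QName */
    const char *type;               /* local part of the fault type QName */
    const char *description;        /* human-readable error message */
    const struct job_fault *cause;  /* chained cause, or NULL */
} job_fault;

static const char *const GRAM_FAULTS_NS =
    "http://www.globus.org/namespaces/2008/03/gram/job/faults";

/* Print the message of every fault in the array. When print_types is set
 * (the -pft case), also print the type of each fault and of every cause
 * in its chain. */
void print_faults(FILE *out, const job_fault *faults, size_t count,
                  bool print_types)
{
    assert(faults != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        fprintf(out, "Job failed: %s\n", faults[i].description);
        if (!print_types)
            continue;
        for (const job_fault *f = &faults[i]; f != NULL; f = f->cause)
            fprintf(out, "  Fault Type: {%s}%s\n", f->ns, f->type);
    }
}

/* Return true if any fault says the job resource was already destroyed
 * on the server side; the caller should then not try to terminate it. */
bool job_resource_expired(const job_fault *faults, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(faults[i].ns, GRAM_FAULTS_NS) == 0 &&
            strcmp(faults[i].type, "JobResourceExpiredFaultType") == 0)
            return true;
    }
    return false;
}
```

A caller handling a Failed notification would call print_faults(stderr, faults, n, pft_flag) and only attempt job termination when job_resource_expired(faults, n) returns false.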
typo in CVS branch name: correct name is ws-gram-faultarray
I forgot to mention that the new fault type DelegatedCredentialDestroyFaultType can also be thrown by the terminate method (job termination) (see terminate_managed_job_provider_port_type.wsdl)
Committed code to that branch to handle multiple faults. Can you try this with the tests? Joe
will test it later.
I ran a few commands with a job description that causes two faults: an invalid executable and an invalid fileCleanUp element.

1) normal submission
#####################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:8a6419f4-083f-11dd-b093-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element failureFileCleanUp.

It would be good if both error messages were reported (i.e. also the one for the invalid executable).

2) print the fault types on the command line
#############################################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml -pft
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:943051e6-083f-11dd-9508-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
Fault Type: {http://www.globus.org/namespaces/2008/03/gram/job/faults}InvalidPathFaultType
Entity: line 95: parser error : Opening and ending tag mismatch: init line 0 and ns01:ErrorCode
</ns01:ErrorCode><ns01:Description ns03:type="ns01:DescriptionType">org.globus.r
^
xmlTextWriterWriteDocCallback : XML error 76 !
I/O error : flush error
xmlTextWriterWriteDocCallback : XML error 76 !
Fault Type: {http://www.globus.org/namespaces/2008/03/gram/job/faults}StagingFaultType
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element failureFileCleanUp.

Both fault types are printed, but only one error message is printed.

3) submit in batch mode and query for status
#############################################
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithCleanup.xml -b -o epr
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:9b30affe-083f-11dd-b3d4-0013d4c3b957
Termination time: 04/12/3008 03:21 GMT
[martin@osg-test1 tmp]$ globusrun-ws -status -j epr
Assertion !gram_fault && "Unexpected duplicate entry" failed in file globus_i_query.c at line 193
Aborted

Looks like the query code does not handle an array of faults yet.
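The assertion suggests the status query still expects at most one fault element in the response. As a sketch of one way to accept several, assuming the parser hands over each fault element as an already-parsed opaque object, repeated entries can be appended to a growable array instead of tripping the duplicate check. `fault_list`, `fault_list_append` and `fault_list_free` are hypothetical names, not code from globus_i_query.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the fault entries found while walking a
 * resource property query result; not the real globus_i_query.c code. */
typedef struct {
    void **items;     /* parsed fault objects (opaque here) */
    size_t count;     /* number of faults stored */
    size_t capacity;  /* allocated slots */
} fault_list;

/* Append one parsed fault, growing the array by doubling when full.
 * Returns 0 on success, -1 if allocation fails. A second fault element
 * is stored next to the first instead of being rejected as a duplicate. */
int fault_list_append(fault_list *list, void *fault)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        void **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;  /* old items are still valid and owned by list */
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count++] = fault;
    return 0;
}

/* Release the array itself; the fault objects are owned by the caller. */
void fault_list_free(fault_list *list)
{
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Doubling the capacity keeps appends amortized constant time; for the handful of faults a single job produces a small fixed array would also work, but the growable version avoids a hard limit.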
maybe worth mentioning: globusrun-ws handles jobs that have just one fault correctly.
shoot, no, I was wrong in comment 6: StagingFaultTypes seem to cause problems. When I submit a job with an invalid fileCleanUp and no other fault, using the -pft option, I get:

[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobWithStaging.xml -pft
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:a0dff124-0968-11dd-9207-0013d4c3b957
Termination time: 04/13/3008 14:48 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Entity: line 94: parser error : Opening and ending tag mismatch: init line 0 and ns01:ErrorCode
</ns01:ErrorCode><ns01:Description ns03:type="ns01:DescriptionType">org.globus.r
^
xmlTextWriterWriteDocCallback : XML error 76 !
I/O error : flush error
xmlTextWriterWriteDocCallback : XML error 76 !
Fault Type: {http://www.globus.org/namespaces/2008/03/gram/job/faults}StagingFaultType
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.

Looks like there's a problem either with the StagingFaultType itself or with how globusrun-ws parses that fault type.
Committed a patch to wsrf/c/messaging/source to fix the XML errors you mention. Added support for multiple faults to the -query and -monitor modes. Also changed the output when faults occur so that all of them are printed.
works fine for me. closing