Bugzilla – Bug 5959
globusrun-ws error in job with invalid staging request
Last modified: 2008-05-14 07:55:03
The following error occurs when I submit a job with a staging request where the file to be staged does not exist. I am not sure whether this happens only in my environment.
[martin@osg-test1 tmp]$ globusrun-ws -submit -S -f jobMitFileStaging.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:35d8cd2a-fc2e-11dc-8a9d-0013d4c3b957
Termination time: 03/27/3008 18:47 GMT
Current job state: StageIn
*** glibc detected *** double free or corruption (fasttop): 0x08a81e20 ***
Aborted
This happens only with local invocations from gram to rft, and only if a transfer fails. The fault in the failed-notification message from rft to gram looks a bit odd, and cannot be serialized or cloned: "java.io.IOException: Non nillable element 'timestamp' is null." I tried a few things, and the first point at which it cannot be serialized is in ResourcePropertyTopic.set(..), after RFT sets the fault. I found a while ago that these nillable error messages are due to Axis being stricter than it was in 4.0.
Additional info on this: It does not happen when submitting with the Java GlobusRun program. You can reproduce it by:
1. Turning on local invocation on the gram service.
2. Submitting a job with a staging error, like the description below.
<job>
  <executable>/bin/sleep</executable>
  <argument>0</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://osg-test1.unl.edu:2811/bin/foobar</sourceUrl>
      <destinationUrl>gsiftp://osg-test1.unl.edu:2811/tmp/myEcho</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileStageOut>
    <transfer>
      <sourceUrl>gsiftp://osg-test1.unl.edu:2811/tmp/myEcho</sourceUrl>
      <destinationUrl>gsiftp://osg-test1.unl.edu:2811/tmp/myEcho.sik</destinationUrl>
    </transfer>
  </fileStageOut>
</job>
The notification I receive when I try to do this contains an invalid XML fault message: <?xml version="1.0" encoding="UTF-8"?> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wsa="http://www.w3.org/2005/08/addressing"> <soapenv:Header> <wsa:MessageID>uuid:e0e5b710-0d62-11dd-bf9f-f2294a2cccec</wsa:MessageID> <wsa:To>https://localhost:38004/wsrf/services/NotificationConsumerService</wsa:To> <wsa:Action>http://docs.oasis-open.org/wsn/bw-2/NotificationConsumer/NotifyRequest</wsa:Action> <From xmlns="http://www.w3.org/2005/08/addressing"> <Address>http://www.w3.org/2005/08/addressing/anonymous</Address> </From> <ns06:ResourceID xmlns:ns04="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns05="http://www.w3.org/2001/XMLSchema" xmlns:ns06="http://docs.oasis-open.org/wsn/bw-2/provider" ns04:type="ns05:string" wsa:IsReferenceParameter="true">df49358a-0d62-11dd-8df2-0017f2cb7d49</ns06:ResourceID> </soapenv:Header> <soapenv:Body> <Notify xmlns="http://docs.oasis-open.org/wsn/b-2"> <NotificationMessage xmlns:ns1="http://docs.oasis-open.org/wsn/b-2" xsi:type="ns1:NotificationMessageHolderType"> <Topic xmlns:ns2="http://www.globus.org/namespaces/2008/03/gram/job" Dialect="http://docs.oasis-open.org/wsn/t-1/TopicExpression/Simple">ns2:stateChangeInformation</Topic> <ProducerReference> <wsa:Address>https://mactop.local:38000/wsrf/services/ManagedJobFactoryService</wsa:Address> <wsa:ReferenceParameters> <ns3:ResourceID xmlns:ns3="http://www.globus.org/namespaces/2008/03/gram/job">df73fea0-0d62-11dd-bf9f-f2294a2cccec</ns3:ResourceID> </wsa:ReferenceParameters> </ProducerReference> <Message> <ns4:stateChangeNotificationMessage xmlns:ns4="http://www.globus.org/namespaces/2008/03/gram/job"> <ns5:state xmlns:ns5="http://www.globus.org/namespaces/2008/03/gram/job/types">Failed</ns5:state> <ns6:fault 
xmlns:ns6="http://www.globus.org/namespaces/2008/03/gram/job/faults" xsi:type="ns6:FaultResourcePropertyType"> <ns6:stagingFault> <ns7:Timestamp xmlns:ns7="http://docs.oasis-open.org/wsrf/bf-2">2008-04-18T16:17:00.928Z</ns7:Timestamp> <ns8:Originator xmlns:ns8="http://docs.oasis-open.org/wsrf/bf-2"> <wsa:Address>https://mactop.local:38000/wsrf/services/ManagedJobFactoryService</wsa:Address> <wsa:ReferenceParameters> <ns4:ResourceID>df73fea0-0d62-11dd-bf9f-f2294a2cccec</ns4:ResourceID> </wsa:ReferenceParameters> </ns8:Originator> <ns9:Description xmlns:ns9="http://docs.oasis-open.org/wsrf/bf-2">Staging error for RSL element fileStageIn. </ns9:Description> <ns10:FaultCause xmlns:ns10="http://docs.oasis-open.org/wsrf/bf-2" xsi:type="ns10:BaseFaultType"> <ns10:Timestamp>2008-04-18T16:17:00.928Z</ns10:Timestamp> <ns10:ErrorCode dialect="http://www.globus.org/fault/stacktrace"> at sun.reflect.GeneratedConstructorAccessor146.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:494) at java.lang.Class.newInstance0(Class.java:350) at java.lang.Class.newInstance(Class.java:303) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:530) at org.globus.exec.utils.FaultUtils.createStagingFault(FaultUtils.java:386) at org.globus.exec.service.exec.processing.handler.StagingHandler.processCompletedStagingJob(StagingHandler.java:101) at org.globus.exec.service.exec.processing.handler.StageInResponseStateHandler.process(StageInResponseStateHandler.java:76) at org.globus.exec.service.exec.processing.handler.InternalStateHandler.processInternalState(InternalStateHandler.java:49) at org.globus.exec.service.exec.processing.StateMachine.processInternalState(StateMachine.java:118) at org.globus.exec.service.exec.processing.StateProcessingTask.run(StateProcessingTask.java:67) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) </ns10:ErrorCode> <ns10:Description>org.globus.exec.generated.StagingFaultType</ns10:Description> <ns10:FaultCause xmlns:ns11="http://www.globus.org/namespaces/2004/10/rft" xsi:type="ns11:TransferFaultType"> <ns10:Timestamp>2008-04-18T16:17:00.928Z</ns10:Timestamp> <ns10:FaultCause xsi:type="ns10:BaseFaultType"> <ns10:Timestamp>2008-04-18T16:17:00.928Z</ns10:Timestamp> <ns10:ErrorCode dialect="http://www.globus.org/fault/stacktrace"> at sun.reflect.GeneratedConstructorAccessor104.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:494) at java.lang.Class.newInstance0(Class.java:350) at java.lang.Class.newInstance(Class.java:303) at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:105) at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:91) at org.globus.rft.generated.TransferFaultType.getDeserializer(TransferFaultType.java:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.axis.encoding.DeserializationContext.getDeserializerForClass(DeserializationContext.java:510) at org.apache.axis.encoding.ser.BeanDeserializer.onStartChild(BeanDeserializer.java:302) at org.apache.axis.encoding.DeserializationContext.startElement(DeserializationContext.java:1035) at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at javax.xml.parsers.SAXParser.parse(Unknown Source) at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227) at org.globus.wsrf.encoding.ObjectDeserializer.toObject(ObjectDeserializer.java:118) at org.globus.wsrf.encoding.ObjectDeserializer.getFirstAsObject(ObjectDeserializer.java:216) at org.globus.exec.service.exec.processing.StagingListener.deliver(StagingListener.java:183) at org.globus.wsrf.impl.SimpleSubscriptionTopicListener.notify(SimpleSubscriptionTopicListener.java:230) at org.globus.wsrf.impl.SimpleSubscriptionTopicListener.topicChanged(SimpleSubscriptionTopicListener.java:120) at org.globus.wsrf.impl.SimpleTopic.topicChanged(SimpleTopic.java:205) at org.globus.wsrf.impl.SimpleTopic.notify(SimpleTopic.java:112) at org.globus.wsrf.impl.ResourcePropertyTopic.fireNotification(ResourcePropertyTopic.java:213) at org.globus.wsrf.impl.ResourcePropertyTopic.set(ResourcePropertyTopic.java:241) at org.globus.transfer.reliable.service.TransferWork.setRequestStatus(TransferWork.java:264) at org.globus.transfer.reliable.service.TransferWork.statusChanged(TransferWork.java:218) at org.globus.transfer.reliable.service.TransferWork.processStates(TransferWork.java:583) at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:852) at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:376) at java.lang.Thread.run(Thread.java:613) </ns10:ErrorCode> 
<ns10:Description>org.globus.rft.generated.TransferFaultType</ns10:Description> </ns10:FaultCause> <Timestamp xmlns="" xsi:type="xsd:dateTime">2008-04-18T16:17:00.914Z</Timestamp> <Originator xmlns="" xmlns:ns3="http://www.w3.org/2005/08/addressing" xsi:type="ns3:EndpointReferenceType"> <Address xmlns="" xsi:type="ns3:AttributedURIType">https://mactop.local:38000/wsrf/services/ReliableFileTransferFactoryService</Address> <ReferenceParameters xmlns="" xsi:type="ns3:ReferenceParametersType"> <ns2:TransferKey xmlns:ns2="http://www.globus.org/namespaces/2004/10/rft" xsi:type="xsd:string">27</ns2:TransferKey> </ReferenceParameters> <Metadata xmlns="" xsi:type="ns3:MetadataType" xsi:nil="true"/> </Originator> <ErrorCode xmlns="" dialect="http://www.globus.org/fault/stacktrace">Error updating Permissions of a file/garbage/echo. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End. 
at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:415) at org.globus.ftp.FTPClient.mlst(FTPClient.java:616) at org.globus.transfer.reliable.service.TransferClient.updatePermissions(TransferClient.java:639) at org.globus.transfer.reliable.service.TransferClient.transfer(TransferClient.java:711) at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:819) at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:376) at java.lang.Thread.run(Thread.java:613)</ErrorCode> <Description xmlns="">Error updating Permissions of a file/garbage/echo [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]]</Description> <FaultCause xmlns="" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" soapenc:arrayType="xsd:anyType[1]" xsi:type="soapenc:Array"> <FaultCause xmlns="" xmlns:ns4="http://docs.oasis-open.org/wsrf/bf-2" xsi:type="ns4:BaseFaultType"> <Timestamp xmlns="" xsi:type="xsd:dateTime">2008-04-18T16:17:00.915Z</Timestamp> <Originator xmlns="" xmlns:ns5="http://www.w3.org/2005/08/addressing" xsi:type="ns5:EndpointReferenceType" xsi:nil="true"/> <ErrorCode xmlns="" xsi:nil="true"/> <Description xmlns="">Error updating Permissions of a file/garbage/echo [Caused by: Server refused performing the request. 
Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]]</Description> <FaultCause xmlns="" xsi:type="ns4:BaseFaultType"> <Timestamp xmlns="" xsi:type="xsd:dateTime">2008-04-18T16:17:00.915Z</Timestamp> <Originator xmlns="" xmlns:ns6="http://www.w3.org/2005/08/addressing" xsi:type="ns6:EndpointReferenceType" xsi:nil="true"/> <ErrorCode xmlns="" dialect="http://www.globus.org/fault/stacktrace">Error updating Permissions of a file/garbage/echo. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End. 
at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:415) at org.globus.ftp.FTPClient.mlst(FTPClient.java:616) at org.globus.transfer.reliable.service.TransferClient.updatePermissions(TransferClient.java:639) at org.globus.transfer.reliable.service.TransferClient.transfer(TransferClient.java:711) at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:819) at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:376) at java.lang.Thread.run(Thread.java:613)</ErrorCode> <Description xmlns="">org.globus.transfer.reliable.service.exception.RftException</Description> <FaultCause xmlns="" xsi:type="xsd:anyType" xsi:nil="true"/> </FaultCause> </FaultCause> </FaultCause> </ns10:FaultCause> </ns10:FaultCause> <ns6:stateWhenFailureOccurred>StageIn</ns6:stateWhenFailureOccurred> <ns6:gt2ErrorCode>0</ns6:gt2ErrorCode> <ns6:attribute>fileStageIn</ns6:attribute> </ns6:stagingFault> </ns6:fault> <ns12:exitCode xmlns:ns12="http://www.globus.org/namespaces/2008/03/gram/job/types">0</ns12:exitCode> <ns13:holding xmlns:ns13="http://www.globus.org/namespaces/2008/03/gram/job/types">false</ns13:holding> </ns4:stateChangeNotificationMessage> </Message> </NotificationMessage> </Notify> </soapenv:Body> </soapenv:Envelope>
Note that at one point there are elements like: <Timestamp xmlns="" xsi:type="xsd:dateTime">2008-04-18T16:17:00.914Z</Timestamp> which are in no namespace and don't belong at that position in the fault type sequence.
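One way to spot such elements mechanically is a namespace-aware DOM walk over the captured message. The sketch below is not part of the original report: the class and method names are made up for illustration, and the sample fragment is a heavily shortened stand-in for the fault above. It uses only the XML parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists elements that are in no namespace at all.
class UnqualifiedFaultElements {
    static final String BF2 = "http://docs.oasis-open.org/wsrf/bf-2";

    /** Returns the local names of all elements whose namespace URI is absent, in document order. */
    static List<String> findUnqualified(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required, otherwise getNamespaceURI() is always null
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> found = new ArrayList<>();
        collect(doc.getDocumentElement(), found);
        return found;
    }

    // Depth-first walk; records every element that has no namespace.
    private static void collect(Element element, List<String> found) {
        if (element.getNamespaceURI() == null) {
            found.add(element.getLocalName());
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, found);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample: one properly qualified child, two unqualified ones.
        String xml = "<bf:FaultCause xmlns:bf=\"" + BF2 + "\">"
                + "<bf:Timestamp>2008-04-18T16:17:00.928Z</bf:Timestamp>"
                + "<Timestamp xmlns=\"\">2008-04-18T16:17:00.914Z</Timestamp>"
                + "<Description xmlns=\"\">sample text</Description>"
                + "</bf:FaultCause>";
        System.out.println(findUnqualified(xml)); // prints [Timestamp, Description]
    }
}
```

Running this against the full notification would list every element serialized with xmlns="", which makes it easier to see where the output leaves the bf-2 namespace.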
Reassigning this as a job service bug.
Mike D'Arcy made a change in Java WS Core that fixed this issue.
Here's what he did: "I think I may have found a solution to this. Attached is a diff which shows the change required. Here it is in a nutshell: in local invocation mode, the SimpleSubscriptionTopicListener.notify() method needs to create a MessageContext before serializing the notification message and calling the registered callback's deliver() method. Doing this causes the correct TypeMappingRegistry to be found, and the serialization errors disappear. In addition to the above code change, BaseFaultTypeFaultCause needs to have a typemapping in client-config.wsdd. This is because the core API that I used to create a MC (ContainerConfig.getContext()) uses an Axis client engine rather than a server engine and thus reads from client-config.wsdd rather than server-config.wsdd."
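The reasoning in that description can be illustrated with a toy model in plain Java. This is not the attached diff and it uses no Axis or Globus classes: every name below is hypothetical. It assumes only the mechanism the quote describes, namely that the serializer looks up its type mappings through a per-thread context, and that the registry reached through that context must itself contain the BaseFaultTypeFaultCause mapping. For simplicity the toy throws when a mapping is missing, whereas the real system produced a malformed message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: all classes and names here are hypothetical stand-ins.
class ContextLookupSketch {

    /** Stand-in for a type mapping registry: type name -> serializer name. */
    static final class Registry {
        private final Map<String, String> serializers = new HashMap<>();

        Registry register(String type, String serializer) {
            serializers.put(type, serializer);
            return this;
        }

        String serializerFor(String type) {
            return serializers.get(type);
        }
    }

    // Stand-in for the "current message context" of the calling thread.
    private static final ThreadLocal<Registry> CURRENT = new ThreadLocal<>();

    // Registry used when no context is set; it lacks the nested fault mapping.
    static final Registry FALLBACK = new Registry()
            .register("BaseFaultType", "bean");

    // Registry reached through the context; the extra mapping plays the role
    // of the typemapping added to client-config.wsdd.
    static final Registry CONFIGURED = new Registry()
            .register("BaseFaultType", "bean")
            .register("BaseFaultTypeFaultCause", "bean");

    /** Serializes a type name using whichever registry the current thread can see. */
    static String serialize(String type) {
        Registry registry = CURRENT.get();
        if (registry == null) {
            registry = FALLBACK;
        }
        String serializer = registry.serializerFor(type);
        if (serializer == null) {
            throw new IllegalStateException("no serializer for " + type);
        }
        return serializer + ":" + type;
    }

    /** Sets up a context before serializing, then restores the previous state. */
    static String notifyWithContext(String type) {
        Registry previous = CURRENT.get();
        CURRENT.set(CONFIGURED);
        try {
            return serialize(type);
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    public static void main(String[] args) {
        try {
            serialize("BaseFaultTypeFaultCause");
        } catch (IllegalStateException e) {
            System.out.println("without context: " + e.getMessage());
        }
        System.out.println("with context: " + notifyWithContext("BaseFaultTypeFaultCause"));
    }
}
```

The toy also shows why both halves of the fix are needed: setting up the context only helps if the registry it leads to actually knows the type.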