| Summary: | Missing OverallStatus object in notification | ||
|---|---|---|---|
| Product: | RFT | Reporter: | Peter Lane <lane@mcs.anl.gov> |
| Component: | RFT | Assignee: | Ravi Madduri <madduri@mcs.anl.gov> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | ogsa-bugs@globus.org |
| Priority: | P3 | ||
| Version: | development | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Bug Depends on: | |||
| Bug Blocks: | 2988 | ||
| Attachments: |
container.log
synchronized some portion of start call |
||
How old is your checkout ? I fixed a possible NPE on march 17th and this seems similar to what I got then.
Actually looking at the line numbers in the trace it looks like start is called twice on a resource.
Unless there is an issue with uuids not being unique or something, it's not really possible for start to be getting called multple times. It always does a create right before it does the start in the same method. But anyway, I'll update and see if it helps.
I updated RFT last night and retried the test this morning. I am still seeing this error.
Container log is here: http://www-unix.mcs.anl.gov/~lane/container.log_2986_1
Created an attachment (id=542) [details]
container.log
container log with just RFT debug messages
Created an attachment (id=543) [details]
synchronized some portion of start call
This has'nt given me the error with 4 clients submitting transfers. LEt me know
how it works out for you
The patch worked for 16 condor jobs. I can stress it more tomorrow if need be.
My latest fix for this seems to work fine.
I'm still getting errors resulting from missing OverallStatus even though I'm not seeing RepeatedlyStartedFault. I'll attach a container log with RFT debuggin turned on.
Can I see the container logs for this ? And also how you were able to reproduce this bug ?
Here's a container log with RFT debugging turned on. I killed the container as soon as I noticed the missing overall status error from GRAM. http://www-unix.mcs.anl.gov/~lane/container.log_2986_2
I don't see any error in this log.. Can you may be tell me the use case you ran so i can reproduce it myself ?
I ran the throughput tester with 50 parallel clients submitting jobs with the
following fileStageIn
directive:
<transfer>
<sourceUrl>gsiftp://logan:2811/tmp/empty_dir/</sourceUrl>
<destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</
destinationUrl>
</transfer>
<transfer>
<sourceUrl>gsiftp://logan:2811/bin/echo</sourceUrl>
<destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/my_echo</
destinationUrl>
</transfer>
After clearing out some other bugs, I am able to consistently reproduce this with as little as 4 parallel clients.
Can you attach a container log with this ? I ran a long test with 6 clients submitting rft requests in parallel for ~5 hours and did'nt see this bug once.
This looks like my fault. The missing status object was from the StagingListener, not the actual notification message from RFT (actually, there wasn't any message yet). So basically I was trying to access the StagignListener before it actually got a notification.
I'm seeing cases when a notification from RFT comes in without an OverallStatus object. I'm having trouble pinning it down since there's no fault if there's no status object. I am seeing the following two errors in the log, though, so I presume they are related: 2005-03-21 13:03:08,118 ERROR service.ReliableFileTransferImpl [Thread-6,start:111] Error in start java.rmi.RemoteException: Error in startnull; nested exception is: at org.globus.transfer.reliable.service.ReliableFileTransferResource.start(ReliableFileTransferResource.java:519) at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:102) at sun.reflect.GeneratedMethodAccessor417.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384) at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104) at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:379) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49) at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84) at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94) at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281) at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319) at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450) at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285) at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:662) at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:393) at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:297) Caused by: at org.globus.transfer.reliable.service.ReliableFileTransferResource.start(ReliableFileTransferResource.java:512) ------------------------------ 2005-03-21 13:45:55,084 ERROR service.URLExpander [Thread-266,changeDir:275] Error expanding a directory URL org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused changing directory (error code 1) [Nested exception message: Custom message: Unexpected reply: 550 /home/lane/Development/Globus/working/ws-gram/service/java/test/unit/out.22: Not a directory.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 550 /home/lane/Development/Globus/working/ws-gram/service/java/test/unit/out.22: Not a directory. at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:310) at org.globus.ftp.FTPClient.changeDir(FTPClient.java:180) at org.globus.transfer.reliable.service.URLExpander.changeDir(URLExpander.java:269) at org.globus.transfer.reliable.service.URLExpander.<init>(URLExpander.java:93) at org.globus.transfer.reliable.service.TransferClient.transfer(TransferClient.java:640) at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:620) at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:342) at java.lang.Thread.run(Thread.java:534)