Bugzilla – Bug 2999
NPE in startTransfers()
Last modified: 2005-04-08 16:12:36
I got an NPE at some point with a GRAM job burst load of 50:

java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:117)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
    at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
    at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:379)
    at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
    at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
    at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
    at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
    at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
    at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
    at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:662)
    at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:393)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:297)
Caused by: java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.startTransfers(ReliableFileTransferImpl.java:314)
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:106)
Anything new on this bug? Container logs?
I haven't been able to reproduce this again so far. I guess you can resolve it until I have a better way of reproducing it.
Created an attachment (id=547): container.log
I'm able to reproduce this fairly consistently now. I'm attaching the latest container log.
Created an attachment (id=548): container.log
Here's a better log with RFT debugging enabled. I had to go up to around 38 parallel job clients before I consistently experienced this. I also added an extra debugging message before the NPE, which indicates that getTopicList() is returning null.
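For illustration only, the kind of defensive check this observation suggests can be sketched as below. This is not the fix that was committed for this bug: the Resource class and its setter are hypothetical stand-ins, and only the names getTopicList() and startTransfers() come from this report. The sketch assumes the list can legitimately be null for a short time under load and treats that case as an empty list instead of dereferencing it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TopicListGuard {
    /** Hypothetical stand-in for the RFT resource; not the real Globus class. */
    static class Resource {
        // Assumption: under load this may still be null when start() is called.
        private volatile List<String> topicList;

        List<String> getTopicList() { return topicList; }
        void setTopicList(List<String> topics) { this.topicList = topics; }
    }

    /** Returns the number of topics handled; logs and continues if the list is null. */
    static int startTransfers(Resource resource) {
        List<String> topics = resource.getTopicList();
        if (topics == null) {
            // Log the condition instead of letting the loop below throw an NPE.
            System.err.println("topic list is null in RFT resource; treating as empty");
            topics = Collections.emptyList();
        }
        int handled = 0;
        for (String topic : topics) {
            handled++; // placeholder for per-topic subscription work
        }
        return handled;
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        System.out.println(startTransfers(r));   // list not yet set
        r.setTopicList(Arrays.asList("a", "b"));
        System.out.println(startTransfers(r));   // list populated
    }
}
```

A guard like this only hides the symptom. If the list is null because another thread has not finished initializing the resource, the underlying ordering problem still needs to be addressed.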
I committed a fix for this. Let me know how it works out.
I'm still getting an NPE, though it looks unrelated: http://www-unix.mcs.anl.gov/~lane/container.log_2999_1
Created an attachment (id=554): Patch to fix NPE
Can you please try this patch and let me know if it works? Thanks.
No deal. Got four of these this last time:

2005-03-29 15:17:40,332 ERROR service.ReliableFileTransferResource [Thread-955,storeSubscriptions:293] topic list is null in RFT resource
2005-03-29 15:17:40,349 ERROR service.ReliableFileTransferImpl [Thread-955,start:116] Error in start
java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.startTransfers(ReliableFileTransferImpl.java:314)
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:106)
    ...
This seems to be fixed by the last commit: Peter could no longer reproduce it with the test that previously triggered it.
I was able to reproduce this again using the GRAM throughput tester with a load of 1 and parallelism of 64.
A new log can be found here: http://www-unix.mcs.anl.gov/~lane/container.log_2999_1
Sorry, the new log is here: http://www-unix.mcs.anl.gov/~lane/container.log_2999_2
A possible fix is in trunk.
Still getting it after updating.
I just ran a throughput test with load 1 and parallelism 64, and I don't see this NPE. This is the command I used:

$GLOBUS_LOCATION/test/globus_wsrf_gram_service_java_test_throughput/gram-throughput-tester --load 1 --parallelism 64 --test-duration 5 --delegate true --rsl condor_sim.xml

and condor_sim.xml is like this:

<job>
    <executable>my_echo</executable>
    <directory>${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</directory>
    <argument>12</argument>
    <argument>abc</argument>
    <argument>34</argument>
    <argument>pdscaex_instr_GrADS_grads23_28919.cfg</argument>
    <argument>pgwynnel was here</argument>
    <environment>
        <name>PI</name>
        <value>3.141</value>
    </environment>
    <environment>
        <name>GLOBUS_DUROC_SUBJOB_INDEX</name>
        <value>0</value>
    </environment>
    <stdout>stdout</stdout>
    <stderr>stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://promptu:2811/tmp/empty_dir/</sourceUrl>
            <destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</destinationUrl>
        </transfer>
        <transfer>
            <sourceUrl>gsiftp://promptu:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <!--
    <fileStageOut>
        <transfer>
            <sourceUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/stdout</sourceUrl>
            <destinationUrl>gsiftp://promptu:2811/${GLOBUS_USER_HOME}/stdout.${THROUGHPUT_TESTER_JOB_ID}</destinationUrl>
        </transfer>
        <transfer>
            <sourceUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/stderr</sourceUrl>
            <destinationUrl>gsiftp://promptu:2811/${GLOBUS_USER_HOME}/stderr.${THROUGHPUT_TESTER_JOB_ID}</destinationUrl>
        </transfer>
    </fileStageOut>
    -->
    <fileCleanUp>
        <deletion>
            <file>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</file>
        </deletion>
    </fileCleanUp>
</job>
During my testing I sometimes see this error:

java.lang.RuntimeException: Couldn't obtain a delegated credential.
    at org.globus.exec.service.job.ManagedJobResourceImpl.getJobCredential(ManagedJobResourceImpl.java:412)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initSecurity(ManagedExecutableJobResource.java:339)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:183)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:148)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:145)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:143)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:254)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:235)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:270)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:255)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:156)
    at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:153)
Any theories on how that would affect RFT?
It doesn't affect RFT. I was just pointing it out so we can keep track of it.
Fix is in the trunk. Didn't get the NPE for 3 runs of 64 parallel jobs. Beat the shit out of this and lemme know.
I'm fine closing this bug.