Bug 2999 - NPE in startTransfers()
Status: RESOLVED FIXED
: RFT
RFT
: development
: PC Linux
: P3 critical
: 4.0
Assigned To:
: 2988 3095
Reported: 2005-03-23 17:17 by
Modified: 2005-04-08 16:12 (History)


Attachments
container.log (19.84 KB, text/plain)
2005-03-25 19:20, Peter Lane
container.log (127.58 KB, text/plain)
2005-03-25 19:44, Peter Lane
Patch to fix NPE (1.15 KB, patch)
2005-03-29 14:38, Ravi Madduri



Description From 2005-03-23 17:17:47
I got an NPE at some point with a GRAM job burst load of 50:

    java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:117)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
    at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
    at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:379)
    at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
    at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
    at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
    at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
    at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
    at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
    at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:662)
    at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:393)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:297)
Caused by: java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.startTransfers(ReliableFileTransferImpl.java:314)
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:106)
------- Comment #1 From 2005-03-24 10:16:26 -------
Anything new on this bug? Any container logs?
------- Comment #2 From 2005-03-24 10:42:12 -------
I haven't been able to reproduce this again so far. I guess you can resolve it
until I have a better way of reproducing it.
------- Comment #3 From 2005-03-25 19:20:48 -------
Created an attachment (id=547) [details]
container.log

I'm able to reproduce this fairly consistently now.  I'm attaching the latest
container log.
------- Comment #4 From 2005-03-25 19:44:22 -------
Created an attachment (id=548) [details]
container.log

Here's a better log with RFT debugging. I had to go up to around 38 parallel
job clients before I consistently experienced this. I also added an extra
debugging message before the NPE that indicates that getTopicList() is null.
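
For readers following along, the kind of diagnostic described above can be sketched roughly as follows. This is only an illustration, not the actual RFT code: `TopicListGuard` and `FakeResource` are hypothetical stand-ins, and the only detail taken from this report is that `getTopicList()` can return null under load.

```java
import java.util.Collections;
import java.util.List;

public class TopicListGuard {

    /** Hypothetical stand-in for the RFT resource; NOT the real Globus class. */
    static class FakeResource {
        private final List<String> topics;

        FakeResource(List<String> topics) {
            this.topics = topics;
        }

        /** May return null, mirroring the behaviour reported in this bug. */
        List<String> getTopicList() {
            return topics;
        }
    }

    /**
     * Returns the number of topics, logging a diagnostic instead of
     * throwing a NullPointerException when the topic list is null.
     */
    static int countTopics(FakeResource resource) {
        List<String> topics = resource.getTopicList();
        if (topics == null) {
            // Diagnostic emitted before the point where the NPE would occur.
            System.err.println("topic list is null in RFT resource");
            return 0;
        }
        return topics.size();
    }

    public static void main(String[] args) {
        System.out.println(countTopics(new FakeResource(null)));
        System.out.println(countTopics(new FakeResource(Collections.singletonList("status"))));
    }
}
```

A guard like this only makes the failure visible and survivable; it does not explain why the list is null in the first place.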
------- Comment #5 From 2005-03-28 19:05:08 -------
I committed a fix for this. Let me know how it works out.
------- Comment #6 From 2005-03-29 14:15:59 -------
I'm still getting an NPE, though it looks unrelated:

http://www-unix.mcs.anl.gov/~lane/container.log_2999_1
------- Comment #7 From 2005-03-29 14:38:06 -------
Created an attachment (id=554) [details]
Patch to fix NPE

Can you please try this patch and let me know if it works?
Thanks.
------- Comment #8 From 2005-03-29 16:33:21 -------
No deal. Got four of these this last time:

2005-03-29 15:17:40,332 ERROR service.ReliableFileTransferResource [Thread-955,storeSubscriptions:293] topic list is null in RFT resource
2005-03-29 15:17:40,349 ERROR service.ReliableFileTransferImpl [Thread-955,start:116] Error in start
java.lang.NullPointerException
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.startTransfers(ReliableFileTransferImpl.java:314)
    at org.globus.transfer.reliable.service.ReliableFileTransferImpl.start(ReliableFileTransferImpl.java:106)
...
------- Comment #9 From 2005-03-31 11:28:58 -------
This seems to be fixed by the last commit: Peter could no longer reproduce it
with the test that previously triggered it.
------- Comment #10 From 2005-04-06 18:20:05 -------
I was able to reproduce this again using the GRAM throughput tester with a load
of 1 and parallelism of 64.
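
As a rough illustration of why the parallelism level matters here, the sketch below releases N callers at the same instant and counts how many fail. It is not the GRAM throughput tester and does not reproduce this bug; `ParallelStartHarness` and `countFailures` are hypothetical names, and the `Callable` passed in stands for whatever operation is under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStartHarness {

    /**
     * Runs the given operation on `parallelism` threads, released together,
     * and returns how many invocations ended with an exception.
     */
    static int countFailures(Callable<Void> operation, int parallelism)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        CountDownLatch gate = new CountDownLatch(1);
        List<Future<Void>> results = new ArrayList<>();

        for (int i = 0; i < parallelism; i++) {
            results.add(pool.submit(() -> {
                gate.await();            // hold every worker until the gate opens
                return operation.call();
            }));
        }

        gate.countDown();                // release all workers at once

        int failures = 0;
        for (Future<Void> result : results) {
            try {
                result.get();
            } catch (ExecutionException e) {
                failures++;              // the operation threw, e.g. an NPE
            }
        }
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // A trivially safe operation: expected to report no failures.
        System.out.println(countFailures(() -> null, 64));
    }
}
```

Releasing all workers through a single latch maximises contention at the start of the operation, which is where timing-dependent failures are most likely to show up.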
------- Comment #11 From 2005-04-06 18:21:43 -------
A new log can be found here:

http://www-unix.mcs.anl.gov/~lane/container.log_2999_1
------- Comment #12 From 2005-04-06 18:34:51 -------
Sorry, the new log is here:

http://www-unix.mcs.anl.gov/~lane/container.log_2999_2
------- Comment #13 From 2005-04-07 00:32:09 -------
A possible fix is in trunk.
------- Comment #14 From 2005-04-07 11:19:10 -------
Still getting it after updating.
------- Comment #15 From 2005-04-07 11:54:42 -------
I just ran a throughput test with load 1 and parallelism 64, and I don't see
this NPE. This is the command I used:

    $GLOBUS_LOCATION/test/globus_wsrf_gram_service_java_test_throughput/gram-throughput-tester --load 1 --parallelism 64 --test-duration 5 --delegate true --rsl condor_sim.xml

condor_sim.xml looks like this:
<job>
    <executable>my_echo</executable>
    <directory>${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</directory>
    <argument>12</argument>
    <argument>abc</argument>
    <argument>34</argument>
    <argument>pdscaex_instr_GrADS_grads23_28919.cfg</argument>
    <argument>pgwynnel was here</argument>
    <environment>
        <name>PI</name>
        <value>3.141</value>
    </environment>
    <environment>
        <name>GLOBUS_DUROC_SUBJOB_INDEX</name>
        <value>0</value>
    </environment>
    <stdout>stdout</stdout>
    <stderr>stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://promptu:2811/tmp/empty_dir/</sourceUrl>
            <destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</destinationUrl>
        </transfer>
        <transfer>
            <sourceUrl>gsiftp://promptu:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <!--
    <fileStageOut>
        <transfer>
            <sourceUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/stdout</sourceUrl>
            <destinationUrl>gsiftp://promptu:2811/${GLOBUS_USER_HOME}/stdout.${THROUGHPUT_TESTER_JOB_ID}</destinationUrl>
        </transfer>
        <transfer>
            <sourceUrl>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/stderr</sourceUrl>
            <destinationUrl>gsiftp://promptu:2811/${GLOBUS_USER_HOME}/stderr.${THROUGHPUT_TESTER_JOB_ID}</destinationUrl>
        </transfer>
    </fileStageOut>
    -->
    <fileCleanUp>
        <deletion>
            <file>file:///${GLOBUS_SCRATCH_DIR}/${THROUGHPUT_TESTER_JOB_ID}/</file>
        </deletion>
    </fileCleanUp>
</job>
------- Comment #16 From 2005-04-07 15:26:14 -------
During my testing sometimes I see this error:
java.lang.RuntimeException: Couldn't obtain a delegated credential.
        at org.globus.exec.service.job.ManagedJobResourceImpl.getJobCredential(ManagedJobResourceImpl.java:412)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initSecurity(ManagedExecutableJobResource.java:339)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:183)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:148)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:145)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:143)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:254)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:235)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:270)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:255)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:156)
        at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:153)
------- Comment #17 From 2005-04-07 15:37:52 -------
Any theories on how that would affect RFT?
------- Comment #18 From 2005-04-07 15:40:56 -------
It doesn't affect RFT. I was just pointing it out so we can keep track of it.
------- Comment #19 From 2005-04-07 16:58:01 -------
The fix is in the trunk. I didn't get the NPE in 3 runs of 64 parallel jobs.
Please stress this as hard as you can and let me know.
------- Comment #20 From 2005-04-08 12:18:32 -------
I'm fine closing this bug.