<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugzilla.globus.org/bugzilla/bugzilla.dtd">

<bugzilla version="3.2.3"
          urlbase="http://bugzilla.globus.org/bugzilla/"
          maintainer="bacon@mcs.anl.gov"
>

    <bug>
          <bug_id>3095</bug_id>
          
          <creation_ts>2005-04-06 22:38</creation_ts>
          <short_desc>Jobs held in condor-g with error Globus error: Staging error for RSL element</short_desc>
          <delta_ts>2005-04-12 00:43:27</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>GRAM</product>
          <component>wsrf managed job factory service</component>
          <version>development</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          
          
          <priority>P1</priority>
          <bug_severity>blocker</bug_severity>
          <target_milestone>4.0</target_milestone>
          <dependson>2999</dependson>
          <dependson>3110</dependson>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Gaurang Mehta">gmehta@isi.edu</reporter>
          <assigned_to name="Peter Lane">lane@mcs.anl.gov</assigned_to>
          <cc>gawor@mcs.anl.gov</cc>
          <cc>jfrey@cs.wisc.edu</cc>
          <cc>lane@mcs.anl.gov</cc>
          <cc>madduri@mcs.anl.gov</cc>
          <cc>meder@mcs.anl.gov</cc>
          <cc>rynge@isi.edu</cc>
          <cc>smartin@mcs.anl.gov</cc>

      

      
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-06 22:38:14</bug_when>
            <thetext>I have several jobs in a run from condor-g which got held with Globus error:
Staging error for RSL element.

smarty.isi.edu 112% condor_q -hold


-- Submitter: smarty.isi.edu : &lt;128.9.72.26:42312&gt; : smarty.isi.edu
 ID      OWNER           HELD_SINCE HOLD_REASON
 584.14  gmehta          4/6  18:42 Globus error: Staging error for RSL element
 584.17  gmehta          4/6  18:40 Globus error: Staging error for RSL element
 584.29  gmehta          4/6  18:36 Globus error: Staging error for RSL element
 584.37  gmehta          4/6  18:36 Globus error: Staging error for RSL element
 584.46  gmehta          4/6  18:37 Globus error: Staging error for RSL element
 584.49  gmehta          4/6  18:32 Globus error: Staging error for RSL element
 584.51  gmehta          4/6  18:42 Globus error: Staging error for RSL element
 584.57  gmehta          4/6  18:35 Globus error: Staging error for RSL element
 584.59  gmehta          4/6  18:33 Globus error: Staging error for RSL element
 584.61  gmehta          4/6  18:42 Globus error: Staging error for RSL element
 584.62  gmehta          4/6  18:41 Globus error: Staging error for RSL element
 584.66  gmehta          4/6  18:35 Globus error: Staging error for RSL element
 584.73  gmehta          4/6  18:38 Globus error: Staging error for RSL element
 584.79  gmehta          4/6  18:30 Globus error: Staging error for RSL element
 584.80  gmehta          4/6  18:42 Globus error: Staging error for RSL element
 584.86  gmehta          4/6  18:31 Globus error: Staging error for RSL element
 584.87  gmehta          4/6  18:33 Globus error: Staging error for RSL element
 584.92  gmehta          4/6  18:33 Globus error: Staging error for RSL element
 584.95  gmehta          4/6  18:31 Globus error: Staging error for RSL element
 584.96  gmehta          4/6  18:32 Globus error: Staging error for RSL element
 584.100 gmehta          4/6  18:35 Globus error: Staging error for RSL element

21 jobs; 0 idle, 0 running, 21 held


The job IDs in GRAM are:

uuid:588682c0-a6fc-11d9-beeb-9dd03c6063d1
584.29
uuid:5887bb40-a6fc-11d9-9f4d-9dd03c6063d1
584.37
uuid:5887e250-a6fc-11d9-9f4d-9dd03c6063d1
584.46
uuid:58852330-a6fc-11d9-8260-9dd03c6063d1
584.49
uuid:588d87a0-a6fc-11d9-b6d3-9dd03c6063d1
584.51
uuid:58860d90-a6fc-11d9-9eea-9dd03c6063d1
584.57
uuid:588486f0-a6fc-11d9-ab88-9dd03c6063d1
584.59
uuid:588ceb60-a6fc-11d9-99b5-9dd03c6063d1
584.61
uuid:588bd9f0-a6fc-11d9-b4de-9dd03c6063d1
584.62
uuid:58860d90-a6fc-11d9-9e4b-9dd03c6063d1
584.66
uuid:58883070-a6fc-11d9-811b-9dd03c6063d1
584.73
uuid:5883c3a0-a6fc-11d9-8ab5-9dd03c6063d1
584.79
uuid:588c0100-a6fc-11d9-a7ae-9dd03c6063d1
584.80
uuid:5884d510-a6fc-11d9-8260-9dd03c6063d1
584.86
uuid:588438d0-a6fc-11d9-ab88-9dd03c6063d1
584.87
uuid:58876d20-a6fc-11d9-94e8-9dd03c6063d1
584.92
uuid:5884fc20-a6fc-11d9-8260-9dd03c6063d1
584.95
uuid:58845fe0-a6fc-11d9-ab88-9dd03c6063d1
584.96
uuid:5881a0c0-a6fc-11d9-b0aa-9dd03c6063d1
584.100

The server used was built from CVS as of 3pm PST today.

The logs are at http://www.isi.edu/~gmehta/gt4bugs/containerlog-3 and
http://www.isi.edu/~gmehta/gt4bugs/gridmanagerlog-3</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-06 23:14:40</bug_when>
            <thetext>I find job  58845fe0-a6fc-11d9-ab88-9dd03c6063d1 interesting. It goes from being
in FileCleanUpResponse at:

2005-04-06 18:30:15,509 DEBUG service.TransferWork
[Thread-4923,processStates:465] [Request 10042, Transfer 17249] processing state
for transfer of
gsiftp://columbus.isi.edu:2811/nfs/asd2/gmehta/job_58845fe0-a6fc-11d9-ab88-9dd03c6063d1/
 -&gt;  null
2005-04-06 18:30:25,711 DEBUG exec.StagingListener [Thread-4567,deliver:162]
Current transfer counts for job
{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=58845fe0-a6fc-11d9-ab88-9dd03c6063d1:
 finishedCount: 1
activeCount: 0
failedCount: 0
restartedCount: 0
pendingCount: 0
canceledCount: 0



To suddenly start all over again:

2005-04-06 18:31:15,141 DEBUG
ManagedExecutableJobResource.58845fe0-a6fc-11d9-ab88-9dd03c6063d1
[Thread-23,initialize:175] at initHoldState()
2005-04-06 18:31:15,205 DEBUG
ManagedExecutableJobResource.58845fe0-a6fc-11d9-ab88-9dd03c6063d1
[Thread-23,initHoldState:308] Setting hold state to StageIn
2005-04-06 18:31:15,222 DEBUG
ManagedExecutableJobResource.58845fe0-a6fc-11d9-ab88-9dd03c6063d1
[Thread-23,initialize:181] at initSecurity()
2005-04-06 18:31:15,895 DEBUG
ManagedExecutableJobResource.58845fe0-a6fc-11d9-ab88-9dd03c6063d1
[Thread-23,initialize:189] at initVariableMap()



One hint might be that the job is loaded from the persistence cache in
between the states above:

2005-04-06 18:31:10,674 DEBUG utils.PersistenceHelper [Thread-23,load:135]
loading resource
org.globus.exec.service.exec.PersistentManagedExecutableJobResource@1a2ec71 of
key
{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=58845fe0-a6fc-11d9-ab88-9dd03c6063d1</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-07 11:49:54</bug_when>
            <thetext>There&apos;s nothing wrong with the resource being loaded in between states.  We use
soft references to avoid having to store all resources in memory constantly. The
init messages you are seeing don&apos;t mean that the job was restarted.</thetext>
          </long_desc>
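          <!--
The reload-on-demand behaviour described in the comment above can be sketched
roughly as follows. This is only an illustration, not the GRAM implementation:
Python has no soft references, so weakref.WeakValueDictionary is used as a
stand-in, and the names JobResource, ResourceHome, find and save are all
hypothetical. The point is that initialization can run again when an evicted
resource is reloaded, while the persisted job state is left untouched.

```python
import weakref


class JobResource:
    """Hypothetical stand-in for a persisted job resource."""

    def __init__(self, job_id, state):
        self.job_id = job_id
        self.state = state


class ResourceHome:
    """Holds resources weakly in memory and reloads them from a store on demand."""

    def __init__(self, store):
        self._store = store  # job_id -> persisted state (a plain dict here)
        self._cache = weakref.WeakValueDictionary()
        self.load_count = 0  # number of times a resource was (re)initialized

    def find(self, job_id):
        resource = self._cache.get(job_id)
        if resource is None:
            # Cache miss: rebuild the object from the persisted state.
            # Initialization runs again, but the job state is not reset.
            resource = JobResource(job_id, self._store[job_id])
            self._cache[job_id] = resource
            self.load_count += 1
        return resource

    def save(self, resource):
        # Persist the current state so a later reload sees it.
        self._store[resource.job_id] = resource.state
```

A second find() after the object has been reclaimed increments load_count,
yet returns the state that was last saved rather than the initial one.
          -->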
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-07 13:03:58</bug_when>
            <thetext>Are you sure full debugging is turned on?  I&apos;m not seeing any errors in the
gridmanager log.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-07 13:19:22</bug_when>
            <thetext>Yes, or so I think.

My Condor config file says:
GRIDMANAGER_DEBUG       = D_COMMAND D_SECONDS D_FULLDEBUG
I could remove the first two and just keep D_FULLDEBUG.
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Jaime Frey">jfrey@cs.wisc.edu</who>
            <bug_when>2005-04-07 13:24:19</bug_when>
            <thetext>The values listed in GRIDMANAGER_DEBUG are additive. D_FULLDEBUG won&apos;t be
ignored because of the first two values. For some reason, the gahp server is
either not receiving or not printing a full stack trace for the error.</thetext>
          </long_desc>
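          <!--
What "additive" means for the GRIDMANAGER_DEBUG value can be sketched as a set
of flags combined with bitwise OR. The flag names are the ones quoted in the
comments above; the bit values and the parse_debug_setting helper are
assumptions made for illustration and do not reflect Condor's real constants
or parser.

```python
from enum import Flag, auto


class DebugFlag(Flag):
    # Bit values are illustrative only, not Condor's actual constants.
    D_COMMAND = auto()
    D_SECONDS = auto()
    D_FULLDEBUG = auto()


def parse_debug_setting(value):
    """Combine a whitespace-separated list of flag names with bitwise OR."""
    flags = DebugFlag(0)
    for name in value.split():
        flags |= DebugFlag[name]
    return flags
```

Under this model, listing D_COMMAND and D_SECONDS first cannot switch
D_FULLDEBUG off; each name only adds a bit to the result.
          -->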
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-07 13:50:49</bug_when>
            <thetext>At least some of these are due to bug 2999 from the looks of it.  I recently
reopened that bug since I was seeing it with higher client parallelism tests.

I don&apos;t think RFT should be printing all those URLExpander ERROR messages.  I
don&apos;t think that is a fatal condition, but just triggers RFT to create the
directory if it doesn&apos;t exist.  It clutters up the log file considerably.

I&apos;m also seeing a bunch of errors trying to register with the index service. 
These shouldn&apos;t be causing any problems, but it would be nice if you could get
that installed so that the warnings don&apos;t show up.

I&apos;m seeing some instances of a fault stack trace that indicates that it can&apos;t
load the resource persistence data.  Too bad it doesn&apos;t show up in the
gridmanager logs.  It would be helpful to see the full error.  I don&apos;t have a
clue what&apos;s going on here.

I&apos;m seeing a number of fault stack traces that indicate that the staging
listener is null.  This is puzzling.  I&apos;ll have to think about this one.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-07 14:03:59</bug_when>
            <thetext>I don&apos;t understand how this is related to 2999. Gaurang, are you submitting a lot of concurrent jobs?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Jaime Frey">jfrey@cs.wisc.edu</who>
            <bug_when>2005-04-07 14:05:55</bug_when>
            <thetext>Peter, about printing stack traces in the gridmanager log: Could you take a look
at the deliver() method of JobListener in the gahp server code? That&apos;s where the
job status notifications get delivered. What code would I use to extract the full
fault text?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-07 14:40:38</bug_when>
            <thetext>Yes, I submitted 100 jobs, but at any given time there were only 20 or so running.
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-07 17:48:22</bug_when>
            <thetext>http://www.isi.edu/~rynge/bug-3095/container-log-20050407-1.txt

Job a56ec3d0-a7b1-11d9-aac5-a74aced762de is stuck in the StageIn state. I can&apos;t
even see an RFT resource being created.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-07 18:02:35</bug_when>
            <thetext>More jobs stuck. This was a run of 200 jobs, of which 57 got stuck: some with
&quot;Staging error for RSL element&quot;, some with &quot;Unspecified gridmanager error&quot;.

The Condor and GRAM IDs are:

CONDOR ID : 590.12
uuid:0df166e0-a7af-11d9-ad7a-a32477e53e93
CONDOR ID : 590.14
uuid:0df0caa0-a7af-11d9-b526-a32477e53e93
CONDOR ID : 590.15
uuid:0deef5e0-a7af-11d9-b62b-a32477e53e93
CONDOR ID : 590.16
uuid:0dee59a0-a7af-11d9-80aa-a32477e53e93
CONDOR ID : 590.17
uuid:0ded6f40-a7af-11d9-9b16-a32477e53e93
CONDOR ID : 590.18
uuid:0debc190-a7af-11d9-b82c-a32477e53e93
CONDOR ID : 590.19
uuid:0dead730-a7af-11d9-ad4a-a32477e53e93
CONDOR ID : 590.22
uuid:0df118c0-a7af-11d9-aa43-a32477e53e93
CONDOR ID : 590.23
uuid:0df0caa0-a7af-11d9-b034-a32477e53e93
CONDOR ID : 590.24
uuid:0def1cf0-a7af-11d9-b62b-a32477e53e93
CONDOR ID : 590.25
uuid:0dee59a0-a7af-11d9-bef8-a32477e53e93
CONDOR ID : 590.27
uuid:0debc190-a7af-11d9-8915-a32477e53e93
CONDOR ID : 590.28
uuid:0deafe40-a7af-11d9-ad4a-a32477e53e93
CONDOR ID : 590.29
uuid:0de92980-a7af-11d9-b836-a32477e53e93
CONDOR ID : 590.33
uuid:0def4400-a7af-11d9-b62b-a32477e53e93
CONDOR ID : 590.34
uuid:0dee80b0-a7af-11d9-bef8-a32477e53e93
CONDOR ID : 590.36
uuid:0debe8a0-a7af-11d9-8915-a32477e53e93
CONDOR ID : 590.38
uuid:0de95090-a7af-11d9-b836-a32477e53e93
CONDOR ID : 590.39
uuid:0de88d40-a7af-11d9-b534-a32477e53e93
CONDOR ID : 590.40
uuid:0df13fd0-a7af-11d9-ad7a-a32477e53e93
CONDOR ID : 590.41
uuid:0df0f1b0-a7af-11d9-8a8b-a32477e53e93
CONDOR ID : 590.42
uuid:0def4400-a7af-11d9-b526-a32477e53e93
CONDOR ID : 590.43
uuid:0deea7c0-a7af-11d9-bef8-a32477e53e93
CONDOR ID : 590.46
uuid:0deafe40-a7af-11d9-bd3f-a32477e53e93
CONDOR ID : 590.47
uuid:0de977a0-a7af-11d9-b836-a32477e53e93
CONDOR ID : 590.48
uuid:0de88d40-a7af-11d9-b8f9-a32477e53e93
CONDOR ID : 590.50
uuid:0df0f1b0-a7af-11d9-a6e1-a32477e53e93
CONDOR ID : 590.51
uuid:0df07c80-a7af-11d9-b526-a32477e53e93
CONDOR ID : 590.52
uuid:0deea7c0-a7af-11d9-a31f-a32477e53e93
CONDOR ID : 590.53
uuid:0dede470-a7af-11d9-8bd5-a32477e53e93
CONDOR ID : 590.54
uuid:0dec36c0-a7af-11d9-8915-a32477e53e93
CONDOR ID : 590.55
uuid:0deb2550-a7af-11d9-bd3f-a32477e53e93
CONDOR ID : 590.56
uuid:0de99eb0-a7af-11d9-b836-a32477e53e93
CONDOR ID : 590.57
uuid:0de8b450-a7af-11d9-b8f9-a32477e53e93
CONDOR ID : 590.58
uuid:0de7f100-a7af-11d9-8bb4-a32477e53e93
CONDOR ID : 590.60
uuid:0df0a390-a7af-11d9-b526-a32477e53e93
CONDOR ID : 590.61
uuid:0deeced0-a7af-11d9-a31f-a32477e53e93
CONDOR ID : 590.62
uuid:0dede470-a7af-11d9-80aa-a32477e53e93
CONDOR ID : 590.63
uuid:0dec36c0-a7af-11d9-9b16-a32477e53e93
CONDOR ID : 590.67
uuid:0de81810-a7af-11d9-8bb4-a32477e53e93
CONDOR ID : 590.69
uuid:0de61c40-a7af-11d9-868a-a32477e53e93
CONDOR ID : 590.70
uuid:0deef5e0-a7af-11d9-a31f-a32477e53e93
CONDOR ID : 590.72
uuid:0dec5dd0-a7af-11d9-9b16-a32477e53e93
CONDOR ID : 590.73
uuid:0deb7370-a7af-11d9-bd3f-a32477e53e93
CONDOR ID : 590.75
uuid:0de8db60-a7af-11d9-af98-a32477e53e93
CONDOR ID : 590.80
uuid:0dee3290-a7af-11d9-80aa-a32477e53e93
CONDOR ID : 590.81
uuid:0dec84e0-a7af-11d9-9b16-a32477e53e93
CONDOR ID : 590.82
uuid:0deb7370-a7af-11d9-b82c-a32477e53e93
CONDOR ID : 590.85
uuid:0de83f20-a7af-11d9-b534-a32477e53e93
CONDOR ID : 590.89
uuid:0de5a710-a7af-11d9-9940-a32477e53e93
CONDOR ID : 590.91
uuid:0deb9a80-a7af-11d9-b82c-a32477e53e93
CONDOR ID : 590.92
uuid:0dead730-a7af-11d9-9fe9-a32477e53e93
CONDOR ID : 590.93
uuid:0de92980-a7af-11d9-af98-a32477e53e93
CONDOR ID : 590.94
uuid:0de86630-a7af-11d9-b534-a32477e53e93
CONDOR ID : 590.98
uuid:0de5a710-a7af-11d9-b57d-a32477e53e93
CONDOR ID : 590.101
uuid:0de531e0-a7af-11d9-a5f7-a32477e53e93
CONDOR ID : 590.102
uuid:0de4e3c0-a7af-11d9-837f-a32477e53e93
CONDOR ID : 590.130
uuid:0de4bcb0-a7af-11d9-837f-a32477e53e93

The logs are at http://www.isi.edu/~gmehta/gt4bugs/containerlog-4 and
http://www.isi.edu/~gmehta/gt4bugs/gridmanagerlog-4

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Jaime Frey">jfrey@cs.wisc.edu</who>
            <bug_when>2005-04-07 18:22:41</bug_when>
            <thetext>The &quot;Unspecified gridmanager error&quot; messages are due to the gahp-server not
returning the fault message that accompanies a Failed status when it does an
active query of the job status. I&apos;ll correct this.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-07 18:55:00</bug_when>
            <thetext>The container log has now been moved to
http://www.isi.edu/~gmehta/gt4bugs/containerlog-4.bz2 (8MB)</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-07 18:58:23</bug_when>
            <thetext>As for comment #10, it looks to me like the job is never being released from
its hold.  Was there any indication by the client that a connection was refused
or something?  I can&apos;t really tell anything else from the log as to why this
happened.

Once I get a fresh GT install, I&apos;ll update some more debug info generation to
associate with the job resource (release() for sure).  Hopefully this will help
a little to track down what&apos;s actually going on.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-07 19:55:07</bug_when>
            <thetext>Jaime, RE comment #8:

There is a very verbose method you can use (Mats may hate me for recommending this):

org.globus.exec.utils.FaultUtils.faultToString().

Just pass it the fault and it will give you a String with lots &apos;o information
including the stack trace.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-07 20:33:45</bug_when>
            <thetext>Two jobs out of 200 got stuck.

One is held with HoldReason = &quot;Globus error: Staging error for RSL element fileStageOut.&quot;

Condor ID 594.174 | GRAM ID uuid:60f20550-a7c5-11d9-abd6-a06732691b8d

The other is shown as running in Condor, and condor_q says it is in StageOut.

Condor ID 594.116 | GRAM ID uuid:60f70e60-a7c5-11d9-b8fd-a06732691b8d

The output of both these jobs has actually been staged back correctly, though, to
the submit file.

The interesting thing to note is that 594.116 also failed with the RSL element
fileStageOut error, but Condor shows it as running and in the StageOut state,
whereas the other job with the same error is in the hold state and shows the
error correctly on the command line.

LOGS
http://www.isi.edu/~gmehta/gt4bugs/containerlog-5.bz2
http://www.isi.edu/~gmehta/gt4bugs/gridmanagerlog-5.bz2  


</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Jaime Frey">jfrey@cs.wisc.edu</who>
            <bug_when>2005-04-07 22:49:20</bug_when>
            <thetext>Interesting behavior on your job 594.116. The Failed callback arrived while the gridmanager was in the 
middle of an active query, so it ignored it. When the query finally returned, it said StageOut. Later 
queries also returned StageOut.</thetext>
          </long_desc>
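          <!--
The race described in the comment above (a Failed callback dropped because an
active query was in flight, after which the query reports StageOut) can be
illustrated with a small client-side sketch. This is not the gridmanager code
and not necessarily how the problem was addressed there; JobStatusTracker and
its methods are hypothetical names. The idea shown is to defer callbacks that
arrive during a query instead of discarding them, and to never let a
non-terminal value overwrite a terminal state.

```python
TERMINAL_STATES = {"Done", "Failed"}


class JobStatusTracker:
    """Hypothetical tracker that defers callbacks received during a query."""

    def __init__(self):
        self.state = "Unsubmitted"
        self._query_in_flight = False
        self._deferred = []

    def _apply(self, new_state):
        # Never let a stale non-terminal value overwrite a terminal state.
        if self.state not in TERMINAL_STATES:
            self.state = new_state

    def begin_query(self):
        self._query_in_flight = True

    def on_callback(self, new_state):
        if self._query_in_flight:
            # Keep the notification instead of dropping it.
            self._deferred.append(new_state)
        else:
            self._apply(new_state)

    def end_query(self, queried_state):
        self._query_in_flight = False
        self._apply(queried_state)
        # Replay notifications that arrived while the query was running.
        for pending in self._deferred:
            self._apply(pending)
        self._deferred.clear()
```

Replaying the sequence from job 594.116 (query starts, Failed callback
arrives, query returns StageOut) leaves the tracker in Failed, and a later
query that still returns StageOut does not move it back.
          -->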
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 05:11:33</bug_when>
            <thetext>I figured out that RFT faults weren&apos;t being chained properly.  I had to
deserialize some stuff manually from the RFT faults and explicitly add them onto
the GRAM fault cause list.  This significantly improves error reporting when RFT
is involved.  You will need to update the utils package to take advantage of this.

Jaime, I don&apos;t remember if I mentioned this or not (it&apos;s 4am and I&apos;m kind of
spaced out), but I attempted to fix the missing fault issue you had mentioned. 
Let me know if it&apos;s still happening after updating the service code.

Ok, see y&apos;all in a few hours. :)</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Sam Meder">meder@mcs.anl.gov</who>
            <bug_when>2005-04-08 07:48:08</bug_when>
            <thetext>Should the RFT fault chaining issue be a bug against RFT or is there a reason
why RFT can&apos;t do it correctly?

/Sam</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Sam Meder">meder@mcs.anl.gov</who>
            <bug_when>2005-04-08 07:49:50</bug_when>
            <thetext>Or is it just an issue with how GRAM (and core fault utils?) treated those faults?

/Sam</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-08 10:18:06</bug_when>
            <thetext>Also please update RFT and make sure you are using latest trunk in your tests.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 10:34:25</bug_when>
            <thetext>I&apos;m not sure what the cause of it is.  I emailed Jarek last night asking him if he knew what the deal was.
The issue is that I get faults with faultDetails fields containing {(&quot;&quot;, faultData), (AXIS_NS, exceptionName),
(AXIS_NS, hostname)} elements instead of {(AXIS_NS, stackTrace)} or something like that.  The
FaultHelper method that converts that stuff into FaultCause elements doesn&apos;t recognize faultData, so
I have to bring that out manually.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 11:23:56</bug_when>
            <thetext>Here&apos;s the stack trace of the staging error I was getting repeatedly from RFT
that is now intelligible (please forgive the SOAP message overhead):

&lt;ns5:fault
xmlns:ns5=&quot;http://www.globus.org/namespaces/2004/10/gram/job/faults&quot;&gt;&lt;ns5:stagingFault&gt;&lt;ns6:Timestamp
xmlns:ns6=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd&quot;&gt;2005-04-08T09:53:33.239Z&lt;/ns6:Timestamp&gt;&lt;ns7:Originator
xmlns:ns7=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd&quot;&gt;&lt;ns2:Address
xmlns:ns2=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;&gt;https://192.168.0.101:8443/wsrf/services/ManagedJobFactoryService&lt;/ns2:Address&gt;&lt;ns3:ReferenceProperties
xmlns:ns3=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;&gt;&lt;ns1:ResourceID&gt;8b773270-a813-11d9-87c5-9f77b3daaa08&lt;/ns1:ResourceID&gt;&lt;/ns3:ReferenceProperties&gt;&lt;ns4:ReferenceParameters
xmlns:ns4=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;/&gt;&lt;/ns7:Originator&gt;&lt;ns8:Description
xmlns:ns8=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd&quot;&gt;Staging
error for RSL element fileStageIn.&lt;/ns8:Description&gt;&lt;ns9:FaultCause
xmlns:ns9=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd&quot;&gt;&lt;ns9:Timestamp&gt;2005-04-08T09:53:33.239Z&lt;/ns9:Timestamp&gt;&lt;ns9:ErrorCode
dialect=&quot;http://www.globus.org/fault/stacktrace&quot;&gt;
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:479)
        at org.globus.exec.utils.FaultUtils.createStagingFault(FaultUtils.java:357)
        at
org.globus.exec.service.exec.StateMachine.createStagingFault(StateMachine.java:2634)
        at
org.globus.exec.service.exec.StateMachine.processStageInState(StateMachine.java:687)
        at sun.reflect.GeneratedMethodAccessor302.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at
org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:357)
        at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:151)
&lt;/ns9:ErrorCode&gt;&lt;ns9:Description&gt;org.globus.exec.generated.StagingFaultType&lt;/ns9:Description&gt;&lt;/ns9:FaultCause&gt;&lt;ns10:FaultCause
xmlns:ns10=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd&quot;
xmlns:ns11=&quot;http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.xsd&quot;
xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
xsi:type=&quot;ns11:InvalidTopicExpressionFaultType&quot;&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:34.121Z&lt;/ns10:Timestamp&gt;&lt;ns10:FaultCause
xsi:type=&quot;ns11:InvalidTopicExpressionFaultType&quot;&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:31.979Z&lt;/ns10:Timestamp&gt;&lt;ns10:Originator&gt;&lt;ns6:Address
xmlns:ns6=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;&gt;https://192.168.0.101:8443/wsrf/services/ReliableFileTransferService&lt;/ns6:Address&gt;&lt;ns7:ReferenceProperties
xmlns:ns7=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;&gt;&lt;ns1:TransferKey
soapenv:mustUnderstand=&quot;0&quot;
xmlns:ns1=&quot;http://www.globus.org/namespaces/2004/10/rft&quot;
xmlns:soapenv=&quot;http://schemas.xmlsoap.org/soap/envelope/&quot;&gt;18172&lt;/ns1:TransferKey&gt;&lt;/ns7:ReferenceProperties&gt;&lt;ns8:ReferenceParameters
xmlns:ns8=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;/&gt;&lt;/ns10:Originator&gt;&lt;ns10:Description&gt;Failed
to resolve topic expression to a set of concrete
topics&lt;/ns10:Description&gt;&lt;ns10:FaultCause&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:31.979Z&lt;/ns10:Timestamp&gt;&lt;ns10:ErrorCode
dialect=&quot;http://www.globus.org/fault/stacktrace&quot;&gt;at
org.globus.wsrf.impl.notification.SubscribeHelper.subscribe(SubscribeHelper.java:257)
        at
org.globus.wsrf.impl.notification.SubscribeProvider.subscribe(SubscribeProvider.java:90)
        at sun.reflect.GeneratedMethodAccessor338.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at
org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at
org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at
org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at
org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at
org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:662)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:393)
        at
org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at
org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:297)&lt;/ns10:ErrorCode&gt;&lt;ns10:Description&gt;org.oasis.wsn.InvalidTopicExpressionFaultType&lt;/ns10:Description&gt;&lt;/ns10:FaultCause&gt;&lt;ns10:FaultCause&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:31.980Z&lt;/ns10:Timestamp&gt;&lt;ns10:FaultCause&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:31.982Z&lt;/ns10:Timestamp&gt;&lt;ns10:ErrorCode
dialect=&quot;http://www.globus.org/fault/stacktrace&quot;&gt;java.lang.NullPointerException
        at
org.globus.wsrf.impl.notification.SubscribeHelper.subscribe(SubscribeHelper.java:243)
        at
org.globus.wsrf.impl.notification.SubscribeProvider.subscribe(SubscribeProvider.java:90)
        at sun.reflect.GeneratedMethodAccessor338.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at
org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at
org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at
org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at
org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at
org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:662)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:393)
        at
org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at
org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:297)&lt;/ns10:ErrorCode&gt;&lt;ns10:Description&gt;java.lang.NullPointerException&lt;/ns10:Description&gt;&lt;/ns10:FaultCause&gt;&lt;/ns10:FaultCause&gt;&lt;ns10:FaultCause&gt;&lt;ns10:Timestamp&gt;2005-04-08T09:53:34.122Z&lt;/ns10:Timestamp&gt;&lt;ns10:ErrorCode
dialect=&quot;http://www.globus.org/fault/stacktrace&quot;&gt;
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at
org.apache.axis.encoding.ser.BeanDeserializer.&amp;lt;init&amp;gt;(BeanDeserializer.java:90)
        at
org.apache.axis.encoding.ser.BeanDeserializer.&amp;lt;init&amp;gt;(BeanDeserializer.java:76)
        at
org.oasis.wsn.InvalidTopicExpressionFaultType.getDeserializer(InvalidTopicExpressionFaultType.java:76)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at
org.apache.axis.encoding.DeserializationContext.getDeserializerForClass(DeserializationContext.java:510)
        at
org.globus.wsrf.encoding.ObjectDeserializationContext.setDeserializer(ObjectDeserializationContext.java:108)
        at
org.globus.wsrf.encoding.ObjectDeserializationContext.init(ObjectDeserializationContext.java:117)
        at
org.globus.wsrf.encoding.ObjectDeserializationContext.&amp;lt;init&amp;gt;(ObjectDeserializationContext.java:78)
        at
org.globus.wsrf.encoding.ObjectDeserializer.toObject(ObjectDeserializer.java:53)
        at org.globus.exec.utils.FaultUtils.addFaultData(FaultUtils.java:582)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:522)
        at org.globus.exec.utils.FaultUtils.createStagingFault(FaultUtils.java:357)
        at
org.globus.exec.service.exec.StateMachine.createStagingFault(StateMachine.java:2634)
        at
org.globus.exec.service.exec.StateMachine.processStageInState(StateMachine.java:687)
        at sun.reflect.GeneratedMethodAccessor302.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at
org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:357)
        at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:151)
&lt;/ns10:ErrorCode&gt;&lt;ns10:Description&gt;org.oasis.wsn.InvalidTopicExpressionFaultType&lt;/ns10:Description&gt;&lt;/ns10:FaultCause&gt;&lt;/ns10:FaultCause&gt;&lt;/ns10:FaultCause&gt;&lt;ns5:stateWhenFailureOccurred&gt;StageIn&lt;/ns5:stateWhenFailureOccurred&gt;&lt;ns5:command&gt;StageIn&lt;/ns5:command&gt;&lt;ns5:gt2ErrorCode&gt;0&lt;/ns5:gt2ErrorCode&gt;&lt;ns5:attribute&gt;fileStageIn&lt;/ns5:attribute&gt;&lt;ns5:source
xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
xsi:nil=&quot;true&quot;/&gt;&lt;ns5:destination
xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
xsi:nil=&quot;true&quot;/&gt;&lt;/ns5:stagingFault&gt;&lt;/ns5:fault&gt;&lt;ns12:exitCode
xmlns:ns12=&quot;http://www.globus.org/namespaces/2004/10/gram/job/types&quot;&gt;0&lt;/ns12:exitCode&gt;&lt;ns13:holding
xmlns:ns13=&quot;http://www.globus.org/namespaces/2004/10/gram/job/types&quot;&gt;false&lt;/ns13:holding&gt;&lt;/ns1:stateChangeNotificationMessage&gt;</thetext>
          </long_desc>
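          <!--
The nested FaultCause elements in the message above form a chain: a staging
fault caused by an InvalidTopicExpressionFault, which in turn was caused by a
NullPointerException. A minimal sketch of walking such a chain is given below.
It assumes the fault has already been parsed into plain dictionaries with
"description" and "causes" keys; that layout and the collect_causes helper are
hypothetical and are not part of GRAM, RFT or the WSRF fault utilities.

```python
def collect_causes(fault, depth=0):
    """Return (depth, description) pairs for a fault and all nested causes."""
    found = [(depth, fault.get("description", "(no description)"))]
    for cause in fault.get("causes", []):
        # Recurse so that causes of causes are reported one level deeper.
        found.extend(collect_causes(cause, depth + 1))
    return found


def format_chain(fault):
    """Render the chain as indented text, outermost fault first."""
    return "\n".join("  " * depth + text for depth, text in collect_causes(fault))
```

Printing format_chain() for a parsed fault gives one line per cause, indented
by nesting depth, which is easier to scan than the raw SOAP message.
          -->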
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 12:12:16</bug_when>
            <thetext>I just updated for the 2999 fix and it didn&apos;t eliminate the exception from
comment #23.  Looking at the cause of the NPE in SubscribeHelper, it looks like
the RFT resource&apos;s getTopicList() method is returning null at the time of some
of the subscriptions.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 12:26:15</bug_when>
            <thetext>I created a separate bug for the issue described in comment #24 for easier tracking.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-08 18:28:34</bug_when>
            <thetext>Is anybody still getting &quot;Staging error for RSL element&quot; messages?  If not, this
bug should be closed.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Gaurang Mehta">gmehta@isi.edu</who>
            <bug_when>2005-04-08 18:33:59</bug_when>
            <thetext>I am still waiting for the server to be rebuilt so that I can test.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-12 00:43:27</bug_when>
            <thetext>Looks like this is fixed with the latest CVS trunk.</thetext>
          </long_desc>
      
      

    </bug>

</bugzilla>