<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugzilla.globus.org/bugzilla/bugzilla.dtd">

<bugzilla version="3.2.3"
          urlbase="http://bugzilla.globus.org/bugzilla/"
          maintainer="bacon@mcs.anl.gov"
>

    <bug>
          <bug_id>3130</bug_id>
          
          <creation_ts>2005-04-11 00:25</creation_ts>
          <short_desc>RFT looses transfer stats in the middle of a transfer</short_desc>
          <delta_ts>2005-04-11 14:15:30</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>RFT</product>
          <component>RFT</component>
          <version>development</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          
          
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mats Rynge">rynge@isi.edu</reporter>
          <assigned_to name="Ravi Madduri">madduri@mcs.anl.gov</assigned_to>
          <cc>lane@mcs.anl.gov</cc>
    
    <cc>ogsa-bugs@globus.org</cc>

      

      
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-11 00:25:43</bug_when>
            <thetext>This problem shows up as a GRAM fault when GRAM can&apos;t find the executable after
a stagein: Invalid file permissions on executable 


2005-04-10 22:00:43,719 DEBUG service.TransferWork
    [Thread-5865,statusChanged:171] [Request 125606, Transfer 223075] setting
    overall rp to (finished: 1, active: 0, failed: 0, restarted: 0, pending:
    2, cancelled: 0)
2005-04-10 22:00:49,645 DEBUG service.TransferWork
    [Thread-5913,statusChanged:171] [Request 125606, Transfer 223076] setting
    overall rp to (finished: 0, active: 0, failed: 0, restarted: 0, pending:
    0, cancelled: 0)
2005-04-10 22:00:49,652 DEBUG service.TransferWork
    [Thread-5913,setRequestStatus:225] [Request 125606, Transfer 223076]
    setting status RP to Done

Somehow the stats are lost between the two first lines.

I will attach a full log.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-11 00:29:52</bug_when>
            <thetext>Full log:

http://www.isi.edu/~rynge/bug-3130/container-log-20050410-1.txt</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-11 00:32:21</bug_when>
            <thetext>Looks like &quot;destroy&quot; is being called on the resource before all the transfers are done. </thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-11 00:33:00</bug_when>
            <thetext>Actually, it looks like the transfer is getting removed in the middle of the
transfer:

2005-04-10 22:00:44,496 DEBUG service.ReliableFileTransferResource
    [Thread-5839,remove:746] Removing the resource with id: 125606</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 10:10:33</bug_when>
            <thetext>The status ought to go to Cancelled or something if GRAM is prematurely
destroying the resource, right?  So this shouldn&apos;t show up as a GRAM fault but
an RFT fault that says the transfer was cancelled.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 10:14:50</bug_when>
            <thetext>Is it possible that with the polling GRAM is doing the status is fetched before
any OverallStatus values are set (i.e. all counts are zero)?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mats Rynge">rynge@isi.edu</who>
            <bug_when>2005-04-11 10:25:41</bug_when>
            <thetext>The delete is happening way too early. At that point, the directory has been
expanded (that is the 1 finished), but no transfers has actually taken place (2
pending). A guess is that there might be something wrong with the logic of
detemining if the transfer is done in the new staging listener.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 11:29:43</bug_when>
            <thetext>According to my logs, the polling is returning an OverallStatus with all counts
zero before it destroys the transfer.  Notice the timestamps of the following
two excerpts.  The first is the GetResourcePropertyResponse message from the
poll, and the second is what it does with the OverallStatus object after it returns:

-------------------------------------------------------------------------------
2005-04-10 23:01:08,999 DEBUG handlers.MessageLoggingHandler
[Thread-23,logMessage:48] SOAPEnvelope: &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;soapenv:Envelope xmlns:soapenv=&quot;http://schemas.xmlsoap.org/soap/envelope/&quot;
xmlns:xsd=&quot;http://www.w3.org/2001/XMLSchema&quot;
xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
xmlns:wsa=&quot;http://schemas.xmlsoap.org/ws/2004/03/addressing&quot;&gt;&lt;soapenv:Header&gt;&lt;wsa:MessageID
soapenv:mustUnderstand=&quot;0&quot;&gt;uuid:b77f34a0-aa46-11d9-8fd2-c6f981d513e6&lt;/wsa:MessageID&gt;&lt;wsa:To
soapenv:mustUnderstand=&quot;0&quot;&gt;http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous&lt;/wsa:To&gt;&lt;wsa:Action
soapenv:mustUnderstand=&quot;0&quot;&gt;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties/GetResourcePropertyResponse&lt;/wsa:Action&gt;&lt;wsa:From
soapenv:mustUnderstand=&quot;0&quot;
xmlns:ns1=&quot;http://www.globus.org/namespaces/2004/10/rft&quot;&gt;&lt;wsa:Address&gt;https://192.168.0.101:8443/wsrf/services/ReliableFileTransferService&lt;/wsa:Address&gt;&lt;wsa:ReferenceProperties&gt;&lt;ns1:TransferKey
soapenv:mustUnderstand=&quot;0&quot;&gt;23204&lt;/ns1:TransferKey&gt;&lt;/wsa:ReferenceProperties&gt;&lt;/wsa:From&gt;&lt;wsa:RelatesTo
RelationshipType=&quot;wsa:Reply&quot;
soapenv:mustUnderstand=&quot;0&quot;&gt;uuid:b3841230-aa46-11d9-8fd2-c6f981d513e6&lt;/wsa:RelatesTo&gt;&lt;/soapenv:Header&gt;&lt;soapenv:Body&gt;&lt;GetResourcePropertyResponse
xmlns=&quot;http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd&quot;&gt;&lt;ns1:OverallStatusProperty
xmlns:ns1=&quot;http://www.globus.org/namespaces/2004/10/rft&quot;&gt;&lt;ns1:transfersFinished&gt;0&lt;/ns1:transfersFinished&gt;&lt;ns1:transfersRestarted&gt;0&lt;/ns1:transfersRestarted&gt;&lt;ns1:transfersFailed&gt;0&lt;/ns1:transfersFailed&gt;&lt;ns1:transfersActive&gt;0&lt;/ns1:transfersActive&gt;&lt;ns1:transfersPending&gt;0&lt;/ns1:transfersPending&gt;&lt;ns1:transfersCancelled&gt;0&lt;/ns1:transfersCancelled&gt;&lt;/ns1:OverallStatusProperty&gt;&lt;/GetResourcePropertyResponse&gt;&lt;/soapenv:Body&gt;&lt;/soapenv:Envelope&gt;

-------------------------------------------------------------------------------
2005-04-10 23:01:09,212 DEBUG exec.StateMachine
[Thread-23,processStageInResponseState:809] failedCount: 0
2005-04-10 23:01:09,212 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2334] Entering processCompletedStagingJob()
2005-04-10 23:01:09,213 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2362] totalTransferCount: 0
2005-04-10 23:01:09,213 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2363] finishedCount: 0
2005-04-10 23:01:09,213 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2364] failedCount: 0
2005-04-10 23:01:09,213 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2365] canceledCount: 0
2005-04-10 23:01:09,213 DEBUG exec.StateMachine
[Thread-23,processCompletedStagingJob:2373] Transfer finished successufully.
2005-04-10 23:01:09,219 DEBUG utils.StagingHelper
[Thread-23,destroyTransfer:239]
[resourceKey:bc2d96f0-aa45-11d9-a357-b72315c75a36] Entering destroyTransfer()</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-11 11:36:51</bug_when>
            <thetext>I wonder if poll happened when the overallStatus is being constructed from the database. May be you 
should check for all zeros and try polling again ?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 11:45:00</bug_when>
            <thetext>Yeah, that&apos;s pretty much what I was thinking.  I was going to do this earlier
when I asked you about how the DB is initialized by you said it couldn&apos;t happen. :)

Since the StateMachine ignores null OverallStatus objects from the poll method,
I can just check for that condition in the method and return null.  The transfer
job would then be polled again in 5 minutes or so if the notification didn&apos;t
come in before then.

So the question is, should RFT be doing something to assure that this state
doesn&apos;t occur?  If it&apos;s absolutely impossible then we can close this bug and
assume my workaround is a fix.  It still think it&apos;s a dangerous value to be
returning.  I&apos;d much prefer a null object.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-11 11:48:52</bug_when>
            <thetext>Well .. I can create the object after I get results from the database instead of before. so you will get a 
old/valid value if you poll in between </thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 13:08:51</bug_when>
            <thetext>Ravi, the patch you gave me to fix this seems to work.  If you concur we can
close the bug and you can commit the patch.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Ravi Madduri">madduri@mcs.anl.gov</who>
            <bug_when>2005-04-11 13:35:00</bug_when>
            <thetext>Ok. I will commit the patch i gave you.  If you can rerun your test for sanity and let me know I will close 
this bug</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Peter Lane">lane@mcs.anl.gov</who>
            <bug_when>2005-04-11 14:10:26</bug_when>
            <thetext>I re-ran the test over lunch. Looks good.</thetext>
          </long_desc>
      
      

    </bug>

</bugzilla>