Bug 4225 - error deleting a directory
Status: RESOLVED FIXED
Product: RFT
Component: RFT
Version: 4.0.1
Platform: Macintosh All
Importance: P3 normal
Target Milestone: 4.1
Reported: 2006-02-16 11:02 by
Modified: 2006-03-02 21:09


Description From 2006-02-16 11:02:24
From the GRAM automated testing - http://skynet-login.isi.edu/gram-testing/

There are frequent failures on the 4.0 branch. All the failures I looked at appear to be the same problem:
deleting the unique job directory. Deletion works most of the time, for almost all of the jobs.

The cleanup directive is for a directory. RFT just passes this directive to the GridFTP server as an RDEL
command. Maybe GridFTP has a bug while walking the directory and deleting the files? Or maybe RFT is
sending the command twice?

    <fileCleanUp>
        <maxAttempts>5</maxAttempts>
        <deletion>
            <file>file:///scratch/rynge/jobs/${THROUGHPUT_TESTER_JOB_ID}/</file>
        </deletion>
    </fileCleanUp>
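On the server side, RDEL amounts to a depth-first walk that unlinks every file and then removes each emptied directory. The sketch below reproduces that locally with java.nio.file; it is an illustration under that assumption, not the GridFTP server's code, and the class and method names are hypothetical. It also shows why the "RFT is sending the command twice" theory would produce the error quoted below: once a file is gone, a second delete of the same path fails with "No such file or directory", which Java surfaces as NoSuchFileException.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Local illustration of recursive directory deletion (roughly what RDEL does).
// Hypothetical helper; this is NOT the GridFTP server's implementation.
class RecursiveDelete {

    // Depth-first walk: delete every file, then each directory once it is empty.
    static void deleteTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // unlink(2) under the hood on POSIX systems
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // propagate errors hit while listing this directory
                }
                Files.delete(dir); // all children are gone by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Deletes the same file twice; the second attempt fails the same way as
    // "System error in unlink: No such file or directory" in the log.
    static boolean secondDeleteFails(Path file) throws IOException {
        Files.delete(file);     // first delete succeeds
        try {
            Files.delete(file); // duplicate delete of an already-removed path
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }
}
```

A duplicate or overlapping delete is therefore enough to turn a successful cleanup into a reported failure, even though the file ends up removed either way.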


Here is an example:
    http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=9beccc60-5174-4d4e-851c-7a6e4c5fc814

From the container log>>>>>

2006-02-16 01:09:08,420 ERROR service.TransferWork [Thread-77,run:720] Terminal transfer error:
Error deleting a file
"/scratch/rynge/jobs/d315c560-9ecb-11da-bc17-c96358c48c34/sleep.sh" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
500-A system call failed: No such file or directory
500 End.]]
Error deleting a file
"/scratch/rynge/jobs/d315c560-9ecb-11da-bc17-c96358c48c34/sleep.sh"
. Caused by
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
500-A system call failed: No such file or directory
500 End.].  Nested exception is
org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory
500-A system call failed: No such file or directory
500 End.
    at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:333)
    at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:258)
    at org.globus.transfer.reliable.service.DeleteClient.delete(DeleteClient.java:189)
    at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:684)
    at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:345)
    at java.lang.Thread.run(Thread.java:534)
------- Comment #1 From 2006-02-16 11:57:55 -------
RFT is complaining that the server couldn't delete the sleep.sh file
specifically. The server wouldn't report any particular file back in the error
message if there was a problem with the RDEL, so perhaps RFT is attempting to
delete sleep.sh separately? I would expect RFT to send only the RDEL on the
parent directory, and that would handle everything.
------- Comment #2 From 2006-02-16 12:05:03 -------
Subject: Re:  error deleting a directory

Which version of the GridFTP server is the test using for deletion? RFT
sends RDEL only if the server supports RDEL. Otherwise it sends MLST,
makes a list of files, and calls delete on each one of them.
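The decision described here can be sketched as follows. FtpSession and its methods are hypothetical stand-ins, not the jglobus FTPClient API, and how RFT actually detects RDEL support is not stated in this bug; the sketch simply asks the session. RecordingSession is an in-memory fake so both paths can be exercised without a server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control-channel abstraction; NOT the jglobus FTPClient API.
interface FtpSession {
    boolean supportsFeature(String feature); // how RFT really detects RDEL support is assumed
    void rdel(String dir);                    // RDEL <dir>
    List<String> listFiles(String dir);       // file list built from MLST replies
    void deleteFile(String path);             // DELE <path>
}

class CleanupSketch {
    // RDEL when the server supports it; otherwise list the files and delete
    // each one, as described in the comment above.
    static void cleanUp(FtpSession session, String dir) {
        if (session.supportsFeature("RDEL")) {
            session.rdel(dir);
            return;
        }
        for (String file : session.listFiles(dir)) {
            session.deleteFile(file);
        }
        // Removing the emptied directory itself is not described in the
        // comment, so it is left out of this sketch.
    }
}

// In-memory fake that records the commands it would have sent.
class RecordingSession implements FtpSession {
    private final boolean hasRdel;
    private final List<String> files;
    final List<String> commands = new ArrayList<>();

    RecordingSession(boolean hasRdel, List<String> files) {
        this.hasRdel = hasRdel;
        this.files = files;
    }

    @Override public boolean supportsFeature(String feature) {
        return hasRdel && "RDEL".equals(feature);
    }
    @Override public void rdel(String dir) { commands.add("RDEL " + dir); }
    @Override public List<String> listFiles(String dir) { return files; }
    @Override public void deleteFile(String path) { commands.add("DELE " + path); }
}
```

Against a server that supports RDEL, this sketch sends a single command for the whole directory; per-file DELE commands, like the one behind the stack trace in the description, come only from its fallback path.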

------- Comment #3 From 2006-02-17 15:46:25 -------
From one of the gram debug container log files, 
>>>>>
fileStageIn destinationUrl after: gsiftp://skynet-7.isi.edu:2811/scratch/rynge/jobs/5864c970-9ecf-11da-8c83-ab803b611239/sleep.sh
<<<<<

It looks like the GT container's GridFTP server is "skynet-7.isi.edu".

[macdaddy:~] smartin% telnet skynet-7.isi.edu 2811
Trying 128.9.233.17...
Connected to skynet-7.isi.edu.
Escape character is '^]'.
220 skynet-7.isi.edu GridFTP Server 2.1 (gcc32, 1122653280-63) ready.
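The two checks in this comment (which host serves the gsiftp URL, and which version its 220 greeting reports) can be sketched in Java with java.net.URI and a plain socket in place of telnet. ServerCheck and its methods are hypothetical names, the version regex assumes the banner format shown above, and readGreeting reads only the first reply line, so a multi-line "220-" greeting would not be handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerCheck {
    // Assumes a single-line greeting like "220 <host> GridFTP Server <version> (...) ready."
    private static final Pattern VERSION =
            Pattern.compile("^220 \\S+ GridFTP Server (\\S+)");

    // Host of the server named in a gsiftp:// URL.
    static String host(String gsiftpUrl) {
        return URI.create(gsiftpUrl).getHost();
    }

    // Port from the URL (-1 if the URL does not give one).
    static int port(String gsiftpUrl) {
        return URI.create(gsiftpUrl).getPort();
    }

    // Equivalent of the telnet session: connect and read the first reply line.
    static String readGreeting(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5000); // 5 s connect timeout
            socket.setSoTimeout(5000);                               // 5 s read timeout
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine(); // null if the server closes without a greeting
        }
    }

    // Version token from the greeting, or null if it is not a GridFTP 220 banner.
    static String bannerVersion(String greeting) {
        if (greeting == null) {
            return null;
        }
        Matcher m = VERSION.matcher(greeting);
        return m.find() ? m.group(1) : null;
    }
}
```

Chained together, `bannerVersion(readGreeting(host(url), port(url)))` performs both steps for a given stage-in URL.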

------- Comment #4 From 2006-02-17 15:59:27 -------
OK, that's a 4.0.1 server; Ravi says RDEL should have been used.

gimme logs:

Start the GridFTP server with '-d all -l <logfile>'
and set the environment variable 'GLOBUS_GRIDFTP_SERVER_FILE_DEBUG=ALL,<debuglogfile>'.

An RFT log wouldn't hurt either.
------- Comment #5 From 2006-02-17 16:00:03 -------
I guess I now need to see the GridFTP logs, if they are available.

------- Comment #6 From 2006-02-21 00:43:04 -------
This may be related to bug 3840. Look at transfer 10:

[Thread-68,getDeleteClient:382] [Request 5, Transfer 10] deleting gsiftp://viz-1.isi.edu:2811/scratch/rynge/jobs/e5e50980-4633-11da-8b4c-d7d91db7639c/sleep.sh

Note that sleep.sh was not explicitly listed for deletion in the RSL, and
that RFT (and/or GRAM) appears to be mistaking a stage-in request for a
deletion request.
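One way to confirm that sleep.sh is not an explicit deletion target is to pull the `<deletion>` entries out of the fileCleanUp fragment quoted in the description. This is a sketch using the JDK's DOM parser; CleanupList and its methods are hypothetical names, and the `${THROUGHPUT_TESTER_JOB_ID}` placeholder is simply kept as literal text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class CleanupList {
    // Returns the text of every <file> element whose parent is <deletion>.
    static List<String> deletionTargets(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList files = doc.getElementsByTagName("file");
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < files.getLength(); i++) {
            Node file = files.item(i);
            if ("deletion".equals(file.getParentNode().getNodeName())) {
                targets.add(file.getTextContent().trim());
            }
        }
        return targets;
    }

    // True if some deletion target names the given file directly
    // (a directory entry such as ".../jobs/<id>/" does not count).
    static boolean namesFile(List<String> targets, String fileName) {
        for (String target : targets) {
            if (target.endsWith("/" + fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

For the fragment in the description, the only target is the job directory itself, so any per-file delete of sleep.sh has to come from somewhere other than the cleanup list.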
------- Comment #7 From 2006-02-21 01:16:29 -------
Test with deletion failure and gridftp logs:

http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=2dabb058-7735-4de2-a85f-fba8fb7e3c1b
------- Comment #8 From 2006-02-21 01:55:01 -------
The GridFTP log agrees with the bug 3840 theory. For the particular work dir
with the DELE X/sleep.sh failure, the operations involving it occur in this order:

mkdir X
dele X/sleep.sh (fail)
stor X/.ignoreme
rdel X

One way or another, that DELE makes no sense on a directory created only
seconds earlier.
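The reasoning here (a DELE inside a directory created moments earlier, for a file that was never written there) can be turned into a small check over the operation sequence. This sketch assumes the operations have already been pulled out of the GridFTP log as (command, path) pairs using the shorthand names from the list above; it does not parse the real log format, and Op and SuspiciousDeletes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One operation extracted from a server log (hypothetical record; the real
// GridFTP log format is not parsed here).
final class Op {
    final String command; // shorthand as in the list above: "mkdir", "dele", "stor", "rdel"
    final String path;

    Op(String command, String path) {
        this.command = command;
        this.path = path;
    }
}

class SuspiciousDeletes {
    // Returns "dele" targets whose parent directory was created in this
    // sequence but which were never written ("stor") before the delete.
    static List<String> find(List<Op> ops) {
        Set<String> created = new HashSet<>();
        Set<String> written = new HashSet<>();
        List<String> suspicious = new ArrayList<>();
        for (Op op : ops) {
            switch (op.command) {
                case "mkdir":
                    created.add(op.path);
                    break;
                case "stor":
                    written.add(op.path);
                    break;
                case "dele":
                    String parent = parentOf(op.path);
                    if (created.contains(parent) && !written.contains(op.path)) {
                        suspicious.add(op.path);
                    }
                    break;
                default:
                    break; // "rdel" and anything else are ignored by this check
            }
        }
        return suspicious;
    }

    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }
}
```

Run over the four operations listed above, the check flags exactly X/sleep.sh, while a delete of a file that was stored first is not flagged.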
------- Comment #9 From 2006-02-21 02:00:03 -------
Mats,
Would it be possible for you to repeat this test
http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=2dabb058-7735-4de2-a85f-fba8fb7e3c1b
with RFT logging turned on?

------- Comment #10 From 2006-02-21 09:04:40 -------
It looks like test 70 has the same problem, and RFT logging is enabled for
that one.
------- Comment #11 From 2006-03-02 19:06:15 -------
My fix on trunk seems to fix this bug. I merged the fix to the release branch and
am waiting for results.
------- Comment #12 From 2006-03-02 21:09:03 -------
It looks like this is now fixed in both the release branch and HEAD. Closing the bug.