Bugzilla – Bug 4225
error deleting a directory
Last modified: 2006-03-02 21:09:03
From the GRAM automated testing - http://skynet-login.isi.edu/gram-testing/

There are often failures with the 4.0 branch. All the failures I looked at seem to be the same problem with deleting the unique job dir. It works most of the time for almost all of the jobs.

The cleanup directive is for a directory. RFT just passes this directive as an RDEL command to the gridftp server. Maybe gridftp has a bug while walking the dir and deleting the files? Maybe RFT is sending the command twice?

<fileCleanUp>
    <maxAttempts>5</maxAttempts>
    <deletion>
        <file>file:///scratch/rynge/jobs/${THROUGHPUT_TESTER_JOB_ID}/</file>
    </deletion>
</fileCleanUp>

Here is an example: http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=9beccc60-5174-4d4e-851c-7a6e4c5fc814

From the container log >>>>>

2006-02-16 01:09:08,420 ERROR service.TransferWork [Thread-77,run:720] Terminal transfer error: Error deleting a file "/scratch/rynge/jobs/d315c560-9ecb-11da-bc17-c96358c48c34/sleep.sh" [Caused by: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End.]]

Error deleting a file "/scratch/rynge/jobs/d315c560-9ecb-11da-bc17-c96358c48c34/sleep.sh" . Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused deleting file (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End.
    at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:333)
    at org.globus.ftp.FTPClient.deleteFile(FTPClient.java:258)
    at org.globus.transfer.reliable.service.DeleteClient.delete(DeleteClient.java:189)
    at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:684)
    at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:345)
    at java.lang.Thread.run(Thread.java:534)
RFT is complaining that the server couldn't delete the sleep.sh file specifically. The server wouldn't report any particular file back in the error message if there was a problem with the RDEL, so perhaps RFT is attempting to delete sleep.sh separately? I would expect that RFT would only send the RDEL on the parent directory and that would handle everything.
Subject: Re: error deleting a directory

What version of gridftp servers is the test invoking for deletion? RFT will send RDEL only if the server supports RDEL. Else it would send MLST, make a list of files and call delete on each one of them.
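The selection logic described in this comment can be sketched roughly as follows. This is only an illustration under assumptions, not RFT's actual code (RFT is written in Java): `FakeServer`, `delete_directory`, and the way feature support is looked up are all hypothetical names made up for this sketch, and the in-memory "server" merely records which commands would be sent.

```python
from typing import Dict, List, Set


class FakeServer:
    """Hypothetical in-memory stand-in for a GridFTP server (not a real API)."""

    def __init__(self, features: Set[str], tree: Dict[str, List[str]]):
        self.features = features        # e.g. {"RDEL"} if recursive delete is supported
        self.tree = tree                # directory path -> list of file names
        self.commands: List[str] = []   # commands that would have been sent

    def supports(self, feature: str) -> bool:
        return feature in self.features

    def rdel(self, directory: str) -> None:
        # Recursive delete: the server removes the whole directory itself.
        self.commands.append(f"RDEL {directory}")
        self.tree.pop(directory, None)

    def list_files(self, directory: str) -> List[str]:
        # The comment above names MLST as the listing command.
        self.commands.append(f"MLST {directory}")
        return list(self.tree.get(directory, []))

    def dele(self, path: str) -> None:
        # Single-file delete.
        self.commands.append(f"DELE {path}")
        directory, name = path.rsplit("/", 1)
        self.tree[directory].remove(name)


def delete_directory(server: FakeServer, directory: str) -> None:
    """Use RDEL when the server supports it, else list and delete file by file."""
    if server.supports("RDEL"):
        server.rdel(directory)
        return
    for name in server.list_files(directory):
        server.dele(f"{directory}/{name}")
```

With an RDEL-capable server, the only command recorded is the RDEL on the parent directory, so a per-file error naming sleep.sh would not be expected from this path.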
From one of the gram debug container log files,

>>>>>
fileStageIn destinationUrl after: gsiftp://skynet-7.isi.edu:2811/scratch/rynge/jobs/5864c970-9ecf-11da-8c83-ab803b611239/sleep.sh
<<<<<

it looks like the GT container gridftp server is "skynet-7.isi.edu"

[macdaddy:~] smartin% telnet skynet-7.isi.edu 2811
Trying 128.9.233.17...
Connected to skynet-7.isi.edu.
Escape character is '^]'.
220 skynet-7.isi.edu GridFTP Server 2.1 (gcc32, 1122653280-63) ready.
OK, that's a 4.0.1 server, and Ravi says RDEL should have been used... Gimme logs: start the gridftp server with '-d all -l <logfile>' and the env var 'GLOBUS_GRIDFTP_SERVER_FILE_DEBUG=ALL,<debuglogfile>'. An RFT log wouldn't hurt as well.
Subject: Re: error deleting a directory

Now I guess I need to see the gridftp logs, if they are available.
There is a possibility that this is related to bug 3840. Look at transfer 10:

[Thread-68,getDeleteClient:382] [Request 5, Transfer 10] deleting gsiftp://viz-1.isi.edu:2811/scratch/rynge/jobs/e5e50980-4633-11da-8b4c-d7d91db7639c/sleep.sh

Note that sleep.sh was not explicitly mentioned for deletion in the RSL, and that it looks like RFT (and/or GRAM) is confusing a stagein request for a deletion request.
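One way to check other log files for the same symptom is to pull the request number, transfer number, and URL out of each "deleting" line and compare the URL against the deletion targets listed in the RSL. The sketch below is an assumption-laden illustration: the regular expression is inferred from the single log line quoted above, and `parse_delete_line` and `is_explicitly_listed` are hypothetical helper names, not part of RFT or GRAM.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Pattern inferred from the one getDeleteClient line quoted in this bug.
DELETE_LINE = re.compile(r"\[Request (\d+), Transfer (\d+)\] deleting (\S+)")


def parse_delete_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (request, transfer, url) for a 'deleting' log line, else None."""
    match = DELETE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def is_explicitly_listed(url: str, rsl_deletions: List[str]) -> bool:
    """True if the URL's path equals one of the RSL deletion paths.

    Only the path component is compared (trailing slashes ignored), because
    the RSL uses file:// URLs while the log shows gsiftp:// URLs.
    """
    path = urlparse(url).path.rstrip("/")
    return any(urlparse(d).path.rstrip("/") == path for d in rsl_deletions)
```

Applied to the line for transfer 10, the parsed URL ends in sleep.sh, which does not match a directory-only deletion entry, so the deletion would be flagged as not explicitly requested.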
Test with deletion failure and gridftp logs: http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=2dabb058-7735-4de2-a85f-fba8fb7e3c1b
The gridftp log agrees with the 3840 theory... for the particular work dir with the DELE X/sleep.sh failure, the order of operations involving it is:

mkdir X
dele X/sleep.sh (fail)
stor X/.ignoreme
rdel X

One way or another, that DELE makes no sense on a freshly created (within seconds) dir.
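The same order of operations can be replayed against a local filesystem to see why the DELE has to fail. This is a rough local analogy only, assuming plain Python file calls in place of the FTP commands (`os.unlink` for DELE, `shutil.rmtree` for RDEL); `replay_sequence` is a hypothetical name and nothing here talks to a gridftp server.

```python
import errno
import os
import shutil
from typing import List, Tuple


def replay_sequence(base: str) -> List[Tuple[str, str]]:
    """Replay mkdir / dele / stor / rdel locally and record each outcome."""
    results: List[Tuple[str, str]] = []
    work_dir = os.path.join(base, "X")

    os.mkdir(work_dir)                                   # mkdir X
    results.append(("mkdir", "ok"))

    try:
        os.unlink(os.path.join(work_dir, "sleep.sh"))    # dele X/sleep.sh
        results.append(("dele", "ok"))
    except OSError as exc:
        # Record the symbolic errno name, e.g. ENOENT for a missing file.
        results.append(("dele", errno.errorcode.get(exc.errno, str(exc.errno))))

    with open(os.path.join(work_dir, ".ignoreme"), "w"):  # stor X/.ignoreme
        pass
    results.append(("stor", "ok"))

    shutil.rmtree(work_dir)                              # rdel X
    results.append(("rdel", "ok"))
    return results
```

Calling `replay_sequence` on an empty scratch directory records ENOENT for the dele step, which corresponds to the "System error in unlink: No such file or directory" reply in the container log, while the other three steps succeed.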
Subject: Re: error deleting a directory

Mats, would it be possible for you to repeat this test with RFT logging turned on? http://skynet-login.isi.edu/gram-testing/test-details.php?uuid=2dabb058-7735-4de2-a85f-fba8fb7e3c1b
Looks like test 70 has the same problem occurring, and RFT logging is enabled for that one.
My fix to trunk seems to fix this bug. Merged the fix to the release branch and waiting for results.
Looks like this is fixed now in release and head. Closing the bug.