Bug 3568 - list function terminates after 200 files
Status: RESOLVED FIXED
: GridFTP
GridFTP
: 4.0.0
: Macintosh All
: P3 normal
: 3543
Reported: 2005-07-15 10:05 by
Modified: 2005-07-28 11:25 (History)


Description From 2005-07-15 10:05:58
This bug report still needs input from several people.

a) Kaizar must post a link to his program to test this
b) Mike H. must complete the test for just jglobus
c) we need exact names and versions of the gridftp servers with port numbers
d) the test needs to be rerun on gridftp servers that Bill specifies to us

I am filing this bug in its incomplete state to ensure that others are aware
that we are evaluating a performance issue and that this work is completed
prior to the 4.0.1 release.

I will classify this bug as critical and not as a blocker.

===============

First analysis based on the software included in Java CoG Kit 4.

The information I got is somewhat incomplete: what are the server version,
port number, and machine? We need these so we can replicate the problem in
other frameworks and rule out a Java CoG Kit issue.


The gridftp server does not return files more than 300 with mlsd.
With 200 files it returns fine. But with 300 files it gives a "wait timeout".
Hence it fails for some number between 200 and 300 files.

I see no way of going to 5000 files ;)

The test timings for 1, 101, and 201 files, respectively, are as follows:

# of Files	Time in Secs
==========	============
1       	3.254
101     	8.059
201     	15.587
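
As a rough reading of these numbers, the sketch below computes the average added
cost per file between consecutive measurements. The class and method names are
hypothetical and only the three timings above are used. It gives roughly 0.048 s
per file between 1 and 101 files and roughly 0.075 s per file between 101 and
201 files, which suggests the listing time grows faster than linearly. This is
arithmetic on three data points, not a measurement of the server.

```java
public class ListTimings {
    /** Average added cost per file between two measurements, in seconds. */
    public static double perFileSeconds(int files1, double secs1,
                                        int files2, double secs2) {
        return (secs2 - secs1) / (files2 - files1);
    }

    public static void main(String[] args) {
        // Timings copied from the table in this report.
        double first = perFileSeconds(1, 3.254, 101, 8.059);
        double second = perFileSeconds(101, 8.059, 201, 15.587);
        System.out.println("1 -> 101 files:   " + first + " s/file");
        System.out.println("101 -> 201 files: " + second + " s/file");
    }
}
```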

The error after 200 files is as follows:

DEBUG [org.globus.cog.abstraction.examples.execution.Test]  - Status of ListTask :Failed
Job failed:
org.globus.cog.abstraction.impl.file.GeneralException: Could not get list of files in /home/amin/gridftp-test/from server
        at org.globus.cog.abstraction.impl.file.gridftp.FileResourceImpl.list(FileResourceImpl.java:96)
        at org.globus.cog.abstraction.impl.file.TaskHandlerImpl.execute(TaskHandlerImpl.java:247)
        at org.globus.cog.abstraction.impl.file.TaskHandlerImpl.submit(TaskHandlerImpl.java:221)
        at org.globus.cog.abstraction.impl.file.TaskHandlerImpl.submit(TaskHandlerImpl.java:134)
        at org.globus.cog.abstraction.impl.common.task.FileOperationTaskHandler.submit(FileOperationTaskHandler.java:48)
        at org.globus.cog.abstraction.impl.common.task.GenericTaskHandler.submit(GenericTaskHandler.java:51)
        at org.globus.cog.abstraction.impl.common.taskgraph.TaskGraphHandlerImpl.submitExecutableObject(TaskGraphHandlerImpl.java:177)
        at org.globus.cog.abstraction.impl.common.taskgraph.TaskGraphHandlerImpl.handleDependents(TaskGraphHandlerImpl.java:517)
        at org.globus.cog.abstraction.impl.common.taskgraph.TaskGraphHandlerImpl.statusChanged(TaskGraphHandlerImpl.java:131)
        at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:193)
        at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:201)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:309)
        at org.globus.gram.GramJob.setStatus(GramJob.java:179)
        at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:171)
        at java.lang.Thread.run(Thread.java:534)
Caused by: org.globus.ftp.exception.ServerException: Reply wait timeout. (error code 4)
        at org.globus.ftp.vanilla.FTPControlChannel.waitFor(FTPControlChannel.java:213)
        at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:109)
bash-2.05b$
------- Comment #1 From 2005-07-15 13:29:01 -------
Another report in addition to 3543?  I would suggest that any tests use servers 
from the current trunk or globus_4_0_branch (2.20 and 2.1, respectively), as 
they contain performance increases as noted in 3543.

Also, please be careful with statements such as:
"The gridftp server does not return files more than 300 with mlsd.
With 200 files it returns fine. But with 300 files it gives a "wait timeout".
Hence it fails for some number between 200 and 300 files."

Any timeout or failure to wait for results is squarely a client issue. The 
only thing the server is capable of failing to do in this case is return 
results within the client's timeout period.
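
To illustrate the point, here is a minimal, self-contained sketch of a
client-side reply wait. It is not the jglobus implementation: the class name,
method name, and parameters are hypothetical, and the "server" is simulated by a
supplier. It shows that the same slow reply either times out or succeeds
depending only on how long the client is willing to wait.

```java
import java.util.function.BooleanSupplier;

public class ReplyWait {
    /**
     * Polls replyReady until it reports true or maxWaitMs has elapsed.
     * Returns true if the reply arrived in time, false if the client gave up.
     */
    public static boolean waitForReply(BooleanSupplier replyReady,
                                       long maxWaitMs, long pollDelayMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (!replyReady.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // client-side timeout; the server may still be working
            }
            try {
                Thread.sleep(pollDelayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated server that needs 300 ms before its reply is available.
        BooleanSupplier slowServer = () -> System.currentTimeMillis() - start >= 300;

        // A 100 ms client limit is shorter than the server needs.
        System.out.println("short wait: " + waitForReply(slowServer, 100, 10));
        // A 1000 ms client limit gives the same server enough time.
        System.out.println("long wait:  " + waitForReply(slowServer, 1000, 10));
    }
}
```

In a real test the equivalent of maxWaitMs would be whatever wait limit the
client library is configured with, so a listing that fails at the default limit
may complete once that limit is raised.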
------- Comment #2 From 2005-07-15 14:14:52 -------
> I would suggest that any tests use servers 
> from the current trunk or globus_4_0_branch (2.20 and 2.1, respectively), as 
> they contain performance increases as noted in 3543.

Are any such servers running anywhere, or would we have to compile our own?
------- Comment #3 From 2005-07-15 14:23:00 -------
The test should use the same machine(s) you initially noticed the problem on, 
so I say compile your own.  I have a server set up on pitcairn.mcs.anl.gov:9000 
if you want to use that (note that the machine still has a host cert for 
wiggum.mcs.anl.gov so you'll have to change the expected subject to use that 
one).  I'll try to also get one set up on wiggum.mcs.anl.gov:9000.
------- Comment #4 From 2005-07-15 15:01:13 -------
In reply to comment #1 from Mike: we should make sure that the documentation
explicitly mentions this and shows an example of how to modify the values.
Users will likely take the default configuration and measure with that. Is this
already in the documentation?
------- Comment #5 From 2005-07-15 16:30:45 -------
The server is up at wiggum.mcs.anl.gov:9000.  This is what will eventually be 
released in 4.0.1 assuming no more changes.
------- Comment #6 From 2005-07-28 11:25:52 -------
Verified ok per email thread.