Bug 2827 - Fail to start RFT resource when submitting (large?) request
Status: ASSIGNED
: RFT
RFT
: development
: PC Linux
: P3 enhancement
: 4.2
Reported: 2005-02-28 19:20 by
Modified: 2006-03-15 15:10 (History)


Description From 2005-02-28 19:20:54
(Just want to make this your problem too... ;-)

When creating an RFT resource via the command-line client, bin/rft, it chokes on
a request which contains 50,000 pairs of src/dest files to be transferred. The
only server-/container-side message that appeared to be logged at or near the
time of the rft invocation was a "java.lang.OutOfMemoryError" error. I can't
exactly tell if that was directly related or coincidental.

Here's the command and the console output:

schuler@ned-0:/sandbox/schuler$ /sandbox/globus/install/bin/rft -h ned-0.isi.edu -r 9000 -file xfr.epr -f ./transfer.rfttestrun_3.xfr
Number of transfers in this request: 50000
Exception in thread "main" Error during startup processing. Caused by AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode:
 faultString: java.net.SocketException: Connection reset
 faultActor:
 faultNode:
 faultDetail:
        {http://xml.apache.org/axis/}stackTrace:java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
        at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
        at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readMsg(GSIGssInputStream.java:33)
        at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:75)
        at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:49)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:201)
        at org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket(HTTPSender.java:545)
        at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:140)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
        at org.apache.axis.client.Call.invoke(Call.java:2709)
        at org.apache.axis.client.Call.invoke(Call.java:2385)
        at org.apache.axis.client.Call.invoke(Call.java:2308)
        at org.apache.axis.client.Call.invoke(Call.java:1765)
        at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
        at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
        at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
        at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
 
        {http://xml.apache.org/axis/}hostname:ned-0.isi.edu
 
java.net.SocketException: Connection reset
        at org.apache.axis.AxisFault.makeFault(AxisFault.java:101)
        at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:144)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
        at org.apache.axis.client.Call.invoke(Call.java:2709)
        at org.apache.axis.client.Call.invoke(Call.java:2385)
        at org.apache.axis.client.Call.invoke(Call.java:2308)
        at org.apache.axis.client.Call.invoke(Call.java:1765)
        at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
        at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
        at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
        at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
        at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
        at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readMsg(GSIGssInputStream.java:33)
        at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:75)
        at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:49)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:201)
        at org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket(HTTPSender.java:545)
        at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:140)
        ... 18 more
------- Comment #1 From 2005-02-28 19:26:58 -------
Can't attach this so here's the URL to the RFT transfer request file used in
this test.

This file contains a few options for the RFT transfer followed by 50,000 pairs
of source / destination file urls for the transfer.

http://www.isi.edu/~schuler/rft/transfer.rfttestrun_3.xfr
------- Comment #2 From 2005-03-01 13:01:11 -------
Well, the documented limit on the number of transfers per request in 3.9.5 RFT
is ~25K files. When you submit more than that, the container/JVM runs out of
memory serializing/deserializing the request. I don't think this is a bug in RFT
but a limitation of Axis. You can try to start your JVM with the -Xmx option to
increase the heap size from the default (64M) to something larger.
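
A quick way to confirm that an -Xmx setting actually reached a JVM is to ask the runtime for its maximum heap. The following is a minimal sketch, not part of Globus or RFT: the class name HeapCheck is hypothetical, and how the option gets passed to the container or to bin/rft depends on the installation.

```java
public class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Example invocation (standard JVM flag): java -Xmx512m HeapCheck
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually somewhat below the -Xmx figure, because the JVM reserves part of the heap for its own bookkeeping.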
------- Comment #3 From 2005-03-01 13:45:47 -------
I'm always glad to accept "RTFM" as a valid answer. ;-) In this case, however, I
can't find that stated limit to the transfer request size in the logical place:

http://www-unix.globus.org/toolkit/docs/development/3.9.5/data/rft/user/#rft

That section ends with the following text.

"
Limitations

This command line client is very dumb and simple and does not do any intelligent
parsing of various command line options and the options in the sample transfer
file. It works fine if used in the way documented here.
"

I did not come across that limitation under any of the troubleshooting or known
bugs sections of RFT docs. I did finally see something close under the
Performance section of the Quality page when downloading the report done on it.
(There it states that you got to ~21000.)

Having that information is good enough for me -- maybe it could be easier to
find though?
------- Comment #4 From 2005-03-01 13:55:55 -------
I ran another test with a request size of 25,000 files. (I started this last
night before I left the office, so unfortunately I was not able to see the
container.log output.) What I can tell you is that the rft invocation appeared
to get going but then basically froze up, unlike the 50k request, which crashed
instantly. In this case the machine's CPU (I'm running client and server on the
same box) jumped to ~99% and it didn't seem to do much after that for the
remainder of my time in the office.

Returning this morning, all I found was this on my client-side console:

schuler@ned-0:/sandbox/schuler$ /sandbox/globus/install/bin/rft -h ned-0.isi.edu -r 9000 -file xfr.epr -f ./transfer.rfttestrun_4.xfr
Number of transfers in this request: 25000
Exception in thread "main" Error during startup processing. Caused by AxisFault
 faultCode: {http://xml.apache.org/axis/}HTTP
 faultSubcode:
 faultString: (0)null
 faultActor:
 faultNode:
 faultDetail:
        {}:return code:  0
 
        {http://xml.apache.org/axis/}HttpErrorCode:0
 
(0)null
        at org.apache.axis.transport.http.HTTPSender.readFromSocket(HTTPSender.java:692)
        at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:141)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
        at org.apache.axis.client.Call.invoke(Call.java:2709)
        at org.apache.axis.client.Call.invoke(Call.java:2385)
        at org.apache.axis.client.Call.invoke(Call.java:2308)
        at org.apache.axis.client.Call.invoke(Call.java:1765)
        at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
        at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
        at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
        at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
------- Comment #5 From 2005-03-01 15:57:01 -------
In another test, I've been able to submit a transfer request of 20,000 items,
and it is able to create and start the resource. The CPU is not overtaxed, but
it has taken at least an hour to transfer a mere 316 files. Is something wrong
with my transfer request file? By that I mean the options I've set in the
transfer request (see below). Notice that concurrency and parallel streams are
set to "1". I took these from the sample transfer.xfr that appears in
"share/globus_wsrf_rft_client". The files are a measly 10 bytes in size. Also,
when I created and started an RFT resource programmatically (from DRS), it
performed very well. In that case, I didn't set any RFTOptions fields, so I
guess it took some defaults that work better than the following.

#true=binary false=ascii
true
#Block size in bytes
16000
#TCP Buffer size in bytes
16000
#Notpt (No thirdPartyTransfer)
false
#Number of parallel streams
1
#Data Channel Authentication (DCAU)
true
# Concurrency of the request
1
#Grid Subject name of the source gridftp server

#Grid Subject name of the destination gridftp server

#Transfer all or none of the transfers
false
#Maximum number of retries
10
#Source/Dest URL Pairs
... pairs begin ...
------- Comment #6 From 2005-03-01 23:27:38 -------
I agree that the limitation should have been more visible than it is right now.
That will be fixed. What is this other thing that you are talking about? Are you
concerned about the performance of RFT? I just committed some performance
improvements to the RFT code in trunk. Please use that for your testing.
------- Comment #7 From 2005-03-02 09:03:22 -------
Via the email gateway:
Docs have been updated to reflect the limits on command-line clients
------- Comment #8 From 2005-03-02 09:53:55 -------
Generally speaking, I'm not really concerned about the performance of RFT -- 
it was performing very well in my earlier integrated tests and I don't think 
there should be any problem there.

The problem that I described with respect to the 20k file request was that:
(1) It performed very very slowly -- which I'm _strongly_ inclined to believe 
has more to do with _my_ RFT options than some inherent RFT problem. I should 
test with some other options and see what happens -- this may not belong under 
this bug, but in some sense it is a continuation of the bigger objective of 
using RFT for large requests. As I said, I think this has to do with my usage, 
though I'd like to understand why and what I'm doing wrong. (I need to 
recreate this first before declaring this a repeatable problem!)
(2) Slightly more concerning is that the request stopped at file transfer 316 
and never made it past that (by "never" I mean that time at which I lost 
patience and killed it :). Again, I need to repeat this.
------- Comment #9 From 2005-03-02 10:30:02 -------
Ravi,

In your documentation you must say 21K limit with 64M heap! And point to or 
explain how to increase the heap size. 
------- Comment #10 From 2005-03-02 18:43:09 -------
Added that now.
------- Comment #11 From 2005-03-02 22:54:12 -------
I just finished a 21,000 file transfer without any problem (latest code from
trunk). Do you have any new updates from your test? Also, I have been doing a
580,000 file transfer (though it's a single directory that was expanded) for a
week now (I'm using it for testing my changes, so the transfer wasn't
continuous) and I haven't seen a serious problem yet.
------- Comment #12 From 2005-03-03 13:13:29 -------
Bumping up the max mem param for Java and re-running the 20k xfer test, it is
running much more smoothly. (However, it appears to be quite slow.)

For now, I'll consider 20k my ceiling. I'd probably like to see that reach 100k
but I understand it isn't fully in your control.
------- Comment #13 From 2005-03-03 13:19:00 -------
So we can close this bug ? ;-)
------- Comment #14 From 2005-03-07 14:35:38 -------
I was able to submit 30001 transfers without running out of memory by not
sending a bunch of options (they are defaulted to sensible values at the
service).
------- Comment #15 From 2005-03-07 16:38:01 -------
Are you testing via the bin/rft tool? or with some other test class?
------- Comment #16 From 2005-03-09 09:52:34 -------
Added Jarek to the bug as I think this is more of a core issue than RFT
------- Comment #17 From 2005-03-09 15:29:49 -------
What is the issue still? I thought we answered all the questions. 
------- Comment #18 From 2005-03-09 15:54:57 -------
Right. The reason to keep this bug open is that we should probably make GT4 RFT
scale to more than 21K transfers at one time.
------- Comment #19 From 2005-03-09 22:35:58 -------
Ravi, what is the goal here? How much more? 50K? 100K? The only thing we could 
do is minimize the overhead of Axis deserialization, etc. but there is always 
going to be some limit. So the answer at the end will always be the same - to 
increase the heap. 
------- Comment #20 From 2005-03-09 23:08:11 -------
Ultimately, the goal is unlimited. The problem is that the entire request is
deserialized into a single object, and there is simply no need to do that. If
we used a straight SAX parser, we could pull the request in a transfer at a
time, and our only limit would be the database. We have been avoiding this
problem by doing directory transfers, but in the case of RLS, they are
generating the list of transfers from the replica catalog and they don't all
reside in one directory, so that doesn't help them.

The following was a comment from Sam Meder in another email thread about this:

Using this sort of streaming mode will make WS-Security impossible, but
if you can live with that, there is an option in Axis to turn off SAX
event recording which may help a little: Try adding the attribute
streaming="on" to the RFT Factory/Service <service/> element in the
server-config.wsdd. We've never tried this though, so it may be buggy.
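
To illustrate the "straight SAX parser" idea, here is a minimal sketch using the JDK's own SAX API (written for JDK 8 or later). It hands each transfer to a callback as soon as its element is seen, so memory use does not grow with the number of transfers. The element name `transfer` and the attributes `src` and `dest` are assumptions made up for this example; they are not the actual RFT SOAP schema, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingTransferReader {
    /**
     * Parses the stream and calls sink once per (hypothetical) transfer
     * element, without building a tree of the whole request.
     * Returns the number of transfers seen.
     */
    static int readTransfers(InputStream in, BiConsumer<String, String> sink)
            throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("transfer".equals(qName)) {
                    // A real service would insert the pair into its database here.
                    sink.accept(attrs.getValue("src"), attrs.getValue("dest"));
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<request>"
            + "<transfer src=\"gsiftp://a.example.org/f1\" dest=\"gsiftp://b.example.org/f1\"/>"
            + "<transfer src=\"gsiftp://a.example.org/f2\" dest=\"gsiftp://b.example.org/f2\"/>"
            + "</request>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int n = readTransfers(in, (src, dest) -> System.out.println(src + " -> " + dest));
        System.out.println(n + " transfers");
    }
}
```

This only shows the parsing pattern. It says nothing about how such a parser would be wired into Axis, which is the open question in this thread.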
------- Comment #21 From 2005-03-09 23:51:09 -------
If the goal is/was unlimited then it cannot be achieved with the existing Axis
code base. Axis will create a DOM-like structure of the message no matter what.
I think technically any JAX-RPC engine must be able to return such a
representation if asked to do so. So the only thing we can do for now is to
lower the overhead.
------- Comment #22 From 2005-03-10 17:10:53 -------
The target users of my component (replicator which calls rft) would like to 
make a request containing a list of requested replications (a long list of file 
name pairs somewhat analogous to an rft request) and be able to submit such a 
request with ~1M files (or more).

Currently, we can pass along a request to RFT of ~20,000 files. So we're far 
from that user requirement. My thought has been to chunk the request and 
submit X requests to reach ~1M. But at 20k it means X will be big and hence 
all the overhead of connections and security and so on.

I agree with Bill's "unlimited" expectation. However, setting my expectations 
a bit lower, I'd be happier (than I am now) if we could at least hit ~100,000 -- 
with my happiness being directly proportional to our results against that 
expectation.
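
The chunking approach described in this comment can be sketched as follows. This is only an illustration of splitting a long list into fixed-size requests: the class and method names are hypothetical, and the actual submission of each chunk to RFT is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestChunker {
    /** Splits items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Integers stand in for source/destination pairs.
        List<Integer> transfers = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            transfers.add(i);
        }
        List<List<Integer>> chunks = chunk(transfers, 20_000);
        // 1,000,000 / 20,000 = 50 separate requests
        System.out.println(chunks.size() + " requests");
    }
}
```

At the ~20,000 ceiling reported above, ~1M files means 50 separate requests, each paying its own connection and security setup cost.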
------- Comment #23 From 2005-03-13 23:51:46 -------
1) If you want to hit 100K start the client and server with ~300M heap!

2) There is a difference between RFT handling an unlimited number of transfers 
vs. handling an unlimited transfer request. I think handling an unlimited number 
of transfers is far more important than dealing with an unlimited transfer request. 
As long as you can submit the 1 million requests in chunks I would argue that 
RFT is meeting its 'expectations'. And with connection caching the difference 
between 10 calls with 20K requests vs 1 call with 200K requests should be 
minimal (even with transport security).

3) Is there a requirements document for RFT? Was the unlimited goal ever 
defined or even clarified?
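
As a back-of-the-envelope check, the two data points in this thread (about 21K transfers with a 64M heap, and about 100K with a ~300M heap) imply a similar heap cost per transfer. The sketch below just does that division. It assumes the whole heap is spent on the request, which overstates the per-transfer cost, and the class name is hypothetical.

```java
public class HeapPerTransfer {
    /** Heap per transfer in KiB, assuming the whole heap goes to the request. */
    static double kibPerTransfer(int heapMiB, int transfers) {
        return heapMiB * 1024.0 / transfers;
    }

    public static void main(String[] args) {
        System.out.printf("64M heap, 21,000 transfers: %.2f KiB each%n",
                kibPerTransfer(64, 21_000));
        System.out.printf("300M heap, 100,000 transfers: %.2f KiB each%n",
                kibPerTransfer(300, 100_000));
    }
}
```

Both cases come out at roughly 3 KiB per transfer, so the two figures are consistent with heap use growing about linearly with request size.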
------- Comment #24 From 2006-03-01 10:25:06 -------
I am going to mark this bug as an enhancement, as the only way I see we can fix 
it is to add new operations to the RFT interface to add transfers to a resource 
after it is created.
------- Comment #25 From 2006-03-15 15:10:45 -------
No problem with the severity change. But as for the remedy, what I'd really 
like would be either (or both):

a) Ability to look up the RFT resource and make local calls on it directly, or

b) Like the DRS, have the RFT resource read the request file directly (instead 
of passing all the file pairs in the SOAP API) by getting it via file://, 
ftp://, http://, or secure equivalents.
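
Option (b) could look roughly like the sketch below, which reads a request file from a URL one line at a time instead of receiving all pairs in one message. It is an illustration only: the class name is hypothetical, it skips blank lines and lines starting with "#" and passes every other line to a callback without interpreting it, and which URL schemes work depends on the protocol handlers available in the JVM (file:// and http:// are standard; secure grid protocols would need their own handlers).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class RequestFileReader {
    /**
     * Streams the file at the given URL line by line, passing each
     * non-blank, non-comment line to sink. Returns the number of
     * lines passed on. Only one line is held in memory at a time.
     */
    static int forEachLine(URL requestFile, Consumer<String> sink) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(requestFile.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comment lines
                }
                sink.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RequestFileReader /path/to/request-file
        URL url = new File(args[0]).toURI().toURL();
        int n = forEachLine(url, line -> { /* store or schedule the entry here */ });
        System.out.println(n + " lines read");
    }
}
```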