Bugzilla – Bug 2827
Fail to start RFT resource when submitting (large?) request
Last modified: 2006-03-15 15:10:45
(Just want to make this your problem too... ;-) When creating an RFT resource via the command-line client, bin/rft, it chokes on a request which contains 50,000 pairs of src/dest files to be transferred. The only server-/container-side message that appeared to be logged at or near the time of the rft invocation was a "java.lang.OutOfMemoryError" error. I can't exactly tell if that was directly related or coincidental. Here's the command and the console output:
schuler@ned-0:/sandbox/schuler$ /sandbox/globus/install/bin/rft -h ned-0.isi.edu -r 9000 -file xfr.epr -f ./transfer.rfttestrun_3.xfr
Number of transfers in this request: 50000
Exception in thread "main" Error during startup processing. Caused by AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode:
 faultString: java.net.SocketException: Connection reset
 faultActor:
 faultNode:
 faultDetail:
    {http://xml.apache.org/axis/}stackTrace:java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readMsg(GSIGssInputStream.java:33)
    at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:75)
    at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:49)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:201)
    at org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket(HTTPSender.java:545)
    at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:140)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
    at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
    at org.apache.axis.client.Call.invoke(Call.java:2709)
    at org.apache.axis.client.Call.invoke(Call.java:2385)
    at org.apache.axis.client.Call.invoke(Call.java:2308)
    at org.apache.axis.client.Call.invoke(Call.java:1765)
    at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
    at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
    at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
    at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
    {http://xml.apache.org/axis/}hostname:ned-0.isi.edu
java.net.SocketException: Connection reset
    at org.apache.axis.AxisFault.makeFault(AxisFault.java:101)
    at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:144)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
    at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
    at org.apache.axis.client.Call.invoke(Call.java:2709)
    at org.apache.axis.client.Call.invoke(Call.java:2385)
    at org.apache.axis.client.Call.invoke(Call.java:2308)
    at org.apache.axis.client.Call.invoke(Call.java:1765)
    at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
    at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
    at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
    at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readMsg(GSIGssInputStream.java:33)
    at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:75)
    at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:49)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:201)
    at org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket(HTTPSender.java:545)
    at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:140)
    ... 18 more
I can't attach this, so here's the URL of the RFT transfer request file used in this test. The file contains a few options for the RFT transfer followed by 50,000 pairs of source/destination file URLs for the transfer. http://www.isi.edu/~schuler/rft/transfer.rfttestrun_3.xfr
Well, the documented limit on the number of transfers per request in 3.9.5 RFT is ~25K files. When you submit more than that, the container/JVM runs out of memory serializing/deserializing the request. I don't think this is a bug in RFT but a limitation of Axis. You can try to start your JVM with -Xmx options to increase the heap size from the default (64M) to something more
I'm always glad to accept "RTFM" as a valid answer. ;-) In this case, however, I can't find that stated limit to the transfer request size in the logical place:
http://www-unix.globus.org/toolkit/docs/development/3.9.5/data/rft/user/#rft
That section ends with the following text:
    "Limitations
    This command line client is very dumb and simple and does not do any intelligent parsing of various command line options and the options in the sample transfer file. It works fine if used in the way documented here."
I did not come across that limitation under any of the troubleshooting or known bugs sections of the RFT docs. I did finally see something close under the Performance section of the Quality page when downloading the report done on it. (There it states that you got to ~21000.) Having that information is good enough for me -- maybe it could be easier to find though?
I ran another test with a request size of 25,000 files. (I started this last night before I left the office; unfortunately I was not able to see the container.log output.) What I can tell you is that the rft invocation appeared to get going, but it basically froze up, unlike the 50k request size where it instantly crashed. In this case the machine's CPU (I'm running client and server on the same box) jumped to ~99% and it didn't seem to do much after that for the remainder of my time in the office. Returning this morning, all I found was this on my client-side console:
schuler@ned-0:/sandbox/schuler$ /sandbox/globus/install/bin/rft -h ned-0.isi.edu -r 9000 -file xfr.epr -f ./transfer.rfttestrun_4.xfr
Number of transfers in this request: 25000
Exception in thread "main" Error during startup processing. Caused by AxisFault
 faultCode: {http://xml.apache.org/axis/}HTTP
 faultSubcode:
 faultString: (0)null
 faultActor:
 faultNode:
 faultDetail:
    {}:return code: 0
    {http://xml.apache.org/axis/}HttpErrorCode:0
(0)null
    at org.apache.axis.transport.http.HTTPSender.readFromSocket(HTTPSender.java:692)
    at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:141)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
    at org.apache.axis.client.Call.invokeEngine(Call.java:2726)
    at org.apache.axis.client.Call.invoke(Call.java:2709)
    at org.apache.axis.client.Call.invoke(Call.java:2385)
    at org.apache.axis.client.Call.invoke(Call.java:2308)
    at org.apache.axis.client.Call.invoke(Call.java:1765)
    at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
    at org.globus.transfer.reliable.client.BaseRFTClient.createRFT(BaseRFTClient.java:204)
    at org.globus.transfer.reliable.client.ReliableFileTransferClient.main(ReliableFileTransferClient.java:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:92)
    at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:34)
In another test, I've been able to submit a transfer request of 20,000 items, and it is able to create and start the resource. The CPU is not overtaxed, but it has taken at least an hour to transfer a mere 316 files. Is something wrong with my transfer request file? By that I mean the options I've set in the transfer request (see below). Notice that concurrency and parallel streams are set to "1". I took these from the sample transfer.xfr that appears in "share/globus_wsrf_rft_client". The files are a measly 10 bytes in size. Also, when I created and started an RFT resource programmatically (from DRS), it performed very well. In that case, I didn't set any RFTOptions fields, so I guess it took some defaults that work better than the following.
    #true=binary false=ascii
    true
    #Block size in bytes
    16000
    #TCP Buffer size in bytes
    16000
    #Notpt (No thirdPartyTransfer)
    false
    #Number of parallel streams
    1
    #Data Channel Authentication (DCAU)
    true
    # Concurrency of the request
    1
    #Grid Subject name of the source gridftp server
    #Grid Subject name of the destination gridftp server
    #Transfer all or none of the transfers
    false
    #Maximum number of retries
    10
    #Source/Dest URL Pairs
    ... pairs begin ...
I agree that the limitation should have been more visible than it is right now. That will be fixed. What is this other thing that you are talking about? Are you concerned about the performance of RFT? I just committed some performance improvements to the RFT code in trunk. Please use that for your testing.
Via the email gateway: Docs have been updated to reflect the limits on command-line clients
Generally speaking, I'm not really concerned about the performance of RFT -- it was performing very well in my earlier integrated tests and I don't think there should be any problem there. The problems that I described with respect to the 20k file request were:
(1) It performed very very slowly -- which I'm _strongly_ inclined to believe has more to do with _my_ RFT options than some inherent RFT problem. I should test with some other options and see what happens -- this may not belong under this bug, but in some sense it is a continuation of the bigger objective of using RFT for large requests. As I said, I think this has to do with my usage, though I'd like to understand why and what I'm doing wrong. (I need to recreate this first before declaring this a repeatable problem!)
(2) Slightly more concerning is that the request stopped at file transfer 316 and never made it past that (by "never" I mean the time at which I lost patience and killed it :). Again, I need to repeat this.
Ravi, In your documentation you must say 21K limit with 64M heap! And point to or explain how to increase the heap size.
Added that now.
I just finished a 21,000 file transfer without any problem (latest code from trunk). Do you have any new updates from your test? Also, I have been running a 580,000 file transfer (though it's a single directory that was expanded) for a week now (I'm using it to test my changes, so the transfer wasn't continuous) and I haven't seen a serious problem yet.
Bumping up the max mem param for Java and re-running the 20k xfer test, it is running much more smoothly. (However, it appears to be quite slow.) For now, I'll consider 20k my ceiling. I'd probably like to see that reach 100k but I understand it isn't fully in your control.
So can we close this bug? ;-)
I was able to submit 30001 transfers without running out of memory by not sending a bunch of options (they are defaulted to sensible values at the service).
Are you testing via the bin/rft tool, or with some other test class?
Added Jarek to the bug as I think this is more of a core issue than RFT
What is the issue still? I thought we answered all the questions.
Right. The reason to keep this bug open is that we should probably make GT4 RFT scale to more than 21K transfers at one time.
Ravi, what is the goal here? How much more? 50K? 100K? The only thing we could do is minimize the overhead of Axis deserialization, etc. but there is always going to be some limit. So the answer at the end will always be the same - to increase the heap.
Ultimately, the goal is unlimited. The problem is that the entire request is deserialized into a single object, and there is simply no need to do that. If we used a straight SAX parser, we could pull the request in a transfer at a time, and our only limit would be the database. We have been avoiding this problem by doing directory transfers, but in the case of RLS, they are generating the list of transfers from the replica catalog and the files don't all reside in one directory, so that doesn't help them. The following was a comment from Sam Meder in another email thread about this:
    Using this sort of streaming mode will make WS-Security impossible, but if you can live with that there is an option in Axis to turn off SAX event recording which may help a little: Try adding the attribute streaming="on" to the RFT Factory/Service <service/> element in the server-config.wsdd. We've never tried this though, so it may be buggy.
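To illustrate the "one transfer at a time" idea, here is a minimal sketch of SAX-style streaming. It is written in Python for brevity rather than in the Java/Axis code RFT uses, and the element names (`transfer`, `sourceUrl`, `destinationUrl`) and the `on_transfer` callback are hypothetical, not the RFT request schema. It only shows that memory use can be bounded by a single transfer rather than by the size of the request.

```python
import io
import xml.sax


class TransferHandler(xml.sax.ContentHandler):
    """Hand each (source, destination) pair to a callback as soon as it is parsed."""

    def __init__(self, on_transfer):
        super().__init__()
        self.on_transfer = on_transfer  # hypothetical sink, e.g. a database insert
        self._field = None              # name of the element whose text is being read
        self._text = []
        self._current = {}

    def startElement(self, name, attrs):
        if name == "transfer":
            self._current = {}
        elif name in ("sourceUrl", "destinationUrl"):
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several pieces; collect it only inside a URL element.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None
        elif name == "transfer":
            # Only this one transfer is held in memory at this point.
            self.on_transfer(self._current["sourceUrl"],
                             self._current["destinationUrl"])


def stream_transfers(xml_stream, on_transfer):
    """Parse a request from a byte stream, invoking on_transfer once per pair."""
    xml.sax.parse(xml_stream, TransferHandler(on_transfer))


if __name__ == "__main__":
    sample = (b"<request><transfer><sourceUrl>gsiftp://a/f1</sourceUrl>"
              b"<destinationUrl>gsiftp://b/f1</destinationUrl></transfer></request>")
    stream_transfers(io.BytesIO(sample), lambda src, dst: print(src, "->", dst))
```

As noted in the quoted comment, a streaming approach like this is at odds with WS-Security, which needs the whole message.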
If the goal is/was unlimited then it cannot be achieved with the existing Axis code base. Axis will create a DOM-like structure of the message no matter what. I think technically any JAX-RPC engine must be able to return such a representation if asked to do so. So the only thing we can do for now is to lower the overhead.
The target users of my component (a replicator which calls rft) would like to make a request containing a list of requested replications (a long list of file name pairs somewhat analogous to an rft request) and be able to submit such a request with ~1M files (or more). Currently, we can pass along a request to RFT of ~20,000 files. So we're far from that user requirement. My thought has been to chunk the request and submit X requests to reach ~1M. But at 20k it means X will be big and hence all the overhead of connections and security and so on. I agree with Bill's "unlimited" expectation. However, setting my expectations a bit lower, I'd be happier (than I am now) if we could at least hit ~100,000 -- with my happiness being directly proportional to our results against that expectation.
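The chunking workaround described above can be sketched as follows. This is an illustration in Python, not replicator or RFT code: `chunked` and `requests_needed` are hypothetical helper names, and the actual submission of each chunk to RFT is deliberately left out because no client API is assumed here.

```python
from itertools import islice


def chunked(pairs, chunk_size):
    """Yield successive lists of at most chunk_size (source, destination) pairs."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def requests_needed(total_transfers, chunk_size):
    """Number of separate requests needed to cover total_transfers (ceiling division)."""
    return (total_transfers + chunk_size - 1) // chunk_size


if __name__ == "__main__":
    # With the ~20,000 ceiling mentioned above, ~1M files means X = 50 requests;
    # at a 100,000 ceiling it would be 10.
    print(requests_needed(1_000_000, 20_000))
    print(requests_needed(1_000_000, 100_000))
```

Because `chunked` consumes an iterator lazily, the full list of pairs never has to be held in memory on the client side either.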
1) If you want to hit 100K, start the client and server with ~300M heap!
2) There is a difference between RFT handling an unlimited number of transfers vs. handling an unlimited transfer request. I think handling an unlimited number of transfers is far more important than dealing with an unlimited transfer request. As long as you can submit the 1 million requests in chunks I would argue that RFT is meeting its 'expectations'. And with connection caching the difference between 10 calls with 20K requests vs 1 call with 200K requests should be minimal (even with transport security).
3) Is there a requirements document for RFT? Was the unlimited goal ever defined or even clarified?
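The ~300M figure is consistent with scaling the earlier data point (about 21K transfers with the default 64M heap) linearly. A rough sketch of that arithmetic, assuming heap use grows in proportion to the number of transfers in a request, which nobody in this thread has measured; `estimated_heap_mb` and `xmx_flag` are hypothetical helpers:

```python
import math


def estimated_heap_mb(transfers, ref_transfers=21_000, ref_heap_mb=64):
    """Scale the observed point (21K transfers at 64M heap) linearly. Assumption only."""
    return transfers * ref_heap_mb / ref_transfers


def xmx_flag(transfers):
    """Format the estimate as a JVM -Xmx option, rounded up to whole megabytes."""
    return "-Xmx{}m".format(math.ceil(estimated_heap_mb(transfers)))


if __name__ == "__main__":
    for n in (21_000, 50_000, 100_000):
        print(n, round(estimated_heap_mb(n)), xmx_flag(n))
```

For 100,000 transfers this gives roughly 305M, in line with the ~300M suggested above. How the flag is passed to the client and container JVMs depends on the installation.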
I am going to mark this bug as an enhancement, as the only way I see to fix it is to add new operations to the RFT interface for adding transfers to a resource after it is created.
No problem with the severity change. But as for the remedy, what I'd really like would be either (or both): a) the ability to look up the RFT resource and make local calls on it directly, or b) like the DRS, have the RFT resource read the request file directly (instead of passing all the file pairs in the SOAP API) by getting it via file://, ftp://, http://, or secure equivalents.