| Summary: | memory leak in ftp_client when using connection caching and credential delegation | ||
|---|---|---|---|
| Product: | GridFTP | Reporter: | Andrei Hutanu <hutanu@zib.de> |
| Component: | GridFTP | Assignee: | Joseph M Link <link@mcs.anl.gov> |
| Status: | RESOLVED FIXED | ||
| Severity: | major | CC: | allcock@mcs.anl.gov, dangulo@cs.uchicago.edu, keahey@mcs.anl.gov |
| Priority: | P2 | ||
| Version: | 3.2.0 | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Attachments: |
Program reproducing the bug
added flush calls.. no effect valgrind output fixes leak in flush_url_state fix leak when using same host for source and dest |
||
Can you try using:
globus_result_t
globus_ftp_client_handleattr_set_cache_all(
globus_ftp_client_handleattr_t * attr,
globus_bool_t cache_all)
instead of
globus_ftp_client_handle_cache_url_state()
to see if that narrows this down? If it still occurrs, it would be useful if
you could provide a simplified test program that reproduces the problem.
Thank you,
Joe
Created an attachment (id=409) [details]
Program reproducing the bug
This should reproduce the problem. It's not the cleanest code but
the leak occurs in the loop starting at line 208.
Andrei
I am not seeing any runtime leaks (other than the couple caused by your strdups). All other memory is still being referenced. The majority of this memory will be freed when you call globus_module_deactivate. If this is to be a long running application, then you will need to manage the number of cached urls, as they do not 'timeout' themselves. You can still use globus_ftp_client_handleattr_set_cache_all() to cause everything to automatically be cached, but you'll need a max-cached and a timeout mechanism (look at globus_callback.h for periodic and delayed callbacks). Simply call globus_ftp_client_handle_flush_url_state() to stop caching a specific url. The cache mechanism only matches on scheme, host, and port. So, you'll need to strip all paths from the urls you transfer before adding them to your cache 'manager'. I just noticed a bug in the globus_ftp_client_handle_flush_url_state() call that could cause unpredictable behavior. Before using it, in globus_ftp_client_handle.c, at line 1710 change 'parsed_url' to 'searcher'. Let me know if this resolves your problem. Joe
Created an attachment (id=412) [details]
added flush calls.. no effect
globus_ftp_client_handle_flush_url_state doesn't help.
The multiple_copy method still eats up memory and the connections to the server
are being maintained.
Andrei
Can you confirm that you applied that change to the flush_url_state() function that I mentioned in the previous note. Joe
Also, a quick note. When you call flush_url_state() or handle_destroy(), the cached objects are destroyed asynchronously in the back ground. It is not likely that you'll see the memory drop for a few seconds as the destruction is done gracefully by sending the ftp quit, waiting for the response, etc. If this is a non-threaded build, then you wont see _any_ decrease until the next time the event system is polled, which in your app only happens when you call globus_cond_wait()... and since you're not explicity waiting for those background events (you cant), your app may actually complete before those things complete their shutdown. If you add a globus_module_deactivate_all() just before you main() exits, you will guarantee that all resources are reclaimed before that call returns. In any case, if you still think you're seeing leaks, can you run your app under valgrind or some other leak detecting tool? As I've said, I do not see any leaks when I run the code. Thanks, Joe
Created an attachment (id=413) [details]
valgrind output
Attached the valgrind report. Memory leaks are only a few. It seeems
that the majority of the allocated memory is still reachable.
Unfortunatelly,
globus_ftp_client_handle_flush_url_state(&(the_struct->ftp_handle),
URLs[i]);
on each URL used in a transfer set followed by the
handle destroy
globus_ftp_client_handle_destroy (&(the_struct->ftp_handle));
doesn't free that memory.
module_deactivate cannot be the only solution to free the memory allocated by
the operations since this is only a part of a multi-threaded application that
uses
the ftp module in other places.
You probably don't see the leaks because you are not giving the right
parameters to the application (list of source, destination URL pairs) -- my
fault sorry.
Something like (use more URL's to see the leaky behavior even with top):
./a.out
gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_3D_diagonal.xg
gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_3D_diagonal.xg
gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_maximum.xg
gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_maximum.xg
gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_minimum.xg
gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_minimum.xg
gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_norm1.xg
gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_norm1.xg
gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_norm2.xg
gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_norm2.xg
the build is multithreaded.
Andrei
Created an attachment (id=414) [details]
fixes leak in flush_url_state
Ok, now I see why you're seeing the leak and not me. You're using the same
host/port for the source and destination. This causes the cache to act
differently than it normally would since it only allows one entry per host/port
pair. It's also a rare case since there is no point (other than testing ;-))
in doing a transfer between two servers on the same machine.
I will track down and fix this leak. If you'd like to test this theory, you
can change the source hostname to the actual ip in every url you use and verify
that the leak is gone.
Also, your use of flush_state flushed out another leak that is fixed by the
attached patch.
Joe
Created an attachment (id=415) [details]
fix leak when using same host for source and dest
This should fix everything. An update package will be made available on our
advisories page soon.
I am writing a small method that copies multiple files between two GridFTP servers using the globus_ftp_client library version 1.10 In order to be efficient I was trying to use connection caching. When the caching is activated there are around 200Kb in leaks for around 20 files copied. Any thoughts ? Here is some code : <handles init with credential delegation> globus_mutex_init(&(my_task.mutex), GLOBUS_NULL); globus_cond_init(&(my_task.cond), GLOBUS_NULL); my_task.operation_error = (char*) globus_libc_malloc(sizeof(char)*3000); for (int i = 0; i < size / 2; i++) { if (i == 0) { globus_ftp_client_handle_cache_url_state (&ftp_handle, URLs[2*i]); globus_ftp_client_handle_cache_url_state (&ftp_handle, URLs[2*i + 1]); } my_task.operation_done = 0; #ifdef DEBUG printf("Copying from %s to %s\n", URLs[2*i], URLs[2*i + 1]); #endif result = globus_ftp_client_third_party_transfer(&ftp_handle), URLs[2*i], &source_ftp_attr, URLs[2*i + 1], &dest_ftp_attr, NULL, my_delete_callback_ts, (void*) &my_task); if (result != GLOBUS_SUCCESS) { retmsg = (char*) globus_libc_malloc(strlen(globus_object_printable_to_string(globus_error_get(result))) + 100); assert(retmsg); globus_libc_sprintf(retmsg, "300 Copy error: %s\n", globus_object_printable_to_string(globus_error_get(result))); goto return_retmsg; } globus_mutex_lock(&(my_task.mutex)); while(!my_task.operation_done) globus_cond_wait(&(my_task.cond), &(my_task.mutex)); globus_mutex_unlock(&(my_task.mutex)); if (strcmp(my_task.operation_error, "200 OK") != 0) { retmsg = strdup(my_task.operation_error); assert(retmsg); goto return_retmsg; } } retmsg = strdup("200 OK"); return_retmsg: globus_ftp_client_operationattr_destroy(&dest_ftp_attr); globus_ftp_client_operationattr_destroy(&source_ftp_attr); globus_ftp_client_handle_destroy (&ftp_handle); globus_ftp_client_handleattr_destroy(&ftp_handleattr); globus_libc_free(my_task.operation_error); globus_cond_destroy(&(my_task.cond)); globus_mutex_destroy(&(my_task.mutex)); return retmsg;