Bugzilla – Bug 1501
memory leak in ftp_client when using connection caching and credential delegation
Last modified: 2004-07-23 12:55:53
You need to log in before you can comment on or make changes to this bug.
I am writing a small method that copies multiple files between two GridFTP servers using the globus_ftp_client library version 1.10 In order to be efficient I was trying to use connection caching. When the caching is activated there are around 200Kb in leaks for around 20 files copied. Any thoughts ? Here is some code : <handles init with credential delegation> globus_mutex_init(&(my_task.mutex), GLOBUS_NULL); globus_cond_init(&(my_task.cond), GLOBUS_NULL); my_task.operation_error = (char*) globus_libc_malloc(sizeof(char)*3000); for (int i = 0; i < size / 2; i++) { if (i == 0) { globus_ftp_client_handle_cache_url_state (&ftp_handle, URLs[2*i]); globus_ftp_client_handle_cache_url_state (&ftp_handle, URLs[2*i + 1]); } my_task.operation_done = 0; #ifdef DEBUG printf("Copying from %s to %s\n", URLs[2*i], URLs[2*i + 1]); #endif result = globus_ftp_client_third_party_transfer(&ftp_handle), URLs[2*i], &source_ftp_attr, URLs[2*i + 1], &dest_ftp_attr, NULL, my_delete_callback_ts, (void*) &my_task); if (result != GLOBUS_SUCCESS) { retmsg = (char*) globus_libc_malloc(strlen(globus_object_printable_to_string(globus_error_get(result))) + 100); assert(retmsg); globus_libc_sprintf(retmsg, "300 Copy error: %s\n", globus_object_printable_to_string(globus_error_get(result))); goto return_retmsg; } globus_mutex_lock(&(my_task.mutex)); while(!my_task.operation_done) globus_cond_wait(&(my_task.cond), &(my_task.mutex)); globus_mutex_unlock(&(my_task.mutex)); if (strcmp(my_task.operation_error, "200 OK") != 0) { retmsg = strdup(my_task.operation_error); assert(retmsg); goto return_retmsg; } } retmsg = strdup("200 OK"); return_retmsg: globus_ftp_client_operationattr_destroy(&dest_ftp_attr); globus_ftp_client_operationattr_destroy(&source_ftp_attr); globus_ftp_client_handle_destroy (&ftp_handle); globus_ftp_client_handleattr_destroy(&ftp_handleattr); globus_libc_free(my_task.operation_error); globus_cond_destroy(&(my_task.cond)); globus_mutex_destroy(&(my_task.mutex)); return retmsg;
Can you try using: globus_result_t globus_ftp_client_handleattr_set_cache_all( globus_ftp_client_handleattr_t * attr, globus_bool_t cache_all) instead of globus_ftp_client_handle_cache_url_state() to see if that narrows this down? If it still occurrs, it would be useful if you could provide a simplified test program that reproduces the problem. Thank you, Joe
Created an attachment (id=409) [details] Program reproducing the bug This should reproduce the problem. It's not the cleanest code but the leak occurs in the loop starting at line 208. Andrei
I am not seeing any runtime leaks (other than the couple caused by your strdups). All other memory is still being referenced. The majority of this memory will be freed when you call globus_module_deactivate. If this is to be a long running application, then you will need to manage the number of cached urls, as they do not 'timeout' themselves. You can still use globus_ftp_client_handleattr_set_cache_all() to cause everything to automatically be cached, but you'll need a max-cached and a timeout mechanism (look at globus_callback.h for periodic and delayed callbacks). Simply call globus_ftp_client_handle_flush_url_state() to stop caching a specific url. The cache mechanism only matches on scheme, host, and port. So, you'll need to strip all paths from the urls you transfer before adding them to your cache 'manager'. I just noticed a bug in the globus_ftp_client_handle_flush_url_state() call that could cause unpredictable behavior. Before using it, in globus_ftp_client_handle.c, at line 1710 change 'parsed_url' to 'searcher'. Let me know if this resolves your problem. Joe
Created an attachment (id=412) [details] added flush calls.. no effect globus_ftp_client_handle_flush_url_state doesn't help. The multiple_copy method still eats up memory and the connections to the server are being maintained. Andrei
Can you confirm that you applied that change to the flush_url_state() function that I mentioned in the previous note. Joe
Also, a quick note. When you call flush_url_state() or handle_destroy(), the cached objects are destroyed asynchronously in the back ground. It is not likely that you'll see the memory drop for a few seconds as the destruction is done gracefully by sending the ftp quit, waiting for the response, etc. If this is a non-threaded build, then you wont see _any_ decrease until the next time the event system is polled, which in your app only happens when you call globus_cond_wait()... and since you're not explicity waiting for those background events (you cant), your app may actually complete before those things complete their shutdown. If you add a globus_module_deactivate_all() just before you main() exits, you will guarantee that all resources are reclaimed before that call returns. In any case, if you still think you're seeing leaks, can you run your app under valgrind or some other leak detecting tool? As I've said, I do not see any leaks when I run the code. Thanks, Joe
Created an attachment (id=413) [details] valgrind output Attached the valgrind report. Memory leaks are only a few. It seeems that the majority of the allocated memory is still reachable. Unfortunatelly, globus_ftp_client_handle_flush_url_state(&(the_struct->ftp_handle), URLs[i]); on each URL used in a transfer set followed by the handle destroy globus_ftp_client_handle_destroy (&(the_struct->ftp_handle)); doesn't free that memory. module_deactivate cannot be the only solution to free the memory allocated by the operations since this is only a part of a multi-threaded application that uses the ftp module in other places. You probably don't see the leaks because you are not giving the right parameters to the application (list of source, destination URL pairs) -- my fault sorry. Something like (use more URL's to see the leaky behavior even with top): ./a.out gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_3D_diagonal.xg gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_3D_diagonal.xg gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_maximum.xg gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_maximum.xg gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_minimum.xg gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_minimum.xg gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_norm1.xg gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_norm1.xg gsiftp://litchi.zib.de/data/gridlab/viz_input/short_herrmann_pi2/ADM-BS_Gx_norm2.xg gsiftp://litchi.zib.de/tmp/aa/ADM-BS_Gx_norm2.xg the build is multithreaded. Andrei
Created an attachment (id=414) [details] fixes leak in flush_url_state Ok, now I see why you're seeing the leak and not me. You're using the same host/port for the source and destination. This causes the cache to act differently than it normally would since it only allows one entry per host/port pair. It's also a rare case since there is no point (other than testing ;-)) in doing a transfer between two servers on the same machine. I will track down and fix this leak. If you'd like to test this theory, you can change the source hostname to the actual ip in every url you use and verify that the leak is gone. Also, your use of flush_state flushed out another leak that is fixed by the attached patch. Joe
Created an attachment (id=415) [details] fix leak when using same host for source and dest This should fix everything. An update package will be made available on our advisories page soon.