Bug 4605 - Possible memory leak in Globus RSL pre-WS
: Possible memory leak in Globus RSL pre-WS
Status: RESOLVED FIXED
: GRAM
gt2 RSL
: 4.0.2
: PC Linux
: P3 normal
: 4.0.3
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2006-07-18 07:21 by
Modified: 2007-02-14 14:51 (History)


Attachments
Small C file which reproduces the problem. (589 bytes, text/plain)
2006-07-18 07:28, Henrik Thostrup Jensen
Details
Circulating RSL memleak patch (2.07 KB, patch)
2006-07-18 07:56, Henrik Thostrup Jensen
Details
fixes a memory leak (52.37 KB, text/plain)
2006-07-20 12:43, Martin Feller
Details
Patch for Globus 4.0.2 which fixes the memory leak (2.65 KB, text/plain)
2006-07-23 16:55, Henrik Thostrup Jensen
Details


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2006-07-18 07:21:00
Hi

We have some code which uses globus rsl pre-ws code, which leaks memory.
There is no gram network invovled. I've attached a small c code snippet which
reproduces the problem.
------- Comment #1 From 2006-07-18 07:25:49 -------
Valgrind stack trace of the leak.

==15646== 16384 bytes in 2 blocks are still reachable in loss record 447 of 487
==15646==    at 0x3414C59D: malloc (vg_replace_malloc.c:130)
==15646==    by 0x34860844: globus_libc_malloc (globus_libc.c:418)
==15646==    by 0x348633FD: globus_l_memory_create_list (globus_memory.c:123)
==15646==    by 0x348635A1: globus_memory_pop_node (globus_memory.c:180)
==15646==    by 0x34862EFC: globus_list_insert (globus_list.c:353)
==15646==    by 0x346FF6C0: globus_list_copy_reverse (globus_rsl.c:766)
==15646==    by 0x346FEE15: globus_rsl_copy_recursive (globus_rsl.c:391)
==15646==    by 0x34198727: XrslRelation::XrslRelation(_globus_rsl_t*)
(xrsl.cpp:94)
==15646==    by 0x3419ABC6: Xrsl::GetRelation(std::string const&)
(xrsl.cpp:375)
==15646==    by 0x341F97AB: Target::GetCputime(Xrsl) (stl_alloc.h:652)
==15646==    by 0x341F48B9: CpuTimeBroker::RelationCheck(Target&,
XrslRelation&) (standardbrokers.cpp:263) 
------- Comment #2 From 2006-07-18 07:28:34 -------
Created an attachment (id=1004) [details]
Small C file which reproduces the problem.
------- Comment #3 From 2006-07-18 07:29:17 -------
Btw. the stack trace is from Globus 2.4.
------- Comment #4 From 2006-07-18 07:34:38 -------
Since Stu and Joe are on vacation, I'm assigning this to Martin.
------- Comment #5 From 2006-07-18 07:40:24 -------
Almost forgot. The problem has been verified on 2.4 and 4.1.0
------- Comment #6 From 2006-07-18 07:41:19 -------
(In reply to comment #5)
> Almost forgot. The problem has been verified on 2.4 and 4.1.0
> 
Okay I need to read what I write: 2.4 and 4.0.2 has the problem. 4.1.0 is
untested.
------- Comment #7 From 2006-07-18 07:44:00 -------
I'm interested in hearing about this bug as well, so I'll add myself as cc, if
that's fine.
------- Comment #8 From 2006-07-18 07:53:59 -------
There is a globus RSL memleak patch in circulation. I tried it, but it does not
appear to fix my problems. I'm attaching it.
------- Comment #9 From 2006-07-18 07:56:08 -------
Created an attachment (id=1005) [details]
Circulating RSL memleak patch

This is a patch for fixing a memory leak in globus rsl. It might fix a memory
leak, but the problem stille occured when using it.
------- Comment #10 From 2006-07-18 11:21:45 -------
I just tried the attached example with valgrind and it
seems that the globus_list is affected too.
But I'm new to this library, so let me have a look :-)
------- Comment #11 From 2006-07-20 12:43:21 -------
Created an attachment (id=1006) [details]
fixes a memory leak

Seems like I found the error.

At least I can run the example attached in "Small C file which reproduces
the problem" without memory leaks now.

There are memory leaks in globus_rsl.c. That's the file where all functions
that work on an rsl type are defined. 
The problem are the calls to the function globus_list_copy_reverse(...)
which is defined in globus_rsl.c too and which creates a reverse copy of a
linked list.
The calls in globus_rsl.c (there are three calls) look like this:

    list = globus_list_copy_reverse(list);

This code works, but the problem with this is that the pointer to the new
list created by globus_list_copy_reverse(...) is stored in same pointer like
the list that is given as argument. This means that the pointer to the old
list is overwritten and thus the old list is not referenced anymore and lost
in memory.

I don't know if there are differences between our GT versions. That's why
I attached my new globus_rsl.c (source-trees/gram/rsl/source/globus_rsl.c)
as a "manual patch". You can find the changes when you search for
"MEMORY_LEAK_FIX".

I simply stored the new, reversed list in a new variable and free'd the
old one.
An alternative to this would be to add a new function to globus_list which
adds an element at the end of the list. With this there would be no need
to call globus_list_copy_reverse(...).


Change these things in the code, try it out and please tell me about
the results.


Additional information:
#######################

I extended the above example C-file by freeing the rsl at the end of the
program:
    globus_rsl_free_recursive(rsl);

The output of valgrind (valgrind-3.2.0) of the command

    valgrind --tool=memcheck --leak-check=yes memLeakTest

was the following:    

==16320== Memcheck, a memory error detector for x86-linux.
==16320== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==16320== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==16320== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==16320== For more details, rerun with: -v
==16320==
==16320==
==16320== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 1)
==16320== malloc/free: in use at exit: 16426 bytes in 2 blocks.
==16320== malloc/free: 10126 allocs, 10124 frees, 135104 bytes allocated.
==16320== For counts of detected errors, rerun with: -v
==16320== searching for pointers to 2 not-freed blocks.
==16320== checked 1664660 bytes.
==16320==
==16320== LEAK SUMMARY:
==16320==    definitely lost: 0 bytes in 0 blocks.
==16320==    possibly lost:   0 bytes in 0 blocks.
==16320==    still reachable: 16426 bytes in 2 blocks.
==16320==         suppressed: 0 bytes in 0 blocks.
==16320== Reachable blocks (those to which a pointer was found) are not shown.
==16320== To see them, rerun with: --show-reachable=yes



I wondered about the "still reachable: 16426 bytes in 2 blocks."


The explanation is probably:

A Parser generated by Flex and Bison is used to create the rsl from the
string in the function call
    rsl = globus_rsl_parse(job);

I created a very simple dummy scanner using flex without using any globus
code, started this program with valgrind too and the result was the same: 
    "still reachable: 16426 bytes in 2 blocks."

So I'm quite sure that the reason for this message can be found in flex.

Here is a discussion on flex and possible memory leaks:
    http://compilers.iecc.com/comparch/article/05-10-140


Martin
------- Comment #12 From 2006-07-21 03:21:03 -------
sorry, i forgot to say that I work with GT version 4.0.2.
My attachment "fixes a memory leak" is the modified file globus_rsl.c
from this GT version
------- Comment #13 From 2006-07-23 16:54:10 -------
This appears to fix memory leak. Excellent! Thanks for fixing this so quickly.

I've produced a patch for 4.0.2 (cleaned up some whitespace/tab noise from the
fixed file). I'll attach it. I assume this will go into the dev tree. 

Will there be an updated package to globus 4.0.2? It is not important for us as
we are aware of the problem and can now deal with it, but it has been a
showstopper for some, and it might be for others.
------- Comment #14 From 2006-07-23 16:55:18 -------
Created an attachment (id=1012) [details]
Patch for Globus 4.0.2 which fixes the memory leak

Based on Martins globus_rsl.c file. Cleaned up tab noise.
------- Comment #15 From 2006-07-24 08:56:13 -------
hm, i never imported something to CVS and don't have the rights to do so,
because i'm a newbie in the Globus team.
I'll ask some people how to deal with this ...
------- Comment #16 From 2006-07-26 13:35:56 -------
Joe,

Can you review test and commit this patch?  I think it should be committed to
4.0 branch and trunk.

Thanks,
Stu

Martin: nice job on the fix!
------- Comment #17 From 2006-07-31 10:01:08 -------
Looks good. I committed to 4.0 branch and trunk.