| Summary: | Failed assert in globus_l_xio_system_unregister_write on x86_64 platform | ||
|---|---|---|---|
| Product: | XIO | Reporter: | Rob S <schuler@isi.edu> |
| Component: | Globus XIO | Assignee: | John Bresnahan <bresnaha@mcs.anl.gov> |
| Status: | RESOLVED FIXED | ||
| Severity: | major | CC: | allcock@mcs.anl.gov, annc@isi.edu, genaro@caos.uab.es, roy@cs.wisc.edu, skoranda@gravity.phys.uwm.edu, ysimmhan@cs.indiana.edu |
| Priority: | P3 | ||
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Attachments: | Tester that invokes RLSClient | ||
Created an attachment (id=1185)
Tester that invokes RLSClient
This tester uses RLSClient to cause an XIO crash on x86_64 platforms (it does
not fail on 32-bit platforms).
If you are interested in using this tester, I can help with the setup of RLS
(we would only need the client setup and can use one of my existing servers).
I can also get a core file from the LEAD team that encountered the failure.
For what it is worth, a LIGO user running a client tool on CentOS 5 x86_64 is
receiving the same error:
t1094719808:p2904: Fatal error: [Thread System] GLOBUSTHREAD:
pthread_mutex_lock() failed
[Thread System] invalid value passed to thread interface (EINVAL)
The error is transient. The client tool simply opens a GSI socket, writes a
few bytes to it, and then reads what the remote server has sent down the wire.
I have asked the user to run the tool with GLOBUS_XIO_GSI_DEBUG set to 127 and
record the stdout.
I just noticed the same bug in one of the VDT nightly tests. It was running on
CentOS 5, x86. It is Globus 4.0.6.
We did:
> globus/bin/globus-rls-cli create foo bar \
rls://localhost:39281 >/tmp/vdt-run-tests.temp.24761
And we got:
t3075419024:p24764: Fatal error: [Thread System] GLOBUSTHREAD:
pthread_mutex_lock() failed
[Thread System] invalid value passed to thread interface (EINVAL)
Is this bug being investigated? It seems to be rather old now.
Thanks,
-alain
I attached gdb to a running client that was looping over and over again opening
a socket, sending some bytes to a server, reading the response, and then
closing the socket.
Here is the backtrace after the client aborts with the reported error:
(gdb) backtrace
#0 0x0000003237a30015 in raise () from /lib64/libc.so.6
#1 0x0000003237a31980 in abort () from /lib64/libc.so.6
#2 0x00002aaaae305029 in globus_silent_fatal () at globus_print.c:57
#3 0x00002aaaae30513b in globus_fatal (msg=0x2aaaae314abb "%s %s\n%s %s")
at globus_print.c:88
#4 0x00002aaaae309246 in globus_i_thread_report_bad_rc (rc=22,
message=0x2aaaae314eb0 "GLOBUSTHREAD: pthread_mutex_lock() failed\n")
at globus_thread_common.c:145
#5 0x00002aaaae30a1db in globus_mutex_lock (mut=0x2aaaae521d40)
at globus_thread_pthreads.c:823
#6 0x00002aaaae300fa0 in globus_object_free (object=0x1e0a37b0)
at globus_object.c:225
#7 0x00002aaaae2f3f3d in s_key_destructor_func (value=0x1e0a37b0)
at globus_error.c:289
#8 0x0000003238605919 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#9 0x00000032386061c3 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003237acd36d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
I ran the client with GLOBUS_XIO_GSI_DEBUG = 127. The debugging trace is available at http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/ldr_software/xiodebug.out.gz
I have a pretty good idea why this is happening. The mutex is being checked in a thread-specific key destructor that can run after the mutex has been destroyed. I think the easiest solution is to initialize the mutex once per process, and then never destroy it (just allow the OS to clean it up on unload).
I will test a patch soon.
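A minimal sketch of the approach described in this comment, not the actual Globus patch. It assumes plain POSIX threads, and every name in it (object_mutex, error_key, key_destructor, module_deactivate, run_demo) is hypothetical. The mutex is created exactly once per process with pthread_once and is never destroyed, so a thread-specific key destructor that runs during late thread exit always finds a valid mutex instead of failing with EINVAL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the Globus source. */

static pthread_mutex_t object_mutex;            /* guards shared object state */
static pthread_once_t  object_mutex_once = PTHREAD_ONCE_INIT;
static pthread_key_t   error_key;               /* per-thread error object */
static pthread_once_t  error_key_once = PTHREAD_ONCE_INIT;
static int             destructor_calls = 0;    /* protected by object_mutex */

/* Runs at most once per process, no matter how many threads race here. */
static void init_object_mutex(void)
{
    int rc = pthread_mutex_init(&object_mutex, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Key destructor: may run very late, while a thread is exiting. */
static void key_destructor(void *value)
{
    int rc;

    pthread_once(&object_mutex_once, init_object_mutex);
    rc = pthread_mutex_lock(&object_mutex);
    assert(rc == 0);            /* would be EINVAL on a destroyed mutex */
    (void)rc;
    destructor_calls++;
    free(value);
    pthread_mutex_unlock(&object_mutex);
}

static void init_error_key(void)
{
    int rc = pthread_key_create(&error_key, key_destructor);
    assert(rc == 0);
    (void)rc;
}

/* Each worker stores a thread-specific value, then exits. */
static void *worker(void *arg)
{
    void *value = malloc(16);

    (void)arg;
    pthread_once(&error_key_once, init_error_key);
    if (value != NULL && pthread_setspecific(error_key, value) != 0)
        free(value);
    return NULL;
}

/* Stand-in for a module shutdown hook. It deliberately does NOT call
 * pthread_mutex_destroy(); the OS reclaims the mutex at process exit. */
static void module_deactivate(void)
{
}

/* Starts up to 8 workers, "deactivates" while they may still be exiting,
 * and returns how many key destructors ran during this call. */
static int run_demo(int n_threads)
{
    pthread_t threads[8];
    int before, started = 0, i;

    if (n_threads > 8)
        n_threads = 8;
    pthread_once(&object_mutex_once, init_object_mutex);
    pthread_mutex_lock(&object_mutex);
    before = destructor_calls;
    pthread_mutex_unlock(&object_mutex);

    for (i = 0; i < n_threads; i++)
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
            started++;
    module_deactivate();
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_lock(&object_mutex);
    i = destructor_calls - before;
    pthread_mutex_unlock(&object_mutex);
    return i;
}
```

Calling run_demo(4) from a test program should report four destructor runs, one per worker, with no abort. If module_deactivate() destroyed the mutex instead, a destructor running after it could hit the pthread_mutex_lock() failure shown in the backtrace.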
Subject: Re: Failed assert in globus_l_xio_system_unregister_write on x86_64 platform
> I have a pretty good idea why this is happening. The mutex is being checked in
> a thread specific key destructor handler that potentially occurs after the
> mutex is destroyed. I think the easiest solution is to initialize the mutex
> once per process creation, and then never destroy it (just allow the OS to
> clean it up on unload).
>
> I will test a patch soon.
That's great news! If you can share a patch against Globus 4.0.6, I can include
it in a future VDT release.
Thanks!
-alain
This is fixed in the trunk and the 4.0 branch. It should be in the 4.0.7 release.