<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugzilla.globus.org/bugzilla/bugzilla.dtd">

<bugzilla version="3.2.3"
          urlbase="http://bugzilla.globus.org/bugzilla/"
          maintainer="bacon@mcs.anl.gov"
>

    <bug>
          <bug_id>5004</bug_id>
          
          <creation_ts>2007-02-09 17:22</creation_ts>
          <short_desc>Failed assert in globus_l_xio_system_unregister_write on x86_64 platform</short_desc>
          <delta_ts>2008-03-27 09:52:47</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XIO</product>
          <component>Globus XIO</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          
          
          <priority>P3</priority>
          <bug_severity>major</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Rob S">schuler@isi.edu</reporter>
          <assigned_to name="John Bresnahan">bresnaha@mcs.anl.gov</assigned_to>
          <cc>allcock@mcs.anl.gov</cc>
    
    <cc>annc@isi.edu</cc>
    
    <cc>genaro@caos.uab.es</cc>
    
    <cc>roy@cs.wisc.edu</cc>
    
    <cc>skoranda@gravity.phys.uwm.edu</cc>
    
    <cc>ysimmhan@cs.indiana.edu</cc>

      

      
          <long_desc isprivate="0">
            <who name="Rob S">schuler@isi.edu</who>
            <bug_when>2007-02-09 17:22:54</bug_when>
            <thetext>I am moving this from an xio-user discussion to a bugzilla report. I have copied the mail below, which includes one from an RLS user to me (schuler).

--- emails below ---

A user of ours (of RLS) is running into a problem on a x86_64 machine running RedHat. There seems to be a failed globus_assert in a globus_l_xio_system_unregister_read(...) call. And another failed assert downstream from an xio call. See the assert messages below.

The failure is happening on the client side of RLS when making multiple rapid connections -&gt; transactions -&gt; disconnects with multiple threads running (though the threads are NOT sharing the client, they create their own).

They sent me a tester that depended on some of their own code, so I rewrote it. I tried it myself on a x86_64 running Suse 9. It failed with:

&quot;
...
t1095125312:p5917: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed

[Thread System] invalid value passed to thread interface (EINVAL)Aborted
&quot;

Not sure yet whether that&apos;s related or not.

On a 32bit installation I can run the tester with 100 threads and it completes successfully. On a 64 bit machine when running the tester it crashes with 10 threads (concurrent connections).

Is this a globus_xio problem on 64bit machines? Do you need more information?

thanks,
rob

-------- Original Message --------
Subject:     RE: RLS Client connection thro JNI causes JVM failure
Date:     Tue, 6 Feb 2007 20:22:08 -0500


Rob, I have more bugs and questions for you. I added connection pooling and stress tested RLS and found it to fail when
having multiple threads accessing the RLS service (using different RLSClient objects) simultaneously. I get two types of
errors, both when connecting, and both causing a JVM seg fault and core dump.
1. &quot;Assertion FD_ISSET(fd, globus_l_xio_system_write_fds) failed in file globus_xio_system_select.c at line 985.&quot;
2. &quot;java: md_rand.c:312: ssleay_rand_add: Assertion `md_c[1] == md_count[1]&apos; failed.&quot;

The gdb traces are below. I see that the first of these has been reported by another user on a 64 bit machine
(http://www.globus.org/mail_archive/discuss/2005/06/msg00046.html). Do you have any bug fixes for these? I&apos;m attaching
the test client I used. It starts failing with # of clients &gt; 1. If you need more installation information, let me or
Suresh know.

Thanks!

--Yogesh

&quot;Assertion FD_ISSET(fd, globus_l_xio_system_write_fds) failed in file globus_xio_system_select.c at line 985.&quot;

(gdb) where
#0  0x0000003e2642e21d in raise () from /lib64/tls/libc.so.6
#1  0x0000003e2642fa1e in abort () from /lib64/tls/libc.so.6
#2  0x0000002adcd1b3cb in globus_l_xio_system_unregister_write (fd=9) at globus_xio_system_select.c:985
#3  0x0000002adcd1f263 in globus_l_xio_system_handle_write (fd=9) at globus_xio_system_select.c:2002
#4  0x0000002adcd1fd82 in globus_l_xio_system_poll (user_args=0x0) at globus_xio_system_select.c:2249
#5  0x0000002addc183b1 in globus_callback_space_poll (timestop=0x40f5d350, space=-2) at globus_callback_nothreads.c:1430
#6  0x0000002adcac915e in rrpc_connect (host=0x2ad8dcf3b0 &quot;kilimanjaro.extreme.indiana.edu&quot;, port=39281,
attr=0x40f5f410, h=0x40118020, errmsg=0x40f5f450 &quot;&quot;, errmsglen=1024) at rpc.c:144
#7  0x0000002adcac8762 in connect1 (h=0x40117fc0, errmsg=0x40f5f450 &quot;&quot;, errlen=1024) at client.c:2462
#8  0x0000002adcac2f59 in globus_rls_client_connect (url=0x401176d0 &quot;rls://kilimanjaro.extreme.indiana.edu&quot;,
h=0x40f5f8b0) at client.c:204
#9  0x0000002adc9b8a51 in Java_org_globus_replica_rls_RLSClient_connect (env=0x2adc4f9110, obj=0x40f5f968,
jurl=0x40f5f960) at jni.c:106


&quot;java: md_rand.c:312: ssleay_rand_add: Assertion `md_c[1] == md_count[1]&apos; failed.&quot;
(gdb) where
#0  0x0000003e2642e21d in raise () from /lib64/tls/libc.so.6
#1  0x0000003e2642fa1e in abort () from /lib64/tls/libc.so.6
#2  0x0000003e26427ae1 in __assert_fail () from /lib64/tls/libc.so.6
#3  0x0000002ade091145 in ssleay_rand_add (buf=0x41966b00, num=144, add=2) at md_rand.c:312
#4  0x0000002ade091e91 in RAND_add (buf=0x41966a70, num=144, entropy=2) at rand_lib.c:151
#5  0x0000002add730326 in globus_gsi_sysconfig_dir_exists_unix (filename=0x2adc939d90 &quot;/u/drlead&quot;) at
globus_gsi_system_config.c:4600
#6  0x0000002add72fc75 in globus_gsi_sysconfig_get_home_dir_unix (home_dir=0x41966bd8) at
globus_gsi_system_config.c:4394
#7  0x0000002add730ee2 in globus_gsi_sysconfig_get_cert_dir_unix (cert_dir=0x41966c50) at
globus_gsi_system_config.c:4954
#8  0x0000002add1eb188 in globus_i_gsi_gssapi_init_ssl_context (minor_status=0x41966ce8, credential=0x2adc939e10,
anon_ctx=GLOBUS_I_GSI_GSS_DEFAULT_CONTEXT) at globus_i_gsi_gss_utils.c:2097
#9  0x0000002add1e99ad in globus_i_gsi_gss_create_cred (minor_status=0x41966d3c, cred_usage=1,
output_cred_handle_P=0x2adc937878, cred_handle=0x41966d40) at globus_i_gsi_gss_utils.c:1532
#10 0x0000002add1e9437 in globus_i_gsi_gss_cred_read (minor_status=0x41966da0, cred_usage=1, cred_handle=0x2adc937878,
desired_subject=0x0) at globus_i_gsi_gss_utils.c:1323
#11 0x0000002add1df784 in gss_acquire_cred (minor_status=0x41966f28, desired_name_P=0x0, time_req=4294967295,
desired_mechs=0x0, cred_usage=1, output_cred_handle_P=0x2adc937878, actual_mechs=0x0, time_rec=0x0)
   at acquire_cred.c:126
#12 0x0000002add1e777e in globus_i_gsi_gss_create_and_fill_context (minor_status=0x41966fd0,
context_handle_P=0x41966fd8, cred_handle=0x0, cred_usage=1, req_flags=2) at globus_i_gsi_gss_utils.c:291
#13 0x0000002add1e32c8 in gss_init_sec_context (minor_status=0x41967070, initiator_cred_handle=0x0,
context_handle_P=0x2adc275b08, target_name=0x2adc937510, mech_type=0x0, req_flags=2, time_req=0,    input_chan_bindings=0x0, input_token=0x0, actual_mech_type=0x2adc275b20, output_token=0x41967060,
ret_flags=0x2adc275af8, time_rec=0x2adc275afc) at init_sec_context.c:128
#14 0x0000002adce75140 in globus_l_xio_gsi_open_cb (op=0x2adc275750, result=0, user_arg=0x2adc275af0) at
globus_xio_gsi.c:1488
#15 0x0000002adce52841 in globus_l_xio_driver_open_op_kickout (user_arg=0x2adc275750) at globus_xio_driver.c:893
#16 0x0000002adce5fcd2 in globus_xio_driver_finished_open (in_dh=0x2adc275a10, in_op=0x2adc275750, in_res=0) at
globus_xio_pass.c:231
#17 0x0000002adce96348 in globus_l_xio_tcp_system_connect_cb (result=0, user_arg=0x2adc275cb0) at
globus_xio_tcp_driver.c:1797
#18 0x0000002adce6658c in globus_l_xio_system_kickout (user_arg=0x2adc938ae0) at globus_xio_system_select.c:1010
#19 0x0000002addd633b1 in globus_callback_space_poll (timestop=0x41967350, space=-2) at globus_callback_nothreads.c:1430
#20 0x0000002adcc1415e in rrpc_connect (host=0x2adc2749f0 &quot;kilimanjaro.extreme.indiana.edu&quot;, port=39281,
attr=0x41969410, h=0x2adc2749d0, errmsg=0x41969450 &quot;&quot;, errmsglen=1024) at rpc.c:144
#21 0x0000002adcc13762 in connect1 (h=0x2adc274970, errmsg=0x41969450 &quot;&quot;, errlen=1024) at client.c:2462
#22 0x0000002adcc0df59 in globus_rls_client_connect (url=0x2adc274940 &quot;rls://kilimanjaro.extreme.indiana.edu&quot;,
h=0x419698b0) at client.c:204
#23 0x0000002adca03a51 in Java_org_globus_replica_rls_RLSClient_connect (env=0x2adc210140, obj=0x41969968,
jurl=0x41969960) at jni.c:106</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Rob S">schuler@isi.edu</who>
            <bug_when>2007-02-09 17:27:43</bug_when>
            <thetext>Created an attachment (id=1185)
Tester that invokes RLSClient

This tester uses RLSClient to cause a XIO crash on x86_64 platforms (it does not fail on 32bit platforms).

If you are interested in using this tester, I can help with the setup of RLS (we would only need the client setup and can use one of my existing servers).

I can also get a core file from the LEAD team that encountered the failure.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Scott Koranda">skoranda@gravity.phys.uwm.edu</who>
            <bug_when>2008-03-11 21:23:20</bug_when>
            <thetext>For what it is worth, a LIGO user running a client tool on Cent OS 5 x86_64 is receiving the same error:

t1094719808:p2904: Fatal error: [Thread System] GLOBUSTHREAD:
pthread_mutex_lock() failed

[Thread System] invalid value passed to thread interface (EINVAL)

The error is transient. 

The client tool simply opens up a GSI socket, writes a few bytes to it, and then reads what a remote server has sent down the wire. 

I have asked the user to run the tool with GLOBUS_XIO_GSI_DEBUG set to 127 and record the stdout. </thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Alain Roy">roy@cs.wisc.edu</who>
            <bug_when>2008-03-14 09:01:31</bug_when>
            <thetext>I just noticed the same bug in one of the VDT nightly tests. It was running on CentOS 5, x86. It is Globus 4.0.6. 

We did:

&gt; globus/bin/globus-rls-cli create foo bar \
  rls://localhost:39281 &gt;/tmp/vdt-run-tests.temp.24761

And we got:

   t3075419024:p24764: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed
 
 [Thread System] invalid value passed to thread interface (EINVAL)

Is this bug being investigated? It seems to be rather old now. 

Thanks,
-alain
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Scott Koranda">skoranda@gravity.phys.uwm.edu</who>
            <bug_when>2008-03-17 16:34:57</bug_when>
            <thetext>I attached gdb to a running client that was looping over and over again opening a socket, sending some bytes to a server, reading the response, and then closing the socket. 

Here is the backtrace after the client aborts with the reported error:

(gdb) backtrace
#0  0x0000003237a30015 in raise () from /lib64/libc.so.6
#1  0x0000003237a31980 in abort () from /lib64/libc.so.6
#2  0x00002aaaae305029 in globus_silent_fatal () at globus_print.c:57
#3  0x00002aaaae30513b in globus_fatal (msg=0x2aaaae314abb &quot;%s %s\n%s %s&quot;)
    at globus_print.c:88
#4  0x00002aaaae309246 in globus_i_thread_report_bad_rc (rc=22,
    message=0x2aaaae314eb0 &quot;GLOBUSTHREAD: pthread_mutex_lock() failed\n&quot;)
    at globus_thread_common.c:145
#5  0x00002aaaae30a1db in globus_mutex_lock (mut=0x2aaaae521d40)
    at globus_thread_pthreads.c:823
#6  0x00002aaaae300fa0 in globus_object_free (object=0x1e0a37b0)
    at globus_object.c:225
#7  0x00002aaaae2f3f3d in s_key_destructor_func (value=0x1e0a37b0)
    at globus_error.c:289
#8  0x0000003238605919 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#9  0x00000032386061c3 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003237acd36d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Scott Koranda">skoranda@gravity.phys.uwm.edu</who>
            <bug_when>2008-03-17 18:13:17</bug_when>
            <thetext>I ran the client with GLOBUS_XIO_GSI_DEBUG = 127. The debugging trace is available at

http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/ldr_software/xiodebug.out.gz</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="John Bresnahan">bresnaha@mcs.anl.gov</who>
            <bug_when>2008-03-18 16:57:04</bug_when>
            <thetext>I have a pretty good idea why this is happening.  The mutex is being checked in a thread specific key destructor handler that potentially occurs after the mutex is destroyed.  I think the easiest solution is to initialize the mutex once per process creation, and then never destroy it (just allow the OS to clean it up on unload).

I will test a patch soon.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Alain Roy">roy@cs.wisc.edu</who>
            <bug_when>2008-03-18 21:00:08</bug_when>
            <thetext>Subject: Re:  Failed assert in globus_l_xio_system_unregister_write
 on x86_64 platform

&gt; I have a pretty good idea why this is happening.  The mutex is being  
&gt; checked in
&gt; a thread specific key destructor handler that potentially occurs  
&gt; after the
&gt; mutex is destroyed.  I think the easiest solution is to initialize  
&gt; the mutex
&gt; once per process creation, and then never destroy it (just allow the  
&gt; OS to
&gt; clean it up on unload).
&gt;
&gt; I will test a patch soon.

That&apos;s great news! If you can share a patch against Globus 4.0.6, I  
can include it in a future VDT release.
Thanks!
-alain

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="John Bresnahan">bresnaha@mcs.anl.gov</who>
            <bug_when>2008-03-27 09:52:47</bug_when>
            <thetext>This is fixed in the trunk and the 4.0 branch.  It should be in the 4.0.7 release</thetext>
          </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>1185</attachid>
            <date>2007-02-09 17:27</date>
            <desc>Tester that invokes RLSClient</desc>
            <filename>RLSTest.java</filename>
            <type>text/x-java</type>
            <size>2277</size>
            <attacher>schuler@isi.edu</attacher>
            <data encoding="base64">aW1wb3J0IGphdmEudXRpbC4qOwoKaW1wb3J0IG9yZy5nbG9idXMucmVwbGljYS5ybHMuKjsKCnB1
YmxpYyBjbGFzcyBSTFNUZXN0IGltcGxlbWVudHMgUnVubmFibGUgewoKICAgIHByaXZhdGUgU3Ry
aW5nIHJsc3VybCA9IG51bGw7CgogICAgcHVibGljIFJMU1Rlc3QoU3RyaW5nIHVybCkgewogICAg
ICAgIHJsc3VybCA9IHVybDsKICAgIH0KCiAgICBwdWJsaWMgc3RhdGljIHZvaWQgbWFpbihTdHJp
bmdbXSBhcmdzKSB0aHJvd3MgRXhjZXB0aW9uIHsKICAgICAgICBpbnQgbnVtVGhyZWFkcyA9IGFy
Z3MubGVuZ3RoID4gMCA/IEludGVnZXIucGFyc2VJbnQoYXJnc1swXSkgOiA1OwogICAgICAgIFN0
cmluZyBybHN1cmwgPSBhcmdzLmxlbmd0aCA+IDEgPyBhcmdzWzFdIDogInJsczovL2xvY2FsaG9z
dDozOTI4MSI7CiAgICAgICAgTGlzdCB0aHJlYWRzID0gbmV3IEFycmF5TGlzdCgpOwogICAgICAg
IGZvcihpbnQgaT0wOyBpPG51bVRocmVhZHM7IGkrKyl7CiAgICAgICAgICAgIHRocmVhZHMuYWRk
KG5ldyBUaHJlYWQobmV3IFJMU1Rlc3QocmxzdXJsKSwgIlJMU19URVNUX1RIUkVBRC0iICsgaSkp
OwogICAgICAgICAgICBTeXN0ZW0ub3V0LnByaW50bG4oImNyZWF0ZWQgIiArICgoVGhyZWFkKXRo
cmVhZHMuZ2V0KGkpKS5nZXROYW1lKCkpOwogICAgICAgIH0KICAgICAgICBJdGVyYXRvciBpdGVy
ID0gdGhyZWFkcy5pdGVyYXRvcigpOwogICAgICAgIHdoaWxlIChpdGVyLmhhc05leHQoKSl7CiAg
ICAgICAgICAgIFRocmVhZCB0ID0gKFRocmVhZCkgaXRlci5uZXh0KCk7CiAgICAgICAgICAgIHQu
c3RhcnQoKTsKICAgICAgICAgICAgU3lzdGVtLm91dC5wcmludGxuKCJzdGFydGVkICIgKyB0Lmdl
dE5hbWUoKSk7CiAgICAgICAgfQogICAgfQoKICAgIHB1YmxpYyB2b2lkIHJ1bigpIHsKICAgICAg
ICBSTFNDbGllbnQgY2xpZW50ID0gbnVsbDsKICAgICAgICB0cnkgewogICAgICAgICAgICBkZWJ1
ZygiQXR0ZW1wdGluZyBjb25uZWN0aW9uLi4uIik7CiAgICAgICAgICAgIGNsaWVudCA9IG5ldyBS
TFNDbGllbnQodGhpcy5ybHN1cmwpOwogICAgICAgICAgICBkZWJ1ZygiICBDb25uZWN0ZWQiKTsK
ICAgICAgICAgICAgdGVzdFF1ZXJ5KGNsaWVudCk7CiAgICAgICAgfQogICAgICAgIGNhdGNoKEV4
Y2VwdGlvbiBlKSB7CiAgICAgICAgICAgIGRlYnVnKCJGYWlsZWQ6ICIgKyBlLmdldE1lc3NhZ2Uo
KSk7CiAgICAgICAgICAgIGUucHJpbnRTdGFja1RyYWNlKCk7CiAgICAgICAgfQogICAgICAgIGZp
bmFsbHkgewogICAgICAgICAgICBpZiAoY2xpZW50ICE9IG51bGwpIHsKICAgICAgICAgICAgICAg
IHRyeSB7CiAgICAgICAgICAgICAgICAgICAgZGVidWcoIkNsb3NpbmcgY29ubmVjdGlvbi4uLiIp
OwogICAgICAgICAgICAgICAgICAgIGNsaWVudC5DbG9zZSgpOwogICAgICAgICAgICAgICAgICAg
IGRlYnVnKCIgIENsb3NlZCIpOwogICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgY2F0
Y2ggKEV4Y2VwdGlvbiBlKSB7CiAgICAgICAgICAgICAgICB9CiAgICAgICAgICAgIH0KICAgICAg
ICB9CiAgICB9CgogICAgcHJpdmF0ZSB2b2lkIHRlc3RRdWVyeShSTFNDbGllbnQgY2xpKSB0aHJv
d3MgUkxTRXhjZXB0aW9uIHsKICAgICAgICBkZWJ1ZygidGVzdFF1ZXJ5Iik7CiAgICAgICAgUkxT
Q2xpZW50LkxSQyBscmMgPSBjbGkuZ2V0TFJDKCk7CiAgICAgICAgZGVidWcoIiAgdGVzdGluZyB3
aWxkY2FyZCBxdWVyeSIpOwogICAgICAgIEFycmF5TGlzdCByZXN1bHRzID0gbnVsbDsKICAgICAg
ICB0cnkgewogICAgICAgICAgICByZXN1bHRzID0gbHJjLmdldFBGTldDKCJsZm4qIik7CiAgICAg
ICAgfQogICAgICAgIGNhdGNoIChSTFNFeGNlcHRpb24gZSkgewogICAgICAgICAgICBkZWJ1Zygi
ICAiICsgZS5nZXRNZXNzYWdlKCkpOwogICAgICAgIH0KICAgICAgICBkZWJ1ZygiICB3aWxkY2Fy
ZCBxdWVyeSByZXR1cm5lZCAiCiAgICAgICAgICAgICsgKChyZXN1bHRzPT1udWxsKT8wOnJlc3Vs
dHMuc2l6ZSgpKSArICIgcmVzdWx0cyIpOwogICAgfQoKICAgIHByaXZhdGUgc3RhdGljIFN0cmlu
ZyBnZW5lcmF0ZUlEKCkgewogICAgICAgIFVVSUQgdXVpZCA9IFVVSUQucmFuZG9tVVVJRCgpOwog
ICAgICAgIHJldHVybiAidGVtcC0iICsgdXVpZC50b1N0cmluZygpOwogICAgfQoKICAgIHByaXZh
dGUgdm9pZCBkZWJ1ZyhTdHJpbmcgbXNnKSB7CiAgICAgICAgU3lzdGVtLm91dC5wcmludGxuKFRo
cmVhZC5jdXJyZW50VGhyZWFkKCkuZ2V0TmFtZSgpICsgIjogIiArIG1zZyk7CiAgICB9Cn0K
</data>        

          </attachment>
      

    </bug>

</bugzilla>