Bug 4689 - gram client crashes when credential changed during active connections
Status: RESOLVED DUPLICATE of bug 4620
Product: GRAM
Component: gt2 Gram client
Version: 4.0.2
Platform: All All
Importance: P3 normal
Target Milestone: 4.2
Reported: 2006-09-07 20:11 by
Modified: 2006-10-02 18:58


Attachments
Code that provokes the crash (4.64 KB, text/plain)
2006-09-07 20:12, Jaime Frey


Description From 2006-09-07 20:11:22
I am able to reliably reproduce a crash of the pre-ws gram client code by
calling globus_gram_client_set_credentials() while a gram ping command is
active (initiated by globus_gram_client_register_ping()). It appears to be
caused by a bug in the globus_gram_protocol module and a probable bug somewhere
in globus_gram_protocol, globus_io, or globus_xio. I will post the code I'm
using to reproduce the crash.

Here's a stack trace of the crash point:
#0  0x0814cf32 in X509_STORE_get_by_subject (vs=0xbffff350, type=1, 
    name=0x8229648, ret=0xbffff290) at x509_lu.c:290
#1  0x0814d624 in X509_STORE_CTX_get1_issuer (issuer=0xbffff2e8, 
    ctx=0xbffff350, x=0x8246be8) at x509_lu.c:495
#2  0x0814837f in X509_verify_cert (ctx=0xbffff350) at x509_vfy.c:238
#3  0x080e5c37 in globus_gsi_callback_X509_verify_cert (context=0xbffff350, 
    arg=0x0) at globus_gsi_callback.c:378
#4  0x0810e852 in ssl_verify_cert_chain (s=0x82303e0, sk=0x82298a8)
    at ssl_cert.c:487
#5  0x080fe4b1 in ssl3_get_server_certificate (s=0x82303e0) at s3_clnt.c:833
#6  0x080fd242 in ssl3_connect (s=0x82303e0) at s3_clnt.c:275
#7  0x0810d2d1 in SSL_do_handshake (s=0x82303e0) at ssl_lib.c:1826
#8  0x08114046 in ssl_ctrl (b=0x822e7b0, cmd=101, num=0, ptr=0x0)
    at bio_ssl.c:417
#9  0x08123564 in BIO_ctrl (b=0x822e7b0, cmd=101, larg=0, parg=0x0)
    at bio_lib.c:324
#10 0x080d050b in globus_i_gsi_gss_handshake (minor_status=0xbffff704, 
    context_handle=0x822e8f8) at globus_i_gsi_gss_utils.c:822
#11 0x080cb650 in gss_init_sec_context (minor_status=0xbffff7a4, 
    initiator_cred_handle=0x82298a8, context_handle_P=0x8231010, 
    target_name=0x822df68, mech_type=0x0, req_flags=34, time_req=0, 
    input_chan_bindings=0x0, input_token=0xbffff790, 
    actual_mech_type=0x823101c, output_token=0xbffff798, ret_flags=0x8231004, 
    time_rec=0x8231008) at init_sec_context.c:178
#12 0x0808586d in globus_l_xio_gsi_read_token_cb (op=0x8230e68, result=0, 
    nbytes=2992, user_arg=0x8231000) at globus_xio_gsi.c:1136
#13 0x0806fcf6 in globus_l_xio_driver_op_read_kickout (user_arg=0x8230e68)
    at globus_xio_driver.c:620
#14 0x0807e51e in globus_xio_driver_finished_read (in_op=0x8230e68, result=0, 
    nbytes=2992) at globus_xio_pass.c:1227
#15 0x080a532a in globus_l_xio_tcp_finish_read (handle=0x8229088, result=0, 
    nbytes=2992) at globus_xio_tcp_driver.c:2196
#16 0x080a53b3 in globus_l_xio_tcp_system_read_cb (result=0, nbytes=2992, 
    user_arg=0x8229088) at globus_xio_tcp_driver.c:2211
#17 0x080be763 in globus_l_xio_system_kickout (user_arg=0x822fe50)
    at globus_xio_system_select.c:1016
#18 0x081aa7f9 in globus_callback_space_poll (timestop=0x8200428, space=-2)
    at globus_callback_nothreads.c:1430
#19 0x0804e7b6 in main ()

When I ran the code under valgrind, it complained that the comparison at
globus_gram_protocol_io.c:1707 (in
globus_l_gram_protocol_free_old_credentials()) was comparing uninitialized data.
Further poking with gdb revealed that the globus_io_tcp_get_credential() call
on the previous line was returning failure and leaving cur_cred unmodified.
Thus, it appears the function falsely concluded that the old credential was no
longer in use when in fact it still was, and freed it. Later, when the xio
layer tries to use the credential for the gram ping, the memory has been reused
and the client crashes.
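
To illustrate the failure mode described above, here is a minimal C sketch. It
uses hypothetical stand-in names (cred_t, conn_t, conn_get_credential,
cred_in_use) rather than the real Globus types or API, and it is not the patch
from bug 4620. It only shows one conservative way to handle a getter that can
fail without writing its output parameter: initialize the output, check the
return value, and treat a failed lookup as "still in use" so the old credential
is not freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the Globus API. */
typedef struct { int id; } cred_t;

typedef struct {
    bool    secured; /* false models a connection whose security state is unavailable */
    cred_t *cred;    /* credential the connection is using */
} conn_t;

/* Models a getter that can fail and, on failure, leaves *out untouched. */
static int conn_get_credential(const conn_t *c, cred_t **out)
{
    if (c == NULL || !c->secured) {
        return -1;   /* error path: *out is not written */
    }
    *out = c->cred;
    return 0;
}

/*
 * Returns true if any connection may still be using `old`.
 * The pattern from the report would ignore the getter's return value and
 * compare a never-written `cur` against `old`. Here `cur` is initialized and
 * a failed lookup counts as "in use", so the caller keeps the credential.
 */
static bool cred_in_use(const conn_t *conns, size_t n, const cred_t *old)
{
    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        cred_t *cur = NULL;
        if (conn_get_credential(&conns[i], &cur) != 0) {
            return true; /* cannot tell, so do not free */
        }
        if (cur == old) {
            return true;
        }
    }
    return false;
}
```

A caller would free the old credential only when cred_in_use() returns false.
Whether "assume in use" is the right policy for the real code is an assumption
of this sketch; it trades a possible leak for avoiding a use-after-free.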

As for why globus_io_tcp_get_credential() failed, here's a stack trace where
the error is generated:
#0  globus_l_xio_gsi_cntl (driver_specific_handle=0x0, cmd=1, 
    ap=0xbffff74c "p÷ÿ¿±@\207") at globus_xio_gsi.c:3169
#1  0x08072051 in globus_i_xio_driver_handle_cntl (context=0x822ea28, 
    start_ndx=0, driver=0x8228d10, cmd=1, ap=0xbffff74c "p÷ÿ¿±@\207")
    at globus_xio_driver.c:1533
#2  0x0806b9c9 in globus_xio_handle_cntl (handle=0x822c018, driver=0x8228d10, 
    cmd=1) at globus_xio_handle.c:2662
#3  0x0806266e in globus_io_tcp_get_credential (handle=0x822dce8, 
    credential=0xbffff770) at globus_io_xio_compat.c:4230
#4  0x0805679d in globus_l_gram_protocol_free_old_credentials ()
    at globus_gram_protocol_io.c:1706
#5  0x080568ad in globus_gram_protocol_set_credentials (
    new_credentials=0x8241900) at globus_gram_protocol_io.c:1763
#6  0x0804ef5e in globus_gram_client_set_credentials (
    new_credentials=0x8241900) at globus_gram_client.c:699
#7  0x0804e4ea in handle_refresh_proxy_from_file ()
#8  0x0804e5b2 in service_commands ()
#9  0x0804e694 in main ()

The NULL value for driver_specific_handle in globus_l_xio_gsi_cntl() is the
trigger of the error. I don't know xio well enough to trace back why it's NULL.
------- Comment #1 From 2006-09-07 20:12:23 -------
Created an attachment (id=1040) [details]
Code that provokes the crash
------- Comment #2 From 2006-09-07 20:15:13 -------
example2.c is the Condor-G gahp server boiled down to the code path that
provokes the crash. I can further simplify it by having main() call
service_commands() directly and adding a while loop around the switch statement
in service_commands() that also calls globus_poll_nonblocking(). That changes
where the crash occurs, though I believe the cause is the same.
------- Comment #3 From 2006-09-19 11:06:49 -------
This looks like the same issue as bug #4620. Please try the patch I attached to
that one.
------- Comment #4 From 2006-09-19 11:50:35 -------
That does the trick for my example code. I'll apply it to our full application
and run it for a few days to see what happens.
------- Comment #5 From 2006-10-02 18:58:40 -------
I'm assuming the patch works and this is actually a duplicate of bug 4620. I
resolved it appropriately. If this turns out not to be the case, feel free to
reopen it.

*** This bug has been marked as a duplicate of 4620 ***