Bugzilla – Bug 4689
gram client crashes when credential changed during active connections
Last modified: 2006-10-02 18:58:40
I am able to reliably reproduce a crash of the pre-ws gram client code by calling globus_gram_client_set_credentials() while a gram ping command is active (initiated by globus_gram_client_register_ping()). It appears to be caused by a bug in the globus_gram_protocol module and a probable bug somewhere in globus_gram_protocol, globus_io, or globus_xio. I will post the code I'm using to reproduce the crash.

Here's a stack trace of the crash point:

#0 0x0814cf32 in X509_STORE_get_by_subject (vs=0xbffff350, type=1, name=0x8229648, ret=0xbffff290) at x509_lu.c:290
#1 0x0814d624 in X509_STORE_CTX_get1_issuer (issuer=0xbffff2e8, ctx=0xbffff350, x=0x8246be8) at x509_lu.c:495
#2 0x0814837f in X509_verify_cert (ctx=0xbffff350) at x509_vfy.c:238
#3 0x080e5c37 in globus_gsi_callback_X509_verify_cert (context=0xbffff350, arg=0x0) at globus_gsi_callback.c:378
#4 0x0810e852 in ssl_verify_cert_chain (s=0x82303e0, sk=0x82298a8) at ssl_cert.c:487
#5 0x080fe4b1 in ssl3_get_server_certificate (s=0x82303e0) at s3_clnt.c:833
#6 0x080fd242 in ssl3_connect (s=0x82303e0) at s3_clnt.c:275
#7 0x0810d2d1 in SSL_do_handshake (s=0x82303e0) at ssl_lib.c:1826
#8 0x08114046 in ssl_ctrl (b=0x822e7b0, cmd=101, num=0, ptr=0x0) at bio_ssl.c:417
#9 0x08123564 in BIO_ctrl (b=0x822e7b0, cmd=101, larg=0, parg=0x0) at bio_lib.c:324
#10 0x080d050b in globus_i_gsi_gss_handshake (minor_status=0xbffff704, context_handle=0x822e8f8) at globus_i_gsi_gss_utils.c:822
#11 0x080cb650 in gss_init_sec_context (minor_status=0xbffff7a4, initiator_cred_handle=0x82298a8, context_handle_P=0x8231010, target_name=0x822df68, mech_type=0x0, req_flags=34, time_req=0, input_chan_bindings=0x0, input_token=0xbffff790, actual_mech_type=0x823101c, output_token=0xbffff798, ret_flags=0x8231004, time_rec=0x8231008) at init_sec_context.c:178
#12 0x0808586d in globus_l_xio_gsi_read_token_cb (op=0x8230e68, result=0, nbytes=2992, user_arg=0x8231000) at globus_xio_gsi.c:1136
#13 0x0806fcf6 in globus_l_xio_driver_op_read_kickout (user_arg=0x8230e68) at globus_xio_driver.c:620
#14 0x0807e51e in globus_xio_driver_finished_read (in_op=0x8230e68, result=0, nbytes=2992) at globus_xio_pass.c:1227
#15 0x080a532a in globus_l_xio_tcp_finish_read (handle=0x8229088, result=0, nbytes=2992) at globus_xio_tcp_driver.c:2196
#16 0x080a53b3 in globus_l_xio_tcp_system_read_cb (result=0, nbytes=2992, user_arg=0x8229088) at globus_xio_tcp_driver.c:2211
#17 0x080be763 in globus_l_xio_system_kickout (user_arg=0x822fe50) at globus_xio_system_select.c:1016
#18 0x081aa7f9 in globus_callback_space_poll (timestop=0x8200428, space=-2) at globus_callback_nothreads.c:1430
#19 0x0804e7b6 in main ()

When I ran the code under valgrind, it complained that the comparison at globus_gram_protocol_io.c:1707 (in globus_l_gram_protocol_free_old_credentials()) was comparing uninitialized data. Further poking with gdb revealed that the globus_io_tcp_get_credential() call on the previous line was returning failure and cur_cred was unmodified. Thus, it appears the function falsely concluded that the old credential was not in use when it was in fact still in use, and freed it. Later, when the xio layer tries to use the credential for the gram_ping, the memory has been reused and CRASH.
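To make the failure mode above concrete, here is a minimal sketch of the pattern in plain C. It does not use the Globus code or the patch from bug 4620: cred_t, conn_t, get_credential(), cred_in_use_unchecked() and cred_in_use_checked() are all hypothetical stand-ins. The sketch only assumes what the analysis reports, namely that the credential lookup can fail without writing its output argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the Globus types or functions. */
typedef struct { int id; } cred_t;

typedef struct {
    bool    secured; /* false models a connection whose lookup fails */
    cred_t *cred;    /* credential this connection is actually using */
} conn_t;

/* Models a getter that can fail and leaves *out untouched on failure. */
int get_credential(const conn_t *c, cred_t **out)
{
    if (!c->secured)
        return -1;   /* failure: *out is not written */
    *out = c->cred;
    return 0;
}

/* Unchecked pattern: the return code is ignored, so a failed lookup
 * is silently treated as "this connection does not use old". */
bool cred_in_use_unchecked(const conn_t *conns, size_t n, const cred_t *old)
{
    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* The reported code left this uninitialized; NULL is used here
         * only to keep the sketch well-defined. */
        cred_t *cur = NULL;
        (void)get_credential(&conns[i], &cur);
        if (cur == old)
            return true;
    }
    return false;
}

/* Defensive pattern: if the lookup fails we cannot prove the credential
 * is unused, so report it as in use and let the caller keep it alive. */
bool cred_in_use_checked(const conn_t *conns, size_t n, const cred_t *old)
{
    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        cred_t *cur = NULL;
        if (get_credential(&conns[i], &cur) != 0)
            return true;
        if (cur == old)
            return true;
    }
    return false;
}
```

With one connection whose lookup fails while it still holds the old credential, the unchecked version reports the credential as unused (so a caller would free it), while the checked version reports it as in use. Whether keeping the credential alive is the right policy for the real code is a design choice this sketch does not settle.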
As for why globus_io_tcp_get_credential() failed, here's a stack trace where the error is generated:

#0 globus_l_xio_gsi_cntl (driver_specific_handle=0x0, cmd=1, ap=0xbffff74c "p÷ÿ¿±@\207") at globus_xio_gsi.c:3169
#1 0x08072051 in globus_i_xio_driver_handle_cntl (context=0x822ea28, start_ndx=0, driver=0x8228d10, cmd=1, ap=0xbffff74c "p÷ÿ¿±@\207") at globus_xio_driver.c:1533
#2 0x0806b9c9 in globus_xio_handle_cntl (handle=0x822c018, driver=0x8228d10, cmd=1) at globus_xio_handle.c:2662
#3 0x0806266e in globus_io_tcp_get_credential (handle=0x822dce8, credential=0xbffff770) at globus_io_xio_compat.c:4230
#4 0x0805679d in globus_l_gram_protocol_free_old_credentials () at globus_gram_protocol_io.c:1706
#5 0x080568ad in globus_gram_protocol_set_credentials (new_credentials=0x8241900) at globus_gram_protocol_io.c:1763
#6 0x0804ef5e in globus_gram_client_set_credentials (new_credentials=0x8241900) at globus_gram_client.c:699
#7 0x0804e4ea in handle_refresh_proxy_from_file ()
#8 0x0804e5b2 in service_commands ()
#9 0x0804e694 in main ()

The NULL value for driver_specific_handle in globus_l_xio_gsi_cntl() is the trigger of the error. I don't know xio well enough to trace back why it's NULL.
Created an attachment (id=1040): Code that provokes the crash
example2.c is the Condor-G gahp server boiled down to the code path that provokes the crash. I can further simplify it by having main() call service_commands() directly and adding a while loop around the switch statement in service_commands() that also calls globus_poll_nonblocking(). That changes where the crash occurs, though I believe the cause is the same.
This looks like the same issue as bug #4620. Please try the patch I attached to that one.
That does the trick for my example code. I'll apply it to our full application and run it for a few days to see what happens.
I'm assuming the patch works and that this is a duplicate of bug 4620, so I have resolved it as such. If that turns out not to be the case, feel free to reopen it.

*** This bug has been marked as a duplicate of 4620 ***