Bugzilla – Bug 2219
xio_close in an xio callback hangs
Last modified: 2005-04-02 11:25:04
If the C service container crashes while handling a request without sending a response, the client never receives a fault indicating that something went wrong, and the blocking operation request never returns. How to duplicate: in a debugger, break in an operation implementation; when the breakpoint is triggered, kill the process. (Or do bug #2217 type things to crash the container :)
Setting the following envs on the client and sighupping the server during an invoke gives me a bunch of stack traces. As far as I can tell, the server isn't writing anything, but the globus_l_soap_message_response_ready_callback func still gets called with a 200 response code (and result == GLOBUS_SUCCESS). So the message code is still expecting to be able to register read, which it does, but the callback for that register never gets called. So it looks like the http driver isn't passing along the EOF result from the xio system to the message callbacks.

GLOBUS_SOAP_MESSAGE_DEBUG='WARN|TRACE|MESSAGES|DEBUG'
GLOBUS_XIO_BUFFER_DEBUG=ALL
GLOBUS_XIO_DEBUG=ALL
GLOBUS_XIO_SYSTEM_DEBUG=ALL

The output on the client after the sighup is:

[globus_l_xio_system_poll] After select
[globus_l_xio_system_handle_read] fd=6, Entering
[globus_l_xio_system_try_read] fd=6, Entering
[globus_l_xio_system_try_read] fd=6, Exiting with error
[globus_l_xio_system_unregister_read] fd=6, Entering
[globus_l_xio_system_unregister_read] fd=6, Exiting
[globus_l_xio_system_handle_read] fd=6, Exiting
[globus_l_xio_system_poll] Exiting
[globus_l_xio_system_kickout] fd=6, Entering
[globus_xio_driver_finished_read] I Entering
[globus_xio_driver_finished_read:1137] Context @ 0x846df44 state change: From:GLOBUS_XIO_CONTEXT_STATE_OPEN to: GLOBUS_XIO_CONTEXT_STATE_EOF_RECEIVED
[globus_l_xio_driver_op_read_kickout] I Entering
[globus_l_soap_message_response_ready_callback] Entering
response code: 200
[globus_l_soap_message_response_callback] Entering
[globus_xio_handle_cntl] Entering
[globus_i_xio_driver_handle_cntl] Entering
[globus_l_xio_buffer_cntl] Entering
[globus_l_xio_buffer_cntl] Exiting
[globus_i_xio_driver_handle_cntl] Exiting
[globus_xio_handle_cntl] Exiting
[globus_xio_register_read] Entering
[globus_xio_register_read:2213] Op @ 0x846faa0 ref increased to 1:
[globus_l_xio_register_readv] I Entering
[globus_l_xio_register_readv] : inserting read op @ 0x846faa0
[globus_l_xio_register_readv:1634] Op @ 0x846faa0 ref increased to 2:
[globus_xio_driver_pass_read] I Entering
[globus_l_xio_buffer_read] Entering
[globus_xio_driver_pass_read] I Entering
[globus_xio_driver_pass_read:1057] Op @ 0x846faa0 ref decreased to 5:
[globus_xio_driver_pass_read] I Exiting
[globus_l_xio_buffer_read] Exiting
[globus_xio_driver_pass_read:1057] Op @ 0x846faa0 ref decreased to 4:
[globus_xio_driver_pass_read] I Exiting
[globus_l_xio_register_readv:1644] Op @ 0x846faa0 ref decreased to 3:
[globus_l_xio_register_readv] I Exiting
[globus_xio_register_read] Exiting
[globus_l_soap_message_response_callback] Exiting
[globus_l_soap_message_response_ready_callback] Exiting
[globus_xio_driver_read_delivered] I Entering
[globus_xio_driver_read_delivered:1399] Op @ 0x846f8c0 ref decreased to 1:
[globus_xio_driver_read_delivered:1413] Context @ 0x846df44 state change: From:GLOBUS_XIO_CONTEXT_STATE_EOF_RECEIVED to: GLOBUS_XIO_CONTEXT_STATE_EOF_DELIVERED
[globus_xio_driver_read_delivered]: All eof ops delivered
[globus_xio_driver_read_delivered:1439] Context @ 0x846df44 state change: From:GLOBUS_XIO_CONTEXT_STATE_EOF_DELIVERED to: GLOBUS_XIO_CONTEXT_STATE_OPEN
[globus_l_xio_driver_purge_read_eof] I Entering
[globus_l_xio_driver_purge_read_eof] I Exiting
[globus_xio_driver_read_delivered] : Context @ 0x846df44 State=2 Count=0 close_start=0
[globus_xio_driver_read_delivered] I Exiting
[globus_l_xio_driver_op_read_kickout] I Exiting
[globus_xio_driver_finished_read] I Exiting
[globus_l_xio_system_kickout] fd=6, Exiting
[globus_l_xio_system_poll] Entering
[globus_l_xio_system_poll] Before select
Joe: I'm guessing this is an HTTP driver bug. Can you look at it further and get back to me if it's not?
Created an attachment (id=471) [details] transport-fixes.diff

It looks like there are a couple of issues here. The attached patch gets things working somewhat better (the error returned from http is no longer being ignored) but the process still hangs. I think what is going on now is that xio_close is being called from globus_soap_message_handle_destroy() in the response callback stack, before the xio operation has finished; the operation only finishes after that callback returns. I think I'd be more comfortable with the destroy happening in a oneshot instead of changing the xio driver. Is that possible? Otherwise I'll have to think hard about what the consequences of rearranging some of the XIO driver code will be.
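The ordering problem suspected in the comment above can be illustrated with a small toy model. This is only a sketch under assumptions: every type and function name below (toy_handle_t, toy_close_blocking, toy_close_oneshot, toy_deliver_read, toy_run) is hypothetical and is not part of Globus XIO, and the model assumes that an operation counts as outstanding until its callback has returned. Instead of actually hanging, the blocking close returns false where a real blocking call would wait forever.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names are Globus XIO APIs. */
typedef struct {
    int  pending_ops;     /* operations not yet marked finished */
    bool closed;
    bool close_deferred;  /* a oneshot close has been queued */
} toy_handle_t;

/* A blocking close can only complete once no operation is pending.
 * Returns false where a real blocking call would wait forever. */
static bool toy_close_blocking(toy_handle_t *h)
{
    if (h->pending_ops > 0) {
        return false;     /* would hang: the pending op cannot drain */
    }
    h->closed = true;
    return true;
}

/* Deferred close: only record the request; it runs after the callback. */
static void toy_close_oneshot(toy_handle_t *h)
{
    h->close_deferred = true;
}

typedef void (*toy_callback_t)(toy_handle_t *h, bool *close_ok);

/* Deliver a read completion. The op stays pending until the callback
 * has returned (the assumption this model is built on). */
static void toy_deliver_read(toy_handle_t *h, toy_callback_t cb, bool *close_ok)
{
    cb(h, close_ok);
    h->pending_ops--;            /* op is finished only now */
    if (h->close_deferred) {     /* run the queued oneshot work */
        h->close_deferred = false;
        *close_ok = toy_close_blocking(h);
    }
}

/* Callback that closes inline, while its own op is still pending. */
static void cb_close_inline(toy_handle_t *h, bool *close_ok)
{
    *close_ok = toy_close_blocking(h);
}

/* Callback that defers the close to a oneshot. */
static void cb_close_deferred(toy_handle_t *h, bool *close_ok)
{
    (void) close_ok;
    toy_close_oneshot(h);
}

/* Returns whether the close completed for the chosen strategy. */
static bool toy_run(bool defer)
{
    toy_handle_t h = { 1, false, false };
    bool close_ok = false;
    toy_deliver_read(&h, defer ? cb_close_deferred : cb_close_inline, &close_ok);
    return close_ok && h.closed;
}
```

In this model, toy_run(false) reports that the inline close could not complete, while toy_run(true) completes because the close runs only after the callback has returned and the operation has been marked finished.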
You are allowed to do a blocking close in any xio callback. While we normally don't 'approve' of making blocking calls anywhere, this should still work and John B is working on a fix.
Created an attachment (id=545) [details] patch for globus xio
I have added a patch for xio that seems to be good. It passes the nonthreaded test suite without any leaks. However, it is a fairly large change, so it may be best to wait until after the release.
For 4.0 you should probably have a workaround for the WS code. I think if you just give globus_xio_register_close() a NULL callback instead of using globus_xio_close() you should be ok.
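The shape of that workaround (register a nonblocking close with no completion callback and return immediately) can be sketched with a toy model. This is an illustration under assumptions, not the Globus XIO implementation: toy_handle_t, toy_register_close, toy_poll and toy_close_from_callback are hypothetical stand-ins, and the real globus_xio_register_close() takes different arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the Globus XIO types. */
typedef struct toy_handle_s toy_handle_t;
typedef void (*toy_close_cb_t)(toy_handle_t *h, void *user_arg);

struct toy_handle_s {
    bool           close_pending;
    bool           closed;
    toy_close_cb_t close_cb;   /* may be NULL */
    void          *user_arg;
};

/* Nonblocking close: record the request and return right away.
 * Returns 0 on success, -1 if a close was already requested or done. */
static int toy_register_close(toy_handle_t *h, toy_close_cb_t cb, void *user_arg)
{
    if (h->closed || h->close_pending) {
        return -1;
    }
    h->close_pending = true;
    h->close_cb = cb;
    h->user_arg = user_arg;
    return 0;
}

/* One turn of the event loop: finish a pending close, notify if asked. */
static void toy_poll(toy_handle_t *h)
{
    if (!h->close_pending) {
        return;
    }
    h->close_pending = false;
    h->closed = true;
    if (h->close_cb != NULL) {   /* a NULL callback means fire and forget */
        h->close_cb(h, h->user_arg);
    }
}

/* What a caller inside another callback would do: no waiting, no callback. */
static int toy_close_from_callback(toy_handle_t *h)
{
    return toy_register_close(h, NULL, NULL);
}
```

The point of the design is that the caller never waits inside a callback: the close is only recorded, and the event loop completes it on a later turn.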
Committed fixes to the http driver (new bug related to this) and a change to use the nonblocking close (as per John's last message). The hang symptom is gone with these fixes. The client gets an error similar to this:

ERROR: globus_soap_message_module: Failed receiving response CounterPortType_add.
globus_soap_message_module: SOAP Message transport failed: Error in HTTP response
globus_xio: An end of file occurred

I'm going to pitch this one back to John so that he can resolve the blocking close XIO problem which this bug revealed.