Bug 2219 - xio_close in an xio callback hangs

Status:    ASSIGNED
Component: XIO (Globus XIO)
Version:   development
Platform:  Other Linux
Priority:  P3 normal

Reported:  2004-11-05 15:32
Modified:  2005-04-02 11:25


Attachments

transport-fixes.diff (2.14 KB, patch) - 2004-12-01 10:13, Joe Bester
patch for globus xio (34.10 KB, patch) - 2005-03-23 02:05, John Bresnahan



------- Description From 2004-11-05 15:32:48 -------
If the C service container crashes while handling a request without sending a
response, the client is never given a fault indicating that something bad has
occurred, and the blocking operation request never returns.

How to duplicate: in a debugger, set a breakpoint in an operation
implementation; when the breakpoint is triggered, kill the process. (Or do bug
#2217 type things to crash the container :)
------- Comment #1 From 2004-11-11 03:42:38 -------
Setting the following envs on the client and sighupping the server during an
invoke gives me a bunch of stack traces. As far as I can tell, the server isn't
writing anything, but the globus_l_soap_message_response_ready_callback func
still gets called with a 200 response code (and result == GLOBUS_SUCCESS). So
the message code is still expecting to be able to register a read, which it
does, but the callback for that register never gets called. So it looks like
the http driver isn't passing along the EOF result from the xio system to the
message callbacks.

GLOBUS_SOAP_MESSAGE_DEBUG='WARN|TRACE|MESSAGES|DEBUG'
GLOBUS_XIO_BUFFER_DEBUG=ALL
GLOBUS_XIO_DEBUG=ALL
GLOBUS_XIO_SYSTEM_DEBUG=ALL

The output on the client after the sighup is:

[globus_l_xio_system_poll] After select
[globus_l_xio_system_handle_read] fd=6, Entering
[globus_l_xio_system_try_read] fd=6, Entering
[globus_l_xio_system_try_read] fd=6, Exiting with error
[globus_l_xio_system_unregister_read] fd=6, Entering
[globus_l_xio_system_unregister_read] fd=6, Exiting
[globus_l_xio_system_handle_read] fd=6, Exiting
[globus_l_xio_system_poll] Exiting
[globus_l_xio_system_kickout] fd=6, Entering
[globus_xio_driver_finished_read] I Entering
[globus_xio_driver_finished_read:1137] Context @ 0x846df44 state change:
    From:GLOBUS_XIO_CONTEXT_STATE_OPEN
    to:  GLOBUS_XIO_CONTEXT_STATE_EOF_RECEIVED
[globus_l_xio_driver_op_read_kickout] I Entering
[globus_l_soap_message_response_ready_callback] Entering
response code: 200
[globus_l_soap_message_response_callback] Entering
[globus_xio_handle_cntl] Entering
[globus_i_xio_driver_handle_cntl] Entering
[globus_l_xio_buffer_cntl] Entering
[globus_l_xio_buffer_cntl] Exiting
[globus_i_xio_driver_handle_cntl] Exiting
[globus_xio_handle_cntl] Exiting
[globus_xio_register_read] Entering
[globus_xio_register_read:2213] Op @ 0x846faa0 ref increased to 1:
[globus_l_xio_register_readv] I Entering
[globus_l_xio_register_readv] : inserting read op @ 0x846faa0
[globus_l_xio_register_readv:1634] Op @ 0x846faa0 ref increased to 2:
[globus_xio_driver_pass_read] I Entering
[globus_l_xio_buffer_read] Entering
[globus_xio_driver_pass_read] I Entering
[globus_xio_driver_pass_read:1057] Op @ 0x846faa0 ref decreased to 5:
[globus_xio_driver_pass_read] I Exiting
[globus_l_xio_buffer_read] Exiting
[globus_xio_driver_pass_read:1057] Op @ 0x846faa0 ref decreased to 4:
[globus_xio_driver_pass_read] I Exiting
[globus_l_xio_register_readv:1644] Op @ 0x846faa0 ref decreased to 3:
[globus_l_xio_register_readv] I Exiting
[globus_xio_register_read] Exiting
[globus_l_soap_message_response_callback] Exiting
[globus_l_soap_message_response_ready_callback] Exiting
[globus_xio_driver_read_delivered] I Entering
[globus_xio_driver_read_delivered:1399] Op @ 0x846f8c0 ref decreased to 1:
[globus_xio_driver_read_delivered:1413] Context @ 0x846df44 state change:
    From:GLOBUS_XIO_CONTEXT_STATE_EOF_RECEIVED
    to:  GLOBUS_XIO_CONTEXT_STATE_EOF_DELIVERED
[globus_xio_driver_read_delivered]: All eof ops delivered
[globus_xio_driver_read_delivered:1439] Context @ 0x846df44 state change:
    From:GLOBUS_XIO_CONTEXT_STATE_EOF_DELIVERED
    to:  GLOBUS_XIO_CONTEXT_STATE_OPEN
[globus_l_xio_driver_purge_read_eof] I Entering
[globus_l_xio_driver_purge_read_eof] I Exiting
[globus_xio_driver_read_delivered] : Context @ 0x846df44 State=2 Count=0
close_start=0
[globus_xio_driver_read_delivered] I Exiting
[globus_l_xio_driver_op_read_kickout] I Exiting
[globus_xio_driver_finished_read] I Exiting
[globus_l_xio_system_kickout] fd=6, Exiting
[globus_l_xio_system_poll] Entering
[globus_l_xio_system_poll] Before select
------- Comment #2 From 2004-11-30 23:45:25 -------
Joe: I'm guessing this is an HTTP driver bug. Can you look at it further and
get back to me if it's not?
------- Comment #3 From 2004-12-01 10:13:38 -------
Created an attachment (id=471): transport-fixes.diff

It looks like there are a couple of issues here. The attached patch gets things
working somewhat better (the error returned from http is no longer ignored),
but the process still hangs. I think what is going on now is that xio_close is
being called from globus_soap_message_handle_destroy() in the response callback
stack before the xio operation would have finished after that callback returns.

I think I'd be more comfortable with the destroy happening in a oneshot (see
the sketch below) instead of changing the xio driver. Is that possible?
Otherwise I'll have to think hard about what the consequences of rearranging
some of the XIO driver code will be.
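A minimal sketch of the oneshot idea, assuming the
globus_callback_register_oneshot() API from globus_common. The function name
globus_l_destroy_oneshot, the message_handle variable, and the exact soap
header name are illustrative, not lifted from the actual WS code:

#include "globus_common.h"
#include "globus_soap_message.h"        /* header name assumed */

static void
globus_l_destroy_oneshot(
    void *                              user_arg)
{
    globus_soap_message_handle_t        message_handle;

    message_handle = (globus_soap_message_handle_t) user_arg;

    /* the destroy (and the xio_close inside it) now runs outside the
     * response callback stack, after xio has unwound */
    globus_soap_message_handle_destroy(message_handle);
}

/* in the response callback, instead of destroying inline: */
globus_callback_register_oneshot(
    NULL,                               /* callback handle not needed */
    NULL,                               /* no delay: run ASAP */
    globus_l_destroy_oneshot,
    message_handle);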
------- Comment #4 From 2004-12-01 12:20:47 -------
You are allowed to do a blocking close in any xio callback. While we normally
don't 'approve' of making blocking calls anywhere, this should still work, and
John B is working on a fix.
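For reference, the pattern in question looks roughly like this (a sketch:
read_callback and its registration are illustrative; the parameters are the
globus_xio_data_callback_t signature from globus_xio.h):

#include "globus_xio.h"

static void
read_callback(
    globus_xio_handle_t                 handle,
    globus_result_t                     result,
    globus_byte_t *                     buffer,
    globus_size_t                       len,
    globus_size_t                       nbytes,
    globus_xio_data_descriptor_t        data_desc,
    void *                              user_arg)
{
    /* blocking close from inside an xio callback: allowed per this
     * comment, but currently it never returns (this bug) */
    globus_xio_close(handle, NULL);
}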
------- Comment #5 From 2005-03-23 02:05:39 -------
Created an attachment (id=545): patch for globus xio
------- Comment #6 From 2005-03-23 02:09:17 -------
I have added a patch for xio that seems to be good. It passes the nonthreaded
test suite without any leaks. However, it is a fairly large change, so it may
be best to wait until after the release.
------- Comment #7 From 2005-03-23 13:02:39 -------
For 4.0 you should probably have a workaround for the WS code. I think if you
just give globus_xio_register_close() a NULL callback instead of using
globus_xio_close() you should be ok.
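Something like this sketch, where handle is whatever xio handle the WS code was
closing (error handling elided):

globus_result_t                         result;

/* nonblocking close: a NULL callback means no completion notification,
 * but the caller never blocks inside the callback stack */
result = globus_xio_register_close(
    handle,
    NULL,                               /* attr */
    NULL,                               /* callback */
    NULL);                              /* user_arg */

/* rather than the blocking form that hangs here:
 * globus_xio_close(handle, NULL);
 */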
------- Comment #8 From 2005-03-25 14:04:00 -------
Committed fixes to the http driver (new bug related to this) and to use the
nonblocking close (as per John's last message). The hang symptom is gone with
these fixes. The client gets an error similar to this:
ERROR:

globus_soap_message_module: Failed receiving response CounterPortType_add.
globus_soap_message_module: SOAP Message transport failed: Error in HTTP response
globus_xio: An end of file occurred

I'm going to pitch this one back to John so that he can resolve the blocking
close XIO problem which this bug revealed.
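For client code that wants to recognize this condition programmatically, given
the globus_result_t from the failed operation, something like the following
should work (a sketch; I believe globus_xio_error_is_eof() is the relevant
predicate in the xio error API, but treat that as an assumption):

if (result != GLOBUS_SUCCESS &&
    globus_xio_error_is_eof(result))    /* assumed xio error predicate */
{
    /* server closed the connection without sending a response */
}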