Bugzilla – Bug 6028
globusrun-ws fails in termination in special situation
Last modified: 2008-04-30 15:53:29
I get the following error when I submit an interactive job using globusrun-ws, shut down the container on the server side after the job has started, and restart the container immediately. The CleanUp and Done notifications are sent by the restarted container.

    [martin@osg-test1 ~]$ globusrun-ws -submit -c /bin/sleep 30
    Submitting job...Done.
    Job ID: uuid:60930abc-0fa7-11dd-95bc-0013d4c3b957
    Termination time: 04/21/3008 13:32 GMT
    Current job state: Pending
    Current job state: Active
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Failed.
    globusrun-ws: Unable to destroy job:
    globus_i_kill.c::88:
    Error destroying job
    globus_soap_client_request.c::696:
    Failed sending request http://www.globus.org/namespaces/2008/03/gram/job/terminate.
    globus_i_xio_system_common.c:globus_i_xio_system_try_writev:539:
    System error in writev: Broken pipe
    globus_xio: A system call failed: Broken pipe

There are no faults in the server-side container log. It works fine if I use batch mode for submission and kill the job later.
I've made three changes: globus_xio's http driver now reports persistent connection drop failures when they occur, globus_c_ws_messaging has a new attribute to automatically retry when those faults occur, and globusrun-ws now uses that attribute. It's committed to trunk.
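For readers unfamiliar with the failure mode, here is a minimal sketch of the retry idea described above. It is written in Python for brevity and is not the Globus code: `RetryingClient`, `FakeConnection`, and the `retry_on_drop` flag are hypothetical names invented for illustration, and the restarted container is simulated by a generation counter. The assumption is that a request sent over a reused connection fails with a broken pipe after the server restarts, and that reconnecting once and resending is enough to recover.

```python
class FakeConnection:
    """Hypothetical stand-in for a persistent connection to a container."""

    def __init__(self, server):
        self._server = server
        # Remember which server "incarnation" this connection belongs to.
        self._generation = server["generation"]

    def send(self, payload):
        # A connection opened before a restart is stale: writing to it fails.
        if self._generation != self._server["generation"]:
            raise BrokenPipeError("Broken pipe")
        return "ok: " + payload


class RetryingClient:
    """Hypothetical client that reuses one connection across requests."""

    def __init__(self, connect, retry_on_drop=True):
        self._connect = connect              # callable returning a new connection
        self._retry_on_drop = retry_on_drop  # plays the role of the retry attribute
        self._conn = connect()

    def request(self, payload):
        try:
            return self._conn.send(payload)
        except (BrokenPipeError, ConnectionResetError):
            if not self._retry_on_drop:
                raise
            # The persistent connection was dropped: reconnect and resend once.
            self._conn = self._connect()
            return self._conn.send(payload)


if __name__ == "__main__":
    server = {"generation": 0}
    client = RetryingClient(lambda: FakeConnection(server))
    print(client.request("submit"))
    server["generation"] += 1  # simulate a container restart
    print(client.request("terminate"))
```

With `retry_on_drop=False` the second request raises `BrokenPipeError`, which corresponds to the behaviour reported in this bug. Retrying only once keeps the sketch from looping forever if the server is genuinely down.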