Bugzilla – Bug 3449
ERROR container.GSIServiceThread
Last modified: 2005-06-22 17:16:29
During intense workflows, I occasionally find the following error message in my container.log (see: http://griodine.uchicago.edu/ivdgl1/400x1x7200/run0004/):

2005-06-01 13:20:12,132 ERROR container.GSIServiceThread [Thread-80,process:120] Error processing request
java.io.EOFException
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:56)
    at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:60)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:122)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:142)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:160)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:91)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:300)

While the workflow runs to successful completion, the above message may be an indication of some untimely delays we are seeing. I cannot tell whether the client was globusrun-ws or Condor-G, though. The Condor-G client logs are encapsulated inside GridmanagerLog.voeckler. The globusrun-ws messages are unfortunately distributed over */ID*.dbg. The GT checkout is from last Friday.
Should this bug's priority be escalated? If this is the reason for the occasional client delay that Jens gets when running his workflow tests, then I think it should be. The background is that roughly 1 out of 10 jobs takes more than 5 minutes to execute. For one of the jobs we were able to see the message delivered to the client side (by turning on log4j MessageLoggingHandler=DEBUG), but the notification did not seem to be recognized or received by the client. Thoughts?
I understand from Peter that he is experimenting with a new Axis jar Jarek is working on that might fix this issue (the jar basically fixes issues that arise when a large number of notifications are sent). Both Peter and Ravi report these exceptions with large numbers of notifications. Once I hear from Peter, I'll look into it if there are more issues.
Rachana, what are you waiting to get from me? I thought I had said on the mud that I had not seen this since the axis.jar update. Should this be reassigned to Jarek?
I understand from Jarek that the jar has been committed to trunk and the 4.0 branch. If the jar update fixes it, we can close the bug. Jens, can you update to the latest jar and confirm that this fixes the issue for you?
Btw, we were debugging this problem with Peter. I am almost certain it is caused by a timeout exception on the other side of the connection. However, I was not able to pinpoint exactly why the timeouts happened in the first place. This might still need to be debugged further.
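To illustrate the suspected mechanism, here is a minimal sketch of how a peer that gives up and closes its socket before sending a handshake token shows up as a java.io.EOFException on the reading side. This is not the Globus code: the class name EofOnPeerClose, the readToken helper, and the 4-byte length framing are all assumptions made up for the illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EofOnPeerClose {

    // Hypothetical stand-in for a handshake-token reader: a 4-byte length
    // header followed by that many bytes. NOT the actual GSIGssInputStream logic.
    static byte[] readToken(DataInputStream in) throws IOException {
        int length = in.readInt();   // throws EOFException if the stream ends first
        byte[] token = new byte[length];
        in.readFully(token);         // also throws EOFException on a short read
        return token;
    }

    // Returns true if the server side hit an EOFException after the client
    // closed the connection without sending any token.
    static boolean serverSeesEofWhenClientGivesUp() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // The client connects and then closes right away, standing in for
            // a client that timed out and abandoned the connection.
            Socket client = new Socket(loopback, server.getLocalPort());
            client.close();

            try (Socket accepted = server.accept();
                 DataInputStream in = new DataInputStream(accepted.getInputStream())) {
                readToken(in);
                return false;        // a token arrived, which is not expected here
            } catch (EOFException e) {
                return true;         // the peer was gone before a token arrived
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("server saw EOFException: " + serverSeesEofWhenClientGivesUp());
    }
}
```

If the container's EOFException has the same origin, the server-side trace alone says only that the peer went away; the reason for the timeout would have to come from the client logs.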
I was given to understand that the Axis update (and hence the fixes to delays in notifications) fixed this issue. I am reopening the bug, but is this something to investigate with core/notifications or the security framework?
I do not know where the problem is (if there really is one), but I don't think it's a security-specific issue. This error might be perfectly OK; it just needs to be investigated more to understand the real issue. The Axis update might have just hidden the issue a bit more.