Bugzilla – Bug 4191
globusrun-ws job submission hangs
Last modified: 2006-10-03 09:05:03
Here is a report from Charles and Tony Vu for a TeraGrid install. -Stu

New kind of failure on an SDSC node. The job isn't even being run, unlike the UC and IU cases we saw. The relevant line in the container log appears to be

2006-02-01 16:14:59,887 ERROR exec.RunQueue [Thread-20,run:162] Unable to process state transition.

Charles

Begin forwarded message:

From: Tony Vu <tonyv@sdsc.edu>
Date: February 1, 2006 6:24:31 PM CST
To: Charles Bacon <bacon@mcs.anl.gov>
Subject: Re: globusrun-ws job submission hangs

Yes.. same case if I try /tmp/touched_it. I've attached a fresh container.log file that contains only the messages since the last globusrun-ws I performed after turning on debugging for GRAM and restarting. There are several "User Cancel" request messages in there probably from previous validation jobs I submitted. I'm not sure how to clear those out of the container "queue".

Thanks for your help.

On Feb 1, 2006, at 4:08 PM, Charles Bacon wrote:

On Feb 1, 2006, at 5:57 PM, Tony Vu wrote:

Thanks for the reply, Charles. The file isn't being touched either actually..

tonyv@tg-login1:~> globusrun-ws -submit -F $CONTACT -c /bin/touch touched_it
Submitting job...Done.
Job ID: uuid:cb18a804-937d-11da-ab98-0007e9d81263
Termination time: 02/02/2006 23:52 GMT
Current job state: Unsubmitted
tonyv@tg-login1:~> ls -l ~tonyv/touched_it
ls: /users/tonyv/touched_it: No such file or directory
tonyv@tg-login1:~>
Created an attachment (id=838): Container log
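For anyone going through a container log like the attached one, a quick way to pull out only the ERROR-level entries might look like the sketch below. It assumes every entry follows the layout of the line quoted in the report (date, time, then the level as the third whitespace-separated field); the function name errors_in_log is made up for illustration and is not part of any Globus tooling.

```shell
#!/bin/sh
# Sketch: print only the ERROR-level entries of a container log.
# Assumption: entries look like
#   2006-02-01 16:14:59,887 ERROR exec.RunQueue [Thread-20,run:162] ...
# so the log level is the third whitespace-separated field.
# errors_in_log is a hypothetical helper name.
errors_in_log() {
    awk '$3 == "ERROR"' "$1"
}

# Example use (the path is whatever log file you want to inspect):
#   errors_in_log container.log
```

Multi-line stack traces that follow an ERROR entry would not be matched by this filter, so it is only a starting point for finding where to look in the full log.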
Part of the problem here is that the 4.0.1 code has some error reporting problems. The branch code does a better job of this. We should really get that community branch up and running so we can have the latest bug fixes included. As for the actual problem, it could be anything. I had the same problem with OSG earlier and it was just a permissions problem with the $GLOBUS_LOCATION/tmp directory.
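A permissions check on $GLOBUS_LOCATION/tmp of the kind described above could be sketched roughly as follows. This is only a sketch: it assumes it is run as the same user the container runs as, and check_writable_dir is a hypothetical helper name, not a Globus command. A passing check does not rule out other causes.

```shell
#!/bin/sh
# Sketch: report whether a directory exists and is writable by the current user.
# Assumption: this is run as the account that the container runs under.
# check_writable_dir is a hypothetical helper name.
check_writable_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
    return 0
}

# Run the check only when GLOBUS_LOCATION is set in the environment.
if [ -n "${GLOBUS_LOCATION:-}" ]; then
    check_writable_dir "$GLOBUS_LOCATION/tmp" || echo "check failed for $GLOBUS_LOCATION/tmp"
fi
```

If the check reports a problem, `ls -ld "$GLOBUS_LOCATION/tmp"` shows the owner and mode of the directory for comparison against the container's account.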
Peter - Can you merge the updated ws-gram/service/java and ws-gram/utils/source/java into the globus_4_0_community branch? This is where we'll deliver the TG code from, it would be good to have the better error reporting merged in.
Ok, those two ws-gram sub dirs are merged. Is there any reason we aren't merging the rest of them? I thought that was part of the plan with the community branch.
I'm led to believe this was resolved as an NFS issue. Resolving as INVALID since it appears no code changes affected the outcome.