Bug 4191 - globusrun-ws job submission hangs
Summary: globusrun-ws job submission hangs
Status: RESOLVED INVALID
Product: GRAM
Component: wsrf managed execution job service
Version: 4.0.1
Platform: Macintosh All
Importance: P3 normal
Target Milestone: 4.2
Assigned To:
Reported: 2006-02-02 00:19 by
Modified: 2006-10-03 09:05


Attachments
Container log (128.12 KB, text/plain)
2006-02-02 00:22, Stuart Martin




Description From 2006-02-02 00:19:20
Here is a report from Charles and Tony Vu for a TeraGrid install.

-Stu

New kind of failure on an SDSC node.  The job isn't even being run, unlike the UC and IU cases we saw.
The relevant line in the container log appears to be:
2006-02-01 16:14:59,887 ERROR exec.RunQueue [Thread-20,run:162] Unable to process state transition.

Charles

Begin forwarded message:

From: Tony Vu <tonyv@sdsc.edu>
Date: February 1, 2006 6:24:31 PM CST
To: Charles Bacon <bacon@mcs.anl.gov>
Subject: Re: globusrun-ws job submission hangs

Yes.. same case if I try /tmp/touched_it.

I've attached a fresh container.log file that contains only the messages since the last
globusrun-ws I performed after turning on debugging for GRAM and restarting.  There are several
"User Cancel" request messages in there, probably from previous validation jobs I submitted.
I'm not sure how to clear those out of the container "queue".

Thanks for your help.
On Feb 1, 2006, at 4:08 PM, Charles Bacon wrote:

On Feb 1, 2006, at 5:57 PM, Tony Vu wrote:

Thanks for the reply, Charles.  The file isn't actually being touched either.

tonyv@tg-login1:~> globusrun-ws -submit -F $CONTACT -c /bin/touch touched_it
Submitting job...Done.
Job ID: uuid:cb18a804-937d-11da-ab98-0007e9d81263
Termination time: 02/02/2006 23:52 GMT
Current job state: Unsubmitted

tonyv@tg-login1:~> ls -l ~tonyv/touched_it
ls: /users/tonyv/touched_it: No such file or directory
tonyv@tg-login1:~>
------- Comment #1 From 2006-02-02 00:22:35 -------
Created an attachment (id=838)
Container log
------- Comment #2 From 2006-02-02 00:47:35 -------
Part of the problem here is that the 4.0.1 code has some error reporting
problems. The branch code does a better job of this. We should really get that
community branch up and running so we can have the latest bug fixes included.

As for the actual problem, it could be anything. I had the same problem with
OSG earlier and it was just a permissions problem with the
$GLOBUS_LOCATION/tmp directory.
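For the permissions case, a quick check along these lines may help rule it out. This is only a sketch: the function name check_writable_dir is made up for illustration, and it assumes the check is run as the same account that runs the container. It creates and removes a small probe file rather than relying on the permission bits alone, since the two can disagree on network filesystems.

```shell
# Hypothetical helper (not part of the Globus Toolkit): report whether a
# directory exists and whether the current user can really write to it.
check_writable_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    # Try an actual write; run in a subshell so a failed redirection
    # cannot abort the calling shell.
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "ok: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}
```

Usage, assuming GLOBUS_LOCATION is set in the environment: check_writable_dir "$GLOBUS_LOCATION/tmp"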
------- Comment #3 From 2006-02-02 16:13:42 -------
Peter -

Can you merge the updated ws-gram/service/java and ws-gram/utils/source/java into the
globus_4_0_community branch?  This is where we'll deliver the TG code from, so it would be good
to have the better error reporting merged in.
------- Comment #4 From 2006-02-02 16:56:08 -------
Ok, those two ws-gram sub dirs are merged. Is there any reason we aren't
merging the rest of them? I thought that was part of the plan with the
community branch.
------- Comment #5 From 2006-10-03 09:05:03 -------
I'm led to believe this was resolved as an NFS issue. Resolving as INVALID
since it appears no code changes affected the outcome.