Bugzilla – Bug 2524
Bad invalid script response error
Last modified: 2005-01-27 13:19:38
Submission ID: uuid:ba911c50-6007-11d9-bf6b-8c05ad099d09
WAITING FOR JOB TO FINISH
========== State Notification ==========
Job State: Failed
Exit Code: 0
fault type: org.globus.exec.generated.FaultType:
org.globus.exec.generated.FaultType: The job manager detected an invalid script
Timestamp: Thu Jan 06 09:24:12 PST 2005
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
The job manager detected an invalid script response
All the important parts of the real error are missing. For the user, the most
important piece of information is what the scheduler said, so they can correct
their RSL. For example, if the submit failed because the user is not
allowed to submit to that queue, and the scheduler outputs an error saying so, it
should be shown to the user.
Also the timestamp is in an ugly format.
I think this is an important issue we ought to fix soon (before 3.9.5?)
so we don't get people asking us what's wrong whenever a job fails!
1) Saying that the script failed leaks an implementation detail to the client/user.
This message should only appear in a log.
2) As Mats points out, we must be able to tell the user/client side,
within the error XML structure, whether what failed was:
a) the submission (i.e. enqueuing the job description to the scheduler
with whatever parameters were translated from RSL).
b) the job application itself (i.e. the "/bin/echo Hello" etc...)
If a), then we must provide a description of which parameter of the job
description, or which scheduler policy, was violated.
- "the user is not allowed to submit to queue xyz"
- "max wall time supplied for this job goes beyond possible value range"
As the submission command spits back the submission error(s) right
away, we should parse its stderr in order to extract the failure information,
or, if too complex for now (involves dedicated processing depending on the
scheduler/submission command), we could at least put the error message
in the complex XML structure sent back inside the SOAP error.
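The simpler option above (forward the submit command's stderr without scheduler-specific parsing) could be sketched roughly as follows. This is an illustration in Python, not the GRAM code: `SubmitResult` and `run_submit` are hypothetical names, and a small Python one-liner stands in for a failing `qsub` so the sketch runs without a scheduler.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubmitResult:
    ok: bool
    stage: Optional[str]  # "submission" when the scheduler rejected the job
    detail: str           # scheduler stderr, passed through verbatim


def run_submit(command: List[str]) -> SubmitResult:
    """Run a scheduler submit command and keep its stderr for the client."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode == 0:
        return SubmitResult(ok=True, stage=None, detail="")
    # No scheduler-specific parsing: the raw message is forwarded as-is.
    return SubmitResult(ok=False, stage="submission", detail=proc.stderr.strip())


if __name__ == "__main__":
    # Stand-in for a failing submit command (assumption: non-zero exit on error).
    fake_qsub = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('qsub: Unknown queue\\n'); sys.exit(1)",
    ]
    print(run_submit(fake_qsub))
```

Tagging the result with a `stage` keeps case a) distinguishable from case b) on the client side, even before any per-scheduler parsing exists.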
I agree, this should be fixed for 3.9.5. The perl submit routine traps the
output from the scheduler submission attempts, so I think it is just a matter
of figuring out how this error detail is passed along in an exception that can
then be output by globusrun-ws.
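As a rough illustration of carrying the error detail along in an exception, here is a minimal Python sketch. `SubmissionError` and `format_for_user` are hypothetical names invented for this example; they are not part of GRAM or globusrun-ws, and the message wording is an assumption.

```python
class SubmissionError(Exception):
    """Hypothetical exception carrying the scheduler's own message."""

    def __init__(self, summary: str, scheduler_output: str = ""):
        super().__init__(summary)
        self.summary = summary
        self.scheduler_output = scheduler_output.strip()


def format_for_user(err: SubmissionError) -> str:
    """Build the text a client could print for a failed submission."""
    if err.scheduler_output:
        return f"{err.summary}\nScheduler said: {err.scheduler_output}"
    # The submit script may have failed without producing any output.
    return f"{err.summary}\n(no output was captured from the scheduler)"


if __name__ == "__main__":
    try:
        raise SubmissionError("Job submission failed", "qsub: Unknown queue\n")
    except SubmissionError as err:
        print(format_for_user(err))
```

Keeping the scheduler text in its own field, separate from the summary, lets the client decide how much to show.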
I thought the code was already fetching the script errors. It's possible that
the script crashed and left no output. I'll double check that the code is doing
the right thing with the
Testing using an invalid queue name with PBS:
[rynge@devrandom tmp]$ globusrun-ws -submit -J -S -factory
-factory-type PBS -job-description-file test.xml
Delegating user credentials...Done.
Job ID: uuid:ea88d150-6016-11d9-81bc-0008744f939a
Termination time: 01/07/2005 19:12 GMT
Current job state: Failed
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: The executable could not be started.
Using m-j-g gives the same error as the first comment in the bug report.
I added a statement to lib/perl/Globus/GRAM/JobManager/pbs.pm to copy the job
description file to /tmp. Using that description and qsub gives me the right error:
globus@viz-9:~$ qsub /tmp/bla.27692
qsub: Unknown queue
Using another cp statement for the error file, I made sure that the perl script
catches the right error:
globus@viz-9:~$ cat /tmp/err.27997
qsub: Unknown queue
I committed an untested fix to the trunk. Mats has a PBS installation that he
is going to test with, so I am reassigning it to him for testing.
I'm just going to close this bug since it blocks #2620. Please reopen it if
your tests still fail or mark it verified if they pass. Thanks.