Bug 3757 - Remote exit status 0 ambiguity

Status:           RESOLVED
Product:          GRAM
Component:        wsrf gram clients
Version:          development
Hardware:         All
OS:               All
Priority:         P3
Severity:         enhancement
Target Milestone: 4.0.2
Assigned To:
Reported:         2005-09-16 10:54
Modified:         2006-03-01 14:43


Description From 2005-09-16 10:54:53
This may be a question or an enhancement request. I am trying to come to grips
with how the remote application's exit code is propagated. It appears to me that
an exit code of 0 from globusrun-ws is ambiguous: it could mean either that the
remote application exited with 0, or that no exit code was available while no
severe error was discovered.

For the GT4rc I resorted to the following ugly kludge: my staged scripts
explicitly exit with code 42 to indicate success. If I get an exit code of 0, I
know that something went wrong. I find the necessity of doing this ugly, and
other users will stumble over it, too.
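
For illustration, here is a minimal sketch of that kludge in shell; the script
name, the payload command, and the submission options are placeholders, not my
actual setup:

    #!/bin/sh
    # staged-job.sh (hypothetical name) -- runs on the remote side.
    # Exit with 42 on success so that a propagated 0 can be treated as
    # "no real exit code made it back".
    if do_real_work; then   # do_real_work stands in for the actual payload
        exit 42             # explicit success marker
    else
        exit 1              # genuine failure
    fi

On the submit machine, the caller then treats anything but 42 as suspect:

    # Submission options elided; assume the job runs staged-job.sh remotely.
    globusrun-ws -submit ...
    rc=$?
    if [ "$rc" -eq 42 ]; then
        echo "remote job really succeeded"
    else
        echo "failure, or the exit code was lost (rc=$rc)"   # replanning branch
    fi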

If you are still puzzled: my staged shell scripts should return a non-zero
status in case of any error. However, sometimes this does not appear to be
propagated correctly, and grun-ws returns as if nothing had happened. Maybe it
has been fixed in the meantime, and it is not easy to reproduce, but it did
happen, or I would not have had to program my kludge around it.

I'm the first to admit that I don't know how to solve this well.  However, if no
remote exit code is available, I prefer *not* to see 0. If there is no remote
exit code, I don't trust that everything went well, and I want to branch into my
"replanning branch". Without my kludge above, an exit code of 0 gives a false or
at least misleading sense of security.  

I could imagine a translation mode. Usually I am quite happy to distinguish
just between success and various flavors of failure. How about a CLI option
which activates such a translation mode:

remote exit status     grun-ws exit code
0                      0
non-zero status        1
killed by signal       2
not available          3

Of course, the grun-ws exit code must still reflect local failures.
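
To make the intent concrete, this is how a caller might consume such a
translation mode; the flag is purely hypothetical, no such option exists in
globusrun-ws today:

    # Hypothetical translation mode -- the flag mentioned below is made up
    # and only illustrates the proposed mapping of exit codes.
    globusrun-ws -submit ...   # imagine an added "-translate-status" flag here
    case $? in
        0) echo "remote job exited with 0" ;;
        1) echo "remote job exited with a non-zero status" ;;
        2) echo "remote job was killed by a signal" ;;
        3) echo "no remote exit status available -- replanning" ;;
        *) echo "local (submit machine) failure" ;;
    esac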

Another alternative, once signals are incorporated, would be to write the Unix
wait() status into a file specified as a CLI option to grun-ws. If the file is
empty, I know that something went wrong (no status available). In that case,
the exit status of grun-ws itself would only cover local (submit machine)
failures.
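
Sketched from the caller's point of view (the status file path and the way it
would be passed to grun-ws are assumptions; no such option exists yet):

    # Hypothetical: grun-ws writes the raw wait() status into a file whose
    # name is given on the command line (no such option exists yet).
    STATUS_FILE=/tmp/job.$$.status
    globusrun-ws -submit ...   # plus a hypothetical option naming "$STATUS_FILE"
    rc=$?                      # rc would then cover only submit-side failures

    if [ ! -s "$STATUS_FILE" ]; then
        echo "no remote status recorded -- something went wrong, replanning"
    else
        echo "remote wait() status: $(cat "$STATUS_FILE")"
    fi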
------- Comment #1 From 2006-03-01 14:43:36 -------
I've updated some script cases where exit codes weren't getting propagated.
I think all of our scripts now handle this properly (pushing the exit code to
the scheduler). If there are any cases where they don't, we can take a look at
the scheduler logs and see whether there is a specific script or SEG issue we
can debug.