Bugzilla – Bug 3757
Remote exit status 0 ambiguity
Last modified: 2006-03-01 14:43:36
This may be a question or an enhancement request. I am trying to come to grips
with how the remote application's exit code is propagated. It appears to me that
an exit code of 0 from globusrun-ws is ambiguous: it could mean either that the
remote application exited with 0, or that no exit code was available while no
severe error was discovered.
For the GT4rc I used the following ugly kludge: my staged scripts explicitly
exit with code 42 to indicate success. If I get an exit code of 0, I know that
something went wrong. I find the necessity of doing so ugly, and other users
will stumble over this, too.
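The kludge described above can be sketched as follows. This is only an
illustration: `staged_job` stands in for the actual staged script, and the
subshell call stands in for the globusrun-ws invocation on the submit machine.

```shell
#!/bin/sh
# Sketch of the exit-42 kludge (names are illustrative).

# Staged-script side: finish with the sentinel code 42 instead of 0,
# so that a propagated 0 can be recognised as "no real exit code".
staged_job() {
    # ... real application work; exit non-zero on genuine error ...
    exit 42
}

# Submit side: only the sentinel counts as success. A plain 0 is
# treated as "something was lost on the way" and triggers replanning.
( staged_job )
status=$?
if [ "$status" -eq 42 ]; then
    outcome=success
else
    outcome=replan      # includes the ambiguous exit status 0
fi
echo "$outcome"
```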
If you are still puzzled: my staged shell scripts should return non-zero in
case of any error. However, sometimes this does not appear to be propagated
correctly, and grun-ws returns as if nothing had happened. Maybe it has been
fixed in the meantime, and it is not easy to reproduce, but it did happen, or I
wouldn't have had to program my kludge around it.
I'm the first to admit that I don't know how to solve this well. However, if no
remote exit code is available, I prefer *not* to see 0. If there is no remote
exit code, I don't trust that everything went well, and I want to branch into my
"replanning branch". Without my kludge above, an exit code of 0 gives a false or
at least misleading sense of security.
I could imagine a translation mode. Usually, I am quite happy to distinguish
just between success and various flavors of failure. How about a CLI option
that activates a translation mode, e.g.:

    not avail. -> 3

Of course, the grun-ws exit code must still consider local failures.
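The translation mode might look like the sketch below. Only the
"not available -> 3" mapping comes from this report; passing a known remote
code through unchanged is an assumption made here for illustration.

```shell
#!/bin/sh
# Sketch of the proposed translation mode (illustrative only).
translate() {
    remote="$1"          # remote exit code, or empty if none was reported
    if [ -z "$remote" ]; then
        echo 3           # no remote exit code available -> 3
    else
        echo "$remote"   # pass a known remote code through unchanged
    fi
}
```

A client wrapper would then run `translate` on whatever it managed to learn
about the remote job, while local (submit-machine) failures keep their own
distinct codes.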
Another alternative, once signals are incorporated, is to write the Unix wait()
status into a file specified via a CLI option to grun-ws. If the file is empty,
I know that something went wrong (nothing was available). In that case, the
exit status of grun-ws itself deals only with local (submit-machine) failures.
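A consumer of such a status file might look like this. The file name and its
format are assumptions: here the raw integer wait() status is written as a
single line, and an empty file means no status was ever recorded. The
arithmetic mirrors the POSIX WEXITSTATUS/WTERMSIG decoding.

```shell
#!/bin/sh
# Sketch of reading the proposed wait()-status file (assumed format).
STATUSFILE=./remote.status
printf '0\n' > "$STATUSFILE"   # simulate a recorded wait() status of 0

if [ ! -s "$STATUSFILE" ]; then
    outcome=replan             # empty file: nothing recorded, assume the worst
else
    raw=$(cat "$STATUSFILE")
    code=$(( raw >> 8 ))       # like WEXITSTATUS(raw)
    sig=$(( raw & 127 ))       # like WTERMSIG(raw)
    if [ "$sig" -ne 0 ]; then
        outcome="signal-$sig"
    elif [ "$code" -eq 0 ]; then
        outcome=success
    else
        outcome="exit-$code"
    fi
fi
rm -f "$STATUSFILE"
echo "$outcome"
```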
I've updated some script situations where exit codes weren't getting
propagated.
I think all of our scripts handle this properly (pushing the exit code to the
scheduler). If there are any cases where they don't, we can take a look at the
scheduler logs and see if there's a specific script or SEG issue we can debug.