Bug 4360 - globus-job-get-output bug prevents output delivery, PBS jobmanager affected. See also globus-job-clean, globus-job-cancel
Status: RESOLVED LATER
Product: GRAM
Component: gt2 Gram client
Version: 4.0.1
Platform: All All
Importance: P3 normal
Target Milestone: 4.2

Reported: 2006-04-19 19:31
Modified: 2012-09-12 13:24


Attachments
patch to gram/jobmanager/source directory (3.15 KB, patch)
2006-05-10 07:59, Joe Bester

Description From 2006-04-19 19:31:41
The job submission line in the globus-job-get-output script has an error that
causes $GLOBUS_LOCATION to be evaluated in the context of the system the
gatekeeper is running on rather than the target compute resource - I edited as
follows to correct it:

$ diff globus-job-get-output.orig globus-job-get-output
184c184
<     "&(executable=\$(GLOBUS_LOCATION)/bin/globus-sh-exec)(arguments=-exec \"file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://${jobid}${stream}\`; if test -r \"\"\$file\"\" ; then ${command} \$file ; else echo Invalid job id. 1>&2; fi\")"
---
>     "&(executable=\"\${GLOBUS_LOCATION}/bin/globus-sh-exec\")(arguments=-exec \"file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://${jobid}${stream}\`; if test -r \"\"\$file\"\" ; then ${command} \$file ; else echo Invalid job id. 1>&2; fi\")"

To get this to work with our PBS jobmanagers, I also had to change the
following line so that executables specified with a leading environment
variable reference do not get ./ prefixed to them:

# Original condition: if ($description->executable() =~ m|^[^/]|)
# Changed to:

    if ($description->executable() =~ m|^[^/]| && $description->executable() =~ m|^[^\$]|)
    {
        $description->add('executable', './' . $description->executable());
    }
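
The effect of the modified condition can be sketched in shell. This is an
illustration only, not part of the job manager: the function name
needs_dot_slash is hypothetical, and it simply mirrors the two Perl patterns
above (prefix ./ only when the executable starts with neither / nor $).

```shell
#!/bin/sh
# Hypothetical helper mirroring the patched Perl test:
# succeed (exit 0) only if "./" should be prefixed to the executable.
needs_dot_slash() {
    case "$1" in
        ''|/*|'$'*) return 1 ;;   # empty, absolute path, or leading $: leave as-is
        *)          return 0 ;;   # relative name: prefix with ./
    esac
}

# Show what each kind of executable string would become.
for exe in a.out /bin/date '${GLOBUS_LOCATION}/bin/globus-sh-exec'; do
    if needs_dot_slash "$exe"; then
        printf '%s\n' "./$exe"
    else
        printf '%s\n' "$exe"
    fi
done
```

Only the relative name gets the ./ prefix; the absolute path and the
variable reference are passed through unchanged.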
------- Comment #1 From 2006-04-21 17:34:23 -------
globus-job-cancel and globus-job-clean (identical scripts except in name) also
suffer from the same problem and should be patched:

# diff globus-job-clean.orig globus-job-clean.patched 
202c202
<       myrsl="&(executable=\$(GLOBUS_LOCATION)/bin/globus-sh-exec)(arguments=-exec \"bad=0; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stdout >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi ; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stderr  >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi; echo \$bad;\")"
---
>       myrsl="&(executable=\"\${GLOBUS_LOCATION}/bin/globus-sh-exec\")(arguments=-exec \"bad=0; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stdout >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi ; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stderr  >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi; echo \$bad;\")"
------- Comment #2 From 2006-04-25 08:09:25 -------
Can you explain how this patch works? It looks to me like it is switching the
executable's path from being resolved using the GLOBUS_LOCATION RSL
substitution in the original code, to using the GLOBUS_LOCATION environment
variable in the job's environment in the modified code. I think that the
GLOBUS_LOCATION environment variable and RSL substitution are both set based on
the -home argument to the job manager, so I'm wondering why this is helping.

joe
------- Comment #3 From 2006-04-25 13:55:03 -------
Here's the scenario: User, Gatekeeper and target computing system all  
have different GLOBUS_LOCATION

On the user workstation,
GLOBUS_LOCATION=/usr/local/globus/globus-4.0.1

On the gatekeeper system,
GLOBUS_LOCATION=/usr/local/globus/globus-4.0.1-r3

On the target computing system,
GLOBUS_LOCATION=/usr/local/packages/tg/globus-4.0.1-r3

Using globus-job-get-output from the distribution, I get the  
following error returned:

[dsimmel@kaminari ~]$ globus-job-get-output.orig -r gt4-submit.psc.teragrid.org/jobmanager-rachel-pbs -out $myjob

DATE:              Tue Apr 25 14:07:43 2006
PBS JOB ID:        51592
$LOCAL:            /carson64a/local/51592
Execution host:    carson64a
Current directory: /usr/users/0/dsimmel
/var/spool/OpenPBS/mom_priv/jobs/51592.rache.SC: /usr/local/globus/globus-4.0.1-r3/bin/globus-sh-exec: not found

- - - - -

The DATE, PBS JOB ID, $LOCAL, Execution host, and Current directory  
lines are returned by the compute platform (rachel) for every job  
submitted.

The error reflects interpretation of $(GLOBUS_LOCATION) in the RSL  
submitted in the context of the gatekeeper system, rather than the  
target compute platform, which is where the commands need to execute.  
This despite the fact that we force GLOBUS_LOCATION in our jobmanager  
script to match the path on the target computing platform. The PBS  
script that is generated by the jobmanager and submitted on the  
compute platform for the -get-output command looks like:

[root@gt4-submit tmp]# cat pbs.rachel.out.23779
#! /bin/sh
# PBS batch job script built by Globus job manager
#
#PBS -S /bin/sh
#PBS -N TG23779
#PBS -m n
#PBS -o /usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23776.1145988460/stdout
#PBS -e /usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23776.1145988460/stderr
#PBS -l nodes=1:ppn=1
X509_USER_PROXY="/usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23776.1145988460/x509_up";
export X509_USER_PROXY;
GLOBUS_LOCATION="/usr/local/packages/tg/globus-4.0.1-r3";
export GLOBUS_LOCATION;
GLOBUS_GRAM_JOB_CONTACT="https://gt4-submit.psc.teragrid.org:50037/23776/1145988460/";
export GLOBUS_GRAM_JOB_CONTACT;
GLOBUS_GRAM_MYJOB_CONTACT="URLx-nexus://gt4-submit.psc.teragrid.org:50038/";
export GLOBUS_GRAM_MYJOB_CONTACT;
HOME="/usr/users/0/dsimmel";
export HOME;
LOGNAME="dsimmel";
export LOGNAME;
LD_LIBRARY_PATH=;
export LD_LIBRARY_PATH;

#Source the Globus enviroment script
. /usr/local/packages/tg/globus-4.0.1-r3/etc/globus-user-env.sh
cd ${LOCAL}
export OMP_NUM_THREADS ${PBS_VPPN}
/usr/local/globus/globus-4.0.1-r3/bin/globus-sh-exec "-exec" "file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://https://gt4-submit.psc.teragrid.org:50037/23746/1145988277/stdout\`; if test -r \"\$file\" ; then \${GLOBUS_SH_CAT-cat} \$file ; else echo Invalid job id. 1>&2; fi"  </dev/null

- - - - -

The patch I applied to the globus-job-get-output client is as follows:

[dsimmel@kaminari ~]$ diff $GLOBUS_LOCATION/bin/globus-job-get-output.orig $GLOBUS_LOCATION/bin/globus-job-get-output.patched
184c184
<     "&(executable=\$(GLOBUS_LOCATION)/bin/globus-sh-exec)(arguments=-exec \"file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://${jobid}${stream}\`; if test -r \"\"\$file\"\" ; then ${command} \$file ; else echo Invalid job id. 1>&2; fi\")"
---
>     "&(executable=\"\${GLOBUS_LOCATION}/bin/globus-sh-exec\")(arguments=-exec \"file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://${jobid}${stream}\`; if test -r \"\"\$file\"\" ; then ${command} \$file ; else echo Invalid job id. 1>&2; fi\")"

- - - - -

When I run using the patched edition, we get:

[dsimmel@kaminari ~]$ globus-job-get-output.patched -r gt4-submit.psc.teragrid.org/jobmanager-rachel-pbs -out $myjob

DATE:              Tue Apr 25 14:39:41 2006
PBS JOB ID:        51593
$LOCAL:            /carson64a/local/51593
Execution host:    carson64a
Current directory: /usr/users/0/dsimmel

DATE:              Tue Apr 25 14:04:41 2006
PBS JOB ID:        51591
$LOCAL:            /carson64a/local/51591
Execution host:    carson64a
Current directory: /usr/users/0/dsimmel
Tue Apr 25 14:04:41 EDT 2006

- - - - -

The first DATE ... Current directory block belongs to the -get-output job
itself; the rest is the original job's stdout.

The PBS script in this case looks like:

[root@gt4-submit tmp]# cat pbs.rachel.out.23809
#! /bin/sh
# PBS batch job script built by Globus job manager
#
#PBS -S /bin/sh
#PBS -N TG23809
#PBS -m n
#PBS -o /usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23806.1145990378/stdout
#PBS -e /usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23806.1145990378/stderr
#PBS -l nodes=1:ppn=1
X509_USER_PROXY="/usr/users/0/dsimmel/.globus/job/gt4-submit.psc.teragrid.org/23806.1145990378/x509_up";
export X509_USER_PROXY;
GLOBUS_LOCATION="/usr/local/packages/tg/globus-4.0.1-r3";
export GLOBUS_LOCATION;
GLOBUS_GRAM_JOB_CONTACT="https://gt4-submit.psc.teragrid.org:50037/23806/1145990378/";
export GLOBUS_GRAM_JOB_CONTACT;
GLOBUS_GRAM_MYJOB_CONTACT="URLx-nexus://gt4-submit.psc.teragrid.org:50038/";
export GLOBUS_GRAM_MYJOB_CONTACT;
HOME="/usr/users/0/dsimmel";
export HOME;
LOGNAME="dsimmel";
export LOGNAME;
LD_LIBRARY_PATH=;
export LD_LIBRARY_PATH;

#Source the Globus enviroment script
. /usr/local/packages/tg/globus-4.0.1-r3/etc/globus-user-env.sh
cd ${LOCAL}
export OMP_NUM_THREADS ${PBS_VPPN}
${GLOBUS_LOCATION}/bin/globus-sh-exec "-exec" "file=\`\${bindir}/globus-gass-cache -query -t anExtraTag x-gass-cache://https://gt4-submit.psc.teragrid.org:50037/23746/1145988277/stdout\`; if test -r \"\$file\" ; then \${GLOBUS_SH_CAT-cat} \$file ; else echo Invalid job id. 1>&2; fi"  </dev/null

- - - - -

In this case, GLOBUS_LOCATION does not get interpreted until the  
script is run on the target, and the right thing happens.
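
The difference in when the path is resolved can be sketched as follows. This
is a rough model under stated assumptions, not GRAM code: resolve_rsl and
expand_on_target are hypothetical stand-ins for the RSL substitution done on
the gatekeeper and for the shell running the batch script on the target, and
the two paths are the ones from the scenario above.

```shell
#!/bin/sh
# Paths from the scenario above; the function names are hypothetical.
GATEKEEPER_GL=/usr/local/globus/globus-4.0.1-r3
TARGET_GL=/usr/local/packages/tg/globus-4.0.1-r3

# Stand-in for RSL substitution on the gatekeeper: replace the literal text
# $(GLOBUS_LOCATION) with the gatekeeper-side value. ${...} is left untouched.
resolve_rsl() {
    printf '%s\n' "$1" | sed 's|[$](GLOBUS_LOCATION)|'"$GATEKEEPER_GL"'|g'
}

# Stand-in for the batch script running on the target: a child shell expands
# whatever shell variables remain, using the target's GLOBUS_LOCATION.
expand_on_target() {
    GLOBUS_LOCATION=$TARGET_GL sh -c "printf '%s\n' \"$1\""
}

original='$(GLOBUS_LOCATION)/bin/globus-sh-exec'
patched='${GLOBUS_LOCATION}/bin/globus-sh-exec'

expand_on_target "$(resolve_rsl "$original")"   # ends up with the gatekeeper path
expand_on_target "$(resolve_rsl "$patched")"    # ends up with the target path
```

In this model the original form is already a fixed gatekeeper path by the
time the script reaches the target, while the patched form survives as a
shell variable reference and picks up the target's value.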

Note that I had to change a line in the PBS jobmanager so that it no
longer prefixes "./" to an executable that begins with a $ (it
previously did so for any executable not beginning with a /):

    if ($description->executable() =~ m|^[^/]| && $description->executable =~ m|^[^\$]|)
    {
        $description->add('executable', './' . $description->executable());
    }

- - - - -

Note that globus-job-clean (a.k.a. globus-job-cancel) also suffers
from this problem, and works correctly if the RSL submitted passes
through the literal \${GLOBUS_LOCATION} rather than the RSL
substitution $(GLOBUS_LOCATION):

[dsimmel@kaminari ~]$ diff $GLOBUS_LOCATION/bin/globus-job-clean.orig $GLOBUS_LOCATION/bin/globus-job-clean.patched
202c202
<       myrsl="&(executable=\$(GLOBUS_LOCATION)/bin/globus-sh-exec)(arguments=-exec \"bad=0; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stdout >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi ; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stderr  >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi; echo \$bad;\")"
---
>       myrsl="&(executable=\"\${GLOBUS_LOCATION}/bin/globus-sh-exec\")(arguments=-exec \"bad=0; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stdout >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi ; \$bindir/globus-gass-cache -cleanup-url x-gass-cache://${jobid}stderr  >/dev/null 2>/dev/null; if test \$? != 0; then bad=1; fi; echo \$bad;\")"

- - - - -

- Derek

On Apr 25, 2006, at 9:09 AM, bugzilla-daemon@mcs.anl.gov wrote:

> http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=4360
>
>
> bester@mcs.anl.gov changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Severity|blocker                     |normal
>    Target Milestone|4.0.1                       |---

---
Derek Simmel <dsimmel@psc.edu>
Pittsburgh Supercomputing Center
(412) 268-1035
------- Comment #4 From 2006-05-10 07:59:57 -------
Created an attachment (id=947) [details]
patch to gram/jobmanager/source directory

Here's an alternative patch that adds a command-line option to the job manager,
letting a site present a different GLOBUS_LOCATION for the target execution
machine instead of using the same value for both the job manager environment
and the job environment. This avoids the scheduler-script-specific tweaks. If
you use the new option, -target-globus-location, the GLOBUS_LOCATION RSL
substitution resolves to that value, and the GLOBUS_LOCATION environment
variable in the job's environment is set to that value. The script invocations
used by the job manager (to submit and stage jobs) still have the job manager's
own GLOBUS_LOCATION in their environment.

joe
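
The split described above can be modelled roughly as follows. This is a
sketch of the intended behaviour only, not the patch itself: the variable and
function names are hypothetical, the paths are taken from the scenario in
comment #3, and falling back to the job manager's location when the option is
absent is an assumption based on the description of the existing behaviour.

```shell
#!/bin/sh
# Hypothetical sketch; variable and function names are illustrative only.
JOBMANAGER_GL=/usr/local/globus/globus-4.0.1-r3
# Value that would be supplied via -target-globus-location (empty if not given).
TARGET_GL=/usr/local/packages/tg/globus-4.0.1-r3

# Print the GLOBUS_LOCATION that applies in a given context:
#   job    - the RSL substitution and the submitted job's environment
#   script - job manager script invocations (submit, stage)
globus_location_for() {
    case "$1" in
        job)    printf '%s\n' "${TARGET_GL:-$JOBMANAGER_GL}" ;;
        script) printf '%s\n' "$JOBMANAGER_GL" ;;
        *)      return 2 ;;
    esac
}

globus_location_for job      # target path when the option is set
globus_location_for script   # always the job manager's own path
```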
------- Comment #5 From 2006-06-05 16:07:56 -------
I've committed the new patch to the CVS trunk.
------- Comment #6 From 2006-06-05 16:38:20 -------
Apologies for not returning to comment sooner.

If I understand this approach correctly, it assumes that the
-target-globus-location will be the same for all target computing resources
served by the GRAM. This means that in order to serve multiple different target
resources that may each have different local GLOBUS_LOCATIONs, we would have to
run separate instances of GRAM, one for each target resource with a different
GLOBUS_LOCATION.

Is this right?

(In reply to comment #4)
------- Comment #7 From 2008-02-04 17:29:16 -------
Hi Derek,

I'm looking through 4.2 bugs.  Did this get resolved?  Does Joe's patch do what
you needed?

-Stu
------- Comment #8 From 2008-02-04 17:50:04 -------
No, as far as I know, this was not resolved - we continued on here at  
PSC with the patches I had made at the time. I didn't get an answer to  
my last questions, and I don't recall being able to utilize the patch  
Joe made - to be frank I haven't looked at this in quite a long time.  
It has not been raised as a significant user issue here at PSC (yet)  
since only a very few users have ever tried to submit jobs via Globus  
to our systems.

- Derek

---
Derek Simmel
Pittsburgh Supercomputing Center
(412) 268-1035
------- Comment #9 From 2012-09-12 13:24:17 -------
We've migrated our issue tracking software to jira.globus.org. Any new issues
should be added here:

http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363

As this issue hasn't been commented on in several years, we're closing it. If
you feel it is still relevant, please add it to jira.