Bug 3672 - PBS output streaming
Status: NEW
: LRMA
Jobmanagers
: unspecified
: PC Linux
: P2 enhancement
: ---
Reported: 2005-08-22 11:38 by
Modified: 2008-05-02 16:19

Description From 2005-08-22 11:38:56
From: 	Damon Smith <damon@vpac.org>
To: 	discuss@globus.org
Subject: 	[Globus-discuss] Globus 4, PBS stdin and stdout with streaming
Date: 	Mon, 22 Aug 2005 15:46:40 +1000  (Sun, 23:46 MDT)

I'm testing globusrun-ws with streaming.  If I submit a job to a PBS
server which takes a while to complete, the streaming fails as the
gsiftp server can't find the output and error files.

As far as I can tell, the job goes in, it's got a PBS directive for
output and error files, but these files are only generated by PBS at the
end of the job, and the gsiftp server is looking for them at the start
of the job.  
If the generated PBS script used redirects instead, then streaming would
work, but then you assume things about the way PBS is set up.

This problem also occurs intermittently when NFS doesn't keep up with
the job starting and finishing.
------- Comment #1 From 2005-08-22 18:15:16 -------
Hi, I tried a simple approach. I touched the output and error files in the
scheduler adapter just before submitting the job. File
$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm, lines 393,394:
 system("touch",$description->stdout());
 system("touch",$description->stderr());

and the globusrun-ws worked fine. 

However, just as Damon said, PBS's default behaviour is to transfer output at
the end of the job. In my case globusrun-ws simply waited until the job was
finished and returned the output. So, I'm not sure if this is a perfect solution
for you.
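
For illustration, the same idea can be sketched in shell. This is only a sketch
under assumptions: the function name precreate_stream_files and the throwaway
demo paths are hypothetical and are not part of pbs.pm or of PBS. It only shows
that creating empty files up front gives a reader something to open before the
job has produced any output.

```shell
#!/bin/sh
# Hypothetical helper (not part of pbs.pm): create each listed file if it
# does not exist yet. touch never truncates, so existing data is kept.
precreate_stream_files() {
    for path in "$@"; do
        touch "$path" || return 1
    done
}

# Demo with throwaway paths; no PBS or qsub is involved here.
demo_dir=$(mktemp -d)
precreate_stream_files "$demo_dir/stdout" "$demo_dir/stderr"
ls "$demo_dir"
```

As noted above, this only makes the files exist early. With the -o and -e
directives the data itself still arrives when the job finishes.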

Concerning other solutions, I'm not sure which way Damon redirected output:
- in the pbs script itself or
- by using interactive jobs (PBS option -I). 

I think that using interactive jobs would be the cleaner solution. However, in
that case the pbs adapter script has to be aware that the job needs streaming;
it should call qsub with -I and redirect output, input and error. And (as Damon
said) you have to assume that the PBS environment is properly configured for
interactive jobs. Empty stdout and stderr files are needed even in this case to
cope with NFS delays and the case when the job gets queued.
------- Comment #2 From 2005-08-22 19:37:11 -------
I meant using the redirect ">" operator in bash, as in modifying the pbs
script generator pbs.pm so that it says:
executable args < stdin > stdout 2>stderr;

The only problem then is that you assume there's a shared filesystem for
whichever directory you specify stdout in, so if the user specifies 
/home/damon/stdout1 that would be fine, but if they specify
/tmp/stdout1, that won't work on our clusters.
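
A minimal shell sketch of what such a generated line looks like. The function
name emit_redirected_command is hypothetical, and /bin/hostname -f is only a
stand-in command that is printed, not run. The sketch does no quoting, so it
assumes paths and arguments without spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical generator for the command line of a job script.
# Arguments: stdin file, stdout file, stderr file, then the command and its args.
emit_redirected_command() {
    stdin_path=$1
    stdout_path=$2
    stderr_path=$3
    shift 3
    # "$*" joins the command and its arguments with single spaces.
    printf '%s < %s > %s 2> %s\n' "$*" "$stdin_path" "$stdout_path" "$stderr_path"
}

# Print (do not execute) the line for a job writing to a shared home directory.
emit_redirected_command /dev/null /home/damon/stdout1 /home/damon/stderr1 /bin/hostname -f
```

Because the shell on the compute node opens the files itself, they exist and
grow while the job runs, which is what streaming needs. The cost is the
shared-filesystem assumption described above.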

Some ideas:
- A good start would be for gsiftp to try to get the file, then fail with a
detailed error message if it's not there.  Something googleable like "Globus
can't find your job's output file on the compute resource, check that you
specified a valid output file location"

- Redirect the output to a temp directory in the user's home (gass style) and
stream it from there, then copy it to the specified stdout location at the end
of the job.  That way the user can stream data while the job is running, and
collect the data file with gsiftp at the end as well, from the expected
location.  (that's probably quite a large change though)  You still assume then
that homes are shared, but that's a fairly safe assumption.
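
Both ideas can be sketched in shell. Everything here is an assumption made for
illustration: the function names check_output_file and run_with_staging, the
.stage directory name, and the demo paths are hypothetical, and nothing below
is existing Globus or PBS behaviour.

```shell
#!/bin/sh
# Hypothetical check: fail with a searchable message if the output file is missing.
check_output_file() {
    if [ ! -e "$1" ]; then
        echo "Globus can't find your job's output file on the compute resource: $1" >&2
        echo "Check that you specified a valid output file location." >&2
        return 1
    fi
}

# Hypothetical wrapper: run a command with its output staged in a directory
# that is assumed to be shared, then copy it to the requested locations.
# Arguments: staging dir, final stdout, final stderr, then the command.
run_with_staging() {
    stage_dir=$1
    final_out=$2
    final_err=$3
    shift 3
    mkdir -p "$stage_dir" || return 1
    # Output lands in the staging files as the command produces it,
    # so a reader can follow them while the job is still running.
    "$@" > "$stage_dir/stdout" 2> "$stage_dir/stderr"
    status=$?
    # At the end of the job, put the data where the user asked for it.
    cp "$stage_dir/stdout" "$final_out" || return 1
    cp "$stage_dir/stderr" "$final_err" || return 1
    return "$status"
}

# Demo with throwaway paths standing in for the home directory and the
# user-specified files; no PBS is involved.
work=$(mktemp -d)
run_with_staging "$work/home/.stage" "$work/stdout1" "$work/stderr1" echo hello
check_output_file "$work/stdout1" && cat "$work/stdout1"
```

The staging copy is what a streaming client would read during the job, and the
final copy is what a later gsiftp fetch would expect at the specified location.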

So for starters, this fixes the problem for me in pbs.pm:
205,206c205,206
<     print JOB '#PBS -o ', $description->stdout(), "\n";
<     print JOB '#PBS -e ', $description->stderr(), "\n";
---
>     #print JOB '#PBS -o ', $description->stdout(), "\n";
>     #print JOB '#PBS -e ', $description->stderr(), "\n";
385c385,387
<           $description->stdin(), "\n";
---
>            $description->stdin(), " > ",
>            $description->stdout(), " 2> ",
>            $description->stderr(), "\n";

--
That works for cases where no stdout is specified, or a stdout in home or shared
scratch space is specified, but if I specify /tmp, we don't share those, so it
will fail.  I still think that this, with a detailed error message, would be a
good start.

Also, maybe you guys can enlighten me: if no stdout is specified, is it the Java
code that makes up the temp file name?  If the Perl script had access to the
uuid-type temp file AND the user-specified output file, that would be useful.
------- Comment #3 From 2005-11-09 11:34:57 -------
I've committed a fix to CVS which does the equivalent of touching the stdout
and stderr files in the submission script. This will at least fix the ftp error
for pbs when streaming. The data will still not be available until the job is
complete.

Doing the other change (redirection in place of -o and -e PBS directives) will
have to wait until 4.2. It would break installations without shared filesystems
in a different way and we'd like to make that sort of behavior configurable
(not doable in the stable branch).

joe
------- Comment #4 From 2006-10-27 10:56:22 -------
Adjusting priority on this since the bug part is fixed.  The streaming feature
enhancement would be good, but needs to be weighed along with the other feature
additions being considered.