Bugzilla – Bug 3672
PBS output streaming
Last modified: 2008-05-02 16:19:06
From: Damon Smith <damon@vpac.org>
To: discuss@globus.org
Subject: [Globus-discuss] Globus 4, PBS stdin and stdout with streaming
Date: Mon, 22 Aug 2005 15:46:40 +1000 (Sun, 23:46 MDT)

I'm testing globusrun-ws with streaming. If I submit a job to a PBS server and the job takes a while to complete, the streaming fails because the gsiftp server can't find the output and error files.

As far as I can tell, the job is submitted with PBS directives for the output and error files, but PBS only generates these files at the end of the job, while the gsiftp server looks for them at the start of the job. If the generated PBS script used redirects instead, streaming would work, but then you assume things about the way PBS is set up.

This problem also occurs intermittently when NFS doesn't keep up with the job starting and finishing.
Hi,

I tried a simple approach: I touched the output and error files in the scheduler adapter just before submitting the job. In $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm, lines 393-394:

    system("touch", $description->stdout());
    system("touch", $description->stderr());

With that change globusrun-ws worked fine. However, just as Damon said, PBS's default behaviour is to transfer output at the end of the job. In my case globusrun-ws simply waited until the job was finished and then returned the output. So I'm not sure this is a perfect solution for you.

Concerning other solutions, I'm not sure how Damon redirected the output:
- in the PBS script itself, or
- by using interactive jobs (PBS option -I).

I think that using interactive jobs would be the cleaner solution. However, in that case the PBS adapter script has to be aware that the job needs streaming: it should call qsub with -I and redirect output, input and error. And (as Damon said) you have to assume that the PBS environment is properly configured for interactive jobs. Empty stdout and stderr files are needed even in this case, to cope with NFS delays and with the case where the job gets queued.
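The touch approach described above can be sketched in plain shell, independent of pbs.pm. This is only an illustration: the function name precreate_outputs and the file names job.out and job.err are made up for the example, and the actual qsub call is left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of Globus or PBS): create empty output files
# before the job is submitted, so that a reader opening them at job start
# finds a file instead of getting a "no such file" error.
precreate_outputs() {
    for f in "$@"; do
        # touch creates the file if it is missing and leaves existing
        # content untouched if it is already there.
        touch "$f" || return 1
    done
}

# Demo with made-up paths in a scratch directory.
workdir=$(mktemp -d)
precreate_outputs "$workdir/job.out" "$workdir/job.err"
ls -l "$workdir"
# The job would be submitted here; the files stay empty until PBS
# delivers the output at the end of the job.
```

As noted in the comment above, this only removes the missing-file error. It does not make data available while the job is still running.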
I meant using the redirect ">" operator in bash, as in modifying the PBS script generator pbs.pm so that it says:

    executable args < stdin > stdout 2>stderr;

The only problem then is that you assume there's a shared filesystem for whichever directory you specify stdout in. So if the user specifies /home/damon/stdout1 that would be fine, but if they specify /tmp/stdout1, that won't work on our clusters.

Some ideas:
- A good start would be for gsiftp to try to get the file, then fail with a detailed error message if it's not there. Something googleable like "Globus can't find your job's output file on the compute resource, check that you specified a valid output file location".
- Redirect the output to a temp directory in the user's home (gass style) and stream it from there, then copy it to the specified stdout location at the end of the job. That way the user can stream data while the job is running, and also collect the data file with gsiftp at the end, from the expected location. (That's probably quite a large change though.) You still assume then that homes are shared, but that's a fairly safe assumption.

So for starters, this fixes the problem for me in pbs.pm:

205,206c205,206
< print JOB '#PBS -o ', $description->stdout(), "\n";
< print JOB '#PBS -e ', $description->stderr(), "\n";
---
> #print JOB '#PBS -o ', $description->stdout(), "\n";
> #print JOB '#PBS -e ', $description->stderr(), "\n";
385c385,387
< $description->stdin(), "\n";
---
> $description->stdin(), " > ",
> $description->stdout(), " 2> ",
> $description->stderr(), "\n";
--

That works for cases where no stdout is specified, or where a stdout in home or shared scratch space is specified, but if I specify /tmp, which we don't share, it will fail. I still think that this, with a detailed error message, would be a good start.

Also, maybe you guys can enlighten me: if no stdout is specified, is it the Java code that makes up the temp file name?
If the perl script had access to the uuid type temp file AND the user specified output file that would be useful.
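The second idea above (redirect into a staging directory, then copy to the requested location at the end) might look roughly like the following in shell. This is a sketch under assumptions, not what pbs.pm generates: the function run_staged, the STAGE_ROOT variable, and the file names are all hypothetical, and the staging root is assumed to live on a filesystem visible to whatever streams the output.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with stdout/stderr redirected into a
# staging directory, then copy both files to the locations the user asked for.
# STAGE_ROOT is an assumed setting; it falls back to $HOME.
run_staged() {
    final_out=$1
    final_err=$2
    shift 2
    stage_dir=$(mktemp -d "${STAGE_ROOT:-$HOME}/.job-stream.XXXXXX") || return 1
    # Plain redirects: data appears in the staged files while the command
    # runs, so it could be streamed from there during the job.
    "$@" > "$stage_dir/stdout" 2> "$stage_dir/stderr"
    rc=$?
    # At the end of the job, deliver the files to the expected locations.
    cp "$stage_dir/stdout" "$final_out" || rc=1
    cp "$stage_dir/stderr" "$final_err" || rc=1
    rm -rf "$stage_dir"
    return $rc
}

# Demo with scratch directories standing in for the home directory and
# for the user-specified output location.
STAGE_ROOT=$(mktemp -d)
dest=$(mktemp -d)
run_staged "$dest/stdout1" "$dest/stderr1" sh -c 'echo hello; echo oops >&2'
cat "$dest/stdout1"   # prints: hello
```

The final copy is what lets a user-specified location such as /tmp/stdout1 still receive the file, provided the copy runs on a host where that path is meaningful. That is the same shared-filesystem assumption raised above, moved to the end of the job.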
I've committed a fix to CVS which does the equivalent of touching the stdout and stderr files in the submission script. This will at least fix the ftp error for PBS when streaming. The data will still not be available until the job is complete.

Doing the other change (redirection in place of the -o and -e PBS directives) will have to wait until 4.2. It would break installations without shared filesystems in a different way, and we'd like to make that sort of behavior configurable (not doable in the stable branch).

joe
Adjusting priority on this since the bug part is fixed. The streaming feature enhancement would be good, but needs to be weighed along with the other feature additions being considered.