Bug 4241 - Allow for multi-line error from scheduler commands
Status: RESOLVED FIXED
Product: GRAM
Component: wsrf scheduler interface
Version: unspecified
Platform: PC Linux
Importance: P3 normal
Target Milestone: 4.0.2
Reported: 2006-02-28 14:22 by
Modified: 2006-04-06 11:13


Attachments
4241.diff (1.73 KB, patch)
2006-03-01 14:28, Joe Bester




Description From 2006-02-28 14:22:39
Currently, we use GT3_FAILURE_MESSAGE to return the stderr from the scheduler
commands if an error occurs. The problem is that we do not allow line breaks, so
we collapse the error into one line.

An error which is fairly readable on the command line:

Found valid account 'TGU209' for queue 'TGnormal'
  Using ACL 'sdsc_datastar:ux454281:tgu209:srt690'
  on DataStar TG-allocated P690 roaming nodes: queues TG*

WARNING: Changing #@node_usage to 'shared' instead of '' for the P690 nodes on
DataStar
WARNING: Changing #@network.MPI to 'sn_all,shared,US' as required for all jobs
on DataStar
WARNING: Changing #@resources to include 'ConsumableCpus(1)' for one CPU per task
WARNING: Setting ConsumableMemory to maximum value
Running job (step) for ux454281 under account 'TGU209'
        The job will run on 1 nodes; with 1 tasks per node.
Job passed jobfilter
llsubmit: Processed command file through Submit Filter:
"/users00/loadl/loadl/jobfilter.pl".
llsubmit: 2512-585 The "network.mpi" keyword is only valid for "job_type =
parallel" job steps.
llsubmit: 2512-051 This job has not been submitted to LoadLeveler.



The same error is very hard to understand when it comes out via Globus:

Found valid account 'TGU209' for queue 'TGnormal'   Using ACL
'sdsc_datastar:ux454281:tgu209:srt690'   on DataStar TG-allocated P690 roaming
nodes: queues TG*WARNING: Changing #@node_usage to 'shared' instead of '' for
the P690 nodes on DataStar WARNING: Changing #@network.MPI to 'sn_all,shared,US'
as required for all jobs on DataStar WARNING: Changing #@resources to include
'ConsumableCpus(1)'for one CPU per task WARNING: Setting ConsumableMemory to
maximum value Runningjob (step) for ux454281 under account 'TGU209'  The job
will run on 1 nodes; with 1 tasks per node. Job passed jobfilter llsubmit:
Processed command file through Submit Filter:
"/users00/loadl/loadl/jobfilter.pl". llsubmit: 2512-585 The "network.mpi"
keyword is only valid for "job_type = parallel" job steps. llsubmit:2512-051
This job has not been submitted to LoadLeveler.



It would be nice to allow for line breaks in such errors.
------- Comment #1 From 2006-03-01 09:42:33 -------
I'll convert newlines to a literal \n in the Perl scripts and modify the Java
code to translate those back into newlines.
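
A minimal sketch of the Java half of this plan, assuming the scripts send a
backslash followed by the letter n wherever stderr had a line break. The class
and method names are hypothetical; this is not the code from the GRAM service.

```java
// Hypothetical helper for illustration; not the actual GRAM service code.
final class FailureMessageDecoder {
    private FailureMessageDecoder() {}

    /** Replaces each two-character sequence backslash + 'n' with a real newline. */
    static String decode(String encoded) {
        if (encoded == null) {
            return null;
        }
        // String.replace(CharSequence, CharSequence) is a literal replacement, not a regex.
        return encoded.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        // What a scheduler script would send: one line with literal \n markers.
        String wire = "Job passed jobfilter\\n"
                + "llsubmit: 2512-051 This job has not been submitted to LoadLeveler.";
        System.out.println(decode(wire));
    }
}
```

Running main prints the two llsubmit-style lines on separate lines again.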
------- Comment #2 From 2006-03-01 10:30:05 -------
The reason I found out is that the PBS job manager has the following code:

        $stderr =~ s/\n/ /g;

        $self->respond({GT3_FAILURE_MESSAGE => $stderr });

So, the job managers will also need a few modifications.
------- Comment #3 From 2006-03-01 14:28:42 -------
Created an attachment (id=862)
4241.diff

Attached is a patch to the GRAM service package (ws-gram/service/java/source)
which will translate the literal sequence \n into newlines. Within the scheduler
scripts, do something like this to translate the newlines into \n:

@@ -444,7 +444,7 @@
	 print ERR $stderr;
	 close(ERR);

-	 $stderr =~ s/\n/ /g;
+	 $stderr =~ s/\n/\\n/g;

	 $self->respond({GT3_FAILURE_MESSAGE => $stderr });
     }
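
The round trip this change relies on can be checked with a small sketch. Here
encode() mirrors the Perl substitution s/\n/\\n/g from the diff, and decode() is
an assumed inverse on the Java side; the class name and both methods are
hypothetical and are not taken from 4241.diff.

```java
// Illustration only: encode() mirrors the Perl substitution s/\n/\\n/g,
// decode() is an assumed inverse, not the committed patch.
final class NewlineRoundTrip {
    private NewlineRoundTrip() {}

    /** Replaces every real newline with the two characters backslash + 'n'. */
    static String encode(String stderr) {
        return stderr.replace("\n", "\\n");
    }

    /** Replaces every backslash + 'n' pair with a real newline. */
    static String decode(String wire) {
        return wire.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        String stderr = "Job passed jobfilter\n"
                + "llsubmit: 2512-051 This job has not been submitted to LoadLeveler.";
        String wire = encode(stderr);

        // The wire form contains no real line breaks.
        System.out.println(wire.indexOf('\n') < 0);
        // Decoding restores the original multi-line text.
        System.out.println(decode(wire).equals(stderr));

        // Caveat: a backslash-n pair already present in stderr is not escaped
        // by encode(), so decode() turns it into a line break.
        String tricky = "cannot open C:\\new.txt";
        System.out.println(decode(encode(tricky)).equals(tricky));
    }
}
```

With this simple scheme, stderr that already contains a backslash followed by n
does not survive the round trip; escaping backslashes first would avoid that, at
the cost of a second substitution on each side.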
------- Comment #4 From 2006-03-06 09:41:20 -------
I've committed a variation on the original patch (one which works) to the 4.0
branch. The scripts have already been updated to use \\n for newlines in the
error info.