Bugzilla – Bug 4241
Allow for multi-line error from scheduler commands
Last modified: 2006-04-06 11:13:28
Currently, we use GT3_FAILURE_MESSAGE to return the stderr from the scheduler commands if an error occurs. The problem is that we do not allow line breaks, so we collapse the error to one line.

An error which is fairly readable on the command line:

Found valid account 'TGU209' for queue 'TGnormal'
Using ACL 'sdsc_datastar:ux454281:tgu209:srt690' on DataStar TG-allocated P690 roaming nodes: queues TG*
WARNING: Changing #@node_usage to 'shared' instead of '' for the P690 nodes on DataStar
WARNING: Changing #@network.MPI to 'sn_all,shared,US' as required for all jobs on DataStar
WARNING: Changing #@resources to include 'ConsumableCpus(1)' for one CPU per task
WARNING: Setting ConsumableMemory to maximum value
Running job (step) for ux454281 under account 'TGU209'
The job will run on 1 nodes; with 1 tasks per node.
Job passed jobfilter
llsubmit: Processed command file through Submit Filter: "/users00/loadl/loadl/jobfilter.pl".
llsubmit: 2512-585 The "network.mpi" keyword is only valid for "job_type = parallel" job steps.
llsubmit: 2512-051 This job has not been submitted to LoadLeveler.

is very hard to understand when it comes out via Globus:

Found valid account 'TGU209' for queue 'TGnormal' Using ACL 'sdsc_datastar:ux454281:tgu209:srt690' on DataStar TG-allocated P690 roaming nodes: queues TG* WARNING: Changing #@node_usage to 'shared' instead of '' for the P690 nodes on DataStar WARNING: Changing #@network.MPI to 'sn_all,shared,US' as required for all jobs on DataStar WARNING: Changing #@resources to include 'ConsumableCpus(1)' for one CPU per task WARNING: Setting ConsumableMemory to maximum value Running job (step) for ux454281 under account 'TGU209' The job will run on 1 nodes; with 1 tasks per node. Job passed jobfilter llsubmit: Processed command file through Submit Filter: "/users00/loadl/loadl/jobfilter.pl". llsubmit: 2512-585 The "network.mpi" keyword is only valid for "job_type = parallel" job steps. llsubmit: 2512-051 This job has not been submitted to LoadLeveler.

It would be nice to allow line breaks in such errors.
I'll convert newlines to the two-character sequence \n in the Perl scripts and modify the Java code to translate those back into newlines.
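The escaping scheme proposed here can be sketched as a pair of string transformations. This is a minimal illustration, not the actual GRAM service code: the class NewlineEscape and its encode/decode methods are hypothetical, and encode() stands in for the Perl-side substitution s/\n/\\n/g.

```java
/**
 * Hypothetical sketch of the newline-escaping scheme (not the GRAM service code).
 * encode() mirrors the Perl substitution s/\n/\\n/g; decode() is the reverse
 * step the Java side would apply to GT3_FAILURE_MESSAGE.
 */
public final class NewlineEscape {

    private NewlineEscape() {}

    /** Replace each newline with the two characters '\' and 'n'. */
    public static String encode(String stderr) {
        return stderr.replace("\n", "\\n");
    }

    /** Turn each two-character sequence "\n" back into a real newline. */
    public static String decode(String message) {
        return message.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        String stderr = "llsubmit: 2512-585 The \"network.mpi\" keyword is only valid for \"job_type = parallel\" job steps.\n"
                + "llsubmit: 2512-051 This job has not been submitted to LoadLeveler.";
        String wire = encode(stderr);
        System.out.println(wire.contains("\n"));         // false: the encoded message is a single line
        System.out.println(decode(wire).equals(stderr)); // true: the line breaks are restored
        System.out.println(decode(wire));
    }
}
```

String.replace(CharSequence, CharSequence) replaces literal text, so no regex escaping is needed; the encoded value contains no newline and therefore fits the existing single-line message.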
I found this because the PBS job manager has the following code:

    $stderr =~ s/\n/ /g;
    $self->respond({GT3_FAILURE_MESSAGE => $stderr });

So the job managers will also need a few modifications.
Created an attachment (id=862): 4241.diff

Attached is a patch to the GRAM service package (ws-gram/service/java/source) that translates \n into newlines. Within the scheduler scripts, do something like this to translate the newlines into \n:

    @@ -444,7 +444,7 @@
     print ERR $stderr;
     close(ERR);
    -$stderr =~ s/\n/ /g;
    +$stderr =~ s/\n/\\n/g;
     $self->respond({GT3_FAILURE_MESSAGE => $stderr });
     }
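Replacing every newline with \n is not fully reversible: scheduler output that already contains a literal backslash followed by n (for example a Windows-style path) would also be decoded into a line break. A minimal sketch of a reversible variant follows, assuming both the scripts and the service agree to escape backslashes as well. This is not what the attached patch does, and the class ReversibleEscape and its methods are hypothetical.

```java
/**
 * Hypothetical reversible variant (not the scheme in 4241.diff):
 * backslashes are escaped before newlines, so decode(encode(s)) == s for any s.
 */
public final class ReversibleEscape {

    private ReversibleEscape() {}

    /** Escape '\' as "\\" first, then each newline as "\n". */
    public static String encode(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    /** Scan left to right so that "\\" and "\n" are each consumed as one unit. */
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'n') {          // escaped newline
                    out.append('\n');
                    i++;
                    continue;
                }
                if (next == '\\') {         // escaped backslash
                    out.append('\\');
                    i++;
                    continue;
                }
            }
            out.append(c);                  // ordinary character, copied as-is
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A real line break plus a literal backslash followed by 'n' in a path.
        String stderr = "path C:\\new\\data\nline two";
        String wire = encode(stderr);
        System.out.println(wire);
        System.out.println(decode(wire).equals(stderr)); // true
    }
}
```

The trade-off is that the Perl side would need a second substitution (escaping backslashes before newlines) and the Java side a small scanner instead of a single replace call.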
I've committed a variation on the original patch (one which works) to the 4.0 branch. The scripts have already been updated to use \\n for newlines in the error info.