Bug 4327 - Script crashing with "Terminated"
Status: RESOLVED FIXED
Product: GRAM
Component: wsrf scheduler interface
Version: development
Platform: PC Linux
Importance: P3 blocker
Target Milestone: 4.0.2
Reported: 2006-04-05 20:14
Modified: 2006-04-05 22:07


Attachments
A regular Perl job description for testing. (1.79 KB, text/plain)
2006-04-05 20:16, Peter Lane



Description From 2006-04-05 20:14:48
This is what I'm seeing:

% $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl -m fork -f crashing.jdl -c cancel
Terminated

I will attach crashing.jdl.

Running this command by hand crashes every time. It doesn't always happen
when the WS-GRAM service runs it, but it is easy to reproduce by running
the WS-GRAM unit tests. Somehow this takes down the entire container. I've
been able to reproduce this on two machines so far (my home machine and ruly).
------- Comment #1 From 2006-04-05 20:16:23 -------
Created an attachment (id=923)
A regular Perl job description for testing.
------- Comment #2 From 2006-04-05 20:48:53 -------
Here's some more strangeness. I ran into this in the container log when it
crashed at one point:

...
    [junit] /usr/local/globus/globus-4.0/libexec/globus-job-manager-script.pl -m fork -f /usr/local/globus/globus-4.0/tmp/gram_job_mgr58689.tmp -c cache_cleanup
Terminated
logan%

But running it by hand using the crashed.jdl seems to be ok.
------- Comment #3 From 2006-04-05 21:50:25 -------
I believe I found the problem. When the UUID was added to the PIDs, the cancel
subroutine in fork.pm was attempting to kill the process based on <UUID>:<PID>
rather than just <PID>. Removing the UUID seems to have fixed the problem. The
change was trivial:

Index: fork.in
===================================================================
RCS file: /home/globdev/CVS/globus-packages/gram/jobmanager/setup/fork/fork.in,v
retrieving revision 1.20.4.2
diff -u -r1.20.4.2 fork.in
--- fork.in     3 Mar 2006 19:59:33 -0000       1.20.4.2
+++ fork.in     6 Apr 2006 02:49:06 -0000
@@ -443,6 +443,7 @@

     foreach (split(/,/,$jobid))
     {
+        s/..*://;
         $pgid = getpgrp($_);

         $pgid == -1 ? kill($signo{TERM}, $_) :
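
For illustration, here is a minimal standalone Perl sketch of what the added substitution does to each entry of a comma-separated job ID. The helper name strip_uuid, the UUID, and the PIDs are made up for this example and are not taken from fork.in. The sketch also shows one possible explanation for the "Terminated" message, which the report itself does not confirm: an unstripped <UUID>:<PID> string whose UUID starts with a non-digit has the numeric value 0, and getpgrp(0) returns the caller's own process group, so the TERM signal could end up aimed at the script's own group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in fork.in): apply the patch's substitution
# to one job ID entry and return the bare PID.
sub strip_uuid {
    my ($id) = @_;
    $id =~ s/..*://;    # same regex as the patch; greedy, so it cuts at the last colon
    return $id;
}

# Made-up job ID list in the <UUID>:<PID> form, comma-separated as in cancel.
my $uuid  = 'c0ffee00-1111-2222-3333-444455556666';
my $jobid = "${uuid}:12345,${uuid}:12346";

my @pids = map { strip_uuid($_) } split(/,/, $jobid);
print "pids: @pids\n";    # pids: 12345 12346

# Without the substitution, the whole string is used where a number is expected.
my $as_number;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning for this demo
    $as_number = "${uuid}:12345" + 0;
}
print "numeric value of unstripped ID: $as_number\n";

# getpgrp(0) is documented to return the current process's own group.
print "getpgrp($as_number) is our own process group\n"
    if getpgrp($as_number) == getpgrp(0);
```

An entry that has no colon, such as a plain PID, does not match the pattern and is passed through unchanged.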
------- Comment #4 From 2006-04-05 22:07:30 -------
Fix in HEAD and globus_4_0_branch.