Bugzilla – Bug 4327
Script crashing with "Terminated"
Last modified: 2006-04-05 22:07:30
This is what I'm seeing:

% $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl -m fork -f crashing.jdl -c cancel
Terminated

I will attach crashing.jdl. Running this command by hand always crashes. It doesn't seem to happen every time the WS-GRAM service runs it, but it can easily be reproduced by running the WS-GRAM unit tests. Somehow this crashes the entire container. So far I've reproduced this on two machines (my home machine and ruly).
Created an attachment (id=923): a regular Perl job description for testing.
Here's some more strangeness. I ran into this in the container log when it crashed at one point:

...
[junit] /usr/local/globus/globus-4.0/libexec/globus-job-manager-script.pl -m fork -f /usr/local/globus/globus-4.0/tmp/gram_job_mgr58689.tmp -c cache_cleanup
Terminated
logan%

But running it by hand using the crashed.jdl seems to be OK.
I believe I found the problem. When the UUID was added to the PIDs, the cancel subroutine in fork.pm was attempting to kill the process based on <UUID>:<PID> rather than just <PID>. Stripping the UUID prefix seems to have fixed the problem. The change was trivial:

Index: fork.in
===================================================================
RCS file: /home/globdev/CVS/globus-packages/gram/jobmanager/setup/fork/fork.in,v
retrieving revision 1.20.4.2
diff -u -r1.20.4.2 fork.in
--- fork.in     3 Mar 2006 19:59:33 -0000      1.20.4.2
+++ fork.in     6 Apr 2006 02:49:06 -0000
@@ -443,6 +443,7 @@
 foreach (split(/,/,$jobid))
 {
+ s/..*://;
 $pgid = getpgrp($_);
 $pgid == -1 ? kill($signo{TERM}, $_) :
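The idea behind the patch can be illustrated with a small standalone Perl sketch. It assumes, as described above, that the job ID is a comma-separated list whose entries look like <UUID>:<PID>; the helper name pids_from_jobid and the sample UUID are hypothetical and not taken from fork.in. It also shows why the unstripped string is a poor argument to getpgrp() or kill(): Perl numifies a string from its leading characters, so an entry beginning with a letter becomes 0, and perldoc documents getpgrp(0) as referring to the calling process itself. Whether that is exactly how the "Terminated" crash arose is not confirmed in this report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of fork.in): turn a comma-separated job ID
# whose entries may look like "<UUID>:<PID>" into a list of bare PIDs,
# applying the same s/..*:// substitution as the patch.
sub pids_from_jobid {
    my ($jobid) = @_;
    my @pids;
    foreach my $entry (split(/,/, $jobid)) {
        # Greedy "..*:" removes everything up to and including the last ':'.
        # Entries without a ':' (plain PIDs) are left unchanged.
        (my $pid = $entry) =~ s/..*://;
        push @pids, $pid;
    }
    return @pids;
}

# Made-up job ID with two "<UUID>:<PID>" entries, for illustration only.
my $jobid = "a1b2c3d4-0000-1111-2222-333344445555:12345,"
          . "a1b2c3d4-0000-1111-2222-333344445555:12346";
print join(" ", pids_from_jobid($jobid)), "\n";    # prints "12345 12346"

# Without the substitution, Perl numifies the whole entry; a string that
# starts with a letter becomes 0, not the intended PID.
{
    no warnings 'numeric';
    my ($unstripped) = split(/,/, $jobid);
    print 0 + $unstripped, "\n";                    # prints "0"
}
```

Because the substitution leaves entries without a colon untouched, job IDs written before the UUID was added would still be handled the same way.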
Fixed in HEAD and globus_4_0_branch.