Bug 1538 - Gatekeeper log rotation and logging job accounting info
Summary: Gatekeeper log rotation and logging job accounting info
Status: RESOLVED FIXED
Product: GRAM
Component: gt2 Gatekeeper/Jobmanager
Version: 1.6
Platform: PC All
Importance: P2 enhancement
Target Milestone: 4.2.1
Assigned To:
:
: VDT
:
: 6192
 
Reported: 2004-02-12 17:57 by
Modified: 2008-08-15 04:44 (History)


Attachments
Gatekeeper patch (12.22 KB, patch)
2004-02-12 18:01, Alain Roy
Job Manager patch (9.62 KB, patch)
2004-02-12 18:02, Alain Roy
LSF Accounting patch (2.02 KB, patch)
2004-02-12 18:03, Alain Roy
Patch to find-lsf-tools (495 bytes, patch)
2004-02-12 18:05, Alain Roy
patch for globus-script-lsf-queue (636 bytes, patch)
2004-02-12 18:07, Alain Roy



Description From 2004-02-12 17:57:06
We have here several related patches. It's kind of a big lump, but:

1) They are well-tested because they have been used heavily by EDG and LCG.
2) They are very useful.

There are two features added:

1) Rotate the gatekeeper's log upon receiving SIGUSR1
2) Do some logging for job accounting purposes. By default, this goes into the 
gatekeeper log, but it can be configured to go into another log. This is 
desired by MANY users, not just EDG and LCG.

These features are combined since the accounting log will also be rotated.

These patches are in the VDT, and we really hope we can stop distributing a 
modified version of Globus. I hope we can work together to find a way to get 
this, or something similar, into an upcoming version of Globus. I realize that 
these are big patches: please let me know what we can do to work effectively 
with you on these patches. 

Let me repeat: these are well-tested patches. 

I will add the patches as attachments one at a time after the initial bug 
submission.

Please talk to me if there are any questions or any confusion. I'm happy to
discuss this further.

Thanks!
-alain, from the VDT
------- Comment #1 From 2004-02-12 18:01:14 -------
Created an attachment (id=302) [details]
Gatekeeper patch

This patch modifies the gatekeeper in two ways:

1) Set up the job accounting log file and inform job managers about it.
2) Rotate gatekeeper and accounting log files when SIGUSR1 is received
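
As a rough illustration of the rotate-on-SIGUSR1 behavior listed above, here is a
minimal sketch. It is written in Python for brevity and is not taken from the
attached patch. The class name RotatingLog, the timestamp suffix on rotated
files, and the helper install_sigusr1_rotation are all assumptions made for the
example. The signal handler only sets a flag; the actual rename happens on the
next write, so the rotation never runs inside the handler itself.

```python
import os
import signal
import time


class RotatingLog:
    """Append-only log file that is rotated once a rotation was requested."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")
        self._rotate_requested = False

    def request_rotation(self, signum=None, frame=None):
        # Keep the handler trivial: only record that a rotation is wanted.
        self._rotate_requested = True

    def _rotate(self):
        # Close the current file, move it aside, and start a fresh one.
        # The timestamp suffix is an assumption, not the patch's naming scheme.
        self._fh.close()
        rotated = "%s.%s" % (self.path, time.strftime("%Y%m%d%H%M%S"))
        os.rename(self.path, rotated)
        self._fh = open(self.path, "a")
        self._rotate_requested = False
        return rotated

    def write(self, line):
        # Rotation is deferred to the next write after the signal arrived.
        if self._rotate_requested:
            self._rotate()
        self._fh.write(line + "\n")
        self._fh.flush()


def install_sigusr1_rotation(*logs):
    """Register one SIGUSR1 handler that requests rotation of every log."""

    def handler(signum, frame):
        for log in logs:
            log.request_rotation()

    signal.signal(signal.SIGUSR1, handler)
```

With this sketch, a daemon would call install_sigusr1_rotation(gatekeeper_log,
accounting_log) once at startup, and an administrator (or a cron job) would send
SIGUSR1 to the process to rotate both files together.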
------- Comment #2 From 2004-02-12 18:02:17 -------
Created an attachment (id=303) [details]
Job Manager patch

This modifies the job manager to allow logging of job accounting information.
------- Comment #3 From 2004-02-12 18:03:39 -------
Created an attachment (id=304) [details]
LSF Accounting patch

This allows LSF to log extra information about the job for job accounting.
You get basic accounting without it; this just gives more information.
------- Comment #4 From 2004-02-12 18:05:21 -------
Created an attachment (id=305) [details]
Patch to find-lsf-tools

The LSF accounting patch uses bacct, so find-lsf-tools needs to be modified to
find it. This patch does that. It is a trivial patch.
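
For illustration only, the kind of lookup involved can be sketched as below.
This is a Python stand-in, not the contents of the attached patch or of
find-lsf-tools itself. The function names and the list of tools other than
bacct are assumptions chosen for the example.

```python
import shutil


def find_lsf_tools(names=("bsub", "bjobs", "bkill", "bacct"), search_path=None):
    """Map each tool name to its full path, or None if it is not found.

    search_path is an optional PATH-style string; when None, the PATH
    environment variable is used.
    """
    return {name: shutil.which(name, path=search_path) for name in names}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)
```

A caller could refuse to enable the extra LSF accounting when "bacct" appears in
missing_tools(find_lsf_tools()), and fall back to the basic accounting.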
------- Comment #5 From 2004-02-12 18:07:23 -------
Created an attachment (id=306) [details]
patch for globus-script-lsf-queue

If LSF returns FAILED, then the accounting doesn't work, so this patch changes
that. This is the part I'm least confident of: David, will users see failed
jobs properly?
------- Comment #6 From 2004-02-13 10:53:23 -------
Hi,

For LSF, the EXIT state means that a job finished with a non-zero exit code. (It
can also occur if the job is removed from the batch system.)

Depending on how LSF is set up, the 'job exit code' may be the exit code from the
user's job submission script, but in general it is the exit code of the
administrator-defined 'LSF job starter' (also often a script). Therefore it's
not clear, for every site, what EXIT implies about whether the system succeeded
in running the job. (That was what I imagined was relevant for the Globus job
state.)

Of course we could consider other approaches - for instance to exit with a zero
result after the user command, as written in the LSF submission script generated
by the lsf jobmanager. If there are then globus submitted jobs in EXIT we could
guess something went wrong. However this implies that we expect the user job
return code to be returned by the LSF job starter - possibly not true
everywhere. The simplest way forward appeared to be not to try to use the LSF
DONE/EXIT state information to determine the globus job state.

Yours,
David
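
A minimal sketch of the approach described in this comment, for illustration
only: it is not the attached globus-script-lsf-queue patch. The table and the
function name gram_state are hypothetical, and the exact mapping the real script
uses is an assumption. The point shown is that both DONE and EXIT are reported
as a finished job, so EXIT alone is never turned into FAILED.

```python
# Hypothetical mapping from an LSF job status (as printed by bjobs) to a
# GRAM-style job state. The real script's mapping may differ.
LSF_TO_GRAM = {
    "PEND": "PENDING",
    "RUN": "ACTIVE",
    "PSUSP": "SUSPENDED",
    "USUSP": "SUSPENDED",
    "SSUSP": "SUSPENDED",
    "DONE": "DONE",
    # EXIT only says that the job starter returned non-zero (or that the job
    # was removed), which is ambiguous across sites, so it is not mapped to
    # FAILED here.
    "EXIT": "DONE",
}


def gram_state(lsf_status):
    """Return the GRAM-style state for an LSF status, or None if unrecognized."""
    return LSF_TO_GRAM.get(lsf_status.strip().upper())
```

Under this mapping a user job that itself failed is still reported as DONE, so
the user has to inspect the job's own output to tell success from failure.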
------- Comment #7 From 2004-02-13 16:33:43 -------
Alain,

I am doubtful that this patch will make it into the 3.2 release.  We are 
testing release candidates now and hope to have the 3.2 beta out early next 
week.  After beta, only bug fixes will be applied.   A patch of this 
significance seems too risky.  I would anticipate this patch being applied to 
the 4.0 release.

-Stu
------- Comment #8 From 2004-06-10 10:02:24 -------
Alain,

I am removing this enhancement from the 4.0 target milestone.  I don't think we are going 
to have the manpower to review, apply and test this in time for the 4.0 release.

-Stu
------- Comment #9 From 2004-06-10 15:01:03 -------
Subject: Re:  Gatekeeper log rotation and logging job
  accounting info


>I am removing this enhancement from the 4.0 target milestone.  I don't 
>think we are going to have the manpower to review, apply and test this in 
>time for the 4.0 release.

Really? That's too bad!

I think that the accounting log is not only quite simple, but is incredibly 
useful. Is there anything I can do to help make the process easier?

Thanks,
-alain



------- Comment #10 From 2004-06-15 10:18:27 -------
For some reason bugzilla attributed comment 9 to "d arroyo" when it should have
been attributed to Alain Roy.  I'll see why the email interface behaved like
that, but wanted to correct the attribution for the record.
------- Comment #11 From 2004-07-20 18:16:35 -------
Now that Globus 4.0 is being delayed, is it possible to include this patch 
in it?

Thanks,
-alain
------- Comment #12 From 2008-07-30 15:02:05 -------
*** Bug 4771 has been marked as a duplicate of this bug. ***
------- Comment #13 From 2008-08-15 04:44:09 -------
The patches for these have been committed to the 4.2 branch and trunk.