Bug 5713 - GRAM auditing: Failed database connection loses audit records
Status: RESOLVED DUPLICATE of bug 6400
Product: GRAM
Component: general
Version: 4.0.5
Platform: Open Science Grid (OSG) Linux
Importance: P2 major
Target Milestone: 4.0.7
Reported: 2007-12-10 12:30 by
Modified: 2009-03-19 09:47 (History)


Description From 2007-12-10 12:30:38
In both pre-ws and ws GRAM auditing, when a database connection or update
fails, the audit record is lost.

I am currently testing this with:
 Condor 6.8.3
 Globus 4.0.5
 MySQL 4.1.22


There are a couple of related issues with this behavior:
  1. In both ws and pre-ws, the only indication of this is a Java stack trace.
     These connection problems never appear to be caught, and therefore no log
     message is generated.

  2. In ws, the exception is thrown only once, which suggests to me that it
     recognizes the failure and never attempts to connect again.

     In pre-ws, the exception is naturally thrown with each execution of the
     cron job.

  3. In both cases, the audit information is lost.
     In ws, since there is no queuing/staging of the data, they just vanish.
     In pre-ws, the file in
     $GLOBUS_LOCATION/share/globus_gram_job_manager_auditing is deleted.
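
To illustrate item 2, here is a minimal sketch of a connection holder that
tries again after a cool-down instead of giving up after the first failure.
It is not GRAM code: `ReconnectingHandle`, the `connect` callable, and the
`retry_after` interval are all hypothetical names chosen for the example.

```python
import logging
import time

log = logging.getLogger("gram.audit.reconnect.sketch")


class ReconnectingHandle:
    """Hold a database connection and retry after a cool-down on failure."""

    def __init__(self, connect, retry_after=60.0, clock=time.monotonic):
        self.connect = connect          # hypothetical callable; raises on failure
        self.retry_after = retry_after  # seconds to wait before the next attempt
        self.clock = clock              # injectable so the sketch can be tested
        self.conn = None
        self.next_try = 0.0

    def get(self):
        """Return a live connection, or None while the database is unavailable."""
        if self.conn is not None:
            return self.conn
        now = self.clock()
        if now < self.next_try:
            return None  # still inside the cool-down window
        try:
            self.conn = self.connect()
        except Exception as exc:
            self.next_try = now + self.retry_after
            log.warning("audit DB connect failed, will retry in %.0fs: %s",
                        self.retry_after, exc)
            return None
        return self.conn

    def invalidate(self):
        """Drop the cached connection, e.g. after an update on it fails."""
        self.conn = None
```

A caller would treat a `None` result as "database down" and hold on to the
audit record rather than discarding it.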

I would suggest that, in both ws and pre-ws:
  1. audit records should have some form of recovery capability in the event
     of a database outage
  2. some type of log message should be generated to notify an admin of a
     problem
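
The two suggestions above could be combined roughly as follows. This is only
a sketch under stated assumptions, not the GRAM implementation: the
`SpoolingAuditWriter` class, the `insert_record` callable, the JSON record
format, and the spool directory are all hypothetical.

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("gram.audit.sketch")


class SpoolingAuditWriter:
    """Try the database first; on failure, log it and keep the record on disk."""

    def __init__(self, insert_record, spool_dir):
        self.insert_record = insert_record  # hypothetical callable; raises on failure
        self.spool_dir = spool_dir          # hypothetical directory holding only spool files
        os.makedirs(spool_dir, exist_ok=True)

    def write(self, record):
        """Return True if the record reached the database, False if it was spooled."""
        try:
            self.insert_record(record)
            return True
        except Exception as exc:
            log.error("audit DB insert failed, spooling record: %s", exc)
            self._spool(record)
            return False

    def _spool(self, record):
        fd, _path = tempfile.mkstemp(suffix=".audit", dir=self.spool_dir)
        with os.fdopen(fd, "w") as fh:
            json.dump(record, fh)

    def replay(self):
        """Retry spooled records; delete each file only after a successful insert."""
        replayed = 0
        for name in sorted(os.listdir(self.spool_dir)):
            path = os.path.join(self.spool_dir, name)
            with open(path) as fh:
                record = json.load(fh)
            try:
                self.insert_record(record)
            except Exception as exc:
                log.error("audit DB still unavailable, keeping %s: %s", name, exc)
                break
            os.remove(path)
            replayed += 1
        return replayed
```

The point of the design is the ordering: a record is removed from disk only
after the insert has succeeded, and every failure produces a log message.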

Also, I should note that the jobs are being successfully processed by the
batch job/queue manager (Condor) regardless of the GRAM audit failure.
------- Comment #1 From 2008-01-09 16:24:44 -------
In GRAM2, I discovered the --check argument to the cron job, which reports
that the database connection failed and does NOT remove the audit record.
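
The behavior I would expect from an uploader of this kind can be sketched as
below. This is an illustration only and does not reflect how the actual cron
script is written: `upload_audit_files`, the `connect` callable, its
`insert(text)` method, and the `check_only` parameter are hypothetical.

```python
import logging
import os

log = logging.getLogger("gram.audit.upload.sketch")


def upload_audit_files(audit_dir, connect, check_only=False):
    """Upload each file in audit_dir; return the number of files removed.

    connect is a hypothetical callable that returns an object with an
    insert(text) method and raises if the database cannot be reached.
    """
    try:
        db = connect()
    except Exception as exc:
        # Connection failed: report it and leave every audit file in place.
        log.error("audit database connection failed: %s", exc)
        return 0
    if check_only:
        log.info("audit database connection OK; no records uploaded")
        return 0
    removed = 0
    for name in sorted(os.listdir(audit_dir)):
        path = os.path.join(audit_dir, name)
        with open(path) as fh:
            text = fh.read()
        try:
            db.insert(text)
        except Exception as exc:
            log.error("upload of %s failed, file kept: %s", name, exc)
            continue
        os.remove(path)  # delete only after a successful insert
        removed += 1
    return removed
```

With this ordering, a database outage costs nothing except delay: the files
stay where they are until a later run succeeds.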

In GRAM4, I still cannot find an option that provides the same capability.
------- Comment #2 From 2009-03-19 09:47:44 -------

*** This bug has been marked as a duplicate of bug 6400 ***