Bug 5776 - GRAM4 auditing: Need for an INFO log message
Status: RESOLVED WONTFIX
: GRAM
general
: 4.0.5
: Open Science Grid (OSG) All
: P3 critical
: 4.0.7
Reported: 2008-01-11 12:58 by
Modified: 2012-09-05 13:44 (History)


Description From 2008-01-11 12:58:12
In GRAM4, when GRAM auditing is activated, there are no messages written to a
log file unless you request DEBUG as the logging level.  

I added this entry in the GLOBUS_LOCATION/container-log4j.properties file to
get some messages out:
  log4j.category.org.globus.exec.utils.audit=DEBUG

It generated more than I ever needed, but not a single message was at the INFO
level.  At the very least, there should be one INFO message so an admin can
readily verify that auditing is working, and for troubleshooting purposes.

It may very well be that there is another package, other than
org.globus.exec.utils.audit, that should have been turned on.  I just did not
find it.

For that one INFO message, I would like to suggest an abbreviated version of
the actual database update.  The entire update/insert DML in GRAM4 is rather
long.  I think for general admin purposes a shorter version would be sufficient
containing something like:
 SQL: UPDATE gram_audit_table
 resource_manager_type='Condor',
 job_grid_id='https://131.225.107.63:9443/wsrf/services/ManagedExecutableJobService?HnhFkVGtwfI5hn9a7g5rqA5oiww=',
 local_job_id='061.000.000',
 subject_name='/DC=org/DC=doegrids/OU=People/CN=John Weigand 458491',
 username='fnalgrid',
 success_flag='true',
 finished_flag='true'

Even this can be quite lengthy, but I am not sure what of it can be eliminated.

This would be for each DML command sent to the database.
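
As a rough illustration of the abbreviated message suggested above, here is a
minimal Java sketch.  The class name AuditSummary and the method summarize are
hypothetical and are not part of GRAM4; the sketch only builds the one-line
text, which the auditing code would then pass to its logger at INFO level.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of GRAM4: builds a short one-line summary
// of an audit update from a subset of its column values.
class AuditSummary {
    // Joins the given column values into "SQL: UPDATE <table> col='val', ...".
    // Pass a map with a stable iteration order (e.g. LinkedHashMap) so the
    // columns appear in the order they were added.
    static String summarize(String table, Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(", ", "SQL: UPDATE " + table + " ", "");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            joiner.add(entry.getKey() + "='" + entry.getValue() + "'");
        }
        return joiner.toString();
    }
}
```

Only the columns an admin cares about would be placed in the map; the full
DML would remain available at DEBUG.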
------- Comment #1 From 2008-09-16 17:30:41 -------
Hi John,

Martin has made some improvements for auditing that should be coming in 4.2.1.
One is that for any DB error an exception will be thrown and will be seen in the
container log.  Given that, we don't think it makes sense to have an INFO
statement just for the DB auditing.  Do you agree?

If an admin does want to troubleshoot a problem, there are log4j DEBUG entries
to turn on all the details.
------- Comment #2 From 2008-09-17 11:30:14 -------
In response to the previous comment about only showing DB errors/exceptions:

Just getting errors does not tell me whether GRAM auditing is working or even
activated.  I would hope there would be an INFO message telling me when it is
active and another when it is inactive.  I am not sure how this can be handled,
though, as I do not know whether the implementation differs between WS and
pre-WS.

As I recall, for web services it is the container that has an open thread, in
which case this would be viable.  Can WS updates record an INFO message for
every nth update, saying something like 'n of N updates completed'?

For pre-WS, it is a cron job, so this is likely not practical.  For pre-WS, can
each update say 'n updates applied' or something to that effect?

At least then an admin can view the log and verify all is well and easily see
things are working. 
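
To make the "every nth update" idea concrete, here is a minimal Java sketch of
a counter that yields a summary line once per interval.  The class name
AuditProgress and its methods are hypothetical, not existing GRAM4 code; the
caller would log the returned text at INFO level whenever it is not null.

```java
// Hypothetical sketch, not part of GRAM4: counts audit updates and produces
// one summary line per `interval` updates instead of one line per update.
class AuditProgress {
    private final int interval;
    private int attempted;
    private int succeeded;

    AuditProgress(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    // Records one update. Returns a summary once `interval` updates have been
    // recorded since the last summary, otherwise returns null.
    synchronized String record(boolean success) {
        attempted++;
        if (success) {
            succeeded++;
        }
        if (attempted < interval) {
            return null;
        }
        String message = succeeded + " of " + attempted + " updates completed";
        attempted = 0;   // start a new window
        succeeded = 0;
        return message;
    }
}
```

With an interval of 100, a healthy container would emit a single line per 100
updates, which keeps the log readable while still showing that auditing is
active.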

As for DBErrors/exceptions, I am hoping 'exceptions' is not implying a
stacktrace of some sort but rather just a generic reference to an error
condition.  A stacktrace is only of value to a developer and not an
administrator and should never appear in log files unless it is a totally
bizarre condition.

If a generic reference, I assume it will give a meaningful message with
meaningful information.  Same for DBErrors.  

I have not kept up on how the 'Reliability' concern mentioned in
http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=6024 is addressed.  This, I
believe, was added for this Bugzilla item:
http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5713

Depending on how this 'Reliability' concern is addressed, how will the
generation of log messages be handled when the database is down, for example?
(This would likely be the most common error condition.)
  1. In pre-WS, if a cron job, one message every time it runs is probably OK.
  2. For WS, one message for every update could get overwhelming.

When errors occur that affect logging, it would be good if some form of email
message were generated to inform the administrator of the problem.  Since the
logging uses log4j and log4j has a mechanism for emailing, this might seem
like a good solution.  However, my (limited) experience with log4j emails is
that the application has to be very careful about how many messages it flags
for email notices, or it will overwhelm the mail services.  So the application
has to be sensitive to this or have its own internal mail notification
mechanism.  Has any thought been given to this?
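
One way an application could stay "sensitive to this" is to throttle alerts
itself before they reach any mail mechanism.  The Java sketch below is only an
illustration under that assumption: the class AlertThrottle is hypothetical,
is not a log4j or GRAM4 API, and does not send mail; it just decides whether
an alert should go out and how many were held back in the meantime.

```java
// Hypothetical sketch, not a log4j or GRAM4 API: lets at most one alert
// through per time window and counts the ones it suppresses.
class AlertThrottle {
    private final long minIntervalMillis;
    private long lastSentMillis;
    private boolean sentBefore;
    private int suppressed;

    AlertThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns the text to send, or null if an alert already went out less
    // than minIntervalMillis ago. The caller supplies the current time.
    synchronized String offer(String message, long nowMillis) {
        if (sentBefore && nowMillis - lastSentMillis < minIntervalMillis) {
            suppressed++;
            return null;
        }
        String out = (suppressed == 0)
                ? message
                : message + " (" + suppressed + " similar alerts suppressed)";
        sentBefore = true;
        lastSentMillis = nowMillis;
        suppressed = 0;
        return out;
    }
}
```

If the database stays down for an hour and the window is ten minutes, the
administrator would receive a handful of messages rather than one per failed
update.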

I recognize that logging is generally an afterthought in most applications, but
it is very important in administering a production system.  I also recognize
that doing it right can also be as time-consuming as getting the basic
functionality of any service right. But it is a critical component of all
applications.
------- Comment #3 From 2008-09-17 15:44:56 -------
Martin implemented fallback functionality in GRAM4 in case of DB errors, see
bug 6357.

It would make sense to have an INFO statement like:

INFO DB insert failed - saving audit record to be uploaded later in fallback
directory /some/dir/save/unique_audit_record_file

and then once the DB is back up, there would be statements like:
INFO DB saved audit record from fallback directory
/some/dir/save/unique_audit_record_file inserted successfully

Those records would be rare, appearing only when the DB goes down.

For audit V2, we have defined 5 different tables, so an INFO statement for each
successful insert seems like a lot.

I suppose there could be a periodic INFO statement saying all is well, e.g.
'100 of last 100 audit records successfully inserted', but it seems to me that
you could trust that no news is good news.
------- Comment #4 From 2012-09-05 13:44:43 -------
Doing some Bugzilla cleanup...  Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5.  Also, we're now tracking
issues in JIRA.  Any new issues should be added here:

http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363