Bugzilla – Full Text Bug Listing
|Summary:||GRAM4 auditing: Need for an INFO log message|
|Product:||GRAM||Reporter:||John Weigand <email@example.com>|
|Component:||general||Assignee:||Stuart Martin <firstname.lastname@example.org>|
|Severity:||critical||CC:||email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com|
|Hardware:||Open Science Grid (OSG)|
In GRAM4, when GRAM auditing is activated, no messages are written to a log file unless you request DEBUG as the logging level. I added this entry to the GLOBUS_LOCATION/container-log4j.properties file to get some messages out:

log4j.category.org.globus.exec.utils.audit=DEBUG

It generated more than I ever needed, but not a single message was at the INFO level. At the very least, there should be one so that an admin can readily verify that auditing is working, and for troubleshooting purposes. It may very well be that there is another package, other than org.globus.exec.utils.audit, that should have been turned on. I just did not find it.

For that one INFO message, I would like to suggest an abbreviated version of the actual database update. The entire update/insert DML in GRAM4 is rather long. I think for general admin purposes a shorter version would be sufficient, containing something like:

SQL: UPDATE gram_audit_table resource_manager_type='Condor' job_grid_id='https://126.96.36.199:9443/wsrf/services/ManagedExecutableJobService?HnhFkVGtwfI5hn9a7g5rqA5oiww=' local_job_id='061.000.000', subject_name='/DC=org/DC=doegrids/OU=People/CN=John Weigand 458491', username='fnalgrid', success_flag='true', finished_flag='true'

Even this can be quite lengthy, but I am not sure what of it can be eliminated. This would be for each DML command sent to the database.
Hi John, Martin has made some improvements for auditing that should be coming in 4.2.1. One of them is that for any DB error an exception will be thrown and seen in the container log. Given that, we don't think it makes sense to have an INFO statement just for the DB auditing. Do you agree? If an admin does want to troubleshoot a problem, there are log4j DEBUG entries to turn on all the details.
In response to the previous comment about only showing DB errors/exceptions: just getting errors does not tell me whether GRAM auditing is working or even activated. I would hope there would be an INFO message telling me when it is active and another when it is inactive, although I am not sure how this can be handled, since I do not know whether the implementation differs between WS and pre-WS. As I recall, for web services it is the container that has an open thread, in which case this would be viable. Can WS updates record an INFO message for every nth update, saying something like 'n of N updates completed'? For pre-WS it is a cron, so this is likely not practical. For pre-WS, can each update say 'n updates applied' or something to that effect? At least then an admin can view the log and easily verify that things are working.

As for DB errors/exceptions, I am hoping 'exceptions' does not imply a stack trace of some sort but rather just a generic reference to an error condition. A stack trace is only of value to a developer, not an administrator, and should never appear in log files unless it is a totally bizarre condition. If it is a generic reference, I assume it will give a meaningful message with meaningful information. The same goes for DB errors.

I have not kept up on how the 'Reliability' concern mentioned in http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=6024 is addressed. This, I believe, was added for this Bugzilla item: http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5713

Depending on how this 'Reliability' concern is addressed, how will the generation of log messages be handled when the database is down, for example (this would likely be the most common error condition)?
1. In pre-WS, if a cron, one message every time the cron runs is probably OK.
2. For WS, one message for every update could get overwhelming.
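The 'n of N updates completed' idea could be sketched roughly as follows. This is only an illustration, not GRAM4 code: the class name AuditProgressLogger, the logger name, and the batch size are all assumptions, and java.util.logging stands in for log4j so the sketch needs no extra libraries.

```java
import java.util.logging.Logger;

/** Emits one INFO line per batch of audit updates instead of one per update. */
final class AuditProgressLogger {
    // Hypothetical logger name; a real service would use its own log4j category.
    private static final Logger LOG = Logger.getLogger("audit.progress");

    private final int batchSize;
    private int attempted;
    private int succeeded;

    AuditProgressLogger(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /**
     * Records the outcome of one update. Returns the summary message when a
     * batch completes (and logs it at INFO), otherwise returns null.
     */
    synchronized String record(boolean success) {
        attempted++;
        if (success) {
            succeeded++;
        }
        if (attempted < batchSize) {
            return null;
        }
        String msg = succeeded + " of " + attempted + " audit updates completed";
        LOG.info(msg);
        attempted = 0;
        succeeded = 0;
        return msg;
    }
}
```

With a batch size of 100, a healthy container would write one such line per 100 updates, which keeps the log readable while still showing that auditing is active.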
When errors occur that affect logging, it would be good if some form of email message were generated to inform the administrator of the problem. Since the logging uses log4j, and log4j has a mechanism for emailing, this might seem like a good solution. However, my (limited) experience with log4j emails is that the application has to be very careful about how many messages it flags for email notices, or it will overwhelm the mail services. So the application has to be sensitive to this or have its own internal mail notification mechanism. Has any thought been given to this?

I recognize that logging is generally an afterthought in most applications, but it is very important in administering a production system. I also recognize that doing it right can be as time-consuming as getting the basic functionality of any service right. But it is a critical component of all applications.
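One way an application could avoid flooding the mail service is to throttle notifications itself before handing them to any mailer. The sketch below is an assumption-laden illustration, not part of log4j or GRAM4: NotificationThrottle and its methods are hypothetical names, and the caller supplies the clock value so the behavior is easy to check.

```java
/** Allows at most one notification per quiet period and counts the rest. */
final class NotificationThrottle {
    private final long quietMillis;
    private long lastSentMillis;
    private boolean sentBefore;
    private int suppressed;

    NotificationThrottle(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Returns true if a notification may be sent at time nowMillis. */
    synchronized boolean shouldSend(long nowMillis) {
        if (sentBefore && nowMillis - lastSentMillis < quietMillis) {
            suppressed++;   // still inside the quiet period: drop this one
            return false;
        }
        sentBefore = true;
        lastSentMillis = nowMillis;
        return true;
    }

    /** Returns how many notifications were suppressed, then resets the count. */
    synchronized int drainSuppressed() {
        int n = suppressed;
        suppressed = 0;
        return n;
    }
}
```

The suppressed count can be included in the next email that does go out, so the administrator still learns how many errors occurred in between.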
Martin implemented fallback functionality in GRAM4 in case of DB errors; see bug 6357. It would make sense to have an INFO statement like:

INFO DB insert failed - saving audit record to be uploaded later in fallback directory /some/dir/save/unique_audit_record_file

and then, once the DB is back up, there would be statements like:

INFO DB saved audit record from fallback directory /some/dir/save/unique_audit_record_file inserted successfully

Those records would be rare, appearing only when the DB goes down. For audit V2 we have defined 5 different tables, so an INFO statement for each successful insert seems like a lot. I suppose there could be a periodic INFO statement saying all is well, such as '100 of last 100 audit records successfully inserted', but it seems to me that you could trust that no news is good news.
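The fallback behavior and the two INFO statements described above might look roughly like this. It is a minimal sketch under stated assumptions, not the implementation from bug 6357: AuditSink, AuditFallback, the file naming, and the replay order are all hypothetical, and the methods return the messages rather than logging them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the real database insert. */
interface AuditSink {
    void insert(String record) throws Exception;
}

final class AuditFallback {
    private final AuditSink sink;
    private final Path fallbackDir;

    AuditFallback(AuditSink sink, Path fallbackDir) {
        this.sink = sink;
        this.fallbackDir = fallbackDir;
    }

    /** Tries the insert; on failure saves the record to a file and returns the INFO text. */
    String store(String record) throws IOException {
        try {
            sink.insert(record);
            return null; // plain success: no message
        } catch (Exception e) {
            Files.createDirectories(fallbackDir);
            Path saved = Files.createTempFile(fallbackDir, "audit-", ".rec");
            Files.write(saved, record.getBytes(StandardCharsets.UTF_8));
            return "DB insert failed - saving audit record to be uploaded later"
                    + " in fallback directory " + saved;
        }
    }

    /** Replays saved records; returns one INFO text per record inserted. */
    List<String> replay() throws IOException {
        List<String> messages = new ArrayList<>();
        if (!Files.isDirectory(fallbackDir)) {
            return messages;
        }
        List<Path> saved;
        try (Stream<Path> files = Files.list(fallbackDir)) {
            saved = files.sorted().collect(Collectors.toList());
        }
        for (Path p : saved) {
            String record = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            try {
                sink.insert(record);
            } catch (Exception e) {
                break; // DB still down: keep remaining files for the next attempt
            }
            Files.delete(p);
            messages.add("DB saved audit record from fallback directory " + p
                    + " inserted successfully");
        }
        return messages;
    }
}
```

Because messages are produced only on failure and on replay, the log stays quiet while the database is healthy, which matches the "no news is good news" position.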
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in JIRA. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363