Bugzilla – Bug 3912
Rotation of gram_condor_log?
Last modified: 2008-01-22 09:19:49
The condor.pm script creates a single log file that records all Condor events for all Condor jobs, and this file is shared between users. It can grow without bound, but it is not clear how it should be rotated. Is it safe to rotate it while a job is running? For instance, suppose the log contains:

...
submit alain's job
job begins running
...

If I rotate the log file at this point, will it cause problems because the submission and start events can no longer be read? Or does GRAM rely only on being able to read new events, so that it is actually safe to rotate the file at any time?

Thanks,
-alain
Joe,

This recently came up as something that needs to be understood for OSG. How can the single Condor log file used by the Condor SEG be rotated safely?

-Stu
Alain,

Currently, the only way to know that the Condor log file can be safely rotated is when all events in the log have been processed by the SEG, and there is not really a good way to know this. The SEG keeps a timestamp for recovery purposes; the timestamp is unique to a single "event" in the log file. If the container was down for some reason, meaning the SEG wasn't running, but Condor continued to run, there would be unprocessed events waiting for the SEG when it is restarted. If you rotate, truncate, or remove those events, they would be lost, causing problems (most likely a job hanging while it waits for DONE).

LSF and PBS have log rotation schemes that WS GRAM understands and knows how to deal with. The recovery timestamp will lead the SEG to a rotated log file, and it will then continue on the current log file. Something similar could be done with Condor. However, one problem with this log rotation method for Condor is that the job submission script chooses the name of the log file, so a job that runs for a very long time will keep writing to the name chosen at job creation time, even if that file has been rotated away.

Given these issues, maybe it's worth revisiting whether there is another channel (central log/DB/...) that the SEG can suck the information out of without using the user-specified log files? The events the SEG needs are: job started, done, failed, and exit code. All events need to have a timestamp in order to identify them uniquely for recovery.

Thoughts?
-Stu
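The recovery rule described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the SEG's actual code: `JobEvent`, `safe_to_rotate` and `replay_after` are hypothetical names, events are assumed to be already parsed out of the log (this is not Condor's on-disk format), and timestamps are assumed to be unique and increasing. It shows why rotation is only safe once the SEG's recovery timestamp has caught up with the newest event in the file, and how a reader could resume across a rotated file followed by the current one.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


# Hypothetical event record holding what the SEG needs (job started, done,
# failed, exit code) plus a timestamp. NOT Condor's on-disk log format.
@dataclass(frozen=True)
class JobEvent:
    timestamp: int                   # assumed unique per event and increasing
    job_id: str
    kind: str                        # "started", "done" or "failed"
    exit_code: Optional[int] = None  # only meaningful for "done"


def safe_to_rotate(log: Iterable[JobEvent], seg_recovery_ts: int) -> bool:
    """A log may be rotated only if the SEG has already processed every
    event in it, i.e. no event is newer than the recovery timestamp."""
    return all(ev.timestamp <= seg_recovery_ts for ev in log)


def replay_after(logs_oldest_first: List[List[JobEvent]],
                 seg_recovery_ts: int) -> Iterator[JobEvent]:
    """Recovery across rotation: read rotated logs oldest-first, then the
    current log, yielding only events newer than the recovery timestamp."""
    for log in logs_oldest_first:
        for ev in log:
            if ev.timestamp > seg_recovery_ts:
                yield ev


if __name__ == "__main__":
    rotated = [JobEvent(5, "job1", "started")]
    current = [JobEvent(12, "job2", "started"), JobEvent(20, "job1", "done", 0)]
    # Suppose the SEG went down right after processing the event at t=12.
    print(safe_to_rotate(current, 12))  # False: the t=20 event is unprocessed
    print([ev.kind for ev in replay_after([rotated, current], 12)])  # ['done']
```

If the t=20 event had been rotated or truncated away before the SEG restarted, `replay_after` would have nothing to yield and job1 would never be reported as done, which is the hang described above.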
Subject: Re: Rotation of gram_condor_log?

At 03:49 PM 3/13/2007 -0500, you wrote:
>Currently, the only way to know that the condor log file can be safely rotated
>is when all events from the log have been processed by the SEG.

I was worried about that.

>LSF and PBS have log rotation schemes that WS GRAM understand and knows how to
>deal with. The recovery timestamp will lead the seg to a rotated log file and
>then it will continue on the current log file. Something similar
>could be done with condor.

Today, Condor does not rotate this log file. I suppose it could (I'll bring it up with the Condor team), but the original expectation was that the log file would be used on a per-job basis (or perhaps a per-set-of-jobs basis), not for all jobs. I'll talk to the Condor team about it, but no version of Condor today rotates this log file.

>Given these issues, maybe it's worth revisiting if there is another channel
>(central log/DB/...) that the SEG can suck the information out of
>without using the user specified log files? The events the SEG
>needs are: job started, done, failed, and exit code. All events
>need to have a timestamp in order to identify them uniquely for recovery.

Nothing pops out at me right now: the job log is where we put those events. Theoretically you could use condor_q, but that's a bad choice for several reasons.

In the past, we had talked about another option: the SEG could read multiple log files, one per job. Is that an option?

-alain
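The one-log-per-job option could look roughly like the following. This is again a hypothetical sketch: `merged_events`, `removable_logs` and the `(timestamp, job_id, kind)` tuple are illustrative assumptions, not an existing SEG or Condor interface. The idea is that the SEG merges the per-job logs by timestamp (here with Python's `heapq.merge`), so a single recovery timestamp still identifies where to resume. It also assumes a finished job writes nothing further to its own log, so that log can be dropped once its terminal event has been processed, instead of rotating a shared file.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical event tuple: (timestamp, job_id, kind). Each per-job log is
# assumed to be a list already sorted by timestamp.
Event = Tuple[int, str, str]
TERMINAL = {"done", "failed"}


def merged_events(per_job_logs: Dict[str, List[Event]],
                  after_ts: int) -> Iterator[Event]:
    """Interleave many per-job logs in global timestamp order, skipping
    events at or before the SEG's recovery timestamp."""
    for ev in heapq.merge(*per_job_logs.values(), key=lambda e: e[0]):
        if ev[0] > after_ts:
            yield ev


def removable_logs(per_job_logs: Dict[str, List[Event]],
                   after_ts: int) -> List[str]:
    """Jobs whose log is no longer needed: the terminal event (done or
    failed) is already at or before the recovery timestamp."""
    return sorted(
        job for job, events in per_job_logs.items()
        if any(kind in TERMINAL and ts <= after_ts for ts, _, kind in events)
    )


if __name__ == "__main__":
    logs = {
        "job1": [(1, "job1", "started"), (5, "job1", "done")],
        "job2": [(2, "job2", "started"), (7, "job2", "failed")],
    }
    print(list(merged_events(logs, 2)))  # [(5, 'job1', 'done'), (7, 'job2', 'failed')]
    print(removable_logs(logs, 5))       # ['job1']
```

The cost of this design is that the SEG has to watch many files instead of one, but no file ever needs to be rotated while it still holds unprocessed events.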
Test. I didn't get Alain's comment by email; testing to see whether I get this one.
*** This bug has been marked as a duplicate of 5731 ***