Bugzilla – Bug 3781
GSI caching of CRLs causes problems when process lifetime exceeds CRL lifetime
Last modified: 2008-08-11 14:56:31
If the lifetime of an application linked to the GSI libraries exceeds the
lifetime of the CRL, the CRL is not automatically refreshed. Authentication
then fails once the cached CRL expires.
This causes problems specifically with Purdue's CA (4 hour CRL lifetime) in the
context of TeraGrid.
The gahp server (the part of Condor that talks to Globus) caches the user's
proxy in memory. GSI automatically caches the CRL along with the proxy and
doesn't attempt to re-read it from disk if the in-memory version expires. Thus,
after 5 hours (or less), authentication stops working.
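The behavior this report asks for can be sketched as a cache that checks the CRL's expiry time on every use and goes back to disk once it has passed. This is only an illustration in Python, not the GSI C code: `Crl`, `ExpiryAwareCrlCache`, and the `load` callable are hypothetical names, and the CRL is reduced to its nextUpdate field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class Crl:
    """Stand-in for a parsed CRL; only the expiry field is modeled."""
    next_update: datetime  # time after which this CRL should not be trusted


class ExpiryAwareCrlCache:
    """Keeps a CRL in memory, but reloads it once nextUpdate has passed."""

    def __init__(self, load: Callable[[], Crl]):
        self._load = load  # hypothetical loader, e.g. re-reads the CRL file
        self._crl: Optional[Crl] = None
        self.reloads = 0   # counts how often the loader was invoked

    def get(self, now: datetime) -> Crl:
        # Reload on first use, or when the cached copy has expired.
        if self._crl is None or now >= self._crl.next_update:
            self._crl = self._load()
            self.reloads += 1
        return self._crl


if __name__ == "__main__":
    start = datetime(2008, 1, 1, tzinfo=timezone.utc)
    on_disk = {"issued": start}  # simulates the CRL file being refreshed

    def load() -> Crl:
        # Assumes a 4 hour CRL lifetime, as described for Purdue's CA.
        return Crl(next_update=on_disk["issued"] + timedelta(hours=4))

    cache = ExpiryAwareCrlCache(load)
    cache.get(start)                                 # initial load
    on_disk["issued"] = start + timedelta(hours=4)   # a fresh CRL appears on disk
    crl = cache.get(start + timedelta(hours=5))      # cached copy expired: reload
    print(cache.reloads, crl.next_update)
```

A cache without the expiry check in `get` would keep returning the first CRL indefinitely, which matches the failure described above.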
The Java code loads the CRLs each time the client authenticates to the service
and if the CRL file has been modified since last access, it is loaded again.
I am assigning it to Raj to look at the issues in the C code.
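The reload-on-modification check described for the Java code can be sketched roughly as follows. This is a Python illustration rather than the actual Java implementation: `MtimeCrlCache` is a hypothetical name, and the CRL is treated as opaque bytes instead of being parsed.

```python
import os
from typing import Optional


class MtimeCrlCache:
    """Re-reads a CRL file whenever its modification time has changed."""

    def __init__(self, path: str):
        self._path = path
        self._mtime_ns: Optional[int] = None  # mtime seen at the last load
        self._data: Optional[bytes] = None    # raw CRL bytes (parsing omitted)
        self.reloads = 0                      # counts reads from disk

    def get(self) -> bytes:
        # Compare the file's current mtime with the one recorded at last load.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if self._data is None or mtime_ns != self._mtime_ns:
            with open(self._path, "rb") as f:
                self._data = f.read()
            self._mtime_ns = mtime_ns
            self.reloads += 1
        return self._data
```

Calling `get()` on every authentication costs one `stat` call when the file is unchanged, so a long-running process picks up a refreshed CRL without restarting.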
Reading this again, I am confused about what exactly is failing. Can you please
provide more details? Also, any time proxy path validation is done using the
Java GSI code, the CRLs should be reloaded. Is this using the Java CoG API?
This is using the C GSI APIs in pre-WS GRAM.
Quoting Wendy Lin@Purdue below. I believe that when the long-running job
attempted to contact another process (in this case to report completion), its
path validation used the cached CRL, which had expired by then.
We had observed something puzzling when submitting a long job (4+ hours), via
condor_submit, to the TG. When the certificate of the job submitter was issued
by NCSA, the job would run to completion and the output would come back. But
when the certificate was issued by Purdue, the job would run to completion, but
the output would not come back. In fact, condor_q would show the job as held
while it was still queued or running at the remote site.