Bug 3781 - GSI caching of CRLs causes problems when process lifetime exceeds CRL lifetime
Status: NEW
Credentials and Proxies
Version: 4.0.1
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assigned To:
Reported: 2005-09-26 15:26 by
Modified: 2008-08-11 14:56

Description From 2005-09-26 15:26:33
If the lifetime of an application linked against the GSI libraries exceeds the
lifetime of the CRL, the CRL is not automatically refreshed. As a result,
authentication essentially stops working once the cached CRL expires.

This causes problems specifically with Purdue's CA (4-hour CRL lifetime) in the
context of TeraGrid.

From Jaime:
The gahp server (the part of Condor that talks to Globus) caches the user's
proxy in memory. GSI automatically caches the CRL along with the proxy and
doesn't attempt to re-read it from disk if the in-memory version expires. Thus,
after 5 hours (or less), authentication stops working.
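A minimal sketch of the missing behavior, written in Python purely for illustration: it is not the GSI C API, and the names `CRL`, `ExpiryAwareCRLCache`, and the `loader` callback are hypothetical. The idea is that a cached CRL is treated as stale once its nextUpdate time has passed, so the next lookup re-reads it from disk instead of reusing the expired in-memory copy for the rest of the process lifetime.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CRL:
    """Hypothetical stand-in for a parsed CRL (not a GSI type)."""
    issuer: str
    next_update: float  # POSIX timestamp after which this CRL is stale


class ExpiryAwareCRLCache:
    """Caches one CRL, but re-reads it from disk once its nextUpdate has passed."""

    def __init__(self, loader: Callable[[], CRL],
                 clock: Callable[[], float] = time.time) -> None:
        self._loader = loader  # hypothetical: reads and parses the CRL file
        self._clock = clock    # injectable so expiry can be tested
        self._crl: Optional[CRL] = None

    def get(self) -> CRL:
        now = self._clock()
        if self._crl is None or now >= self._crl.next_update:
            # First use, or the cached copy has expired: go back to disk
            # instead of validating against a stale in-memory CRL forever.
            self._crl = self._loader()
        return self._crl
```

If the file on disk has not been refreshed either, the reloaded CRL is still expired and validation should keep failing (at the cost of one disk read per lookup); the cache only ensures that a refreshed file is picked up without restarting a long-lived process such as the gahp server.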
------- Comment #1 From 2005-09-26 16:20:48 -------
The Java code checks the CRLs each time the client authenticates to the service;
if a CRL file has been modified since it was last read, it is reloaded.
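The Java behavior described here (re-reading a CRL file only when its modification time changes) can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the Java CoG implementation; `MtimeReloadingFile` and the `parse` callback are hypothetical names.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class MtimeReloadingFile(Generic[T]):
    """Re-parses a file only when its modification time changes."""

    def __init__(self, path: str, parse: Callable[[bytes], T]) -> None:
        self._path = path
        self._parse = parse  # hypothetical: e.g. a CRL parser
        self._mtime_ns: Optional[int] = None
        self._value: Optional[T] = None

    def get(self) -> T:
        # Meant to be called on every authentication attempt.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if self._mtime_ns is None or mtime_ns != self._mtime_ns:
            with open(self._path, "rb") as f:
                self._value = self._parse(f.read())
            # Record the mtime seen *before* reading: if the file changed in
            # between, the next call notices the newer mtime and reloads.
            self._mtime_ns = mtime_ns
        return self._value  # type: ignore[return-value]
```

Recording the mtime observed before the read is deliberately conservative: a file replaced between the stat and the read just causes one extra reload on the next call.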

I am assigning it to Raj to look at the issues in the C code.
------- Comment #2 From 2005-09-26 16:31:04 -------
Reading this again, I am confused about what exactly is failing. Can you please
provide more details? Also, any time a proxy path validation is done using the
Java GSI code, the CRLs should be reloaded. Is this using the Java CoG API?
------- Comment #3 From 2005-09-26 16:52:29 -------
This is using the C GSI APIs in pre-WS GRAM.

Wendy Lin@Purdue's report is quoted below. I believe that when the long-running
job attempted to contact another process (in this case, to report completion),
its path validation used the cached CRL, which had expired by then.

begin quote:
We had observed something puzzling when submitting a long job (4+ hours), via
condor_submit, to the TG. When the certificate of the job submitter was issued
by NCSA, the job would run to completion and the output would come back. But
when the certificate was issued by Purdue, the job would run to completion, but
the output would not come back. In fact, condor_q would show the job as held
while it was still queued or running at the remote site.
end quote:
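The failure mode described in this comment can be illustrated with a small simulation: a CRL is cached when the job starts, and the revocation check performed at completion time rejects it because its nextUpdate has passed. This is a hypothetical Python sketch; `check_revocation` and `CRLExpiredError` are illustrative names, loosely analogous to OpenSSL's `X509_V_ERR_CRL_HAS_EXPIRED` verification error, and are not GSI functions.

```python
from dataclasses import dataclass

HOUR = 3600.0


@dataclass(frozen=True)
class CRL:
    """Hypothetical stand-in for a parsed CRL."""
    issuer: str
    next_update: float  # seconds since the job started, for simplicity
    revoked_serials: frozenset = frozenset()


class CRLExpiredError(Exception):
    """Raised when the only CRL available for an issuer has expired."""


def check_revocation(cert_serial: int, crl: CRL, now: float) -> bool:
    """Return True if cert_serial is not revoked; refuse to answer with a stale CRL."""
    if now >= crl.next_update:
        # An expired CRL cannot show the certificate is still good, so
        # validation fails outright instead of silently trusting it.
        raise CRLExpiredError(
            f"CRL from {crl.issuer} expired at t={crl.next_update:.0f}s")
    return cert_serial not in crl.revoked_serials


# CRL cached at job start (t=0) with a 4-hour lifetime, as with Purdue's CA.
cached = CRL(issuer="Purdue CA", next_update=4 * HOUR)
```

Under these assumptions, a job that runs longer than 4 hours passes the check early on but fails it when it tries to report completion, which is consistent with the held-job symptom; a CA whose CRL outlives the job would not trigger the failure.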