Bug 4558 - ws failure on corrupt grid-mapfile
Status: RESOLVED FIXED
Product: Java WS Core
Component: globus_wsrf_core
Version: 4.0.1
Platform: All Linux
Importance: P3 normal
Reported: 2006-06-29 10:08 by
Modified: 2006-12-19 16:29


Attachments
CoG Jar (654.97 KB, application/octet-stream)
2006-07-26 10:34, Rachana Ananthakrishnan

Description From 2006-06-29 10:08:19
The software installed is Globus Toolkit, web-services, server 4.0.1 from the
VDT 1.3.10 distribution.

Using the default gridmap authorization service, if a corrupt entry is found in
the grid-mapfile, the container appears to have the following behavior:

  1. If it is just being started,
       the container will fail to start.

  2. If the container is already running and it detects an
     updated grid-mapfile,
       the container will retain the currently cached (the caching is
       an assumption on my part) grid-mapfile and ignore the
       new grid-mapfile.  It will continue servicing
       authorization requests.
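
The two behaviours listed above are what one would expect from a loader that
re-reads the file when its modification time changes and keeps the last good
copy when parsing fails. The Python sketch below illustrates that pattern
only; it is not the container's code, and CachedGridMap and parse_strict are
hypothetical names introduced for this example.

```python
import os
import shlex


def parse_strict(text):
    """Parse grid-mapfile text; raise ValueError on the first malformed line."""
    entries = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = shlex.split(line)  # handles the quoted DN
        if not tokens:
            continue  # blank line
        if len(tokens) < 2:
            raise ValueError(f"line {lineno}: missing local user name")
        entries[tokens[0]] = tokens[1]
    return entries


class CachedGridMap:
    """Hypothetical cache: reload on mtime change, keep the old map on failure."""

    def __init__(self, path, parse=parse_strict):
        self.path = path
        self.parse = parse
        self.mtime = None
        self.entries = None

    def refresh(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            with open(self.path) as f:
                new_entries = self.parse(f.read())  # raises on a corrupt file
            # Only replace the cache once the whole file parsed cleanly.
            self.entries, self.mtime = new_entries, mtime

    def lookup(self, dn):
        try:
            self.refresh()
        except (OSError, ValueError):
            if self.entries is None:
                raise  # nothing cached yet: equivalent to failing at startup
            # Otherwise fall through and keep serving the cached map.
        return self.entries.get(dn)
```

With a strict parser, a fresh instance fails on a corrupt file (case 1), while
an instance that already holds a good map keeps answering from it (case 2).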

The corruption can be just a single entry in the grid-mapfile (like having only
one token on the line) as in the example (a real one) below:

"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/USERID=jweigand" cms
"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand"
"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/UID=jweigand" cms

In our environment (OSG), the grid-mapfile can be populated in several ways;
the 2 most common are:
  1. using edg-mkgridmap
  2. using a PRIMA/GUMS authorization server

Both of these methods pull VO membership data from 22 VOs to create the gridmap
file.

In pre-ws, the gatekeeper and gsiftp processes just ignore the individual
corrupt lines and load all valid grid-mapfile entries during the authorization
process.

I would like to see the ws gridmap authorization service behave the same.

By doing so, a minimal number of VO members are affected.  Generally, it would
be only the member(s) whose record is corrupt.  
 - Updated membership for all the other VOs would be in effect.
 - If my ws container happens to be down or a restart is required 
   for some reason, my grid services would still be available to 
   most of the VO members I support.  As it currently operates, 
   I am now dead in the water and have to take immediate action.

Does this sound reasonable?

John Weigand
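
The requested behaviour (skip the corrupt lines, load everything else) can be
sketched in a few lines of Python. This is an illustration under stated
assumptions, not the gatekeeper or CoG implementation: it assumes each entry
is a quoted DN followed by a comma-separated list of local names, and
parse_gridmap_line and load_gridmap_lenient are hypothetical helpers.

```python
import shlex

# The three lines from the report above; the second has no local name.
SAMPLE = (
    '"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/USERID=jweigand" cms\n'
    '"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand"\n'
    '"/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/UID=jweigand" cms\n'
)


def parse_gridmap_line(line):
    """Return (dn, [local_names]); raise ValueError if the line is malformed."""
    tokens = shlex.split(line)  # raises ValueError on an unclosed quote
    if len(tokens) < 2:
        raise ValueError("expected a quoted DN followed by a local user name")
    return tokens[0], tokens[1].split(",")


def load_gridmap_lenient(text):
    """Load every valid entry and collect (line_number, reason) for the rest."""
    entries, bad_lines = {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        try:
            dn, names = parse_gridmap_line(line)
        except ValueError as exc:
            bad_lines.append((lineno, str(exc)))
            continue  # skip only this entry
        entries[dn] = names
    return entries, bad_lines


entries, bad_lines = load_gridmap_lenient(SAMPLE)
```

For the sample above, the two complete entries are loaded and line 2 is
reported as invalid, so only the member whose record is corrupt is affected.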
------- Comment #1 From 2006-06-29 10:11:39 -------
I've recategorized this as a Java WS Core bug since GRAM has no control over
the API that loads the grid-mapfile.
------- Comment #2 From 2006-07-05 18:31:33 -------
Updated the GridMap API in CoG with a setIgnoreErrors() function to control
whether errors in the gridmap file should be ignored or not.
By default errors will not be ignored so the security code in Java WS Core
needs to be updated to call this function appropriately. Reassigning to
Rachana.
------- Comment #3 From 2006-07-06 10:16:55 -------
Updated security code in WS Core trunk to ignore errors in the gridmap file.
All correct entries will be picked up and used, even if erroneous entries
are present.

I am wary of making this change in 4.0.x. But we provide support for a pluggable
authorization scheme in 4.0.x, so you could plug in a custom GridMap
Authorization implementation that ignores errors.
------- Comment #4 From 2006-07-06 10:36:45 -------
Will it still generate an error message on the erroneous entries?

It did not sound like this was going in 4.0.x.  In what version will it be
available?
------- Comment #5 From 2006-07-06 10:41:57 -------
You will just see a warning for bad entries.

The feature will be in all future 4.x versions. I think the next one will be
4.2. 
------- Comment #6 From 2006-07-06 11:00:03 -------
Is it possible to provide a set of patches that can be applied to GT
4.0.2? If so, we can integrate these changes into a future version of
the VDT much sooner than we'll get GT 4.2.

John, it's too late for this to go into VDT 1.3.11, but it can make 
it in time for OSG 0.6.0.

-alain
------- Comment #7 From 2006-07-06 11:07:16 -------
I think I'd like to suggest that this be an error (rather than warning) as it
suggests a condition that an administrator should take action on.  Warnings are
generated when authorizations are denied which is adequate since there is no
need to act on every denial.  But this indicates a data corruption problem in
some system somewhere  and it should be corrected.

To be more specific, what I would like to suggest is a single ERROR
message indicating corruption in the grid-mapfile being loaded, plus individual
WARN messages for the lines in error.  In this way, log4j can be configured to
issue an email message to the administrator when the corruption is detected,
but it would only be a single email.  If all I get is warnings for each line, I
have no way of limiting the number of emails sent.  Potentially I could get
thousands sent.

Can this be taken into consideration?
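
The proposed logging scheme can be sketched as follows. This is a Python
illustration of the idea only, not log4j configuration or WS Core code; the
logger name "gridmap" and report_gridmap_problems are hypothetical.

```python
import logging

log = logging.getLogger("gridmap")  # hypothetical logger name


def report_gridmap_problems(path, bad_lines):
    """Log one WARNING per invalid line and a single ERROR per load.

    bad_lines is a list of (line_number, reason) tuples.
    """
    for lineno, reason in bad_lines:
        log.warning("%s line %d ignored: %s", path, lineno, reason)
    if bad_lines:
        # Exactly one ERROR no matter how many lines were bad, so a handler
        # filtering at ERROR level fires once per load.
        log.error("%s contains %d invalid entries; valid entries were loaded",
                  path, len(bad_lines))
    return len(bad_lines)
```

An email handler attached at ERROR level would then send one message per
corrupt load, while the per-line warnings stay in the regular log for
troubleshooting.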
------- Comment #8 From 2006-07-06 11:39:27 -------
As an aside, is there any chance you would modify your grid-mapfile updating
process to include a call to
$GLOBUS_LOCATION/sbin/grid-mapfile-check-consistency?  It seems that you could
be notified of a grid-mapfile error at the time the grid-mapfile is updated,
rather than waiting for a runtime ERROR.

This is independent of the activity of this bug, which all sounds reasonable to
me.  I'm just curious about how the errors are making it in undetected in the
first place.
------- Comment #9 From 2006-07-06 12:01:45 -------
Rather than a Java WS Core patch, Jarek suggested a patch against CoG code.
Would you be open to that? There it is just a one line change and you will need
to ship a different CoG jar.

Rachana
------- Comment #10 From 2006-07-06 13:00:03 -------
There is a bugzilla request in on the process that generates the
grid-mapfile as well.  Timing-wise there would be no difference since the
ws container checks the time-stamp on the grid-mapfile with every
authorization (at least that is the behavior I observed).  So, on an
active CE node, detection will be almost simultaneous.  The
grid-mapfile-check-consistency would have to be used on the CE node as
that is where the script is run.

As for the systems involved, I'll give you a little background into
the dataflow in some of the OSG environments:
   VOMRS -> VOMS -> GUMS ->Globus-ws on the CE node

The above is pretty much the max number of "systems" the data flows 
through.  In this particular instance, VOMRS was not in the picture and 
it started in VOMS.  Each of these systems should prevent this condition 
from occurring, but there are always the "OOPS's" that occur.  This was 
one of those.   It  was actually caused by a mass change of some of the 
data in VOMS that was basically done in MySql/Oracle SQL bypassing 
validations.  These things have to be done sometimes.   Each system 
should be checking for these conditions (either when outputting the 
data-VOMS/GUMS or inputting the data-GUMS), but none were.  Actually, I 
found it kind of interesting that Oracle/MySQL will let you put the
newline character in and, even stranger yet, when you looked in VOMS at
the list of users for the VO on the webUI, that DN looked normal.  Only
when you used SQL itself did it show up.  It was not easy to find.
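
A check on the output side of any of these systems could catch this kind of
record before it reaches the grid-mapfile. The sketch below is a hypothetical
Python helper, not VOMS or GUMS code: it refuses to format an entry whose DN
or local name is empty or contains a line break.

```python
def format_gridmap_entry(dn, local_name):
    """Return one grid-mapfile line, or raise ValueError if it would be malformed."""
    if not dn or not local_name:
        raise ValueError("DN and local user name must both be non-empty")
    for label, value in (("DN", dn), ("local user name", local_name)):
        if "\n" in value or "\r" in value:
            # An embedded line break would split one entry across two lines.
            raise ValueError(f"{label} contains a line break: {value!r}")
    if '"' in dn:
        raise ValueError(f"DN contains a double quote: {dn!r}")
    if any(ch.isspace() for ch in local_name):
        raise ValueError(f"local user name contains whitespace: {local_name!r}")
    return f'"{dn}" {local_name}'
```

The error message uses repr() on the offending value, so an embedded newline
shows up as a visible \n instead of looking normal, as it did in the webUI.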



------- Comment #11 From 2006-07-06 13:31:23 -------
Thanks for the information about the workflow.  That satisfies my curiosity, so
I'll let this bug get back to its regularly scheduled owners.
------- Comment #12 From 2006-07-06 23:45:03 -------
>Rather than a Java WS Core patch, Jarek suggested a patch against CoG code.
>Would you be open to that? There it is just a one line change and you will need
>to ship a different CoG jar.

To be honest, I have no idea what is preferable. Both Java WS Core 
and the CoG code are distributed as part of a full GT4 installation, 
right? Either one fixes the bug for me?

In principle, I really don't care which you do, as long as it makes
sense to you. You're the developers and you know best.

That said, I'm not sure how to patch the CoG code. I think that in 
our build process, CoG is pre-built and not built when we build the 
rest of Globus. If you, or someone else, can help us figure out how 
to get the new CoG into our GT 4.0.2 build, we're happy with that solution.

Thanks,
-alain
------- Comment #13 From 2006-07-26 09:45:00 -------
I notice that this bug has a status of resolved.

Is it possible to know how it was fixed and in what release?

My issue is with how it is going to react to corrupt or invalid entries in the
grid-mapfile:
  1. is it going to do it silently (not good)
  2. is it going to generate an error for every invalid line (also not good)

Ideally, it should generate 1 error message and multiple warnings (1 per line).
 This allows for use of log4j properties to notify an admin of the problem with
minimal mail messages generated.

Also, I am curious as to why I never got email on some of the comments or on
the closure of the bug?  .... and why when I use the query screen for My Bugs,
I never see it come up?

Thanks
John Weigand
------- Comment #14 From 2006-07-26 10:32:13 -------
Alain,

For including this in GT4.0.x, I created a globus_4_0_community branch within
the CoG repository and changed the default gridmap behavior to ignore errors.
Uploading jar as attachment here.

Rachana
------- Comment #15 From 2006-07-26 10:34:29 -------
Created an attachment (id=1016) [details]
CoG Jar 

CoG Jar with default GridMap behavior changed to ignore errors in the gridmap
file.
------- Comment #16 From 2006-07-26 10:38:37 -------
The trunk code can be modified to print one error log if issues arise with
parsing the GridMap file. We'll modify the CoG API to indicate if any error
occurred while reading the file and have the security code in GT core write out
an error. I am reopening this bug.

But this would require an API change and can only be done in trunk. The patch
provided for 4.0.x code has to remain as it is, where it prints warnings.
------- Comment #17 From 2006-07-26 10:50:03 -------
Thanks!

Do I need the jar file at build time, or can I just lay it down in 
our Globus installation at run time? I'm guessing that I can do the 
latter, and that would be pleasantly simple.

Thanks,
-alain
------- Comment #18 From 2006-07-26 12:21:44 -------
I hate to be a nag, but I'm still unclear as to the long-term solution.

In the short-term (for VDT's current release), I am not that concerned.

Long-term it really needs:
  1. One error for a corrupt grid-mapfile regardless of how many invalid entries
  2. A warning for each invalid entry so the admin can troubleshoot the
problem.

john
------- Comment #19 From 2006-07-26 16:57:56 -------
Alain,

If I understand correctly, you are interested in getting this going with
globus_4_0_2 + patches. So it might be okay to just replace the jar in $G_L
after you are done building. But in general, it would be best to build with the
jar, so that any incompatibilities can be identified during the build process.

Rachana
------- Comment #20 From 2006-07-26 17:02:14 -------
John,

We are working towards what you are proposing and will include the changes to
trunk code.

The CoG API that reads the GridMap file will write out a warning for each
invalid entry in the gridmap.

The Java WS Security code (that uses the CoG API to read GridMap during
authorization) will write out a single logger error if the read of GridMap
detected any errors.

Rachana
------- Comment #21 From 2006-07-27 10:21:15 -------
Updated Java WS security trunk code with relevant changes. A warning for every
invalid entry and a single error log for any load/refresh that has invalid
entries will be printed.
------- Comment #22 From 2006-12-12 13:01:49 -------
I had been advised that this fix was available in the 4.0.3 release,
which is what is being used in the current VDT 1.5.1 release.

Can someone verify that this is true, at least, in terms of the 4.0.3 release?

I have just completed testing of VDT 1.5.1, which contains 4.0.3, and this does
not appear to be resolved.  The behavior is the same as before.

Thanks
John Weigand
------- Comment #23 From 2006-12-19 16:29:03 -------
Here is a summary of things in GT:

(1) Trunk: 

* By default GridMap class does not ignore errors. If ignore errors is enabled,
it prints a warning per error with detail
* WS security layer enables ignore errors in GridMap class and prints one
logger.error per file with at least one erroneous entry

(2) 4.0.branch and any 4.0.x release: 

* Gridmap read will fail if there are any erroneous entries. The change
required to support just warning is an API change and cannot be made

(3) Community branch

* By default GridMap class ignores errors and prints a warning per error with
detail
* WS security layer uses above class, but does not print any additional
logger.error

Based on previous comments on this bug it seems like the plan was for the VDT
release to use the CoG jar from the community branch, but I am not aware of
what was pulled in for the VDT release.