Bugzilla – Bug 4558
ws failure on corrupt grid-mapfile
Last modified: 2006-12-19 16:29:03
The software installed is Globus Toolkit web services, server 4.0.1, from the VDT 1.3.10 distribution. Using the default gridmap authorization service, if a corrupt entry is found in the grid-mapfile, the container behaves as follows:
1. If it is just being started, the container fails to start.
2. If the container is already running and detects an updated grid-mapfile, it retains the currently cached grid-mapfile (the caching is an assumption on my part), ignores the new grid-mapfile, and continues servicing authorization requests.
The corruption can be just a single entry in the grid-mapfile (for example, a line with only one token), as in the following real example:
    "/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/USERID=jweigand" cms
    "/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand"
    "/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=John Weigand/UID=jweigand" cms
In our environment (OSG), the grid-mapfile can be populated in several ways; the two most common are:
1. using edg-mkgridmap
2. using a PRIMA/GUMS authorization server
Both of these methods pull VO membership data from 22 VOs to create the grid-mapfile.
In pre-ws, the gatekeeper and gsiftp processes simply ignore the individual corrupt lines and load all valid grid-mapfile entries during the authorization process. I would like to see the ws gridmap authorization service behave the same way. That way, a minimal number of VO members are affected; generally, only the member(s) whose record is corrupt.
- Updated membership for all the other VOs would be in effect.
- If my ws container happens to be down or a restart is required for some reason, my grid services would still be available to most of the VO members I support.
As it currently operates, I am dead in the water and have to take immediate action. Does this sound reasonable?
John Weigand
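The pre-ws behavior requested above (skip each corrupt line, keep every valid one) can be sketched in Java as below. This is a minimal, hypothetical illustration, not the CoG or Java WS Core code: the class and method names are invented, and the grid-mapfile format is simplified to a quoted DN followed by a comma-separated list of local accounts, with blank lines and '#' comments ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: lenient grid-mapfile loading that skips bad lines.
// Assumed (simplified) line format: "quoted DN" account1,account2
final class LenientGridMapParser {

    // A quoted DN, whitespace, then a comma-separated account list without spaces.
    private static final Pattern ENTRY = Pattern.compile("^\"([^\"]+)\"\\s+(\\S+)$");

    /** The usable entries plus the 1-based numbers of the lines that were rejected. */
    static final class Result {
        final Map<String, List<String>> entries = new LinkedHashMap<>();
        final List<Integer> badLines = new ArrayList<>();
    }

    static Result parse(List<String> lines) {
        Result result = new Result();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments are not errors
            }
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                result.entries.put(m.group(1), Arrays.asList(m.group(2).split(",")));
            } else {
                result.badLines.add(i + 1); // remember the bad line, but keep going
            }
        }
        return result;
    }
}
```

Run against the three-line example above, this sketch keeps the two well-formed entries and reports line 2 as bad, so only the owner of the corrupt record loses access.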
I've recategorized this as a Java WS Core bug since GRAM has no control over the API that loads the grid-mapfile.
Updated the GridMap API in CoG with a setIgnoreErrors() function to control whether errors in the gridmap file should be ignored or not. By default errors will not be ignored so the security code in Java WS Core needs to be updated to call this function appropriately. Reassigning to Rachana.
Updated the security code in WS Core trunk to ignore errors in the gridmap file. All correct entries will be picked up and used, even if erroneous entries are present. I am wary of making this change in 4.0.x. However, 4.0.x supports pluggable authorization schemes, so you could plug in a custom GridMap authorization implementation that ignores errors.
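A minimal sketch of the ignore-errors switch discussed in the previous comments, assuming the same simplified file format. SketchGridMap and its methods are hypothetical; only the idea of a setIgnoreErrors() flag that defaults to off comes from this thread, and the real CoG GridMap signatures may differ. In strict mode a single bad line rejects the whole file (the container-start failure described in the report); with the flag on, bad lines become warnings and the valid entries are loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a grid-map loader with an ignore-errors flag.
// This is NOT the CoG GridMap class.
final class SketchGridMap {

    private boolean ignoreErrors = false; // strict by default, as described for the CoG change
    private final Map<String, List<String>> dnToAccounts = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    void setIgnoreErrors(boolean ignoreErrors) {
        this.ignoreErrors = ignoreErrors;
    }

    /** Loads "quoted DN" account1,account2 lines; behavior on bad lines depends on the flag. */
    void load(List<String> lines) {
        Map<String, List<String>> fresh = new HashMap<>();
        List<String> freshWarnings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int close = line.startsWith("\"") ? line.indexOf('"', 1) : -1;
            String accounts = close > 0 ? line.substring(close + 1).trim() : "";
            if (accounts.isEmpty()) {
                String msg = "line " + (i + 1) + ": malformed entry: " + line;
                if (!ignoreErrors) {
                    // Strict mode: the whole file is rejected and nothing new is loaded.
                    throw new IllegalArgumentException(msg);
                }
                freshWarnings.add(msg); // Lenient mode: record a warning and keep going.
                continue;
            }
            fresh.put(line.substring(1, close), Arrays.asList(accounts.split(",")));
        }
        // Swap in the new contents only after the whole file has been processed.
        dnToAccounts.clear();
        dnToAccounts.putAll(fresh);
        warnings.clear();
        warnings.addAll(freshWarnings);
    }

    List<String> lookup(String dn) {
        return dnToAccounts.get(dn);
    }

    List<String> warnings() {
        return new ArrayList<>(warnings);
    }
}
```

Because the new map is swapped in only at the end, a rejected strict load leaves whatever was loaded earlier untouched.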
Will it still generate an error message for the erroneous entries? It did not sound like this was going into 4.0.x. In what version will it be available?
You will just see a warning for bad entries. The feature will be in all future 4.x versions. I think the next one will be 4.2.
Is it possible to provide a set of patches that can be applied to GT 4.0.2? If so, we can integrate these changes into a future version of the VDT much sooner than we'll get GT 4.2. John, it's too late for this to go into VDT 1.3.11, but it can make it in time for OSG 0.6.0. -alain
I'd like to suggest that this be an error (rather than a warning), as it signals a condition that an administrator should act on. Warnings are generated when authorizations are denied, which is adequate since there is no need to act on every denial. But this indicates a data corruption problem in some system somewhere, and it should be corrected. To be more specific, I would like to suggest a single ERROR message indicating corruption in the grid-mapfile being loaded, plus individual WARN messages for the lines in error. That way, log4j can be configured to email the administrator when the corruption is detected, but it would be only a single email. If all I get is a warning for each line, I have no way of limiting the number of emails sent; potentially I could get thousands. Can this be taken into consideration?
As an aside, is there any chance you would modify your grid-mapfile updating process to include a call to $GLOBUS_LOCATION/sbin/grid-mapfile-check-consistency? It seems that you could be notified of a grid-mapfile error at the time the grid-mapfile is updated, rather than waiting for a runtime ERROR. This is independent of the activity of this bug, which all sounds reasonable to me. I'm just curious about how the errors are making it in undetected in the first place.
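Catching problems when the grid-mapfile is updated, rather than at authorization time, might look something like the producer-side sketch below. It is hypothetical and does not reproduce what grid-mapfile-check-consistency actually checks: GridMapPublisher, its validation rules (no empty DN, no double quotes or control characters such as newlines in the DN, account names limited to a conservative character set), and the write-then-rename step are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical producer-side check: refuse to publish a grid-mapfile whose
// records would not survive being written one entry per line.
final class GridMapPublisher {

    /** Describes every record that would produce a malformed grid-mapfile line. */
    static List<String> problems(Map<String, String> dnToAccounts) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : dnToAccounts.entrySet()) {
            String dn = e.getKey();
            String accounts = e.getValue();
            // A newline (or any control character) inside a DN splits the entry across lines.
            if (dn.isEmpty() || dn.indexOf('"') >= 0 || dn.chars().anyMatch(Character::isISOControl)) {
                problems.add("bad DN: " + dn.replace("\n", "\\n"));
            }
            if (accounts == null || !accounts.matches("[A-Za-z0-9_.-]+(,[A-Za-z0-9_.-]+)*")) {
                problems.add("bad account list for DN: " + dn.replace("\n", "\\n"));
            }
        }
        return problems;
    }

    /** Writes a temporary file next to the target and renames it over the target if all records are valid. */
    static void publish(Map<String, String> dnToAccounts, Path target) throws IOException {
        List<String> problems = problems(dnToAccounts);
        if (!problems.isEmpty()) {
            throw new IOException("refusing to publish grid-mapfile: " + problems);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : dnToAccounts.entrySet()) {
            lines.add("\"" + e.getKey() + "\" " + e.getValue());
        }
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "grid-mapfile", ".tmp");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // On POSIX filesystems this is a rename, which replaces the existing target.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

Writing to a temporary file in the same directory and renaming it means a reader polling the file never sees a half-written copy; whether ATOMIC_MOVE replaces an existing target is filesystem dependent.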
Rather than a Java WS Core patch, Jarek suggested a patch against the CoG code. Would you be open to that? There it is just a one-line change, and you will need to ship a different CoG jar. Rachana
There is a Bugzilla request in on the process that generates the grid-mapfile as well. Timing-wise there would be no difference, since the ws container checks the timestamp on the grid-mapfile with every authorization (at least, that is the behavior I observed). So, on an active CE node, it would be almost simultaneous. The grid-mapfile-check-consistency script would have to be used on the CE node, as that is where the script is run.
As for the systems involved, here is a little background on the dataflow in some of the OSG environments:
    VOMRS -> VOMS -> GUMS -> Globus-ws on the CE node
The above is pretty much the maximum number of "systems" the data flows through. In this particular instance, VOMRS was not in the picture and it started in VOMS. Each of these systems should prevent this condition from occurring, but there are always the "oops" moments, and this was one of those. It was actually caused by a mass change of some of the data in VOMS that was done directly in MySQL/Oracle SQL, bypassing validations. These things have to be done sometimes. Each system should be checking for these conditions (either when outputting the data, as in VOMS/GUMS, or when inputting it, as in GUMS), but none were. I found it kind of interesting that Oracle/MySQL will let you put a newline character in and, stranger yet, that when you looked in VOMS at the list of users for the VO in the web UI, that DN looked normal. It only showed up when you used SQL itself. It was not easy to find.
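The refresh behavior observed here (the timestamp checked on every authorization) and in the original report (a corrupt update is ignored while the cached map keeps serving) can be modeled roughly as follows. RefreshingGridMap is a hypothetical class written for illustration, not the Java WS Core implementation, and it assumes the same simplified "quoted DN, then accounts" line format with a strict parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a grid-map cache that is checked on every authorization.
final class RefreshingGridMap {

    private final Path file;
    private long loadedStamp = Long.MIN_VALUE; // mtime of the last version we tried to load
    private Map<String, String> current = Collections.emptyMap();

    RefreshingGridMap(Path file) {
        this.file = file;
    }

    /** Checks the file's timestamp, reloads if it changed, then maps the DN to its accounts. */
    synchronized String authorize(String dn) throws IOException {
        long stamp = Files.getLastModifiedTime(file).toMillis();
        if (stamp != loadedStamp) {
            try {
                current = parseStrict(Files.readAllLines(file, StandardCharsets.UTF_8));
            } catch (IllegalArgumentException corrupt) {
                // Strict parse failed: keep serving the previously loaded map.
                // A real service would log the problem here.
            }
            loadedStamp = stamp; // either way, don't re-parse this version on every request
        }
        return current.get(dn);
    }

    /** Strict parse: any malformed line rejects the whole file. */
    private static Map<String, String> parseStrict(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int close = line.startsWith("\"") ? line.indexOf('"', 1) : -1;
            String accounts = close > 0 ? line.substring(close + 1).trim() : "";
            if (accounts.isEmpty()) {
                throw new IllegalArgumentException("malformed grid-mapfile line: " + raw);
            }
            map.put(line.substring(1, close), accounts);
        }
        return map;
    }
}
```

Recording the new timestamp even when the reload fails avoids re-parsing the same bad file on every request; the trade-off is that the corrupt update is skipped until the file changes again, which is why a log message at that point matters.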
Thanks for the information about the workflow. That satisfies my curiosity, so I'll let this bug get back to its regularly scheduled owners.
Regarding Jarek's suggestion of a CoG patch rather than a Java WS Core patch: to be honest, I have no idea which is preferable. Both Java WS Core and the CoG code are distributed as part of a full GT4 installation, right? And either one would fix the bug for me? In principle, I really don't care which you do, as long as it makes sense to you. You're the developers and you know best. That said, I'm not sure how to patch the CoG code. I think that in our build process, CoG is pre-built and not built when we build the rest of Globus. If you, or someone else, can help us figure out how to get the new CoG into our GT 4.0.2 build, we're happy with that solution. Thanks, -alain
I notice that this bug has a status of resolved. Is it possible to know how it was fixed and in what release? My concern is how it is going to react to corrupt or invalid entries in the grid-mapfile:
1. Is it going to do so silently? (not good)
2. Is it going to generate an error for every invalid line? (also not good)
Ideally, it should generate one error message and multiple warnings (one per line). This allows log4j properties to be used to notify an admin of the problem with a minimal number of mail messages. Also, I am curious as to why I never got email on some of the comments or on the closure of the bug, and why the bug never comes up when I use the query screen for My Bugs.
Thanks, John Weigand
Alain, For including this in GT4.0.x, I created a globus_4_0_community branch within the CoG repository and changed the default gridmap behavior to ignore errors. Uploading jar as attachment here. Rachana
Created an attachment (id=1016): CoG Jar. CoG jar with the default GridMap behavior changed to ignore errors in the gridmap file.
The trunk code can be modified to print one error log entry if issues arise while parsing the GridMap file. We'll modify the CoG API to indicate whether any error occurred while reading the file, and have the security code in GT core write out an error. I am reopening this bug. However, this requires an API change and can only be done in trunk. The patch provided for the 4.0.x code has to remain as is, printing only warnings.
Thanks! Do I need the jar file at build time, or can I just lay it down in our Globus installation at run time? I'm guessing that I can do the latter, and that would be pleasantly simple. Thanks, -alain
I hate to be a nag, but I'm still unclear about the long-term solution. In the short term (for VDT's current release), I am not that concerned. Long-term, it really needs:
1. One error for a corrupt grid-mapfile, regardless of how many invalid entries it contains
2. A warning for each invalid entry, so the admin can troubleshoot the problem
john
Alain, If I understand correctly, you are interested in getting this going with globus_4_0_2 plus patches. So it might be okay to just replace the jar in $GLOBUS_LOCATION after you are done building. But in general, it would be best to build with the jar, so that any incompatibilities can be identified during the build process. Rachana
John, We are working towards what you are proposing and will include the changes to trunk code. The CoG API that reads the GridMap file will write out a warning for each invalid entry in the gridmap. The Java WS Security code (that uses the CoG API to read GridMap during authorization) will write out a single logger error if the read of GridMap detected any errors. Rachana
Updated Java WS security trunk code with relevant changes. A warning for every invalid entry and a single error log for any load/refresh that has invalid entries will be printed.
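The two-level reporting described above (a warning per invalid entry from the parsing layer, one error per load or refresh from the security layer) can be sketched as follows. java.util.logging stands in for log4j here (SEVERE playing the role of ERROR, WARNING the role of WARN); the logger names and classes are hypothetical, and CountingHandler is only a stand-in for a threshold-filtered email appender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: per-entry warnings from the parser, one error per load.
final class GridMapLoadReporting {

    static final Logger PARSER_LOG = Logger.getLogger("sketch.gridmap.parser");
    static final Logger SECURITY_LOG = Logger.getLogger("sketch.gridmap.security");

    /** Parser layer: skip and warn on each bad line; returns how many lines were skipped. */
    static int parseIgnoringErrors(List<String> lines, List<String> goodDns) {
        int bad = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int close = line.startsWith("\"") ? line.indexOf('"', 1) : -1;
            if (close < 0 || line.substring(close + 1).trim().isEmpty()) {
                bad++;
                PARSER_LOG.warning("Invalid grid-mapfile entry at line " + (i + 1) + ": " + line);
                continue;
            }
            goodDns.add(line.substring(1, close));
        }
        return bad;
    }

    /** Security layer: logs a single error for any load/refresh that contained bad entries. */
    static List<String> load(List<String> lines) {
        List<String> goodDns = new ArrayList<>();
        int bad = parseIgnoringErrors(lines, goodDns);
        if (bad > 0) {
            SECURITY_LOG.severe("grid-mapfile contains " + bad + " invalid entries; see warnings");
        }
        return goodDns;
    }

    /** Stand-in for an e-mail appender: counts only records at or above its level. */
    static final class CountingHandler extends Handler {
        int count;

        CountingHandler(Level threshold) {
            setLevel(threshold);
        }

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                count++;
            }
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }
}
```

For a file with many bad lines, a handler filtered at SEVERE receives exactly one record per load, while a WARNING-level handler also receives every per-line warning; that separation is what lets an administrator route one notification per corrupt file.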
I had been advised that this fix was available in the 4.0.3 release, which is what is used in the current VDT 1.5.1 release. Can someone verify whether this is true, at least for the 4.0.3 release? I have just completed testing of VDT 1.5.1, which contains 4.0.3, and this does not appear to be resolved. The behavior is the same as before. Thanks, John Weigand
Here is a summary of the behavior in GT:
(1) Trunk:
 * By default, the GridMap class does not ignore errors. If ignoring errors is enabled, it prints a detailed warning per error.
 * The WS security layer enables ignoring errors in the GridMap class and prints one logger.error per file with at least one erroneous entry.
(2) 4.0 branch and any 4.0.x release:
 * The gridmap read fails if there are any erroneous entries. The change required to support warnings only is an API change and cannot be made.
(3) Community branch:
 * By default, the GridMap class ignores errors and prints a detailed warning per error.
 * The WS security layer uses the above class but does not print any additional logger.error.
Based on previous comments on this bug, it seems the plan was for the VDT release to use the CoG jar from the community branch, but I am not aware of what was pulled in for the VDT release.