Bugzilla – Bug 4528
WS-GRAM Auditing Test Integration on TeraGrid
Last modified: 2007-02-14 15:26:53
Component: Globus Resource Allocation Manager (GRAM)
An auditing mechanism for WS-GRAM and a proof-of-concept interface for
compound audit/TeraGrid accounting database queries have been created
using OGSA-DAI at the request of the TeraGrid infrastructure team. The
next step is to actually deploy these components on TeraGrid to get a
working example. This will provide a fully integrated proof of concept
for the entire setup as well as allow TeraGrid people to use it and
report back on how they would like to use it (i.e. what specific queries
will they need). Additional campaigns may need to be created to add
additional OGSA-DAI activities to support the desired query set.
Deliverables:
1) A Globus Toolkit installation on a TeraGrid machine with WS-GRAM
and supporting services from the globus_4_0_community branch.
2) An audit database setup somewhere on TeraGrid that is accessible
from the machine used in #1.
3) OGSA-DAI deployed in the container from #1.
4) TeraGrid-specific resources and activities for OGSA-DAI in #3
installed and configured (see http://bugzilla.globus.org/bugzilla/
show_bug.cgi?id=4412) for the TeraGrid-wide accounting database and
the auditing database from #2.
5) Documentation for TeraGrid users on getting GRAM auditing data and
TeraGrid accounting data.
Tasks:
1) Create a Globus Toolkit installer from the globus_4_0_community branch.
2) Determine which TeraGrid machine will be used for the test
3) Install the Globus Toolkit from the installer created in #1 on the
machine determined in #2.
4) Install the WSRF version of OGSA-DAI.
5) Install the TeraGrid resources for OGSA-DAI (obtained from the link in
deliverable #4).
6) Create a database for logging audit records from the WS-GRAM installed in #3.
7) Configure the resources installed in #5 to use the audit database
8) Test by submitting jobs to the container installed in #3 and using
the DemoClient provided with the resources in #5 to obtain the charge
for the submitted jobs.
9) Document the commands used in the testing from #8.
I had to do task #1 on an ia32 node since the ia64 nodes weren't being nice
about creating an installer.
For task #2 we've been told by JP to use tg-grid1.uc.teragrid.org.
Task #3 is mostly done. I'm setting up security right now, but that may need to
change since I don't have access to a TeraGrid host cert. I'm using my DOE user
proxy for now.
Task #6 is partially done. We have a database allocated but it isn't configured
with the audit schema just yet.
Task #3 is fully done. Audit logging is happening to the database. I forced the
container to use the FQDN instead of the IP to get more human-readable job IDs.
Tasks 4, 5, and 7 are done as well. I need the correct resource ID for
tg-grid1.uc.teragrid.org in order to setup the host mappings properly. After
that I can start testing the OGSA-DAI interface.
It looks like jobs submitted via PBS go to tg-master.uc.teragrid.org. I checked
the accounting DB and the associated resource ID appears to be
Unfortunately I haven't been able to test yet because my job isn't showing up
in the accounting database.
According to Michael Shapiro someone took down AMIE for some reason and he
doesn't know when it will be back up. I'll keep looking for my job daily, but
this campaign is stalled until the TG accounting database is up to date.
The accounting database finally got my job info. Unfortunately I'm having
problems getting the DemoClient (queries for the charge based on a job's GID) to
do anything but host authorization. I wrote Ally an email, so hopefully he can
point me in the right direction.
I have the OGSA-DAI stuff working and running on port 9554, but multi-user job
submission is still offline. I need the two magic sudoers lines added before
anybody other than the globus user can submit jobs.
Also, the sudoers line to allow me to start and stop the container through the
init.d script still needs to be added. This isn't a show stopper since I can
manually use globus-[start|stop]-container-detached to do the same thing.
Here are my notes during my installation of all the WS-based stuff as the
globus user. They are very rough and quite specific to both my personal
installation and the one running under the globus account.
Nevertheless, I think it should be documented so we can develop better
instructions later on as well as a more scripted deployment.
1) Install globus_4_0_community branch.
2) Copy the following files to $GLOBUS_LOCATION/lib
3) Copy $GLOBUS_LOCATION/etc/ogsadai_wsrf
4) Copy $GLOBUS_LOCATION/share/schema/ogsadai
5) Copy the values for the following parameters to
OMITTED FOR SECURITY REASONS
6) Add the following lines to container-log4j.properties:
7) Add the following parameters to
8) Find all files with /home/lane in it and execute the following substitution:
9) Add service host mapping to
10) Copy the values for the following parameters to
<value>OMITTED FOR SECURITY REASONS</value>
11) Copy /soft/globus-wsrf-4.0.1-r3/lib/perlGlobus/GRAM/JobManager/pbs.pm to
I also fixed a couple of bugs in the WS audit logging stuff, so a fresh
checkout of globus_4_0_community is required. I updated the deployment on
tg-grid1, so it shouldn't be a problem there anymore.
I was able to get globus-personal-gatekeeper jobs to write audit records. I
believe I configured the non-personal gatekeepers appropriately, but I need
someone with root access on tg-grid1 to start those gatekeepers up before they
can be tested.
As for uploading the prews audit records, there's a bug whereby the GT version
is set to "NULL" instead of a valid version string. This causes a null value to
be uploaded to the DB which isn't allowed by the table schema. I wrote a script
(attachment coming) that will substitute in the value returned by
$GLOBUS_LOCATION/bin/globus-version to get things working. This means that
prews audit record uploading will be a two step process until Joe fixes that.
I ran a test audit query for the job I submitted via
globus-personal-gatekeeper without problems. The accounting query returned no
results for the job as expected since accounting record uploading hasn't been
done yet today. Otherwise the accounting query worked fine. I haven't tried the
charge query since the accounting records aren't present yet.
Created an attachment (id=990) [details]
script that inserts GT version into Pre-WS audit records
Since the Pre-WS GRAM audit records are being generated with a null GT version,
this script compensates for that by reading in all *.gramaudit files in a
directory and writing out files with the same name but with a ".fixed" suffix
that contains the GT version obtained from $GLOBUS_LOCATION/bin/globus-version.
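For reference, here is a minimal sketch of what such a fix-up script could look like. The attached script is the authoritative version; the assumption here is that the broken records carry the literal token "NULL" in the version field, which a real script would locate by parsing the record format rather than by blind substitution:

```python
import glob
import os
import subprocess
import sys

def gt_version(globus_location):
    """Ask the toolkit itself for its version string."""
    out = subprocess.run(
        [os.path.join(globus_location, "bin", "globus-version")],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

def fix_records(directory, version):
    """Rewrite each *.gramaudit file as <name>.fixed with the version filled in.

    Assumes the broken records contain the literal quoted token "NULL" in the
    GT-version field.
    """
    for path in glob.glob(os.path.join(directory, "*.gramaudit")):
        with open(path) as f:
            text = f.read()
        with open(path + ".fixed", "w") as f:
            f.write(text.replace('"NULL"', '"%s"' % version))

if __name__ == "__main__" and "GLOBUS_LOCATION" in os.environ:
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    fix_records(target, gt_version(os.environ["GLOBUS_LOCATION"]))
```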
As documentation for the documentation deliverable (#5), I'm copying an email I
just wrote to some people so they could try things out themselves:
> Audit Query
Here is an example perform document that simulates the first half of what
OGSA-DAI would do in a charge query:
<?xml version="1.0" encoding="UTF-8"?>
<!-- (c) International Business Machines Corporation, 2002 - 2005.-->
<!-- (c) University of Edinburgh, 2002 - 2005.-->
<!-- See OGSA-DAI-Licence.txt for licencing information.-->
This example performs a simple select statement to retrieve
one row from the test database. The results are delivered
within the response document.
<expression>select local_job_id,queued_time from gram_audit_table where
AND subject_name='/DC=org/DC=doegrids/OU=People/CN=Peter G Lane
If you save this to a file named, say, ./perform_audit.xml, then you
can execute the following command using the "ogsadai-client" found in
my /home/lane on tg-grid1:
% ogsadai-client -u
Audit_tg-grid1.uc.teragrid.org -tls encrypt ./perform_audit.xml
This will query the audit database through OGSA-DAI using the query
specified in the above perform document. The value of the -k option is
an arbitrary resource key I picked for associating with audit queries.
This will return the following result document on stdout:
<?xml version="1.0" encoding="UTF-8"?>
<ns1:result name="statement" status="COMPLETED"/>
<ns1:result name="statementRSToXML" status="COMPLETED"/>
<ns1:result name="statementOutput" status="COMPLETED"><![CDATA[<webRowSet
The interesting parts are the 5th- and 4th-to-last lines (the "<columnValue>"
elements); from now on I'll cut the boring stuff out. Ideally a custom client would be
created that converts the XML document into a data structure that can be walked
through via an API. At any rate, this results document gives the values of the
columns we requested (local_job_id and queued_time). Specifically, these values
are "288114.tg-master.uc.teragrid.org" and "2006-06-30 21:09:44".
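A minimal sketch of what such a custom client might do with the result document, assuming the WebRowSet payload carries <currentRow> elements whose <columnValue> children appear in query column order (namespaces are ignored for brevity; this is an illustration, not the OGSA-DAI client API):

```python
import xml.etree.ElementTree as ET

def rows_from_webrowset(xml_text, column_names):
    """Turn a WebRowSet-style XML payload into a list of dicts.

    Walks every element, keeps the ones whose local tag name is
    "currentRow", and zips that row's <columnValue> texts against the
    column names from the original select statement.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if elem.tag.split("}")[-1] != "currentRow":
            continue
        values = [c.text for c in elem
                  if c.tag.split("}")[-1] == "columnValue"]
        rows.append(dict(zip(column_names, values)))
    return rows
```

For the audit example above this would be called as rows_from_webrowset(payload, ["local_job_id", "queued_time"]), yielding one dict per job instead of raw XML.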
Changing the <expression> element in the above perform document to the
<expression>select local_job_id,queued_time from gram_audit_table where
will yield the following result when the ogsadai-client command is run again:
The <columnValue> elements contain the local_job_id and queued_time
data for each job that user "lane" has submitted to this compute resource.
> Accounting Query
Now copy the audit perform document and name it perform_accounting.xml.
Change the <expression> element to the following:
<expression>select charge from jobs where
resource_name='dtf.anl.teragrid' and '2006-06-22 15:44:10' between submit_time
- INTERVAL '24 hours' and submit_time + INTERVAL '24 hours'</expression>
The above query is an example of a query OGSA-DAI might make to the
accounting database after it has obtained the results from the
first audit query example above.
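The chaining described above can be sketched in code: take the queued_time returned by the audit query and build the accounting query with the same plus-or-minus-24-hour window. Table and column names come from the example queries in this bug; a real client should use bound parameters rather than string interpolation:

```python
def accounting_query(resource_name, queued_time):
    """Build the charge query for the accounting DB, matching jobs whose
    submit_time falls within 24 hours of the audit record's queued_time.
    """
    return ("select charge from jobs where "
            "resource_name='%s' and '%s' between "
            "submit_time - INTERVAL '24 hours' and "
            "submit_time + INTERVAL '24 hours'"
            % (resource_name, queued_time))
```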
Execute the following command to query the accounting database via
OGSA-DAI using the new perform document:
% ogsadai-client -u
TeragridAccounting -tls encrypt ./perform_accounting.xml
Notice that I changed the value of the -k option to "TeragridAccounting".
Again, this is an arbitrary resource key. In this case it is associated
with performing queries to the TeraGrid-wide accounting database. The
results of the command are as follows:
So the job with Grid ID
has a charge of "0.00384666666666667".
> Charge Query
> 1) On tg-grid1, go to ~lane/ogsadai-teragrid <<<different directory>>>
I put the demo-client script in /home/lane so people can find it more easily.
> 2) Execute the following:
> ./demo-client \
> https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
> TeraGridResource \
> <job GID>
To do the above combined audit/accounting query automatically, execute the following:
% ./demo-client \
The output of the command is as follows:
Data Service Resource ID: TeraGridResource
Grid Job ID:
User's DN: /DC=org/DC=doegrids/OU=People/CN=Peter G Lane 364243
The charge for this job is: 0.00384666666666667
This is a better example of how a custom client should work. Instead of
returning ugly XML documents, the data is processed using an API and
formatted for human consumption.
The sudoers entries have been added so that the WS GRAM can be used in
multi-user mode. I've submitted a WS GRAM job and will check tomorrow to make
sure the charge queries are working. I gave some documentation on testing
everything to Nancy Wilkins-Diehr, and Steve Keinvehn put it up on a web page
that can be found here:
All that's left is for JP to get a gatekeeper running as root so I can test
pre-WS in multi-user mode.
This campaign has been reassigned. Outstanding Deliverables:
1. Run a test GRAM2 and GRAM4 job and check usage information.
2. Spruce up existing document on submitting a remote job and running usage
clients installed on tg-grid1.uc.teragrid.org
3. Write up documentation on client side installation and the API calls that
will be required for the gateways to incorporate programmatic usage information
queries.
I am waiting on a TG allocation to test things out, but I have created a minimum
set of client jars that will be required and documentation on the steps
involved. A tar.gz of the files can be downloaded at
(too large for bugzilla)
Confirmed that this minimum set of jars can be used to build a client that
mimics the sample client code. Uploading the README and jars to bugzilla.
Account usage information will take a while to get updated, so currently the
usage query does not return that information. Will need to run the client
again later.
Created an attachment (id=1113) [details]
Readme file for TeraGrid Client side usage query
On testing this using the API provided to convert an EPR to a string, an issue
with EPRs being written to files or serialized differently was found. A simpler
algorithm that extracts the resource key value and the To address to generate
the digest has been committed. Standalone testing of GRAM audit has been
completed. Will now have to get the TG install updated with the latest code.
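The committed digest algorithm isn't reproduced in this log. A minimal sketch of the idea (hash only the semantically significant parts of the EPR, so that differently serialized copies of the same EPR produce the same digest) might look like the following; SHA-1 and the separator are my assumptions, not the committed code:

```python
import hashlib

def epr_digest(to_address, resource_key):
    """Digest over only the To address and resource key of an EPR.

    Because whitespace and serialization differences are stripped out
    before hashing, two differently serialized EPRs for the same
    resource yield the same digest.
    """
    material = "%s|%s" % (to_address.strip(), resource_key.strip())
    return hashlib.sha1(material.encode("utf-8")).hexdigest()
```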
Updated documentation and loaded files:
All tests work on TG, but the accounting database does not seem to have the job
information yet. The accounting database update is apparently backed up.
The accounting database seems to have information about the test job and the
client query for accounting works. Closing campaign.