Bug 4528 - WS-GRAM Auditing Test Integration on TeraGrid
Status: RESOLVED FIXED
: GRAM
Campaign
: development
: PC Linux
: P3 normal
: TG
: 4409
 
Reported: 2006-06-16 17:26
Modified: 2007-02-14 15:26


Attachments
script that inserts GT version into Pre-WS audit records (972 bytes, application/x-perl)
2006-06-30 19:31, Peter Lane
Readme file for TeraGrid Client side usage query (12.64 KB, text/plain)
2006-10-30 12:25, Rachana Ananthakrishnan




Description From 2006-06-16 17:26:13
Title: WS-GRAM Auditing Test Integration on TeraGrid

Projects:

TeraGrid


Technologies:

Globus Resource Allocation Manager (GRAM)
OGSA-DAI

Definition:

An auditing mechanism for WS-GRAM and a proof-of-concept interface to
compound audit/TeraGrid accounting database queries have been created
using OGSA-DAI at the request of the TeraGrid infrastructure team. The
next step is to deploy these components on TeraGrid to get a working
example. This will provide a fully integrated proof of concept for the
entire setup and allow TeraGrid people to use it and report back on how
they would like to use it (i.e. which specific queries they will need).
Additional campaigns may need to be created to add further OGSA-DAI
activities to support the desired query set.


Deliverables:

1) A Globus Toolkit installation on a TeraGrid machine with WS-GRAM  
and supporting services from the globus_4_0_community branch.
2) An audit database setup somewhere on TeraGrid that is accessible  
from the machine used in #1.
3) OGSA-DAI deployed in the container from #1.
4) TeraGrid-specific resources and activities for OGSA-DAI in #3
installed and configured (see
http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=4412) for the
TeraGrid-wide accounting database and the auditing database from #2.
5) Documentation for TeraGrid users on getting GRAM auditing data and  
TeraGrid accounting data.


Tasks:

1) Create a Globus Toolkit installer from the globus_4_0_community  
branch.
2) Determine which TeraGrid machine will be used for the test  
installation.
3) Install the Globus Toolkit from the installer created in #1 on the  
machine determined in #2.
4) Install the WSRF version of OGSA-DAI.
5) Install the TeraGrid resources for OGSA-DAI (obtained from link in  
deliverable #4).
6) Create a database for logging audit records from WS-GRAM installed  
in #3.
7) Configure the resources installed in #5 to use the audit database  
in #6.
8) Test by submitting jobs to the container installed in #3 and using  
the DemoClient provided with the resources in #5 to obtain the charge  
for the submitted jobs.
9) Document the commands used in the testing from #8.


Time Estimate:

5 days
------- Comment #1 From 2006-06-22 11:42:59 -------
I had to do task #1 on an ia32 node since the ia64 nodes weren't cooperating
when creating an installer.

For task #2 we've been told by JP to use tg-grid1.uc.teragrid.org.

Task #3 is mostly done. I'm setting up security right now, but that may need to
change since I don't have access to a TeraGrid host cert. I'm using my DOE user
proxy for now.

Task #6 is partially done. We have a database allocated but it isn't configured
with the audit schema just yet.
------- Comment #2 From 2006-06-22 14:06:47 -------
Task #3 is fully done. Audit logging is happening to the database. I forced the
container to use the FQDN instead of the IP to get more human-readable job
GIDs.

Tasks 4, 5, and 7 are done as well. I need the correct resource ID for
tg-grid1.uc.teragrid.org in order to set up the host mappings properly. After
that I can start testing the OGSA-DAI interface.
------- Comment #3 From 2006-06-23 13:21:42 -------
It looks like jobs submitted via PBS go to tg-master.uc.teragrid.org. I checked
the accounting DB and the associated resource ID appears to be
"dtf.anl.teragrid".

Unfortunately I haven't been able to test yet because my job isn't showing up
in the accounting database.
------- Comment #4 From 2006-06-26 13:38:48 -------
According to Michael Shapiro someone took down AIME for some reason and he
doesn't know when it will be back up. I'll keep looking for my job daily, but
this campaign is stalled until the TG accounting database is up to date.
------- Comment #5 From 2006-06-28 19:29:36 -------
The accounting database finally got my job info. Unfortunately I'm having
problems getting the DemoClient (which queries for the charge based on a job's
GID) to do anything but host authorization. I wrote Ally an email, so hopefully
he can point me in the right direction.
------- Comment #6 From 2006-06-30 13:41:15 -------
I have the OGSA-DAI stuff working and running on port 9554, but multi-user job
submission is still offline. I need the two magic sudoers lines added before
anybody other than the globus user can submit jobs.

Also, the sudoers line to allow me to start and stop the container through the
init.d script still needs to be added. This isn't a show stopper since I can
manually use globus-[start|stop]-container-detached to do the same thing.

Here are my notes during my installation of all the WS-based stuff as the
globus user. They are very rough and quite dependent on both my personal
installation that I had and the one running under the globus account.
Nevertheless, I think it should be documented so we can develop better
instructions later on as well as a more scripted deployment.

1) Install globus_4_0_community branch.
2) Copy the following files to $GLOBUS_LOCATION/lib
    activation.jar
    jakarta-oro-2.0.8.jar
    lucene-1.4.3.jar
    mail.jar
    ogsadai-activities.jar
    ogsadai-core.jar
    ogsadai-examples.jar
    ogsadai-teragrid.jar
    ogsadai-tools.jar
    ogsadai-wsrf-stubs.jar
    ogsadai-wsrf.jar
    postgresql-8.0-315.jdbc3.jar
    xmldb.jar
3) Copy $GLOBUS_LOCATION/etc/ogsadai_wsrf
4) Copy $GLOBUS_LOCATION/share/schema/ogsadai
5) Copy the values for the following parameters to
$GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml:
            <parameter>
                <name>
                connectionString
                </name>
                <value>
                jdbc:postgresql://tg-mayor1.uc.teragrid.org/gt4auditgram
                </value>
            </parameter>
            <parameter>
                <name>
                userName
                </name>
                <value>
                gt4audit
                </value>
            </parameter>
            <parameter>
                <name>
                password
                </name>
                <value>
                OMITTED FOR SECURITY REASONS
                </value>
            </parameter>
6) Add the following lines to container-log4j.properties:
# AUDIT
log4j.appender.AUDIT=org.globus.exec.utils.AuditDatabaseAppender
log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
log4j.category.org.globus.exec.service.exec.StateMachine.audit=INFO, AUDIT
#log4j.category.org.globus.exec.service.exec.StateMachine.audit=INFO, A1
log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false
7) Add the following parameters to
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd:
        <parameter name="logicalHost"
                   value="tg-grid1.uc.teragrid.org"/>
        <parameter name="publishHostName"
                   value="true"/>
8) Find all files that contain /home/lane and run the following substitution on
each of them (in vi):
:% s#home/lane/globus/globus-community#home/globus/Audit/globus#g
9) Add service host mapping to
$GLOBUS_LOCATION/etc/ogsadai_wsrf/TeraGridResource/hostToResource.txt
10) Copy the values for the following parameters to
$GLOBUS_LOCATION/etc/gram-service/jndi-config.xml:
                <parameter>
                    <name>url</name>
                    <value>jdbc:postgresql://tg-mayor1.uc.teragrid.org/gt4auditgram?USESSL=force&amp;ssl=true&amp;sslfactory=org.postgresql.ssl.NonValidatingFactory</value>
                </parameter>
                <parameter>
                    <name>user</name>
                    <value>gt4audit</value>
                </parameter>
                <parameter>
                    <name>password</name>
                    <value>OMITTED FOR SECURITY REASONS</value>
                </parameter>
11) Copy /soft/globus-wsrf-4.0.1-r3/lib/perl/Globus/GRAM/JobManager/pbs.pm to
$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/
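
The path substitution in step 8 can also be scripted instead of being done by
hand in an editor. The following is a minimal Python sketch, not part of the
original notes; the function name rewrite_paths is hypothetical, and it assumes
that every file worth rewriting is UTF-8 text (anything else, such as the jars,
is skipped).

```python
import os


def rewrite_paths(root, old, new):
    """Replace every occurrence of `old` with `new` in text files under `root`.

    Returns the list of files that were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary files (e.g. jars) and unreadable files
            if old not in text:
                continue
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text.replace(old, new))
            changed.append(path)
    return changed
```

Calling rewrite_paths(os.environ["GLOBUS_LOCATION"],
"home/lane/globus/globus-community", "home/globus/Audit/globus") would apply
the same replacement as the editor command in step 8.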
------- Comment #7 From 2006-06-30 13:44:24 -------
I also fixed a couple of bugs in the WS audit logging stuff, so a fresh
checkout of globus_4_0_community is required. I updated the deployment on
tg-grid1, so it shouldn't be a problem there anymore.
------- Comment #8 From 2006-06-30 19:28:59 -------
I was able to get globus-personal-gatekeeper jobs to write audit records. I
believe I configured the non-personal gatekeepers appropriately, but I need
someone with root access on tg-grid1 to start those gatekeepers up before they
can be tested.

As for uploading the pre-WS audit records, there's a bug whereby the GT version
is set to "NULL" instead of a valid version string. This causes a null value to
be uploaded to the DB, which isn't allowed by the table schema. I wrote a script
(attachment coming) that will substitute in the value returned by
$GLOBUS_LOCATION/bin/globus-version to get things working. This means that
pre-WS audit record uploading will be a two-step process until Joe fixes that.

I've run a test audit query for the job I submitted via
globus-personal-gatekeeper without problems. The accounting query returned no
results for the job, as expected, since accounting record uploading hasn't been
done yet today. Otherwise the accounting query worked fine. I haven't tried the
charge query since the accounting records aren't present yet.
------- Comment #9 From 2006-06-30 19:31:55 -------
Created an attachment (id=990) [details]
script that inserts GT version into Pre-WS audit records

Since the Pre-WS GRAM audit records are being generated with a null GT version,
this script compensates for that by reading in all *.gramaudit files in a
directory and writing out files with the same names plus a ".fixed" suffix
that contain the GT version obtained from $GLOBUS_LOCATION/bin/globus-version.
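
For readers without access to the attachment, the behavior described above can
be approximated as follows. This is a hedged Python sketch, not the attached
Perl script: the record layout is not shown in this report, so the placeholder
token '"NULL"' and the function names are assumptions made for illustration.

```python
import glob
import os
import subprocess

# Assumption: the missing GT version shows up in a record as this literal token.
PLACEHOLDER = '"NULL"'


def globus_version(globus_location):
    """Return the string printed by $GLOBUS_LOCATION/bin/globus-version."""
    tool = os.path.join(globus_location, "bin", "globus-version")
    return subprocess.check_output([tool], text=True).strip()


def fix_audit_records(directory, version):
    """Write <name>.fixed next to every *.gramaudit file in `directory`.

    Each output file is a copy of the input with the placeholder replaced
    by the quoted version string. Returns the paths that were written.
    """
    written = []
    for path in sorted(glob.glob(os.path.join(directory, "*.gramaudit"))):
        with open(path, "r", encoding="utf-8") as src:
            record = src.read()
        fixed_path = path + ".fixed"
        with open(fixed_path, "w", encoding="utf-8") as dst:
            dst.write(record.replace(PLACEHOLDER, '"%s"' % version))
        written.append(fixed_path)
    return written
```

A typical call would be fix_audit_records(audit_dir,
globus_version(os.environ["GLOBUS_LOCATION"])), after which the ".fixed" files
are the ones to upload.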
------- Comment #10 From 2006-06-30 19:34:45 -------
As documentation for the documentation deliverable (#5), I'm copying an email I
just wrote to some people so they could try things out themselves:


> Audit Query
> -----------

Here is an example perform document that simulates the first half of what
OGSA-DAI would do in a charge query:

<?xml version="1.0" encoding="UTF-8"?>
<!-- (c) International Business Machines Corporation, 2002 - 2005.-->
<!-- (c) University of Edinburgh, 2002 - 2005.-->
<!-- See OGSA-DAI-Licence.txt for licencing information.-->

<perform
  xmlns="http://ogsadai.org.uk/namespaces/2005/10/types">

  <documentation>
    This example performs a simple select statement to retrieve
    one row from the test database. The results are delivered
    within the response document.
  </documentation>

  <sqlQueryStatement name="statement">
    <expression>select local_job_id,queued_time from gram_audit_table where
job_grid_id='https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU='
AND subject_name='/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 364243'</expression>
    <resultStream name="statementOutputRS"/>
  </sqlQueryStatement>
  <sqlResultsToXML name="statementRSToXML">
    <resultSet from="statementOutputRS"/>
    <webRowSet name="statementOutput"/>
  </sqlResultsToXML>

</perform>

If you save this to a file named, say, ./perform_audit.xml, then you
can execute the following command using the "ogsadai-client" found in
my /home/lane on tg-grid1:

% ogsadai-client -u \
https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
-k Audit_tg-grid1.uc.teragrid.org -tls encrypt ./perform_audit.xml

This will query the audit database through OGSA-DAI using the query
specified in the above perform document. The value of the -k option is
an arbitrary resource key I picked for associating with audit queries.

This will return the following result document on stdout:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:response xmlns:ns1="http://ogsadai.org.uk/namespaces/2005/10/types">
    <ns1:session id="session-ogsadai-10c2730f792"/>
    <ns1:request status="COMPLETED"/>
    <ns1:result name="statement" status="COMPLETED"/>
    <ns1:result name="statementRSToXML" status="COMPLETED"/>
    <ns1:result name="statementOutput" status="COMPLETED"><![CDATA[<webRowSet
xmlns="http://java.sun.com/xml/ns/jdbc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/jdbc
http://java.sun.com/xml/ns/jdbc/webrowset.xsd">
<properties>
<command></command>
<concurrency>1007</concurrency>
<datasource></datasource>
<escape-processing>true</escape-processing>
<fetch-direction>1000</fetch-direction>
<fetch-size>0</fetch-size>
<isolation-level>0</isolation-level>
<key-columns></key-columns>
<map></map>
<max-field-size>0</max-field-size>
<max-rows>0</max-rows>
<query-timeout>0</query-timeout>
<read-only>true</read-only>
<rowset-type>ResultSet.TYPE_FORWARD_ONLY</rowset-type>
<show-deleted>false</show-deleted>
<table-name></table-name>
<url></url>
<sync-provider>
<sync-provider-name/>
<sync-provider-vendor/>
<sync-provider-version/>
<sync-provider-grade/>
<data-source-lock/>
</sync-provider>
</properties>
<metadata>
<column-count>2</column-count>
<column-definition>
<column-index>1</column-index>
<auto-increment>false</auto-increment>
<case-sensitive>true</case-sensitive>
<currency>false</currency>
<nullable>1</nullable>
<signed>false</signed>
<searchable>true</searchable>
<column-display-size>512</column-display-size>
<column-label>local_job_id</column-label>
<column-name>local_job_id</column-name>
<schema-name></schema-name>
<column-precision>512</column-precision>
<column-scale>0</column-scale>
<table-name></table-name>
<catalog-name></catalog-name>
<column-type>12</column-type>
<column-type-name>varchar</column-type-name>
</column-definition>
<column-definition>
<column-index>2</column-index>
<auto-increment>false</auto-increment>
<case-sensitive>false</case-sensitive>
<currency>false</currency>
<nullable>1</nullable>
<signed>false</signed>
<searchable>true</searchable>
<column-display-size>26</column-display-size>
<column-label>queued_time</column-label>
<column-name>queued_time</column-name>
<schema-name></schema-name>
<column-precision>0</column-precision>
<column-scale>6</column-scale>
<table-name></table-name>
<catalog-name></catalog-name>
<column-type>93</column-type>
<column-type-name>timestamp</column-type-name>
</column-definition>
</metadata>
<data><currentRow>
<columnValue>287254.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-22 15:44:10</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

The interesting parts are the fifth- and fourth-to-last lines (the
"<columnValue>" elements), and from now on I'll cut the boring stuff out.
Ideally a custom client would be created that converts the XML document into a
data structure that can be walked through via an API. At any rate, this result
document gives the values of the columns we requested (local_job_id and
queued_time). Specifically, these values are
"287254.tg-master.uc.teragrid.org" and "2006-06-22 15:44:10".

Changing the <expression> element in the above perform document to the
following:

<expression>select local_job_id,queued_time from gram_audit_table where
username='lane'</expression>

will yield the following result when the ogsadai-client command is run again:

[...]
<data><currentRow>
<columnValue>3d985e94-0221-11db-8a3d-0007e9d81215:2624</columnValue>
<columnValue>2006-06-22 13:59:32</columnValue>
</currentRow><currentRow>
<columnValue>287254.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-22 15:44:10</columnValue>
</currentRow><currentRow>
<columnValue><null/></columnValue>
<columnValue><null/></columnValue>
</currentRow><currentRow>
<columnValue><null/></columnValue>
<columnValue><null/></columnValue>
</currentRow><currentRow>
<columnValue>288114.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-30 21:09:44</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

The <columnValue> elements contain the local_job_id and queued_time
data for each job that user "lane" has submitted to this compute
resource.
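
Pulling the <columnValue> data out of a complete (untrimmed) response document
can be sketched in a few lines of Python. This is an illustration only, not
the OGSA-DAI client toolkit API; the function name rows_from_response is
hypothetical. It relies on the two namespaces visible in the response above
and on the WebRowSet being carried as CDATA inside the "statementOutput"
result.

```python
import xml.etree.ElementTree as ET

OGSADAI_NS = "http://ogsadai.org.uk/namespaces/2005/10/types"
JDBC_NS = "http://java.sun.com/xml/ns/jdbc"


def rows_from_response(response_xml, result_name="statementOutput"):
    """Return (column_labels, rows) from an OGSA-DAI response document.

    Each row is a list of strings; SQL NULLs (<null/>) become None.
    """
    root = ET.fromstring(response_xml)
    for result in root.iter("{%s}result" % OGSADAI_NS):
        if result.get("name") == result_name and result.text:
            # The CDATA payload is itself an XML document (the WebRowSet).
            rowset = ET.fromstring(result.text.strip())
            break
    else:
        raise ValueError("no result named %r with data" % result_name)

    labels = [e.text for e in rowset.iter("{%s}column-label" % JDBC_NS)]
    rows = []
    for row in rowset.iter("{%s}currentRow" % JDBC_NS):
        values = []
        for col in row.findall("{%s}columnValue" % JDBC_NS):
            is_null = col.find("{%s}null" % JDBC_NS) is not None
            values.append(None if is_null else col.text)
        rows.append(values)
    return labels, rows
```

With the output of ogsadai-client saved to a file, labels, rows =
rows_from_response(open("response.xml").read()) gives one list per job, in the
column order named by labels.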

> Accounting Query
> ----------------

Now copy the audit perform document and name it perform_accounting.xml.
Change the <expression> element to the following:

<expression>select charge from jobs where
local_jobid='287254.tg-master.uc.teragrid.org' and
resource_name='dtf.anl.teragrid' and '2006-06-22 15:44:10' between submit_time
- INTERVAL '24 hours' and submit_time + INTERVAL '24 hours'</expression>

The above query is an example of a query OGSA-DAI might make to the
accounting database after it has obtained the results from the
first audit query example above.
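
To make the hand-off between the two queries concrete, here is a small Python
sketch that builds the accounting <expression> text from the values returned
by the audit query. The helper names are hypothetical, and the 24-hour window
simply mirrors the example expression; how the TeraGrid OGSA-DAI activity
actually assembles its query is not shown in this report.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def accounting_expression(local_job_id, resource_name, queued_time,
                          window_hours=24):
    """Build the charge query for one job.

    The job is matched by local job ID and resource name, and the audit
    record's queued_time must fall within +/- window_hours of the
    accounting record's submit_time.
    """
    return (
        "select charge from jobs where "
        "local_jobid=%s and resource_name=%s and %s between "
        "submit_time - INTERVAL '%d hours' and submit_time + INTERVAL '%d hours'"
        % (sql_literal(local_job_id), sql_literal(resource_name),
           sql_literal(queued_time), window_hours, window_hours)
    )
```

The time window is what disambiguates local job IDs that a scheduler may reuse
over time, which is why queued_time is fetched by the audit query alongside
local_job_id.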

Execute the following command to query the accounting database via
OGSA-DAI using the new perform document:

% ogsadai-client -u \
https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
-k TeragridAccounting -tls encrypt ./perform_accounting.xml

Notice that I changed the value of the -k option to "TeragridAccounting".
Again, this is an arbitrary resource key. In this case it is associated
with performing queries to the TeraGrid-wide accounting database. The
results of the command are as follows:

[...]
<data><currentRow>
<columnValue>0.00384666666666667</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

So the job with Grid ID
"https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU="
has a charge of "0.00384666666666667".

> Charge Query
> ------------
> 1) On tg-grid1, go to ~lane/ogsadai-teragrid <<<different directory>>>

I put the demo-client script in /home/lane so people can find it more easily.

> 2) Execute the following:
>
> ./demo-client \
> https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
> TeraGridResource \
> <job GID>

To do the above combined audit/accounting query automatically, execute the
following command:

% ./demo-client \
https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
TeraGridResource \
https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=

The output of the command is as follows:

Service URL:              https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid
Data Service Resource ID: TeraGridResource
Grid Job ID:              https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=
User's DN:                /DC=org/DC=doegrids/OU=People/CN=Peter G Lane 364243
The charge for this job is: 0.00384666666666667

This is a better example of how a custom client should work. Instead of
returning ugly XML documents, the data is processed using an API and
formatted for human consumption.
------- Comment #11 From 2006-07-11 13:11:39 -------
The sudoers entries have been added so that the WS GRAM can be used in
multi-user mode. I've submitted a WS GRAM job and will check tomorrow to make
sure the charge queries are working. I gave some documentation on testing
everything to Nancy Wilkins-Diehr, and Steve Keinvehn put it up on a web page
that can be found here:

https://repo.teragrid.org/wg/Gateways/gram-audit.html

All that's left is for JP to get a gatekeeper running as root so I can test
pre-WS in multi-user mode.
------- Comment #12 From 2006-10-27 13:51:55 -------
This campaign has been reassigned. Outstanding Deliverables:

1. Run a test GRAM2 and GRAM4 job and check usage information.

2. Spruce up existing document on submitting a remote job and running usage
clients installed on tg-grid1.uc.teragrid.org

3. Write up documentation on client side installation and API calls that will
be required for the gateways to incorporate programmatic usage information
queries.
------- Comment #13 From 2006-10-30 09:46:22 -------
I am waiting on a TG allocation to test things out, but I have created a
minimal set of client jars that will be required, along with documentation on
the steps involved. A tar.gz of the files can be downloaded at
http://www-unix.mcs.anl.gov/~ranantha/gramAuditTgClient.tar.gz 
(too large for bugzilla)
------- Comment #14 From 2006-10-30 12:25:04 -------
Confirmed that this minimal set of jars can be used to build a client that
mimics the sample client code. Uploading the README and jars to bugzilla.
Account usage information will take a while to get updated, so currently the
usage query does not return that information. I will need to run the client
again later.
------- Comment #15 From 2006-10-30 12:25:58 -------
Created an attachment (id=1113) [details]
Readme file for TeraGrid Client side usage query
------- Comment #16 From 2006-11-06 11:12:33 -------
While testing this with the API provided to convert an EPR to a string, an
issue was found with EPRs that are written to files or serialized differently.
A simpler algorithm, which extracts the resource key value and the To address
to generate the digest, has been committed. Standalone testing of GRAM audit
has been completed. The TG install will now have to be updated with the latest
code.
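
The reason a field-based digest avoids the serialization problem can be shown
with a short Python sketch. The committed algorithm itself is not included in
this report, so everything below is an assumption made for illustration: the
use of SHA-1 with Base64 encoding (chosen only because the 28-character suffix
of the job GIDs shown earlier is consistent with a 20-byte digest), the order
in which the two fields are combined, and both function names.

```python
import base64
import hashlib


def job_digest(to_address, resource_key):
    """Digest computed from the two extracted EPR fields only.

    Because the input is the field values rather than the serialized EPR,
    differences in whitespace, namespace prefixes, or element order between
    two serializations of the same EPR cannot change the result.
    """
    material = (to_address + resource_key).encode("utf-8")
    return base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")


def job_gid(to_address, resource_key):
    """Job GID in the '<service address>?<digest>' shape used in this report."""
    return "%s?%s" % (to_address, job_digest(to_address, resource_key))
```

Two clients that parse differently formatted copies of the same EPR would
extract identical address and key strings and therefore arrive at the same
GID, which is the property the audit lookup depends on.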
------- Comment #17 From 2006-11-08 16:36:43 -------
Updated documentation and loaded files:
http://www.teragridforum.org/mediawiki/index.php?title=GRAM4_Audit

All tests work on TG, but the accounting database does not seem to have the job
record.
------- Comment #18 From 2006-11-09 13:23:36 -------
The accounting database update is apparently backed up.
------- Comment #19 From 2006-11-13 09:28:49 -------
The accounting database seems to have information about the test job and the
client query for accounting works. Closing campaign.