Bugzilla – Bug 4528
WS-GRAM Auditing Test Integration on TeraGrid
Last modified: 2007-02-14 15:26:53
Title: WS-GRAM Auditing Test Integration on TeraGrid

Projects: TeraGrid

Technologies: Globus Resource Allocation Manager (GRAM), OGSA-DAI

Definition: An auditing mechanism for WS-GRAM and a proof-of-concept interface to compound audit/TeraGrid accounting database queries have been created using OGSA-DAI at the request of the TeraGrid infrastructure team. The next step is to deploy these components on TeraGrid to get a working example. This will provide a fully integrated proof of concept for the entire setup and will allow TeraGrid people to use it and report back on how they would like to use it (i.e., which specific queries they will need). Further campaigns may be needed to add OGSA-DAI activities that support the desired query set.

Deliverables:
1) A Globus Toolkit installation on a TeraGrid machine with WS-GRAM and supporting services from the globus_4_0_community branch.
2) An audit database set up somewhere on TeraGrid that is accessible from the machine used in #1.
3) OGSA-DAI deployed in the container from #1.
4) TeraGrid-specific resources and activities for OGSA-DAI in #3 installed and configured (see http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=4412) for the TeraGrid-wide accounting database and the auditing database from #2.
5) Documentation for TeraGrid users on getting GRAM auditing data and TeraGrid accounting data.

Tasks:
1) Create a Globus Toolkit installer from the globus_4_0_community branch.
2) Determine which TeraGrid machine will be used for the test installation.
3) Install the Globus Toolkit from the installer created in #1 on the machine determined in #2.
4) Install the WSRF version of OGSA-DAI.
5) Install the TeraGrid resources for OGSA-DAI (obtained from the link in deliverable #4).
6) Create a database for logging audit records from WS-GRAM installed in #3.
7) Configure the resources installed in #5 to use the audit database in #6.
8) Test by submitting jobs to the container installed in #3 and using the DemoClient provided with the resources in #5 to obtain the charge for the submitted jobs.
9) Document the commands used in the testing from #8.

Time Estimate: 5 days
I had to do task #1 on an ia32 node since the ia64 nodes weren't cooperating when I tried to create an installer. For task #2 we've been told by JP to use tg-grid1.uc.teragrid.org. Task #3 is mostly done. I'm setting up security right now, but that may need to change since I don't have access to a TeraGrid host cert. I'm using my DOE user proxy for now. Task #6 is partially done. We have a database allocated, but it isn't configured with the audit schema just yet.
Task #3 is fully done. Audit logging to the database is working. I forced the container to use the FQDN instead of the IP to get more human-readable job GIDs. Tasks 4, 5, and 7 are done as well. I need the correct resource ID for tg-grid1.uc.teragrid.org in order to set up the host mappings properly. After that I can start testing the OGSA-DAI interface.
It looks like jobs submitted via PBS go to tg-master.uc.teragrid.org. I checked the accounting DB and the associated resource ID appears to be "dtf.anl.teragrid". Unfortunately I haven't been able to test yet because my job isn't showing up in the accounting database.
According to Michael Shapiro someone took down AIME for some reason and he doesn't know when it will be back up. I'll keep looking for my job daily, but this campaign is stalled until the TG accounting database is up to date.
The accounting database finally got my job info. Unfortunately I'm having problems getting the DemoClient (which queries for the charge based on a job's GID) to do anything but host authorization. I wrote Ally an email, so hopefully he can point me in the right direction.
I have the OGSA-DAI stuff working and running on port 9554, but multi-user job submission is still offline. I need the two magic sudoers lines added before anybody other than the globus user can submit jobs. Also, the sudoers line to allow me to start and stop the container through the init.d script still needs to be added. This isn't a show stopper since I can manually use globus-[start|stop]-container-detached to do the same thing.

Here are my notes from installing all the WS-based stuff as the globus user. They are very rough and quite dependent on both my personal installation and the one running under the globus account. Nevertheless, I think they should be documented so we can develop better instructions later on, as well as a more scripted deployment.

1) Install globus_4_0_community branch.

2) Copy the following files to $GLOBUS_LOCATION/lib:
   activation.jar
   jakarta-oro-2.0.8.jar
   lucene-1.4.3.jar
   mail.jar
   ogsadai-activities.jar
   ogsadai-core.jar
   ogsadai-examples.jar
   ogsadai-teragrid.jar
   ogsadai-tools.jar
   ogsadai-wsrf-stubs.jar
   ogsadai-wsrf.jar
   postgresql-8.0-315.jdbc3.jar
   xmldb.jar

3) Copy $GLOBUS_LOCATION/etc/ogsadai_wsrf

4) Copy $GLOBUS_LOCATION/share/schema/ogsadai

5) Copy the values for the following parameters to $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml:
   <parameter>
     <name>connectionString</name>
     <value>jdbc:postgresql://tg-mayor1.uc.teragrid.org/gt4auditgram</value>
   </parameter>
   <parameter>
     <name>userName</name>
     <value>gt4audit</value>
   </parameter>
   <parameter>
     <name>password</name>
     <value>OMITTED FOR SECURITY REASONS</value>
   </parameter>

6) Add the following lines to container-log4j.properties:
   # AUDIT
   log4j.appender.AUDIT=org.globus.exec.utils.AuditDatabaseAppender
   log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
   log4j.category.org.globus.exec.service.exec.StateMachine.audit=INFO, AUDIT
   #log4j.category.org.globus.exec.service.exec.StateMachine.audit=INFO, A1
   log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false

7) Add the following parameters to $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd:
   <parameter name="logicalHost" value="tg-grid1.uc.teragrid.org"/>
   <parameter name="publishHostName" value="true"/>

8) Find all files with /home/lane in them and execute the following substitution in each:
   :% s#home/lane/globus/globus-community#home/globus/Audit/globus#g

9) Add the service host mapping to $GLOBUS_LOCATION/etc/ogsadai_wsrf/TeraGridResource/hostToResource.txt

10) Copy the values for the following parameters to $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml:
   <parameter>
     <name>url</name>
     <value>jdbc:postgresql://tg-mayor1.uc.teragrid.org/gt4auditgram?USESSL=force&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory</value>
   </parameter>
   <parameter>
     <name>user</name>
     <value>gt4audit</value>
   </parameter>
   <parameter>
     <name>password</name>
     <value>OMITTED FOR SECURITY REASONS</value>
   </parameter>

11) Copy /soft/globus-wsrf-4.0.1-r3/lib/perlGlobus/GRAM/JobManager/pbs.pm to $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/
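The path substitution in step 8 is done by hand in an editor, one file at a time. As a step toward a more scripted deployment, it could be sketched roughly as follows. This is only an illustration: the function name rewrite_paths and the decision to skip files that are not valid UTF-8 text (such as jars) are assumptions, not part of the procedure used on tg-grid1.

```python
import os

# Strings taken from the substitution in step 8.
OLD = "home/lane/globus/globus-community"
NEW = "home/globus/Audit/globus"


def rewrite_paths(root, old=OLD, new=NEW):
    """Replace every occurrence of `old` with `new` in text files under `root`.

    Returns the sorted list of files that were modified.
    (Hypothetical helper; not part of the Globus Toolkit or OGSA-DAI.)
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # newline="" keeps the file's original line endings.
                with open(path, "r", encoding="utf-8", newline="") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                # Assumption: binary or unreadable files are left alone.
                continue
            if old not in text:
                continue
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(text.replace(old, new))
            changed.append(path)
    return sorted(changed)
```

Calling rewrite_paths(os.environ["GLOBUS_LOCATION"]) would apply the substitution to the whole installation tree; running it on a copy first is the safer option.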
I also fixed a couple of bugs in the WS audit logging stuff, so a fresh checkout of globus_4_0_community is required. I updated the deployment on tg-grid1, so it shouldn't be a problem there anymore.
I was able to get globus-personal-gatekeeper jobs to write audit records. I believe I configured the non-personal gatekeepers appropriately, but I need someone with root access on tg-grid1 to start those gatekeepers before they can be tested. As for uploading the pre-WS audit records, there's a bug whereby the GT version is set to "NULL" instead of a valid version string. This causes a null value to be uploaded to the DB, which the table schema doesn't allow. I wrote a script (attachment coming) that substitutes in the value returned by $GLOBUS_LOCATION/bin/globus-version to get things working. This means that pre-WS audit record uploading will be a two-step process until Joe fixes that. I ran a test audit query for the job I submitted via globus-personal-gatekeeper without problems. The accounting query returned no results for the job, as expected, since accounting record uploading hasn't been done yet today. Otherwise the accounting query worked fine. I haven't tried the charge query since the accounting records aren't present yet.
Created an attachment (id=990)
script that inserts GT version into Pre-WS audit records

Since the Pre-WS GRAM audit records are being generated with a null GT version, this script compensates for that by reading in all *.gramaudit files in a directory and writing out files with the same name plus a ".fixed" suffix that contain the GT version obtained from $GLOBUS_LOCATION/bin/globus-version.
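The attached script itself is not reproduced in this report. A rough sketch of the same idea is shown below. The record format is an assumption: the sketch treats each *.gramaudit file as lines of comma-separated, double-quoted fields with the literal string NULL in the GT version position, and the caller has to supply that position (field_index) because the real layout is not given here. All function names are hypothetical.

```python
import csv
import glob
import os
import subprocess


def get_gt_version():
    """Return the output of $GLOBUS_LOCATION/bin/globus-version."""
    tool = os.path.join(os.environ["GLOBUS_LOCATION"], "bin", "globus-version")
    result = subprocess.run([tool], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def fix_record_file(path, version, field_index):
    """Write path + ".fixed" with a NULL version field replaced by `version`.

    Assumption: records are comma-separated, double-quoted fields and the
    GT version sits at `field_index`. Other NULL fields are left untouched.
    """
    fixed_path = path + ".fixed"
    with open(path, newline="") as src, open(fixed_path, "w", newline="") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for row in csv.reader(src):
            if len(row) > field_index and row[field_index] == "NULL":
                row[field_index] = version
            writer.writerow(row)
    return fixed_path


def fix_directory(directory, version, field_index):
    """Fix every *.gramaudit file in `directory`; return the files written."""
    pattern = os.path.join(directory, "*.gramaudit")
    return [fix_record_file(p, version, field_index)
            for p in sorted(glob.glob(pattern))]
```

A typical call would be fix_directory(audit_dir, get_gt_version(), field_index=N), where audit_dir and N depend on the local gatekeeper configuration and record layout.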
For the documentation deliverable (#5), I'm copying an email I just wrote to some people so they could try things out themselves:

> Audit Query
> -----------

Here is an example perform document that simulates the first half of what OGSA-DAI would do in a charge query:

<?xml version="1.0" encoding="UTF-8"?>
<!-- (c) International Business Machines Corporation, 2002 - 2005.-->
<!-- (c) University of Edinburgh, 2002 - 2005.-->
<!-- See OGSA-DAI-Licence.txt for licencing information.-->
<perform xmlns="http://ogsadai.org.uk/namespaces/2005/10/types">
  <documentation>
    This example performs a simple select statement to retrieve one row
    from the test database. The results are delivered within the
    response document.
  </documentation>
  <sqlQueryStatement name="statement">
    <expression>select local_job_id,queued_time from gram_audit_table where job_grid_id='https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=' AND subject_name='/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 364243'</expression>
    <resultStream name="statementOutputRS"/>
  </sqlQueryStatement>
  <sqlResultsToXML name="statementRSToXML">
    <resultSet from="statementOutputRS"/>
    <webRowSet name="statementOutput"/>
  </sqlResultsToXML>
</perform>

If you save this to a file named, say, ./perform_audit.xml, then you can execute the following command using the "ogsadai-client" found in my /home/lane on tg-grid1:

% ogsadai-client -u https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid -k Audit_tg-grid1.uc.teragrid.org -tls encrypt ./perform_audit.xml

This will query the audit database through OGSA-DAI using the query specified in the above perform document. The value of the -k option is an arbitrary resource key I picked for associating with audit queries.
This will return the following result document on stdout:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:response xmlns:ns1="http://ogsadai.org.uk/namespaces/2005/10/types">
<ns1:session id="session-ogsadai-10c2730f792"/>
<ns1:request status="COMPLETED"/>
<ns1:result name="statement" status="COMPLETED"/>
<ns1:result name="statementRSToXML" status="COMPLETED"/>
<ns1:result name="statementOutput" status="COMPLETED"><![CDATA[<webRowSet xmlns="http://java.sun.com/xml/ns/jdbc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/jdbc http://java.sun.com/xml/ns/jdbc/webrowset.xsd">
<properties>
<command></command>
<concurrency>1007</concurrency>
<datasource></datasource>
<escape-processing>true</escape-processing>
<fetch-direction>1000</fetch-direction>
<fetch-size>0</fetch-size>
<isolation-level>0</isolation-level>
<key-columns></key-columns>
<map></map>
<max-field-size>0</max-field-size>
<max-rows>0</max-rows>
<query-timeout>0</query-timeout>
<read-only>true</read-only>
<rowset-type>ResultSet.TYPE_FORWARD_ONLY</rowset-type>
<show-deleted>false</show-deleted>
<table-name></table-name>
<url></url>
<sync-provider>
<sync-provider-name/>
<sync-provider-vendor/>
<sync-provider-version/>
<sync-provider-grade/>
<data-source-lock/>
</sync-provider>
</properties>
<metadata>
<column-count>2</column-count>
<column-definition>
<column-index>1</column-index>
<auto-increment>false</auto-increment>
<case-sensitive>true</case-sensitive>
<currency>false</currency>
<nullable>1</nullable>
<signed>false</signed>
<searchable>true</searchable>
<column-display-size>512</column-display-size>
<column-label>local_job_id</column-label>
<column-name>local_job_id</column-name>
<schema-name></schema-name>
<column-precision>512</column-precision>
<column-scale>0</column-scale>
<table-name></table-name>
<catalog-name></catalog-name>
<column-type>12</column-type>
<column-type-name>varchar</column-type-name>
</column-definition>
<column-definition>
<column-index>2</column-index>
<auto-increment>false</auto-increment>
<case-sensitive>false</case-sensitive>
<currency>false</currency>
<nullable>1</nullable>
<signed>false</signed>
<searchable>true</searchable>
<column-display-size>26</column-display-size>
<column-label>queued_time</column-label>
<column-name>queued_time</column-name>
<schema-name></schema-name>
<column-precision>0</column-precision>
<column-scale>6</column-scale>
<table-name></table-name>
<catalog-name></catalog-name>
<column-type>93</column-type>
<column-type-name>timestamp</column-type-name>
</column-definition>
</metadata>
<data><currentRow>
<columnValue>287254.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-22 15:44:10</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

The interesting part is the 5th and 4th to last lines (the "<columnValue>" elements), and from now on I'll cut the boring stuff out. Ideally a custom client would be created that converts the XML document into a data structure that can be walked through via an API. At any rate, this result document gives the values of the columns we requested (local_job_id and queued_time). Specifically, these values are "287254.tg-master.uc.teragrid.org" and "2006-06-22 15:44:10".

Changing the <expression> element in the above perform document to the following:

<expression>select local_job_id,queued_time from gram_audit_table where username='lane'</expression>

will yield the following result when the ogsadai-client command is run again:

[...]
<data><currentRow>
<columnValue>3d985e94-0221-11db-8a3d-0007e9d81215:2624</columnValue>
<columnValue>2006-06-22 13:59:32</columnValue>
</currentRow><currentRow>
<columnValue>287254.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-22 15:44:10</columnValue>
</currentRow><currentRow>
<columnValue><null/></columnValue>
<columnValue><null/></columnValue>
</currentRow><currentRow>
<columnValue><null/></columnValue>
<columnValue><null/></columnValue>
</currentRow><currentRow>
<columnValue>288114.tg-master.uc.teragrid.org</columnValue>
<columnValue>2006-06-30 21:09:44</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

The <columnValue> elements contain the local_job_id and queued_time data for each job that user "lane" has submitted to this compute resource.

> Accounting Query
> ----------------

Now copy the audit perform document and name it perform_accounting.xml. Change the <expression> element to the following:

<expression>select charge from jobs where local_jobid='287254.tg-master.uc.teragrid.org' and resource_name='dtf.anl.teragrid' and '2006-06-22 15:44:10' between submit_time - INTERVAL '24 hours' and submit_time + INTERVAL '24 hours'</expression>

The above query is an example of a query OGSA-DAI might make to the accounting database after it has obtained the results from the first audit query example above. Execute the following command to query the accounting database via OGSA-DAI using the new perform document:

% ogsadai-client -u https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid -k TeragridAccounting -tls encrypt ./perform_accounting.xml

Notice that I changed the value of the -k option to "TeragridAccounting". Again, this is an arbitrary resource key. In this case it is associated with performing queries to the TeraGrid-wide accounting database. The results of the command are as follows:

[...]
<data><currentRow>
<columnValue>0.00384666666666667</columnValue>
</currentRow></data>
</webRowSet>]]></ns1:result>
</ns1:response>

So the job with Grid ID "https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=" has a charge of "0.00384666666666667".

> Charge Query
> ------------
> 1) On tg-grid1, go to ~lane/ogsadai-teragrid

<<<different directory>>> I put the demo-client script in /home/lane so people can find it more easily.

> 2) Execute the following:
>
> ./demo-client \
> https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
> TeraGridResource \
> <job GID>

To do the above combined audit/accounting query automatically, execute the following command:

% ./demo-client \
    https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid \
    TeraGridResource \
    https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=

The output of the command is as follows:

Service URL: https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ogsadai/Teragrid
Data Service Resource ID: TeraGridResource
Grid Job ID: https://tg-grid1.uc.teragrid.org:9554/wsrf/services/ManagedExecutableJobService?Tb1eLvO6mVl/Of9KGw9nSOmgGmU=
User's DN: /DC=org/DC=doegrids/OU=People/CN=Peter G Lane 364243
The charge for this job is: 0.00384666666666667

This is a better example of how a custom client should work. Instead of returning ugly XML documents, the data is processed using an API and formatted for human consumption.
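For anyone who wants to post-process the ogsadai-client output rather than read the XML by eye, here is a minimal sketch of the "custom client" idea in Python. It is not the DemoClient code. It assumes the response document has the shape shown in the audit query example (a result element named statementOutput whose CDATA holds a webRowSet), and the function names parse_rows, sql_quote and accounting_expression are made up for illustration. The second half builds the accounting expression from an audit row using the same 24-hour window as the example query.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the response document shown in this report.
OGSADAI_NS = "http://ogsadai.org.uk/namespaces/2005/10/types"
JDBC = {"j": "http://java.sun.com/xml/ns/jdbc"}


def parse_rows(response_xml, result_name="statementOutput"):
    """Return the webRowSet rows as a list of {column_name: value} dicts.

    SQL NULLs (<null/>) become None. Hypothetical helper, not an OGSA-DAI API.
    """
    root = ET.fromstring(response_xml.encode("utf-8"))
    for result in root.iter("{%s}result" % OGSADAI_NS):
        if result.get("name") == result_name:
            break
    else:
        raise ValueError("no result named %r in response" % result_name)
    # The webRowSet document is carried as CDATA text inside the result.
    rowset = ET.fromstring((result.text or "").strip().encode("utf-8"))
    names = [e.text for e in
             rowset.findall(".//j:column-definition/j:column-name", JDBC)]
    rows = []
    for row in rowset.findall(".//j:currentRow", JDBC):
        values = [None if cv.find("j:null", JDBC) is not None else cv.text
                  for cv in row.findall("j:columnValue", JDBC)]
        rows.append(dict(zip(names, values)))
    return rows


def sql_quote(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def accounting_expression(row, resource_name):
    """Build the accounting query text for one audit row."""
    return ("select charge from jobs where local_jobid=%s and resource_name=%s "
            "and %s between submit_time - INTERVAL '24 hours' "
            "and submit_time + INTERVAL '24 hours'"
            % (sql_quote(row["local_job_id"]), sql_quote(resource_name),
               sql_quote(row["queued_time"])))
```

Given the full response saved from the audit query, accounting_expression(parse_rows(text)[0], "dtf.anl.teragrid") produces the text to place in the <expression> element of perform_accounting.xml. Rows whose values are None (the <null/> entries) would need to be skipped first.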
The sudoers entries have been added so that the WS GRAM can be used in multi-user mode. I've submitted a WS GRAM job and will check tomorrow to make sure the charge queries are working. I gave some documentation on testing everything to Nancy Wilkins-Diehr, and Steve Keinvehn put it up on a web page that can be found here: https://repo.teragrid.org/wg/Gateways/gram-audit.html All that's left is for JP to get a gatekeeper running as root so I can test pre-WS in multi-user mode.
This campaign has been reassigned.

Outstanding Deliverables:
1. Run a test GRAM2 and GRAM4 job and check usage information.
2. Spruce up the existing document on submitting a remote job and running the usage clients installed on tg-grid1.uc.teragrid.org.
3. Write up documentation on the client-side installation and API calls that gateways will need in order to incorporate programmatic usage information queries.
I am waiting on a TG allocation to test things out, but I have created a minimum set of client jars that will be required and documentation on the steps involved. A tar.gz of the files can be downloaded at http://www-unix.mcs.anl.gov/~ranantha/gramAuditTgClient.tar.gz (too large for bugzilla).
Confirmed that this minimum set of jars can be used to build a client that mimics the sample client code. Uploading the README and jars to bugzilla. Account usage information will take a while to get updated, so the usage query currently does not return that information. Will need to run the client again later.
Created an attachment (id=1113)
Readme file for TeraGrid Client side usage query
While testing this using the API provided to convert an EPR to a string, I found an issue with EPRs that were written to files or serialized differently. A simpler algorithm, which extracts the resource key value and the To address to generate the digest, has been committed. Standalone testing of GRAM audit has been completed. The TG install will now have to be updated with the latest code.
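To illustrate why working from the two extracted values avoids the serialization problem, here is a small sketch. It is not the committed algorithm: the choice of SHA-1, the base64 encoding, hashing only the resource key, and the address?digest layout are all assumptions, loosely suggested by the shape of the job GIDs quoted earlier in this report. The function name job_grid_id is hypothetical.

```python
import base64
import hashlib


def job_grid_id(to_address, resource_key):
    """Derive a stable job identifier from an EPR's address and resource key.

    Assumption: SHA-1 over the UTF-8 resource key, base64-encoded and
    appended to the address after "?". The real WS-GRAM code may differ.
    """
    digest = hashlib.sha1(resource_key.encode("utf-8")).digest()
    return to_address + "?" + base64.b64encode(digest).decode("ascii")
```

Because only the address and the resource key value are used, two copies of the same EPR that differ in whitespace, namespace prefixes or element order after being written to a file still map to the same identifier.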
Updated documentation and loaded files:
http://www.teragridforum.org/mediawiki/index.php?title=GRAM4_Audit

All tests work on TG, but the accounting database does not seem to have the job record.
The accounting database update is apparently backed up.
The accounting database seems to have information about the test job and the client query for accounting works. Closing campaign.