Bug 3598 - globusrun-ws -self doesn't work for multi-jobs
: unspecified
: Macintosh All
: P3 blocker
: 4.0.1
Assigned To:
Reported: 2005-07-26 15:22 by
Modified: 2005-07-28 12:06

Multi-job auth problems patch (6.04 KB, patch)
2005-07-27 18:08, Peter Lane



Description From 2005-07-26 15:22:54
I built GT 4.0.1 release candidate 2 and edited the following three files:
vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml
<!-- comment out the following lines (the <credential> section) -->
        <key-file value="/etc/grid-security/containerkey.pem"/>
        <cert-file value="/etc/grid-security/containercert.pem"/>

vim $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml
/* set value of stagingSubject to (no quotes): */
    /DC=org/DC=doegrids/OU=People/CN=Nicholas T. Karonis 886776

/* the next edit allows -self to work with globusrun-ws
   for multi-jobs (and with streaming output?) */
vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
        <!-- NICK

I then fired up the container and tried to run /bin/true as part of a multi-job request.
Here is what I ran and the output it generated:
G5% globusrun-ws -self -submit -F
ManagedJobFactoryService -f true-multi.xml
Delegating user credentials...Failed.
globusrun-ws: Error trying to delegate
globus_delegation_client_util: DelegationFactoryPortType_RequestSecurityToken callback failed.
globus_soap_message_module: Failed receiving response 
globus_soap_message_module: Deserialization of {http://schemas.xmlsoap.org/soap/envelope/}Header failed.

Here are the contents of true-multi.xml:
G5% cat true-multi.xml
        Any time you run a multijob you need a factoryEndpoint
        for the whole job. 
        The URLs that I specify in the xmlns:gram and xmlns:wsa
        should NOT change.  They are name-space URLs that must
        be exactly as they appear here for *all* Globus jobs everywhere.
        I would change the URL in <wsa:Address> if I were
        using a different whole-job container than the one
        NOTE: in xmlns:gram="...", the 'gram' part is an arbitrary
              variable name that *I* choose (it could be anything).
              Whatever I choose there must match what I specify later 
              in this same factoryEndpoint in <gram:ResourceID>.
              The same is true for the 'wsa' in xmlns:wsa="..."
              and where it also appears in this factoryEndpoint.
        NOTE: *This* factoryEndpoint is *not* inherited by all the jobs.
              In fact, it would be erroneous for the jobs to inherit
              *this* factory endpoint because it is 

    <!-- START: stuff put here is inherited by ALL jobs -->
    <!-- END:   stuff put here is inherited by ALL jobs -->

            *Each* job must have its own factoryEndpoint.
            The URLs specified in xmlns:gram and xmlns:wsa must
            be exactly as they appear here for *all* Globus jobs.
            They identify name spaces.
            The <wsa:Address> identifies the container/service where
            this job is to run.

        <!-- <directory></directory> -->
        <!-- <argument></argument> -->
        <!-- <argument></argument> -->
        <!-- <environment> <name></name><value></value> </environment> -->
        <!-- <stdin></stdin> -->
        <!-- <stdout>env.out</stdout> -->
        <!-- <stderr></stderr> -->
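The NOTE about prefixes and the START/END inheritance comments above can be illustrated with a minimal Python sketch using `xml.etree.ElementTree`. A parser resolves a prefixed name to `{namespace-URI}localname`, so `gram:` and `g:` bound to the same URI name the same element. The namespace URI `urn:example:gram`, the `multiJob`/`job` element names, and the rule that a job's own value overrides an inherited one are assumptions made for illustration; the real namespace URIs are the ones in true-multi.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one is whatever true-multi.xml uses.
GRAM_NS = "urn:example:gram"

# The same element written with two different, arbitrary prefixes.
DOC_A = f'<gram:ResourceID xmlns:gram="{GRAM_NS}">Multi</gram:ResourceID>'
DOC_B = f'<g:ResourceID xmlns:g="{GRAM_NS}">Multi</g:ResourceID>'


def qualified_name(xml_text: str) -> str:
    """Return the root tag in ElementTree's '{namespace-uri}localname' form."""
    return ET.fromstring(xml_text).tag


# Hypothetical multi-job layout: top-level children other than
# factoryEndpoint and job are treated as settings shared by all jobs.
MULTI = """
<multiJob>
  <factoryEndpoint>whole-job endpoint</factoryEndpoint>
  <directory>/tmp</directory>
  <stdout>shared.out</stdout>
  <job>
    <factoryEndpoint>subjob endpoint 1</factoryEndpoint>
  </job>
  <job>
    <factoryEndpoint>subjob endpoint 2</factoryEndpoint>
    <stdout>env.out</stdout>
  </job>
</multiJob>
"""


def effective_subjobs(xml_text: str) -> list[dict[str, str]]:
    """Merge the shared settings into each job.

    The top-level factoryEndpoint is never inherited; each job keeps its own.
    Assumption: a value set inside a job overrides the inherited one.
    """
    root = ET.fromstring(xml_text)
    shared = {
        child.tag: (child.text or "")
        for child in root
        if child.tag not in ("factoryEndpoint", "job")
    }
    result = []
    for job in root.findall("job"):
        merged = dict(shared)
        merged.update({child.tag: (child.text or "") for child in job})
        result.append(merged)
    return result


if __name__ == "__main__":
    print(qualified_name(DOC_A) == qualified_name(DOC_B))  # True
    for settings in effective_subjobs(MULTI):
        print(settings)
```

Only the namespace URI matters to the parser, which is why the prefix can be anything as long as it is used consistently within the file.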

------- Comment #1 From 2005-07-27 11:12:17 -------
I just re-tested this on lucky0 with an updated C transport buffer package, but
I'm still getting the same error.
------- Comment #2 From 2005-07-27 13:34:27 -------
Just a note that the bits about commenting out the <credential> section and
setting stagingSubject are irrelevant.  The real problem occurs when you comment
out the containerSecDesc property and try to submit a multi-job with the -self
option of globusrun-ws.

The deserialization error stems from globusrun-ws choking on a
"org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException:
Policy decision failed [Caused by: No gridmap file]" fault message.  The root
cause of that fault is what will ultimately prevent multi-job submissions from
working under user creds, regardless of whether the deserialization bug is fixed.

Specifically, the heuristics for determining the subjob auth method no longer
work by just commenting out the <credential> section, and there doesn't
appear to be any way to fix that (Rachana, please correct me if I'm wrong).
Commenting out containerSecDesc also no longer works, because we removed all
the gridmap settings from the service configs and there is no default (it would
be nice if the default were ~/.gridmap, as in pre-WS).  Aside from adding a
default gridmap setting in the security code, the only ways to fix this are
either to use the 4.2 authSubject job description element (not helpful for
4.0.x users) or to manually add the gridmap settings back into the service configs.
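To make the "No gridmap file" failure concrete, here is a minimal Python sketch of gridmap-based authorization: it parses lines of the form `"<subject DN>" user[,user...]` and authorizes a caller only if its DN is listed. With no gridmap path configured and no default, the check cannot succeed, which mirrors the fault quoted above. The function names and the use of `PermissionError` are hypothetical, and the `~/.gridmap` fallback models only the default suggested in this comment, not GT 4.0.1 behavior.

```python
import os
import shlex
from typing import Optional


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse gridmap lines of the form: "<subject DN>" user[,user...]"""
    entries: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps the quoted DN (which contains spaces) as one token.
        tokens = shlex.split(line)
        if len(tokens) < 2:
            continue
        dn, users = tokens[:2]
        entries[dn] = users.split(",")
    return entries


def resolve_gridmap_path(configured: Optional[str], use_home_default: bool) -> Optional[str]:
    """Return the gridmap path to use, or None if nothing is configured.

    use_home_default models the ~/.gridmap default suggested in comment #2;
    it is an assumption, not existing 4.0.x behavior.
    """
    if configured:
        return configured
    if use_home_default:
        return os.path.expanduser("~/.gridmap")
    return None


def authorize(subject_dn: str, gridmap_text: Optional[str]) -> list[str]:
    """Return the local accounts mapped to subject_dn or raise PermissionError."""
    if gridmap_text is None:
        raise PermissionError("Policy decision failed [Caused by: No gridmap file]")
    accounts = parse_gridmap(gridmap_text).get(subject_dn)
    if not accounts:
        raise PermissionError(f"{subject_dn} is not in the gridmap")
    return accounts


if __name__ == "__main__":
    dn = "/DC=org/DC=doegrids/OU=People/CN=Nicholas T. Karonis 886776"
    # "nick" is a hypothetical local account name.
    print(authorize(dn, f'"{dn}" nick\n'))  # ['nick']
    # No gridmap configured and no default: the policy decision fails.
    path = resolve_gridmap_path(None, use_home_default=False)
    try:
        authorize(dn, None if path is None else open(path).read())
    except PermissionError as err:
        print(err)
```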
------- Comment #3 From 2005-07-27 13:39:42 -------
Just to clarify, the problem isn't the deserialization of the error type but the
fact that some types of Java faults are sent without any SOAP headers.
The code generated by the globus-wsrf-cgen program does not handle that case. This
is documented in bug #2437.
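The distinction drawn here can be sketched in a few lines: SOAP 1.1 makes `Header` optional and `Body` mandatory, so a client that insists on a Header fails on a header-less fault before it ever reads the fault text. Below is a minimal, tolerant parser using Python's `xml.etree.ElementTree`; it is not the globus-wsrf-cgen generated code, and the function names and sample fault envelope are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Illustrative fault text, taken from the message quoted in comment #2.
FAULT_TEXT = ("org.globus.wsrf.impl.security.authorization.exceptions."
              "AuthorizationException: Policy decision failed "
              "[Caused by: No gridmap file]")

# A fault response that carries no soapenv:Header element at all.
HEADERLESS_FAULT = f"""<soapenv:Envelope xmlns:soapenv="{SOAP_ENV}">
  <soapenv:Body>
    <soapenv:Fault>
      <faultcode>soapenv:Server</faultcode>
      <faultstring>{FAULT_TEXT}</faultstring>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>"""


def parse_response(xml_text: str) -> tuple[Optional[ET.Element], ET.Element]:
    """Split a SOAP 1.1 envelope into (header, body); header may be None."""
    envelope = ET.fromstring(xml_text)
    if envelope.tag != f"{{{SOAP_ENV}}}Envelope":
        raise ValueError("not a SOAP 1.1 envelope")
    header = envelope.find(f"{{{SOAP_ENV}}}Header")  # optional in SOAP 1.1
    body = envelope.find(f"{{{SOAP_ENV}}}Body")      # mandatory
    if body is None:
        raise ValueError("SOAP envelope has no Body")
    return header, body


def fault_string(body: ET.Element) -> Optional[str]:
    """Return the faultstring if the Body carries a SOAP fault, else None."""
    fault = body.find(f"{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    # In SOAP 1.1, faultcode and faultstring are unqualified child elements.
    return fault.findtext("faultstring")


if __name__ == "__main__":
    header, body = parse_response(HEADERLESS_FAULT)
    print(header is None)  # True
    print(fault_string(body))
```

Treating the Header as optional lets the client surface the real authorization fault instead of a generic "Deserialization of ...Header failed" error.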

------- Comment #4 From 2005-07-27 14:58:15 -------
Downgrading from blocker status. The error parsing issue isn't critical since
the actual functionality (self-authorization of multijobs) won't work with 4.0.1.
------- Comment #5 From 2005-07-27 17:56:54 -------
Since the deserialization error is due to bug #2437, I'm assuming this bug is
now for resolving the auth problems with multi-jobs.
------- Comment #6 From 2005-07-27 18:08:48 -------
Created an attachment (id=662)
Multi-job auth problems patch

Attaching a patch that fixes the auth problems.  I tested it with both
<key-file value="..."/> and no credentials section in the global security
descriptor.  I also double-checked that regular host auth still works.
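The two configurations tested here correspond to the two subjob auth methods discussed in comment #2. The following is a minimal Python sketch of that kind of heuristic, assuming (as comment #2 implies) that a container without its own credentials runs with the user's proxy and should therefore use self authorization for subjobs. The `ContainerConfig` type, the function names, and the `CN=host/<fqdn>` host-DN convention are assumptions for illustration; this is not the logic of the attached patch, which is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerConfig:
    """Hypothetical summary of the global security descriptor."""
    key_file: Optional[str] = None   # <key-file value="..."/>
    cert_file: Optional[str] = None  # <cert-file value="..."/>


def subjob_auth_method(cfg: ContainerConfig) -> str:
    """Assumed heuristic: no container credentials means the container runs
    with the user's proxy, so subjobs should use self authorization."""
    if cfg.key_file or cfg.cert_file:
        return "host"
    return "self"


def authorize_peer(method: str, peer_dn: str, own_dn: str, hostname: str) -> bool:
    """Check the identity presented by the service that runs a subjob."""
    if method == "self":
        # Self authorization: the service must run as the same identity.
        return peer_dn == own_dn
    if method == "host":
        # Host authorization, assuming the usual CN=host/<fqdn> convention.
        return peer_dn.endswith(f"CN=host/{hostname}")
    raise ValueError(f"unknown auth method: {method}")


if __name__ == "__main__":
    user = "/DC=org/DC=doegrids/OU=People/CN=Nicholas T. Karonis 886776"
    # "example.org" is a hypothetical host name.
    method = subjob_auth_method(ContainerConfig())
    print(method, authorize_peer(method, user, user, "example.org"))  # self True
```

Under this model, a container configured with a key file selects host authorization, which is the "regular host auth" case that must keep working.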
------- Comment #7 From 2005-07-28 12:06:12 -------
Ok, fixes are in the trunk and globus_4_0_branch.  Closing the bug out...