Bug 3714 - Add Job elements to GLUECE RP

Status: RESOLVED WONTFIX
Product: GRAM
Component: wsrf discovery interface
Version: unspecified
Hardware: PC Linux
Importance: P3 enhancement
Target Milestone: 4.2.1

Reported: 2005-08-31 11:56
Modified: 2012-09-05 11:42


Description From 2005-08-31 11:56:27
I don't think it would be too hard to make the GLUE scheduler providers list
Job elements based on inspections of the job persistence data.  The ce:JobType
has an xsd:any, so we could even stick the full EPR in there if we wanted to.
Basing this off the persistence data also has the advantage that we don't have
to set up create, remove, and status-change events to keep the job list
current.  If we switch to a DB this will be even easier.
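
For illustration only, here is a minimal sketch of what such a provider could
look like. It is written in Java for readability (the actual providers are
Perl scripts, per the comments below), and the persistence directory layout
and property keys used here are invented assumptions, not the real WS GRAM
on-disk format:

// Hypothetical sketch: enumerate persisted job entries and emit GLUE Job
// elements. The directory location and property keys are assumptions.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Properties;

public class GlueJobLister {

    private static final String PERSISTENCE_DIR =
        System.getProperty("user.home") + "/.globus/persisted"; // assumed

    public static void main(String[] args) throws IOException {
        File[] entries = new File(PERSISTENCE_DIR).listFiles();
        if (entries == null) {
            return; // directory missing: no persisted jobs to report
        }
        StringBuilder xml = new StringBuilder();
        for (File entry : entries) {
            // Assume each job is persisted as a properties file keyed by
            // resource ID; the real format may differ.
            Properties job = new Properties();
            job.load(Files.newInputStream(entry.toPath()));
            xml.append("<ns2:Job")
               .append(" ns2:GlobalID=\"").append(job.getProperty("resourceId")).append('"')
               .append(" ns2:LocalID=\"").append(job.getProperty("localId")).append('"')
               .append(" ns2:LocalOwner=\"").append(job.getProperty("localUser")).append('"')
               .append(" ns2:Status=\"").append(job.getProperty("state")).append("\"/>\n");
        }
        System.out.print(xml);
    }
}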
------- Comment #1 From 2005-09-02 16:44:36 -------
Here's an example of the GLUECE resource property after modifying the fork
provider:

<ns1:ServiceMetaDataInfo xmlns:ns1="http://mds.globus.org/metadata/2005/02">
 <ns1:startTime>2005-09-02T19:37:39.590Z</ns1:startTime>
 <ns1:version>4.0.1</ns1:version>
 <ns1:serviceTypeName>ManagedJobFactoryService</ns1:serviceTypeName>
</ns1:ServiceMetaDataInfo>
<ns2:GLUECE xmlns:ns2="http://mds.globus.org/glue/ce/1.1">
 <ns2:ComputingElement ns2:Name="default" ns2:UniqueID="default">
  <ns2:Info ns2:GRAMVersion="4.0.1" ns2:LRMSType="Fork" ns2:LRMSVersion="1.0"
            ns2:HostName="logan" ns2:TotalCPUs="1"/>
  <ns2:State ns2:EstimatedResponseTime="0" ns2:FreeCPUs="1" ns2:RunningJobs="0"
             ns2:Status="enabled" ns2:TotalJobs="0" ns2:WaitingJobs="0"
             ns2:WorstResponseTime="0"/>
  <ns2:Policy ns2:MaxCPUTime="0" ns2:MaxRunningJobs="0" ns2:MaxTotalJobs="0"
              ns2:MaxWallClockTime="0" ns2:Priority="0"/>
  <ns2:Job ns2:GlobalID="b2cd7359-aa17-49e6-4667-510675dce61b"
           ns2:GlobalOwner="/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 291467"
           ns2:LocalID="14122" ns2:LocalOwner="lane" ns2:Status="CleanUp">
   <ns00:EndpointReferenceType
       xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns00:Address>https://192.168.0.101:8443/wsrf/services/ManagedExecutableJobService</ns00:Address>
    <ns00:ReferenceProperties>
     <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">b2cd7359-aa17-49e6-4667-510675dce61b</ResourceID>
    </ns00:ReferenceProperties>
    <wsa:ReferenceParameters
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
   </ns00:EndpointReferenceType>
  </ns2:Job>
  <ns2:Job ns2:GlobalID="c4cb6c49-a244-478d-44c9-1c25555f286c"
           ns2:GlobalOwner="/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 291467"
           ns2:LocalID="13012" ns2:LocalOwner="lane" ns2:Status="CleanUp">
   <ns00:EndpointReferenceType
       xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns00:Address>https://192.168.0.101:8443/wsrf/services/ManagedExecutableJobService</ns00:Address>
    <ns00:ReferenceProperties>
     <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">c4cb6c49-a244-478d-44c9-1c25555f286c</ResourceID>
    </ns00:ReferenceProperties>
    <wsa:ReferenceParameters
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
   </ns00:EndpointReferenceType>
  </ns2:Job>
 </ns2:ComputingElement>
</ns2:GLUECE>

Some of the other providers are sh scripts instead of Perl scripts, so I will
probably just port those to Perl rather than rewrite this code for each of the
others.

We still need to think about this, though, because the providers are only
called every five minutes.  This means the list is only useful for finding
long-running jobs.  Short-running jobs may land on the list by luck, but it's
not a reliable method for getting a current job list.  Also, I'm not sure the
Status attribute should be included, since a) it's not very reliable given the
update rate and thus not very useful, and b) although the schema doesn't
dictate an enumeration, there are comments that suggest the values are
restricted and not compatible with our state values.
------- Comment #2 From 2005-09-02 19:11:53 -------
PBS is done.

Condor has been converted to Perl and is done.

Need to do LSF.
------- Comment #3 From 2005-09-09 10:15:25 -------
Peter, although I am a new voice on this bug, I'd like to discuss an
alternative design if I may. I am very interested in what you are doing, since
we too are doing similar things in an internal prototype we are developing.
Currently your code uses the ./globus/persisted information and briefly
mentions using a database later on. To me this is too low-level a way to
access the job information, because it relies on internal implementation
details. Consider: if a database were used instead, your code would have to
change to access it. Your code is therefore dependent on the persistence
implementation, whereas I think it should be independent. So instead I would
like to suggest that you access the job information from the externally
published GRAM interface (as opposed to going under the covers into the
internal workings of the GRAM implementation). Then, as the internal workings
of the GRAM implementation change, your code would be unaffected. What do you
think? Would this be a better alternative?
------- Comment #4 From 2005-09-09 10:32:58 -------
Doing it via the external GRAM interface (via the resource properties of the
MEJRs) makes sense, but it is probably not the most efficient method.  It
would be the difference between thousands of jobs each updating their job
information individually and a single information provider that can gather the
information via a single DB query or, in the current example, by reading many
files.  We are strongly considering going with a DB for storing the job
persistence information in the 4.2 version of WS GRAM.  I think the efficiency
is worth the cost of multiple information provider implementations.
------- Comment #5 From 2005-09-09 15:56:28 -------
Stu, I agree that directly using the internal file system or database is
somewhat more efficient than using the external GRAM interface, due to the
overhead of going through the interface to reach the internal storage.
However, I am not suggesting anything significantly new or grossly
inefficient; I too would suggest a single information provider to gather the
job information (IOW, I am not suggesting that each job does its own thing).
But I would suggest that the information provider use an external GRAM
interface to get the current list of job resources being managed by GRAM,
which in turn are mapped to the aforementioned persistent file system/database
entries. Then in 4.2 you can easily move to a DB, and any existing information
providers (ours and yours included) need not be recoded/updated/modified.

Sidebar: By including access to the internal persistent store in your latest
information providers, GT4 is broadcasting to all information provider
developers that this is how they should do business (that is, write code), and
I think this is sending the wrong message. GT4 should instead be using
"fixed/standardized" external interfaces and promoting them to all information
provider developers, ourselves and yourselves included.

Accessing the external GRAM interface is admittedly more difficult from
today's shell-script information providers using a command-line interface, so
I can understand some of this reticence. However, if both cluster providers
and scheduler providers could be written in Java (a separate enhancement I
will request shortly), then using an external interface might seem more
appropriate than accessing the internally formatted and organized file system
and/or database directly. Locking the internal design/implementation
conventions for accessing this persistent information into compiled code is
something I personally would not like to see ensured in future releases.

Finally, IMHO, if you wish to have grid scheduler vendors develop for the GT4
environment, you should plan on a clear separation of ownership. Although GT4
today ships with scheduler providers for fork, LSF, PBS, and Condor, and
cluster providers for Hawkeye and Ganglia, I would consider the
scheduler/cluster provider capability to be the responsibility of those
vendors (much like operating system device drivers). Therefore I assume that
you should define APIs and that the information provider developers should use
them (naturally, one of the most obvious APIs is the current external GRAM
interface). If APIs are appropriate, then exposing the internal design and
implementation of the persistent store directly to the vendors is something
I'd try not to do, so that GT4 is free to change later without impacting your
vendors' efforts.

Please excuse me if I seem too assertive; I'm just interested in your point of
view on this topic, and I apologize if I'm coming off too strong on the
subject.
------- Comment #6 From 2005-09-09 16:35:12 -------
I'm a bit confused here.  There is no public interface to the job data that we
can use.  That's exactly what we are trying to provide in the first place
through this provider.  If we had a public interface to the job data, we
wouldn't need to add the capability to the provider to generate the job data.
So while I understand the desire to have the provider use an open interface, I
don't see how it makes sense in practice.  We could generate the data
internally, but then this would either override what a provider might be
generating or be redundant information.  Can you be more specific about how
you propose we go about this?  Thanks.
------- Comment #7 From 2005-09-12 16:19:09 -------
Peter, yes, I agree: there is no public external interface. What I was
suggesting is to add one to the ManagedJobFactoryService, for example a
queryJobs operation, which gets the list of jobs from a new method in the
ManagedExecutableJobHome that returns the resource map entries. Then the
provider can invoke this new interface and is isolated from the implementation
details (the file system or database persistence we talked about earlier). I
hope that doing something like this doesn't go against the overall
architecture and design. I think the big question is whether it exposes too
much information through a non-MDS interface.
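
To make the shape of that proposal concrete, here is a rough sketch with
hypothetical names: neither queryJobs nor a resource-map accessor exists in
GT4, and only ManagedJobFactoryService and ManagedExecutableJobHome are real
classes referred to in this comment.

// Illustrative only; these interfaces and method names are assumptions.
import java.util.ArrayList;
import java.util.List;

// Hypothetical addition to ManagedExecutableJobHome: expose the keys of the
// internal resource map without exposing how the entries are persisted.
interface JobEnumeration {
    List<String> getJobResourceIds();
}

// Hypothetical queryJobs operation on ManagedJobFactoryService: a provider
// would call this instead of reading the persistence store directly.
class QueryJobsSketch {
    private final JobEnumeration home;

    QueryJobsSketch(JobEnumeration home) {
        this.home = home;
    }

    // Returns the job resource IDs currently managed by GRAM; the caller
    // (an information provider) maps each ID to a GLUE Job element.
    List<String> queryJobs() {
        return new ArrayList<>(home.getJobResourceIds());
    }
}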
------- Comment #8 From 2005-09-12 17:39:10 -------
Right, so you're advocating either exposing redundant information or requiring
that the provider query the jobs, using a job ID list to generate EPRs, in
order to fill in the information.  The former is impractical and the latter is
impossible (from a security point of view).

I think if we are going to avoid this problem of openly accessing non-public
sources of job data in our default provider, we either need to override any
job data coming from the provider and insert our own within the Java code, or
throw out the idea of using the GLUECE RP and publish the job data in a
completely different RP.  I dislike just ignoring an RP that already exists
for the purpose of exposing this information, so this brings up questions of
who owns the provider invocation code and whether it can be dependent on the
GRAM service code.  Somebody from the MDS team needs to chime in about this.
------- Comment #9 From 2005-09-26 20:25:43 -------
Peter, I'm going to stop contributing here; I appreciated your feedback and
comments. We seem to need the ability to have lower-level service interfaces
as part of the GT4 definition which can be invoked only by the WSRF container
itself and not by outside parties, that is, clients of the container. This is
similar to Java's package-level access, where classes within a package can
access each other but classes outside the package cannot. Thanks for listening
and we'll leave it at that.
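
For reference, the Java analogy in plain code; this is a generic illustration,
not GT4 code:

package org.example.container;

// Package-private class: visible to the container's own code in this
// package, invisible to client code outside it -- the same relationship
// the comment wants between container-internal service interfaces and
// external clients of the container.
class InternalJobRegistry {
    java.util.List<String> activeJobIds() {   // package-private method
        return java.util.Collections.emptyList();
    }
}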
------- Comment #10 From 2007-09-19 11:37:57 -------
Reassigning to current GRAM developer to close/fix as appropriate.
------- Comment #11 From 2012-09-05 11:42:51 -------
Doing some Bugzilla cleanup...  Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5.  Also, we're now tracking
issues in JIRA.  Any new issues should be added here:

http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363