Bugzilla – Bug 3714
Add Job elements to GLUECE RP
Last modified: 2012-09-05 11:42:51
I don't think it would be too hard to make the GLUE scheduler providers list Job elements based on inspection of the job persistence data. The ce:JobType has an xsd:any, so we could even stick the full EPR in there if we wanted to. Basing this on the persistence data also has the advantage that we don't have to set up create, remove, and status change events to keep the job list current. If we switch to a DB this will be even easier.
Here's an example of the GLUECE resource property after modifying the fork provider:

<ns1:ServiceMetaDataInfo xmlns:ns1="http://mds.globus.org/metadata/2005/02">
  <ns1:startTime>2005-09-02T19:37:39.590Z</ns1:startTime>
  <ns1:version>4.0.1</ns1:version>
  <ns1:serviceTypeName>ManagedJobFactoryService</ns1:serviceTypeName>
</ns1:ServiceMetaDataInfo>
<ns2:GLUECE xmlns:ns2="http://mds.globus.org/glue/ce/1.1">
  <ns2:ComputingElement ns2:Name="default" ns2:UniqueID="default">
    <ns2:Info ns2:GRAMVersion="4.0.1" ns2:LRMSType="Fork" ns2:LRMSVersion="1.0" ns2:HostName="logan" ns2:TotalCPUs="1"/>
    <ns2:State ns2:EstimatedResponseTime="0" ns2:FreeCPUs="1" ns2:RunningJobs="0" ns2:Status="enabled" ns2:TotalJobs="0" ns2:WaitingJobs="0" ns2:WorstResponseTime="0"/>
    <ns2:Policy ns2:MaxCPUTime="0" ns2:MaxRunningJobs="0" ns2:MaxTotalJobs="0" ns2:MaxWallClockTime="0" ns2:Priority="0"/>
    <ns2:Job ns2:GlobalID="b2cd7359-aa17-49e6-4667-510675dce61b" ns2:GlobalOwner="/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 291467" ns2:LocalID="14122" ns2:LocalOwner="lane" ns2:Status="CleanUp">
      <ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
        <ns00:Address>https://192.168.0.101:8443/wsrf/services/ManagedExecutableJobService</ns00:Address>
        <ns00:ReferenceProperties>
          <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">b2cd7359-aa17-49e6-4667-510675dce61b</ResourceID>
        </ns00:ReferenceProperties>
        <wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
      </ns00:EndpointReferenceType>
    </ns2:Job>
    <ns2:Job ns2:GlobalID="c4cb6c49-a244-478d-44c9-1c25555f286c" ns2:GlobalOwner="/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 291467" ns2:LocalID="13012" ns2:LocalOwner="lane" ns2:Status="CleanUp">
      <ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
        <ns00:Address>https://192.168.0.101:8443/wsrf/services/ManagedExecutableJobService</ns00:Address>
        <ns00:ReferenceProperties>
          <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">c4cb6c49-a244-478d-44c9-1c25555f286c</ResourceID>
        </ns00:ReferenceProperties>
        <wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
      </ns00:EndpointReferenceType>
    </ns2:Job>
  </ns2:ComputingElement>
</ns2:GLUECE>

Some of the other providers are sh scripts instead of perl scripts, so I will probably just port those to perl so I don't have to rewrite this code for each of them. We still need to think about this, though, because the providers are only called every five minutes. This means that this is only useful for getting a list of long-running jobs. Short-running jobs may get on the list by luck, but it's not a reliable method for getting a current job list. Also, I'm not sure the status attribute should be included since a) it's not very reliable due to the update rate and thus not very useful, and b) although the schema doesn't dictate an enumeration, there are comments that suggest that the values are restricted and not compatible with our state values.
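For anyone following along, here is a rough sketch of how a provider might turn one job record into a Job element shaped like the ones above. It is written in Python purely for illustration (the real providers are perl/sh scripts), and the record keys, the function names, and the include_status switch are all assumptions made for this example, not part of the actual provider code. The switch is there only because of the open question above about whether Status should be published at all.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample output above.
GLUE_NS = "http://mds.globus.org/glue/ce/1.1"
WSA_NS = "http://schemas.xmlsoap.org/ws/2004/03/addressing"
GRAM_JOB_NS = "http://www.globus.org/namespaces/2004/10/gram/job"


def q(ns, name):
    """Build an ElementTree qualified name: {namespace}local-name."""
    return "{%s}%s" % (ns, name)


def job_element(record, include_status=True):
    """Build a GLUE Job element from a job record.

    `record` is a hypothetical dict; its keys are assumptions for this sketch.
    """
    attrs = {
        q(GLUE_NS, "GlobalID"): record["global_id"],
        q(GLUE_NS, "GlobalOwner"): record["global_owner"],
        q(GLUE_NS, "LocalID"): record["local_id"],
        q(GLUE_NS, "LocalOwner"): record["local_owner"],
    }
    if include_status:
        attrs[q(GLUE_NS, "Status")] = record["status"]
    job = ET.Element(q(GLUE_NS, "Job"), attrs)

    # The Job type allows arbitrary child content, so the EPR can ride along.
    epr = ET.SubElement(job, q(WSA_NS, "EndpointReferenceType"))
    ET.SubElement(epr, q(WSA_NS, "Address")).text = record["address"]
    props = ET.SubElement(epr, q(WSA_NS, "ReferenceProperties"))
    ET.SubElement(props, q(GRAM_JOB_NS, "ResourceID")).text = record["global_id"]
    return job


# Sample record using the values from the first Job in the output above.
SAMPLE = {
    "global_id": "b2cd7359-aa17-49e6-4667-510675dce61b",
    "global_owner": "/DC=org/DC=doegrids/OU=People/CN=Peter G Lane 291467",
    "local_id": "14122",
    "local_owner": "lane",
    "status": "CleanUp",
    "address": "https://192.168.0.101:8443/wsrf/services/ManagedExecutableJobService",
}

if __name__ == "__main__":
    print(ET.tostring(job_element(SAMPLE), encoding="unicode"))
```

Calling job_element(SAMPLE, include_status=False) produces the same element without the Status attribute, which would be one way to sidestep the state-value mismatch.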
PBS is done. Condor has been converted to Perl and is done. Need to do LSF.
Peter, although I am a new voice to this post, I'd like to discuss an alternative design if I may. I am very interested in what you are doing since we too are doing similar things in an internal prototype we are developing. Currently your code uses the ./globus/persisted information and briefly mentions using a database later on. To me this is too low level for accessing this job information, because it uses internal implementation details. If a database were used instead, your code would have to change to access it. This means that your code is dependent on the persistence implementation, whereas I think it should be independent of it. So instead I would like to suggest that you access these job resources through the externally published GRAM interface (as opposed to going under the covers and using the internal workings of the GRAM implementation). Then, as the internal workings of the GRAM implementation change, your code would be unaffected. What do you think? Would this be a better alternative?
Doing it via the external GRAM interface (via the Resource Properties of the MEJRs) makes sense, but it is probably not the most efficient method. The difference is between 1000s of jobs each updating their job information individually and a single information provider that can gather this information via a single DB query or, in the current example, by reading many files. We are strongly considering going with a DB for storing the job persistence information in the 4.2 version of WS GRAM. I think the efficiency is worth the cost of multiple information provider implementations.
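To make the single-query point concrete, here is a small sketch in Python using an in-memory SQLite database. Nothing about the schema has been decided, so the jobs table, its columns, and the function names are all hypothetical and exist only for this illustration.

```python
import sqlite3

COLUMNS = ("global_id", "local_id", "local_owner", "status")


def open_demo_store():
    """Create an in-memory stand-in for a job persistence DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "global_id TEXT PRIMARY KEY, local_id TEXT, local_owner TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?, ?)",
        [
            ("b2cd7359-aa17-49e6-4667-510675dce61b", "14122", "lane", "CleanUp"),
            ("c4cb6c49-a244-478d-44c9-1c25555f286c", "13012", "lane", "CleanUp"),
        ],
    )
    return conn


def collect_jobs(conn):
    """Fetch every job in one query instead of asking each job resource in turn."""
    cur = conn.execute(
        "SELECT global_id, local_id, local_owner, status FROM jobs ORDER BY global_id"
    )
    return [dict(zip(COLUMNS, row)) for row in cur]


if __name__ == "__main__":
    for job in collect_jobs(open_demo_store()):
        print(job["global_id"], job["local_owner"], job["status"])
```

The provider makes one round trip no matter how many jobs exist, whereas going through per-job Resource Properties would cost at least one request per job.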
Stu, I agree that directly using the internal file system or database is somewhat more efficient than using the external GRAM interface, due to the overhead of going through the interface to reach the internal file system/database information. However, I am not suggesting anything significantly new or grossly inefficient; that is, I too would suggest a single information provider to gather the job information (IOW, I am not suggesting that each job does its own thing). Yet I would suggest that the information provider use an external GRAM interface to get the current list of job resources being managed by GRAM, which in turn are mapped to the aforementioned persistent file system/database entries. Then in 4.2 you can easily move to a DB, and any existing information providers (ours and yours included) need not be recoded/updated/modified.
Sidebar: By including access to the internal persistent store in your latest information providers, GT4 is broadcasting to all information provider developers that this is how they should do business (that is, write code), and I think this is sending the wrong message. GT4 should instead be using "fixed/standardized" external interfaces and promoting them to all information provider developers, ourselves and yourselves included. I can understand some of the reticence, since accessing the external GRAM interface is more difficult from today's shell script information providers using a command-line interface. However, if both cluster providers and scheduler providers could be written in Java (a separate enhancement I will request shortly), then using an external interface might seem more appropriate than accessing the internally formatted and organized file system and/or database directly. Locking the internal design/implementation conventions for accessing this persistent information into compiled code is something I personally would not like to see carried into future releases.
Finally, IMHO, if you wish to have grid scheduler vendors develop for the GT4 environment, you should plan for a clear separation of ownership. Although GT4 currently ships with scheduler providers for fork, lsf, pbs and condor, and cluster providers for hawkeye and ganglia, I would consider the scheduler/cluster provider capability to be the responsibility of these vendors (much like operating system device drivers). Therefore I assume that you should define APIs and that the information provider developers should use them (naturally, one of the most obvious APIs is the current external GRAM interface). If APIs are appropriate, then exposing the internal design and implementation of the persistent store directly to the vendors is something I'd try not to do, so that GT4 is free to change later without impacting your vendors' efforts. Please excuse me if I seem too assertive; I'm just interested in your point of view on this topic, and I apologize if I'm coming on too strong.
I'm a bit confused here. There is no public interface to the job data that we can use. That's exactly what we are trying to provide in the first place through this provider. If we had a public interface to the job data, we wouldn't need to add the capability to the provider to generate the job data. So while I understand the desire to have the provider using an open interface, I don't see how it makes sense in practice. We could generate the data internally, but then this would be overriding what a provider might be generating or be redundant information. Can you be more specific about how you propose we go about this? Thanks.
Peter, yes I agree, there is no public external interface. What I was suggesting is to add one to the ManagedJobFactoryService, for example queryJobs, which gets the list of jobs from a new method in the ManagedExecutableJobHome that returns the resource map entries. The provider can then invoke this new interface and is isolated from the implementation details (the file system or database persistence that we talked about earlier). I hope that doing something like this doesn't go against the overall architecture and design? I think the big question is whether it exposes too much information through a non-MDS interface.
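To illustrate the isolation I have in mind, here is a minimal sketch in Python. It is not GRAM code: JobSource, query_jobs, and the two stand-in backends are hypothetical names made up for this example, and the real interface would live on the ManagedJobFactoryService. The point is only that the provider depends on one call and never on how the jobs are stored.

```python
from abc import ABC, abstractmethod


class JobSource(ABC):
    """Hypothetical stand-in for a published queryJobs-style interface."""

    @abstractmethod
    def query_jobs(self):
        """Return a sorted list of (job_id, local_owner) tuples."""


class FileStyleSource(JobSource):
    """Stand-in for file-based persistence: one entry per job, keyed by job ID."""

    def __init__(self, entries):
        self._entries = entries  # dict: job_id -> local_owner

    def query_jobs(self):
        return sorted(self._entries.items())


class TableStyleSource(JobSource):
    """Stand-in for database persistence: a list of row dicts."""

    def __init__(self, rows):
        self._rows = rows

    def query_jobs(self):
        return sorted((row["id"], row["owner"]) for row in self._rows)


def provider_lines(source):
    """The provider sees only the interface, never the storage behind it."""
    return ["%s %s" % (job_id, owner) for job_id, owner in source.query_jobs()]


if __name__ == "__main__":
    files = FileStyleSource({"job-b": "lane", "job-a": "lane"})
    table = TableStyleSource([
        {"id": "job-a", "owner": "lane"},
        {"id": "job-b", "owner": "lane"},
    ])
    print(provider_lines(files) == provider_lines(table))
```

Swapping the backend changes nothing in provider_lines, which is the property I would like existing providers to have when the persistence layer moves to a DB.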
Right, so you're advocating either exposing redundant information or requiring that the provider query the jobs to fill in the information, using a job ID list to generate EPRs. The former is impractical and the latter is impossible (from a security point of view). I think if we are going to avoid this problem of openly accessing non-public sources of job data in our default provider, we either need to override any job data coming from the provider and insert our own within the Java code, or throw out the idea of using the GLUECE RP and publish the job data in a completely different RP. I dislike just ignoring an RP that already exists for the purpose of exposing this information, so this brings up questions of who owns the provider invocation code and whether it can be dependent on the GRAM service code. Somebody from the MDS team needs to chime in about this.
Peter, I've stopped contributing, and I appreciate your feedback and comments. We seem to need the ability to have lower-level service interfaces as part of the GT4 definition that can be invoked only by the WSRF container itself and not by outside parties, that is, clients of the container. This would be similar to Java's package-level access, where members are accessible to all classes in the package but not to classes outside it. Thanks for listening, and we'll leave it at that.
Reassigning to current GRAM developer to close/fix as appropriate.
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in JIRA. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363