Bug 931

Summary: improved jobmanagers
Product: GRAM
Reporter: Jens-S. Vöckler <voeckler@cs.uchicago.edu>
Component: gt2 Gatekeeper/Jobmanager
Assignee: Joe Bester <bester@mcs.anl.gov>
Status: RESOLVED FIXED
Severity: minor
CC: adesmet@cs.wisc.edu, bester@mcs.anl.gov, gmehta@isi.edu, jfrey@cs.wisc.edu
Priority: P2    
Version: 1.6   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Attachments: new methods in base class jobmanager
minor improvements to stdiomerger.
Major condor-jm improvements
updates to PBS jobmanager
LSF proposed diff (untested)

Description From 2003-05-06 15:07:48
For the distributed GT 2.4.0, I modified the JobManager.pm script to expose the
formerly module-local functions fork_and_exec_cmd and pipe_out_cmd as class
methods to the child jobmanager scripts. I also added a method nfssync, which
tries to force an NFS update by touching a file (by calling utime() rather than
invoking a separate program). While I have not yet found all the places where
NFS sync calls need to be added, the GRAM scratch directory definitely needs an
NFS update after it has been created.

I will append the files in the next step, as the default bug submission
interface does not allow me to attach files...
------- Comment #1 From 2003-05-06 15:09:59 -------
Created an attachment (id=116) [details]
new methods in base class jobmanager

This adds the new methods nfssync( filename, docreate ) and converts
the fork_and_exec_cmd(..) and pipe_out_cmd(..) functions into methods.
Documentation for these methods is also being added. 
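
As a rough illustration of the nfssync( filename, docreate ) idea described
above, here is a minimal sketch in Python rather than the Perl of the actual
attachment. The function body, the return convention, and the reading of
docreate as "create the file if it is missing" are assumptions made for
illustration; the attached patch may behave differently.

```python
import os
import tempfile


def nfssync(filename, docreate=False):
    """Touch `filename` in-process to nudge an NFS client into refreshing.

    Hypothetical sketch: returns True if the timestamps were updated.
    """
    if docreate and not os.path.exists(filename):
        # Assumed meaning of docreate: create an empty file when missing.
        with open(filename, "a"):
            pass
    try:
        # times=None sets atime and mtime to the current time, like touch(1),
        # but without forking a separate program.
        os.utime(filename, None)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # stand-in for the GRAM scratch directory
    print(nfssync(scratch))       # touch the directory itself
    print(nfssync(os.path.join(scratch, "marker"), docreate=True))
```

The point of calling utime() directly is that no child process is needed just
to update a timestamp.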
------- Comment #2 From 2003-05-06 15:11:56 -------
Created an attachment (id=117) [details]
minor improvements to stdiomerger.

Minor improvements in the stdiomerger, mostly cosmetic. 
------- Comment #3 From 2003-05-06 15:15:09 -------
Created an attachment (id=118) [details]
Major condor-jm improvements

This patch removes the evil IO::* modules and replaces them with native Perl
methods. Furthermore, the newly available fork_and_exec_cmd and pipe_out_cmd
avoid unnecessary /bin/sh invocations, and are used for job submission (instead
of system and backquotes). Also some minor cosmetic improvements. Tried to add
the NFS sync as appropriate.
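
To make the /bin/sh point concrete, here is a hedged Python sketch (the real
methods are Perl and live in JobManager.pm). The signatures and return values
below are assumptions chosen for illustration, not the attachment's API. The
key idea is that an argument list is handed straight to exec, so no shell ever
parses the command line.

```python
import subprocess
import sys


def fork_and_exec_cmd(*argv):
    """Run argv directly (no shell) and return its exit status."""
    # A list with shell=False (the default) is exec'ed directly,
    # so /bin/sh is never started and no quoting is needed.
    return subprocess.run(list(argv)).returncode


def pipe_out_cmd(*argv):
    """Run argv directly (no shell) and return its stdout as a list of lines."""
    result = subprocess.run(list(argv), stdout=subprocess.PIPE, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # sys.executable stands in for a scheduler command such as a submit tool.
    status = fork_and_exec_cmd(sys.executable, "-c", "raise SystemExit(3)")
    lines = pipe_out_cmd(sys.executable, "-c", "print('job 42'); print('queued')")
    print(status, lines)
```

Compared with system() on a single string or backquotes, this saves one process
per invocation and sidesteps shell quoting problems in job arguments.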
------- Comment #4 From 2003-05-06 15:17:42 -------
Created an attachment (id=119) [details]
updates to PBS jobmanager

Removed unnecessary module loads from the pbs module: the POSIX module was only
being used for the ceil() function, which I replaced with my own (still to be
heavily tested). Removed all evil IO::* modules. Added nfssync wherever I could
see a need for it. Replaced subprocess invocation with the more efficient
fork_and_exec_cmd and pipe_out_cmd as applicable.

The code does run on dg0n13:2120/jobmanager-pbs
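
For readers curious what a POSIX-free ceil() replacement might look like, here
is a small Python sketch of one common approach. It is not the code from the
attachment, and the name my_ceil is hypothetical.

```python
def my_ceil(x):
    """Return the smallest integer >= x without importing a math library."""
    truncated = int(x)  # int() truncates toward zero
    # Truncation already rounds negative values up; only positive values
    # with a fractional part need to be bumped to the next integer.
    return truncated + 1 if x > truncated else truncated


if __name__ == "__main__":
    for value in (2.0, 2.1, -2.1, 0.0):
        print(value, my_ceil(value))
```

Negative inputs are the usual trap for a hand-rolled version, which is one
reason such a replacement deserves the heavy testing mentioned above.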
------- Comment #5 From 2003-05-08 11:21:41 -------
Jens,

Thanks for the patch submissions.  We are still focused on gt3, so it might be 
a few weeks until we can review these proposed patches.

-Stu
------- Comment #6 From 2003-05-08 11:37:31 -------
I have an LSF patch on my home system, which I will submit once I have booted
it. The LSF patch is based on the experience from Condor/PBS, but since I don't
have an LSF system available to me, I can only extrapolate. I will publish the
LSF patch here soon.
------- Comment #7 From 2003-05-11 16:33:16 -------
Created an attachment (id=122) [details]
LSF proposed diff (untested)

This is an update to the LSF jobmanager to use the new functionality of
NFS sync'ing and avoiding /bin/sh invocations. Also, IO::File was
removed for improved resource efficiency. Note that I don't have access
to an LSF system, so this is untested.
------- Comment #8 From 2003-07-10 11:59:35 -------
These patches have been applied to the CVS trunk and tested. 
 
joe 
------- Comment #9 From 2003-10-13 16:39:39 -------
It appears as if these patches are still not in the 2.4.3 release, and people
are still stumbling over this problem. Or is my perception somehow skewed?
------- Comment #10 From 2003-10-13 16:58:33 -------
Should these go in the globus_2_4_branch as well as the trunk?