Bugzilla – Full Text Bug Listing
|Product:|GRAM|Reporter:|Jens-S. Vöckler <firstname.lastname@example.org>|
|Component:|gt2 Gatekeeper/Jobmanager|Assignee:|Joe Bester <email@example.com>|
|Severity:|minor|CC:|firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com|
new methods in base class jobmanager
minor improvements to stdiomerger.
Major condor-jm improvements
updates to PBS jobmanager
LSF proposed diff (untested)
For the distributed GT 2.4.0, I modified the JobManager.pm script to expose the formerly module-local functions fork_and_exec_cmd and pipe_out_cmd as class methods, so that the child jobmanager scripts can use them. I also added a method, nfssync, which tries to force an NFS update by touching a file (by calling utime() rather than invoking a separate program). I have not yet found all the places where NFS sync calls need to be added, but the gram scratch directory definitely needs an NFS update after it has been created. I will attach the files in the next step, as the default bug submission interface does not allow me to attach files...
Created an attachment (id=116) [details] new methods in base class jobmanager This adds the new methods nfssync( filename, docreate ) and converts the fork_and_exec_cmd(..) and pipe_out_cmd(..) functions into methods. Documentation for these methods is also being added.
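The idea behind nfssync( filename, docreate ) can be sketched roughly as follows. This is an illustration in Python, not the attached Perl code: the return values and the exact handling of docreate are assumptions, and only the "touch via utime() instead of spawning a program" behaviour comes from the description above.

```python
import os


def nfssync(filename, docreate=False):
    """Touch `filename` in-process to nudge NFS into refreshing its metadata.

    Hypothetical sketch: returns True on success and False on failure.
    """
    try:
        if docreate and not os.path.exists(filename):
            # Create an empty file; append mode never truncates existing data.
            with open(filename, "a"):
                pass
        # Passing None sets both access and modification time to "now",
        # which is what an external `touch` would do.
        os.utime(filename, None)
    except OSError:
        return False
    return True
```

A caller would invoke it right after creating a directory or file that another NFS client is expected to see, for example on a marker file inside the freshly created scratch directory.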
Created an attachment (id=117) [details] minor improvements to stdiomerger. Minor improvements in the stdiomerger, mostly cosmetic.
Created an attachment (id=118) [details] Major condor-jm improvements This patch removes the evil IO::* modules and replaces them with native Perl methods. Furthermore, the newly available fork_and_exec_cmd and pipe_out_cmd avoid unnecessary /bin/sh invocations, and are now used for job submission (instead of system and backquotes). Also some minor cosmetic improvements. Tried to add the NFS sync where appropriate.
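To illustrate what "avoiding /bin/sh invocations" means, here is a minimal sketch in Python rather than the jobmanager's Perl. The function names mirror the ones mentioned above, but the signatures and return values are assumptions made for this example; the point is only that the argument vector is executed directly, so no shell is started and no shell metacharacters are interpreted.

```python
import subprocess
import sys


def fork_and_exec_cmd(*argv):
    """Run argv directly (no shell) and return the child's exit status."""
    return subprocess.run(list(argv)).returncode


def pipe_out_cmd(*argv):
    """Run argv directly (no shell) and return its stdout as a list of lines."""
    result = subprocess.run(list(argv), stdout=subprocess.PIPE, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # "$HOME" and ";" reach the child untouched because no shell parses them.
    print(pipe_out_cmd(sys.executable, "-c", "print('$HOME; echo hi')"))
```

Besides saving one process per invocation, passing the arguments as a list removes the need to quote job parameters for the shell.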
Created an attachment (id=119) [details] updates to PBS jobmanager Removed unnecessary module loads from the pbs module: the POSIX module was only being used for ceil(), which I replaced with my own version (still to be heavily tested). Removed all evil IO::* modules. Added nfssync where I could see a need for it. Replaced subprocess invocations with the more efficient fork_and_exec_cmd and pipe_out_cmd where applicable. The code does run on dg0n13:2120/jobmanager-pbs
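A hand-written ceil() of the kind mentioned above can be very small. The following is a sketch in Python of one common way to do it with integer truncation; it is not the code from the attachment, and the name my_ceil is made up for this example.

```python
def my_ceil(x):
    """Round x up to the nearest integer without a math/POSIX library."""
    i = int(x)  # int() truncates toward zero
    # Truncation rounds positive non-integers down, so add one in that case.
    # For negative values, truncation already rounds up.
    return i + 1 if i < x else i
```

Negative inputs and exact integers are the cases worth testing first, since a naive int(x) + 1 gets both of them wrong.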
Jens, thanks for the patch submissions. We are still focused on gt3, so it might be a few weeks until we can review these proposed patches. -Stu
I have an LSF patch on my home system, which I will submit once I have booted it. The LSF patch is based on the experience from Condor/PBS, but since I don't have an LSF system available to me, I can only extrapolate. I will publish the LSF patch here soon.
Created an attachment (id=122) [details] LSF proposed diff (untested) This is an update to the LSF jobmanager to use the new NFS sync functionality and to avoid /bin/sh invocations. IO::File was also removed for improved resource efficiency. Note that I don't have access to an LSF system, so this is untested.
These patches have been applied to the CVS trunk and tested. joe
It appears as if these patches are still not in the 2.4.3 release, and people are still stumbling over this problem. Or is my perception somehow skewed?