Bugzilla – Bug 3876
Automatic transfer of all files modified by a job in GRAM
Last modified: 2012-09-05 11:42:57
A feature I'd like to see is an option to automatically stage out all files written by the job. There are a number of applications where which files are written is highly dependent on some input files. The submitter may not know the exact list of files that will be written but wants all of them. To work properly, this would require that each job have its own execute directory, which conflicts somewhat with the current ability for the submitter to specify an arbitrary execute directory for the job. One way you could implement this in the current framework would be to introduce a staging/execute directory that's always created and unique per job. If the submitter uses that directory as the job's execute directory, they can use the automatically-transfer-all-output option. If they specify a different directory, either they can't use the option or they're told they risk transferring files unrelated to the job.
What's the problem with telling the application to spit out its data into a specific directory and then simply transferring that whole directory back?
The main problem I see with that idea is that most of the time, input and output files live in the same directory. You only want to transfer back the output files. Transferring the input files as well would be especially bad if they're very large.
Reassigning to current GRAM developer to close/fix as appropriate.
This functionality already exists: specify a directory to stage out. Condor-G manages this for the user.
See comment #2. This doesn't provide a way to transfer the output files but not the input files for the job.
I'm reviving this after a discussion with Jaime. One possibility would be to check the job's working directory for any files that have been modified after the LRM job submission took place. Any such files would then be included in the list of files to be staged. In the GRAM job description, a new variable could be added that expands to the set of changed files. Something like GLOBUS_MODIFIED_JOB_FILES could be specified in the source URL, along with a destination URL giving the remote GridFTP directory to which the files will be transferred. This feature would have been (and would still be) useful for processing NanoHub jobs on OSG. Marking as 4.4 for now.
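To make the proposal above concrete, here is a minimal sketch of the "modified since submission" check, written in Python purely for illustration. It is not GRAM code: the function names `modified_job_files` and `demo` are hypothetical, and it assumes the submission time is recorded as a POSIX timestamp on the same host as the working directory, so that file modification times are directly comparable.

```python
import os
import tempfile
import time


def modified_job_files(work_dir, submit_time):
    """Return sorted paths (relative to work_dir) of regular files whose
    modification time is later than submit_time (a POSIX timestamp).

    Hypothetical helper: this is the kind of list that a variable such as
    GLOBUS_MODIFIED_JOB_FILES could expand to.
    """
    changed = []
    for root, _dirs, files in os.walk(work_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > submit_time:
                changed.append(os.path.relpath(path, work_dir))
    return sorted(changed)


def demo():
    """Simulate a job directory with one input file and one output file."""
    with tempfile.TemporaryDirectory() as work_dir:
        submit_time = time.time()

        # Input file staged in before submission (mtime set explicitly).
        input_path = os.path.join(work_dir, "input.dat")
        with open(input_path, "w") as f:
            f.write("input")
        os.utime(input_path, (submit_time - 60, submit_time - 60))

        # Output file written by the job after submission.
        output_path = os.path.join(work_dir, "output.dat")
        with open(output_path, "w") as f:
            f.write("output")
        os.utime(output_path, (submit_time + 60, submit_time + 60))

        return modified_job_files(work_dir, submit_time)


if __name__ == "__main__":
    print(demo())
```

In this sketch only the output file is selected, which addresses the concern in comment #2 about transferring large input files back. Two caveats follow from relying on modification times: an input file that the job rewrites would be staged out as well, and coarse timestamp granularity or clock differences between the submission record and the filesystem could cause files to be missed or included by mistake.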
*** Bug 4397 has been marked as a duplicate of this bug. ***
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in JIRA. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363