Bug 3384 - Inconsistent jobType/count parameter semantics

Status: RESOLVED WONTFIX
Product: GRAM
Component: wsrf managed execution job service
Version: 4.0.0
Platform: All
Importance: P3 normal
Target Milestone: 4.2.1
Depends on: 3569

Reported: 2005-05-18 16:38
Modified: 2012-09-05 11:42



Description From 2005-05-18 16:38:18
Here is a paste of an email from the discuss list. I'm not sure I agree with
or understand Emir's last opinion bullet, but I certainly agree that there
should be consistency in how the adapters interpret these values. I would
also argue that if we can't currently accommodate all possibilities, we
should rethink our job description elements to be more specific. If an
adapter doesn't support a certain configuration, then it should return an
error.

----------------------------------

Hello,

I tried to find a document that defines what the multiple/single jobTypes
mean when count > 1, but was unsuccessful.

A particular problem is that different schedulers interpret this differently:

1. PBS (sketched below)
    - for the multiple jobType, starts the job <count> times
    - for the single jobType, reserves multiple processors and starts the
      job only once.

2. SGE
    - for the multiple jobType, reserves one node and starts the job
      <count> times
    - for the single jobType, starts a job array with <count> tasks

3. Condor
    - starts the job <count> times in both cases.
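
For concreteness, here is roughly what the two PBS interpretations amount to
at the job-script level (a sketch; the directives and the count of 4 are
illustrative assumptions, not taken from the adapter code):

# jobType=single, count=4: reserve 4 nodes, start the executable once
#PBS -l nodes=4
./a.out

# jobType=multiple, count=4: the job script itself starts 4 copies
./a.out & ./a.out & ./a.out & ./a.out &
wait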

My opinion is that:
- one of these cases should start the same job <count> times (a job array),
  and
- the other should enable the user to request <count> processors and leave
  the user the possibility of starting the tasks themselves. This option can
  be used for some alternative parallel jobs, like Linda, Gaussian, etc.

In any case it would be nice if all the adapters then behaved according to
the defined semantics.

Kind regards,
Emir Imamagic
------- Comment #1 From 2005-05-18 16:50:38 -------
If/when we end up switching to JSDL, this should be remedied since it has
more specific elements like CPUCount and ProcessCount.

I'm not sure what would be used for the SGE job array concept, though. I'm
not even sure how that differs from the multiple-count method.
------- Comment #2 From 2005-05-18 16:54:28 -------
I think the PBS implementation is correct.

SGE seems broken and should follow the PBS case.

Condor seems a bit different. Jaime, can you comment here? What are the
options for how count is handled in Condor?
------- Comment #3 From 2005-05-18 16:57:49 -------
So this may be mostly a documentation issue. Still, if Condor doesn't
support the single-jobType-with-multiple-count combination, then there
should be an error code returned from the perl module indicating that it
isn't implemented.
------- Comment #4 From 2005-05-19 04:39:36 -------
Currently, Condor doesn't support parallel jobs that aren't mpi or pvm
(which is what job_type multiple means, correct?). We are actively working
on such support and it should be ready "soon". So I'd say that job_type
multiple should fail for Condor at present. Once our parallel support is
ready, we can work with you to enable job_type multiple. The changes should
be simple (a couple of attributes added to the submit file).
------- Comment #5 From 2005-05-19 05:44:21 -------
I agree that JobManagers should act in a consistent manner, and that a
JobManager should return an error if a requested configuration is not supported.

The RSLv1 specification is documented at
http://www-fp.globus.org/gram/gram_rsl_parameters.html;  I am assuming that the
semantics of the fields as described haven't changed in later RSL
reincarnations.  (This really must be documented!) 

Here are what I think are the relevant excerpts:

(count=value)
    The number of executions of the executable.

    Default: 1

(jobType=single|multiple|mpi|condor)
    This specifies how the jobmanager should start the job.
      single -  Even if the count > 1, only start 1 process or thread
      multiple -  start count processes or threads
      mpi -  use the appropriate method (e.g. mpirun on SGI Origin or POE on IBM
SP) to start a program compiled with a vendor-provided MPI library.   Program is
started with count nodes.
      condor -  starts condor jobs in the "condor" universe.  (default is vanilla)

    Default: multiple
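
For reference, a GT2-style RSL request combining these fields would look
roughly like this (an illustrative sketch; the executable path is an
assumption):

&(executable=/bin/hostname)
 (count=4)
 (jobType=multiple)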

Based on this, I would assert:

 * The current SGE JobManager appears to be treating jobType differently
from the spec; it's treating 'single' as "each job should be run
individually" and 'multiple' as "each job should be executed in a batched
mode." Unless the semantics of these terms are to be rethought, this should
be changed, as it is non-conformant.
  
 * The 'multiple' job type simply says "Run this process (count) times."
Strictly speaking, the SGE "multiple" semantics are valid but can be
improved; I think that I should update the SGE JobManager to mimic the PBS
implementation's 'multiple' behaviour and submit an SGE job array with the
appropriate number of tasks.

(An SGE job array is simply an efficient means of submitting multiple
instances of a job to a cluster. There are no special co-allocation
semantics associated with this behaviour.)
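
For illustration, a plain SGE job array submission looks roughly like this
(a sketch; the script name and task range are assumptions):

# one qsub, ten tasks; each task runs job.sh with its own $SGE_TASK_ID
qsub -t 1-10 job.sh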

 * The 'single' jobType semantics are also clear -- "Run exactly one
instance of the executable" -- but this seems to conflict with established
behaviour.

 * By this definition, each of the SGE, PBS and Condor JobManagers is broken
-- the (count) value should be ignored for 'single' jobs and only one
instance of the executable started.

 * Either the JobManagers should be corrected to match the spec, or the spec
should be updated to describe the desired behaviour.  (If you're planning to
switch to using the JSDL specification soon _anyway_, updating the RSL spec
probably isn't worth the effort.)

Thoughts?

David
-- 
David McBride <dwm@doc.ic.ac.uk>
Department of Computing, Imperial College, London
------- Comment #6 From 2005-05-19 13:00:00 -------
The way I interpret the GT2 RSL descriptions, multiple does not mean "submit
N copies of the same job" (i.e. Condor's "queue" directive). It means
"submit one job with processCount=N" (or equivalent). The common practice,
though, seems to be the former. I agree with David that the semantics for
single seem clear in the description, but are different in practice (if
taking PBS as the standard).

In the case of PBS, the extra single semantics of "reserve <count> CPUs"
seems utterly misplaced and belongs in a separate element like "cpuCount" or
something.

I think that Condor's multiple implementation agrees with the accepted
semantics of multiple, but agree that it seems not to agree with the GT2 RSL
description. The single semantics seem clearly broken according to the
description.

I still don't see the difference between SGE's "submit multiple copies of
the job" versus "submit a homogeneous job array". They seem like they would
achieve the same result. So in my mind either one would work for the
multiple case. It's the single case that seems clearly broken if going by
the RSL description.

So an important question is, do we need to support multiple processes per
job? Is multiple copies of the job sufficient for now? If so, we should
simply clarify the meaning of "count" and be done with that issue (all RMs
seem to be consistent here, AFAICT).

The real problem is what to do with jobType==single. If we want support
immediately for multiple CPU reservations in PBS, then I think we need an
additional element to indicate this (say "cpuCount"). In general I think
everybody needs to conform to the current description and ignore "count"
altogether. If we really do need multiple processes per job, then I'd
suggest altering the semantics of both single and multiple and adding an
additional element to allow for this, say "processCount". In other words,
processCount would indicate the number of copies of the executable to start
per job submitted.

Thoughts?
------- Comment #7 From 2005-05-20 05:05:03 -------
Subject: Re:  Inconsistent jobType/count parameter semantics

> The way I interpret the GT2 RSL descriptions, multiple does not mean "submit N copies of the same job" 
> (i.e. Condor's "queue" directive).  It means "submit one job with processCount=N" (or equivalent).  

So, in the latter case you would have a single job which would execute N
instances of a process on a single machine? i.e. effectively running:

# launch N background copies of the executable
for i in $(seq 1 "$N"); do
	executable &
done

> I still don't see the difference between SGE's "submit multiple copies of the job" versus "submit a
> homogeneous job array".  They seem like they would achieve the same result.

That's correct, except SGE can handle the job array more efficiently than N
separate identical job definitions. Think of Condor's "queue N" directive.
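
For comparison, the Condor equivalent is one submit description whose queue
statement requests N instances (a minimal sketch; file names are
assumptions):

# condor_submit description: queue 10 instances of the same job
universe   = vanilla
executable = a.out
output     = out.$(Process)
error      = err.$(Process)
queue 10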

> So an important question is, do we need to support multiple processes per job?  Is multiple copies of 
> the job sufficient for now?  If so we should simply clarify the meaning of "count" and be done with that 
> issue (all RMs seem to be consistent here, AFAICT).

The jobs that run on our clusters generally only start one process per 
job (except for MPI jobs, naturally.)

> The real problem is what to do with jobType==single. 

Given the above definitions of 'single' and 'multiple', the (jobType) field
is effectively redundant; you may as well simply use (count) to specify
exactly how many jobs you wish to be submitted and use some other common
identifier for (jobType), e.g. 'standard'.

[ Even with that change, (count) is still overloaded -- in the 'single' and
'multiple' jobTypes it indicates the number of jobs to submit (e.g. with
Condor's "queue N") whereas for 'mpi' jobs it indicates the number of nodes
that the single job should use. But we can probably live with that in the
short-to-medium term. ]

> In general I think everybody needs to conform to the current description and ignore "count" altogether. If 
> we really do need multiple processes per job, then I'd suggest altering the the semantics of both single 
> and multiple and adding an additonal element to allow for this, say "processCount".  In other words, 
> processCount would indicate the number of copies of the executable to start per job submitted.

That certainly seems feasible.

Cheers,
David
------- Comment #8 From 2005-05-30 03:50:06 -------
Dear All,

As the author of the GT4-SGE interface created by Gridwise Technologies (a
bit different from the LeSC version), I am also entangled in this
discussion. Let me add some comments to this thread.

In my opinion the RSL 2 description would be more convenient if the tags
<executable> and <arguments> could be extended to an Array type
(<executable> is now pathType; the Array type for <arguments> follows from
the Array type of <executable>: for each executable we can define another
set of parameters). Assuming such an expansion, jobType=single can be
limited by definition to count=1 (count > 1 should not be considered), and a
single job script contains a list of all defined executables:
executable_1 parameter_1 &
executable_2 parameter_2 &
...

The case of jobType=multiple can then be defined simply as an Array job (in
the sense of SGE, or "queue N" in the Condor case), where each subjob
corresponds to the single job described above.
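
A hypothetical sketch of what such an array-typed description might look
like (the repeated elements and paths are illustrations of the proposal,
not actual GT4 schema):

<job>
  <!-- hypothetical: repeated executable/arguments pairs, per the proposal -->
  <executable>/path/to/executable_1</executable>
  <arguments>parameter_1</arguments>
  <executable>/path/to/executable_2</executable>
  <arguments>parameter_2</arguments>
  <jobType>single</jobType>
</job>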

Another point which seems important is the minimal set of RSL elements that
must be defined in a Globus job. In our version of the GT4-SGE adapter we
assume that only <executable> has to be specified by the user (the job is
started as a single job, with stdout/stderr created in the user's $HOME
directory on the machine where the job is executed). I think this subject
also has to be unified, and each GT4-GRAM interface should use the same,
common solution.


Best Regards,

Bogdan Lobodzinski
-----------------------
bogdan@gridwisetech.com
www.gridwisetech.com
------- Comment #9 From 2005-07-15 13:04:25 -------
Stu and I talked this over at length recently with the adapters' code up on
our screens, and I came up with the following notes. The "Notes" section
below has some comments on compliance.

jobType definitions
-----------------------------------------------------------------------

'single'        - submit job with reservation of 'count' nodes and let 
                job be responsible for utilization of the nodes
'multiple'      - Assumes either shared memory (nodes = CPUs) or 
                cluster (nodes = hosts) (or based on setup options).
                If shared memory, reserve nodes and submit multiple 
                copies of the executable to one host.
                If cluster, submit multiple copies of the executable to 
                multiple hosts, originally based off of node reservations.
'mpi'           - 'multiple' but use mpirun or mpiexec to run executable
'condor'        - For Condor jobs use the 'standard' universe, otherwise
                the functionality is undefined.

Actual adapter implementations
-----------------------------------------------------------------------

Condor
        single          - "queue 1" multiple times (job array, INCORRECT)
        multiple        - "queue 1" multiple times (job array)

LSF
        single          - "BSUB -n" with one executable line
        multiple        - "BSUB -n" with multiple executable lines 
                        in job script (shared memory)

PBS
        single          - "-l nodes=" with one executable line
        multiple        - If shared memory, "-l nodes" with multiple
                        executable lines in job script
                        Else (cluster), "-l nodes" based on CPUs per
                        node. Use rsh or ssh in submitted job script to
                        start jobs on hosts from pbs_nodes file in round
                        robin fashion (see the sketch after this list).

SGE (original)
        single          - "#$ -t 1-" with one executable line
                        (job array, INCORRECT)
        multiple        - multiple executable lines in job script (shared
                        memory with no node reservation)

SGE (Ap Grid)
        single          - job array (INCORRECT)
        multiple        - Use rsh in submitted job script to
                        start jobs on hosts from machines file in round
                        robin fashion. (cluster with no node reservation)
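
The round-robin launch used by the PBS (cluster) and SGE (Ap Grid)
'multiple' modes is, in essence, a job-script fragment like the following
(a simplified sketch; the variable names are assumptions):

#!/bin/sh
# Start $COUNT copies of $EXECUTABLE, cycling round-robin through the
# hosts the scheduler allocated (here PBS's $PBS_NODEFILE).
nhosts=$(wc -l < "$PBS_NODEFILE")
i=0
while [ "$i" -lt "$COUNT" ]; do
    host=$(sed -n "$((i % nhosts + 1))p" "$PBS_NODEFILE")
    ssh "$host" "$EXECUTABLE" &
    i=$((i + 1))
done
wait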

Notes
-----------------------------------------------------------------------

- The 'single' mode still seems to be necessary, so we need to preserve 
this.

- Condor's 'single' implementation is incorrect and should instead
throw an error.

- SGE's 'single' implementation (both versions) is incorrect and
should instead throw an error or be fixed if possible.

- The two SGE adapter 'multiple' implementations should probably be 
merged.

- Should we add a third value to the jobType enumeration named 'array' since SGE
seems capable of three multiple submission types?  My concern here, though, is
that Condor uses a job array when count > 1 since it queues multiple copies of
the job rather than queuing one job and executing multiple processes on one
(shm) or more (cluster) nodes.  This would mean that one would have to select
'array' each time count > 1 instead of relying on the default 'multiple' to
avoid getting an error.
------- Comment #10 From 2005-07-15 13:20:36 -------
WRT my last Note bullet in comment #9, I'm thinking we could resolve this by
having a "defaultJobType" factory configuration option.  I've heard of people
always using 'single' or 'mpi', for example, so this would have wider
applicability besides being able to specify custom defaults for custom
adapters/factory types.  I've submitted an enhancement request (bug #3569) for
this feature.
------- Comment #11 From 2005-07-18 05:10:08 -------
Subject: Re:  Inconsistent jobType/count parameter semantics

On Fri, 2005-07-15 at 13:04 -0500, bugzilla-daemon@mcs.anl.gov wrote:

> - Should we add a third value to the jobType enumeration named 'array' since SGE
> seems capable of three multiple submission types?  

I do not think that an array job should be considered as being different
from a 'multiple' job.

An SGE job array is a mechanism for submitting multiple instances of a
particular job specification. It is equivalent to submitting N individual
jobs to a queue.

The only difference is that SGE can process job arrays more efficiently.

Cheers,
David
------- Comment #12 From 2005-07-18 11:05:50 -------
David,

I understand what the job array does, and that's exactly why I suggest
adding an additional job type. Aside from SGE and Condor (and actually, SGE
has been using 'single' for job arrays, so it hasn't ever really been
considered part of 'multiple'), 'multiple' means "one queued job, multiple
processes". SGE and Condor job arrays mean "multiple queued copies of the
job". Unless you want to be able to configure the adapter at configuration
time to use only job arrays, there's currently no way to selectively use
job arrays with 'multiple' other than keeping it, incorrectly, as the
'single' implementation. Of course the inconsistent use of 'single' is
partly what this bug is trying to solve. So since job arrays are
fundamentally different from the perspective of queuing jobs, I think it
would make sense to add the new type and let SGE and Condor use this
without ambiguity or restriction of choice at runtime.

So I guess my question is, what's the problem with adding a new type if
'multiple' is overtaxed? Especially since SGE job arrays have been used with
'single', I'm not really understanding the resistance to the idea of a new
jobType value.

Peter
------- Comment #13 From 2005-11-06 22:33:47 -------
This comment is a little outside the flow of the rest of this bug, but I
wanted to add a note that AIST is interested in having the semantics of the
jobTypes made clear for the purposes of Ninf-G development, which depends on
some kind of consistent implementation of the "multiple" job type among the
different scheduler adapters.

I guess you can just consider this a "vote" for standardization among the
implementations, regardless of how the difference between multiple/array is
eventually handled.
------- Comment #14 From 2006-10-02 12:43:10 -------
Especially considering that we are planning on leaving the 4.0 interface
alone for backward-compatibility reasons, I think we'll add an extension
called "multipleType" that will differentiate between the shared memory,
cluster, and job array versions of multiple. Bug #3569 should incorporate
the specification of defaults per factory resource for multipleType in
addition to jobType.
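
A hypothetical sketch of how such an extension might appear in a 4.0 job
description (the element placement and values shown are assumptions; the
extension was never pinned down in this bug):

<job>
  <executable>/path/to/a.out</executable>
  <count>8</count>
  <jobType>multiple</jobType>
  <extensions>
    <!-- hypothetical values: sharedMemory | cluster | jobArray -->
    <multipleType>jobArray</multipleType>
  </extensions>
</job>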
------- Comment #15 From 2009-03-26 20:23:37 -------
I wish to bump this thread because it seems that three years on there are still
issues with jobmanager semantics and side-effects from the lack of
non-functional specs.

There are at least a couple of obvious points to take away from the previous
discussion on this bug:
- the semantics of jobTypes are not well documented or understood; this must
be resolved
- jobmanagers should behave in a consistent manner, which necessitates
returning failures for unsupported features rather than shoehorning the
requested feature into something that sounds roughly the same if you read
the spec backwards

I also wonder why there has been no discussion about the sanity of the default
jobtype=multiple, count=1. Surely jobtype=single is the least common
denominator between LRMs!?
------- Comment #16 From 2012-09-05 11:42:39 -------
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in JIRA. Any new issues should be added here:
http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363