Bug 2599 - CAMPAIGN: Concurrent Job Submission Capacity
Status: RESOLVED FIXED
Product: GRAM
Component: wsrf managed execution job service
Version: 3.9.5
Platform: PC Linux
Importance: P3 normal
Target Milestone: ---
Assigned To:
: 2677
Reported: 2005-01-21 14:57 by
Modified: 2005-02-24 08:09


Attachments
logfile of /proc/[process id]/fds as the tests ran (52.20 KB, text/plain)
2005-01-27 17:34, Bob Gaffaney

Description From 2005-01-21 14:57:10
Leader:          Bob Gaffaney

People:          Bob Gaffaney

Time Estimate:   5 Days

Description:

An important metric for the GRAM Service is its capacity for parallel
submissions. The purpose of this campaign is to determine this capacity for the
GT4 GRAM architecture.

Job submissions can be either synchronous or batch mode. Both will be tested.
Since the intent of this campaign is to establish the capacity of the Service,
it is necessary to design tests that eliminate Client scalability from the results.

Perl scripts that submit and monitor simple jobs will be written and will be run
in a way that meets the above criteria.

Java tools will be used to attempt to identify specific areas of the Toolkit
that have concurrency problems.

This bug will be used to post a running commentary on the progress and results
of this campaign.

Deliverables:

1) A report on GRAM Concurrent Job Submission Capacity
2) Perl Scripts used in the tests
3) Memory or Profiling reports on specific problem areas
4) Instructions describing how to set up and run the tests

Tasks:

1) Develop Perl Scripts
2) Run Tests
3) Produce Reports and documentation as described above
4) Submit bugs to Bugzilla for specific issues discovered in testing
5) Run tests under tools that can zero in on problem areas of the Toolkit
------- Comment #1 From 2005-01-21 15:11:27 -------
1/20/2005

I am running the container in one process on lucky0. In two other processes on
the same machine I simultaneously run scripts that submit "sleep 100" batch jobs
as fast as the service will accept them. As each client process approached 500
jobs, the service ran very slowly and finally crashed with out-of-memory errors.
This was at 512 jobs on one client and 482 on the other.

*****
1/20/2005

I restarted the container and repeated the above tests and got almost exactly
the same results: the container ran out of memory at approximately 480 plus 520 jobs.

****
1/21/2005

I restarted the container and repeated the above tests, but ran four clients
instead of two (all four still on the same machine). When the container crashed
the clients had submitted 260, 265, 265 and 273 jobs. The magic number appears
to be right around 1000.
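Summing the per-client counts shows why the limit looks like a property of the container rather than of any one client: the totals are 994, 1000 and 1063 jobs whether two or four processes share the load. A trivial Python check of that arithmetic, using only the numbers reported above:

```python
# Per-client job counts at the moment the container ran out of memory,
# transcribed from the three runs above (the second run's counts are approximate).
RUNS = {
    "1/20, run 1, two clients": [512, 482],
    "1/20, run 2, two clients": [480, 520],
    "1/21, four clients": [260, 265, 265, 273],
}


def aggregate(counts):
    """Total jobs the container accepted across all client processes."""
    return sum(counts)


for label, counts in RUNS.items():
    total = aggregate(counts)
    print(f"{label}: {total} jobs ({total / len(counts):.0f} per client)")
```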
------- Comment #2 From 2005-01-21 16:02:52 -------
From Alain:

Do we have numbers somewhere for the performance of GRAM
when being attacked with lots of jobs simultaneously?

Response:

Not yet - but I do want to collect some information on the effects of throttling
job submissions instead of submitting them as fast as the server will accept them.
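One way to run the throttled comparison is to space submissions at a fixed rate instead of issuing the next one as soon as the previous call returns. A minimal Python sketch, assuming a hypothetical `submit(n)` callable that submits job n (for example by running globusrun-ws); the clock and sleep functions are injectable so the pacing can be tested without real delays:

```python
import time
from typing import Callable


def throttled_submit(
    submit: Callable[[int], None],
    job_count: int,
    rate_per_sec: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call submit(0) .. submit(job_count - 1), starting at most rate_per_sec per second.

    Start times are scheduled on a fixed grid (t0, t0 + 1/rate, ...). If one
    submission runs longer than the interval, the following jobs are started
    back to back until the schedule catches up.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    interval = 1.0 / rate_per_sec
    next_start = clock()
    for n in range(job_count):
        now = clock()
        if now < next_start:
            sleep(next_start - now)  # wait for this job's slot
        submit(n)
        next_start += interval
```

Running the same job count at a few fixed rates and comparing the point of failure with the unthrottled runs might help separate raw capacity from submission-rate effects.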
 
------- Comment #3 From 2005-01-21 16:58:43 -------
Will leave open until the campaign is complete.
------- Comment #4 From 2005-01-27 15:51:05 -------
I set up concurrency test scripts in two processes to blast PBS jobs at a 
service. I got to 521 jobs for one process and 524 for the other before I hit 
OOM problems; the SOAP Axis server shut down.
 
The service container is on the local machine, there is no staging and the JVM 
is running with the memory settings.
 
 
Bob
 
 
Here is my run string
************************
globusrun-ws -submit -batch -F \
https://lucky0:8444/wsrf/services/ManagedJobFactoryService \
-o epr_522 -c /bin/sleep 2000 -Ft PBS

Here is the Service Output
******************************
java.lang.OutOfMemoryError
2005-01-27 14:13:59,059 INFO  authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.RuntimeException: Unable to invoke state transition method processCacheCleanUpState
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:269)
        at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:93)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:263)
        ... 1 more
Caused by: java.lang.OutOfMemoryError
2005-01-27 14:14:04,000 INFO  authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 14:14:12,561 INFO  authorization.ServiceAuthorizationChain [Thread-35,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:22,929 INFO  authorization.ServiceAuthorizationChain [Thread-35,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 14:14:22,934 INFO  authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:32,066 INFO  authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
java.lang.OutOfMemoryError
2005-01-27 14:14:45,854 INFO  authorization.ServiceAuthorizationChain [Thread-1122,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:56,672 INFO  authorization.ServiceAuthorizationChain [Thread-1123,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
Stopped SOAP Axis server at: https://140.221.65.193:8444/wsrf/services/
java.lang.OutOfMemoryError
------- Comment #5 From 2005-01-27 15:52:23 -------
It is looking like concurrency limitations are not scheduler dependent. I got 
almost exactly the same results for Condor as for PBS.
 
I repeated the same concurrency tests with the Condor scheduler: Clients on 
two processes submitting batch jobs as quickly as possible. The container ran 
out of memory after 512 jobs for one process and 524 for the other.
 
Bob
 
 
Here is a run string
**********************
 
globusrun-ws -submit -batch -F \
https://lucky0:8444/wsrf/services/ManagedJobFactoryService \
-o epr_525 -c /bin/sleep 2000 -Ft Condor

Here is the output from the service
****************************************
 
2005-01-27 15:07:12,714 INFO  authorization.ServiceAuthorizationChain [Thread-256,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 15:07:22,552 INFO  authorization.ServiceAuthorizationChain [Thread-6,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:07:31,978 INFO  authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:07:45,664 INFO  authorization.ServiceAuthorizationChain [Thread-34,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.RuntimeException: Unable to invoke state transition method processStartState
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:269)
        at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:93)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:263)
        ... 1 more
Caused by: java.lang.OutOfMemoryError
2005-01-27 15:07:55,378 INFO  authorization.ServiceAuthorizationChain [Thread-6,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:08:03,475 INFO  authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:08:09,319 INFO  authorization.ServiceAuthorizationChain [Thread-34,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
 
 
------- Comment #6 From 2005-01-27 16:04:09 -------
Per Jarek,

When you are running these tests (with jobs close to 1000) can you also
watch the /proc/<process id of JVM>/fd directory? This is to see how many
fds are opened at the same time. Maybe we are running into the limit of
opened fds per process. Also, increase the JVM heap size just to see if it
is really a memory issue or some other issue.
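A small Python sketch of the fd watch suggested here: it counts the entries in /proc/<pid>/fd (Linux only) at a fixed interval. The function names are illustrative and not part of any toolkit script; reading another process's fd directory requires running as the same user (or root).

```python
import os
import time
from typing import List


def count_open_fds(pid: int) -> int:
    """Entries in /proc/<pid>/fd, i.e. descriptors the process holds open.

    When pid is this process, the count includes the descriptor that
    os.listdir uses to read the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid: int, samples: int, interval: float = 10.0) -> List[int]:
    """Take `samples` readings of count_open_fds(pid), `interval` seconds apart."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:
            time.sleep(interval)  # no sleep after the last reading
    return counts
```

Pass the JVM's pid (from `ps ux`) and compare the peak reading with the per-process limit that `ulimit -n` reports in the shell that launched the container.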

------- Comment #7 From 2005-01-27 17:34:03 -------
Created an attachment (id=495) [details]
logfile of /proc/[process id]/fds as the tests ran
------- Comment #8 From 2005-01-28 15:20:14 -------
Jarek pointed me at a new wsrf_core.jar, which I dropped into $G_L/lib to 
replace the one that was there. Jarek made changes that cause resources to be 
reclaimed sooner; these have now been committed to the trunk.

I repeated the fork parallel job tests and found that the service handled 
about 300 more jobs (674 for each of two processes) before it stopped. Note 
that the service did not actually crash, but Jarek and I confirmed that 
counter-create failed:

Error: WSDLException (at /wsdl:definitions/wsdl:import): faultCode=PARSER_ERROR: Problem parsing '../../../wsrf/notification/WS-BaseN.wsdl'.: java.lang.OutOfMemoryError

Command line is as before. Here are the processes that were running after it 
stopped responding:

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
gaffaney  3061  0.0  0.2 21936  564 ?        Ss   Jan22   0:06 /usr/bin/gnome-session
gaffaney  3089  0.0  0.0  3708    4 ?        Ss   Jan22   0:00 /usr/bin/ssh-agent -s
gaffaney  3116  0.0  0.0  3164    4 ?        S    Jan22   0:00 /usr/bin/dbus-launch --exit-with-session /etc/X11/xinit/Xclients
gaffaney  3117  0.0  0.0  3664   28 ?        Ss   Jan22   0:00 dbus-daemon-1 --fork --print-pid 8 --print-address 6 --session
gaffaney  3122  0.0  0.2 12312  608 ?        S    Jan22   0:34 /usr/libexec/gconfd-2 13
gaffaney  3124  0.0  0.0  2992   24 ?        S    Jan22   0:00 /usr/bin/gnome-keyring-daemon
gaffaney  3126  0.0  0.0  8144   64 ?        Ss   Jan22   0:01 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=18
gaffaney  3128  0.0  0.2 20908  544 ?        S    Jan22   0:06 /usr/libexec/gnome-settings-daemon --oaf-activate-iid=OAFIID:GNOME_SettingsDaemon --oaf-ior-fd=22
gaffaney  3134  0.1  0.1  2768  288 ?        S    Jan22  16:33 /usr/libexec/gam_server
gaffaney  3143  0.0  0.1  5784  320 ?        S    Jan22   2:08 xscreensaver -nosplash
gaffaney  3170  0.0  1.3 13968 3384 ?        Ss   Jan22   3:12 metacity --sm-save-file 1103595795-13321-2880982762.ms
gaffaney  3172  0.0  0.2 19624  540 ?        Ss   Jan22   0:03 gnome-volume-manager --sm-config-prefix /gnome-volume-manager-BX8QFJ/ --sm-client-id 117f000001000110358891100000132210001 --screen 0
gaffaney  3174  0.0  0.7 24608 1876 ?        Ss   Jan22   1:02 gnome-panel --sm-config-prefix /gnome-panel-89cmJV/ --sm-client-id 117f000001000110358891100000132210002 --screen 0 --profile default
gaffaney  3176  0.0  0.8 43640 2192 ?        Ssl  Jan22   0:40 nautilus --sm-config-prefix /nautilus-sMQZZa/ --sm-client-id 117f000001000110358891100000132210003 --screen 0 --no-default-window
gaffaney  3178  0.0  0.2 41236  748 ?        Ss   Jan22   0:31 eggcups --sm-config-prefix /eggcups-jjXq59/ --sm-client-id 117f000001000110358891200000132210004 --screen 0
gaffaney  3185  0.0  0.0 21588   68 ?        Sl   Jan22   0:00 /usr/libexec/gnome-vfs-daemon --oaf-activate-iid=OAFIID:GNOME_VFS_Daemon_Factory --oaf-ior-fd=28
gaffaney  3187  0.0  0.2 13008  676 ?        Ss   Jan22   1:24 /usr/bin/pam-panel-icon --sm-client-id 117f000001000110358891200000132210005
gaffaney  3189  0.5  1.6 34840 4280 ?        RNs  Jan22  43:57 /usr/bin/python /usr/bin/rhn-applet-gui --sm-config-prefix /rhn-applet-JPJiZe/ --sm-client-id 117f000001000110358891800000132210006 --screen 0
gaffaney  3198  0.0  0.0  2840  112 ?        S    Jan22   0:36 /usr/libexec/mapping-daemon
gaffaney  3219  0.0  1.0 22884 2756 ?        S    Jan22   1:14 /usr/libexec/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-fd=32
gaffaney  3221  0.1  0.2 23480  636 ?        S    Jan22  11:38 /usr/libexec/mixer_applet2 --oaf-activate-iid=OAFIID:GNOME_MixerApplet_Factory --oaf-ior-fd=34
gaffaney  3223  0.0  0.7 21416 1888 ?        S    Jan22   0:56 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=36
gaffaney  3225  0.0  0.1 19020  472 ?        S    Jan22   0:05 /usr/libexec/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationAreaApplet_Factory --oaf-ior-fd=38
gaffaney  3255  0.0  0.0 67372   68 ?        Sl   Jan22   0:00 /usr/libexec/evolution-data-server-1.0 --oaf-activate-iid=OAFIID:GNOME_Evolution_DataServer_InterfaceCheck --oaf-ior-fd=42
gaffaney  3262  0.0  0.1 65392  484 ?        Sl   Jan22   0:04 /usr/libexec/evolution/2.0/evolution-alarm-notify --oaf-activate-iid=OAFIID:GNOME_Evolution_Calendar_AlarmNotify_Factory:2.0 --oaf-ior-fd=44
gaffaney 30118  0.0  0.0  4992  140 ?        Ss   Jan23   0:00 ssh-agent
gaffaney 30779  0.0  0.0  4280  140 ?        Ss   Jan23   0:00 ssh-agent
gaffaney 30816  0.0  0.0  4536  140 ?        Ss   Jan23   0:00 ssh-agent
gaffaney 30854  0.0  0.0  5048  140 ?        Ss   Jan23   0:06 ssh-agent
gaffaney 27965  0.0  0.0  4904  140 ?        Ss   Jan24   0:03 ssh-agent
gaffaney 29958  0.0  0.0  4376  140 ?        Ss   Jan26   0:00 ssh-agent
gaffaney  5262  0.0  0.0  4828  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney  5456  0.0  0.0  3872  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney  5623  0.0  0.0  4232  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney 11575  0.0  0.0  4136  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney 12581  0.0  0.0  3824  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney 12679  0.0  0.0  4928  140 ?        Ss   Jan27   0:00 ssh-agent
gaffaney  5186  1.6  1.5 45076 3936 ?        Sl   09:51   2:15 gnome-terminal
gaffaney  5190  0.0  0.1  3380  272 ?        S    09:51   0:00 gnome-pty-helper
gaffaney  5191  0.0  0.0  5788  240 pts/0    Ss   09:51   0:01 bash
gaffaney  5227  0.0  0.0  4208  140 ?        Ss   09:53   0:00 ssh-agent
gaffaney  5299  0.0  0.0  6184  240 pts/1    Ss   10:01   0:00 bash
gaffaney  5321  0.0  0.0  4240  140 ?        Ss   10:01   0:00 ssh-agent
gaffaney  5342  0.0  3.3 430736 8576 pts/1   Sl+  10:02   0:02 /home/gaffaney/j2sdk1.4.2_06/bin/java -Xmx256m -jar /usr/share/yjp-3.2/bin/../lib/yjp.jar
gaffaney  5389  0.2 65.9 382680 168992 pts/0 Sl+  10:04   0:15 java -Xrunyjpagent port=10000 -DGLOBUS_LOCATION=/home/gaffaney/trunk-012205 -Djava.endorsed.dirs=/home/gaffaney/trunk-012205/endorsed -DLD_LIBRARY_PATH=/home/gaffaney/trunk-012205/lib -classpath /home/gaffaney/trunk-012205/lib/bootstrap.jar:/home/gaffaney/trunk-012205/lib/cog-url.jar:/home/gaffaney/trunk-012205/lib/axis-url.jar org.globus.bootstrap.Bootstrap org.globus.wsrf.container.ServiceContainer
gaffaney  5411  0.0  0.1  4604  412 pts/0    S+   10:05   0:02 /home/gaffaney/trunk-012205/libexec/globus-scheduler-event-generator -s fork -t 1106752148
gaffaney  5435  0.0  0.0  4604  240 pts/2    Ss   10:07   0:00 bash
gaffaney  5459  0.0  0.0  5824  240 pts/3    Ss   10:07   0:00 bash
gaffaney  5481  0.0  0.0  3648  140 ?        Ss   10:07   0:00 ssh-agent
gaffaney  5502  0.0  0.0  4424  140 ?        Ss   10:08   0:00 ssh-agent
gaffaney  5766  0.0  0.1  7456  268 pts/2    S+   10:20   0:02 perl ../consub.pl 2000 2000
gaffaney  5827  0.0  0.1  8720  268 pts/3    S+   10:20   0:02 perl ../consub.pl 2000 2000
gaffaney   768  0.0  0.2 16952  720 pts/2    S+   11:13   0:00 globusrun-ws -submit -batch -F https://mozia:8443/wsrf/services/ManagedJobFactoryService -o epr_675 -c /bin/sleep 2000
gaffaney   794  0.0  0.2 16720  744 pts/3    S+   11:13   0:00 globusrun-ws -submit -batch -F https://mozia:8443/wsrf/services/ManagedJobFactoryService -o epr_675 -c /bin/sleep 2000
gaffaney  1044  0.0  0.4  5336 1232 pts/4    Ss   11:19   0:01 bash
gaffaney  1410  0.0  0.3  2956  776 pts/4    R+   12:07   0:00 ps ux

------- Comment #9 From 2005-01-28 15:39:23 -------
yjp profiler snapshots for the last run are on wiggum:

/tmp/gaffaney/012805_after-startcontainer.memory
/tmp/gaffaney/012805_after-servicecrashed.memory
/tmp/gaffaney/012805_after-kill_dash_QUIT.memory

Note that these are huge (the last two are ~61 MB each), which may dictate how 
and where you open them with the profiler.
------- Comment #10 From 2005-01-28 17:26:04 -------
I am calling the tests Parallel Job Submission tests because concurrency has 
been previously used to describe many processes submitting to the service, not 
a few processes submitting lots of jobs.

The three test scripts are checked into:

   ws-gram/service/java/test/scalability/bin 

and a short doc describing them is in 

   ws-gram/service/java/test/scalability and pasted below.


*******************************************
GRAM Parallel Job Submission Tests
1/28/2005 - R. Gaffaney

consub.pl, constat.pl and conkill.pl are perl scripts that provide a simple
way to submit and manage a stream of jobs.

Other than needing globusrun-ws in the path, they have no dependencies. You
can run them from anywhere, but it is suggested that you create a directory
for each process you intend to use to submit jobs, for reasons described
below.

These are simple jobs that have no staging and can be submitted to any
installed scheduler. Here is a typical run string:

    globusrun-ws -submit -batch -F \
    https://lucky0:8444/wsrf/services/ManagedJobFactoryService \
    -o epr_525 -c /bin/sleep 2000 -Ft Condor


consub.pl
---------

Jobs are submitted using consub.pl with the following command line:

    ./consub.pl Host:Port JobCount SleepTime Arg4 Arg5

    Host:Port      e.g. lucky0.mcs.anl.gov:8443
    JobCount       e.g. 1000
    SleepTime      e.g. 1000 (~16.7 Minutes)
    Arg4           Either nothing or -Ft
    Arg5           Either nothing or Fork | PBS | Condor | LHS

You can skip Arg4 and Arg5 for fork jobs.

In order to make sure the jobs stay active until they are all submitted, make
SleepTime long enough to last beyond the time it takes to submit all jobs.
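The SleepTime rule is just arithmetic: the first job has to outlive the whole submission phase. A sketch in Python; the per-submission time and the 1.5 safety margin are assumptions to replace with numbers measured on your own setup, since the campaign does not report a submission rate:

```python
import math


def min_sleep_time(job_count: int, secs_per_submit: float, margin: float = 1.5) -> int:
    """Smallest SleepTime, in seconds, that keeps job 0 alive until the last
    job has been submitted, padded by the safety factor `margin`.

    secs_per_submit is the assumed average wall-clock time of one
    globusrun-ws submission.
    """
    return math.ceil(job_count * secs_per_submit * margin)


def as_minutes(seconds: float) -> float:
    """Convert a SleepTime in seconds to minutes."""
    return seconds / 60.0
```

At an assumed one second per submission, a 1000-job run needs a SleepTime of at least 1000 (about 16.7 minutes), the example value given for consub.pl; the default margin raises that to 1500.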

consub.pl captures the epr for each job in a file named "epr_N", where N runs
from 0 to (JobCount - 1). When submitting from multiple processes, each must
run in a different directory, since the files would otherwise be overwritten.
It would certainly be possible to fix this, but it didn't seem worth the effort.
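The submission side of consub.pl can be sketched in Python. This is not the script itself: it only builds the globusrun-ws argument list for job N from the consub.pl parameters, assuming the factory URL has the form https://Host:Port/wsrf/services/ManagedJobFactoryService seen in the run strings above, with the EPR written to epr_N in the current directory.

```python
from typing import List, Optional

# Service path taken from the run strings above; assumed identical for every
# Host:Port passed in.
FACTORY_PATH = "/wsrf/services/ManagedJobFactoryService"


def job_command(host_port: str, n: int, sleep_time: int,
                scheduler: Optional[str] = None) -> List[str]:
    """globusrun-ws argv that submits job n as a batch '/bin/sleep sleep_time' job.

    The EPR is written to epr_<n>. With no scheduler the -Ft option is left
    off, matching the note that Arg4 and Arg5 can be skipped for fork jobs.
    """
    cmd = ["globusrun-ws", "-submit", "-batch",
           "-F", f"https://{host_port}{FACTORY_PATH}",
           "-o", f"epr_{n}",
           "-c", "/bin/sleep", str(sleep_time)]
    if scheduler is not None:
        cmd += ["-Ft", scheduler]
    return cmd


def all_commands(host_port: str, job_count: int, sleep_time: int,
                 scheduler: Optional[str] = None) -> List[List[str]]:
    """Commands for jobs 0 .. job_count - 1, in submission order."""
    return [job_command(host_port, n, sleep_time, scheduler)
            for n in range(job_count)]
```

Each command can then be run in turn with `subprocess.run(cmd, check=True)` from the submitting process's own working directory.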


constat.pl
----------

constat.pl takes the following command line:

    ./constat.pl JobCount

It iterates through the epr files created by consub.pl and makes a
globusrun-ws call to collect the status of each job. It displays the
results in the following format:

    --------------------------------------------------
    Date/Time: Tue Jan 25 12:28:35 2005
    cccccccccccccccccccccccccccccccccccccccccccccccccc
    cccccccccccccccccccccccccccccddddddddddddddddddddd
    dddddddddddddddddddddddddddddddddddddddddddddddddd
    dddddddddddddddddddddddddddddddddddddddddddddddddd
    dddddddddddddddddddddddddddddddddddddddddddddddddd
    dddddddddddddddddddddddddddddddddddddddddddddddddd
    dddddddddddddddddddddddddddddddddddddddaaaaaaaaaaa
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Total: 500  Active: 161  Done: 260  Cleanup:  79
    --------------------------------------------------

It displays jobs 50 to a line, showing each as 'a' for the active state,
'd' for the done state or 'c' for the cleanup state. Note that constat.pl
itself causes the service to allocate resources, so it can affect the
results of the test.

constat.pl waits ten seconds after it gets through the eprs and starts again.
It will do so until you CTRL-C out of it.
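The constat.pl display can be reproduced with a short Python formatter. This is a sketch, not the script's code: it assumes the per-job states have already been collected (one of 'a', 'd' or 'c' per job, in epr_N order), and the field widths on the totals line are inferred from the sample output above. The polling loop takes a hypothetical `get_states()` callable in place of the per-EPR globusrun-ws status calls.

```python
import time
from typing import Callable, Optional

RULE = "-" * 50
PER_LINE = 50


def format_status(states: str, when: str) -> str:
    """Render job states 50 per line, followed by a totals line."""
    lines = [RULE, f"Date/Time: {when}"]
    for i in range(0, len(states), PER_LINE):
        lines.append(states[i:i + PER_LINE])
    lines.append("")
    lines.append(
        f"Total: {len(states):3d}  Active: {states.count('a'):3d}  "
        f"Done: {states.count('d'):3d}  Cleanup: {states.count('c'):3d}"
    )
    lines.append(RULE)
    return "\n".join(lines)


def poll(get_states: Callable[[], str], interval: float = 10.0,
         iterations: Optional[int] = None) -> None:
    """Print a status block, wait `interval` seconds, and repeat.

    With iterations=None it runs until CTRL-C, like constat.pl.
    """
    done = 0
    try:
        while iterations is None or done < iterations:
            print(format_status(get_states(), time.ctime()))
            done += 1
            if iterations is None or done < iterations:
                time.sleep(interval)
    except KeyboardInterrupt:
        pass
```

`time.ctime()` produces the same "Tue Jan 25 12:28:35 2005" style of timestamp shown in the sample.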


conkill.pl
----------

conkill.pl spins through the eprs and calls globusrun-ws -kill for each. The
command line is thus:

    ./conkill.pl JobCount

It also removes each epr_N file if the kill succeeds. If the tests crash the
server, the epr files have to be removed manually.
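A Python sketch of the conkill.pl loop. The `kill_job` callable is injected; the default runs `globusrun-ws -kill -j epr_N`, which is an assumption about the exact flags rather than a transcription of the script. An epr_N file is deleted only when its kill succeeds, so files for jobs that could not be killed stay behind for a retry or manual cleanup.

```python
import os
import subprocess
from typing import Callable, List


def default_kill(epr_path: str) -> bool:
    """Assumed invocation: globusrun-ws -kill -j <epr file>; True on exit status 0."""
    result = subprocess.run(["globusrun-ws", "-kill", "-j", epr_path])
    return result.returncode == 0


def kill_all(job_count: int, kill_job: Callable[[str], bool] = default_kill,
             directory: str = ".") -> List[int]:
    """Kill jobs 0 .. job_count - 1 via their epr_N files; return the N that failed."""
    failed = []
    for n in range(job_count):
        path = os.path.join(directory, f"epr_{n}")
        if not os.path.exists(path):
            continue  # already killed, or never submitted
        if kill_job(path):
            os.remove(path)  # only forget the EPR once the job is gone
        else:
            failed.append(n)
    return failed
```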



------- Comment #11 From 2005-02-01 16:20:11 -------
I rebuilt with Peter's changes and found that GRAM capacity for submissions 
has increased significantly. I submitted 4000 Condor batch jobs from a single 
process and the container did not have any problems.

So I blasted it from four processes simultaneously and the service failed at 
around 2850 total jobs. At this point Condor reported it had 23 of the jobs 
enqueued. 

Service log is below
********************

2005-02-01 15:52:14,966 ERROR factory.ManagedJobFactoryService [Thread-95,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 27 more
2005-02-01 15:52:15,253 INFO  authorization.ServiceAuthorizationChain [Thread-94,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-02-01 15:52:15,287 ERROR factory.ManagedJobFactoryService [Thread-94,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 27 more
2005-02-01 15:52:15,393 ERROR factory.ManagedJobFactoryService [Thread-93,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 27 more
2005-02-01 15:52:15,459 INFO  authorization.ServiceAuthorizationChain [Thread-95,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-02-01 15:52:15,689 ERROR container.GSIServiceThread [Thread-99,process:117] Error processing request
java.lang.NullPointerException
        at org.globus.gsi.CertificateRevocationLists.reload(CertificateRevocationLists.java:104)
        at org.globus.gsi.CertificateRevocationLists.getDefault(CertificateRevocationLists.java:179)
        at org.globus.gsi.CertificateRevocationLists.getDefaultCertificateRevocationLists(CertificateRevocationLists.java:168)
        at org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain(GlobusGSSContextImpl.java:689)
        at org.globus.gsi.gssapi.GlobusGSSContextImpl.acceptSecContext(GlobusGSSContextImpl.java:295)
        at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:119)
        at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:137)
        at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:155)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:88)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
2005-02-01 15:52:15,772 INFO  authorization.ServiceAuthorizationChain [Thread-94,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-02-01 15:52:15,797 ERROR factory.ManagedJobFactoryService [Thread-95,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 27 more
2005-02-01 15:52:15,810 ERROR factory.ManagedJobFactoryService [Thread-94,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 27 more
2005-02-01 15:53:57,476 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:56,908 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:57,486 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:57,534 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:56,917 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:57,495 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:57,542 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:56,925 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:57,504 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:57,550 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:56,934 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:57,513 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:57,558 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing
shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:14:10,435 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:14:10,458 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:14:12,499 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:14:12,519 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:16:50,467 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:16:52,466 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:16:52,486 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:16:54,530 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
        at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
        at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
        at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
        at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
        at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
        at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
        ... 12 more
2005-02-01 16:18:56,943 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:18:57,522 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:18:57,566 WARN  usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
        at java.lang.Runtime.execInternal(Native Method)
        at java.lang.Runtime.exec(Runtime.java:566)
        at java.lang.Runtime.exec(Runtime.java:428)
        at java.lang.Runtime.exec(Runtime.java:364)
        at java.lang.Runtime.exec(Runtime.java:326)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
        at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
        at java.lang.Thread.run(Thread.java:534)

------- Comment #12 From 2005-02-01 16:59:15 -------
Btw, I fixed that NPE in org.globus.gsi.CertificateRevocationLists.reload(). 
It was triggered by the I/O problems (too many open files).
------- Comment #13 From 2005-02-01 19:31:09 -------
To see what role Condor submission times play in overall GRAM submission times, 
I ran a script that submits Condor jobs directly using condor_submit. The jobs 
were the same (/bin/sleep 2000) as for the GRAM scheduler tests.

It took 1 minute 47 seconds to submit 500 jobs, which works out to about 280 
jobs a minute. This compares to about 18 jobs a minute with GRAM, so the delays 
are definitely on the GRAM side.
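The rate comparison above is simple arithmetic. As a minimal sketch (SubmitRate 
is a hypothetical helper, not part of the test scripts), converting a job count 
and an elapsed time into jobs per minute and comparing it with the GRAM figure 
looks like this:

```java
/** Hypothetical helper for the submission-rate arithmetic above. */
final class SubmitRate {
    private SubmitRate() {}

    /** Jobs per minute for a number of jobs submitted in elapsedSeconds. */
    static double jobsPerMinute(int jobs, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return jobs * 60.0 / elapsedSeconds;
    }

    /** How many times faster the direct condor_submit path is than GRAM. */
    static double speedup(double directPerMinute, double gramPerMinute) {
        return directPerMinute / gramPerMinute;
    }

    public static void main(String[] args) {
        double direct = jobsPerMinute(500, 1 * 60 + 47); // 1 min 47 s = 107 s
        double gram = 18.0;                               // observed through GRAM
        System.out.printf("direct: %.1f jobs/min, ratio: %.1f%n",
                direct, speedup(direct, gram));
    }
}
```

With the numbers from this test, 500 jobs in 107 seconds is about 280.4 jobs a 
minute, roughly 15.6 times the GRAM rate.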
------- Comment #14 From 2005-02-01 20:34:11 -------
I ran the tests using the PBS scheduler with four clients. Each client got to 
about 800 jobs before the service reported the error below.

qstat showed 33 jobs enqueued, of which 6 were active.


************************************************************
2005-02-01 20:25:21,135 ERROR factory.ManagedJobFactoryService [Thread-102,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:

        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
        at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
        at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
        at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
        at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
        at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
        at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:379)
        at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
        at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
        at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
        at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
        at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
        at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
        at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
        at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at java.lang.Class.newInstance0(Class.java:308)
        at java.lang.Class.newInstance(Class.java:261)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
        at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
        at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
------- Comment #15 From 2005-02-02 11:54:39 -------
Bob,

Stu and I discussed changing the number of RunQueue threads to get more submissions happening at 
once.  The change to the code is really easy:

Edit line 22 of ws-gram/service/java/source/org/globus/exec/service/exec/RunQueue.java: look for 
NUM_RUN_QUEUES and change its value from 1 to 16. Then recompile the service package.

This will give you 16 RunQueue threads.  Then please rerun the test and report the results.  Thanks!
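To make the change concrete, here is a minimal sketch of what a pool of NUM_RUN_QUEUES run queues 
could look like. It is an assumption, not the actual RunQueue.java: each queue is a single-threaded 
executor, and every step for a given job is routed to the same queue by hashing a job ID, so one job's 
state changes stay in order while different jobs proceed in parallel. RunQueuePool, enqueue and 
queueFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a fixed set of run queues; not the GRAM RunQueue class. */
final class RunQueuePool {
    private final List<ExecutorService> queues = new ArrayList<>();

    RunQueuePool(int numRunQueues) {
        if (numRunQueues < 1) {
            throw new IllegalArgumentException("need at least one run queue");
        }
        for (int i = 0; i < numRunQueues; i++) {
            // One thread per queue: tasks on the same queue run strictly in order.
            queues.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Index of the queue that owns the given job; stable for the job's lifetime. */
    int queueFor(String jobId) {
        return Math.floorMod(jobId.hashCode(), queues.size());
    }

    /** Enqueue a state-processing step on the queue that owns the job. */
    void enqueue(String jobId, Runnable step) {
        queues.get(queueFor(jobId)).execute(step);
    }

    /** Stop accepting work and wait briefly for queued steps to finish. */
    void shutdown() {
        for (ExecutorService q : queues) {
            q.shutdown();
        }
        try {
            for (ExecutorService q : queues) {
                q.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model, with one queue every job's steps are serialized behind a single thread, which is 
consistent with the backlog seen in the tests; with 16, up to 16 jobs can sit in a blocking step at once.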
------- Comment #16 From 2005-02-02 23:05:14 -------
I made the change Peter suggested, bumping NUM_RUN_QUEUES from 1 to 16 in 
RunQueue.java.

The Condor queue now keeps up with the job submissions. For example, with 
about 450 jobs submitted from each of four threads (about 1800 in total), 
condor_q showed 1773 jobs in its queue.

From the time stamps in the condor_q listing I can see that between 98 and 102 
jobs per minute are being added to the queue.

This change seems like a keeper to me.
------- Comment #17 From 2005-02-03 00:08:32 -------
I disagree with hard-coding 16 threads into the GRAM code! That's way too many;
it's more than even the container uses for its thread pool by default. I 
think at least this value should be configurable and set to something smaller 
by default. More tests should be run with fewer RunQueue threads. 16 threads 
just means GRAM can process 16 job submits at a time... I think the real 
problem might be how certain state changes are processed: they just block for 
too long. For example, StateMachine.processSubmitState() starts a submit 
script in the background (in a separate thread) but blocks until it finishes 
(therefore blocking everything that's in the queue). Why run it in a separate 
thread at all, then? Or why not change that function to just start the submit 
script in a background thread (as it does now) and let that thread do the 
right state processing at the end? That way GRAM could support more than 16 
job submits at a time... The same issue applies to mergeStdout() and 
cacheCleanup().
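Jarek's alternative can be sketched as follows. This is an illustration under 
assumptions, not GRAM code: runSubmitScript stands in for launching the submit 
script, advanceState stands in for the state processing that follows it, and 
SubmitStep is a hypothetical name. The blocking form holds the calling 
run-queue thread until the script exits; the non-blocking form returns at once 
and lets the completion callback continue the state machine.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical contrast between blocking and callback-driven submit handling. */
final class SubmitStep {
    private SubmitStep() {}

    /** Blocking style: the caller waits for the script, holding its queue thread. */
    static void processSubmitBlocking(Supplier<String> runSubmitScript,
                                      Consumer<String> advanceState) {
        String schedulerJobId = runSubmitScript.get(); // caller waits here
        advanceState.accept(schedulerJobId);
    }

    /** Non-blocking style: start the script in the background and continue there. */
    static CompletableFuture<Void> processSubmitAsync(Supplier<String> runSubmitScript,
                                                      Consumer<String> advanceState,
                                                      ExecutorService scriptThreads) {
        return CompletableFuture
                .supplyAsync(runSubmitScript, scriptThreads) // runs off the queue thread
                .thenAccept(advanceState);                   // state processing at the end
    }
}
```

In the non-blocking form, the size of the scriptThreads pool, rather than the 
number of run queues, becomes the effective limit on simultaneous submit scripts.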
------- Comment #18 From 2005-02-03 06:42:18 -------
Jarek raises some good points; I'll let the GRAM architects and developers 
find the right balance. But as an experiment, this test shows that with the 
current design, moving jobs out to the scheduler quickly takes stress off GRAM.

8000 jobs were submitted, 2000 from each of four threads, with no errors 
reported by the container or the clients. Since these are long sleep jobs and 
my Condor installation only runs them four at a time, they were mostly still 
in the Condor queue in the morning:

     7994 jobs; 7994 idle, 0 running, 0 held

------- Comment #19 From 2005-02-03 07:44:06 -------
I'm not too familiar with the internal GRAM state transitions, but even though 
condor_q shows all 8000 jobs in its queue, a scan (globusrun-ws -status) shows 
them all as unsubmitted except for the few that are done or active.

The good news is that none are being rejected now. Below is the output from one 
client:
*******************

--------------------------------------------------
Date/Time: Thu Feb  3 06:43:17 2005
ccccacuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu

Total: 2000 Active: 1 Done: 0 Cleanup: 5 Unsub: 1994 Pending: 0 Rejected: 0
--------------------------------------------------
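The client report above prints one character per job, followed by a summary 
line. The sketch below reproduces that summary from a string of state 
characters. The letters a, c and u appear in the output above; the letters 
used here for Done, Pending and Rejected (d, p, r) are a guess, since the Perl 
scripts themselves are not shown. StatusSummary is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical re-implementation of the client's status summary line. */
final class StatusSummary {
    enum State { ACTIVE, DONE, CLEANUP, UNSUB, PENDING, REJECTED }

    private StatusSummary() {}

    /** Map a status character to a state; d, p and r are assumed letters. */
    static State fromChar(char c) {
        switch (c) {
            case 'a': return State.ACTIVE;
            case 'd': return State.DONE;
            case 'c': return State.CLEANUP;
            case 'u': return State.UNSUB;
            case 'p': return State.PENDING;
            case 'r': return State.REJECTED;
            default: throw new IllegalArgumentException("unknown state char: " + c);
        }
    }

    /** Count each state in the grid, ignoring line breaks and other whitespace. */
    static Map<State, Integer> tally(String grid) {
        Map<State, Integer> counts = new EnumMap<>(State.class);
        for (State s : State.values()) {
            counts.put(s, 0);
        }
        for (char c : grid.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                counts.merge(fromChar(c), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Format the counts the way the client report does. */
    static String summaryLine(Map<State, Integer> n) {
        int total = 0;
        for (int v : n.values()) {
            total += v;
        }
        return String.format(
                "Total: %d Active: %d Done: %d Cleanup: %d Unsub: %d Pending: %d Rejected: %d",
                total, n.get(State.ACTIVE), n.get(State.DONE), n.get(State.CLEANUP),
                n.get(State.UNSUB), n.get(State.PENDING), n.get(State.REJECTED));
    }
}
```

Given a grid with 5 c, 1 a and 1994 u characters, as in the report above, it 
produces the same summary line.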
------- Comment #20 From 2005-02-03 09:13:17 -------
Jarek,

I suggested the exact same thing.  Stu's argument against it was that fixing the 
processSubmitState method would allow the StateMachine to overload the scheduler, since it could 
submit jobs as fast as it can background the task.  In the end, this could easily start a lot more 
JobManagerScript threads than the 16 RunQueue threads anyway.  By controlling the number of queues, 
we can limit the number of submits done simultaneously.

I don't mind making this value configurable; I actually expected it to become configurable at some 
point.  This was just a test to see whether it alleviated the problem we were having.  The number of 
threads came from a simple calculation: submitting through GRAM takes roughly 16 times as long per job 
as submitting manually (about 18 versus 280 jobs a minute in the earlier test). It's not the most 
beautiful fix, but I don't want to delve into Karl's suggested rate-limiting campaigns for 4.0 if I 
can help it.
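Stu's concern is that the number of run queues currently doubles as the cap on simultaneous submits. 
If submits were moved to background threads, that cap would have to come from somewhere else. A 
minimal sketch of one way to do that, under assumptions (SubmitLimiter is a hypothetical name, not a 
GRAM class): the caller never blocks, at most maxConcurrent submit scripts run at once, and the rest 
wait in FIFO order. Note that this limits concurrency, not submissions per minute, so it is not a 
rate-limiting scheme.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical cap on simultaneous submit scripts; not a GRAM class. */
final class SubmitLimiter {
    private final int maxConcurrent;
    private final ExecutorService workers;
    private final Deque<Runnable> waiting = new ArrayDeque<>(); // guarded by this
    private int running = 0;                                    // guarded by this

    SubmitLimiter(int maxConcurrent, ExecutorService workers) {
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException("maxConcurrent must be at least 1");
        }
        this.maxConcurrent = maxConcurrent;
        this.workers = workers;
    }

    /** Queue a submit script; the caller never blocks. */
    CompletableFuture<Void> submit(Runnable script) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        Runnable task = () -> {
            try {
                script.run();
                done.complete(null);
            } catch (RuntimeException e) {
                done.completeExceptionally(e);
            } finally {
                finished();
            }
        };
        synchronized (this) {
            if (running < maxConcurrent) {
                running++;               // take a free slot now
                workers.execute(task);
            } else {
                waiting.addLast(task);   // wait for a slot, in FIFO order
            }
        }
        return done;
    }

    /** Called when a script ends: hand its slot to the next waiter, or free it. */
    private synchronized void finished() {
        Runnable next = waiting.pollFirst();
        if (next != null) {
            workers.execute(next);
        } else {
            running--;
        }
    }
}
```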
------- Comment #21 From 2005-02-03 10:17:19 -------
I'd like to retest with 4 threads and see how that performs, and then choose one
of these values as the default for 4.0. We can work on a more elegant and
efficient method, as Jarek suggests, for 4.2.
------- Comment #22 From 2005-02-04 16:39:22 -------
With run queue threads = 4, I duplicated Bob's test of 4 clients submitting at
the same time. After 1000 jobs, I killed all the clients; Condor reported 500
jobs. After waiting a few minutes (maybe 5-10), all 1000 jobs were eventually
submitted to Condor.

This shows that for this load/burst, the number of run queue threads needs to
be higher. I will rerun with 8 and see what happens.

Here is a top snapshot from after about 500 jobs were submitted:
121 processes: 117 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total  118.2%    0.0%   78.6%   0.0%     0.4%    0.0%    1.8%
           cpu00   58.5%    0.0%   40.2%   0.0%     0.1%    0.0%    0.9%
           cpu01   59.9%    0.0%   38.2%   0.1%     0.3%    0.1%    0.9%
Mem:   515760k av,  495832k used,   19928k free,       0k shrd,   55632k buff
       265380k active,             167928k inactive
Swap: 1048552k av,   33584k used, 1014968k free                  226028k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
20336 smartin   15   0 16252 6780 15576 S     2.7  1.3   0:00   1 globusrun-ws
20338 smartin   15   0 16252 6780 15576 S     2.7  1.3   0:00   1 globusrun-ws
20358 smartin   15   0 16252 6780 15576 S     2.5  1.3   0:00   0 globusrun-ws
25485 condor    15   0 82296  75M  5024 S     1.7 15.0   7:58   1 condor_schedd
20387 smartin   19   0 16120 5676 15576 R     1.7  1.1   0:00   1 globusrun-ws
30268 condor    16   0  5476 1744  4676 S     0.7  0.3  25:51   1 condor_negoti
20054 smartin   16   0  4268 1244  3956 R     0.7  0.2   0:00   1 top
 2332 root      15   0 18568 1696 12240 S     0.3  0.3 628:46   1 X
10817 smartin   16   0  7356 3660  5028 S     0.3  0.7   0:00   0 perl
11044 condor    17   0  5512 2152  4840 S     0.3  0.4   0:00   0 condor_starte
30267 condor    16   0  5620 1924  4640 S     0.1  0.3   5:35   1 condor_collec
10978 smartin   25  10  5348 2220  4800 S N   0.1  0.4   0:00   1 condor_shadow
20393 smartin   19   0     0    0     0 Z     0.1  0.0   0:00   0 perl <defunct
------- Comment #23 From 2005-02-07 16:51:24 -------
I just changed the default thread count to 16. This should be sufficient
until we can implement a better rate-limiting scheme after 4.0.
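Comment 17 asked for this value to be configurable. A minimal sketch of that, 
assuming a JVM system property (the property name gram.runQueue.threads is 
hypothetical, not an actual GRAM setting), keeps 16 as the default and falls 
back to it for missing or non-positive values:

```java
/** Hypothetical lookup of the run-queue thread count with a default of 16. */
final class RunQueueConfig {
    static final String PROPERTY = "gram.runQueue.threads"; // hypothetical name
    static final int DEFAULT_NUM_RUN_QUEUES = 16;

    private RunQueueConfig() {}

    /** Parse a configured value, falling back to the default when unset or invalid. */
    static int parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_NUM_RUN_QUEUES;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n > 0 ? n : DEFAULT_NUM_RUN_QUEUES;
        } catch (NumberFormatException e) {
            return DEFAULT_NUM_RUN_QUEUES;
        }
    }

    /** Read the setting from the JVM system properties. */
    static int numRunQueues() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

With something like this in place, starting the container JVM with 
-Dgram.runQueue.threads=4 would turn a retest such as the one in comment 21 
into a configuration change rather than a recompile.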
------- Comment #24 From 2005-02-24 08:09:33 -------
I think we can mark this campaign as closed. The test scripts are in:

    ws-gram/service/java/test/scalability/bin

and instructions for running them are in:

    ws-gram/service/java/test/scalability