Bugzilla – Bug 2599
CAMPAIGN: Concurrent Job Submission Capacity
Last modified: 2005-02-24 08:09:33
Leader: Bob Gaffaney
People: Bob Gaffaney
Time Estimate: 5 Days

Description:
An important metric for the GRAM Service is its capacity for parallel submissions. The purpose of this campaign is to determine this capacity for the GT4 GRAM architecture. Job submissions can be in either synchronous or batch mode; both will be tested. Since the intent of this campaign is to establish the capacity of the Service, the tests must be designed so that Client scalability does not limit the results. Perl scripts that submit and monitor simple jobs will be written and run in a way that meets this criterion. Java tools will be used to attempt to identify specific areas of the Toolkit that have concurrency problems. This bug will be used to post a running commentary on the progress and results of this campaign.

Deliverables:
1) A report on GRAM Concurrent Job Submission Capacity
2) Perl Scripts used in the tests
3) Memory or Profiling reports on specific problem areas
4) Instructions describing how to set up and run the tests

Tasks:
1) Develop Perl Scripts
2) Run Tests
3) Produce Reports and documentation as described above
4) Submit bugs to Bugzilla for specific issues discovered in testing
5) Run tests under tools that can zero in on problem areas of the Toolkit
1/20/2005
I am running the container in one process on lucky0. In two other processes on the same machine I simultaneously run scripts that submit "sleep 100" batch jobs as fast as the service will accept them. As each client process approached 500 jobs the service ran very slowly and finally crashed with out-of-memory errors. This was at 512 jobs on one client and 482 on the other.
*****
1/20/2005
I restarted the container and repeated the above tests and got almost exactly the same results: the container ran out of memory at approximately 480 plus 520 jobs.
*****
1/21/2005
I restarted the container and repeated the above tests, but running four clients instead of two (all four still on the same machine). When the container crashed the clients had submitted 260, 265, 265 and 273 jobs. The magic number appears to be right around 1000.
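The "right around 1000" estimate can be checked by summing the per-client counts from the three runs. This is only a small sketch of that arithmetic: the run labels are made up for illustration, and the second run uses the approximate figures quoted (480 and 520).

```python
# Per-client job counts at the point the container ran out of memory,
# copied from the three runs described above. Run labels are illustrative.
runs = {
    "1/20 run 1 (2 clients)": [512, 482],
    "1/20 run 2 (2 clients)": [480, 520],  # approximate counts
    "1/21 run (4 clients)": [260, 265, 265, 273],
}


def total_jobs(counts):
    """Total jobs accepted by the service across all client processes."""
    return sum(counts)


totals = {label: total_jobs(counts) for label, counts in runs.items()}

for label, total in totals.items():
    print(f"{label}: {total} jobs")
```

The totals land within a few percent of each other regardless of how many client processes were used, which is what suggests a limit in the service rather than in the clients.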
From Alain: Do we have numbers somewhere for the performance of GRAM when it is being attacked with a lot of jobs simultaneously?
Response: Not yet, but I do want to collect some information on the effects of throttling job submissions instead of submitting them as fast as the server will accept them.
Will leave open until campaign is complete
I set up concurrency test scripts in two processes to blast PBS jobs at a service. I got to 521 jobs for one process and 524 for the other before I hit OOM Problems - the SOAP Axis Server shut down. The service container is on the local machine, there is no staging and the JVM is running with the memory settings.
Bob

Here is my run string
************************
globusrun-ws -submit -batch -F https://lucky0:8444/wsrf/services/ManagedJobFactoryService -o epr_522 -c /bin/sleep 2000 -Ft PBS

Here is the Service Output
******************************
java.lang.OutOfMemoryError
2005-01-27 14:13:59,059 INFO authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.RuntimeException: Unable to invoke state transition method processCacheCleanUpState
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:269)
    at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:93)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:263)
    ... 1 more
Caused by: java.lang.OutOfMemoryError
2005-01-27 14:14:04,000 INFO authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 14:14:12,561 INFO authorization.ServiceAuthorizationChain [Thread-35,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:22,929 INFO authorization.ServiceAuthorizationChain [Thread-35,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 14:14:22,934 INFO authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:32,066 INFO authorization.ServiceAuthorizationChain [Thread-227,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
java.lang.OutOfMemoryError
2005-01-27 14:14:45,854 INFO authorization.ServiceAuthorizationChain [Thread-1122,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 14:14:56,672 INFO authorization.ServiceAuthorizationChain [Thread-1123,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
Stopped SOAP Axis server at: https://140.221.65.193:8444/wsrf/services/
java.lang.OutOfMemoryError
It is looking like concurrency limitations are not scheduler dependent. I got almost exactly the same results for Condor as for PBS. I repeated the same concurrency tests with the Condor scheduler: clients in two processes submitting batch jobs as quickly as possible. The container ran out of memory after 512 jobs for one process and 524 for the other.
Bob

Here is a run string
**********************
globusrun-ws -submit -batch -F https://lucky0:8444/wsrf/services/ManagedJobFactoryService -o epr_525 -c /bin/sleep 2000 -Ft Condor

Here is the output from the service
****************************************
2005-01-27 15:07:12,714 INFO authorization.ServiceAuthorizationChain [Thread-256,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
2005-01-27 15:07:22,552 INFO authorization.ServiceAuthorizationChain [Thread-6,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:07:31,978 INFO authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:07:45,664 INFO authorization.ServiceAuthorizationChain [Thread-34,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.RuntimeException: Unable to invoke state transition method processStartState
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:269)
    at org.globus.exec.service.exec.RunQueue.run(RunQueue.java:93)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:263)
    ... 1 more
Caused by: java.lang.OutOfMemoryError
2005-01-27 15:07:55,378 INFO authorization.ServiceAuthorizationChain [Thread-6,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:08:03,475 INFO authorization.ServiceAuthorizationChain [Thread-5,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
2005-01-27 15:08:09,319 INFO authorization.ServiceAuthorizationChain [Thread-34,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob".
java.lang.OutOfMemoryError
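The timestamps in the Condor service output give a rough picture of how slowly the container was accepting submissions just before it failed. The sketch below computes the spacing between the first four createManagedJob authorizations in that output. The timestamps are copied from the log; the helper name is made up for illustration, and this is not part of the test scripts.

```python
from datetime import datetime

# Timestamps of four consecutive createManagedJob authorizations,
# copied from the Condor service output above.
STAMPS = [
    "2005-01-27 15:07:12,714",
    "2005-01-27 15:07:22,552",
    "2005-01-27 15:07:31,978",
    "2005-01-27 15:07:45,664",
]


def gaps_seconds(stamps):
    """Seconds elapsed between each pair of consecutive log timestamps."""
    # %f accepts the three-digit millisecond field used by the container log.
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


gaps = gaps_seconds(STAMPS)
print(gaps)
```

Near the failure point the authorizations are roughly 10 to 14 seconds apart, which matches the earlier observation that the service runs very slowly before it runs out of memory.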
Per Jarek, When you are running these tests (with jobs close to 1000) can you also watch the /proc/<process id of JVM>/fd directory? This is to see how many fds are opened at the same time. Maybe we are running into the limit of opened fds per process. Also, increase the JVM heap size just to see if it is really a memory issue or some other issue.
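One way to watch the fd directory while the tests run is to sample it periodically. This is a minimal sketch, assuming a Linux /proc filesystem and a user allowed to read the JVM's fd directory. The function names and the `proc_root` parameter are made up for illustration; it is not the method used to produce any attachment on this bug.

```python
import os
import time


def count_open_fds(pid, proc_root="/proc"):
    """Count the file descriptors a process currently holds open (Linux)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fds(pid, samples, interval_s=1.0, proc_root="/proc"):
    """Take `samples` readings of the open-fd count, `interval_s` apart."""
    readings = []
    for i in range(samples):
        stamp = time.strftime("%H:%M:%S")
        readings.append((stamp, count_open_fds(pid, proc_root)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Demonstrated on this script's own PID; substitute the container JVM's PID.
    for stamp, n in sample_fds(os.getpid(), samples=3, interval_s=0.5):
        print(stamp, n)
```

Comparing the peak reading against the per-process limit reported by `ulimit -n` in the shell that started the container would show whether the fd limit is being approached.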
Created an attachment (id=495)
logfile of /proc/[process id]/fds as the tests ran
Jarek pointed me at a new wsrf_core.jar which I dropped into $G_L/lib to replace the one that was there. Jarek made changes that cause resources to be reclaimed sooner - these have now been committed to the trunk. I repeated the fork parallel job tests and found that the service handled about 300 more jobs (674 for each of two processes) before it stopped. Note that the service did not actually crash, but Jarek and I confirmed that counter-create failed:

Error: WSDLException (at /wsdl:definitions/wsdl:import): faultCode=PARSER_ERROR: Problem parsing '../../../wsrf/notification/WS-BaseN.wsdl'.: java.lang.OutOfMemoryError

Command line is as before. Here are the processes that were running after it stopped responding:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
gaffaney 3061 0.0 0.2 21936 564 ? Ss Jan22 0:06 /usr/bin/gnome-session
gaffaney 3089 0.0 0.0 3708 4 ? Ss Jan22 0:00 /usr/bin/ssh-agent -s
gaffaney 3116 0.0 0.0 3164 4 ? S Jan22 0:00 /usr/bin/dbus-launch --exit-with-session /etc/X11/xinit/Xclients
gaffaney 3117 0.0 0.0 3664 28 ? Ss Jan22 0:00 dbus-daemon-1 --fork --print-pid 8 --print-address 6 --session
gaffaney 3122 0.0 0.2 12312 608 ? S Jan22 0:34 /usr/libexec/gconfd-2 13
gaffaney 3124 0.0 0.0 2992 24 ? S Jan22 0:00 /usr/bin/gnome-keyring-daemon
gaffaney 3126 0.0 0.0 8144 64 ? Ss Jan22 0:01 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=18
gaffaney 3128 0.0 0.2 20908 544 ? S Jan22 0:06 /usr/libexec/gnome-settings-daemon --oaf-activate-iid=OAFIID:GNOME_SettingsDaemon --oaf-ior-fd=22
gaffaney 3134 0.1 0.1 2768 288 ? S Jan22 16:33 /usr/libexec/gam_server
gaffaney 3143 0.0 0.1 5784 320 ? S Jan22 2:08 xscreensaver -nosplash
gaffaney 3170 0.0 1.3 13968 3384 ? Ss Jan22 3:12 metacity --sm-save-file 1103595795-13321-2880982762.ms
gaffaney 3172 0.0 0.2 19624 540 ? Ss Jan22 0:03 gnome-volume-manager --sm-config-prefix /gnome-volume-manager-BX8QFJ/ --sm-client-id 117f000001000110358891100000132210001 --screen 0
gaffaney 3174 0.0 0.7 24608 1876 ? Ss Jan22 1:02 gnome-panel --sm-config-prefix /gnome-panel-89cmJV/ --sm-client-id 117f000001000110358891100000132210002 --screen 0 --profile default
gaffaney 3176 0.0 0.8 43640 2192 ? Ssl Jan22 0:40 nautilus --sm-config-prefix /nautilus-sMQZZa/ --sm-client-id 117f000001000110358891100000132210003 --screen 0 --no-default-window
gaffaney 3178 0.0 0.2 41236 748 ? Ss Jan22 0:31 eggcups --sm-config-prefix /eggcups-jjXq59/ --sm-client-id 117f000001000110358891200000132210004 --screen 0
gaffaney 3185 0.0 0.0 21588 68 ? Sl Jan22 0:00 /usr/libexec/gnome-vfs-daemon --oaf-activate-iid=OAFIID:GNOME_VFS_Daemon_Factory --oaf-ior-fd=28
gaffaney 3187 0.0 0.2 13008 676 ? Ss Jan22 1:24 /usr/bin/pam-panel-icon --sm-client-id 117f000001000110358891200000132210005
gaffaney 3189 0.5 1.6 34840 4280 ? RNs Jan22 43:57 /usr/bin/python /usr/bin/rhn-applet-gui --sm-config-prefix /rhn-applet-JPJiZe/ --sm-client-id 117f000001000110358891800000132210006 --screen 0
gaffaney 3198 0.0 0.0 2840 112 ? S Jan22 0:36 /usr/libexec/mapping-daemon
gaffaney 3219 0.0 1.0 22884 2756 ? S Jan22 1:14 /usr/libexec/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-fd=32
gaffaney 3221 0.1 0.2 23480 636 ? S Jan22 11:38 /usr/libexec/mixer_applet2 --oaf-activate-iid=OAFIID:GNOME_MixerApplet_Factory --oaf-ior-fd=34
gaffaney 3223 0.0 0.7 21416 1888 ? S Jan22 0:56 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=36
gaffaney 3225 0.0 0.1 19020 472 ? S Jan22 0:05 /usr/libexec/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationAreaApplet_Factory --oaf-ior-fd=38
gaffaney 3255 0.0 0.0 67372 68 ? Sl Jan22 0:00 /usr/libexec/evolution-data-server-1.0 --oaf-activate-iid=OAFIID:GNOME_Evolution_DataServer_InterfaceCheck --oaf-ior-fd=42
gaffaney 3262 0.0 0.1 65392 484 ? Sl Jan22 0:04 /usr/libexec/evolution/2.0/evolution-alarm-notify --oaf-activate-iid=OAFIID:GNOME_Evolution_Calendar_AlarmNotify_Factory:2.0 --oaf-ior-fd=44
gaffaney 30118 0.0 0.0 4992 140 ? Ss Jan23 0:00 ssh-agent
gaffaney 30779 0.0 0.0 4280 140 ? Ss Jan23 0:00 ssh-agent
gaffaney 30816 0.0 0.0 4536 140 ? Ss Jan23 0:00 ssh-agent
gaffaney 30854 0.0 0.0 5048 140 ? Ss Jan23 0:06 ssh-agent
gaffaney 27965 0.0 0.0 4904 140 ? Ss Jan24 0:03 ssh-agent
gaffaney 29958 0.0 0.0 4376 140 ? Ss Jan26 0:00 ssh-agent
gaffaney 5262 0.0 0.0 4828 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 5456 0.0 0.0 3872 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 5623 0.0 0.0 4232 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 11575 0.0 0.0 4136 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 12581 0.0 0.0 3824 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 12679 0.0 0.0 4928 140 ? Ss Jan27 0:00 ssh-agent
gaffaney 5186 1.6 1.5 45076 3936 ? Sl 09:51 2:15 gnome-terminal
gaffaney 5190 0.0 0.1 3380 272 ? S 09:51 0:00 gnome-pty-helper
gaffaney 5191 0.0 0.0 5788 240 pts/0 Ss 09:51 0:01 bash
gaffaney 5227 0.0 0.0 4208 140 ? Ss 09:53 0:00 ssh-agent
gaffaney 5299 0.0 0.0 6184 240 pts/1 Ss 10:01 0:00 bash
gaffaney 5321 0.0 0.0 4240 140 ? Ss 10:01 0:00 ssh-agent
gaffaney 5342 0.0 3.3 430736 8576 pts/1 Sl+ 10:02 0:02 /home/gaffaney/j2sdk1.4.2_06/bin/java -Xmx256m -jar /usr/share/yjp-3.2/bin/../lib/yjp.jar
gaffaney 5389 0.2 65.9 382680 168992 pts/0 Sl+ 10:04 0:15 java -Xrunyjpagent port=10000 -DGLOBUS_LOCATION=/home/gaffaney/trunk-012205 -Djava.endorsed.dirs=/home/gaffaney/trunk-012205/endorsed -DLD_LIBRARY_PATH=/home/gaffaney/trunk-012205/lib -classpath /home/gaffaney/trunk-012205/lib/bootstrap.jar:/home/gaffaney/trunk-012205/lib/cog-url.jar:/home/gaffaney/trunk-012205/lib/axis-url.jar org.globus.bootstrap.Bootstrap org.globus.wsrf.container.ServiceContainer
gaffaney 5411 0.0 0.1 4604 412 pts/0 S+ 10:05 0:02 /home/gaffaney/trunk-012205/libexec/globus-scheduler-event-generator -s fork -t 1106752148
gaffaney 5435 0.0 0.0 4604 240 pts/2 Ss 10:07 0:00 bash
gaffaney 5459 0.0 0.0 5824 240 pts/3 Ss 10:07 0:00 bash
gaffaney 5481 0.0 0.0 3648 140 ? Ss 10:07 0:00 ssh-agent
gaffaney 5502 0.0 0.0 4424 140 ? Ss 10:08 0:00 ssh-agent
gaffaney 5766 0.0 0.1 7456 268 pts/2 S+ 10:20 0:02 perl ../consub.pl 2000 2000
gaffaney 5827 0.0 0.1 8720 268 pts/3 S+ 10:20 0:02 perl ../consub.pl 2000 2000
gaffaney 768 0.0 0.2 16952 720 pts/2 S+ 11:13 0:00 globusrun-ws -submit -batch -F https://mozia:8443/wsrf/services/ManagedJobFactoryService -o epr_675 -c /bin/sleep 2000
gaffaney 794 0.0 0.2 16720 744 pts/3 S+ 11:13 0:00 globusrun-ws -submit -batch -F https://mozia:8443/wsrf/services/ManagedJobFactoryService -o epr_675 -c /bin/sleep 2000
gaffaney 1044 0.0 0.4 5336 1232 pts/4 Ss 11:19 0:01 bash
gaffaney 1410 0.0 0.3 2956 776 pts/4 R+ 12:07 0:00 ps ux
yjp profiler snapshots for the last run are on wiggum:
/tmp/gaffaney/012805_after-startcontainer.memory
/tmp/gaffaney/012805_after-servicecrashed.memory
/tmp/gaffaney/012805_after-kill_dash_QUIT.memory
Note that these are huge (the last two are ~61M), which may dictate how/where you open them with the profiler.
I am calling the tests Parallel Job Submission tests because "concurrency" has previously been used to describe many processes submitting to the service, not a few processes submitting lots of jobs. The three test scripts are checked into:
ws-gram/service/java/test/scalability/bin
and a short doc describing them is in
ws-gram/service/java/test/scalability
and pasted below.
*******************************************
GRAM Parallel Job Submission Tests
1/28/2005 - R. Gaffaney

consub.pl, constat.pl and conkill.pl are perl scripts that provide a simple way to submit and manage a stream of jobs. Other than needing globusrun-ws in the path they have no dependencies. You can run them from anywhere, and it is suggested that you create a directory for each process you intend to use to submit jobs, for reasons that are described below. These are simple jobs that have no staging and can be submitted to any installed scheduler. Here is a typical run string:

globusrun-ws -submit -batch -F \
https://lucky0:8444/wsrf/services/ManagedJobFactoryService \
-o epr_525 -c /bin/sleep 2000 -Ft Condor

consub.pl
---------
Jobs are submitted using consub.pl with the following command line:

./consub.pl Host:Port JobCount SleepTime Arg4 Arg5

Host:Port    e.g. lucky0.mcs.anl.gov:8443
JobCount     e.g. 1000
SleepTime    e.g. 1000 (~16.7 Minutes)
Arg4         Either nothing or -Ft
Arg5         Either nothing or Fork | PBS | Condor | LHS

You can skip Arg4 and Arg5 for fork jobs. To make sure the jobs stay active until they are all submitted, make SleepTime long enough to last beyond the time it takes to submit all jobs.

consub.pl captures the epr for each job in a file named "epr_N" where N is from 0 to (JobCount - 1). When submitting from multiple processes it is necessary to run each in a different directory, since otherwise the epr files would be overwritten. It would certainly be possible to fix this, but it didn't seem worth the effort.
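To make the mapping from the consub.pl arguments to the run string concrete, here is a sketch in Python (not the Perl script itself, whose source is not shown in this bug). It only builds the argument lists and does not run them. The function name is hypothetical, and the only globusrun-ws options used are the ones that appear in the run string above.

```python
def build_submit_commands(host_port, job_count, sleep_time, scheduler=None):
    """Build one globusrun-ws argument list per job, mirroring the run string.

    The epr for job N is written to epr_N, with N from 0 to job_count - 1.
    When `scheduler` is None the -Ft option is left out, as for fork jobs.
    """
    factory = f"https://{host_port}/wsrf/services/ManagedJobFactoryService"
    commands = []
    for n in range(job_count):
        argv = [
            "globusrun-ws", "-submit", "-batch",
            "-F", factory,
            "-o", f"epr_{n}",
            "-c", "/bin/sleep", str(sleep_time),
        ]
        if scheduler is not None:
            argv += ["-Ft", scheduler]
        commands.append(argv)
    return commands


# Show the first command of a three-job Condor run without executing it.
demo = build_submit_commands("lucky0:8444", 3, 2000, "Condor")
print(" ".join(demo[0]))
```

Because every process writes epr_0, epr_1 and so on into its current directory, two processes sharing a directory would overwrite each other's files, which is why each submitting process needs its own directory.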
constat.pl
----------
constat.pl takes the following command line:

./constat.pl JobCount

It iterates through the epr files created by consub.pl and makes a globusrun-ws call to collect the status of each job. It displays the results in the following format:

--------------------------------------------------
Date/Time: Tue Jan 25 12:28:35 2005
cccccccccccccccccccccccccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Total: 500 Active: 161 Done: 260 Cleanup: 79
--------------------------------------------------

It displays 50 jobs per line, marking each job as 'a' (active state), 'd' (done state) or 'c' (cleanup state). Note that constat.pl itself causes the service to allocate resources, so it can affect the results of the test. After constat.pl gets through the eprs it waits ten seconds and starts again. It will do so until you CTRL-C out of it.

conkill.pl
----------
conkill.pl spins through the eprs and calls globusrun-ws -kill for each. The command line is:

./conkill.pl JobCount

It also removes each epr_N file if the kill is successful. If the tests crash the server, the epr files have to be removed manually.
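The constat.pl display can be reproduced from a list of per-job state letters. The sketch below, in Python rather than Perl, covers only the layout and the summary line. The status lookup through globusrun-ws is left out, and the function name is made up for illustration.

```python
from collections import Counter


def render_status(states, per_line=50):
    """Lay out one letter per job, `per_line` jobs per row, plus a summary.

    `states` holds one of 'a' (active), 'd' (done) or 'c' (cleanup) per job,
    in epr order.
    """
    rows = [
        "".join(states[i:i + per_line])
        for i in range(0, len(states), per_line)
    ]
    counts = Counter(states)
    summary = (
        f"Total: {len(states)} Active: {counts['a']} "
        f"Done: {counts['d']} Cleanup: {counts['c']}"
    )
    return rows, summary


# The sample display above: 79 jobs in cleanup, 260 done, 161 active.
sample_states = "c" * 79 + "d" * 260 + "a" * 161
rows, summary = render_status(sample_states)
for row in rows:
    print(row)
print(summary)
```

With the sample counts this yields ten rows of 50 letters and the same summary line as the display shown above.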
I rebuilt with Peter's changes and found that GRAM capacity for submissions has increased significantly. I submitted 4000 Condor batch jobs from a single process and the container did not have any problems. So I blasted it from four processes simultaneously and the service failed at around 2850 total jobs. At this point Condor reported it had 23 of the jobs enqueued. Service log is below ******************** 2005-02-01 15:52:14,966 ERROR factory.ManagedJobFactoryService [Thread- 95,createManagedJob:312] Job creation failed. org.globus.wsrf.ResourceException: ; nested exception is: at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:576) at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize (ManagedExecutableJobResource.java:198) at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState (ManagedExecutableJobResource.java:139) at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeW holeState(PersistentManagedExecutableJobResource.java:137) at org.globus.exec.service.exec.ManagedExecutableJobHome.create (ManagedExecutableJobHome.java:239) at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob (ManagedJobFactoryService.java:262) at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at org.apache.axis.providers.java.RPCProvider.invokeMethod (RPCProvider.java:384) at org.globus.axis.providers.RPCProvider.invokeMethodSub (RPCProvider.java:104) at org.globus.axis.providers.PrivilegedInvokeMethodAction.run (PrivilegedInvokeMethodAction.java:39) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:379) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49) at 
org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84) at org.globus.axis.providers.RPCProvider.invokeMethod (RPCProvider.java:94) at org.apache.axis.providers.java.RPCProvider.processMessage (RPCProvider.java:281) at org.apache.axis.providers.java.JavaProvider.invoke (JavaProvider.java:319) at org.apache.axis.strategies.InvocationStrategy.visit (InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.handlers.soap.SOAPService.invoke (SOAPService.java:450) at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285) at org.globus.wsrf.container.ServiceThread.doPost (ServiceThread.java:647) at org.globus.wsrf.container.ServiceThread.process (ServiceThread.java:378) at org.globus.wsrf.container.GSIServiceThread.process (GSIServiceThread.java:124) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281) Caused by: at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at java.lang.Class.newInstance0(Class.java:308) at java.lang.Class.newInstance(Class.java:261) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:70) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:88) at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInStri ng(ManagedExecutableJobResource.java:931) at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:573) ... 27 more 2005-02-01 15:52:15,253 INFO authorization.ServiceAuthorizationChain [Thread- 94,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. 
Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job} createManagedJob". 2005-02-01 15:52:15,287 ERROR factory.ManagedJobFactoryService [Thread- 94,createManagedJob:312] Job creation failed. org.globus.wsrf.ResourceException: ; nested exception is: at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:576) at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize (ManagedExecutableJobResource.java:178) at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState (ManagedExecutableJobResource.java:139) at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeW holeState(PersistentManagedExecutableJobResource.java:137) at org.globus.exec.service.exec.ManagedExecutableJobHome.create (ManagedExecutableJobHome.java:239) at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob (ManagedJobFactoryService.java:262) at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at org.apache.axis.providers.java.RPCProvider.invokeMethod (RPCProvider.java:384) at org.globus.axis.providers.RPCProvider.invokeMethodSub (RPCProvider.java:104) at org.globus.axis.providers.PrivilegedInvokeMethodAction.run (PrivilegedInvokeMethodAction.java:39) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:379) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49) at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84) at org.globus.axis.providers.RPCProvider.invokeMethod (RPCProvider.java:94) at org.apache.axis.providers.java.RPCProvider.processMessage (RPCProvider.java:281) at org.apache.axis.providers.java.JavaProvider.invoke (JavaProvider.java:319) at org.apache.axis.strategies.InvocationStrategy.visit 
(InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.handlers.soap.SOAPService.invoke (SOAPService.java:450) at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285) at org.globus.wsrf.container.ServiceThread.doPost (ServiceThread.java:647) at org.globus.wsrf.container.ServiceThread.process (ServiceThread.java:378) at org.globus.wsrf.container.GSIServiceThread.process (GSIServiceThread.java:124) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281) Caused by: at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at java.lang.Class.newInstance0(Class.java:308) at java.lang.Class.newInstance(Class.java:261) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:70) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:88) at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInStri ng(ManagedExecutableJobResource.java:931) at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:573) ... 27 more 2005-02-01 15:52:15,393 ERROR factory.ManagedJobFactoryService [Thread- 93,createManagedJob:312] Job creation failed. 
org.globus.wsrf.ResourceException: ; nested exception is: at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:576) at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize (ManagedExecutableJobResource.java:198) at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState (ManagedExecutableJobResource.java:139) at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeW holeState(PersistentManagedExecutableJobResource.java:137) at org.globus.exec.service.exec.ManagedExecutableJobHome.create (ManagedExecutableJobHome.java:239) at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob (ManagedJobFactoryService.java:262) at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at org.apache.axis.providers.java.RPCProvider.invokeMethod (RPCProvider.java:384) at org.globus.axis.providers.RPCProvider.invokeMethodSub (RPCProvider.java:104) at org.globus.axis.providers.PrivilegedInvokeMethodAction.run (PrivilegedInvokeMethodAction.java:39) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:379) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49) at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84) at org.globus.axis.providers.RPCProvider.invokeMethod (RPCProvider.java:94) at org.apache.axis.providers.java.RPCProvider.processMessage (RPCProvider.java:281) at org.apache.axis.providers.java.JavaProvider.invoke (JavaProvider.java:319) at org.apache.axis.strategies.InvocationStrategy.visit (InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.handlers.soap.SOAPService.invoke (SOAPService.java:450) at 
org.apache.axis.server.AxisServer.invoke(AxisServer.java:285) at org.globus.wsrf.container.ServiceThread.doPost (ServiceThread.java:647) at org.globus.wsrf.container.ServiceThread.process (ServiceThread.java:378) at org.globus.wsrf.container.GSIServiceThread.process (GSIServiceThread.java:124) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281) Caused by: at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at java.lang.Class.newInstance0(Class.java:308) at java.lang.Class.newInstance(Class.java:261) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:70) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:88) at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInStri ng(ManagedExecutableJobResource.java:931) at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:573) ... 27 more 2005-02-01 15:52:15,459 INFO authorization.ServiceAuthorizationChain [Thread- 95,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job} createManagedJob". 
2005-02-01 15:52:15,689 ERROR container.GSIServiceThread [Thread- 99,process:117] Error processing request java.lang.NullPointerException at org.globus.gsi.CertificateRevocationLists.reload (CertificateRevocationLists.java:104) at org.globus.gsi.CertificateRevocationLists.getDefault (CertificateRevocationLists.java:179) at org.globus.gsi.CertificateRevocationLists.getDefaultCertificateRevocationLists (CertificateRevocationLists.java:168) at org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain (GlobusGSSContextImpl.java:689) at org.globus.gsi.gssapi.GlobusGSSContextImpl.acceptSecContext (GlobusGSSContextImpl.java:295) at org.globus.gsi.gssapi.net.GssSocket.authenticateServer (GssSocket.java:119) at org.globus.gsi.gssapi.net.GssSocket.startHandshake (GssSocket.java:137) at org.globus.gsi.gssapi.net.GssSocket.getOutputStream (GssSocket.java:155) at org.globus.wsrf.container.GSIServiceThread.process (GSIServiceThread.java:88) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281) 2005-02-01 15:52:15,772 INFO authorization.ServiceAuthorizationChain [Thread- 94,authorize:281] Authorized "/DC=org/DC=doegrids/OU=People/CN=Robert C. Gaffaney" to invoke "{http://www.globus.org/namespaces/2004/10/gram/job} createManagedJob". 2005-02-01 15:52:15,797 ERROR factory.ManagedJobFactoryService [Thread- 95,createManagedJob:312] Job creation failed. 
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
    at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
    at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
    at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
    at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:379)
    at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
    at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
    at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
    at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
    at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
    at
org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
    at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
    at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ... 27 more
2005-02-01 15:52:15,810 ERROR factory.ManagedJobFactoryService [Thread-94,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
    at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
    at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
    at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
    at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:379)
    at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
    at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
    at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
    at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
    at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
    at
org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
    at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
    at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ...
27 more
2005-02-01 15:53:57,476 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:56,908 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:57,486 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at
java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 15:58:57,534 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:56,917 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at
java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:57,495 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:03:57,542 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:56,925 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at
java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:57,504 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:08:57,550 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at
java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:56,934 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:57,513 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:13:57,558 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:143)
    at java.lang.Runtime.execInternal(Native Method)
    at
java.lang.Runtime.exec(Runtime.java:566)
    at java.lang.Runtime.exec(Runtime.java:428)
    at java.lang.Runtime.exec(Runtime.java:364)
    at java.lang.Runtime.exec(Runtime.java:326)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript(GLUEResourceProperty.java:272)
    at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run(GLUEResourceProperty.java:236)
    at java.lang.Thread.run(Thread.java:534)
2005-02-01 16:14:10,435 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at
sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ... 12 more
2005-02-01 16:14:10,458 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find
(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ...
12 more
2005-02-01 16:14:12,499 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at
org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ... 12 more
2005-02-01 16:14:12,519 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ... 12 more
2005-02-01 16:16:50,467 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find
(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ...
12 more
2005-02-01 16:16:52,466 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at
org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
    ... 12 more
2005-02-01 16:16:52,486 ERROR exec.ManagedExecutableJobHome [Thread-4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:178)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.utils.PersistenceHelper.load(PersistenceHelper.java:133)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load(PersistentManagedExecutableJobResource.java:232)
    at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad(ResourceHomeImpl.java:236)
    at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271)
    at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:256)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged(ManagedExecutableJobHome.java:413)
    at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent(JobStateMonitor.java:438)
    at org.globus.exec.monitoring.JobStateMonitor.addEvent(JobStateMonitor.java:416)
    at org.globus.exec.monitoring.SchedulerEventGenerator.run(SchedulerEventGenerator.java:166)
Caused by:
at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at java.lang.Class.newInstance0(Class.java:308) at java.lang.Class.newInstance(Class.java:261) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:70) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:88) at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInStri ng(ManagedExecutableJobResource.java:931) at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:573) ... 12 more 2005-02-01 16:16:54,530 ERROR exec.ManagedExecutableJobHome [Thread- 4,jobStateChanged:417] Unable to deliver state change notification -- resource associated with job does not exist org.globus.wsrf.ResourceException: ; nested exception is: at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:576) at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize (ManagedExecutableJobResource.java:198) at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState (ManagedExecutableJobResource.java:139) at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeW holeState(PersistentManagedExecutableJobResource.java:137) at org.globus.exec.service.utils.PersistenceHelper.load (PersistenceHelper.java:133) at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.load (PersistentManagedExecutableJobResource.java:232) at org.globus.wsrf.impl.ResourceHomeImpl.createNewInstanceAndLoad (ResourceHomeImpl.java:236) at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:271) at org.globus.wsrf.impl.ResourceHomeImpl.find 
(ResourceHomeImpl.java:256) at org.globus.exec.service.exec.ManagedExecutableJobHome.jobStateChanged (ManagedExecutableJobHome.java:413) at org.globus.exec.monitoring.JobStateMonitor.dispatchEvent (JobStateMonitor.java:438) at org.globus.exec.monitoring.JobStateMonitor.addEvent (JobStateMonitor.java:416) at org.globus.exec.monitoring.SchedulerEventGenerator.run (SchedulerEventGenerator.java:166) Caused by: at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at java.lang.Class.newInstance0(Class.java:308) at java.lang.Class.newInstance(Class.java:261) at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:70) at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault (FaultUtils.java:88) at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInStri ng(ManagedExecutableJobResource.java:931) at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap (ManagedExecutableJobResource.java:573) ... 
12 more 2005-02-01 16:18:56,943 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-fork java.io.IOException: java.io.IOException: Too many open files at java.lang.UNIXProcess.<init>(UNIXProcess.java:143) at java.lang.Runtime.execInternal(Native Method) at java.lang.Runtime.exec(Runtime.java:566) at java.lang.Runtime.exec(Runtime.java:428) at java.lang.Runtime.exec(Runtime.java:364) at java.lang.Runtime.exec(Runtime.java:326) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript (GLUEResourceProperty.java:272) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run (GLUEResourceProperty.java:236) at java.lang.Thread.run(Thread.java:534) 2005-02-01 16:18:57,522 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-condor java.io.IOException: java.io.IOException: Too many open files at java.lang.UNIXProcess.<init>(UNIXProcess.java:143) at java.lang.Runtime.execInternal(Native Method) at java.lang.Runtime.exec(Runtime.java:566) at java.lang.Runtime.exec(Runtime.java:428) at java.lang.Runtime.exec(Runtime.java:364) at java.lang.Runtime.exec(Runtime.java:326) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript (GLUEResourceProperty.java:272) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run (GLUEResourceProperty.java:236) at java.lang.Thread.run(Thread.java:534) 2005-02-01 16:18:57,566 WARN usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:285] Script Execution error when executing shell /home/gaffaney/trunk-012805/libexec/globus-scheduler-provider-pbs java.io.IOException: java.io.IOException: Too many open files at java.lang.UNIXProcess.<init>(UNIXProcess.java:143) at java.lang.Runtime.execInternal(Native Method) at 
java.lang.Runtime.exec(Runtime.java:566) at java.lang.Runtime.exec(Runtime.java:428) at java.lang.Runtime.exec(Runtime.java:364) at java.lang.Runtime.exec(Runtime.java:326) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.runScript (GLUEResourceProperty.java:272) at org.globus.mds.usefulrp.GLUEResourceProperty$PeriodicExecutor.run (GLUEResourceProperty.java:236) at java.lang.Thread.run(Thread.java:534)
Btw, I fixed that NPE in org.globus.gsi.CertificateRevocationLists.reload(). It was caused by the I/O problems (too many open files).
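When chasing "Too many open files" in a long-running JVM, it can help to log how many descriptors the process holds against its limit. The following is only a sketch: it assumes a HotSpot-style JVM on Unix that exposes com.sun.management.UnixOperatingSystemMXBean, which the JVM used in these tests may well predate, and the class name FdUsage is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FdUsage {
    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or null when the platform bean does not expose them.
     */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, or the bean is unavailable
    }

    public static void main(String[] args) {
        long[] counts = openAndMax();
        if (counts == null) {
            System.out.println("descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Logging these two numbers periodically during a submission run would show whether descriptor usage climbs steadily toward the limit.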
To see what role Condor submission times play in overall gram submission times, I ran a script that submits Condor jobs directly using condor_submit. The jobs were the same (/bin/sleep 2000) as for the gram scheduler tests. It took 1' 47" to submit 500 jobs, which works out to about 280 jobs a minute. This compares to about 18 jobs a minute with gram, so the delays are definitely on the gram side.
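As a quick check of the arithmetic: 1' 47" is 107 seconds, so 500 jobs is roughly 280 jobs a minute, which is between 15 and 16 times the gram rate of about 18 a minute. A minimal sketch of that calculation (the class and method names are illustrative only):

```java
import java.util.Locale;

final class SubmitRates {
    /** Jobs per minute for a given job count and elapsed time in seconds. */
    static double jobsPerMinute(int jobs, double elapsedSeconds) {
        return jobs * 60.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        double direct = jobsPerMinute(500, 107.0); // 1' 47" = 107 seconds
        double gram = 18.0;                         // gram rate quoted in the comment above
        // prints "condor_submit: 280 jobs/min"
        System.out.println(String.format(Locale.ROOT, "condor_submit: %.0f jobs/min", direct));
        // prints "direct/gram ratio: 15.6"
        System.out.println(String.format(Locale.ROOT, "direct/gram ratio: %.1f", direct / gram));
    }
}
```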
I ran the tests with the PBS scheduler using four clients. The clients each got to about 800 jobs before the service reported the error below. qstat showed 33 jobs enqueued, of which 6 were active.
************************************************************
2005-02-01 20:25:21,135 ERROR factory.ManagedJobFactoryService [Thread-102,createManagedJob:312] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is:
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:576)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:198)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:139)
    at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:137)
    at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:239)
    at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:262)
    at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
    at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:104)
    at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:39)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:379)
    at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:49)
    at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:84)
    at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:94)
    at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
    at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
    at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
    at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:647)
    at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:378)
    at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:124)
    at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:281)
Caused by:
    at sun.reflect.GeneratedConstructorAccessor62.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
    at java.lang.Class.newInstance0(Class.java:308)
    at java.lang.Class.newInstance(Class.java:261)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:482)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:70)
    at org.globus.exec.utils.FaultUtils.createServiceLevelAgreementFault(FaultUtils.java:88)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.resolveVariableInString(ManagedExecutableJobResource.java:931)
    at org.globus.exec.service.exec.ManagedExecutableJobResource.initVariableMap(ManagedExecutableJobResource.java:573)
Bob, Stu and I discussed changing the number of RunQueue threads to get more submissions happening at once. The change to the code is really easy: Edit line 22 of ws-gram/service/java/source/org/globus/exec/service/exec/RunQueue.java, look for NUM_RUN_QUEUES and change the value from 1 to 16. Recompile the service package. This will give you 16 RunQueue threads. Then please rerun the test and report the results. Thanks!
I made the change that Peter suggested, bumping NUM_RUN_QUEUES from 1 to 16 in RunQueue.java. The items in the condor queue now keep up with the job submissions. For example, with about 450 jobs submitted from each of four threads (= 1800), condor_q showed 1773 jobs in its queue. From the time stamps in the condor_q listing I can see that between 98 and 102 jobs per minute are being added to the queue. This change seems like a keeper to me.
I disagree with hard-coding the gram code with 16 threads! That's way too many - that's more than even the container uses for its thread pool by default. I think at least this value should be configurable and by default set to something smaller. More tests should be run with fewer RunQueue threads. The 16 threads just means that gram can process 16 job submits at a time... I think the real problem might be how certain state changes are processed. They just block for too long. For example, StateMachine.processSubmitState() starts a submit script in the background (in a separate thread) but blocks until it finishes (therefore blocking everything that's in the queue). Why run it in a separate thread then anyway? Or why not change that function to just start the submit script in a background thread (as it is now) and let that thread do the right state processing at the end? That way gram could support more than 16 job submits at a time... The same issue applies to mergeStdout() and cacheCleanup().
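The difference between the two styles described above can be sketched with modern java.util.concurrent classes. This is not the actual StateMachine code, and the gram code of this era could not have used these classes; SubmitStateSketch, runSubmitScript, submitAndWait, and submitInBackground are all hypothetical stand-ins.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

final class SubmitStateSketch {
    // Hypothetical stand-in for running the scheduler's submit script.
    static String runSubmitScript(String jobId) {
        return "submitted:" + jobId;
    }

    // Blocking style: the script runs on a pool thread, but the caller
    // waits for it, so everything queued behind the caller waits as well.
    static String submitAndWait(ExecutorService pool, String jobId) throws Exception {
        Callable<String> task = () -> runSubmitScript(jobId);
        return pool.submit(task).get(); // caller is stuck here until the script finishes
    }

    // Non-blocking style: returns immediately. The follow-up state
    // processing (onSubmitted) runs on a pool thread after the script finishes.
    static CompletableFuture<Void> submitInBackground(
            ExecutorService pool, String jobId, Consumer<String> onSubmitted) {
        return CompletableFuture
                .supplyAsync(() -> runSubmitScript(jobId), pool)
                .thenAcceptAsync(onSubmitted, pool);
    }
}
```

In the second style the number of submits in flight is limited only by the pool that is passed in, so the pool size becomes the knob that controls how hard the scheduler is hit.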
Jarek makes some points - I'll let the gram architects and developers probe for the right balance. But as an experiment this test shows that, with the current design, if things get moved out to the scheduler quickly it takes stress off gram. 8000 jobs were submitted, 2000 from each of four threads, with no errors reported by the container or the clients. Since these are long sleep jobs and my Condor installation only runs them four at a time, they were mostly still in the condor queue in the morning: 7994 jobs; 7994 idle, 0 running, 0 held
Not too familiar with the internal gram state transitions, but even though condor_q is showing all of the 8000 jobs in its queue, a scan (globusrun-ws -status) shows them all as unsubmitted except the few that are done or active. The good news is that none is being rejected now. Below is the output from one client:
*******************
--------------------------------------------------
Date/Time: Thu Feb 3 06:43:17 2005
ccccacuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Total: 2000 Active: 1 Done: 0 Cleanup: 5 Unsub: 1994 Pending: 0 Rejected: 0
--------------------------------------------------
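The summary line can be cross-checked against the status grid by counting characters. The sketch below assumes one character per job, with c, a, and u standing for Cleanup, Active, and Unsub. That mapping is inferred from the totals in the output, not taken from the test script, and the class name StatusTally is made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

final class StatusTally {
    /** Counts each non-whitespace status character in the grid text. */
    static Map<Character, Integer> tally(String grid) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char ch : grid.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                continue; // row breaks and padding are not job statuses
            }
            counts.merge(ch, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // First row of the grid from one client (50 jobs).
        String firstRow = "ccccacuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu";
        System.out.println(tally(firstRow)); // prints {a=1, c=5, u=44}
    }
}
```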
Jarek, I suggested the exact same thing. Stu's argument against this was that fixing the processSubmitState method would allow the StateMachine to overload the scheduler, since it can submit jobs as fast as it can background the task. In the end, this could easily start a lot more JobManagerScript threads than the 16 RunQueue threads anyway. By controlling the number of queues, we can limit the number of submits done simultaneously. I don't mind making this value configurable. I actually kind of expected this to be configurable at some point. This was just a test to see if it alleviated the problem we were having. The number of threads was determined by a simple calculation that 16 * time-to-submit-through-gram = time-to-submit-manually. It's not the most beautiful fix, but I don't want to delve into Karl's suggested rate limiting campaigns for 4.0 if I can help it.
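One way a configurable cap on simultaneous submits could look is sketched below. This is not the RunQueue implementation: the property name gram.runqueue.threads and the class RunQueueConfig are invented for illustration, and the default of 16 is simply the value discussed in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RunQueueConfig {
    static final int DEFAULT_RUN_QUEUES = 16; // value discussed in this thread

    /** Parses a configured thread count, falling back to the default on bad input. */
    static int resolveThreadCount(String configured) {
        if (configured == null) {
            return DEFAULT_RUN_QUEUES;
        }
        try {
            int n = Integer.parseInt(configured.trim());
            return n > 0 ? n : DEFAULT_RUN_QUEUES;
        } catch (NumberFormatException e) {
            return DEFAULT_RUN_QUEUES;
        }
    }

    /**
     * A fixed-size pool caps how many submit tasks run at once; additional
     * tasks wait in the pool's queue instead of hitting the scheduler.
     */
    static ExecutorService newSubmitPool() {
        String prop = System.getProperty("gram.runqueue.threads"); // hypothetical property name
        return Executors.newFixedThreadPool(resolveThreadCount(prop));
    }
}
```

With a cap like this, backgrounding the submit step no longer lets the service start an unbounded number of scheduler submits, which was the concern with removing the blocking wait.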
I'd like to retest with 4 threads and see how that performs, and then choose one of these as the default for 4.0. We can work on a more elegant and efficient method, as Jarek suggests, for 4.2.
With run queue threads = 4, I duplicated Bob's test of 4 clients submitting at the same time. After 1000 jobs, I killed all the clients. Condor reported 500 jobs. After waiting a few minutes (maybe 5-10), all 1000 jobs were eventually submitted to condor. This shows that for this load/burst, the number of run queue threads needs to be higher. I will rerun with 8 and see what happens. Here is a top snapshot taken after about 500 jobs were submitted:

121 processes: 117 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total  118.2%    0.0%   78.6%   0.0%     0.4%    0.0%    1.8%
           cpu00   58.5%    0.0%   40.2%   0.0%     0.1%    0.0%    0.9%
           cpu01   59.9%    0.0%   38.2%   0.1%     0.3%    0.1%    0.9%
Mem:   515760k av,  495832k used,   19928k free,       0k shrd,   55632k buff
       265380k active,             167928k inactive
Swap: 1048552k av,   33584k used, 1014968k free                  226028k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
20336 smartin   15   0 16252 6780 15576 S     2.7  1.3   0:00   1 globusrun-ws
20338 smartin   15   0 16252 6780 15576 S     2.7  1.3   0:00   1 globusrun-ws
20358 smartin   15   0 16252 6780 15576 S     2.5  1.3   0:00   0 globusrun-ws
25485 condor    15   0 82296  75M  5024 S     1.7 15.0   7:58   1 condor_schedd
20387 smartin   19   0 16120 5676 15576 R     1.7  1.1   0:00   1 globusrun-ws
30268 condor    16   0  5476 1744  4676 S     0.7  0.3  25:51   1 condor_negoti
20054 smartin   16   0  4268 1244  3956 R     0.7  0.2   0:00   1 top
 2332 root      15   0 18568 1696 12240 S     0.3  0.3 628:46   1 X
10817 smartin   16   0  7356 3660  5028 S     0.3  0.7   0:00   0 perl
11044 condor    17   0  5512 2152  4840 S     0.3  0.4   0:00   0 condor_starte
30267 condor    16   0  5620 1924  4640 S     0.1  0.3   5:35   1 condor_collec
10978 smartin   25  10  5348 2220  4800 S N   0.1  0.4   0:00   1 condor_shadow
20393 smartin   19   0     0    0     0 Z     0.1  0.0   0:00   0 perl <defunct
I just changed the default thread count to 16. This should be sufficient until we can implement a better rate limiting scheme after 4.0.
Think we can mark this campaign as closed. The test scripts are in: ws-gram/service/java/test/scalability/bin and instructions for running them are in: ws-gram/service/java/test/scalability