Bug 5015 - null pointer exception when submitting job to nightly build
wsrf managed job factory service
development
PC Windows NT
P3 normal
4.1.1
Reported: 2007-02-16 10:28 by
Modified: 2007-02-28 01:28

container log w/ the null pointer exception in it (88.96 KB, application/octet-stream)
2007-02-16 10:29, Adam Bazinet
the job description file (1.01 KB, text/xml)
2007-02-16 10:30, Adam Bazinet



Description From 2007-02-16 10:28:01
I get a null pointer exception, and a
bit before that, there is a warning:

2007-02-16T10:39:52,701-05:00 WARN  factory.ManagedJobFactoryService4_0
[ServiceThread-25,createManagedJob:363] ComputingElement/any is null

I notice now there is a v4_2 ManagedJobFactoryService, but I'm just hitting up
the usual (4_0 ?) one.  Anyway, I'll attach the container log with GRAM debug
turned on.  I have tried submitting a job programmatically, from a remote 4.1.0
host using globusrun-ws, and locally using the nightly build's globusrun-ws
program, and I get this exception with each and every method.  I have also been
wondering, is the globusrun-ws command line tool still the preferred one for
job submission?  Thanks...

------- Comment #1 From 2007-02-16 10:29:13 -------
Created an attachment (id=1189) [details]
container log w/ the null pointer exception in it
------- Comment #2 From 2007-02-16 10:30:15 -------
Created an attachment (id=1190) [details]
the job description file
------- Comment #3 From 2007-02-19 00:54:02 -------
The warning
  2007-02-16T10:39:52,701-05:00 WARN  factory.ManagedJobFactoryService4_0
  [ServiceThread-25,createManagedJob:363] ComputingElement/any is null
can be ignored; I'll change that to a debug statement soon.

v4_2/ManagedJobFactoryService is the Job Factory Service which accepts JSDL
job descriptions. So you are right to use the usual one to submit jobs
described in the WS-GRAM job description format.

For simple job submission, the globusrun-ws command is the preferred one.
For concurrent job submissions, Condor-G can be used, but I'm not sure whether
this works with a container from cvs HEAD yet.

Now, the null pointer exception: it's the extensions element in the job
description that causes the problem. I guess you submit to Condor if you use
the should_transfer_files and when_to_transfer_output extensions?
Otherwise you don't need them.
I found that the problem seems to occur as soon as more than one extension
element is specified in the job description document. We didn't know about
that until now; I'll have a look at it, but I can't tell you right now when.
Maybe you can avoid using them until we have fixed that?
Thanks for reporting that bug!
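
To check whether a given job description would hit the condition described
above, something like the following might help. It is only a sketch under
assumptions: the attached job description is not reproduced in this report, so
the SAMPLE document, its element layout, and the helper names (local_name,
count_extension_elements) are hypothetical. It also assumes that "more than one
extension element" means more than one child element inside an extensions
block.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_extension_elements(job_xml):
    """Count child elements nested inside any 'extensions' block.

    Assumption: the job description wraps scheduler-specific settings in
    an element whose local name is 'extensions'.
    """
    root = ET.fromstring(job_xml)
    total = 0
    for elem in root.iter():
        if local_name(elem.tag) == "extensions":
            total += len(list(elem))
    return total


# Hypothetical job description, NOT the file attached to this bug.
SAMPLE = """<job>
  <executable>/bin/date</executable>
  <extensions>
    <should_transfer_files>YES</should_transfer_files>
    <when_to_transfer_output>ON_EXIT</when_to_transfer_output>
  </extensions>
</job>"""

if __name__ == "__main__":
    n = count_extension_elements(SAMPLE)
    if n > 1:
        print("%d extension elements found; this may trigger the exception" % n)
    else:
        print("%d extension element(s) found" % n)
```

A count greater than one only flags the situation reported here; it says
nothing about whether the description is otherwise valid.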
------- Comment #4 From 2007-02-19 06:41:08 -------
Unfortunately we rely on the extensions block for submitting to Condor, PBS,
etc... but thank you for the other information.  Concurrent job submission with
Condor-G is something that interests me and I wouldn't mind knowing if there
are any docs about what you are referring to.  Finally, if there is any
documentation describing what the advantages of switching to JSDL are, perhaps
I could start future-proofing us.
------- Comment #5 From 2007-02-28 01:20:01 -------
I fixed that NullPointerException a few minutes ago. Also, the warning
in your initial description is a debug statement now.
If you want to test it, get WS-GRAM from cvs HEAD (cvs co ws-gram).
Unfortunately, you will have to recreate your database for WS-GRAM persistence
data once more: some data types were wrong for MySQL and PostgreSQL, which
caused serialization problems.
Regarding Condor-G: we use it as a client to test WS-GRAM with large numbers
of concurrent job submissions, and we advise using it for this scenario. We
wrote up some of the experience we gained during these tests in a
recommendation paper. I'll send it to you via email; see if it's useful for you.
Regarding JSDL: from my point of view there's no real advantage, but some
communities want it, and it's a standard job description language.
So I guess it's a good thing to support it.
I'll mark this bug as fixed and close it. Don't hesitate to open new ones
if you find things that don't work; we're glad to have people like you who
point us to things we haven't found so far.