Bugzilla – Bug 3529
setup/postinstall fatal errors should be warnings
Last modified: 2012-09-05 11:42:45
As requested, these types of errors in the scheduler setup program should be
changed to warnings.
Appropriate WARN messages should be displayed at setup time and when the
container is started up. A fault message should be returned for a submitted
job when the GRAM service knows it is not properly set up. Copy the method
used for the handling of mpirun in the Fork setup.
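The requested behaviour could be sketched roughly as follows. This is an illustrative Python sketch only, not the actual Perl setup scripts or GRAM code: the function names `check_pbs_logs` and `submit_job` and the shape of the returned fault are assumptions made for the example.

```python
import os
import sys


def check_pbs_logs(pbs_home, warn=sys.stderr):
    """Return True if <pbs_home>/server_logs exists; otherwise warn and return False."""
    log_dir = os.path.join(pbs_home, "server_logs")
    if os.path.isdir(log_dir):
        return True
    # Warn and carry on instead of exiting with a fatal error.
    print(f"WARNING: {log_dir} does not exist; PBS adapter left unconfigured.",
          file=warn)
    return False


def submit_job(job, configured):
    """Hypothetical submit path: return a fault when the adapter is not set up."""
    if not configured:
        return {"status": "fault", "reason": "PBS adapter is not set up"}
    return {"status": "accepted", "job": job}
```

The same check result would be reported once at setup time and again at container startup, and only job submission turns it into a fault.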
From: JP Navarro <email@example.com>
I think this check doesn't make sense at software installation
time. Perhaps a warning would be appropriate, so the deployer
knows that something needs to be done later or rerun, or that
the component will look in that directory for files when it runs.
I'm installing this software on IA64 and the machine where the
server_logs are located is IA32 so the only way I could satisfy
the requirement so the install doesn't fail is to create that
directory to make the installer happy.
running /soft/globus-4.0.0-gcc-t1/setup/globus/setup-seg-pbs.pl..[ Changing to
/var/spool/pbs/server_logs does not exist.
Re-run this setup package with PBS_HOME environment variable
pointing to the directory containing the PBS server-logs subdirectory
I don't understand why installing PBS first is a problem. If the server_logs
directory doesn't exist, then the SEG won't work. That's why the check is there.
This comes up a lot when we generate binaries and when we do nightly testing.
Here are the two
We'd like to build the entire toolkit, then run all the tests. That includes
building the scheduler adapters on machines that don't really have the
schedulers installed. If the adapter printed a WARNING and did not finalize
itself, we would see via gpt-verify that it wasn't set up, but would still be
able to continue on our merry way. When it throws an ERROR, it kills the whole
postinstall stack behind it, so we can't finish postinstalling.
Same thing. One site would like to build binaries to distribute to remote
sites. Right now they have to make a source install, not test it, then
reinstall from binaries without the adapters they just built so their
postinstall can succeed.
In both cases I'm not suggesting that the package call the GPT finalize call to
say that it is done, just that it not stall the other postinstall scripts that
could succeed just because it couldn't. I realize that nightly testing and
binaries are slightly exceptional use cases, and would be able to deal with
this bug being closed. It would just be very convenient for me if it worked
otherwise. If that results in a net loss for our user community, then never
mind.
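To illustrate the request, here is a minimal sketch of a postinstall loop that keeps going when one setup script fails. It does not model GPT itself: `run_postinstall` is a hypothetical helper, and each "script" is simply an argv list passed to `subprocess.run`.

```python
import subprocess
import sys


def run_postinstall(scripts):
    """Run each setup command in order; collect failures instead of stopping."""
    not_setup = []
    for argv in scripts:
        result = subprocess.run(argv)
        if result.returncode != 0:
            # Leave the package unfinalized, but let later scripts run.
            print(f"WARNING: {argv!r} failed; package left not set up.",
                  file=sys.stderr)
            not_setup.append(argv)
    return not_setup
```

The returned list plays the role of the "not set up" report: the failing package is never marked as done, yet the scripts behind it still get their chance to succeed.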
Although we do have instances where people choose the PBS adapter in the
mistaken belief that it *provides* PBS. While they can gpt-uninstall the
offending setup package, I don't think we have any instructions about how to
do that. Such users tend to reinstall out of
The Argonne TeraGrid cluster is an example of how this check doesn't make sense. We have
two architectures (ia32 and ia64) and install PBS software on each separately, and then create
the local /var/spool/pbs directory on a single ia32 machine where we run the PBS server.
When we install Globus software we do it on machines that don't have the /var/spool/pbs
directory. As a matter of fact, it's impossible to install our ia64 software on the machine
that will run the PBS server (and has the /var/spool/pbs) because it's ia32.
This is just one scenario. Even on a homogeneous cluster Globus installation may happen
on a different machine than the PBS server, so this check really doesn't make sense. This
should not be a software installation check. It should be a run-time requirement/check.
I still don't understand how you make GRAM work if the PBS log directories
aren't on the same machine as GRAM is installed. I have to be missing something
because getting SEG to work in your scenario sounds physically impossible unless
you have a shared filesystem or hacked GRAM to start the SEG via ssh (or
Why is gpt-postinstall being run when creating binary bundles? I thought that
All that said, thinking about this more I realized that assuming we get around
to enabling SEG startup through ssh, this check will not make sense in that
scenario either and will have to be removed eventually. I'm just trying to
understand the current complaints for my own sanity.
Sorry I didn't explain it well.
The problem is that we install the software on one machine to NFS space, and then run
GRAM out of NFS on a different machine. So the underlying assumption that "the machine
where the software installation happens is the machine where the software will run" isn't
I will make the changes to the setup packages so that they do not error out,
rather give warnings.
Created an attachment (id=841) [details]
Patch to setup-globus-job-manager-pbs.pl to make not finding pbs a warning, not
Created an attachment (id=842) [details]
patch to setup-seg-pbs.pl to make not finding PBS a warning, not an error
Created an attachment (id=843) [details]
patch to setup-globus-scheduler-provider-pbs.pl to make not finding pbs a
warning, not an error
I have attached patches to the various PBS jobmanager setup packages to make
the condition where PBS (or the PBS log file) is not found a non-fatal warning
that leaves the setup package still not set up.
These (or something similar) need to be committed to the globus_4_0_community
branch to be in the TeraGrid CTSSv3 -r3 installer, but I do not have commit
access to this part of CVS anymore.
I just applied these to the community branch. They should probably be
applied to the 4 0 branch and
Doing some bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5. Also, we're now tracking
issues in Jira. Any new issues should be added here: