Bug 3529 - setup/postinstall fatal errors should be warnings
Status: RESOLVED WONTFIX
Product: GRAM
Component: wsrf scheduler interface
Version: 4.0.0
Platform: All All
Importance: P3 normal
Target Milestone: 4.0.5
Reported: 2005-06-29 10:25
Modified: 2012-09-05 11:42


Attachments
Patch to setup-globus-job-manager-pbs.pl to make not finding pbs a warning, not an error (457 bytes, patch)
2006-02-06 10:00, Eric Blau
patch to setup-seg-pbs.pl to make not finding PBS a warning, not an error (319 bytes, patch)
2006-02-06 10:01, Eric Blau
patch to setup-globus-scheduler-provider-pbs.pl to make not finding pbs a warning, not an error (647 bytes, patch)
2006-02-06 10:02, Eric Blau



Description From 2005-06-29 10:25:41
As requested, these types of errors in the scheduler setup program should be
changed to warnings. Appropriate WARN messages should be displayed at setup
time and when the container is started up. A fault message should be returned
for a submitted job when the GRAM service knows it is not set up properly.
Copy the method used for handling mpirun in the Fork setup.
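
The three behaviours requested above (warn at setup, warn again at container startup, fault on job submission) could look roughly like the sketch below. It is written in Python for illustration only; the real setup packages are Perl scripts, and every name here (`SchedulerAdapter`, `JobFault`, `on_container_start`) is hypothetical rather than part of GRAM.

```python
import logging

log = logging.getLogger("scheduler_setup_sketch")


class JobFault(Exception):
    """Hypothetical fault returned to a client whose job cannot be run."""


class SchedulerAdapter:
    """Hypothetical stand-in for a scheduler adapter such as the PBS one."""

    def __init__(self, name):
        self.name = name
        self.configured = False  # stays False until setup succeeds

    def setup(self, scheduler_found):
        # Setup time: a missing scheduler is a warning, not a fatal error.
        if not scheduler_found:
            log.warning("%s: scheduler not found; adapter left unconfigured",
                        self.name)
            return False
        self.configured = True
        return True

    def on_container_start(self):
        # Container startup: repeat the warning so the deployer sees it again.
        if not self.configured:
            log.warning("%s: adapter is not set up; jobs will be rejected",
                        self.name)

    def submit(self, job_description):
        # Job submission: fail only this job with a fault.
        if not self.configured:
            raise JobFault(f"{self.name} adapter is not set up")
        return f"submitted {job_description} to {self.name}"
```

The point of keeping a single `configured` flag is that setup can be rerun later, once the scheduler is available, without reinstalling anything.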

>>>>>
From: JP Navarro <navarro@mcs.anl.gov>
Hey Stu,

I think this check doesn't make sense at software installation
time.  A warning might be appropriate, so the deployer knows that
something needs to be done later, or that setup needs to be rerun,
or that the component will look in that directory for files when
it runs.

I'm installing this software on IA64 and the machine where the
server_logs are located is IA32 so the only way I could satisfy
the requirement so the install doesn't fail is to create that
directory to make the installer happy.

JP

running /soft/globus-4.0.0-gcc-t1/setup/globus/setup-seg-pbs.pl..[ Changing to
/soft/globus-4.0.0-gcc-t1/setup/globus ]
/var/spool/pbs/server_logs does not exist.
Re-run this setup package with PBS_HOME environment variable
pointing to the directory containing the PBS server-logs subdirectory

<<<<<
------- Comment #1 From 2005-11-02 19:34:48 -------
I don't understand why installing PBS first is a problem.  If the server_logs
directory doesn't exist, then the SEG won't work.  That's why the check is there.
------- Comment #2 From 2005-11-03 09:14:32 -------
This comes up a lot when we generate binaries and when we do nightly testing.
Here are the two scenarios:

Testing.
We'd like to build the entire toolkit, then run all the tests.  That includes
building the scheduler adapters on machines that don't really have the
schedulers installed.  If the adapter spit out a WARNING and did not finalize
itself, gpt-verify would show us that it wasn't set up, but we would still be
able to continue on our merry way.  When it throws an ERROR, it kills the whole
postinstall stack behind it, so we can't finish postinstalling.

Binaries.
Same thing.  One site would like to build binaries to distribute to remote
sites.  Right now they have to make a source install, not test it, then
reinstall from binaries without the adapters they just built so their
postinstall can succeed.

In both cases I'm not suggesting that the package call the GPT finalize call to
say that it is done, just that it not stall the other postinstall scripts that
could succeed just because it couldn't.  I realize that nightly testing and
binaries are slightly exceptional use cases, and I would be able to deal with
this bug being closed.  It would just be very convenient for me if it worked
otherwise.  If that results in a net loss for our user community, then never
mind.
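
To make the request concrete, here is a minimal Python sketch of a postinstall driver in which a step that cannot be set up prints a warning and is left unfinalized, while the steps behind it still run. This is not how gpt-postinstall is implemented; `Result`, `run_postinstall`, and `unfinished` are hypothetical names used only to illustrate the idea.

```python
from enum import Enum


class Result(Enum):
    FINALIZED = "finalized"  # setup completed and marked done
    SKIPPED = "skipped"      # prerequisite missing; warned, not marked done


def run_postinstall(steps):
    """Run every step in order; a skipped step never stops the ones after it.

    `steps` is a list of (name, callable) pairs; each callable returns a Result.
    """
    report = {}
    for name, step in steps:
        result = step()
        if result is Result.SKIPPED:
            print(f"WARNING: {name} could not be set up; continuing")
        report[name] = result
    return report


def unfinished(report):
    # Roughly the information a later verification pass would surface.
    return [name for name, result in report.items()
            if result is not Result.FINALIZED]
```

A verification step run afterwards would then list only the skipped packages, which is the behaviour described for the testing and binaries scenarios.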

That said, we do have instances where people choose the PBS adapter in the
mistaken belief that it *provides* PBS.  While they can gpt-uninstall the
offending setup package, I don't think we have any instructions about how to do
that.  Such users tend to reinstall out of frustration.
------- Comment #3 From 2005-11-03 10:22:22 -------
Peter,

The Argonne TeraGrid cluster is an example of how this check doesn't make sense.  We have
two architectures (ia32 and ia64) and install PBS software on each separately, and then create
the local /var/spool/pbs directory on a single ia32 machine where we run the PBS server.

When we install Globus software we do it on machines that don't have the /var/spool/pbs
directory. As a matter of fact, it's impossible to install our ia64 software on the machine
that will run the PBS server (and has the /var/spool/pbs) because it's ia32.

This is just one scenario. Even on a homogeneous cluster Globus installation may happen
on a different machine than the PBS server, so this check really doesn't make sense.  This
should not be a software installation check. It should be a run-time requirement/check.

JP
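
A run-time version of the check could be as small as the Python sketch below, called when the service starts rather than when the software is installed. It assumes that PBS_HOME names the PBS spool directory and falls back to /var/spool/pbs, the path shown in the setup output quoted earlier; the function name is hypothetical and this is not the code in setup-seg-pbs.pl.

```python
import os


def pbs_server_logs_dir(environ=None):
    """Return the PBS server_logs directory if it exists, else None.

    Assumption: PBS_HOME points at the PBS spool directory, with
    /var/spool/pbs as the fallback when the variable is unset.
    """
    env = os.environ if environ is None else environ
    pbs_home = env.get("PBS_HOME", "/var/spool/pbs")
    candidate = os.path.join(pbs_home, "server_logs")
    # Checked on the machine that runs the service, not the install host.
    return candidate if os.path.isdir(candidate) else None
```

A caller would log a warning and leave the PBS adapter disabled when this returns None, instead of aborting.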
------- Comment #4 From 2005-11-08 13:11:20 -------
JP,

I still don't understand how you make GRAM work if the PBS log directories
aren't on the same machine where GRAM is installed.  I have to be missing
something, because getting SEG to work in your scenario sounds physically
impossible unless you have a shared filesystem or hacked GRAM to start the SEG
via ssh (or something similar).


Charles,

Why is gpt-postinstall being run when creating binary bundles?  I thought that
prohibited relocatability.


All that said, thinking about this more I realized that assuming we get around
to enabling SEG startup through ssh, this check will not make sense in that
scenario either and will have to be removed eventually.  I'm just trying to
understand the current complaints for my own sanity.
------- Comment #5 From 2005-11-08 13:22:56 -------
Peter,

Sorry I didn't explain it well.

The problem is that we install the software on one machine to NFS space, and then run
GRAM out of NFS on a different machine. So the underlying assumption that "the machine
where the software installation happens is the machine where the software will run" isn't
valid.
------- Comment #6 From 2005-12-16 10:38:45 -------
I will make the changes to the setup packages so that they do not error out,
but rather give warnings.
------- Comment #7 From 2006-02-06 10:00:47 -------
Created an attachment (id=841)
Patch to setup-globus-job-manager-pbs.pl to make not finding pbs a warning,
not an error
------- Comment #8 From 2006-02-06 10:01:30 -------
Created an attachment (id=842)
patch to setup-seg-pbs.pl to make not finding PBS a warning, not an error
------- Comment #9 From 2006-02-06 10:02:25 -------
Created an attachment (id=843)
patch to setup-globus-scheduler-provider-pbs.pl to make not finding pbs a
warning, not an error
------- Comment #10 From 2006-02-06 10:05:23 -------
I have attached patches to the various PBS jobmanager setup packages to make
the condition where PBS (or the PBS log file) is not found a non-fatal warning
(one that leaves the setup package still not set up).

These (or something similar) need to be committed to the globus_4_0_community
branch to be in the TeraGrid CTSSv3 -r3 installer, but I do not have commit
access to this part of CVS anymore.
------- Comment #11 From 2006-02-06 15:01:19 -------
I just applied these to the community branch.  They should probably be
applied to the 4 0 branch and trunk too.
------- Comment #12 From 2012-09-05 11:42:45 -------
Doing some Bugzilla cleanup...  Resolving old GRAM3 and GRAM4 issues that are
no longer relevant since we've moved on to GRAM5.  Also, we're now tracking
issues in JIRA.  Any new issues should be added here:

http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363