Bugzilla – Bug 3529
setup/postinstall fatal errors should be warnings
Last modified: 2012-09-05 11:42:45
As requested, these types of errors in the scheduler setup programs should be changed to warnings. Appropriate WARN messages should be displayed at setup time and when the container is started. A fault message should be returned for a submitted job when the GRAM service knows it is not properly set up. Copy the method used for the handling of mpirun in the Fork setup.
>>>>>
From: JP Navarro <navarro@mcs.anl.gov>
Hey Stu, I think this check doesn't make sense at software installation time. A warning might be appropriate, so the deployer knows that something needs to be done later, or that the setup needs to be rerun, or that the component will look in that directory for files when it runs. I'm installing this software on IA64, and the machine where the server_logs are located is IA32, so the only way I could satisfy the requirement and keep the install from failing is to create that directory to make the installer happy. JP
running /soft/globus-4.0.0-gcc-t1/setup/globus/setup-seg-pbs.pl..[ Changing to /soft/globus-4.0.0-gcc-t1/setup/globus ]
/var/spool/pbs/server_logs does not exist. Re-run this setup package with PBS_HOME environment variable pointing to the directory containing the PBS server-logs subdirectory
<<<<<
I don't understand why installing PBS first is a problem. If the server_logs directory doesn't exist, then the SEG won't work. That's why the check is there.
This comes up a lot when we generate binaries and when we do nightly testing. Here are the two scenarios:
Testing. We'd like to build the entire toolkit, then run all the tests. That includes building the scheduler adapters on machines that don't really have the schedulers installed. If the adapter printed a WARNING and did not finalize itself, we would see via gpt-verify that it wasn't set up, but would still be able to continue on our merry way. When it throws an ERROR, it kills the whole postinstall stack behind it, so we can't finish postinstalling.
Binaries. Same thing. One site would like to build binaries to distribute to remote sites. Right now they have to make a source install, not test it, then reinstall from binaries without the adapters they just built so their postinstall can succeed.
In both cases I'm not suggesting that the package make the GPT finalize call to say that it is done, just that it not stall the other postinstall scripts that could succeed just because it couldn't. I realize that nightly testing and binaries are slightly exceptional use cases, and I would be able to deal with this bug being closed. It would just be very convenient for me if it worked otherwise. If that results in a net loss for our user community, then never mind. That said, we do have instances where people choose the PBS adapter in the mistaken belief that it *provides* PBS. While they can gpt-uninstall the offending setup package, I don't think we have any instructions about how to do that. Such users tend to reinstall out of frustration.
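To make the requested behavior concrete, here is a minimal sketch of a postinstall driver that keeps going when one setup script fails. It is an illustration only, not how gpt-postinstall is implemented; the function name `run_postinstall` and the command-list input format are assumptions made for this example.

```python
import subprocess
import sys


def run_postinstall(commands):
    """Run every setup command in order and return the ones that failed.

    `commands` is a list of argument lists (a hypothetical input format).
    A failing command produces a warning instead of stopping the loop, so
    later setup scripts still get their chance to run.
    """
    not_set_up = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(
                f"WARNING: {' '.join(cmd)} exited with status "
                f"{result.returncode}; continuing with remaining scripts",
                file=sys.stderr,
            )
            not_set_up.append(cmd)
    return not_set_up
```

The returned list plays the role described above for gpt-verify: it shows which packages are still not set up, without blocking the ones that could succeed.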
Peter, The Argonne TeraGrid cluster is an example of how this check doesn't make sense. We have two architectures (ia32 and ia64) and install PBS software on each separately, and then create the local /var/spool/pbs directory on a single ia32 machine where we run the PBS server. When we install Globus software we do it on machines that don't have the /var/spool/pbs directory. As a matter of fact, it's impossible to install our ia64 software on the machine that will run the PBS server (and has the /var/spool/pbs) because it's ia32. This is just one scenario. Even on a homogeneous cluster Globus installation may happen on a different machine than the PBS server, so this check really doesn't make sense. This should not be a software installation check. It should be a run-time requirement/check. JP
JP, I still don't understand how you make GRAM work if the PBS log directories aren't on the same machine where GRAM is installed. I have to be missing something, because getting SEG to work in your scenario sounds physically impossible unless you have a shared filesystem or hacked GRAM to start the SEG via ssh (or something similar).
Charles, Why is gpt-postinstall being run when creating binary bundles? I thought that prohibited relocatability.
All that said, thinking about this more, I realized that assuming we get around to enabling SEG startup through ssh, this check will not make sense in that scenario either and will have to be removed eventually. I'm just trying to understand the current complaints for my own sanity.
Peter, Sorry I didn't explain it well. The problem is that we install the software on one machine to NFS space, and then run GRAM out of NFS on a different machine. So the underlying assumption that "the machine where the software installation happens is the machine where the software will run" isn't valid.
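A rough sketch of what moving the check from install time to run time could look like: warn when the container starts, and return a fault when a job is submitted to an adapter that is not set up. All names here (`PbsAdapter`, `SchedulerNotConfigured`, `startup_warnings`, `submit`) are hypothetical and are not taken from the GRAM code base.

```python
import os


class SchedulerNotConfigured(Exception):
    """Hypothetical fault returned for jobs sent to an unconfigured adapter."""


class PbsAdapter:
    """Illustrative stand-in for a scheduler adapter; not real GRAM code."""

    def __init__(self, log_dir):
        self.log_dir = log_dir

    def is_configured(self):
        # Checked on the machine where the service runs, not where it
        # was installed.
        return os.path.isdir(self.log_dir)

    def startup_warnings(self):
        """Messages to log when the container starts."""
        if self.is_configured():
            return []
        return [f"WARN: PBS log directory {self.log_dir} not found; "
                "PBS jobs will be rejected until this is fixed"]

    def submit(self, job):
        """Accept the job, or raise a fault if the adapter is not set up."""
        if not self.is_configured():
            raise SchedulerNotConfigured(
                f"PBS adapter is not set up: {self.log_dir} does not exist")
        return f"accepted {job}"
```

Because `is_configured` runs at startup and at submission, it sees the filesystem of the machine that actually runs GRAM, which is the point made above about NFS installs.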
I will make the changes to the setup packages so that they do not error out, but rather give warnings.
Created an attachment (id=841) Patch to setup-globus-job-manager-pbs.pl to make not finding PBS a warning, not an error
Created an attachment (id=842) Patch to setup-seg-pbs.pl to make not finding PBS a warning, not an error
Created an attachment (id=843) Patch to setup-globus-scheduler-provider-pbs.pl to make not finding PBS a warning, not an error
I have attached patches to the various PBS jobmanager setup packages to make the condition where PBS (or the PBS log file) is not found a non-fatal warning (one that leaves the setup package still not set up). These (or something similar) need to be committed to the globus_4_0_community branch to be in the TeraGrid CTSSv3 -r3 installer, but I do not have commit access to this part of CVS anymore.
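For readers without access to the attachments, the behavior described can be sketched roughly as follows. This is a Python illustration of the logic only, not the contents of the Perl patches; the default path and the warning wording come from the error message quoted in the original report, while the function names and the "finalize" placeholder are assumptions.

```python
import os
import sys

# Default taken from the error message quoted in the original report.
DEFAULT_PBS_HOME = "/var/spool/pbs"


def check_pbs_logs(env, warn=sys.stderr):
    """Return True if the PBS server_logs directory exists.

    When it is missing, print a warning and return False rather than
    aborting the setup run.
    """
    pbs_home = env.get("PBS_HOME", DEFAULT_PBS_HOME)
    log_dir = os.path.join(pbs_home, "server_logs")
    if os.path.isdir(log_dir):
        return True
    print(
        f"WARNING: {log_dir} does not exist. Re-run this setup package "
        "with PBS_HOME pointing to the directory containing the PBS "
        "server_logs subdirectory.",
        file=warn,
    )
    return False


def main(env=None):
    """Return the exit status for the setup step (0 in both cases)."""
    env = os.environ if env is None else env
    if check_pbs_logs(env):
        # Hypothetical placeholder for the real finalize step, which is
        # skipped when the check fails so the package stays not set up.
        print("setup complete")
    return 0
```

Returning 0 in both cases is what keeps the remaining postinstall scripts running; skipping the finalize step is what leaves the package marked as not set up.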
I just applied these to the community branch. They should probably be applied to the 4 0 branch and trunk too.
Doing some Bugzilla cleanup... Resolving old GRAM3 and GRAM4 issues that are no longer relevant since we've moved on to GRAM5. Also, we're now tracking issues in JIRA. Any new issues should be added here: http://jira.globus.org/secure/VersionBoard.jspa?selectedProjectId=10363