Bug 660 - NFS vs. local installation
Status: ASSIGNED
: Installation
Install
: unspecified
: All other
: P2 enhancement
: ---
Reported: 2003-01-30 20:34
Modified: 2009-03-13 15:01 (History)


Description From 2003-01-30 20:34:57
References:

<http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=135>
(This bug report was recently closed; in my opinion, it should not
have been.  I don't see an option to re-open it.)

<http://www-unix.globus.org/mail_archive/discuss/2003/01/msg00265.html>
(A description of the problem I recently posted to discuss@globus.org.)

It has been said that Globus 2.X is designed to be locally installed
(unlike Globus 1.X, which has the install/deploy mechanism -- overly
complex, but it worked).

Here at SDSC, most applications, including Globus, are installed
in a large NFS-mounted filesystem that's shared by several hundred
workstations.

In one possible scenario, I install Globus 2.2.3 in
/usr/local/apps/globus-2.2.3, which is on an NFS filesystem.
There are, say, 100 workstations (all with the same hardware and OS)
that need shared access to the Globus installation for the client
programs and libraries.  In addition, there are, say, 3 server systems
sharing the same Globus installation and providing Globus services
(gsigatekeeper, gsiftp, gris).

If I don't pay attention to the NFS issues, I get all three servers
trying to write to the same $GLOBUS_LOCATION/var/globus-gatekeeper.log
file.  I don't want three systems trying to write simultaneously to
the same log file, but the filesystem is configured to map "root" to
"nobody" *and* it's mounted read-only, so none of them have permission
to write to the log file anyway.

This is a well-known problem with a known workaround: make the
var subdirectory a symlink to a locally mounted filesystem, say,
/scratch/slocal/globus-2.2.3/var.  All three servers have to have the
exact same directory path for the symlink target (which is ok for SDSC,
but could be a problem for some).  Nothing under the var subdirectory
needs to be visible to the client workstations, so var can just be
a dangling symlink as seen from all systems other than the servers.
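
The workaround just described can be sketched roughly as follows. This is
only an illustration: localize_var is a made-up helper name, not part of
Globus, and the demo runs against scratch directories created with mktemp
that stand in for /usr/local/apps/globus-2.2.3 and /scratch/slocal. On a
real site the symlink would be created once, somewhere the NFS filesystem
is writable, and the local target directory on each server.

```shell
#!/bin/sh
# Sketch only: replace <installation>/var with a symlink to a local path.
# localize_var is a hypothetical helper, not a Globus tool.
set -e

localize_var() {
    gl=$1        # the shared installation ($GLOBUS_LOCATION)
    target=$2    # local path; must be identical on every server
    mkdir -p "$target"
    # If var is already a symlink, only the local target was needed.
    if [ -L "$gl/var" ]; then
        return 0
    fi
    # Keep whatever the installer put under var, then swap in the link.
    if [ -d "$gl/var" ]; then
        cp -Rp "$gl/var/." "$target/"
        rm -rf "$gl/var"
    fi
    ln -s "$target" "$gl/var"
}

# Demo against scratch directories standing in for the real paths.
demo=$(mktemp -d)
mkdir -p "$demo/globus-2.2.3/var"
touch "$demo/globus-2.2.3/var/globus-gatekeeper.log"
localize_var "$demo/globus-2.2.3" "$demo/slocal/globus-2.2.3/var"
ls -l "$demo/globus-2.2.3"
```

On a client that has no such local directory, the var link simply
dangles, which is harmless as long as nothing on the client reads it.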

If that were the only issue, I wouldn't mind, but we're not done yet.
There are several files under $GLOBUS_LOCATION/etc that need to be
distinct for each server system.  Charles Bacon has said that
the only such files are:

    globus-job-manager.conf
    grid-info-resource-ldif.conf
    grid-info-resource-register.conf
    grid-info.conf

At the time, I pointed out that the LDAP certificate and key also
needed to be distinct for each server, but they've since been moved to
/etc/grid-security/ldap .  I suspect that the set of files under etc
that need to be localized is a moving target, changing from one release
to the next.  Even if it doesn't change, it's not well documented,
and there's no direct support for this kind of thing in the Globus
installation procedures; all this stuff has to be done manually.

I've thought of making the entire $GLOBUS_LOCATION/etc directory a
symlink to a local directory, as I do for $G_L/var -- but unlike
$G_L/var, the $G_L/etc directory contains things that need to be
visible to the clients.  It's practical to replicate the etc directory
across the 3 server systems, but not across the 100 client systems.

Something else that nobody has mentioned so far: there's also a
$GLOBUS_LOCATION/tmp directory.  On the installations I've checked,
it just contains an empty subdirectory called "gram_job_state".
I don't know what it's used for, or whether it needs to be localized
for each server, or whether having it on a read-only filesystem is
going to cause problems.

The fact that we keep thinking of new instances of the problem
months after the initial response tells me that this area needs to
be cleaned up.

So, there are (at least) three relevant classes of files under
$GLOBUS_LOCATION:

1. Read-only files that are needed for both client and server systems,
   and can be shared by all systems.
   Examples:
      etc/* (with some exceptions)
      bin/*
      sbin/*
      libexec/*
      lib/*
      include/*

2. Read-only files that need to be localized for each server, which
   are not needed by clients.
   Examples:
      etc/globus-job-manager.conf
      etc/grid-info-resource-ldif.conf
      etc/grid-info-resource-register.conf
      etc/grid-info.conf

3. Writable files that need to be localized for each server, which
   are not needed by clients.
   Examples:
      var/globus-gatekeeper.log
      tmp/* (???)

If these classes of files could be separated into distinct directories,
it would go a long way towards making NFS installations easier.
(The distinction between classes 2 and 3 may not be important.)

My suggestion:

Step 0: Given the current architecture, clearly document in the Admin
Guide (<http://www.globus.org/gt2.2/admin/guide-install.html>), in a
step-by-step series of instructions, what needs to be done to share a
Globus installation on a (possibly read-only) NFS-mounted filesystem.
This would include an exhaustive list of which files under $G_L/etc
need to be localized.

Step 1: Move all the shareable files in $GLOBUS_LOCATION/etc into
the $GLOBUS_LOCATION/share directory.  (The name "share" is highly
suggestive, don't you think?)  The $GLOBUS_LOCATION/etc directory would
then contain *only* read-only files that are not needed by clients.

Given this arrangement, I could do a Globus installation, then copy
the etc, var, and possibly tmp directories to a local filesystem on
each server, and replace the original etc, var, and tmp directories
with symlinks (i.e., extend what I do now for $G_L/var to $G_L/etc
and $G_L/tmp).  I would no longer have to keep track of the arbitrary
subset of files under $G_L/etc that need to be localized.

Step 2: Make the Globus installation procedure handle this
automatically.  If I set a new environment variable, $GLOBUS_LOCAL_DIR,
pointing to a directory on a local filesystem, the installation
procedure would automatically create the proper subdirectories as
symlinks.  Now I don't even have to remember that var, etc, and tmp
are the directories I need to set up; the installation procedure would
handle this for me.  (This could be done either in gpt-{build,install}
or in gpt-postinstall.)  If $GLOBUS_LOCAL_DIR is not set, everything
is done the same way it is now.
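
A rough sketch of what Step 2 might do after installation is shown below.
Nothing in it is an existing Globus feature: $GLOBUS_LOCAL_DIR is the
variable proposed above, link_local_dirs is a hypothetical function name,
and the sketch assumes Step 1 has already been done, so that etc holds
only server-specific files. The demo uses scratch directories from mktemp
in place of a real installation.

```shell
#!/bin/sh
# Sketch of the proposed behaviour; not an existing Globus feature.
set -e

link_local_dirs() {
    gl=$1    # the installation directory ($GLOBUS_LOCATION)
    # As proposed: if GLOBUS_LOCAL_DIR is not set, change nothing.
    if [ -z "${GLOBUS_LOCAL_DIR:-}" ]; then
        return 0
    fi
    for d in etc var tmp; do
        mkdir -p "$GLOBUS_LOCAL_DIR/$d"
        # Already a symlink: a previous run handled this directory.
        if [ -L "$gl/$d" ]; then
            continue
        fi
        # Carry existing contents over to the local copy.
        if [ -d "$gl/$d" ]; then
            cp -Rp "$gl/$d/." "$GLOBUS_LOCAL_DIR/$d/"
            rm -rf "$gl/$d"
        fi
        ln -s "$GLOBUS_LOCAL_DIR/$d" "$gl/$d"
    done
}

# Demo: a fake installation with etc, var, tmp, and one shared directory.
demo=$(mktemp -d)
mkdir -p "$demo/gl/etc" "$demo/gl/var" "$demo/gl/tmp/gram_job_state" "$demo/gl/bin"
touch "$demo/gl/etc/globus-job-manager.conf"
GLOBUS_LOCAL_DIR=$demo/local
link_local_dirs "$demo/gl"
ls -l "$demo/gl"
```

Shared directories such as bin are left alone; only etc, var, and tmp
become links into the local directory.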

There are other possible approaches, including the Globus-1.1.X style
"deploy" directory.  I've tried to suggest an approach that's as
close as possible to the current architecture.
------- Comment #1 From 2003-02-24 10:37:57 -------
This guide would be a good idea.  Right now, however, we are responding to
cluster/NFS queries with a recommendation to just install into NFS, have a
single gatekeeper, and use ganglia on the backend.  That particular setup
requires no extra steps.
------- Comment #2 From 2003-02-24 13:19:53 -------
This problem report has nothing to do with clusters.

The situation I'm facing involves multiple client systems and a few server
systems sharing a single Globus installation.
------- Comment #3 From 2003-02-24 13:35:28 -------
Understood.  I was not attempting to address all of the needs in your response
with my cluster response.  A common case of what you're describing is a cluster
install, and there is a good way to do that without modification.  The rest will
have to wait on more documentation.  When that exists, this bug will move from
assigned to fixed.
------- Comment #4 From 2003-02-24 15:26:29 -------
Sorry, but your response really didn't address anything I was asking about.
I understand how to do installations on clusters.

The current procedure for installing Globus on a shared NFS-mounted filesystem
is poorly documented, confusing, and clunky.  Making it merely confusing
and clunky would be an improvement, but not enough of one to justify closing
this bug report.

In the meantime, I could really use a definitive list of which directories
and files need to be localized for each server system and are not needed by
client systems.  (I'm assuming that client-only systems will not be bothered
if these files are missing.)

So far, I know about the $G_L/var directory and the following files under
$G_L/etc:

    globus-job-manager.conf
    grid-info-resource-ldif.conf
    grid-info-resource-register.conf
    grid-info.conf

Is this a complete list?

In particular, please provide information about the $G_L/tmp directory.
What is it for?  Does it need to be writeable by server systems?  Does
it need to be visible to client systems?
------- Comment #5 From 2003-03-25 18:17:04 -------
I understand that there needs to be a distinct $GLOBUS_LOCATION/tmp directory
for each server system (for the tmp/gram_job_state subdirectory), so add that
to the list.

Also (though this isn't strictly a Globus issue), if you install GSI-OpenSSH,
the $GLOBUS_LOCATION/etc/ssh directory needs to be *partially* localized for
each system running the sshd.  In particular, there need to be distinct copies
of the key files (6 of them) -- but the ssh_config file needs to be visible on
all systems running the ssh client.  I'm fairly sure that the moduli and
ssh_prng files can be shared among all client and server systems, as can the
ssh_config and sshd_config files unless there's a need for system-specific
customization.

Will somebody please take a look at this and tell me whether there's anything
I haven't thought of?
------- Comment #6 From 2003-03-27 18:21:33 -------
Further investigation shows that the ssh key files may not be a problem.
I've found that I can just create them as symbolic links to the system
ssh key files.

For example, if the system ssh keys are in the /etc/ssh directory, I can
do something like the following:

    cd $GLOBUS_LOCATION/etc/ssh
    rm -f *key*
    ln -s /etc/ssh/*key* .

Even if the $GLOBUS_LOCATION/etc/ssh directory is shared across multiple
systems (via an NFS mount), the symlinks will correctly point to the local
keys on each system.
------- Comment #7 From 2003-05-30 04:22:51 -------
It may be worse than I thought.

I have an NMI 2.1 installation on a shared NFS filesystem, visible
to numerous client machines and, so far, to a single server machine.
The server is named giis (it happens to be a GIIS server, but that's
beside the point for now).  One of the client machines, which I'll 
use as an example, is elmak.

I want to install some Globus services on another system, orion.

I temporarily shut down services on giis and replaced some files and
directories (based on the list in this bug report) with symlinks into
a directory under /var/globus, which is on a local non-NFS filesystem.

On giis, I created /var/globus and copied the existing files to the
appropriate locations.  On orion, I did the same thing, editing the
*.conf files to refer to the correct hostname.

Now the following files and directories are symlinks into /var/globus:

    etc/globus-job-manager.conf
    etc/grid-info-resource-ldif.conf
    etc/grid-info-resource-register.conf
    etc/grid-info.conf
    tmp/
    var/

On giis and orion, all these files and directories exist on local disk,
which is what I need for the services to work properly.

On elmak (and other client machines), I have not created a /var/globus
directory, and the linked files do not exist.

Now I try to do a grid-info-search:

elmak% grid-info-search -h giis.npaci.edu -b 'Mds-Vo-name=npaci, o=Grid' -x -nowrap '(objectClass=MdsHost)' dn
/usr/local/apps/nmi-2.1/bin/grid-info-search: /usr/local/apps/nmi-2.1/etc/grid-info.conf: not found

Apparently $GLOBUS_LOCATION/etc/grid-info.conf needs to be visible
on each client system, and must be unique for each server system.
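
One way to catch this kind of breakage on a client before users do is to
list the symlinks under the installation whose targets are missing on that
machine. The sketch below is illustrative only: list_dangling is a
hypothetical helper, and the demo builds a scratch tree with mktemp rather
than touching /usr/local/apps/nmi-2.1. Whether a given dangling link
matters still has to be judged by hand, since var and tmp are expected to
dangle on clients.

```shell
#!/bin/sh
# Sketch only: report symlinks whose targets do not exist on this machine.
set -e

list_dangling() {
    # test -e follows symlinks, so it fails when the target is missing.
    find "$1" -type l ! -exec test -e {} \; -print
}

# Demo: one ordinary file and one link into a directory that is absent here.
demo=$(mktemp -d)
mkdir -p "$demo/nmi-2.1/etc"
touch "$demo/nmi-2.1/etc/grid-info-slapd.conf"
ln -s "$demo/no-such-local-dir/grid-info.conf" "$demo/nmi-2.1/etc/grid-info.conf"

dangling=$(list_dangling "$demo/nmi-2.1")
echo "$dangling"
```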

In fact, it may need to be unique on each client system.  After I put
everything back the way it was (etc/*.conf on the shared filesystem),
I ran "grid-info-search" with no arguments on elmak; it gave me
information about giis.sdsc.edu.  (I suppose that makes some sense,
since there is no MDS service on elmak.)

Hmm.  Now that I think about this, my guess is that grid-info.conf
is used only by the client.  If my guess is correct, I can leave
grid-info.conf on the shared filesystem (i.e., it shouldn't be on the  
list).  The only drawback of this is that all grid-info-search queries 
default to the information specified in the shared grid-info.conf file,
rather than the local host.
------- Comment #8 From 2003-05-30 22:29:19 -------
I've made a little more progress figuring this stuff out.

It seems that grid-info.conf does not need to be localized.  The main
effect of sharing a single copy among all clients is that the host
option for grid-info-search is going to be the same for all client
systems.  I don't think that's a problem.  (I usually specify a
host anyway).

However, there seem to be two more files that need to be localized,
beyond what we've figured out so far.

Recall that I had originally installed Globus (actually NMI 2.1) on
giis.sdsc.edu (aka giis.npaci.edu), which is the NPACI GIIS server;
I'm now trying to set up some Globus services (primarily gridftp)
on orion.sdsc.edu.

The files grid-info-slapd.conf and grid-info-site-policy.conf both
contain settings that are appropriate only for the GIIS server (I
don't want GRIS's reporting to orion, for example).

So, the current set of things that need to be localized *seems* to be:

    the var directory
    the tmp directory
    etc/globus-job-manager.conf
    etc/grid-info-resource-ldif.conf
    etc/grid-info-resource-register.conf
    etc/grid-info-site-policy.conf
    etc/grid-info-slapd.conf

(but *not* etc/grid-info.conf).
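
For illustration, the list above could be applied with something like the
following sketch. The item list is copied from this comment and may well
be incomplete; localize_items is a hypothetical helper (it is not the
gpt-wizard code), and the sketch assumes the local directory starts out
empty. The demo uses scratch directories from mktemp instead of a real
installation and /var/globus.

```shell
#!/bin/sh
# Sketch only: move the server-specific items to local disk and leave
# symlinks behind in the shared installation.
set -e

# List taken from the comment above; it may be incomplete.
ITEMS="var tmp
etc/globus-job-manager.conf
etc/grid-info-resource-ldif.conf
etc/grid-info-resource-register.conf
etc/grid-info-site-policy.conf
etc/grid-info-slapd.conf"

localize_items() {
    gl=$1           # shared installation ($GLOBUS_LOCATION)
    local_dir=$2    # local, non-NFS directory (assumed empty to begin with)
    for item in $ITEMS; do
        dest=$local_dir/$item
        mkdir -p "$(dirname "$dest")"
        # Skip items that were already replaced by a symlink.
        if [ -L "$gl/$item" ]; then
            continue
        fi
        # Items missing from this installation are simply skipped.
        if [ -e "$gl/$item" ]; then
            mv "$gl/$item" "$dest"
            ln -s "$dest" "$gl/$item"
        fi
    done
}

# Demo: a fake installation containing some of the listed items.
demo=$(mktemp -d)
mkdir -p "$demo/gl/etc" "$demo/gl/var" "$demo/gl/tmp/gram_job_state"
for f in globus-job-manager.conf grid-info-slapd.conf grid-info.conf; do
    touch "$demo/gl/etc/$f"
done
localize_items "$demo/gl" "$demo/var-globus"
ls -l "$demo/gl" "$demo/gl/etc"
```

Note that etc/grid-info.conf is deliberately not in the list, so it stays
on the shared filesystem as a regular file.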

Let me emphasize again that organizing everything so that shared
files are in one subdirectory and server-only files are in another
subdirectory would have saved me a great deal of time.

(My gpt-wizard tool, <http://www.sdsc.edu/~kst/gpt-wizard/>, optionally
handles this stuff automatically.  The current version handles the set
of directories and files that I thought were needed as of a couple
of months ago; I'll update it once my understanding stabilizes.) 
------- Comment #9 From 2003-08-29 16:30:50 -------
tmp/gram_job_state is needed, and is _not_ automatically recreated by the 
jobmanager. If $G_L/tmp is symlinked to /tmp and /tmp is cleaned during boot, 
make sure that /tmp/gram_job_state is added again. 
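
A minimal sketch of such a boot-time step is shown below. The function
name ensure_job_state_dir is hypothetical, and the ownership and mode the
jobmanager expects are not stated in this report, so the sketch leaves
them at the mkdir defaults. The demo uses a scratch directory in place of
/tmp.

```shell
#!/bin/sh
# Sketch only: recreate gram_job_state under the directory that
# $GLOBUS_LOCATION/tmp points to. Safe to run on every boot.
set -e

ensure_job_state_dir() {
    # mkdir -p succeeds whether or not the directory already exists.
    mkdir -p "$1/gram_job_state"
}

# Demo: a scratch directory standing in for /tmp after a boot-time clean.
demo=$(mktemp -d)
mkdir -p "$demo/tmp"
ensure_job_state_dir "$demo/tmp"
ensure_job_state_dir "$demo/tmp"   # second call is a no-op
ls -ld "$demo/tmp/gram_job_state"
```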
------- Comment #10 From 2009-03-13 15:01:49 -------
Reassigning to gt-dev@globus.org pending triage of these bugs following my
departure