Bug 1928 - Strange problems: Busy wait and address in use
Status: RESOLVED FIXED
Product: GridFTP
Component: GridFTP
Version: 3.9.2
Platform: PC Linux
Importance: P3 normal
Target Milestone: ---
Reported: 2004-09-14 21:18 by
Modified: 2004-10-25 09:46 (History)



Description From 2004-09-14 21:18:40
I've installed the gridftp server from 3.9.2.  It calls itself 0.6, and I
compiled it with the install script (unthreaded).

I'm getting some strange problems.
I was trying to run it as a standalone server (not through xinetd), so I had a
config file which specifies the port as 2811.  Whenever I tried to connect to it,
it would send an end-of-file, then the server would go into an endless busy loop,
eating up all the CPU time.  This doesn't happen if I run it from the command
line with no port setting.

The command I used was this:
globus-url-copy gsiftp://grid.host.org:2811/tmp/blah file:///tmp/blah

The output from the client was:
error: an end-of-file was reached
globus_xio: An end of file occurred

The logfile says:
Wed Sep 15 11:56:34 2004 :: Could not start server:
globus_xio: globus_l_xio_tcp_create_listener failed.
globus_xio: A system call failed: Address already in use

(There's only this one server running on this port.  Nothing else has it.)
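One way to get "Address already in use" with only a single server on the port is for the same program to try to create a second listener on a port it is already listening on. The trace later in this report shows the forked child exiting with status 1, which would be consistent with that, but whether the server really does this is an assumption, not something the logs confirm. A minimal Python sketch of the symptom (the helper name `try_listen` is made up for illustration):

```python
import errno
import socket

def try_listen(port, host="127.0.0.1"):
    """Try to bind and listen on (host, port).

    Returns (socket, None) on success or (None, errno) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(5)
        return s, None
    except OSError as e:
        s.close()
        return None, e.errno

# Port 0 asks the kernel for any free port, so the sketch does not
# depend on 2811 being available.
first, err1 = try_listen(0)
port = first.getsockname()[1]

# A second listener on the same port fails while the first is still open.
second, err2 = try_listen(port)
print("first listener error: ", err1)
print("second listener error:", errno.errorcode.get(err2, err2))

first.close()
```

If something like this is the cause, the error would come from the server itself rather than from another process holding the port.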

The system call trace (strace output) going into the busy loop looks like this:

select(9, [3 8], [], NULL, NULL)        = 1 (in [8])
accept(8, 0, NULL)                      = 9
fcntl64(9, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1095213394, 761391}, NULL) = 0
fcntl64(9, F_SETFD, FD_CLOEXEC)         = 0
rt_sigaction(SIGCHLD, {0xba787a, [], 0}, {SIG_DFL}, 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf6ffc0c8) = 6910
close(9)                                = 0
fcntl64(8, F_GETFL)                     = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(8, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1095213394, 770381}, NULL) = 0
gettimeofday({1095213394, 770593}, NULL) = 0
select(9, [3 8], [], NULL, NULL)        = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(4, "\0", 1)                       = 1
sigreturn()                             = ? (mask now [])
gettimeofday({1095213394, 874179}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 1 (in [3], left {0, 0})
read(3, "\0", 64)                       = 1
gettimeofday({1095213394, 875942}, NULL) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG) = 6910
gettimeofday({1095213394, 876491}, NULL) = 0
gettimeofday({1095213394, 876691}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 877538}, NULL) = 0
gettimeofday({1095213394, 877736}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 878594}, NULL) = 0
gettimeofday({1095213394, 878794}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 879693}, NULL) = 0
gettimeofday({1095213394, 879901}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 881179}, NULL) = 0
gettimeofday({1095213394, 881399}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 882245}, NULL) = 0
gettimeofday({1095213394, 882446}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 883298}, NULL) = 0
gettimeofday({1095213394, 883493}, NULL) = 0
select(9, [3 8], [], NULL, {0, 0})      = 0 (Timeout)
gettimeofday({1095213394, 884349}, NULL) = 0

And so forth, endlessly.
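The trace looks like the common wakeup-pipe pattern: the SIGCHLD handler writes a byte to fd 4, select() reports fd 3 readable, the byte is read, and waitpid() reaps the child. What goes wrong afterwards is that select() keeps being called with a {0, 0} timeout even though nothing is pending. The following Python sketch reproduces only the healthy part of that sequence; it is an illustration using Python's `signal.set_wakeup_fd`, not the server's actual code, and the function name is hypothetical. It needs a Unix system and must run in the main thread.

```python
import os
import select
import signal

def reap_child_via_wakeup_pipe():
    """Fork a child that exits with status 1, learn of its exit through a
    wakeup pipe, and reap it.

    Returns (exit_status, pipe_still_readable_after_drain).
    """
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)  # set_wakeup_fd requires a non-blocking fd
    # A Python-level handler must be installed for the wakeup fd to be written.
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_fd = signal.set_wakeup_fd(w)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(1)  # child: fail immediately, like the child in the trace

        # Parent: sleep in select() until the signal byte arrives
        # (5 second safety timeout so the sketch cannot hang).
        ready, _, _ = select.select([r], [], [], 5.0)
        if ready:
            os.read(r, 64)  # drain the wakeup byte

        # The trace uses waitpid(-1, WNOHANG); a blocking wait on the known
        # pid is used here to keep the sketch deterministic.
        _, status = os.waitpid(pid, 0)

        # With the pipe drained, a zero-timeout poll finds nothing to do.
        # A loop that keeps polling like this instead of blocking will spin.
        ready_again, _, _ = select.select([r], [], [], 0)
        return os.WEXITSTATUS(status), bool(ready_again)
    finally:
        signal.set_wakeup_fd(old_fd)
        signal.signal(
            signal.SIGCHLD,
            signal.SIG_DFL if old_handler is None else old_handler,
        )
        os.close(r)
        os.close(w)

exit_status, still_readable = reap_child_via_wakeup_pipe()
print("child exit status:", exit_status)
print("pipe readable after drain:", still_readable)
```

After the child has been reaped, the event loop would be expected to go back to a blocking select() (NULL timeout), as it did before the first connection.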

Another problem is mentioned above:
If I specify a port in a config file, then on every connection attempt, the
server logs:
Wed Sep 15 11:56:34 2004 :: Could not start server:
globus_xio: globus_l_xio_tcp_create_listener failed.
globus_xio: A system call failed: Address already in use

This doesn't happen if I start it up and let it choose its own random port.  In
that case I get a gsi error:
error: an authorization operation failed
globus_gsi_gssapi: Authorization denied: The name of the remote entity
(/C=AU/O=ORGFOO/OU=BARUNIT/CN=host/grid.host.org), and the expected name for the
remote entity (/C=AU/O=ORGFOO/CN=Damon Smith) do not match

I also don't understand why globus-url-copy would want _my_ DN as the name of
the remote entity, but that may be an unrelated issue.
------- Comment #1 From 2004-09-22 00:15:27 -------
Boy this is one _unstable_ piece of code:

I reinstalled from the latest gt3.9.2-wsrf-source-installer, and the gridftp
server now works, but has the same "cpuburn" problem.

Test 1:

If I run the server as root in daemon mode (-s here, but also with -S):
/opt/globus4/sbin/globus-gridftp-server -p 2811 -d 3 -s
And as a user:
globus-url-copy file:///home/damon/bin/textfile gsiftp://localhost/home/damon/testfile33

It transfers the file ok, then the server goes into the previously mentioned loop.
(This command works fine and does nothing weird when I replace localhost with
another box running gt3.2.1.)

After this, I can transfer another file ok, but it takes a long time, because
the cpu is being completely used up by the parent process.  After that, the
parent continues to take almost all cpu time as before.
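To put a rough number on why the second transfer is slow, here is a small Python sketch comparing a select() loop with a zero timeout (what the parent does in the trace) against one that lets the kernel sleep. The function name, the pipe standing in for the server's descriptors, and the 0.2 second measurement window are all arbitrary choices for illustration.

```python
import os
import select
import time

def run_select_loop(timeout, wall_seconds=0.2):
    """Call select() repeatedly for about wall_seconds of wall-clock time.

    Returns (number_of_select_calls, cpu_seconds_consumed).
    """
    r, w = os.pipe()  # an idle pipe: nothing ever becomes readable
    calls = 0
    cpu_start = time.process_time()
    deadline = time.monotonic() + wall_seconds
    try:
        while time.monotonic() < deadline:
            select.select([r], [], [], timeout)
            calls += 1
    finally:
        os.close(r)
        os.close(w)
    return calls, time.process_time() - cpu_start

# Zero timeout: select() returns at once, so the loop spins.
polling_calls, polling_cpu = run_select_loop(0)

# Positive timeout: the process sleeps in the kernel between wakeups.
sleeping_calls, sleeping_cpu = run_select_loop(0.05)

print("zero timeout: %d calls, %.3f s CPU" % (polling_calls, polling_cpu))
print("50 ms timeout: %d calls, %.3f s CPU" % (sleeping_calls, sleeping_cpu))
```

The polling loop makes far more calls and uses close to the whole window in CPU time, which matches the parent process competing with the child that is doing the actual transfer.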
------- Comment #2 From 2004-09-22 10:24:55 -------
I haven't witnessed this behavior.  What platform are you running this on?  
I've created an updated package that can be installed with this script:
http://www-unix.mcs.anl.gov/~mlink/gridftp_server-09-03-build.sh which fixes
most of the known problems with the 3.9.2 released (alpha) server.  Please let
me know if you run into the same problems, though I'm not sure that much of the
code relating to the connect/accept process has changed.

Mike
------- Comment #3 From 2004-10-25 09:46:49 -------
This should no longer be an issue.  Please reopen this bug if you experience
similar problems with the upcoming 3.9.3 release.

Mike