Bugzilla – Bug 1928
Strange problems: Busy wait and address in use
Last modified: 2004-10-25 09:46:49
I've installed the gridftp server from 3.9.2. It calls itself 0.6, and I compiled it with the install script (unthreaded). I'm getting some strange problems.

I was trying to run it as a standalone server (not through xinetd), so I had a config file which specifies the port as 2811. If I tried to connect to it, it would send an end-of-file, then the server would go into an endless busy loop, eating up all the CPU time. This doesn't happen if I run it from the command line with no port setting.

The command I used was this:

    globus-url-copy gsiftp://grid.host.org:2811/tmp/blah file:///tmp/blah

The output from the client was:

    error: an end-of-file was reached
    globus_xio: An end of file occurred

The logfile says:

    Wed Sep 15 11:56:34 2004 :: Could not start server:
    globus_xio: globus_l_xio_tcp_create_listener failed.
    globus_xio: A system call failed: Address already in use

(There's only this one server running on this port. Nothing else has it.)

The strace output going into the busy loop looks like this:

    select(9, [3 8], [], NULL, NULL) = 1 (in [8])
    accept(8, 0, NULL) = 9
    fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    gettimeofday({1095213394, 761391}, NULL) = 0
    fcntl64(9, F_SETFD, FD_CLOEXEC) = 0
    rt_sigaction(SIGCHLD, {0xba787a, [], 0}, {SIG_DFL}, 8) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf6ffc0c8) = 6910
    close(9) = 0
    fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
    fcntl64(8, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    gettimeofday({1095213394, 770381}, NULL) = 0
    gettimeofday({1095213394, 770593}, NULL) = 0
    select(9, [3 8], [], NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    write(4, "\0", 1) = 1
    sigreturn() = ? (mask now [])
    gettimeofday({1095213394, 874179}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 1 (in [3], left {0, 0})
    read(3, "\0", 64) = 1
    gettimeofday({1095213394, 875942}, NULL) = 0
    waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG) = 6910
    gettimeofday({1095213394, 876491}, NULL) = 0
    gettimeofday({1095213394, 876691}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 877538}, NULL) = 0
    gettimeofday({1095213394, 877736}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 878594}, NULL) = 0
    gettimeofday({1095213394, 878794}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 879693}, NULL) = 0
    gettimeofday({1095213394, 879901}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 881179}, NULL) = 0
    gettimeofday({1095213394, 881399}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 882245}, NULL) = 0
    gettimeofday({1095213394, 882446}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 883298}, NULL) = 0
    gettimeofday({1095213394, 883493}, NULL) = 0
    select(9, [3 8], [], NULL, {0, 0}) = 0 (Timeout)
    gettimeofday({1095213394, 884349}, NULL) = 0

And so forth, endlessly.

Another problem is mentioned above: if I specify a port in a config file, then on every connection attempt the server logs:

    Wed Sep 15 11:56:34 2004 :: Could not start server:
    globus_xio: globus_l_xio_tcp_create_listener failed.
    globus_xio: A system call failed: Address already in use

This doesn't happen if I start it up and let it choose its own random port. In that case I get a GSI error:

    error: an authorization operation failed
    globus_gsi_gssapi: Authorization denied: The name of the remote entity (/C=AU/O=ORGFOO/OU=BARUNIT/CN=host/grid.host.org), and the expected name for the remote entity (/C=AU/O=ORGFOO/CN=Damon Smith) do not match

I also don't understand why globus-url-copy would want _my_ DN as the name of the remote entity, but that may be an unrelated issue.
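For readers trying to reproduce the "Address already in use" message outside of Globus: the error is what the kernel returns when a second listener is bound to a TCP port that already has one. The sketch below only demonstrates that mechanism with plain sockets. Whether the forked child in the trace above really tries to create the listener a second time is an assumption, not something the logs confirm, and the helper name try_listen is made up for this illustration.

```python
import errno
import socket


def try_listen(port):
    """Try to create a TCP listener on 127.0.0.1:port.

    Returns (socket, 0) on success or (None, errno) on failure.
    Hypothetical helper for illustration; not part of any Globus API.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(5)
        return sock, 0
    except OSError as exc:
        sock.close()
        return None, exc.errno


# First listener: port 0 asks the kernel for any free port.
parent, parent_err = try_listen(0)
port = parent.getsockname()[1]

# Second listener on the same port while the first is still open,
# as a child process might do if it re-applied a configured port.
second, second_err = try_listen(port)
print(errno.errorcode.get(second_err))

parent.close()
```

If the same failure shows up only when a port is configured and never with a kernel-chosen port, that would be consistent with the port setting being applied twice, but the thread does not establish this.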
Boy, this is one _unstable_ piece of code: I reinstalled from the latest gt3.9.2-wsrf-source-installer, and the gridftp server now works, but has the same "cpuburn" problem.

Test 1: If I run the server as root in daemon mode (-s here, but also with -S):

    /opt/globus4/sbin/globus-gridftp-server -p 2811 -d 3 -s

And as a user:

    globus-url-copy file:///home/damon/bin/textfile gsiftp://localhost/home/damon/testfile33

It transfers the file OK, then the server goes into the previously mentioned loop. (This command works fine and does nothing weird when I replace localhost with another box running gt3.2.1.)

After this, I can transfer another file OK, but it takes a long time, because the CPU is being completely used up by the parent process. After that, the parent continues to take almost all CPU time as before.
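To make the "cpuburn" symptom concrete: in the trace, select() switches from a NULL timeout (block until something happens) to a {0, 0} timeout (return immediately) after the SIGCHLD is handled, and then keeps being called with nothing to do. The sketch below is not the server's event loop. It is a minimal stand-alone comparison, assuming a POSIX system, of how often a loop wakes up when it polls an idle descriptor with a zero timeout versus when it blocks. The function name count_wakeups and the 50 ms duration are arbitrary choices for the illustration.

```python
import os
import select
import time


def count_wakeups(poll, duration=0.05):
    """Call select() on an idle pipe for `duration` seconds.

    poll=True uses a zero timeout, like the {0, 0} calls in the trace.
    poll=False lets select() sleep until the deadline.
    Returns the number of select() calls made.
    """
    read_fd, write_fd = os.pipe()
    calls = 0
    deadline = time.monotonic() + duration
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # Nothing is ever written to the pipe, so select() can only
            # return because its timeout expired.
            select.select([read_fd], [], [], 0 if poll else remaining)
            calls += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return calls


polling = count_wakeups(poll=True)
blocking = count_wakeups(poll=False)
print("zero-timeout calls:", polling)
print("blocking calls:", blocking)
```

The zero-timeout loop makes far more calls in the same interval, which is the pattern that shows up as a process pinned at full CPU while apparently idle.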
I haven't witnessed this behavior. What platform are you running this on?

I've created an updated package that can be installed with this script:

    http://www-unix.mcs.anl.gov/~mlink/gridftp_server-09-03-build.sh

which fixes most of the known problems with the 3.9.2 released (alpha) server. Please let me know if you run into the same problems, though I'm not sure that much of the code relating to the connect/accept process has changed.

Mike
This should no longer be an issue. Please reopen this bug if you experience similar problems with the upcoming 3.9.3 release.

Mike