Bugzilla – Bug 2194
globus-libc-hostname hangs
Last modified: 2004-11-04 12:54:00
When I run the threaded flavor of globus-libc-hostname it hangs. This is a show stopper because it means gpt-postinstall hangs, so I can't finish my installation. Here is a stack trace of all of the threads:

(gdb) thread apply all where

Thread 3 (process 686 thread 0xd03):
#0  0x90016f48 in semaphore_wait_signal_trap ()
#1  0x9000e790 in _pthread_cond_wait ()
#2  0x000cfc20 in globus_cond_wait (cv=0xe11b0, mut=0xe1184) at globus_thread_pthreads.c:928
#3  0x000d0730 in globus_l_thread_pool_thread_start (user_arg=0x400d00) at globus_thread_pool.c:247
#4  0x000cee58 in thread_starter (temparg=0x1800220) at globus_thread_pthreads.c:497
#5  0x900246e8 in _pthread_body ()

Thread 2 (process 686 thread 0xb07):
#0  0x90016f48 in semaphore_wait_signal_trap ()
#1  0x9000e790 in _pthread_cond_wait ()
#2  0x000cfc20 in globus_cond_wait (cv=0xe0090, mut=0xe0064) at globus_thread_pthreads.c:928
#3  0x000af638 in globus_l_callback_thread_signal_poll (user_arg=0x0) at globus_callback_threads.c:2766
#4  0x000cee58 in thread_starter (temparg=0x1800210) at globus_thread_pthreads.c:497
#5  0x900246e8 in _pthread_body ()

Thread 1 (process 686 thread 0x703):
#0  0x90016f48 in semaphore_wait_signal_trap ()
#1  0x9000e790 in _pthread_cond_wait ()
#2  0x000cfc20 in globus_cond_wait (cv=0xe0090, mut=0xe0064) at globus_thread_pthreads.c:928
#3  0x000ab448 in globus_l_callback_deactivate () at globus_callback_threads.c:532
#4  0x000c119c in globus_module_deactivate (module_descriptor=0xdf020) at globus_module.c:300
#5  0x000b0bb4 in globus_l_common_deactivate () at globus_common.c:148
#6  0x000c119c in globus_module_deactivate (module_descriptor=0xdf084) at globus_module.c:300
#7  0x00002d60 in main () at globus_libc_hostname.c:29
Looks like I can reproduce this bug on a Linux system also. For some reason, I'm not having the gpt-postinstall problem, but that might be because sbin/globus-hostname calls the non-threaded version. I guess it has to do with the order the installer script builds the bundles.

System: Debian Sarge x86

Starting program: /nfs/software/gt4-testgrid/install/libexec/gcc32dbgpthr/shared/globus-libc-hostname
[Thread debugging using libthread_db enabled]
[New Thread 1075499776 (LWP 5239)]
[New Thread 1083890608 (LWP 5242)]
[New Thread 1092279216 (LWP 5243)]
ned-0.isi.edu

Program received signal SIG32, Real-time event 32.
[Switching to Thread 1083890608 (LWP 5242)]
0x40067115 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
(gdb) thread apply all where

Thread 3 (Thread 1092279216 (LWP 5243)):
#0  0x40067115 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1  0x40042828 in globus_cond_wait (cv=0x4004e83c, mut=0x4004e824) at globus_thread_pthreads.c:928
#2  0x4002b3ec in globus_l_callback_thread_poll (user_arg=0x4004e800) at globus_callback_threads.c:2373
#3  0x40042f28 in globus_l_thread_pool_thread_start (user_arg=0x804bd38) at globus_thread_pool.c:206
#4  0x40041ec3 in thread_starter (temparg=0x804a198) at globus_thread_pthreads.c:497
#5  0x400649b4 in start_thread () from /lib/tls/libpthread.so.0
#6  0x00000000 in ?? ()

Thread 2 (Thread 1083890608 (LWP 5242)):
#0  0x40067115 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1  0x40042828 in globus_cond_wait (cv=0x4004e8c0, mut=0x4004e88c) at globus_thread_pthreads.c:928
#2  0x4002bcab in globus_l_callback_thread_signal_poll (user_arg=0x0) at globus_callback_threads.c:2766
#3  0x40041ec3 in thread_starter (temparg=0x804a188) at globus_thread_pthreads.c:497
#4  0x400649b4 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00000000 in ?? ()

Thread 1 (Thread 1075499776 (LWP 5239)):
#0  0x40067115 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1  0x40042828 in globus_cond_wait (cv=0x4004e8c0, mut=0x4004e88c) at globus_thread_pthreads.c:928
#2  0x4002887d in globus_l_callback_deactivate () at globus_callback_threads.c:532
#3  0x40038814 in globus_module_deactivate (module_descriptor=0x4004d650) at globus_module.c:300
#4  0x4002ce5d in globus_l_common_deactivate () at globus_common.c:148
#5  0x40038814 in globus_module_deactivate (module_descriptor=0x80498d8) at globus_module.c:300
#6  0x08048684 in main () at globus_libc_hostname.c:29
#0  0x40067115 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
The hang on the Linux system is due to buggy libcs with the new NPTL threading library and can be 'fixed' by setting LD_ASSUME_KERNEL=2.2.5, which forces Linux to use the older LinuxThreads. Since it appears I am making use of a poorly implemented feature, I am going to try to find some workarounds.

Joe
Here are some workarounds for Mac OS X's lack of POSIX cancellation points. Some of these workarounds are likely to help with the older, buggy Linux thread libraries also.

ftp://ftp.globus.org/pub/gt3/3.9/3.9.3/contrib/globus_common-7.4.tar.gz

Let me know if you have any problems.

Joe
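The contents of the globus_common-7.4 workarounds are not shown in this report. Purely as an illustration of one common way to stop a thread without depending on cancellation points, here is a minimal sketch in which the worker waits on a condition variable and exits when a shutdown flag is set. All names in it (worker_state_t, worker_main, run_worker_shutdown_demo) are hypothetical and are not taken from the Globus sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared by the worker and the thread that stops it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            shutdown;   /* set under the lock by the stopping thread */
} worker_state_t;

/* Worker: sleeps on the condition variable until asked to shut down. */
static void *worker_main(void *arg)
{
    worker_state_t *st = arg;
    assert(st != NULL);

    pthread_mutex_lock(&st->lock);
    /* Loop on the predicate: pthread_cond_wait may return spuriously. */
    while (!st->shutdown) {
        pthread_cond_wait(&st->cond, &st->lock);
    }
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Starts one worker, asks it to stop, and joins it.
 * Returns 0 on success, -1 if any pthread call fails. */
int run_worker_shutdown_demo(void)
{
    worker_state_t st;
    pthread_t tid;
    int rc = 0;

    if (pthread_mutex_init(&st.lock, NULL) != 0) {
        return -1;
    }
    if (pthread_cond_init(&st.cond, NULL) != 0) {
        pthread_mutex_destroy(&st.lock);
        return -1;
    }
    st.shutdown = false;

    if (pthread_create(&tid, NULL, worker_main, &st) != 0) {
        rc = -1;
    } else {
        /* Set the flag and wake the worker while holding the lock, so the
         * wakeup cannot be lost between the worker's check and its wait. */
        pthread_mutex_lock(&st.lock);
        st.shutdown = true;
        pthread_cond_broadcast(&st.cond);
        pthread_mutex_unlock(&st.lock);

        if (pthread_join(tid, NULL) != 0) {
            rc = -1;
        }
    }

    pthread_cond_destroy(&st.cond);
    pthread_mutex_destroy(&st.lock);
    return rc;
}
```

Calling run_worker_shutdown_demo() from a program linked against the pthread library should return 0 once the worker has been joined. The point of the pattern is that the worker leaves pthread_cond_wait through an ordinary wakeup rather than through pthread_cancel, so it does not matter whether the platform treats the wait as a cancellation point.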
OK. The workaround works and the fix is in CVS. It will make it into the next release.

Joe