Bug 5938

Summary: sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL on Solaris 10
Product: Toolkit Internals
Reporter: Scott Koranda <skoranda@gravity.phys.uwm.edu>
Component: globus_common
Assignee: Mike Link <mlink@mcs.anl.gov>
Status: RESOLVED FIXED
Severity: blocker
CC: anderson@ligo.caltech.edu, bresnaha@mcs.anl.gov, meder@mcs.anl.gov
Priority: P3
Version: unspecified
Target Milestone: ---
Hardware: Sun
OS: Solaris

Description From 2008-03-21 11:06:27
The GT 4.0.6 binary bundle for Solaris 9 was installed on a Solaris 10 box. The
globus-rls-server process and other processes that do socket IO get into tight
spin loops and consume CPU even when nothing should be happening (no socket
IO).

By running truss on any of these processes one sees a continuous stream of
these messages:

/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL

We also compiled GT 4.0.6 from source on Solaris 10 using 

[grid@ldas-cit ~]$ gcc --version
gcc (GCC) 4.1.1

and saw the same problem.

I will try to send some stack traces.
------- Comment #1 From 2008-03-21 14:41:43 -------
It looks like Solaris doesn't like to sigwait() on an empty set of signals,
which is the case when the app doesn't use the signal library.  I need to look
into this more, but it looks like simply adding an innocuous signal to the
default set will work around the problem, and shouldn't have any ill effects.

http://www-unix.mcs.anl.gov/~mlink/bugs/5938_prelim_workaround.patch
http://www-unix.mcs.anl.gov/~mlink/bugs/globus_common-7.29.tar.gz
(use -force with gpt-build; I didn't update the version number)
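
The workaround idea described above (never hand sigwait() an empty signal
set) can be sketched roughly as follows. This is only an illustration and not
the contents of the linked patch: the function names sketch_sigset_is_empty
and sketch_ensure_waitable, the SKETCH_MAX_SIGNAL bound, and the choice of
placeholder signal are all hypothetical. Which signal the real fix adds is not
stated in this report.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Upper bound on signal numbers to scan. NSIG is common but not required
 * by POSIX, so fall back to an assumed bound when it is missing. */
#ifdef NSIG
#define SKETCH_MAX_SIGNAL NSIG
#else
#define SKETCH_MAX_SIGNAL 65
#endif

/* Hypothetical helper: return 1 if no signal in [1, SKETCH_MAX_SIGNAL)
 * is a member of *set, otherwise 0. */
int sketch_sigset_is_empty(const sigset_t *set)
{
    assert(set != NULL);
    for (int signo = 1; signo < SKETCH_MAX_SIGNAL; signo++) {
        /* sigismember() returns 1 for members, 0 or -1 otherwise. */
        if (sigismember(set, signo) == 1) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: if *set is empty, add placeholder_signo so that a
 * later sigwait() is never called with an empty set.
 * Returns 1 if the placeholder was added, 0 if the set was already
 * non-empty, and -1 if sigaddset() rejected the signal number. */
int sketch_ensure_waitable(sigset_t *set, int placeholder_signo)
{
    assert(set != NULL);
    if (!sketch_sigset_is_empty(set)) {
        return 0;
    }
    return sigaddset(set, placeholder_signo) == 0 ? 1 : -1;
}
```

A caller would run sketch_ensure_waitable() on the set it is about to pass to
sigwait(), using a signal the application does not otherwise rely on (SIGUSR2
is one possible choice, assumed here). That signal should also be blocked in
the waiting thread, since sigwait() expects the signals in its set to be
blocked.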
------- Comment #2 From 2008-03-21 15:55:07 -------
Looks like this worked before 4.0.6, but the fix for bug 5481 restricted the
code that avoids this problem to AIX.  I've extended it to all non-Linux
architectures and committed the fix to HEAD and globus_4_0_branch.

Updated package for 4.0.x:
http://www-unix.mcs.anl.gov/~mlink/bugs/globus_common-7.30.tar.gz