Bug 5938 - sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL on Solaris 10
Status: RESOLVED FIXED
: Toolkit Internals
globus_common
: unspecified
: Sun Solaris
: P3 blocker
Reported: 2008-03-21 11:06
Modified: 2008-03-21 15:55


Description From 2008-03-21 11:06:27
The GT 4.0.6 binary bundle for Solaris 9 was installed on a Solaris 10 box. The
globus-rls-server process and other processes that do socket IO get into tight
spin loops and consume CPU even when nothing should be happening (no socket
IO).

Running truss on any of these processes shows a continuous stream of these
messages:

/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL
/2:     sigtimedwait(0xFEA7BE80, 0xFEA7BE00, 0x00000000) Err#22 EINVAL

We also compiled GT 4.0.6 from source on Solaris 10 using 

[grid@ldas-cit ~]$ gcc --version
gcc (GCC) 4.1.1

and saw the same problem.

I will try to send some stack traces.
------- Comment #1 From 2008-03-21 14:41:43 -------
It looks like Solaris doesn't like to sigwait() on an empty set of signals,
which is the case when the app doesn't use the signal library.  I need to look
into this more, but it looks like simply adding an innocuous signal to the
default set will work around the problem and shouldn't have any ill effects.

http://www-unix.mcs.anl.gov/~mlink/bugs/5938_prelim_workaround.patch
http://www-unix.mcs.anl.gov/~mlink/bugs/globus_common-7.29.tar.gz
(use -force with gpt-build, I didn't update the version number)
------- Comment #2 From 2008-03-21 15:55:07 -------
Looks like this worked before 4.0.6, but the fix for bug 5481 restricted the
workaround code to AIX.  I've extended it to all non-Linux architectures and
committed the fix to HEAD and globus_4_0_branch.

Updated package for 4.0.x:
http://www-unix.mcs.anl.gov/~mlink/bugs/globus_common-7.30.tar.gz