linuxthreads/FAQ.html

   1 <HTML>
   2 <HEAD>
   3 <TITLE>LinuxThreads Frequently Asked Questions</TITLE>
   4 </HEAD>
   5 <BODY>
   6 <H1 ALIGN=center>LinuxThreads Frequently Asked Questions <BR>
   7                  (with answers)</H1>
   8 <H2 ALIGN=center>[For LinuxThreads version 0.8]</H2>
   9
  10 <HR><P>
  11
  12 <A HREF="#A">A. The big picture</A><BR>
  13 <A HREF="#B">B. Getting more information</A><BR>
  14 <A HREF="#C">C. Issues related to the C library</A><BR>
  15 <A HREF="#D">D. Problems, weird behaviors, potential bugs</A><BR>
  16 <A HREF="#E">E. Missing functions, wrong types, etc</A><BR>
  17 <A HREF="#F">F. C++ issues</A><BR>
  18 <A HREF="#G">G. Debugging LinuxThreads programs</A><BR>
  19 <A HREF="#H">H. Compiling multithreaded code; errno madness</A><BR>
  20 <A HREF="#I">I. X-Windows and other libraries</A><BR>
  21 <A HREF="#J">J. Signals and threads</A><BR>
  22 <A HREF="#K">K. Internals of LinuxThreads</A><P>
  23
  24 <HR>
  25 <P>
  26
  27 <H2><A NAME="A">A. The big picture</A></H2>
  28
  29 <H4><A NAME="A.1">A.1: What is LinuxThreads?</A></H4>
  30
  31 LinuxThreads is a Linux library for multi-threaded programming.
  32 It implements the Posix 1003.1c API (Application Programming
  33 Interface) for threads.  It runs on any Linux system with kernel 2.0.0
or more recent, and a suitable C library (see section <A HREF="#C">C</A>).
  35 <P>
  36
  37 <H4><A NAME="A.2">A.2: What are threads?</A></H4>
  38
  39 A thread is a sequential flow of control through a program.
  40 Multi-threaded programming is, thus, a form of parallel programming
  41 where several threads of control are executing concurrently in the
  42 program.  All threads execute in the same memory space, and can
  43 therefore work concurrently on shared data.<P>
  44
  45 Multi-threaded programming differs from Unix-style multi-processing in
  46 that all threads share the same memory space (and a few other system
  47 resources, such as file descriptors), instead of running in their own
  48 memory space as is the case with Unix processes.<P>
  49
  50 Threads are useful for two reasons.  First, they allow a program to
  51 exploit multi-processor machines: the threads can run in parallel on
  52 several processors, allowing a single program to divide its work
  53 between several processors, thus running faster than a single-threaded
  54 program, which runs on only one processor at a time.  Second, some
  55 programs are best expressed as several threads of control that
  56 communicate together, rather than as one big monolithic sequential
  57 program.  Examples include server programs, overlapping asynchronous
  58 I/O, and graphical user interfaces.<P>
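
As an illustration of threads working on shared data, here is a minimal
sketch in which several threads increment one global counter.  The names
<CODE>run_shared_counter</CODE>, <CODE>add_to_counter</CODE>,
<CODE>NUM_THREADS</CODE> and <CODE>INCREMENTS</CODE> are made up for this
example; only the <CODE>pthread_*</CODE> calls come from POSIX 1003.1c.<P>

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4        /* hypothetical values chosen for the demo */
#define INCREMENTS  10000

/* Shared data: every thread sees the same variable. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add INCREMENTS to the shared counter, one step at a time. */
static void *add_to_counter(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the threads, wait for them, and return the final counter value
   (or -1 if some thread could not be created). */
long run_shared_counter(void)
{
    pthread_t threads[NUM_THREADS];
    int i, created = 0;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, add_to_counter, NULL) != 0)
            break;
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == NUM_THREADS ? counter : -1L;
}
```

Because all threads share the same memory space, each one updates the same
<CODE>counter</CODE>; the mutex keeps the increments from interfering with
each other.  Such a program is typically linked with
<CODE>-lpthread</CODE>.<P>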
  59
  60 <H4><A NAME="A.3">A.3: What is POSIX 1003.1c?</A></H4>
  61
  62 It's an API for multi-threaded programming standardized by IEEE as
  63 part of the POSIX standards.  Most Unix vendors have endorsed the
  64 POSIX 1003.1c standard.  Implementations of the 1003.1c API are
  65 already available under Sun Solaris 2.5, Digital Unix 4.0,
  66 Silicon Graphics IRIX 6, and should soon be available from other
  67 vendors such as IBM and HP.  More generally, the 1003.1c API is
rapidly replacing the proprietary thread libraries that were
  69 developed previously under Unix, such as Mach cthreads, Solaris
  70 threads, and IRIX sprocs.  Thus, multithreaded programs using the
  71 1003.1c API are likely to run unchanged on a wide variety of Unix
  72 platforms.<P>
  73
  74 <H4><A NAME="A.4">A.4: What is the status of LinuxThreads?</A></H4>
  75
  76 LinuxThreads implements almost all of Posix 1003.1c, as well as a few
  77 extensions.  The only part of LinuxThreads that does not conform yet
  78 to Posix is signal handling (see section <A HREF="#J">J</A>).  Apart
  79 from the signal stuff, all the Posix 1003.1c base functionality,
  80 as well as a number of optional extensions, are provided and conform
  81 to the standard (to the best of my knowledge).
  82 The signal stuff is hard to get right, at least without special kernel
  83 support, and while I'm definitely looking at ways to implement the
  84 Posix behavior for signals, this might take a long time before it's
  85 completed.<P>
  86
  87 <H4><A NAME="A.5">A.5: How stable is LinuxThreads?</A></H4>
  88
  89 The basic functionality (thread creation and termination, mutexes,
  90 conditions, semaphores) is very stable.  Several industrial-strength
  91 programs, such as the AOL multithreaded Web server, use LinuxThreads
  92 and seem quite happy about it.  There used to be some rough edges in
  93 the LinuxThreads / C library interface with libc 5, but glibc 2
  94 fixes all of those problems and is now the standard C library on major
  95 Linux distributions (see section <A HREF="#C">C</A>). <P>
  96
  97 <HR>
  98 <P>
  99
 100 <H2><A NAME="B">B.  Getting more information</A></H2>
 101
 102 <H4><A NAME="B.1">B.1: What are good books and other sources of
 103 information on POSIX threads?</A></H4>
 104
 105 The FAQ for comp.programming.threads lists several books:
 106 <A HREF="http://www.serpentine.com/~bos/threads-faq/">http://www.serpentine.com/~bos/threads-faq/</A>.<P>
 107
 108 There are also some online tutorials. Follow the links from the
 109 LinuxThreads web page:
 110 <A HREF="http://pauillac.inria.fr/~xleroy/linuxthreads">http://pauillac.inria.fr/~xleroy/linuxthreads</A>.<P>
 111
 112 <H4><A NAME="B.2">B.2: I'd like to be informed of future developments on
 113 LinuxThreads. Is there a mailing list for this purpose?</A></H4>
 114
 115 I post LinuxThreads-related announcements on the newsgroup
 116 <A HREF="news:comp.os.linux.announce">comp.os.linux.announce</A>,
 117 and also on the mailing list
 118 <code>linux-threads@magenet.com</code>.
 119 You can subscribe to the latter by writing
 120 <A HREF="mailto:majordomo@magenet.com">majordomo@magenet.com</A>.<P>
 121
 122 <H4><A NAME="B.3">B.3: What are good places for discussing
 123 LinuxThreads?</A></H4>
 124
 125 For questions about programming with POSIX threads in general, use
 126 the newsgroup
 127 <A HREF="news:comp.programming.threads">comp.programming.threads</A>.
 128 Be sure you read the
 129 <A HREF="http://www.serpentine.com/~bos/threads-faq/">FAQ</A>
 130 for this group before you post.<P>
 131
 132 For Linux-specific questions, use
 133 <A
 134 HREF="news:comp.os.linux.development.apps">comp.os.linux.development.apps</A>
 135 and <A
 136 HREF="news:comp.os.linux.development.kernel">comp.os.linux.development.kernel</A>.
The latter is especially appropriate for questions related to the
 138 interface between the kernel and LinuxThreads.<P>
 139
 140 <H4><A NAME="B.4">B.4: How should I report a possible bug in
 141 LinuxThreads?</A></H4>
 142
 143 If you're using glibc 2, the best way by far is to use the
 144 <code>glibcbug</code> script to mail a bug report to the glibc
 145 maintainers. <P>
 146
 147 If you're using an older libc, or don't have the <code>glibcbug</code>
 148 script on your machine, then e-mail me directly
 149 (<code>Xavier.Leroy@inria.fr</code>).  <P>
 150
 151 In both cases, before sending the bug report, make sure that it is not
 152 addressed already in this FAQ.  Also, try to send a short program that
 153 reproduces the weird behavior you observed. <P>
 154
 155 <H4><A NAME="B.5">B.5: I'd like to read the POSIX 1003.1c standard. Is
 156 it available online?</A></H4>
 157
 158 Unfortunately, no.  POSIX standards are copyrighted by IEEE, and
 159 IEEE does not distribute them freely.  You can buy paper copies from
 160 IEEE, but the price is fairly high ($120 or so). If you disagree with
 161 this policy and you're an IEEE member, be sure to let them know.<P>
 162
 163 On the other hand, you probably don't want to read the standard.  It's
 164 very hard to read, written in standard-ese, and targeted to
 165 implementors who already know threads inside-out.  A good book on
 166 POSIX threads provides the same information in a much more readable form.
 167 I can personally recommend Dave Butenhof's book, <CITE>Programming
 168 with POSIX threads</CITE> (Addison-Wesley). Butenhof was part of the
 169 POSIX committee and also designed the Digital Unix implementations of
 170 POSIX threads, and it shows.<P>
 171
 172 Another good source of information is the X/Open Group Single Unix
 173 specification which is available both
 174 <A HREF="http://www.rdg.opengroup.org/onlinepubs/7908799/index.html">on-line</A>
 175 and as a
 176 <A HREF="http://www.UNIX-systems.org/gosolo2/">book and CD/ROM</A>.
 177 That specification includes pretty much all the POSIX standards,
 178 including 1003.1c, with some extensions and clarifications.<P>
 179
 180 <HR>
 181 <P>
 182
 183 <H2><A NAME="C">C.  Issues related to the C library</A></H2>
 184
 185 <H4><A NAME="C.1">C.1: Which version of the C library should I use
 186 with LinuxThreads?</A></H4>
 187
 188 The best choice by far is glibc 2, a.k.a. libc 6.  It offers very good
 189 support for multi-threading, and LinuxThreads has been closely
 190 integrated with glibc 2.  The glibc 2 distribution contains the
 191 sources of a specially adapted version of LinuxThreads.<P>
 192
 193 glibc 2 comes preinstalled as the default C library on several Linux
 194 distributions, such as RedHat 5 and up, and Debian 2.
 195 Those distributions include the version of LinuxThreads matching
 196 glibc 2.<P>
 197
 198 <H4><A NAME="C.2">C.2: My system has libc 5 preinstalled, not glibc
2.  Can I still use LinuxThreads?</A></H4>
 200
 201 Yes, but you're likely to run into some problems, as libc 5 only
 202 offers minimal support for threads and contains some bugs that affect
 203 multithreaded programs. <P>
 204
 205 The versions of libc 5 that work best with LinuxThreads are
 206 libc 5.2.18 on the one hand, and libc 5.4.12 or later on the other hand.
 207 Avoid 5.3.12 and 5.4.7: these have problems with the per-thread errno
 208 variable. <P>
 209
 210 <H4><A NAME="C.3">C.3: So, should I switch to glibc 2, or stay with a
 211 recent libc 5?</A></H4>
 212
 213 I'd recommend you switch to glibc 2.  Even for single-threaded
 214 programs, glibc 2 is more solid and more standard-conformant than libc
 215 5.  And the shortcomings of libc 5 almost preclude any serious
 216 multi-threaded programming.<P>
 217
 218 Switching an already installed
 219 system from libc 5 to glibc 2 is not completely straightforward.
 220 See the <A HREF="http://sunsite.unc.edu/LDP/HOWTO/Glibc2-HOWTO.html">Glibc2
 221 HOWTO</A> for more information.  Much easier is (re-)installing a
 222 Linux distribution based on glibc 2, such as RedHat 6.<P>
 223
 224 <H4><A NAME="C.4">C.4: Where can I find glibc 2 and the version of
 225 LinuxThreads that goes with it?</A></H4>
 226
 227 On <code>prep.ai.mit.edu</code> and its many, many mirrors around the world.
 228 See <A
 229 HREF="http://www.gnu.org/order/ftp.html">http://www.gnu.org/order/ftp.html</A>
 230 for a list of mirrors.<P>
 231
 232 <H4><A NAME="C.5">C.5: Where can I find libc 5 and the version of
 233 LinuxThreads that goes with it?</A></H4>
 234
 235 For libc 5, see <A HREF="ftp://sunsite.unc.edu/pub/Linux/devel/GCC/"><code>ftp://sunsite.unc.edu/pub/Linux/devel/GCC/</code></A>.<P>
 236
 237 For the libc 5 version of LinuxThreads, see
 238 <A HREF="ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/">ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/</A>.<P>
 239
 240 <H4><A NAME="C.6">C.6: How can I recompile the glibc 2 version of the
 241 LinuxThreads sources?</A></H4>
 242
 243 You must transfer the whole glibc sources, then drop the LinuxThreads
 244 sources in the <code>linuxthreads/</code> subdirectory, then recompile
 245 glibc as a whole.  There are now too many inter-dependencies between
 246 LinuxThreads and glibc 2 to allow separate re-compilation of LinuxThreads.
 247 <P>
 248
 249 <H4><A NAME="C.7">C.7: What is the correspondence between LinuxThreads
 250 version numbers, libc version numbers, and RedHat version
 251 numbers?</A></H4>
 252
Here is a summary. (Information on Linux distributions other than
RedHat is welcome.)<P>
 255
 256 <TABLE>
 257 <TR><TD>LinuxThreads </TD> <TD>C library</TD> <TD>RedHat</TD></TR>
 258 <TR><TD>0.7, 0.71 (for libc 5)</TD> <TD>libc 5.x</TD> <TD>RH 4.2</TD></TR>
 259 <TR><TD>0.7, 0.71 (for glibc 2)</TD> <TD>glibc 2.0.x</TD> <TD>RH 5.x</TD></TR>
 260 <TR><TD>0.8</TD> <TD>glibc 2.1.1</TD> <TD>RH 6.0</TD></TR>
 261 <TR><TD>0.8</TD> <TD>glibc 2.1.2</TD> <TD>not yet released</TD></TR>
 262 </TABLE>
 263 <P>
 264
 265 <HR>
 266 <P>
 267
 268 <H2><A NAME="D">D. Problems, weird behaviors, potential bugs</A></H2>
 269
 270 <H4><A NAME="D.1">D.1: When I compile LinuxThreads, I run into problems in
 271 file <code>libc_r/dirent.c</code></A></H4>
 272
 273 You probably mean:
 274 <PRE>
 275         libc_r/dirent.c:94: structure has no member named `dd_lock'
 276 </PRE>
 277 I haven't actually seen this problem, but several users reported it.
 278 My understanding is that something is wrong in the include files of
 279 your Linux installation (<code>/usr/include/*</code>). Make sure
 280 you're using a supported version of the libc 5 library. (See question <A
 281 HREF="#C.2">C.2</A>).<P>
 282
 283 <H4><A NAME="D.2">D.2: When I compile LinuxThreads, I run into problems with
 284 <CODE>/usr/include/sched.h</CODE>: there are several occurrences of
 285 <CODE>_p</CODE> that the C compiler does not understand</A></H4>
 286
 287 Yes, <CODE>/usr/include/sched.h</CODE> that comes with libc 5.3.12 is broken.
 288 Replace it with the <code>sched.h</code> file contained in the
 289 LinuxThreads distribution.  But really you should not be using libc
5.3.12 with LinuxThreads! (See question <A HREF="#C.2">C.2</A>.)<P>
 291
 292 <H4><A NAME="D.3">D.3: My program does <CODE>fdopen()</CODE> on a file
 293 descriptor opened on a pipe.  When I link it with LinuxThreads,
 294 <CODE>fdopen()</CODE> always returns NULL!</A></H4>
 295
You're using one of the buggy versions of libc 5 (5.3.12, 5.4.7, etc.).
See question <A HREF="#C.2">C.2</A> above.<P>
 298
 299 <H4><A NAME="D.4">D.4: My program creates a lot of threads, and after
 300 a while <CODE>pthread_create()</CODE> no longer returns!</A></H4>
 301
This is a known bug in the version of LinuxThreads that comes with glibc
 303 2.1.1.  An upgrade to 2.1.2 is recommended. <P>
 304
 305 <H4><A NAME="D.5">D.5: When I'm running a program that creates N
 306 threads, <code>top</code> or <code>ps</code>
 307 display N+2 processes that are running my program. What do all these
 308 processes correspond to?</A></H4>
 309
 310 Due to the general "one process per thread" model, there's one process
 311 for the initial thread and N processes for the threads it created
 312 using <CODE>pthread_create</CODE>.  That leaves one process
 313 unaccounted for.  That extra process corresponds to the "thread
 314 manager" thread, a thread created internally by LinuxThreads to handle
 315 thread creation and thread termination.  This extra thread is asleep
most of the time.<P>
 317
 318 <H4><A NAME="D.6">D.6: Scheduling seems to be very unfair when there
 319 is strong contention on a mutex: instead of giving the mutex to each
 320 thread in turn, it seems that it's almost always the same thread that
 321 gets the mutex. Isn't this completely broken behavior?</A></H4>
 322
 323 That behavior has mostly disappeared in recent releases of
 324 LinuxThreads (version 0.8 and up).  It was fairly common in older
releases, though.<P>
 326
 327 What happens in LinuxThreads 0.7 and before is the following: when a
 328 thread unlocks a mutex, all other threads that were waiting on the
 329 mutex are sent a signal which makes them runnable.  However, the
 330 kernel scheduler may or may not restart them immediately.  If the
 331 thread that unlocked the mutex tries to lock it again immediately
 332 afterwards, it is likely that it will succeed, because the threads
 333 haven't yet restarted.  This results in an apparently very unfair
 334 behavior, when the same thread repeatedly locks and unlocks the mutex,
 335 while other threads can't lock the mutex.<P>
 336
In LinuxThreads 0.8 and up, <code>pthread_mutex_unlock</code> restarts only
one waiting thread, and pre-assigns the mutex to that thread.  Hence,
 339 if the thread that unlocked the mutex tries to lock it again
 340 immediately, it will block until other waiting threads have had a
 341 chance to lock and unlock the mutex.  This results in much fairer
 342 scheduling.<P>
 343
 344 Notice however that even the old "unfair" behavior is perfectly
 345 acceptable with respect to the POSIX standard: for the default
 346 scheduling policy, POSIX makes no guarantees of fairness, such as "the
 347 thread waiting for the mutex for the longest time always acquires it
 348 first".  Properly written multithreaded code avoids that kind of heavy
 349 contention on mutexes, and does not run into fairness problems.  If
 350 you need scheduling guarantees, you should consider using the
 351 real-time scheduling policies <code>SCHED_RR</code> and
 352 <code>SCHED_FIFO</code>, which have precisely defined scheduling
 353 behaviors. <P>
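
A minimal sketch of how a thread attribute object might be set up for
<code>SCHED_FIFO</code> is given below.  The helper names
<code>init_fifo_attr</code> and <code>fifo_attr_policy</code> are
hypothetical, and the choice of the lowest <code>SCHED_FIFO</code> priority
is an assumption made for the example.  Creating a thread with such
attributes normally requires super-user privileges, so the sketch only
prepares and inspects the attribute object.<P>

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Initialize *attr so that a thread created with it would run under
   SCHED_FIFO at the lowest real-time priority.  Returns 0 on success,
   or an error code (in which case *attr has been destroyed). */
int init_fifo_attr(pthread_attr_t *attr)
{
    struct sched_param param;
    int rc;

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Use the policy stored in the attribute object rather than the
       policy inherited from the creating thread. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc == 0) {
        memset(&param, 0, sizeof param);
        param.sched_priority = sched_get_priority_min(SCHED_FIFO);
        rc = pthread_attr_setschedparam(attr, &param);
    }
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}

/* Demo helper: build the attribute object and read the policy back.
   Returns the policy, or -1 on failure. */
int fifo_attr_policy(void)
{
    pthread_attr_t attr;
    int policy = -1;

    if (init_fifo_attr(&attr) != 0)
        return -1;
    if (pthread_attr_getschedpolicy(&attr, &policy) != 0)
        policy = -1;
    pthread_attr_destroy(&attr);
    return policy;
}
```

The attribute object prepared by <code>init_fifo_attr</code> would then be
passed as the second argument of <code>pthread_create</code>.<P>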
 354
 355 <H4><A NAME="D.7">D.7: I have a simple test program with two threads
 356 that do nothing but <CODE>printf()</CODE> in tight loops, and from the
 357 printout it seems that only one thread is running, the other doesn't
 358 print anything!</A></H4>
 359
 360 Again, this behavior is characteristic of old releases of LinuxThreads
 361 (0.7 and before); more recent versions (0.8 and up) should not exhibit
 362 this behavior.<P>
 363
 364 The reason for this behavior is explained in
 365 question <A HREF="#D.6">D.6</A> above: <CODE>printf()</CODE> performs
 366 locking on <CODE>stdout</CODE>, and thus your two threads contend very
 367 heavily for the mutex associated with <CODE>stdout</CODE>.  But if you
 368 do some real work between two calls to <CODE>printf()</CODE>, you'll
 369 see that scheduling becomes much smoother.<P>
 370
 371 <H4><A NAME="D.8">D.8: I've looked at <code>&lt;pthread.h&gt;</code>
 372 and there seems to be a gross error in the <code>pthread_cleanup_push</code>
 373 macro: it opens a block with <code>{</code> but does not close it!
 374 Surely you forgot a <code>}</code> at the end of the macro, right?
 375 </A></H4>
 376
 377 Nope.  That's the way it should be.  The closing brace is provided by
 378 the <code>pthread_cleanup_pop</code> macro.  The POSIX standard
 379 requires <code>pthread_cleanup_push</code> and
 380 <code>pthread_cleanup_pop</code> to be used in matching pairs, at the
 381 same level of brace nesting.  This allows
 382 <code>pthread_cleanup_push</code> to open a block in order to
 383 stack-allocate some data structure, and
 384 <code>pthread_cleanup_pop</code> to close that block.  It's ugly, but
 385 it's the standard way of implementing cleanup handlers.<P>
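
The pairing rule can be sketched as follows.  The names
<code>release_buffer</code>, <code>buffer_user</code>,
<code>run_cleanup_demo</code> and <code>cleanup_calls</code> are invented
for this example; the point is only that the push and the pop sit in the
same function at the same brace level.<P>

```c
#include <pthread.h>
#include <stdlib.h>

static int cleanup_calls;    /* counts handler invocations, for the demo */

/* Cleanup handler: runs on pthread_cleanup_pop(1), on cancellation,
   or on pthread_exit() while the handler is still pushed. */
static void release_buffer(void *arg)
{
    free(arg);
    cleanup_calls++;
}

static void *buffer_user(void *arg)
{
    char *buf = malloc(64);

    (void) arg;
    if (buf == NULL)
        return NULL;

    pthread_cleanup_push(release_buffer, buf);   /* opens a block */
    buf[0] = '\0';                               /* ... work with buf ... */
    pthread_cleanup_pop(1);                      /* closes it, runs handler */

    return NULL;
}

/* Run one thread through the push/pop pair; returns the number of times
   the handler ran, or -1 if the thread could not be created. */
int run_cleanup_demo(void)
{
    pthread_t t;

    cleanup_calls = 0;
    if (pthread_create(&t, NULL, buffer_user, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return cleanup_calls;
}
```

Note that the <code>return</code> statement comes after the pop: leaving
the block between the push and the pop with <code>return</code> or
<code>goto</code> is not allowed.<P>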
 386
 387 <H4><A NAME="D.9">D.9: I tried to use real-time threads and my program
 388 loops like crazy and freezes the whole machine!</A></H4>
 389
Versions of LinuxThreads prior to 0.8 are susceptible to "livelocks"
 391 (one thread loops, consuming 100% of the CPU time) in conjunction with
 392 real-time scheduling.  Since real-time threads and processes have
 393 higher priority than normal Linux processes, all other processes on
 394 the machine, including the shell, the X server, etc, cannot run and
 395 the machine appears frozen.<P>
 396
 397 The problem is fixed in LinuxThreads 0.8.<P>
 398
 399 <H4><A NAME="D.10">D.10: My application needs to create thousands of
 400 threads, or maybe even more.  Can I do this with
 401 LinuxThreads?</A></H4>
 402
 403 No.  You're going to run into several hard limits:
 404 <UL>
 405 <LI>Each thread, from the kernel's standpoint, is one process.  Stock
 406 Linux kernels are limited to at most 512 processes for the super-user,
 407 and half this number for regular users.  This can be changed by
 408 changing <code>NR_TASKS</code> in <code>include/linux/tasks.h</code>
 409 and recompiling the kernel.  On the x86 processors at least,
 410 architectural constraints seem to limit <code>NR_TASKS</code> to 4090
 411 at most.
 412 <LI>LinuxThreads contains a table of all active threads.  This table
 413 has room for 1024 threads at most.  To increase this limit, you must
 414 change <code>PTHREAD_THREADS_MAX</code> in the LinuxThreads sources
 415 and recompile.
 416 <LI>By default, each thread reserves 2M of virtual memory space for
 417 its stack.  This space is just reserved; actual memory is allocated
 418 for the stack on demand.  But still, on a 32-bit processor, the total
 419 virtual memory space available for the stacks is on the order of 1G,
 420 meaning that more than 500 threads will have a hard time fitting in.
 421 You can overcome this limitation by moving to a 64-bit platform, or by
allocating smaller stacks yourself and installing them with
<code>pthread_attr_setstackaddr()</code> (see question <A HREF="#E.5">E.5</A>).
 424 <LI>Finally, the Linux kernel contains many algorithms that run in
 425 time proportional to the number of process table entries.  Increasing
 426 this number drastically will slow down the kernel operations
 427 noticeably.
 428 </UL>
 429 (Other POSIX threads libraries have similar limitations, by the way.)
 430 For all those reasons, you'd better restructure your application so
 431 that it doesn't need more than, say, 100 threads.  For instance,
 432 in the case of a multithreaded server, instead of creating a new
 433 thread for each connection, maintain a fixed-size pool of worker
 434 threads that pick incoming connection requests from a queue.<P>
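
The worker-pool structure suggested above can be sketched as follows.  This
is only an outline under simplifying assumptions: requests are plain
integers instead of connections, "handling" a request just adds it to a
running sum, and all names (<code>run_pool</code>, <code>enqueue</code>,
<code>pool_worker</code>, <code>POOL_SIZE</code>, <code>QUEUE_SIZE</code>)
are hypothetical.<P>

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE  4         /* fixed number of worker threads */
#define QUEUE_SIZE 16        /* capacity of the bounded request queue */

static int queue[QUEUE_SIZE];
static int head, tail, count, shutting_down;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

static long handled_sum;     /* stand-in for real request handling */
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

static void handle_request(int request)
{
    pthread_mutex_lock(&sum_lock);
    handled_sum += request;
    pthread_mutex_unlock(&sum_lock);
}

/* Producer side: block while the queue is full, then add one request. */
static void enqueue(int request)
{
    pthread_mutex_lock(&queue_lock);
    while (count == QUEUE_SIZE)
        pthread_cond_wait(&not_full, &queue_lock);
    queue[tail] = request;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&queue_lock);
}

/* Worker: take requests until the queue is drained and shutdown is set. */
static void *pool_worker(void *arg)
{
    (void) arg;
    for (;;) {
        int request;

        pthread_mutex_lock(&queue_lock);
        while (count == 0 && !shutting_down)
            pthread_cond_wait(&not_empty, &queue_lock);
        if (count == 0) {                 /* shutting down, nothing left */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        request = queue[head];
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&queue_lock);

        handle_request(request);          /* done outside the queue lock */
    }
}

/* Feed requests 1..nrequests through the pool and return their sum
   as computed by the workers, or -1 if no worker could be started. */
long run_pool(int nrequests)
{
    pthread_t workers[POOL_SIZE];
    int i, created = 0;

    head = tail = count = shutting_down = 0;
    handled_sum = 0;
    for (i = 0; i < POOL_SIZE; i++) {
        if (pthread_create(&workers[i], NULL, pool_worker, NULL) != 0)
            break;
        created++;
    }
    if (created == 0)
        return -1;

    for (i = 1; i <= nrequests; i++)
        enqueue(i);

    pthread_mutex_lock(&queue_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);   /* wake idle workers so they exit */
    pthread_mutex_unlock(&queue_lock);

    for (i = 0; i < created; i++)
        pthread_join(workers[i], NULL);
    return handled_sum;
}
```

However many requests arrive, the number of threads (and hence of kernel
processes) stays at <code>POOL_SIZE</code>; the bounded queue makes the
producer wait when the workers fall behind.<P>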
 435
 436 <HR>
 437 <P>
 438
 439 <H2><A NAME="E">E. Missing functions, wrong types, etc</A></H2>
 440
 441 <H4><A NAME="E.1">E.1: Where is <CODE>pthread_yield()</CODE> ? How
come LinuxThreads does not implement it?</A></H4>
 443
 444 Because it's not part of the (final) POSIX 1003.1c standard.
 445 Several drafts of the standard contained <CODE>pthread_yield()</CODE>,
 446 but then the POSIX guys discovered it was redundant with
 447 <CODE>sched_yield()</CODE> and dropped it.  So, just use
 448 <CODE>sched_yield()</CODE> instead.
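<P>
For code written against the drafts, one possible approach is a small
wrapper such as the one sketched below.  The name
<CODE>yield_processor</CODE> is hypothetical; only
<CODE>sched_yield()</CODE> is a standard function.<P>

```c
#include <sched.h>

/* Stand-in for the draft-era pthread_yield(): give up the processor
   so that another runnable thread or process can be scheduled.
   Returns 0 on success and -1 (with errno set) on failure. */
int yield_processor(void)
{
    return sched_yield();
}
```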
 449
 450 <H4><A NAME="E.2">E.2: I've found some type errors in
 451 <code>&lt;pthread.h&gt;</code>.
 452 For instance, the second argument to <CODE>pthread_create()</CODE>
 453 should be a <CODE>pthread_attr_t</CODE>, not a
 454 <CODE>pthread_attr_t *</CODE>. Also, didn't you forget to declare
 455 <CODE>pthread_attr_default</CODE>?</A></H4>
 456
 457 No, I didn't.  What you're describing is draft 4 of the POSIX
 458 standard, which is used in OSF DCE threads.  LinuxThreads conforms to the
 459 final standard.  Even though the functions have the same names as in
 460 draft 4 and DCE, their calling conventions are slightly different.  In
 461 particular, attributes are passed by reference, not by value, and
 462 default attributes are denoted by the NULL pointer.  Since draft 4/DCE
 463 will eventually disappear, you'd better port your program to use the
 464 standard interface.<P>
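
The final-standard calling convention can be sketched like this.  The names
<CODE>echo_task</CODE> and <CODE>run_attr_demo</CODE> are invented for the
example; it creates one thread with default attributes (a NULL pointer) and
one with an explicit attribute object passed by reference.<P>

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: hand the argument back as the thread's exit value. */
static void *echo_task(void *arg)
{
    return arg;
}

/* Returns the sum of the two values echoed by the threads,
   or -1 if a thread or the attribute object could not be created. */
int run_attr_demo(void)
{
    pthread_t t1, t2;
    pthread_attr_t attr;
    void *r1, *r2;
    int a = 1, b = 2;

    /* Default attributes: pass NULL, not pthread_attr_default. */
    if (pthread_create(&t1, NULL, echo_task, &a) != 0)
        return -1;

    /* Explicit attributes: initialize an object, pass its address. */
    if (pthread_attr_init(&attr) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (pthread_create(&t2, &attr, echo_task, &b) != 0) {
        pthread_attr_destroy(&attr);
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_attr_destroy(&attr);   /* safe once the thread exists */

    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return *(int *) r1 + *(int *) r2;
}
```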
 465
 466 <H4><A NAME="E.3">E.3: I'm porting an application from Solaris and I
 467 have to rename all thread functions from <code>thr_blah</code> to
 468 <CODE>pthread_blah</CODE>.  This is very annoying.  Why did you change
 469 all the function names?</A></H4>
 470
 471 POSIX did it.  The <code>thr_*</code> functions correspond to Solaris
 472 threads, an older thread interface that you'll find only under
 473 Solaris.  The <CODE>pthread_*</CODE> functions correspond to POSIX
 474 threads, an international standard available for many, many platforms.
 475 Even Solaris 2.5 and later support the POSIX threads interface.  So,
 476 do yourself a favor and rewrite your code to use POSIX threads: this
 477 way, it will run unchanged under Linux, Solaris, and quite a lot of
 478 other platforms.<P>
 479
 480 <H4><A NAME="E.4">E.4: How can I suspend and resume a thread from
 481 another thread? Solaris has the <CODE>thr_suspend()</CODE> and
 482 <CODE>thr_resume()</CODE> functions to do that; why don't you?</A></H4>
 483
 484 The POSIX standard provides <B>no</B> mechanism by which a thread A can
 485 suspend the execution of another thread B, without cooperation from B.
 486 The only way to implement a suspend/restart mechanism is to have B
 487 check periodically some global variable for a suspend request
 488 and then suspend itself on a condition variable, which another thread
 489 can signal later to restart B.<P>
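
The cooperative scheme just described might look like the sketch below.
All names (<CODE>suspension_point</CODE>, <CODE>suspend_b</CODE>,
<CODE>resume_b</CODE>, <CODE>run_suspend_demo</CODE>) are hypothetical, and
the sketch assumes that thread B calls <CODE>suspension_point()</CODE>
often enough and only at places where it holds no other locks.<P>

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static pthread_mutex_t ctl_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctl_cond = PTHREAD_COND_INITIALIZER;
static int suspend_requested, is_suspended, stop_requested;
static long iterations;      /* B's "work", protected by ctl_lock */

/* Called by B at safe points: park here while a suspend is requested. */
static void suspension_point(void)
{
    pthread_mutex_lock(&ctl_lock);
    while (suspend_requested) {
        is_suspended = 1;
        pthread_cond_broadcast(&ctl_cond);   /* tell A that B is parked */
        pthread_cond_wait(&ctl_cond, &ctl_lock);
    }
    is_suspended = 0;
    pthread_mutex_unlock(&ctl_lock);
}

/* Called by A: request a suspend and wait until B has parked. */
static void suspend_b(void)
{
    pthread_mutex_lock(&ctl_lock);
    suspend_requested = 1;
    while (!is_suspended)
        pthread_cond_wait(&ctl_cond, &ctl_lock);
    pthread_mutex_unlock(&ctl_lock);
}

/* Called by A: let B continue. */
static void resume_b(void)
{
    pthread_mutex_lock(&ctl_lock);
    suspend_requested = 0;
    pthread_cond_broadcast(&ctl_cond);
    pthread_mutex_unlock(&ctl_lock);
}

static void *thread_b(void *arg)
{
    (void) arg;
    for (;;) {
        int stop;

        suspension_point();
        pthread_mutex_lock(&ctl_lock);
        stop = stop_requested;
        if (!stop)
            iterations++;
        pthread_mutex_unlock(&ctl_lock);
        if (stop)
            return NULL;
    }
}

/* Returns 1 if B made no progress while suspended, 0 otherwise,
   -1 if B could not be created. */
int run_suspend_demo(void)
{
    pthread_t b;
    long before, after;

    iterations = 0;
    suspend_requested = is_suspended = stop_requested = 0;
    if (pthread_create(&b, NULL, thread_b, NULL) != 0)
        return -1;

    suspend_b();
    pthread_mutex_lock(&ctl_lock);
    before = iterations;
    pthread_mutex_unlock(&ctl_lock);
    sched_yield();                       /* give B a chance to run, if it could */
    pthread_mutex_lock(&ctl_lock);
    after = iterations;
    stop_requested = 1;
    pthread_mutex_unlock(&ctl_lock);

    resume_b();
    pthread_join(b, NULL);
    return before == after;
}
```

Since B only ever stops inside <CODE>suspension_point()</CODE>, it is never
frozen in the middle of a critical section, which is exactly the guarantee
an asynchronous suspend cannot give.<P>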
 490
 491 Notice that <CODE>thr_suspend()</CODE> is inherently dangerous and
 492 prone to race conditions.  For one thing, there is no control on where
 493 the target thread stops: it can very well be stopped in the middle of
 494 a critical section, while holding mutexes.  Also, there is no
 495 guarantee on when the target thread will actually stop.  For these
 496 reasons, you'd be much better off using mutexes and conditions
 497 instead.  The only situations that really require the ability to
suspend a thread are debuggers and some kinds of garbage collectors.<P>
 499
 500 If you really must suspend a thread in LinuxThreads, you can send it a
 501 <CODE>SIGSTOP</CODE> signal with <CODE>pthread_kill</CODE>. Send
 502 <CODE>SIGCONT</CODE> for restarting it.
 503 Beware, this is specific to LinuxThreads and entirely non-portable.
 504 Indeed, a truly conforming POSIX threads implementation will stop all
 505 threads when one thread receives the <CODE>SIGSTOP</CODE> signal!
 506 One day, LinuxThreads will implement that behavior, and the
 507 non-portable hack with <CODE>SIGSTOP</CODE> won't work anymore.<P>
 508
 509 <H4><A NAME="E.5">E.5: Does LinuxThreads implement
 510 <CODE>pthread_attr_setstacksize()</CODE> and
 511 <CODE>pthread_attr_setstackaddr()</CODE>?</A></H4>
 512
 513 These optional functions are provided in recent versions of
 514 LinuxThreads (0.8 and up).  Earlier releases did not provide these
 515 optional components of the POSIX standard.<P>
 516
 517 Even if <CODE>pthread_attr_setstacksize()</CODE> and
 518 <CODE>pthread_attr_setstackaddr()</CODE> are now provided, we still
 519 recommend that you do not use them unless you really have strong
 520 reasons for doing so.  The default stack allocation strategy for
 521 LinuxThreads is nearly optimal: stacks start small (4k) and
 522 automatically grow on demand to a fairly large limit (2M).
 523 Moreover, there is no portable way to estimate the stack requirements
 524 of a thread, so setting the stack size yourself makes your program
 525 less reliable and non-portable.<P>
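
If you do have a strong reason to set the stack size, the call sequence
might look like the following sketch.  The function name
<CODE>create_with_stack_size</CODE> and the 256k figure used in the note
below are assumptions made for the example, not recommended values.<P>

```c
#include <pthread.h>
#include <stddef.h>

static void *small_task(void *arg)
{
    return arg;
}

/* Create (and join) one thread whose stack size is set explicitly.
   Returns the stack size recorded in the attribute object,
   or 0 if any step failed. */
size_t create_with_stack_size(size_t requested)
{
    pthread_attr_t attr;
    pthread_t t;
    size_t actual = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;

    if (pthread_attr_setstacksize(&attr, requested) == 0 &&
        pthread_attr_getstacksize(&attr, &actual) == 0 &&
        pthread_create(&t, &attr, small_task, NULL) == 0) {
        pthread_join(t, NULL);
    } else {
        actual = 0;          /* size rejected or thread creation failed */
    }
    pthread_attr_destroy(&attr);
    return actual;
}
```

For instance, <CODE>create_with_stack_size(256 * 1024)</CODE> asks for a
256k stack; a request below the system minimum is rejected by
<CODE>pthread_attr_setstacksize()</CODE>.<P>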
 526
 527 <H4><A NAME="E.6">E.6: LinuxThreads does not support the
 528 <CODE>PTHREAD_SCOPE_PROCESS</CODE> value of the "contentionscope"
 529 attribute.  Why? </A></H4>
 530
 531 With a "one-to-one" model, as in LinuxThreads (one kernel execution
 532 context per thread), there is only one scheduler for all processes and
 533 all threads on the system.  So, there is no way to obtain the behavior of
<CODE>PTHREAD_SCOPE_PROCESS</CODE>.<P>
 535
 536 <H4><A NAME="E.7">E.7: LinuxThreads does not implement process-shared
 537 mutexes, conditions, and semaphores. Why?</A></H4>
 538
 539 This is another optional component of the POSIX standard.  Portable
 540 applications should test <CODE>_POSIX_THREAD_PROCESS_SHARED</CODE>
 541 before using this facility.
 542 <P>
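The test on <CODE>_POSIX_THREAD_PROCESS_SHARED</CODE> could be written
along the lines of the sketch below.  The function name
<CODE>try_process_shared_mutex</CODE> is hypothetical, and the check is
simplified: it treats only a positive value of the macro as "supported".
The mutex here lives on the stack purely for demonstration; a real
process-shared mutex would have to be placed in shared memory.
<P>

```c
#include <pthread.h>
#include <unistd.h>

/* Try to initialize a process-shared mutex.
   Returns 1 if the process-shared attribute was accepted,
   0 if the code fell back to an ordinary (process-private) mutex,
   and -1 on error. */
int try_process_shared_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int shared = 0;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;

#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
    /* The option is advertised: ask for a process-shared mutex. */
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0)
        shared = 1;
#endif

    if (pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);
    pthread_mutex_destroy(&mutex);
    return shared;
}
```
<P>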
 543 The goal of this extension is to allow different processes (with
 544 different address spaces) to synchronize through mutexes, conditions
 545 or semaphores allocated in shared memory (either SVR4 shared memory
 546 segments or <CODE>mmap()</CODE>ed files).
 547 <P>
 548 The reason why this does not work in LinuxThreads is that mutexes,
 549 conditions, and semaphores are not self-contained: their waiting
 550 queues contain pointers to linked lists of thread descriptors, and
 551 these pointers are meaningful only in one address space.
 552 <P>
 553 Matt Messier and I spent a significant amount of time trying to design a
 554 suitable mechanism for sharing waiting queues between processes.  We
 555 came up with several solutions that combined two of the following
 556 three desirable features, but none that combines all three:
 557 <UL>
<LI>allows sharing between processes having different UIDs
 559 <LI>supports cancellation
 560 <LI>supports <CODE>pthread_cond_timedwait</CODE>
 561 </UL>
 562 We concluded that kernel support is required to share mutexes,
 563 conditions and semaphores between processes.  That's one place where
 564 Linus Torvalds's intuition that "all we need in the kernel is
 565 <CODE>clone()</CODE>" fails.
 566 <P>
 567 Until suitable kernel support is available, you'd better use
 568 traditional interprocess communications to synchronize different
 569 processes: System V semaphores and message queues, or pipes, or sockets.
 570 <P>
 571
 572 <HR>
 573 <P>
 574
 575 <H2><A NAME="F">F. C++ issues</A></H2>
 576
 577 <H4><A NAME="F.1">F.1: Are there C++ wrappers for LinuxThreads?</A></H4>
 578
 579 Douglas Schmidt's ACE library contains, among a lot of other
 580 things, C++ wrappers for LinuxThreads and quite a number of other
 581 thread libraries.  Check out
 582 <A HREF="http://www.cs.wustl.edu/~schmidt/ACE.html">http://www.cs.wustl.edu/~schmidt/ACE.html</A><P>
 583
 584 <H4><A NAME="F.2">F.2: I'm trying to use LinuxThreads from a C++
 585 program, and the compiler complains about the third argument to
 586 <CODE>pthread_create()</CODE> !</A></H4>
 587
 588 You're probably trying to pass a class member function or some
 589 other C++ thing as third argument to <CODE>pthread_create()</CODE>.
 590 Recall that <CODE>pthread_create()</CODE> is a C function, and it must
 591 be passed a C function as third argument.<P>
 592
 593 <H4><A NAME="F.3">F.3: I'm trying to use LinuxThreads in conjunction
 594 with libg++, and I'm having all sorts of trouble.</A></H4>
 595
From what I understand, thread support in libg++ is completely broken,
 597 especially with respect to locking of iostreams.  H.J.Lu wrote:
 598 <BLOCKQUOTE>
 599 If you want to use thread, I can only suggest egcs and glibc. You
 600 can find egcs at
 601 <A HREF="http://www.cygnus.com/egcs">http://www.cygnus.com/egcs</A>.
egcs has libstdc++, which is MT safe under glibc 2. If you really
 603 want to use the libg++, I have a libg++ add-on for egcs.
 604 </BLOCKQUOTE>
 605 <HR>
 606 <P>
 607
 608 <H2><A NAME="G">G. Debugging LinuxThreads programs</A></H2>
 609
<H4><A NAME="G.1">G.1: Can I debug LinuxThreads programs using gdb?</A></H4>
 611
 612 Yes, but not with the stock gdb 4.17.  You need a specially patched
version of gdb 4.17 developed by Eric Paire and colleagues at The Open
 614 Group, Grenoble.  The patches against gdb 4.17 are available at
 615 <A HREF="http://www.gr.opengroup.org/java/jdk/linux/debug.htm"><code>http://www.gr.opengroup.org/java/jdk/linux/debug.htm</code></A>.
 616 Precompiled binaries of the patched gdb are available in RedHat's RPM
 617 format at <A
 618 HREF="http://odin.appliedtheory.com/"><code>http://odin.appliedtheory.com/</code></A>.<P>
 619
 620 Some Linux distributions provide an already-patched version of gdb;
 621 others don't.  For instance, the gdb in RedHat 5.2 is thread-aware,
 622 but apparently not the one in RedHat 6.0.  Just ask (politely) the
makers of your Linux distribution to please make sure that they apply
 624 the correct patches to gdb.<P>
 625
 626 <H4><A NAME="G.2">G.2: Does it work with post-mortem debugging?</A></H4>
 627
 628 Not very well.  Generally, the core file does not correspond to the
 629 thread that crashed.  The reason is that the kernel will not dump core
 630 for a process that shares its memory with other processes, such as the
 631 other threads of your program.  So, the thread that crashes silently
 632 disappears without generating a core file.  Then, all other threads of
 633 your program die on the same signal that killed the crashing thread.
 634 (This is required behavior according to the POSIX standard.)  The last
 635 one that dies is no longer sharing its memory with anyone else, so the
 636 kernel generates a core file for that thread.  Unfortunately, that's
 637 not the thread you are interested in.<P>
 638
 639 <H4><A NAME="G.3">G.3: Any other ways to debug multithreaded programs, then?</A></H4>
 640
 641 Assertions and <CODE>printf()</CODE> are your best friends.  Try to debug
 642 sequential parts in a single-threaded program first.  Then, put
 643 <CODE>printf()</CODE> statements all over the place to get execution traces.
 644 Also, check invariants often with the <CODE>assert()</CODE> macro.  In truth,
 645 there is no other effective way (save for a full formal proof of your
 646 program) to track down concurrency bugs.  Debuggers are not really
 647 effective for subtle concurrency problems, because they disrupt
 648 program execution too much.<P>
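As a rough illustration of this advice, the sketch below combines a
tracing macro with an invariant check.  The names <CODE>TRACE</CODE>,
<CODE>worker</CODE> and <CODE>run_trace_demo</CODE> are made up for this
example and are not part of LinuxThreads; the cast of
<CODE>pthread_self()</CODE> to an integer is an assumption that holds on
Linux but is not portable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical tracing macro: prints the calling thread, file and line.
   Casting pthread_t to unsigned long is a Linux-specific assumption. */
#define TRACE(msg) \
  fprintf(stderr, "[thread %lu] %s:%d: %s\n", \
          (unsigned long) pthread_self(), __FILE__, __LINE__, (msg))

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;          /* shared data protected by counter_lock */

static void *worker(void *arg)
{
  int i;
  (void) arg;
  for (i = 0; i < 1000; i++) {
    pthread_mutex_lock(&counter_lock);
    counter++;
    assert(counter > 0);         /* invariant checked while holding the lock */
    pthread_mutex_unlock(&counter_lock);
  }
  TRACE("worker done");
  return NULL;
}

/* Runs two workers and returns the final counter value, or -1 on error. */
int run_trace_demo(void)
{
  pthread_t t1, t2;
  counter = 0;
  if (pthread_create(&t1, NULL, worker, NULL) != 0) return -1;
  if (pthread_create(&t2, NULL, worker, NULL) != 0) {
    pthread_join(t1, NULL);
    return -1;
  }
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return counter;
}
```

Each trace line carries the thread identifier, so interleaved output
from several threads can still be told apart when reading the log.<P>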
 649
 650 <HR>
 651 <P>
 652
 653 <H2><A NAME="H">H. Compiling multithreaded code; errno madness</A></H2>
 654
 655 <H4><A NAME="H.1">H.1: You say all multithreaded code must be compiled
 656 with <CODE>_REENTRANT</CODE> defined. What difference does it make?</A></H4>
 657
 658 It affects include files in three ways:
 659 <UL>
 660 <LI> The include files define prototypes for the reentrant variants of
 661 some of the standard library functions,
 662 e.g. <CODE>gethostbyname_r()</CODE> as a reentrant equivalent to
 663 <CODE>gethostbyname()</CODE>.<P>
 664
 665 <LI> If <CODE>_REENTRANT</CODE> is defined, some
 666 <code>&lt;stdio.h&gt;</code> functions are no longer defined as macros,
 667 e.g. <CODE>getc()</CODE> and <CODE>putc()</CODE>. In a multithreaded
 668 program, stdio functions require additional locking, which the macros
 669 don't perform, so we must call functions instead.<P>
 670
 671 <LI> More importantly, <code>&lt;errno.h&gt;</code> redefines errno when
 672 <CODE>_REENTRANT</CODE> is
 673 defined, so that errno refers to the thread-specific errno location
 674 rather than the global errno variable.  This is achieved by the
 675 following <code>#define</code> in <code>&lt;errno.h&gt;</code>:
 676 <PRE>
 677         #define errno (*(__errno_location()))
 678 </PRE>
 679 which causes each reference to errno to call the
 680 <CODE>__errno_location()</CODE> function for obtaining the location
 681 where error codes are stored.  libc provides a default definition of
 682 <CODE>__errno_location()</CODE> that always returns
 683 <code>&amp;errno</code> (the address of the global errno variable). Thus,
 684 for programs not linked with LinuxThreads, defining
 685 <CODE>_REENTRANT</CODE> makes no difference w.r.t. errno processing.
 686 But LinuxThreads redefines <CODE>__errno_location()</CODE> to return a
 687 location in the thread descriptor reserved for holding the current
 688 value of errno for the calling thread.  Thus, each thread operates on
 689 a different errno location.
 690 </UL>
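The effect of the errno redefinition can be observed with a small
experiment.  The sketch below is only an illustration: the names
<CODE>child</CODE> and <CODE>errno_is_per_thread</CODE> are invented here,
and the code assumes it is compiled with <CODE>-D_REENTRANT</CODE> and
linked with the thread library.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static int *child_errno_addr;    /* address of errno as seen by the child */
static int child_errno_value;    /* value the child read back from errno */

static void *child(void *arg)
{
  (void) arg;
  errno = EINTR;                 /* stored in this thread's errno location */
  child_errno_addr = &errno;     /* with _REENTRANT: __errno_location() */
  child_errno_value = errno;
  return NULL;
}

/* Returns 1 if the child thread used an errno location different from
   the caller's, 0 otherwise (or if the thread could not be created). */
int errno_is_per_thread(void)
{
  pthread_t t;
  if (pthread_create(&t, NULL, child, NULL) != 0) return 0;
  pthread_join(t, NULL);
  return child_errno_addr != &errno && child_errno_value == EINTR;
}
```

The addresses are only compared, never dereferenced, after the child
has terminated.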
 691 <P>
 692
 693 <H4><A NAME="H.2">H.2: Why is it so important that each thread has its
 694 own errno variable? </A></H4>
 695
 696 If all threads were to store error codes in the same, global errno
 697 variable, then the value of errno after a system call or library
 698 function returns would be unpredictable:  between the time a system
 699 call stores its error code in the global errno and your code inspects
 700 errno to see which error occurred, another thread might have stored
 701 another error code in the same errno location. <P>
 702
 703 <H4><A NAME="H.3">H.3: What happens if I link LinuxThreads with code
 704 not compiled with <CODE>-D_REENTRANT</CODE>?</A></H4>
 705
 706 Lots of trouble.  If the code uses <CODE>getc()</CODE> or
 707 <CODE>putc()</CODE>, it will perform I/O without proper interlocking
 708 of the stdio buffers; this can cause lost output, duplicate output, or
 709 just crash other stdio functions.  If the code consults errno, it will
 710 get back the wrong error code.  The following code fragment is a
 711 typical example:
 712 <PRE>
 713         do {
 714           r = read(fd, buf, n);
 715           if (r == -1) {
 716             if (errno == EINTR)   /* an error we can handle */
 717               continue;
 718             else {                /* other errors are fatal */
 719               perror("read failed");
 720               exit(100);
 721             }
 722           }
 723         } while (...);
 724 </PRE>
 725 Assume this code is not compiled with <CODE>-D_REENTRANT</CODE>, but is
 726 linked with LinuxThreads.  At run-time, <CODE>read()</CODE> is
 727 interrupted.  Since the C library was compiled with
 728 <CODE>-D_REENTRANT</CODE>, <CODE>read()</CODE> stores its error code
 729 in the location pointed to by <CODE>__errno_location()</CODE>, which
 730 is the thread-local errno variable.  Then, the code above sees that
 731 <CODE>read()</CODE> returns -1 and looks up errno.  Since
 732 <CODE>_REENTRANT</CODE> is not defined, the reference to errno
 733 accesses the global errno variable, which is most likely 0.  Hence the
 734 code concludes that it cannot handle the error and stops.<P>
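For reference, the retry logic of the fragment above can be packaged as
a self-contained helper.  This is a minimal sketch: the name
<CODE>read_retry</CODE> is hypothetical, and the helper behaves as
described only if it is compiled with <CODE>-D_REENTRANT</CODE> when
linked with LinuxThreads.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: like read(), but restarts the call when it is
   interrupted by a signal.  Returns the byte count, 0 at end of file,
   or -1 with errno set for any error other than EINTR. */
ssize_t read_retry(int fd, void *buf, size_t n)
{
  ssize_t r;
  do {
    r = read(fd, buf, n);
  } while (r == -1 && errno == EINTR);
  return r;
}
```

Unlike the fragment above, the helper leaves the handling of fatal
errors to the caller instead of calling <CODE>exit()</CODE>.<P>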
 735
 736 <H4><A NAME="H.4">H.4: With LinuxThreads, I can no longer use the signals
 737 <code>SIGUSR1</code> and <code>SIGUSR2</code> in my programs! Why? </A></H4>
 738
 739 The short answer is: because the Linux kernel you're using does not
 740 support realtime signals.  <P>
 741
 742 LinuxThreads needs two signals for its internal operation.
 743 One is used to suspend and restart threads blocked on mutex, condition
 744 or semaphore operations.  The other is used for thread
 745 cancellation.<P>
 746
 747 On "old" kernels (2.0 and early 2.1 kernels), there are only 32
 748 signals available and the kernel reserves all of them but two:
 749 <code>SIGUSR1</code> and <code>SIGUSR2</code>.  So, LinuxThreads has
 750 no choice but to use those two signals.<P>
 751
 752 On recent kernels (2.2 and up), more than 32 signals are provided in
 753 the form of realtime signals. When run on one of those kernels,
 754 LinuxThreads uses two reserved realtime signals for its internal
 755 operation, thus leaving <code>SIGUSR1</code> and <code>SIGUSR2</code>
 756 free for user code.  (This works only with glibc, not with libc 5.) <P>
 757
 758 <H4><A NAME="H.5">H.5: Is the stack of one thread visible from the
 759 other threads?  Can I pass a pointer into my stack to other threads?
 760 </A></H4>
 761
 762 Yes, you can -- if you're very careful.  The stacks are indeed visible
 763 from all threads in the system.  Some non-POSIX thread libraries seem
 764 to map the stacks for all threads at the same virtual addresses and
 765 change the memory mapping when they switch from one thread to
 766 another.  But this is not the case for LinuxThreads, as it would make
 767 context switching between threads more expensive, and at any rate
 768 might not conform to the POSIX standard.<P>
 769
 770 So, you can take the address of an "auto" variable and pass it to
 771 other threads via shared data structures.  However, you need to make
 772 absolutely sure that the function doing this will not return as long
 773 as other threads need to access this address.  It's the usual mistake
 774 of returning the address of an "auto" variable, only made much worse
 775 because of concurrency.  It's much, much safer to systematically
 776 heap-allocate all shared data structures. <P>
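One possible way to follow this advice is sketched below: the data
shared with the new thread is allocated with <CODE>malloc()</CODE> and
freed by whoever joins the thread.  The type <CODE>struct job</CODE> and
the functions <CODE>start_square</CODE> and <CODE>finish_square</CODE>
are invented for this example.

```c
#include <pthread.h>
#include <stdlib.h>

struct job { int input; int result; };   /* hypothetical shared record */

static void *square(void *arg)
{
  struct job *j = arg;
  j->result = j->input * j->input;
  return j;                      /* hand the heap block back to the joiner */
}

/* Starts a thread on a heap-allocated job.  The block does not live in
   this function's stack frame, so returning right away is harmless. */
int start_square(pthread_t *t, int input)
{
  struct job *j = malloc(sizeof *j);
  if (j == NULL) return -1;
  j->input = input;
  j->result = 0;
  if (pthread_create(t, NULL, square, j) != 0) {
    free(j);
    return -1;
  }
  return 0;
}

/* Waits for the thread, fetches its result, and frees the job.
   Returns -1 if the join fails. */
int finish_square(pthread_t t)
{
  void *ret;
  int result;
  if (pthread_join(t, &ret) != 0) return -1;
  result = ((struct job *) ret)->result;
  free(ret);
  return result;
}
```

Had <CODE>start_square</CODE> passed the address of a local
<CODE>struct job</CODE> instead, the thread could have run after the
function returned and written into a dead stack frame.<P>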
 777
 778 <HR>
 779 <P>
 780
 781 <H2><A NAME="I">I.  X-Windows and other libraries</A></H2>
 782
 783 <H4><A NAME="I.1">I.1: My program uses both Xlib and LinuxThreads.
 784 It stops very early with an "Xlib: unknown 0 error" message.  What
 785 does this mean? </A></H4>
 786
 787 That's a prime example of the errno problem described in question <A
 788 HREF="#H.3">H.3</A>.  The binaries for Xlib you're using have not been
 789 compiled with <CODE>-D_REENTRANT</CODE>.  It happens that Xlib contains a
 790 piece of code very much like the one in question <A
 791 HREF="#H.3">H.3</A>.  So, your Xlib fetches the error code from the
 792 wrong errno location and concludes that an error it cannot handle
 793 occurred.<P>
 794
 795 <H4><A NAME="I.2">I.2: So, what can I do to build a multithreaded X
 796 Windows client? </A></H4>
 797
 798 The best solution is to use X libraries that have been compiled with
 799 multithreading options set.  Linux distributions that come with glibc
 800 2 as the main C library generally provide thread-safe X libraries.
 801 At least, that seems to be the case for RedHat 5 and later.<P>
 802
 803 You can also try to recompile the X libraries yourself with multithreading
 804 options set.  They contain optional support for multithreading; it's
 805 just that the binaries provided by your Linux distribution were built
 806 without this support.  See the file <code>README.Xfree3.3</code> in
 807 the LinuxThreads distribution for patches and info on how to compile
 808 thread-safe X libraries from the Xfree3.3 distribution.  The Xfree3.3
 809 sources are readily available in most Linux distributions, e.g. as a
 810 source RPM for RedHat.  Be warned, however, that X Windows is a huge
 811 system, and recompiling even just the libraries takes a lot of time
 812 and disk space.<P>
 813
 814 Another, less involved solution is to call X functions only from the
 815 main thread of your program.  Even if all threads have their own errno
 816 location, the main thread uses the global errno variable for its errno
 817 location.  Thus, code not compiled with <code>-D_REENTRANT</code>
 818 still "sees" the right error values if it executes in the main thread
 819 only. <P>
 820
 821 <H4><A NAME="I.2">This is a lot of work. Don't you have precompiled
 822 thread-safe X libraries that you could distribute?</A></H4>
 823
 824 No, I don't.  Sorry.  But consider installing a Linux distribution
 825 that comes with thread-safe X libraries, such as RedHat 6.<P>
 826
 827 <H4><A NAME="I.3">I.3: Can I use library FOO in a multithreaded
 828 program?</A></H4>
 829
 830 Most libraries cannot be used "as is" in a multithreaded program.
 831 For one thing, they are not necessarily thread-safe: calling
 832 simultaneously two functions of the library from two threads might not
 833 work, due to internal use of global variables and the like.  Second,
 834 the libraries must have been compiled with <CODE>-D_REENTRANT</CODE> to avoid
 835 the errno problems explained in question <A HREF="#H.2">H.2</A>.
 836 <P>
 837
 838 <H4><A NAME="I.4">I.4: What if I make sure that only one thread calls
 839 functions in these libraries?</A></H4>
 840
 841 This avoids problems with the library not being thread-safe.  But
 842 you're still vulnerable to errno problems.  At the very least, a
 843 recompile of the library with <CODE>-D_REENTRANT</CODE> is needed.
 844 <P>
 845
 846 <H4><A NAME="I.5">I.5: What if I make sure that only the main thread
 847 calls functions in these libraries?</A></H4>
 848
 849 That might actually work.  As explained in question <A HREF="#I.2">I.2</A>,
 850 the main thread uses the global errno variable, and can therefore
 851 execute code not compiled with <CODE>-D_REENTRANT</CODE>.<P>
 852
 853 <H4><A NAME="I.6">I.6: SVGAlib doesn't work with LinuxThreads.  Why?
 854 </A></H4>
 855
 856 Because both LinuxThreads and SVGAlib use the signals
 857 <code>SIGUSR1</code> and <code>SIGUSR2</code>.  See question <A
 858 HREF="#H.4">H.4</A>.
 859 <P>
 860
 861
 862 <HR>
 863 <P>
 864
 865 <H2><A NAME="J">J.  Signals and threads</A></H2>
 866
 867 <H4><A NAME="J.1">J.1: When it comes to signals, what is shared
 868 between threads and what isn't?</A></H4>
 869
 870 Signal handlers are shared between all threads: when a thread calls
 871 <CODE>sigaction()</CODE>, it sets how the signal is handled not only
 872 for itself, but for all other threads in the program as well.<P>
 873
 874 On the other hand, signal masks are per-thread: each thread chooses
 875 which signals it blocks independently of others.  At thread creation
 876 time, the newly created thread inherits the signal mask of the thread
 877 calling <CODE>pthread_create()</CODE>.  But afterwards, the new thread
 878 can modify its signal mask independently of its creator thread.<P>
 879
 880 <H4><A NAME="J.2">J.2: When I send a <CODE>SIGKILL</CODE> to a
 881 particular thread using <CODE>pthread_kill</CODE>, all my threads are
 882 killed!</A></H4>
 883
 884 That's how it should be.  The POSIX standard mandates that all threads
 885 should terminate when the process (i.e. the collection of all threads
 886 running the program) receives a signal whose effect is to
 887 terminate the process (such as <CODE>SIGKILL</CODE> or <CODE>SIGINT</CODE>
 888 when no handler is installed on that signal).  This behavior makes a
 889 lot of sense: when you type "ctrl-C" at the keyboard, or when a thread
 890 crashes on a division by zero or a segmentation fault, you really want
 891 all threads to stop immediately, not just the one that caused the
 892 segmentation violation or that got the <CODE>SIGINT</CODE> signal.
 893 (This assumes default behavior for those signals; see question
 894 <A HREF="#J.3">J.3</A> if you install handlers for those signals.)<P>
 895
 896 If you're trying to terminate a thread without bringing the whole
 897 process down, use <code>pthread_cancel()</code>.<P>
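A minimal sketch of this approach follows.  The names
<CODE>sleeper</CODE> and <CODE>cancel_one_thread</CODE> are hypothetical;
the example assumes the default (deferred) cancellation mode, in which
the target thread is terminated at its next cancellation point.

```c
#include <pthread.h>
#include <unistd.h>

static void *sleeper(void *arg)
{
  (void) arg;
  for (;;)
    sleep(1);                    /* sleep() is a cancellation point */
}

/* Cancels one thread without affecting the rest of the process.
   Returns 1 if the thread terminated because of the cancellation. */
int cancel_one_thread(void)
{
  pthread_t t;
  void *status;
  if (pthread_create(&t, NULL, sleeper, NULL) != 0) return 0;
  if (pthread_cancel(t) != 0) return 0;
  if (pthread_join(t, &status) != 0) return 0;
  return status == PTHREAD_CANCELED;
}
```

The calling thread keeps running after the join, which is the
difference from sending a fatal signal with
<CODE>pthread_kill()</CODE>.<P>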
 898
 899 <H4><A NAME="J.3">J.3: I've installed a handler on a signal.  Which
 900 thread executes the handler when the signal is received?</A></H4>
 901
 902 If the signal is generated by a thread during its execution (e.g. a
 903 thread executes a division by zero and thus generates a
 904 <CODE>SIGFPE</CODE> signal), then the handler is executed by that
 905 thread.  This also applies to signals generated by
 906 <CODE>raise()</CODE>.<P>
 907
 908 If the signal is sent to a particular thread using
 909 <CODE>pthread_kill()</CODE>, then that thread executes the handler.<P>
 910
 911 If the signal is sent via <CODE>kill()</CODE> or the tty interface
 912 (e.g. by pressing ctrl-C), then the POSIX specs say that the handler
 913 is executed by any thread in the process that does not currently block
 914 the signal.  In other terms, POSIX considers that the signal is sent
 915 to the process (the collection of all threads) as a whole, and any
 916 thread that is not blocking this signal can then handle it.<P>
 917
 918 The latter case is where LinuxThreads departs from the POSIX specs.
 919 In LinuxThreads, there is no real notion of "the process as a whole":
 920 in the kernel, each thread is really a distinct process with a
 921 distinct PID, and signals sent to the PID of a thread can only be
 922 handled by that thread.  As long as no thread is blocking the signal,
 923 the behavior conforms to the standard: one (unspecified) thread of the
 924 program handles the signal.  But if the thread to whose PID the signal
 925 is sent blocks the signal, and some other thread does not block the
 926 signal, then LinuxThreads will simply queue the signal in
 927 that thread and execute the handler only when that thread unblocks
 928 the signal, instead of executing the handler immediately in the other
 929 thread that does not block the signal.<P>
 930
 931 This is to be viewed as a LinuxThreads bug, but I currently don't see
 932 any way to implement the POSIX behavior without kernel support.<P>
 933
 934 <H4><A NAME="J.3">J.3: How shall I go about mixing signals and threads
 935 in my program? </A></H4>
 936
 937 The less you mix them, the better.  Note that none of the
 938 <CODE>pthread_*</CODE> functions is async-signal safe, meaning
 939 that you should not call them from signal handlers.  This
 940 recommendation is not to be taken lightly: your program can deadlock
 941 if you call a <CODE>pthread_*</CODE> function from a signal handler!
 942 <P>
 943
 944 The only sensible things you can do from a signal handler are to set a
 945 global flag, or call <CODE>sem_post</CODE> on a semaphore, to record
 946 the delivery of the signal.  The remainder of the program can then
 947 either poll the global flag, or use <CODE>sem_wait()</CODE> and
 948 <CODE>sem_trywait()</CODE> on the semaphore.<P>
 949
 950 Another option is to do nothing in the signal handler, and dedicate
 951 one thread (preferably the initial thread) to wait synchronously for
 952 signals, using <CODE>sigwait()</CODE>, and send messages to the other
 953 threads accordingly.<P>
 954
 955 <H4><A NAME="J.4">J.4: When one thread is blocked in
 956 <CODE>sigwait()</CODE>, other threads no longer receive the signals
 957 <CODE>sigwait()</CODE> is waiting for!  What happens? </A></H4>
 958
 959 It's an unfortunate consequence of how LinuxThreads implements
 960 <CODE>sigwait()</CODE>.  Basically, it installs signal handlers on all
 961 signals waited for, in order to record which signal was received.
 962 Since signal handlers are shared with the other threads, this
 963 temporarily deactivates any signal handlers you might have previously
 964 installed on these signals.<P>
 965
 966 Though surprising, this behavior actually seems to conform to the
 967 POSIX standard.  According to POSIX, <CODE>sigwait()</CODE> is
 968 guaranteed to work as expected only if all other threads in the
 969 program block the signals waited for (otherwise, the signals could be
 970 delivered to other threads than the one doing <CODE>sigwait()</CODE>,
 971 which would make <CODE>sigwait()</CODE> useless).  In this particular
 972 case, the problem described in this question does not appear.<P>
 973
 974 One day, <CODE>sigwait()</CODE> will be implemented in the kernel,
 975 along with other POSIX 1003.1b extensions, and <CODE>sigwait()</CODE>
 976 will have a more natural behavior (as well as better performance).<P>
 977
 978 <HR>
 979 <P>
 980
 981 <H2><A NAME="K">K.  Internals of LinuxThreads</A></H2>
 982
 983 <H4><A NAME="K.1">K.1: What is the implementation model for
 984 LinuxThreads?</A></H4>
 985
 986 LinuxThreads follows the so-called "one-to-one" model: each thread is
 987 actually a separate process in the kernel.  The kernel scheduler takes
 988 care of scheduling the threads, just like it schedules regular
 989 processes.  The threads are created with the Linux
 990 <code>clone()</code> system call, which is a generalization of
 991 <code>fork()</code> allowing the new process to share the memory
 992 space, file descriptors, and signal handlers of the parent.<P>
 993
 994 Advantages of the "one-to-one" model include:
 995 <UL>
 996 <LI> minimal overhead on CPU-intensive multiprocessing (with
 997 about one thread per processor);
 998 <LI> minimal overhead on I/O operations;
 999 <LI> a simple and robust implementation (the kernel scheduler does
1000 most of the hard work for us).
1001 </UL>
1002 The main disadvantage is more expensive context switches on mutex and
1003 condition operations, which must go through the kernel.  This is
1004 mitigated by the fact that context switches in the Linux kernel are
1005 pretty efficient.<P>
1006
1007 <H4><A NAME="K.2">K.2: Have you considered other implementation
1008 models?</A></H4>
1009
1010 There are basically two other models.  The "many-to-one" model
1011 relies on a user-level scheduler that context-switches between the
1012 threads entirely in user code; viewed from the kernel, there is only
1013 one process running.  This model is completely out of the question for
1014 me, since it does not take advantage of multiprocessors, and requires
1015 unholy magic to handle blocking I/O operations properly.  There are
1016 several user-level thread libraries available for Linux, but I found
1017 all of them deficient in functionality, performance, and/or robustness.
1018 <P>
1019
1020 The "many-to-many" model combines both kernel-level and user-level
1021 scheduling: several kernel-level threads run concurrently, each
1022 executing a user-level scheduler that selects between user threads.
1023 Most commercial Unix systems (Solaris, Digital Unix, IRIX) implement
1024 POSIX threads this way.  This model combines the advantages of both
1025 the "many-to-one" and the "one-to-one" models, and is attractive
1026 because it avoids the worst-case behaviors of both models --
1027 especially on kernels where context switches are expensive, such as
1028 Digital Unix.  Unfortunately, it is pretty complex to implement, and
1029 requires kernel support which Linux does not provide.  Linus Torvalds
1030 and other Linux kernel developers have always been pushing the
1031 "one-to-one" model in the name of overall simplicity, and are doing a
1032 pretty good job of making kernel-level context switches between
1033 threads efficient.  LinuxThreads is just following the general
1034 direction they set.<P>
1035
1036 <HR>
1037 <ADDRESS>Xavier.Leroy@inria.fr</ADDRESS>
1038 </BODY>
1039 </HTML>