
debug lockups: Improve lockup detection

While debugging a recent lockup bug I found various deficiencies
in how our current lockup detection helpers work:

- SysRq-L is not very effective as it uses a workqueue, hence
   it cannot punch through hard lockups and cannot see through
   most soft lockups either.

- The SysRq-L code depends on the NMI watchdog - which is off
   by default.

- We don't print backtraces from the RCU code's built-in
   'RCU state machine is stuck' debug code. This debug
   code tends to be one of the first (and only) mechanisms
   that show that a lockup has occurred.

This patch changes the code so that we:

- Trigger the NMI backtrace code from SysRq-L instead of using
   a workqueue (which cannot punch through hard lockups)

- Trigger print-all-CPU-backtraces from the RCU lockup detection
   code
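
As a way to exercise the SysRq-L path from userspace, the sketch
below writes the 'l' command key to the SysRq trigger file. It is
not part of this patch: write_sysrq_key() is a hypothetical helper
name, and the sketch assumes /proc/sysrq-trigger is available and
the caller has the privileges to write to it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): write one SysRq command
 * key to the given trigger file. Returns 0 on success, -1 on failure.
 */
static int write_sysrq_key(const char *trigger_path, char key)
{
	FILE *f;

	assert(trigger_path != NULL);

	f = fopen(trigger_path, "w");
	if (!f)
		return -1;

	if (fputc(key, f) == EOF) {
		fclose(f);
		return -1;
	}

	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling write_sysrq_key("/proc/sysrq-trigger", 'l') as root should
request a backtrace of all active CPUs; the output goes to the
kernel log, not to the calling program.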

Also decouple the backtrace printing code from the NMI watchdog:

- Don't use variable-size cpumasks (they might not be initialized
   and are a bit more fragile anyway)

- Trigger an NMI immediately via an IPI, instead of waiting
   for the NMI tick to occur. This is a lot faster and can
   produce more relevant backtraces. It will also work if the
   NMI watchdog is disabled.

- Don't print the 'dazed and confused' message when we print
   a backtrace from the NMI

- Do a show_regs() plus a dump_stack() to get maximum info
   out of the dump. Worst-case we get two stacktraces - which
   is not a big deal. Sometimes, if register content is
   corrupted, the precise stack walker in show_regs() won't
   give us a full backtrace - in this case dump_stack() will
   do it.
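
The decoupled flow described above can be pictured with a small
userspace model. This is only a sketch and not the kernel code from
this patch: every model_* name and MODEL_NR_CPUS are hypothetical,
the NMI IPI is simulated by a direct function call, and the
"responsive" argument exists only to model CPUs that never take the
NMI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8	/* hypothetical fixed CPU count */

/* Fixed-size mask, one bit per CPU: valid without any runtime setup. */
static unsigned long model_backtrace_mask;
static int model_dumps_printed;

/*
 * Stand-in for the per-CPU NMI handler. Returns true when the NMI was
 * consumed by a backtrace request, so the caller can skip the
 * "unknown NMI" path and its 'dazed and confused' message.
 */
static bool model_nmi_handler(int cpu)
{
	unsigned long bit;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = 1UL << cpu;

	if (!(model_backtrace_mask & bit))
		return false;

	/* The kernel would call show_regs() and then dump_stack() here. */
	printf("NMI backtrace for cpu %d\n", cpu);
	model_dumps_printed++;
	model_backtrace_mask &= ~bit;
	return true;
}

/*
 * Stand-in for the requester: mark every online CPU, deliver the NMI
 * right away instead of waiting for a watchdog tick, and return the
 * mask of CPUs that never answered. A real implementation would wait
 * with a timeout here; this single-threaded model has nothing to wait for.
 */
static unsigned long model_trigger_all_cpu_backtrace(unsigned long online,
						     unsigned long responsive)
{
	int cpu;

	model_backtrace_mask = online;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (online & responsive & (1UL << cpu))
			model_nmi_handler(cpu);

	return model_backtrace_mask;
}
```

In this model, model_trigger_all_cpu_backtrace(0x0F, 0x0F) prints
four dumps and returns 0, while passing 0x0B as the second argument
leaves bit 2 set in the return value to flag the CPU that did not
respond.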

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit	c1dc0b9c0c8979ce4d411caadff5c0d79dee58bc
author	Ingo Molnar <mingo@elte.hu>
	Sun, 2 Aug 2009 09:28:21 +0000 (11:28 +0200)
committer	Ingo Molnar <mingo@elte.hu>
	Sun, 2 Aug 2009 11:27:17 +0000 (13:27 +0200)
tree	6822205799a6cf8928623d60aa226c95534a20f9
parent	ed680c4ad478d0fee9740f7d029087f181346564

arch/x86/kernel/apic/nmi.c
drivers/char/sysrq.c
kernel/rcutree.c