Linux 2.2.0pre7
commit c68677acad7a4abc12505614268f61cf9f3bdcef
author Linus Torvalds <torvalds@linuxfoundation.org>
Fri, 23 Nov 2007 20:18:01 +0000 (15:18 -0500)
committer Linus Torvalds <torvalds@linuxfoundation.org>
Fri, 23 Nov 2007 20:18:01 +0000 (15:18 -0500)
tree 8ed51a6e77f53ba541991eb5390dfdba49c62eb5
parent 70c27ee94003b5e3741c5d36f5a84ac6cc81ae82
Linux 2.2.0pre7

Ok, I think I now know why pre-6 looks so unbalanced. It's two issues.
Basically, trying to swap out a large number of pages from one process
context is just doomed. It basically sucks, because

 - it has bad latency. This is further exacerbated by the per-process
   "thrashing_memory" flag, which means that if we were unlucky enough to
   be selected to be the process that frees up memory, we'll probably be
   stuck with it for a long time. That can make it extremely unfair under
   some circumstances - other processes may allocate the pages we freed
   up, so that we keep on being counted as a memory thrasher even if we
   really aren't.
   Note that this shows most under "moderate" load - the problem doesn't
   tend to show itself if you have some process that is _really_
   allocating a lot of pages, because then that process will be correctly
   found by the thrashing logic. But if you have lots of "normal load"
   processes, some of those can get really badly hurt by this.
   In particular, in the worst case you have a number of processes that all
   allocate memory, but not very quickly - certainly not more quickly than
   we can page things out. What happens is that under these circumstances
   one of them gets marked as a "scapegoat", and once that happens all the
   others will just live off the pages that the scapegoat frees up, while
   the scapegoat itself doesn't make much progress at all because it is
   always just freeing memory for others.
   The really bad behaviour tends to go away reasonably quickly, but while
   it happens it's _really_ unfair.
 - try_to_free_pages() just goes overboard, and starts paging stuff out
   without getting back to the nice balanced behaviour. This is what
   Andrea noticed.
   Essentially, once it starts failing the shrink_mmap() tests, it will
   just page things out crazily. Normally this is avoided by just always
   starting from shrink_mmap(), but if you ask try_to_free_pages() to try
   to free up a ton of pages, the balancing that it does is basically
   bypassed.

So basically pre-6 works _really_ well for the kind of stress-me stuff
that it was designed for: a few processes that are extremely memory
hungry. It gets close to perfect swap-out behaviour, simply because it is
optimized for getting into a paging rut.

That makes for nice benchmarks, but it also explains (a) why it's
sometimes just not very nice for interactive behaviour and (b) why under
normal load it can easily swap much too eagerly.

Anyway, the first problem is fixed by making "thrashing" be a global flag
rather than a per-process flag. Being per-process is really nice when it
finds the right process, but it's really unfair under a lot of other
circumstances. I'd rather be fair than get the best possible page-out
speed.

Note that even a global flag helps: it still clusters the write-outs, and
means that processes that allocate more pages tend to be more likely to be
hit by it, so it still does a large part of what the per-process flag did
- without the unfairness (but admittedly being unfair sometimes gets you
better performance - you just have to be _very_ careful whom you target
with the unfairness, and that's the hard part).
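
To make the fairness argument concrete, here is a toy user-space model of
the two flag scopes. It is not the kernel code: the watermarks, the
round-robin allocation order, the batch sizes and every name in it
(simulate, flag_scope, sim_result) are assumptions made up for
illustration. Each task allocates one page per turn; a task that sees the
flag set has to free `batch` pages before it gets its own page.

```c
#include <stdbool.h>

#define NTASKS     4
#define LOW_WATER  8   /* hypothetical: start reclaiming below this */
#define HIGH_WATER 12  /* hypothetical: stop reclaiming at or above this */

enum flag_scope { FLAG_PER_PROCESS, FLAG_GLOBAL };

struct sim_result {
    long freed_by[NTASKS]; /* pages each task had to free */
    long min_free;         /* lowest free-page count observed */
};

/* Tasks allocate one page each, round-robin, for `rounds` rounds. */
struct sim_result simulate(enum flag_scope scope, long batch, int rounds)
{
    struct sim_result r = { {0}, LOW_WATER };
    bool task_flag[NTASKS] = { false };
    bool global_flag = false;
    long free_pages = LOW_WATER;

    for (int n = 0; n < rounds * NTASKS; n++) {
        int t = n % NTASKS;

        if (free_pages < LOW_WATER) {
            if (scope == FLAG_GLOBAL) {
                global_flag = true;
            } else {
                /* Pick a new victim only if nobody carries the flag. */
                bool any = false;
                for (int i = 0; i < NTASKS; i++)
                    any = any || task_flag[i];
                if (!any)
                    task_flag[t] = true; /* the unlucky allocator */
            }
        }

        bool must_free = (scope == FLAG_GLOBAL) ? global_flag
                                                : task_flag[t];
        if (must_free) {
            free_pages += batch;
            r.freed_by[t] += batch;
            if (free_pages >= HIGH_WATER) {
                global_flag = false;
                task_flag[t] = false;
            }
        }

        free_pages--; /* the allocation itself */
        if (free_pages < r.min_free)
            r.min_free = free_pages;
    }
    return r;
}
```

With these made-up numbers, simulate(FLAG_PER_PROCESS, 4, 100) leaves task
1 as the scapegoat: it frees all 400 pages and the other three free none,
because the pool never climbs back to the high watermark while they keep
consuming what it frees. simulate(FLAG_GLOBAL, 2, 100) frees the same 400
pages in total, but each task ends up freeing 100 of them, in smaller
batches.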

The second problem actually goes away by simply just not asking
try_to_free_pages() to free too many pages - and having the global
thrashing flag makes it unnecessary to do so anyway because the flag will
essentially cluster the page-outs even without asking for them to be all
done in one large chunk (and now it's not just one process that gets hit
any more).
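
A small sketch of why the request size matters, again as a toy model
rather than the real try_to_free_pages(): every call starts with the cheap
cache-shrinking step and only falls back to paging out once that step
fails. The names (reclaim, free_pages_model, reclaim_stats) and the idea
that a fixed number of cache pages becomes reclaimable again between calls
are assumptions for illustration only.

```c
struct reclaim_stats {
    long from_cache; /* pages recovered cheaply, shrink_mmap()-style */
    long swapped;    /* pages that had to be paged out */
};

/* One reclaim call: always start with the cache, then fall back to
 * paging out whatever is still missing from the request. */
void free_pages_model(long *cache, long count, struct reclaim_stats *st)
{
    while (count > 0 && *cache > 0) {
        (*cache)--;
        st->from_cache++;
        count--;
    }
    st->swapped += count;
}

/* Free `total` pages in requests of at most `chunk` pages each.
 * `refill` is the (hypothetical) number of cache pages that become
 * reclaimable again between two calls. */
struct reclaim_stats reclaim(long total, long chunk, long cache, long refill)
{
    struct reclaim_stats st = { 0, 0 };

    if (chunk <= 0)
        return st; /* nothing sensible to do */

    while (total > 0) {
        long n = total < chunk ? total : chunk;

        free_pages_model(&cache, n, &st);
        total -= n;
        cache += refill; /* other activity ages more cache pages */
    }
    return st;
}
```

In this model reclaim(64, 64, 4, 4), one big request, takes 4 pages from
the cache and pages out the other 60. reclaim(64, 8, 4, 4), eight small
requests, gets back to the cache step each time and ends up with 32 pages
from the cache and 32 paged out.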

There's a "pre-7.gz" on ftp.kernel.org in testing, anybody interested?
It's not the real thing, as I haven't done the write semaphore deadlock
thing yet, but that one will not affect normal users anyway so for
performance testing this should be equivalent.

                        Linus
101 files changed:
CREDITS
Documentation/Configure.help
Documentation/sound/VIA-chipset
Makefile
arch/alpha/Makefile
arch/alpha/config.in
arch/alpha/kernel/Makefile
arch/alpha/kernel/bios32.h
arch/alpha/kernel/core_polaris.c [new file with mode: 0644]
arch/alpha/kernel/core_t2.c
arch/alpha/kernel/core_tsunami.c
arch/alpha/kernel/entry.S
arch/alpha/kernel/irq.c
arch/alpha/kernel/machvec.h
arch/alpha/kernel/process.c
arch/alpha/kernel/proto.h
arch/alpha/kernel/setup.c
arch/alpha/kernel/sys_dp264.c
arch/alpha/kernel/sys_rawhide.c
arch/alpha/kernel/sys_ruffian.c
arch/alpha/kernel/sys_rx164.c [new file with mode: 0644]
arch/alpha/kernel/time.c
arch/i386/kernel/ioport.c
arch/i386/kernel/process.c
arch/i386/kernel/ptrace.c
drivers/block/floppy.c
drivers/block/genhd.c
drivers/char/epca.c
drivers/char/lp_m68k.c
drivers/char/pc_keyb.c
drivers/char/radio-aimslab.c
drivers/char/vt.c
drivers/net/3c59x.c
drivers/net/epic100.c
drivers/net/hp100.c
drivers/net/tulip.c
drivers/scsi/AM53C974.c
drivers/scsi/aha152x.c
drivers/scsi/aic7xxx.c
drivers/scsi/atari_dma_emul.c
drivers/scsi/gdth.c
drivers/scsi/gdth.h
drivers/scsi/gdth_ioctl.h
drivers/scsi/gdth_proc.c
drivers/scsi/gdth_proc.h
drivers/scsi/i91uscsi.c
drivers/scsi/imm.c
drivers/scsi/ini9100u.c
drivers/scsi/ini9100u.h
drivers/scsi/megaraid.c
drivers/scsi/ppa.c
drivers/scsi/scsi.c
drivers/scsi/scsi.h
drivers/scsi/scsi_obsolete.c
drivers/sound/lowlevel/README
drivers/sound/wavfront.c
drivers/video/fbmem.c
drivers/video/mdacon.c
drivers/video/retz3fb.c
fs/autofs/autofs_i.h
fs/autofs/dirhash.c
fs/autofs/inode.c
fs/autofs/root.c
fs/autofs/waitq.c
fs/nls/Config.in
fs/proc/array.c
fs/proc/root.c
fs/select.c
fs/ufs/namei.c
fs/ufs/super.c
include/asm-alpha/core_cia.h
include/asm-alpha/core_mcpcia.h
include/asm-alpha/core_polaris.h [new file with mode: 0644]
include/asm-alpha/core_tsunami.h
include/asm-alpha/io.h
include/asm-alpha/irq.h
include/asm-alpha/spinlock.h
include/asm-alpha/unistd.h
include/asm-i386/locks.h
include/asm-i386/semaphore.h
include/asm-sparc/page.h
include/linux/file.h
include/linux/mm.h
include/linux/proc_fs.h
include/linux/sched.h
include/linux/swap.h
include/linux/swapctl.h
ipc/sem.c
kernel/fork.c
kernel/sysctl.c
mm/page_alloc.c
mm/swap.c
mm/swap_state.c
mm/swapfile.c
mm/vmscan.c
net/ipv4/Config.in
net/ipv4/ip_fw.c
net/irda/irttp.c
scripts/Menuconfig
scripts/lxdialog/menubox.c
scripts/tkgen.c