TUNING

= Tuning unicorn

unicorn performance is generally as good as a (mostly) Ruby web server
can provide.  Most often the performance bottleneck is in the web
application running on unicorn rather than unicorn itself.

== unicorn Configuration

See Unicorn::Configurator for details on the config file format.
+worker_processes+ is the most commonly needed tuning parameter.

=== Unicorn::Configurator#worker_processes

* worker_processes should be scaled to the number of processes your
  backend system(s) can support.  DO NOT scale it to the number of
  external network clients your application expects to be serving.
  unicorn is NOT for serving slow clients; that is the job of nginx.

* worker_processes should be *at* *least* the number of CPU cores on
  a dedicated server (unless you do not have enough memory).
  If your application has occasionally slow responses that are *not*
  CPU-intensive, you may increase this to work around those inefficiencies.

* Under Ruby 2.2 or later, Etc.nprocessors may be used to determine
  the number of CPU cores present.

* worker_processes may be increased for Unicorn::OobGC users to provide
  more consistent response times.

* Never, ever, increase worker_processes to the point where the system
  runs out of physical memory and hits swap.  Production servers should
  never see heavy swap activity.
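
The sizing advice above can be sketched as a small helper.  This is only
a minimal sketch and not part of unicorn: the helper name
+suggested_workers+, the +multiplier+ parameter, and the memory figures
are all hypothetical and should be replaced with measurements from your
own application.

```ruby
require 'etc'

# Hypothetical helper (not part of unicorn): start from the CPU core count,
# optionally scale it up for apps with slow, non-CPU-bound responses, and
# cap the result so that all workers fit in the memory set aside for them.
def suggested_workers(cores:, mem_budget_mb:, per_worker_mb:, multiplier: 1)
  by_cpu    = cores * multiplier
  by_memory = mem_budget_mb / per_worker_mb   # integer division
  [[by_cpu, by_memory].min, 1].max            # never fewer than one worker
end

# Assumed figures; measure your own worker memory use before relying on them.
count = suggested_workers(cores: Etc.nprocessors,
                          mem_budget_mb: 4096,
                          per_worker_mb: 256)
puts "worker_processes #{count}"
```

In a unicorn config file, the computed value would be passed straight to
the directive, as in "worker_processes count".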

=== Unicorn::Configurator#listen Options

* Setting a very low value for the :backlog parameter in "listen"
  directives can allow failover to happen more quickly if your
  cluster is configured for it.

* If you're doing extremely simple benchmarks and getting connection
  errors under high request rates, increasing your :backlog parameter
  above the already-generous default of 1024 can help avoid connection
  errors.  Keep in mind this is not recommended for real traffic if
  you have another machine to fail over to (see above).

* :rcvbuf and :sndbuf parameters generally do not need to be set for TCP
  listeners under Linux 2.6 and later because auto-tuning is enabled.
  UNIX domain sockets do not have auto-tuning buffer sizes, so
  increasing those will allow syscalls and task switches to be saved
  for larger requests and responses.  If your app only generates small
  responses or expects small requests, you may shrink the buffer sizes
  to save memory, too.

* Having socket buffers too large can also be detrimental or have
  little effect.  Huge buffers can put more pressure on the allocator
  and may also thrash CPU caches, cancelling out performance gains
  one would normally expect.

* UNIX domain sockets are slightly faster than TCP sockets, but only
  work if nginx is on the same machine.
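
The listen options above might be combined as in the following config
fragment.  It is an illustration only: the socket path, port, backlog
values, and buffer sizes are hypothetical, and the fragment is meant to
be evaluated by Unicorn::Configurator rather than run on its own.

```ruby
# unicorn config fragment; all paths, ports, and sizes are hypothetical.

# UNIX domain socket for an nginx on the same machine.  Buffer sizes are
# not auto-tuned for UNIX sockets, so they are set explicitly (in bytes).
listen "/path/to/.unicorn.sock", :backlog => 1024,
       :rcvbuf => 256 * 1024, :sndbuf => 256 * 1024

# TCP listener in a cluster with failover.  A short backlog lets a busy
# host turn connections away quickly so another machine can take them.
# :rcvbuf and :sndbuf are left unset to keep kernel auto-tuning.
listen 8080, :backlog => 64
```

Whether these numbers help depends on your request and response sizes,
so treat them as starting points for measurement, not recommendations.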

== Other unicorn settings

* Setting "preload_app true" can allow copy-on-write-friendly GC to
  be used to save memory.  It will probably not work out of the box with
  applications that open sockets or perform random I/O on files.
  Databases like TokyoCabinet use concurrency-safe pread()/pwrite()
  functions for safe sharing of database file descriptors across
  processes.

* On POSIX-compliant filesystems, it is safe for multiple threads or
  processes to append to one log file as long as all of them either
  write to it unbuffered (File#sync = true) or record(line)-buffer
  their output in userspace before any writes.
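
The log-appending rule above can be tried out with a short script.  This
is a sketch under stated assumptions: it uses threads instead of unicorn
worker processes to stay portable, and the +append_lines+ helper and the
"writer-N" tags are made up for the demonstration.

```ruby
require 'tempfile'

# Append whole lines through an unbuffered handle.  With File#sync = true,
# each IO#write call below hands one complete record to the OS.
def append_lines(path, tag, count)
  File.open(path, 'a') do |log|   # 'a' opens the file in append mode
    log.sync = true               # no userspace buffering
    count.times { |i| log.write("#{tag} line #{i}\n") }
  end
end

log = Tempfile.new('append-demo')

# Four concurrent writers, each with its own handle on the same file.
writers = 4.times.map do |n|
  Thread.new { append_lines(log.path, "writer-#{n}", 100) }
end
writers.each(&:join)

# Every line should be a complete record from exactly one writer.
lines  = File.readlines(log.path)
intact = lines.all? { |l| l.match?(/\Awriter-\d line \d+\n\z/) }
puts "#{lines.size} lines, all intact: #{intact}"
```

Each record is built as one string and passed to a single write call,
which is what keeps lines from different writers from interleaving.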

== Kernel Parameters (Linux sysctl and sysfs)

WARNING: Do not change system parameters unless you know what you're doing!

* Transparent hugepages (THP) improve performance in many cases,
  but can also increase memory use when relying on a
  copy-on-write (CoW)-friendly GC (Ruby 2.0+) with "preload_app true".
  CoW operates at the page level, so writing to a huge page would
  trigger a 2 MB copy (x86-64), as opposed to a 4 KB copy on a
  regular (non-huge) page.

  Consider only allowing THP to be used when it is requested via the
  madvise(2) syscall:

      echo madvise >/sys/kernel/mm/transparent_hugepage/enabled

  Or disabling it system-wide, via "never".

  n.b. "page" in this context refers only to OS kernel pages.
  Ruby GC implementations also use this term for the same concept,
  but in a way that is agnostic to the OS.

* net.core.rmem_max and net.core.wmem_max can increase the allowed
  size of :rcvbuf and :sndbuf respectively.  This is mostly only useful
  for UNIX domain sockets, which do not have auto-tuning buffer sizes.

* For load testing/benchmarking with UNIX domain sockets, you should
  consider increasing net.core.somaxconn or else nginx will start
  failing to connect under heavy load.  You may also consider setting
  a higher :backlog to listen on as noted earlier.

* If you're running out of local ports, consider lowering
  net.ipv4.tcp_fin_timeout to 20-30 seconds (default: 60).  Also
  consider widening the usable port range by changing
  net.ipv4.ip_local_port_range.

* Setting net.ipv4.tcp_timestamps=1 will also allow setting
  net.ipv4.tcp_tw_reuse=1 and net.ipv4.tcp_tw_recycle=1, which along
  with the above settings can slow down port exhaustion.  Not all
  networks are compatible with these settings; check with your friendly
  network administrator before changing these.  Note that
  net.ipv4.tcp_tw_recycle is known to break connections from clients
  behind NAT and was removed entirely in Linux 4.12.

* Increasing the MTU size can reduce framing overhead for larger
  transfers.  One often-overlooked detail is that the loopback
  device (usually "lo") can have its MTU increased, too.
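
Before changing any of the parameters above, it helps to see what the
running kernel currently uses.  The following read-only sketch prints
the active THP mode and a few of the sysctls mentioned in this section.
The helper names are hypothetical, the script is Linux-specific, and it
assumes the usual mapping of sysctl names to paths under /proc/sys.

```ruby
# Read-only inspection of kernel settings; nothing here changes the system.

# The THP "enabled" file lists every mode and puts brackets around the
# active one, e.g. "always [madvise] never".
def active_thp_mode(text)
  text[/\[(\w+)\]/, 1]
end

# Map a sysctl name such as "net.core.somaxconn" to its /proc/sys path.
def sysctl_path(name)
  File.join('/proc/sys', *name.split('.'))
end

# Return the file's contents, or nil if it cannot be read
# (not Linux, or the parameter does not exist on this kernel).
def read_setting(path)
  File.read(path).strip
rescue SystemCallError
  nil
end

thp = read_setting('/sys/kernel/mm/transparent_hugepage/enabled')
puts "THP mode: #{thp ? active_thp_mode(thp) : 'unavailable'}"

%w[
  net.core.rmem_max
  net.core.wmem_max
  net.core.somaxconn
  net.ipv4.tcp_fin_timeout
  net.ipv4.ip_local_port_range
].each do |name|
  puts "#{name} = #{read_setting(sysctl_path(name)) || 'unavailable'}"
end
```

Recording these values before and after a change makes it easier to
revert a setting that turns out not to help.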