= Tuning \Unicorn

\Unicorn performance is generally as good as a (mostly) Ruby web server
can provide.  Most often the performance bottleneck is in the web
application running on Unicorn rather than Unicorn itself.

== \Unicorn Configuration

See Unicorn::Configurator for details on the config file format.

=== Unicorn::Configurator#worker_processes

* worker_processes should be scaled to the number of processes your
  backend system(s) can support.  DO NOT scale it to the number of
  external network clients your application expects to be serving.
  \Unicorn is NOT for serving slow clients; that is the job of nginx.

* worker_processes should be *at* *least* the number of CPU cores on
  a dedicated server.  If your application has occasionally slow
  responses that are *not* CPU-intensive, you may increase this to
  work around those inefficiencies.

* worker_processes may be increased for Unicorn::OobGC users to provide
  more consistent response times.

* Never, ever, increase worker_processes to the point where the system
  runs out of physical memory and hits swap.  Production servers should
  never see heavy swap activity.

* Setting a very low value for the :backlog parameter in "listen"
  directives can allow failover to happen more quickly if your
  cluster is configured for it.

* If you're doing extremely simple benchmarks and getting connection
  errors under high request rates, increasing your :backlog parameter
  above the already-generous default of 1024 can help avoid connection
  errors.  Keep in mind this is not recommended for real traffic if
  you have another machine to fail over to (see above).

* :rcvbuf and :sndbuf parameters generally do not need to be set for TCP
  listeners under Linux 2.6 because auto-tuning is enabled.  UNIX domain
  sockets do not have auto-tuning buffer sizes, so increasing those will
  save syscalls and task switches for larger requests and responses.
  If your app only generates small responses or expects small requests,
  you may shrink the buffer sizes to save memory, too.

* Having socket buffers too large can also be detrimental or have
  little effect.  Huge buffers can put more pressure on the allocator
  and may also thrash CPU caches, cancelling out performance gains
  one would normally expect.

* Setting "preload_app true" can allow copy-on-write-friendly GC to
  be used to save memory.  It will probably not work out of the box with
  applications that open sockets or perform random I/O on files
  while loading, because every forked worker inherits the same
  descriptors.  Databases like TokyoCabinet are an exception: they use
  concurrency-safe pread()/pwrite() functions for safe sharing of
  database file descriptors across processes.

* On POSIX-compliant filesystems, it is safe for multiple threads or
  processes to append to one log file as long as all of them either
  write to it unbuffered (File#sync = true) or buffer complete
  records (lines) in userspace before each write.
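
The append rule in the last item can be checked with a short experiment.
The following is a minimal sketch, not part of Unicorn: it forks several
writers that append whole lines to one file opened in append mode with
File#sync = true, then checks that every line read back is intact.  The
helper name append_from_many_processes, the writer count and the line
format are made up for illustration.  It assumes a local POSIX
filesystem and a platform where fork is available.

```ruby
require 'tempfile'

# Hypothetical helper: append `lines_per_writer` complete lines from each
# of `writers` forked processes to the same file, then return every line
# read back from it.
def append_from_many_processes(path, writers: 4, lines_per_writer: 200)
  pids = writers.times.map do |w|
    fork do
      File.open(path, 'a') do |log|  # mode 'a' opens the file with O_APPEND
        log.sync = true              # disable userspace buffering
        lines_per_writer.times do |i|
          # One complete record per write call.
          log.write("writer=#{w} line=#{i}\n")
        end
      end
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end
  pids.each { |pid| Process.wait(pid) }
  File.readlines(path)
end

log = Tempfile.new('append-demo')
lines = append_from_many_processes(log.path)

# A torn or interleaved record would not match this pattern.
intact = lines.all? { |l| l.match?(/\Awriter=\d+ line=\d+\n\z/) }
puts "#{lines.size} lines, all intact: #{intact}"
log.close!
```

If the writers instead buffered partial lines and flushed at arbitrary
points, records from different processes could end up interleaved.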
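
The worker_processes, "listen" and preload_app notes in this section can
be pulled together in one config file.  What follows is only a sketch
under stated assumptions: suggested_worker_count is a hypothetical
helper (not a Unicorn API), and the memory figures, socket path, backlog
and buffer sizes are placeholders to be replaced with values measured on
your own system.  The respond_to? guard assumes Unicorn evaluates the
config file in the context of a Unicorn::Configurator instance, so the
same file can also be loaded as plain Ruby.

```ruby
require 'etc'

# Hypothetical helper (not part of Unicorn): bound the worker count by
# CPU cores and by the memory set aside for workers, so that adding
# workers never pushes the machine into swap.
#
# cores         - CPU cores on the machine
# mem_budget_mb - physical memory you are willing to give to workers
# per_worker_mb - measured resident size of one warmed-up worker
# io_factor     - raise above 1 if responses are sometimes slow but
#                 not CPU-intensive
def suggested_worker_count(cores:, mem_budget_mb:, per_worker_mb:, io_factor: 1)
  by_cpu = (cores * io_factor).ceil
  by_mem = (mem_budget_mb / per_worker_mb).floor
  [[by_cpu, by_mem].min, 1].max # always run at least one worker
end

# The calls below only run when Unicorn evaluates this file as its
# config; as plain Ruby, only the helper above is defined.
if respond_to?(:worker_processes)
  worker_processes suggested_worker_count(
    cores: Etc.nprocessors,
    mem_budget_mb: 4096, # placeholder budget
    per_worker_mb: 256   # placeholder size, measure your own app
  )

  # UNIX domain socket behind nginx.  A low :backlog lets an overloaded
  # machine refuse connections quickly so a cluster can fail over;
  # :rcvbuf/:sndbuf are set by hand because UNIX domain sockets do not
  # auto-tune.  The path and all sizes are placeholders.
  listen '/tmp/unicorn.sock', backlog: 64, rcvbuf: 262_144, sndbuf: 262_144

  preload_app true
end
```

The helper takes the smaller of the CPU-based and memory-based limits on
purpose: the memory limit exists to keep the system out of swap, so it
wins even when it falls below the number of cores.
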
== Kernel Parameters (Linux sysctl)

WARNING: Do not change system parameters unless you know what you're doing!

* net.core.rmem_max and net.core.wmem_max can increase the allowed
  size of :rcvbuf and :sndbuf respectively.  This is mostly useful
  for UNIX domain sockets, which do not have auto-tuning buffer sizes.

* For load testing/benchmarking with UNIX domain sockets, you should
  consider increasing net.core.somaxconn or else nginx will start
  failing to connect under heavy load.  You may also consider setting
  a higher :backlog in your "listen" directives as noted earlier; the
  kernel caps the effective backlog at net.core.somaxconn.

* If you're running out of local ports, consider lowering
  net.ipv4.tcp_fin_timeout to 20-30 (default: 60 seconds).  Also
  consider widening the usable port range by changing
  net.ipv4.ip_local_port_range.

* Setting net.ipv4.tcp_timestamps=1 will also allow setting
  net.ipv4.tcp_tw_reuse=1 and net.ipv4.tcp_tw_recycle=1, which along
  with the above settings can slow down port exhaustion.  Not all
  networks are compatible with these settings; check with your friendly
  network administrator before changing these.  Note that
  net.ipv4.tcp_tw_recycle is known to break clients behind NAT and
  was removed in Linux 4.12.

* Increasing the MTU size can reduce framing overhead for larger
  transfers.  One often-overlooked detail is that the loopback
  device (usually "lo") can have its MTU increased, too.
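
Before changing any of the parameters above, it helps to see what the
system is currently using.  The following read-only sketch assumes a
Linux system that exposes sysctl values under /proc/sys; the helper
names sysctl_path, read_sysctl and local_port_count are made up for
illustration.  Parameters that do not exist on the running kernel are
reported as unavailable rather than raising.

```ruby
# Map a sysctl name such as "net.core.somaxconn" to its procfs path.
def sysctl_path(name, root: '/proc/sys')
  File.join(root, *name.split('.'))
end

# Return the current value as a string, or nil if it cannot be read
# (non-Linux system, missing parameter, insufficient permissions).
def read_sysctl(name, root: '/proc/sys')
  File.read(sysctl_path(name, root: root)).strip
rescue SystemCallError
  nil
end

# Number of local ports in a range string such as "32768\t60999",
# the format used by net.ipv4.ip_local_port_range.
def local_port_count(range)
  low, high = range.split.map { |n| Integer(n) }
  high - low + 1
end

%w[
  net.core.rmem_max
  net.core.wmem_max
  net.core.somaxconn
  net.ipv4.tcp_fin_timeout
  net.ipv4.ip_local_port_range
  net.ipv4.tcp_timestamps
  net.ipv4.tcp_tw_reuse
].each do |name|
  value = read_sysctl(name)
  puts format('%-30s %s', name, value || '(not available)')
end

range = read_sysctl('net.ipv4.ip_local_port_range')
puts "usable local ports: #{local_port_count(range)}" if range
```

The script only reads values.  Changing them is left to sysctl(8) and
your system's own configuration mechanism, subject to the warning at the
top of this section.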