share/man/man7/tuning.7

   1 .\" Copyright (c) 2001 Matthew Dillon.  Terms and conditions are those of
   2 .\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
   3 .\" the source tree.
   4 .\"
   5 .Dd August 13, 2017
   6 .Dt TUNING 7
   7 .Os
   8 .Sh NAME
   9 .Nm tuning
  10 .Nd performance tuning under DragonFly
  11 .Sh SYSTEM SETUP
  12 Modern
  13 .Dx
  14 systems typically have just three partitions on the main drive.
  15 In order, a UFS
  16 .Pa /boot ,
  17 .Pa swap ,
  18 and a HAMMER
  19 .Pa root .
  20 The installer used to create separate PFSs for half a dozen directories,
  21 but now it just puts (almost) everything in the root.
  22 It will separate stuff that doesn't need to be backed up into a /build
  23 subdirectory and create null-mounts for things like /usr/obj, but it
  24 no longer creates separate PFSs for these.
  25 If desired, you can make /build its own mount to separate-out the
  26 components of the filesystem which do not need to be persistent.
  27 .Pp
  28 Generally speaking the
  29 .Pa /boot
  30 partition should be 1GB in size.  This is the minimum recommended
  31 size, giving you room for backup kernels and alternative boot schemes.
  32 .Dx
  33 always installs debug-enabled kernels and modules and these can take
  34 up quite a bit of disk space (but will not take up any extra ram).
  35 .Pp
  36 In the old days we recommended that swap be sized to at least 2x main
  37 memory.  These days swap is often used for other activities, including
  38 .Xr tmpfs 5
  39 and
  40 .Xr swapcache 8 .
  41 We recommend that swap be sized to the larger of 2x main memory or
  42 1GB if you have a fairly small disk and 16GB or more if you have a
  43 modestly endowed system.
  44 If you have a modest SSD + large HDD combination, we recommend
  45 a large dedicated swap partition on the SSD.  For example, if
  46 you have a 128GB SSD and 2TB or more of HDD storage, dedicating
  47 upwards of 64GB of the SSD to swap and using
  48 .Xr swapcache 8
  49 and
  50 .Xr tmpfs 5
  51 will significantly improve your HDD's performance.
  52 .Pp
  53 In an all-SSD or mostly-SSD system,
  54 .Xr swapcache 8
  55 is not normally used but you may still want to have a large swap
  56 partition to support
  57 .Xr tmpfs 5
  58 use.
  59 Our synth/poudriere build machines run with a 200GB
  60 swap partition and use tmpfs for all the builder jails.  50-100 GB
  61 is swapped out at the peak of the build.  As a result, actual
  62 system storage bandwidth is minimized and performance increased.
  63 .Pp
  64 If you are on a minimally configured machine you may, of course,
  65 configure far less swap or no swap at all but we recommend at least
  66 some swap.
  67 The kernel's VM paging algorithms are tuned to perform best when there is
  68 swap space configured.
  69 Configuring too little swap can lead to inefficiencies in the VM
  70 page scanning code as well as create issues later on if you add
  71 more memory to your machine, so don't be shy about it.
  72 Swap is a good idea even if you don't think you will ever need it as it
  73 allows the
  74 machine to page out completely unused data and idle programs (like getty),
  75 maximizing the ram available for your activities.
  76 .Pp
  77 If you intend to use the
  78 .Xr swapcache 8
  79 facility with a SSD + HDD combination we recommend configuring as much
  80 swap space as you can on the SSD.
  81 However, keep in mind that each 1GByte of swapcache requires around
  82 1MByte of ram, so don't scale your swap beyond the equivalent ram
  83 that you reasonably want to eat to support it.
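.Pp
As a rough worked example of the rule of thumb above (a sketch only;
the swap sizes are the ones mentioned earlier in this section and the
actual overhead will vary):
.Bd -literal -offset indent
# approximate ram overhead = swap size in GB * 1MB
#    64GB of swap * 1MB/GB  ~=  64MB of ram
#   200GB of swap * 1MB/GB  ~= 200MB of ram
.Ed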
  84 .Pp
  85 Finally, on larger systems with multiple drives, if the use
  86 of SSD swap is not in the cards or if it is and you need higher-than-normal
  87 swapcache bandwidth, you can configure swap on up to four drives and
  88 the kernel will interleave the storage.
  89 The swap partitions on the drives should be approximately the same size.
  90 The kernel can handle arbitrary sizes but
  91 internal data structures scale to 4 times the largest swap partition.
  92 Keeping
  93 the swap partitions near the same size will allow the kernel to optimally
  94 stripe swap space across the N disks.
  95 Do not worry about overdoing it a
   96 little; swap space is the saving grace of
  97 .Ux
  98 and even if you do not normally use much swap, it can give you more time to
  99 recover from a runaway program before being forced to reboot.
 100 However, keep in mind that any sort of swap space failure can lock the
 101 system up.
  102 Most machines are set up with only one or two swap partitions.
 103 .Pp
 104 Most
 105 .Dx
 106 systems have a single HAMMER root.
 107 PFSs can be used to administratively separate domains for backup purposes
 108 but tend to be a hassle otherwise so if you don't need the administrative
 109 separation you don't really need to use multiple HAMMER PFSs.
 110 All the PFSs share the same allocation layer so there is no longer a need
 111 to size each individual mount.
 112 Instead you should review the
 113 .Xr hammer 8
 114 manual page and use the 'hammer viconfig' facility to adjust snapshot
 115 retention and other parameters.
 116 By default
  117 HAMMER keeps 60 days' worth of snapshots.
 118 Usually snapshots are not desired on PFSs such as
 119 .Pa /usr/obj
 120 or
 121 .Pa /tmp
 122 since data on these partitions cycles a lot.
 123 .Pp
 124 If a very large work area is desired it is often beneficial to
 125 configure it as a separate HAMMER mount.  If it is integrated into
 126 the root mount it should at least be its own HAMMER PFS.
 127 We recommend naming the large work area
 128 .Pa /build .
 129 Similarly if a machine is going to have a large number of users
 130 you might want to separate your
 131 .Pa /home
 132 out as well.
 133 .Pp
 134 A number of run-time
 135 .Xr mount 8
 136 options exist that can help you tune the system.
 137 The most obvious and most dangerous one is
 138 .Cm async .
 139 Do not ever use it; it is far too dangerous.
 140 A less dangerous and more
 141 useful
 142 .Xr mount 8
 143 option is called
 144 .Cm noatime .
 145 .Ux
 146 filesystems normally update the last-accessed time of a file or
 147 directory whenever it is accessed.
 148 However, this creates a massive burden on copy-on-write filesystems like
 149 HAMMER, particularly when scanning the filesystem.
 150 .Dx
 151 currently defaults to disabling atime updates on HAMMER mounts.
 152 It can be enabled by setting the
 153 .Va vfs.hammer.noatime
 154 tunable to 0 in
 155 .Xr loader.conf 5
 156 but we recommend leaving it disabled.
 157 The lack of atime updates can create issues with certain programs
 158 such as when detecting whether unread mail is present, but
 159 applications for the most part no longer depend on it.
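.Pp
For illustration only, re-enabling atime updates would look roughly like
the following
.Xr loader.conf 5
fragment.
This sketches the override that we do not recommend; on a default system
the line is simply absent.
.Bd -literal -offset indent
# /boot/loader.conf
# Re-enable atime updates on HAMMER mounts (not recommended).
vfs.hammer.noatime="0"
.Ed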
 160 .Sh SSD SWAP
 161 The single most important thing you can do is have at least one
 162 solid-state drive in your system, and configure your swap space
 163 on that drive.
  164 If you are using a combination of a smaller SSD and a very large HDD,
 165 you can use
 166 .Xr swapcache 8
 167 to automatically cache data from your HDD.
 168 But even if you do not, having swap space configured on your SSD will
 169 significantly improve performance under even modest paging loads.
 170 It is particularly useful to configure a significant amount of swap
 171 on a workstation, 32GB or more is not uncommon, to handle bloated
 172 leaky applications such as browsers.
 173 .Sh SYSCTL TUNING
 174 .Xr sysctl 8
 175 variables permit system behavior to be monitored and controlled at
 176 run-time.
 177 Some sysctls simply report on the behavior of the system; others allow
 178 the system behavior to be modified;
 179 some may be set at boot time using
 180 .Xr rc.conf 5 ,
 181 but most will be set via
 182 .Xr sysctl.conf 5 .
 183 There are several hundred sysctls in the system, including many that appear
 184 to be candidates for tuning but actually are not.
 185 In this document we will only cover the ones that have the greatest effect
 186 on the system.
 187 .Pp
 188 The
 189 .Va kern.ipc.shm_use_phys
 190 sysctl defaults to 1 (on) and may be set to 0 (off) or 1 (on).
 191 Setting
 192 this parameter to 1 will cause all System V shared memory segments to be
 193 mapped to unpageable physical RAM.
 194 This feature only has an effect if you
 195 are either (A) mapping small amounts of shared memory across many (hundreds)
 196 of processes, or (B) mapping large amounts of shared memory across any
 197 number of processes.
 198 This feature allows the kernel to remove a great deal
 199 of internal memory management page-tracking overhead at the cost of wiring
 200 the shared memory into core, making it unswappable.
 201 .Pp
 202 The
 203 .Va vfs.write_behind
 204 sysctl defaults to 1 (on).  This tells the filesystem to issue media
 205 writes as full clusters are collected, which typically occurs when writing
 206 large sequential files.  The idea is to avoid saturating the buffer
 207 cache with dirty buffers when it would not benefit I/O performance.  However,
 208 this may stall processes and under certain circumstances you may wish to turn
 209 it off.
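.Pp
If write-behind stalls turn out to be a problem, a minimal
.Xr sysctl.conf 5
sketch for turning the feature off is shown here.
Measure before and after, as the default is usually the better choice.
.Bd -literal -offset indent
# /etc/sysctl.conf
# Disable write-behind (the default is 1).
vfs.write_behind=0
.Ed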
 210 .Pp
 211 The
 212 .Va vfs.hirunningspace
 213 sysctl determines how much outstanding write I/O may be queued to
  214 disk controllers system wide at any given instant.  The default is
 215 usually sufficient but on machines with lots of disks you may want to bump
 216 it up to four or five megabytes.  Note that setting too high a value
 217 (exceeding the buffer cache's write threshold) can lead to extremely
 218 bad clustering performance.  Do not set this value arbitrarily high!  Also,
 219 higher write queueing values may add latency to reads occurring at the same
 220 time.
 221 The
 222 .Va vfs.bufcache_bw
 223 controls data cycling within the buffer cache.  I/O bandwidth less than
 224 this specification (per second) will cycle into the much larger general
 225 VM page cache while I/O bandwidth in excess of this specification will
 226 be recycled within the buffer cache, reducing the load on the rest of
 227 the VM system.
 228 The default value is 200 megabytes (209715200), which means that the
 229 system will try harder to cache data coming off a slower hard drive
  230 and less hard to cache data coming off a fast SSD.
 231 This parameter is particularly important if you have NVMe drives in
 232 your system as these storage devices are capable of transferring
 233 well over 2GBytes/sec into the system.
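.Pp
A hedged
.Xr sysctl.conf 5
sketch for a machine with many disks follows.
The values are illustrative, not recommendations, and the sketch assumes
that both sysctls are expressed in bytes, as the 209715200 default quoted
above suggests.
.Bd -literal -offset indent
# /etc/sysctl.conf
# Allow 4MB of outstanding write I/O (4 * 1024 * 1024).
vfs.hirunningspace=4194304
# The stated default of 200MB per second (200 * 1024 * 1024).
vfs.bufcache_bw=209715200
.Ed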
 234 .Pp
 235 There are various other buffer-cache and VM page cache related sysctls.
 236 We do not recommend modifying their values.
 237 .Pp
 238 The
 239 .Va net.inet.tcp.sendspace
 240 and
 241 .Va net.inet.tcp.recvspace
 242 sysctls are of particular interest if you are running network intensive
 243 applications.
 244 They control the amount of send and receive buffer space
 245 allowed for any given TCP connection.
 246 However,
 247 .Dx
 248 now auto-tunes these parameters using a number of other related
  249 sysctls (run 'sysctl net.inet.tcp' to get a list) and they usually
  250 no longer need to be tuned manually.
 251 We do not recommend
 252 increasing or decreasing the defaults if you are managing a very large
 253 number of connections.
 254 Note that the routing table (see
 255 .Xr route 8 )
 256 can be used to introduce route-specific send and receive buffer size
 257 defaults.
 258 .Pp
 259 As an additional management tool you can use pipes in your
 260 firewall rules (see
 261 .Xr ipfw 8 )
 262 to limit the bandwidth going to or from particular IP blocks or ports.
 263 For example, if you have a T1 you might want to limit your web traffic
 264 to 70% of the T1's bandwidth in order to leave the remainder available
 265 for mail and interactive use.
 266 Normally a heavily loaded web server
 267 will not introduce significant latencies into other services even if
 268 the network link is maxed out, but enforcing a limit can smooth things
 269 out and lead to longer term stability.
 270 Many people also enforce artificial
 271 bandwidth limitations in order to ensure that they are not charged for
 272 using too much bandwidth.
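.Pp
The T1 example above might be sketched with
.Xr ipfw 8
and
.Xr dummynet 4
as follows.
The pipe number, rule number and port are assumptions made for
illustration; a T1 carries 1544 Kbit/s, so 70% is roughly 1080 Kbit/s.
.Bd -literal -offset indent
# Create a pipe limited to about 70% of a T1.
ipfw pipe 1 config bw 1080Kbit/s
# Send outgoing web server traffic (source port 80) through the pipe.
ipfw add 1000 pipe 1 tcp from any 80 to any out
.Ed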
 273 .Pp
 274 Setting the send or receive TCP buffer to values larger than 65535 will result
  275 in only a marginal performance improvement unless both hosts support the window
 276 scaling extension of the TCP protocol, which is controlled by the
 277 .Va net.inet.tcp.rfc1323
 278 sysctl.
 279 These extensions should be enabled and the TCP buffer size should be set
  280 to a value larger than 65535 in order to obtain good performance from
 281 certain types of network links; specifically, gigabit WAN links and
 282 high-latency satellite links.
 283 RFC 1323 support is enabled by default.
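.Pp
Without window scaling the TCP window field is 16 bits wide, so at most
65535 bytes can be in flight, which is less than the bandwidth*delay
product of such links.
As a worked example, using link figures that are assumptions chosen
purely for illustration:
.Bd -literal -offset indent
# window needed ~= bandwidth (bytes/sec) * round trip time (sec)
#
# 10 Mbit/s satellite link with a 600ms round trip:
#   10000000 / 8  = 1250000 bytes/sec
#   1250000 * 0.6 = 750000 bytes
#
# 750000 > 65535, so window scaling is needed to fill the link.
.Ed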
 284 .Pp
 285 The
 286 .Va net.inet.tcp.always_keepalive
 287 sysctl determines whether or not the TCP implementation should attempt
 288 to detect dead TCP connections by intermittently delivering
 289 .Dq keepalives
 290 on the connection.
 291 By default, this is now enabled for all applications.
 292 We do not recommend turning it off.
  293 The extra network bandwidth is minimal and this feature will clean up
 294 stalled and long-dead connections that might not otherwise be cleaned
 295 up.
 296 In the past people using dialup connections often did not want to
 297 use this feature in order to be able to retain connections across
  298 long disconnections, but these days the only default that makes
 299 sense is for the feature to be turned on.
 300 .Pp
 301 The
 302 .Va net.inet.tcp.delayed_ack
 303 TCP feature is largely misunderstood.  Historically speaking this feature
 304 was designed to allow the acknowledgement to transmitted data to be returned
 305 along with the response.  For example, when you type over a remote shell
 306 the acknowledgement to the character you send can be returned along with the
 307 data representing the echo of the character.   With delayed acks turned off
 308 the acknowledgement may be sent in its own packet before the remote service
 309 has a chance to echo the data it just received.  This same concept also
 310 applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
 311 number of tiny packets flowing across the network in half.   The
 312 .Dx
 313 delayed-ack implementation also follows the TCP protocol rule that
 314 at least every other packet be acknowledged even if the standard 100ms
 315 timeout has not yet passed.  Normally the worst a delayed ack can do is
 316 slightly delay the teardown of a connection, or slightly delay the ramp-up
  317 of a slow-start TCP connection.  While we aren't sure, we believe that
 318 the several FAQs related to packages such as SAMBA and SQUID which advise
 319 turning off delayed acks may be referring to the slow-start issue.
 320 .Pp
 321 The
 322 .Va net.inet.tcp.inflight_enable
 323 sysctl turns on bandwidth delay product limiting for all TCP connections.
 324 This feature is now turned on by default and we recommend that it be
 325 left on.
 326 It will slightly reduce the maximum bandwidth of a connection but the
 327 benefits of the feature in reducing packet backlogs at router constriction
 328 points are enormous.
 329 These benefits make it a whole lot easier for router algorithms to manage
 330 QOS for multiple connections.
 331 The limiting feature reduces the amount of data built up in intermediate
  332 router and switch packet queues and also reduces the amount of data built
 333 up in the local host's interface queue.  With fewer packets queued up,
 334 interactive connections, especially over slow modems, will also be able
 335 to operate with lower round trip times.  However, note that this feature
 336 only affects data transmission (uploading / server-side).  It does not
 337 affect data reception (downloading).
 338 .Pp
 339 The system will attempt to calculate the bandwidth delay product for each
 340 connection and limit the amount of data queued to the network to just the
 341 amount required to maintain optimum throughput.  This feature is useful
 342 if you are serving data over modems, GigE, or high speed WAN links (or
 343 any other link with a high bandwidth*delay product), especially if you are
 344 also using window scaling or have configured a large send window.
 345 .Pp
 346 For production use setting
 347 .Va net.inet.tcp.inflight_min
 348 to at least 6144 may be beneficial.  Note, however, that setting high
 349 minimums may effectively disable bandwidth limiting depending on the link.
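.Pp
A minimal
.Xr sysctl.conf 5
sketch of the settings discussed above:
.Bd -literal -offset indent
# /etc/sysctl.conf
# Bandwidth delay product limiting (already on by default).
net.inet.tcp.inflight_enable=1
# Raise the minimum; too high a value can effectively disable limiting.
net.inet.tcp.inflight_min=6144
.Ed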
 350 .Pp
 351 Adjusting
 352 .Va net.inet.tcp.inflight_stab
 353 is not recommended.
 354 This parameter defaults to 50, representing +5% fudge when calculating the
 355 bwnd from the bw.  This fudge is on top of an additional fixed +2*maxseg
 356 added to bwnd.  The fudge factor is required to stabilize the algorithm
 357 at very high speeds while the fixed 2*maxseg stabilizes the algorithm at
 358 low speeds.  If you increase this value excessive packet buffering may occur.
 359 .Pp
 360 The
 361 .Va net.inet.ip.portrange.*
 362 sysctls control the port number ranges automatically bound to TCP and UDP
 363 sockets.  There are three ranges:  A low range, a default range, and a
 364 high range, selectable via an IP_PORTRANGE
 365 .Fn setsockopt
 366 call.
 367 Most network programs use the default range which is controlled by
 368 .Va net.inet.ip.portrange.first
 369 and
 370 .Va net.inet.ip.portrange.last ,
  371 which default to 1024 and 5000 respectively.  Bound port ranges are
 372 used for outgoing connections and it is possible to run the system out
 373 of ports under certain circumstances.  This most commonly occurs when you are
 374 running a heavily loaded web proxy.  The port range is not an issue
  375 when running servers which handle mainly incoming connections such as a
  376 normal web server, or which have a limited number of outgoing connections such
 377 as a mail relay.  For situations where you may run yourself out of
 378 ports we recommend increasing
 379 .Va net.inet.ip.portrange.last
 380 modestly.  A value of 10000 or 20000 or 30000 may be reasonable.  You should
 381 also consider firewall effects when changing the port range.  Some firewalls
 382 may block large ranges of ports (usually low-numbered ports) and expect systems
 383 to use higher ranges of ports for outgoing connections.  For this reason
 384 we do not recommend that
 385 .Va net.inet.ip.portrange.first
 386 be lowered.
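.Pp
For a heavily loaded proxy, a sketch of the adjustment might look like
the following
.Xr sysctl.conf 5
fragment; 20000 is simply one of the values suggested above.
.Bd -literal -offset indent
# /etc/sysctl.conf
# Widen the default outgoing port range; leave portrange.first alone.
net.inet.ip.portrange.last=20000
.Ed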
 387 .Pp
 388 The
 389 .Va kern.ipc.somaxconn
 390 sysctl limits the size of the listen queue for accepting new TCP connections.
 391 The default value of 128 is typically too low for robust handling of new
 392 connections in a heavily loaded web server environment.
 393 For such environments,
 394 we recommend increasing this value to 1024 or higher.
 395 The service daemon
 396 may itself limit the listen queue size (e.g.\&
 397 .Xr sendmail 8 ,
 398 apache) but will
 399 often have a directive in its configuration file to adjust the queue size up.
 400 Larger listen queues also do a better job of fending off denial of service
 401 attacks.
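.Pp
A minimal
.Xr sysctl.conf 5
sketch for a busy web server follows.
The daemon's own listen backlog setting, if it has one, still needs to be
raised separately.
.Bd -literal -offset indent
# /etc/sysctl.conf
# Raise the listen queue limit from the default of 128.
kern.ipc.somaxconn=1024
.Ed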
 402 .Pp
 403 The
 404 .Va kern.maxvnodes
 405 specifies how many vnodes and related file structures the kernel will
 406 cache.
 407 The kernel uses a very generous default for this parameter based on
 408 available physical memory.
 409 You generally do not want to mess with this parameter as it directly
  410 affects how well the kernel can cache not only file structures but also
 411 the underlying file data.
 412 .Pp
 413 However, situations may crop up where caching too many vnodes can wind
 414 up eating too much kernel memory due to filesystem resources that are
 415 also associated with the vnodes.
 416 You can lower this value if kernel memory use is higher than you would like.
 417 It is, in fact, possible for the system to have more files open than the
 418 value of this tunable, but as files are closed the system will try to
 419 reduce the actual number of cached vnodes to match this value.
 420 .Pp
 421 The
 422 .Va kern.maxfiles
 423 sysctl determines how many open files the system supports.
 424 The default is
 425 typically based on available physical memory but you may need to bump
 426 it up if you are running databases or large descriptor-heavy daemons.
 427 The read-only
 428 .Va kern.openfiles
 429 sysctl may be interrogated to determine the current number of open files
 430 on the system.
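.Pp
As a sketch, current usage can be compared against the limit with
.Xr sysctl 8
before deciding whether to raise it.
The new limit shown is a hypothetical value, not a recommendation.
.Bd -literal -offset indent
# Show the number of open files and the current limit.
sysctl kern.openfiles kern.maxfiles
# Raise the limit at run-time (hypothetical value).
sysctl kern.maxfiles=500000
.Ed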
 431 .Pp
 432 The
 433 .Va vm.swap_idle_enabled
 434 sysctl is useful in large multi-user systems where you have lots of users
 435 entering and leaving the system and lots of idle processes.
 436 Such systems
 437 tend to generate a great deal of continuous pressure on free memory reserves.
 438 Turning this feature on and adjusting the swapout hysteresis (in idle
 439 seconds) via
 440 .Va vm.swap_idle_threshold1
 441 and
 442 .Va vm.swap_idle_threshold2
 443 allows you to depress the priority of pages associated with idle processes
 444 more quickly than the normal pageout algorithm.
 445 This gives a helping hand
 446 to the pageout daemon.
 447 Do not turn this option on unless you need it,
 448 because the tradeoff you are making is to essentially pre-page memory sooner
 449 rather than later, eating more swap and disk bandwidth.
 450 In a small system
 451 this option will have a detrimental effect but in a large system that is
 452 already doing moderate paging this option allows the VM system to stage
 453 whole processes into and out of memory more easily.
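.Pp
On a large multi-user system a
.Xr sysctl.conf 5
sketch might look like this.
The threshold values are placeholders chosen for illustration; check the
defaults reported by
.Xr sysctl 8
on your system before changing them.
.Bd -literal -offset indent
# /etc/sysctl.conf
# Depress the priority of pages belonging to idle processes sooner.
vm.swap_idle_enabled=1
# Swapout hysteresis in idle seconds (illustrative values).
vm.swap_idle_threshold1=2
vm.swap_idle_threshold2=10
.Ed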
 454 .Sh LOADER TUNABLES
 455 Some aspects of the system behavior may not be tunable at runtime because
 456 memory allocations they perform must occur early in the boot process.
 457 To change loader tunables, you must set their values in
 458 .Xr loader.conf 5
 459 and reboot the system.
 460 .Pp
 461 .Va kern.maxusers
 462 is automatically sized at boot based on the amount of memory available in
 463 the system.  The value can be read (but not written) via sysctl.
 464 .Pp
 465 You can change this value as a loader tunable if the default resource
 466 limits are not sufficient.
 467 This tunable works primarily by adjusting
 468 .Va kern.maxproc ,
 469 so you can opt to override that instead.
  470 It is generally easier to formulate an adjustment to
 471 .Va kern.maxproc
 472 instead of
 473 .Va kern.maxusers .
 474 .Pp
 475 .Va kern.maxproc
 476 controls most kernel auto-scaling components.  If kernel resource limits
  477 are not scaled high enough, setting this tunable to a higher value is
 478 usually sufficient.
 479 Generally speaking you will want to set this tunable to the upper limit
 480 for the number of process threads you want the kernel to be able to handle.
 481 The kernel may still decide to cap maxproc at a lower value if there is
 482 insufficient ram to scale resources as desired.
 483 .Pp
 484 Only set this tunable if the defaults are not sufficient.
  485 Do not use this tunable to try to trim kernel resource limits; you will
 486 not actually save much memory by doing so and you will leave the system
 487 more vulnerable to DOS attacks and runaway processes.
 488 .Pp
  489 Setting this tunable will scale the maximum number of processes, pipes and
  490 sockets, total open files the system can support, and increase mbuf
  491 and mbuf-cluster limits.  These other elements can also be separately
  492 overridden to fine-tune the setup.  We recommend setting this tunable
 493 first to create a baseline.
 494 .Pp
 495 Setting a high value presumes that you have enough physical memory to
 496 support the resource utilization.  For example, your system would need
 497 approximately 128GB of ram to reasonably support a maxproc value of
 498 4 million (4000000).  The default maxproc given that much ram will
 499 typically be in the 250000 range.
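.Pp
A sketch of such an override in
.Xr loader.conf 5
follows.
The value is hypothetical and assumes a machine with enough ram to back
it; the kernel may still cap it lower.
.Bd -literal -offset indent
# /boot/loader.conf
# Raise the process limit above the auto-sized default.
kern.maxproc="500000"
.Ed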
 500 .Pp
 501 Note that the PID is currently limited to 6 digits, so a system cannot
 502 have more than a million processes operating anyway (though the aggregate
 503 number of threads can be far greater).
 504 And yes, there is in fact no reason why a very well-endowed system
 505 couldn't have that many processes.
 506 .Pp
 507 .Va kern.nbuf
 508 sets how many filesystem buffers the kernel should cache.
 509 Filesystem buffers can be up to 128KB each.
 510 UFS typically uses an 8KB blocksize while HAMMER typically uses 64KB.
 511 The defaults usually suffice.
 512 The cached buffers represent wired physical memory so specifying a value
 513 that is too large can result in excessive kernel memory use, and is also
 514 not entirely necessary since the pages backing the buffers are also
 515 cached by the VM page cache (which does not use wired memory).
 516 The buffer cache significantly improves the hot path for cached file
 517 accesses and dirty data.
 518 .Pp
 519 The kernel reserves (128KB * nbuf) bytes of KVM.  The actual physical
 520 memory use depends on the filesystem buffer size.
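.Pp
As a worked example of this calculation, using a hypothetical nbuf value
chosen only to keep the arithmetic simple:
.Bd -literal -offset indent
# nbuf = 10000 (hypothetical)
#   KVM reserved:       10000 * 128KB = 1280000KB (about 1.2GB)
#   wired ram, HAMMER:  up to 10000 * 64KB = 640000KB (about 625MB)
#   wired ram, UFS:     up to 10000 * 8KB  =  80000KB (about 78MB)
.Ed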
 521 .Pp
 522 The
 523 .Va kern.dfldsiz
 524 and
 525 .Va kern.dflssiz
 526 tunables set the default soft limits for process data and stack size
 527 respectively.
 528 Processes may increase these up to the hard limits by calling
 529 .Xr setrlimit 2 .
 530 The
 531 .Va kern.maxdsiz ,
 532 .Va kern.maxssiz ,
 533 and
 534 .Va kern.maxtsiz
 535 tunables set the hard limits for process data, stack, and text size
 536 respectively; processes may not exceed these limits.
 537 The
 538 .Va kern.sgrowsiz
 539 tunable controls how much the stack segment will grow when a process
 540 needs to allocate more stack.
 541 .Pp
 542 .Va kern.ipc.nmbclusters
 543 and
 544 .Va kern.ipc.nmbjclusters
 545 may be adjusted to increase the number of network mbufs the system is
 546 willing to allocate.
 547 Each normal cluster represents approximately 2K of memory,
 548 so a value of 1024 represents 2M of kernel memory reserved for network
 549 buffers.
 550 Each 'j' cluster is typically 4KB, so a value of 1024 represents 4M of
 551 kernel memory.
 552 You can do a simple calculation to figure out how many you need but
 553 keep in mind that tcp buffer sizing is now more dynamic than it used to
 554 be.
 555 .Pp
  556 The defaults usually suffice but you may want to bump them up on service-heavy
  557 machines.
  558 Modern machines often need a large number of mbufs to operate services
  559 efficiently; values of 65536, and even 262144 or more, are common.
 560 If you are running a server, it is better to be generous than to be frugal.
 561 Remember the memory calculation though.
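.Pp
The memory calculation can be sketched alongside a
.Xr loader.conf 5
fragment.
The jumbo cluster count is an assumption made for illustration.
.Bd -literal -offset indent
# /boot/loader.conf
# 65536 normal clusters * 2KB = 128MB of kernel memory.
kern.ipc.nmbclusters="65536"
# 32768 jumbo clusters * 4KB = 128MB of kernel memory.
kern.ipc.nmbjclusters="32768"
.Ed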
 562 .Pp
 563 Under no circumstances
  564 should you specify an arbitrarily high value for this parameter; it could
 565 lead to a boot-time crash.
 566 The
 567 .Fl m
 568 option to
 569 .Xr netstat 1
 570 may be used to observe network cluster use.
 571 .Sh KERNEL CONFIG TUNING
 572 There are a number of kernel options that you may have to fiddle with in
 573 a large-scale system.
 574 In order to change these options you need to be
 575 able to compile a new kernel from source.
 576 The
 577 .Xr config 8
 578 manual page and the handbook are good starting points for learning how to
 579 do this.
 580 Generally speaking, removing options to trim the size of the kernel
 581 is not going to save very much memory on a modern system.
 582 In the grand scheme of things, saving a megabyte or two is in the noise
 583 on a system that likely has multiple gigabytes of memory.
 584 .Pp
 585 If your motherboard is AHCI-capable then we strongly recommend turning
 586 on AHCI mode in the BIOS if it is not already the default.
 587 .Sh CPU, MEMORY, DISK, NETWORK
 588 The type of tuning you do depends heavily on where your system begins to
 589 bottleneck as load increases.
 590 If your system runs out of CPU (idle times
 591 are perpetually 0%) then you need to consider upgrading the CPU or moving to
  592 an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
 593 programs that are causing the load and try to optimize them.
 594 If your system
 595 is paging to swap a lot you need to consider adding more memory.
 596 If your
 597 system is saturating the disk you typically see high CPU idle times and
 598 total disk saturation.
 599 .Xr systat 1
 600 can be used to monitor this.
 601 There are many solutions to saturated disks:
 602 increasing memory for caching, mirroring disks, distributing operations across
 603 several machines, and so forth.
 604 .Pp
 605 Finally, you might run out of network suds.
 606 Optimize the network path
 607 as much as possible.
 608 If you are operating a machine as a router you may need to
  609 set up a
 610 .Xr pf 4
 611 firewall (also see
  612 .Xr firewall 7 ) .
 613 .Dx
 614 has a very good fair-share queueing algorithm for QOS in
 615 .Xr pf 4 .
 616 .Sh SOURCE OF KERNEL MEMORY USAGE
 617 The primary sources of kernel memory usage are:
 618 .Bl -tag -width ".Va kern.maxvnodes"
 619 .It Va kern.maxvnodes
 620 The maximum number of cached vnodes in the system.
 621 These can eat quite a bit of kernel memory, primarily due to auxiliary
 622 structures tracked by the HAMMER filesystem.
 623 It is relatively easy to configure a smaller value, but we do not
 624 recommend reducing this parameter below 100000.
 625 Smaller values directly impact the number of discrete files the
 626 kernel can cache data for at once.
 627 .It Va kern.ipc.nmbclusters , Va kern.ipc.nmbjclusters
 628 Calculate approximately 2KB per normal cluster and 4KB per jumbo
 629 cluster.
 630 Do not make these values too low or you risk deadlocking the network
 631 stack.
 632 .It Va kern.nbuf
 633 The number of filesystem buffers managed by the kernel.
 634 The kernel wires the underlying cached VM pages, typically 8KB (UFS) or
 635 64KB (HAMMER) per buffer.
 636 .It swap/swapcache
 637 Swap memory requires approximately 1MB of physical ram for each 1GB
 638 of swap space.
 639 When swapcache is used, additional memory may be required to keep
  640 VM objects around longer (only really reducible by reducing the
 641 value of
 642 .Va kern.maxvnodes
 643 which you can do post-boot if you desire).
 644 .It tmpfs
 645 Tmpfs is very useful but keep in mind that while the file data itself
 646 is backed by swap, the meta-data (the directory topology) requires
 647 wired kernel memory.
 648 .It mmu page tables
 649 Even though the underlying data pages themselves can be paged to swap,
 650 the page tables are usually wired into memory.
 651 This can create problems when a large number of processes are mmap()ing
 652 very large files.
 653 Sometimes turning on
 654 .Va machdep.pmap_mmu_optimize
 655 suffices to reduce overhead.
  656 Page table kernel memory use can be observed by using 'vmstat -z'.
 657 .It Va kern.ipc.shm_use_phys
 658 It is sometimes necessary to force shared memory to use physical memory
 659 when running a large database which uses shared memory to implement its
 660 own data caching.
 661 The use of sysv shared memory in this regard allows the database to
 662 distinguish between data which it knows it can access instantly (i.e.
  663 without even having to page-in from swap) versus data which might require
  664 an I/O to fetch.
 665 .Pp
 666 If you use this feature be very careful with regards to the database's
 667 shared memory configuration as you will be wiring the memory.
 668 .El
 669 .Sh SEE ALSO
 670 .Xr netstat 1 ,
 671 .Xr systat 1 ,
 672 .Xr dm 4 ,
 673 .Xr dummynet 4 ,
 674 .Xr nata 4 ,
 675 .Xr pf 4 ,
 676 .Xr login.conf 5 ,
 677 .Xr pf.conf 5 ,
 678 .Xr rc.conf 5 ,
 679 .Xr sysctl.conf 5 ,
 680 .Xr firewall 7 ,
 681 .Xr hier 7 ,
 682 .Xr boot 8 ,
 683 .Xr ccdconfig 8 ,
 684 .Xr config 8 ,
 685 .Xr disklabel 8 ,
 686 .Xr fsck 8 ,
 687 .Xr ifconfig 8 ,
 688 .Xr ipfw 8 ,
 689 .Xr loader 8 ,
 690 .Xr mount 8 ,
 691 .Xr newfs 8 ,
 692 .Xr route 8 ,
 693 .Xr sysctl 8 ,
 694 .Xr tunefs 8
 695 .Sh HISTORY
 696 The
 697 .Nm
 698 manual page was inherited from
 699 .Fx
 700 and first appeared in
 701 .Fx 4.3 ,
 702 May 2001.
 703 .Sh AUTHORS
 704 The
 705 .Nm
 706 manual page was originally written by
 707 .An Matthew Dillon .