<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE refentry
PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<refentry id="ctdb-tunables.7">
<refentrytitle>ctdb-tunables</refentrytitle>
<manvolnum>7</manvolnum>
<refmiscinfo class="source">ctdb</refmiscinfo>
<refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
<refname>ctdb-tunables</refname>
<refpurpose>CTDB tunable configuration variables</refpurpose>
<title>DESCRIPTION</title>
CTDB's behaviour can be configured by setting run-time tunable
variables. This page lists and describes all tunables. See the
<citerefentry><refentrytitle>ctdb</refentrytitle>
<manvolnum>1</manvolnum></citerefentry>
<command>listvars</command>, <command>setvar</command> and
<command>getvar</command> commands for more details.
Unless otherwise stated, tunables should be set to the same
value on all nodes. Setting tunables to different values across
nodes may produce unexpected results. Future releases may set
some or most tunables globally across the cluster, but doing so
is currently a manual process.
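Because keeping tunables consistent is a manual task, a quick cross-node comparison can help. The Python sketch below compares values gathered from each node and reports any tunable that differs. It assumes the values were collected by running 'ctdb listvars' on every node and that each output line has the form "Name = value"; that format, the helper names parse_listvars and find_mismatches, and the sample data are assumptions for illustration only.

```python
def parse_listvars(text):
    """Parse "Name = value" lines (the format assumed for 'ctdb listvars')."""
    tunables = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip()
    return tunables


def find_mismatches(per_node):
    """Return {tunable: {node: value}} for tunables whose values differ.

    per_node maps a node identifier to the dict returned by parse_listvars.
    A tunable missing on some node is reported with the value None.
    """
    names = set()
    for tunables in per_node.values():
        names.update(tunables)
    mismatches = {}
    for name in sorted(names):
        values = {node: t.get(name) for node, t in per_node.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


# Hypothetical output captured from two nodes.
node0 = parse_listvars("MonitorInterval = 15\nEnableBans = 1\n")
node1 = parse_listvars("MonitorInterval = 30\nEnableBans = 1\n")
print(find_mismatches({0: node0, 1: node1}))
# {'MonitorInterval': {0: '15', 1: '30'}}
```

Any tunable reported here could then be aligned with 'ctdb setvar' on the nodes that differ.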
The tunable variables are listed alphabetically.
<title>AllowClientDBAttach</title>
<para>Default: 1</para>
When set to 0, clients are not allowed to attach to any databases.
This can be used to temporarily block any new processes from
attaching to and accessing the databases. This is mainly used
for detaching a volatile database using 'ctdb detach'.
<title>AllowMixedVersions</title>
<para>Default: 0</para>
CTDB will not allow incompatible versions to co-exist in
a cluster. If a version mismatch is found, then the losing CTDB
daemon will shut down. To disable the incompatible version check,
set this tunable to 1.
For version checking, CTDB uses the major and minor version.
For example, CTDB 4.6.1 and CTDB 4.6.2 are matching versions;
CTDB 4.5.x and CTDB 4.6.y do not match.
A CTDB daemon with version check support will lose to one without
version check support. Between two different CTDB versions that both
support version checking, the one that has been running for less
time will lose. If both have been running for the same time (to the
second), then the older version will lose.
The losing CTDB daemon will shut down.
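The rules for deciding which daemon loses can be restated as a small Python function. This is a sketch of the documented rules only, not CTDB's implementation. The Daemon record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Daemon:
    """Hypothetical summary of a running ctdbd, for illustration only."""
    name: str
    version: tuple           # (major, minor, patch), e.g. (4, 6, 1)
    has_version_check: bool
    uptime_seconds: int      # running time, compared to the second


def versions_match(a, b):
    # Only major and minor are compared: 4.6.1 and 4.6.2 match.
    return a.version[:2] == b.version[:2]


def losing_daemon(a, b, allow_mixed_versions=0):
    """Return the daemon that shuts down, or None if both keep running."""
    if allow_mixed_versions or versions_match(a, b):
        return None
    if not (a.has_version_check or b.has_version_check):
        return None  # neither daemon performs the check
    # A daemon with version check support loses to one without it.
    if a.has_version_check != b.has_version_check:
        return a if a.has_version_check else b
    # Otherwise the daemon that has been running for less time loses.
    if a.uptime_seconds != b.uptime_seconds:
        return a if b.uptime_seconds > a.uptime_seconds else b
    # Same running time: the older version loses.
    return a if b.version > a.version else b


old = Daemon("node0", (4, 5, 3), True, 3600)
new = Daemon("node1", (4, 6, 1), True, 120)
print(losing_daemon(old, new).name)  # node1: it has been running for less time
```

Note that the newer version loses in this example. Running time is checked before version.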
<title>AllowUnhealthyDBRead</title>
<para>Default: 0</para>
When set to 1, ctdb allows database traverses to read unhealthy
databases. By default, ctdb does not allow reading records from
unhealthy databases.
<title>ControlTimeout</title>
<para>Default: 60</para>
This is the default timeout used when sending a control
message to either the local or a remote ctdb daemon.
<title>DatabaseHashSize</title>
<para>Default: 100001</para>
Number of hash chains for the local store of the tdbs that
ctdb manages.
<title>DatabaseMaxDead</title>
<para>Default: 5</para>
Maximum number of dead records per hash chain for the tdb databases
managed by ctdb.
<title>DBRecordCountWarn</title>
<para>Default: 100000</para>
When set to non-zero, ctdb will log a warning during recovery if
a database has more than this many records. This will produce a
warning if a database grows uncontrollably with orphaned records.
<title>DBRecordSizeWarn</title>
<para>Default: 10000000</para>
When set to non-zero, ctdb will log a warning during recovery
if a single record is bigger than this size. This will produce
a warning if a database record grows uncontrollably.
<title>DBSizeWarn</title>
<para>Default: 1000000000</para>
When set to non-zero, ctdb will log a warning during recovery if
the size of a database is bigger than this. This will produce a
warning if a database grows uncontrollably.
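The three warning tunables follow the same rule. A value of 0 disables the check. Any other value is a threshold, and exceeding it produces a warning. Below is a minimal Python sketch of that rule. The function name, the message wording and the sample sizes are hypothetical. Treating the two size limits as byte counts is an assumption, since the text does not state their unit.

```python
# Defaults from this page; setting a tunable to 0 disables its check.
DEFAULTS = {
    "DBRecordCountWarn": 100000,
    "DBRecordSizeWarn": 10000000,
    "DBSizeWarn": 1000000000,
}


def recovery_warnings(db_name, record_count, largest_record, db_size,
                      tunables=DEFAULTS):
    """Return the warnings this sketch would log for one database."""
    checks = [
        ("DBRecordCountWarn", "record count", record_count),
        ("DBRecordSizeWarn", "largest record size", largest_record),
        ("DBSizeWarn", "database size", db_size),
    ]
    warnings = []
    for tunable, what, observed in checks:
        limit = tunables[tunable]
        # Warn only when the tunable is non-zero and the value exceeds it.
        if limit and observed > limit:
            warnings.append(f"{db_name}: {what} {observed} above {tunable}={limit}")
    return warnings


print(recovery_warnings("locking.tdb", 250000, 4096, 50000000))
# ['locking.tdb: record count 250000 above DBRecordCountWarn=100000']
```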
<title>DeferredAttachTO</title>
<para>Default: 120</para>
When databases are frozen we do not allow clients to attach to
the databases. Instead of returning an error immediately to the
client, the attach request from the client is deferred until
the database becomes available again, at which stage we respond
to the client.
This timeout controls how long we will defer the request from the
client before timing it out and returning an error to the client.
<title>DisableIPFailover</title>
<para>Default: 0</para>
When set to non-zero, ctdb will not perform failover or
failback. Even if a node fails while holding public IPs, ctdb
will not recover the IPs or assign them to another node.
When this tunable is enabled, ctdb will no longer attempt
to recover the cluster by failing IP addresses over to other
nodes. This leads to a service outage until the administrator
has manually performed IP failover to replacement nodes using the
'ctdb moveip' command.
<title>ElectionTimeout</title>
<para>Default: 3</para>
The number of seconds to wait for the election of the recovery
master to complete. If the election is not completed during this
interval, then that round of election fails and ctdb starts a
new election.
<title>EnableBans</title>
<para>Default: 1</para>
This parameter allows ctdb to ban a node if the node is misbehaving.
When set to 0, this disables banning completely in the cluster,
and thus nodes cannot get banned, even if they break. Don't
set this to 0 unless you know what you are doing.
<title>EventScriptTimeout</title>
<para>Default: 30</para>
Maximum time in seconds to allow an event to run before timing
out. This is the total time for all enabled scripts that are
run for an event, not just a single event script.
Note that timeouts are ignored for some events ("takeip",
"releaseip", "startrecovery", "recovered") and converted to
success. The logic here is that the callers of these events
implement their own additional timeout.
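Here is a minimal sketch of how the timeout is applied. The budget covers the combined run time of every script for an event. For the events listed above, a timeout is reported as success. The function and its inputs are hypothetical, and script failures are not modelled. In a real cluster the event scripts are run by ctdbd, not by code like this.

```python
# Events whose timeouts are converted to success, per the text above.
TIMEOUT_IS_SUCCESS = {"takeip", "releaseip", "startrecovery", "recovered"}


def event_outcome(event, script_durations, event_script_timeout=30):
    """Return "success" or "timeout" for one event run (illustrative only).

    script_durations: seconds taken by each enabled script, in run order.
    The timeout applies to the running total, not to each script.
    """
    elapsed = 0
    for duration in script_durations:
        elapsed += duration
        if elapsed > event_script_timeout:
            return "success" if event in TIMEOUT_IS_SUCCESS else "timeout"
    return "success"


# Three scripts of 12 s each: none exceeds 30 s alone, but the total does.
print(event_outcome("monitor", [12, 12, 12]))  # timeout
print(event_outcome("takeip", [12, 12, 12]))   # success
```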
<title>FetchCollapse</title>
<para>Default: 1</para>
This parameter is used to avoid multiple migration requests for
the same record from a single node. All the record requests for
the same record are queued up and processed when the record is
migrated to the current node.
When many clients across many nodes try to access the same record
at the same time, this can lead to a fetch storm where the record
becomes very active and bounces between nodes very fast. This
leads to high CPU utilization of the ctdbd daemon, trying to
bounce that record around very fast, and poor performance.
Collapsing fetch requests can improve performance and reduce CPU
utilization for such workloads.
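One way to picture the collapsing behaviour is as a wait queue for each record. Only the first local request for a record triggers a migration. Later requests for the same record wait for that migration instead of sending their own. The Python sketch below is a simplified model of that reading. The class and method names are hypothetical and do not correspond to ctdbd internals.

```python
from collections import defaultdict


class FetchCollapser:
    """Toy model of collapsing migration requests for the same record."""

    def __init__(self, request_migration):
        self.request_migration = request_migration  # sends one request
        self.waiting = defaultdict(list)             # record key -> callbacks

    def fetch(self, key, callback):
        first = key not in self.waiting
        self.waiting[key].append(callback)
        if first:
            # Only the first request for this record leaves the node.
            self.request_migration(key)

    def record_arrived(self, key, record):
        # The record has migrated to this node: serve every queued request.
        for callback in self.waiting.pop(key, []):
            callback(record)


sent = []
served = []
collapser = FetchCollapser(sent.append)
for client in range(5):
    collapser.fetch("record-A", lambda rec, c=client: served.append((c, rec)))
collapser.record_arrived("record-A", "record-data")
print(len(sent), len(served))  # 1 5
```

Five local clients generate a single migration request, which is the effect that limits fetch storms.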
<title>HopcountMakeSticky</title>
<para>Default: 50</para>
For database(s) marked STICKY (using 'ctdb setdbsticky'),
any record that is migrating so fast that its hopcount
exceeds this limit is marked as a STICKY record for
<varname>StickyDuration</varname> seconds. This means that
after each migration the sticky record will be kept on the node
for <varname>StickyPindown</varname> milliseconds and prevented
from being migrated off the node.
This will improve performance for certain workloads, such as
locking.tdb if many clients are opening/closing the same file.
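The interaction of <varname>HopcountMakeSticky</varname>, <varname>StickyDuration</varname> and <varname>StickyPindown</varname> can be sketched as a small state model. This is a simplified, hypothetical model of the rules on this page, not CTDB's implementation. Times are passed in explicitly rather than read from a clock.

```python
HOPCOUNT_MAKE_STICKY = 50   # hops
STICKY_DURATION = 600       # seconds
STICKY_PINDOWN = 200        # milliseconds


class RecordState:
    """Hypothetical per-record state in a database marked STICKY."""

    def __init__(self):
        self.sticky_until = 0.0   # time (s) at which STICKY status expires
        self.pinned_until = 0.0   # time (s) until which migration is deferred

    def on_migration_in(self, hopcount, now):
        # A record whose hopcount exceeds the limit becomes STICKY.
        if hopcount > HOPCOUNT_MAKE_STICKY:
            self.sticky_until = now + STICKY_DURATION
        # While STICKY, pin the record after each migration onto this node.
        if self.sticky_until > now:
            self.pinned_until = now + STICKY_PINDOWN / 1000.0

    def may_migrate_off(self, now):
        # Requests arriving while the record is pinned are deferred.
        return now >= self.pinned_until


rec = RecordState()
rec.on_migration_in(hopcount=51, now=100.0)
print(rec.may_migrate_off(100.1), rec.may_migrate_off(100.3))  # False True
```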
<title>IPAllocAlgorithm</title>
<para>Default: 2</para>
Selects the algorithm that CTDB should use when doing public
IP address allocation. Meaningful values are:
Deterministic IP address allocation.
This is a simple and fast option. However, it can cause
unnecessary address movement during fail-over because
each address has a "home" node. Works badly when some
nodes do not have any addresses defined. Should be used
with care when addresses are defined across multiple
networks.
Non-deterministic IP address allocation.
This is a relatively fast option that attempts to
minimise unnecessary address movements. Addresses do
not have a "home" node. Rebalancing is limited but is
usually adequate. Works badly when addresses are
defined across multiple networks.
LCP2 IP address allocation.
Uses a heuristic to assign addresses defined across
multiple networks, usually balancing addresses on each
network evenly across nodes. Addresses do not have a
"home" node. Minimises unnecessary address movements.
The algorithm is complex, so it is slower than the other
choices for a large number of addresses. However, it
can calculate an optimal assignment of 900 addresses in
under 10 seconds on modern hardware.
If the specified value is not one of these then the default
will be used.
<title>KeepaliveInterval</title>
<para>Default: 5</para>
How often, in seconds, the nodes should send keep-alive packets to
each other.
<title>KeepaliveLimit</title>
<para>Default: 5</para>
The number of keepalive intervals without any traffic after which
a node marks the peer as DISCONNECTED.
If a node has hung, it can take
<varname>KeepaliveInterval</varname> *
(<varname>KeepaliveLimit</varname> + 1) seconds before
ctdb determines that the node is DISCONNECTED and performs
a recovery. This limit should not be set too high, so that a
hung node is detected early and the failover completes before
any application timeouts (e.g. SMB1) kick in.
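With the default values, the worst-case detection time above is 5 * (5 + 1) = 30 seconds. The snippet simply evaluates that formula. The function name and the alternative values are illustrative.

```python
def hung_node_detection_seconds(keepalive_interval=5, keepalive_limit=5):
    """Worst-case seconds before a hung node is marked DISCONNECTED."""
    return keepalive_interval * (keepalive_limit + 1)


print(hung_node_detection_seconds())      # 30 with the defaults
print(hung_node_detection_seconds(2, 3))  # 8
```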
<title>LockProcessesPerDB</title>
<para>Default: 200</para>
This is the maximum number of lock helper processes ctdb will
create for obtaining record locks. When ctdb cannot get a record
lock without blocking, it creates a helper process that waits
for the lock to be obtained.
<title>LogLatencyMs</title>
<para>Default: 0</para>
When set to non-zero, ctdb will log if certain operations
take longer than this value, in milliseconds, to complete.
These operations include "process a record request from client",
"take a record or database lock", "update a persistent database
record" and "vacuum a database".
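The threshold rule can be sketched with a Python context manager. It times an operation and logs it only when the tunable is non-zero and the operation took longer than the threshold. The logger name, the slow_operations list and the use of Python's logging module are illustrative assumptions. ctdbd writes such messages to its own log.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("latency-sketch")  # hypothetical logger name
slow_operations = []  # also recorded here so the example can be checked


@contextmanager
def log_if_slow(operation, log_latency_ms):
    """Time the enclosed block; log it if LogLatencyMs is set and exceeded."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        # 0 disables the check, as with the tunable's default.
        if log_latency_ms and elapsed_ms > log_latency_ms:
            slow_operations.append(operation)
            log.warning("%s took %.0f ms", operation, elapsed_ms)


with log_if_slow("vacuum a database", log_latency_ms=50):
    time.sleep(0.1)  # about 100 ms, so this is logged

with log_if_slow("vacuum a database", log_latency_ms=0):
    time.sleep(0.1)  # not logged: the check is disabled
```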
<title>MaxQueueDropMsg</title>
<para>Default: 1000000</para>
This is the maximum number of messages to be queued up for
a client before ctdb will treat the client as hung and will
terminate the client connection.
<title>MonitorInterval</title>
<para>Default: 15</para>
How often, in seconds, ctdb should run the 'monitor' event to check
the health of the node.
<title>MonitorTimeoutCount</title>
<para>Default: 20</para>
How many 'monitor' events in a row need to time out before a node
is flagged as UNHEALTHY. This setting is useful if scripts cannot
be written so that they never hang for benign reasons.
<title>NoIPFailback</title>
<para>Default: 0</para>
When set to 1, ctdb will not perform failback of IP addresses
when a node becomes healthy. When a node becomes UNHEALTHY,
ctdb WILL perform failover of public IP addresses, but when the
node becomes HEALTHY again, ctdb will not fail the addresses back.
Use with caution! Normally when a node becomes available to the
cluster ctdb will try to reassign public IP addresses onto the
new node as a way to distribute the workload evenly across the
cluster nodes. ctdb tries to make sure that all running nodes
host approximately the same number of public addresses.
When you enable this tunable, ctdb will no longer attempt to
rebalance the cluster by failing IP addresses back to the new
nodes. An unbalanced cluster will therefore remain unbalanced
until there is manual intervention from the administrator. When
this parameter is set, you can manually fail public IP addresses
over to the new node(s) using the 'ctdb moveip' command.
<title>NoIPHostOnAllDisabled</title>
<para>Default: 0</para>
If no nodes are HEALTHY then by default ctdb will happily host
public IPs on disabled (unhealthy or administratively disabled)
nodes. This can cause problems, for example if the underlying
cluster filesystem is not mounted. When set to 1 and a node
is disabled, any IPs hosted by this node will be released and
the node will not take over any IPs until it is no longer disabled.
<title>NoIPTakeover</title>
<para>Default: 0</para>
When set to 1, ctdb will not allow IP addresses to be failed
over to other nodes. Any IP addresses already hosted on
healthy nodes will remain. Usually IP addresses hosted on
unhealthy nodes will also remain, if NoIPHostOnAllDisabled is
0. However, if NoIPHostOnAllDisabled is 1 then IP addresses
will be released by unhealthy nodes and will become un-hosted.
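With <varname>NoIPTakeover</varname> set to 1, the fate of an address on a node depends on the node's health and on <varname>NoIPHostOnAllDisabled</varname>. The decision function below models only the cases described above. Its name and return strings are hypothetical.

```python
def fate_with_no_ip_takeover(node_healthy, no_ip_host_on_all_disabled):
    """Fate of a public IP hosted on a node while NoIPTakeover is 1 (sketch)."""
    if node_healthy:
        return "remains hosted"
    if no_ip_host_on_all_disabled:
        return "released, un-hosted"
    # The usual case for an unhealthy node with NoIPHostOnAllDisabled=0.
    return "remains hosted"


for healthy in (True, False):
    for host_on_disabled in (0, 1):
        print(f"healthy={healthy} NoIPHostOnAllDisabled={host_on_disabled}: "
              f"{fate_with_no_ip_takeover(healthy, host_on_disabled)}")
```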
<title>PullDBPreallocation</title>
<para>Default: 10*1024*1024</para>
This is the size of a record buffer to pre-allocate for sending
a reply to the PULLDB control. Usually the record buffer starts with
the size of the first record and gets reallocated every time a new
record is added to the record buffer. For a large number of records,
growing the record buffer one record at a time can be very inefficient.
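Counting buffer growths shows the difference. The default is 10*1024*1024 = 10485760 bytes (10 MiB). In this simplified model, not ctdbd's allocator, growing by exactly one record costs one reallocation per record. A preallocated buffer is not reallocated until it fills. Doubling the buffer once it is full is an assumption of this sketch, as are the function name and the record sizes.

```python
PULLDB_PREALLOCATION = 10 * 1024 * 1024  # default: 10485760 bytes


def count_reallocations(record_sizes, preallocation=0):
    """Count buffer (re)allocations while appending records (toy model).

    preallocation == 0: the buffer is sized for the first record and then
    grows to fit exactly one more record each time.
    preallocation > 0: the buffer starts at that size and doubles when full
    (the doubling policy is an assumption of this sketch).
    """
    capacity = preallocation
    used = 0
    reallocs = 0
    for size in record_sizes:
        needed = used + size
        if needed > capacity:
            if preallocation:
                while needed > capacity:
                    capacity *= 2
            else:
                capacity = needed
            reallocs += 1
        used = needed
    return reallocs


records = [1000] * 50000  # 50,000 records of 1000 bytes (about 48 MiB)
print(count_reallocations(records))                        # 50000
print(count_reallocations(records, PULLDB_PREALLOCATION))  # 3
```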
<title>QueueBufferSize</title>
<para>Default: 1024</para>
This is the maximum amount of data (in bytes) ctdb will read
from a socket at a time.
For a busy setup, if ctdb is not able to process the TCP sockets
fast enough (large amount of data in Recv-Q for TCP sockets),
then this tunable value should be increased. However, large
values can keep ctdb busy processing packets and prevent ctdb
from handling other events.
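Reading in bounded chunks means one large transfer is split across several reads. Here is a minimal illustration using Python's standard socket module, where the chunk size plays the role of <varname>QueueBufferSize</varname>. ctdbd's own read loop is written in C and is not shown. The drain helper is hypothetical.

```python
import socket

QUEUE_BUFFER_SIZE = 1024  # bytes per read, as with the default


def drain(sock, chunk_size=QUEUE_BUFFER_SIZE):
    """Read everything the peer sends, at most chunk_size bytes per recv()."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # peer closed the connection
            break
        chunks.append(data)
    return chunks


a, b = socket.socketpair()
a.sendall(b"x" * 5000)
a.close()
chunks = drain(b)
b.close()
# 5000 bytes arrive in at least five reads of up to 1024 bytes each.
print(len(chunks), sum(len(c) for c in chunks))
```

A larger chunk size needs fewer reads for the same data, but each read then takes longer before control returns to the caller.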
<title>RecBufferSizeLimit</title>
<para>Default: 1000000</para>
This is the limit on the size of the record buffer to be sent
in various controls. This limit is used by new controls used
for recovery and controls used in vacuuming.
<title>RecdFailCount</title>
<para>Default: 10</para>
If the recovery daemon has failed to ping the main daemon for
this many consecutive intervals, the main daemon will consider
the recovery daemon as hung and will try to restart it to recover.
<title>RecdPingTimeout</title>
<para>Default: 60</para>
If the main daemon has not heard a "ping" from the recovery daemon
for this many seconds, the main daemon will log a message that
the recovery daemon is potentially hung. This also increments a
counter which is checked against <varname>RecdFailCount</varname>
for detection of a hung recovery daemon.
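Together, the two tunables bound how long a hung recovery daemon can go unnoticed. With the defaults, a restart is attempted after roughly <varname>RecdPingTimeout</varname> * <varname>RecdFailCount</varname> = 60 * 10 = 600 seconds without a ping. The sketch models the counter described above. It assumes the counter starts over whenever a ping arrives, as "consecutive" implies. The class name is hypothetical.

```python
RECD_PING_TIMEOUT = 60  # seconds
RECD_FAIL_COUNT = 10    # consecutive missed intervals


class RecdWatchdog:
    """Toy model of the main daemon watching the recovery daemon."""

    def __init__(self):
        self.missed = 0

    def ping_received(self):
        self.missed = 0  # the consecutive count starts over

    def ping_timeout_expired(self):
        """Called after RECD_PING_TIMEOUT seconds without a ping.

        The real daemon logs a "potentially hung" message at this point.
        Returns True when the recovery daemon should be restarted.
        """
        self.missed += 1
        return self.missed >= RECD_FAIL_COUNT


dog = RecdWatchdog()
restart = False
elapsed = 0
while not restart:
    elapsed += RECD_PING_TIMEOUT
    restart = dog.ping_timeout_expired()
print("restart after", elapsed, "seconds")  # restart after 600 seconds
```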
<title>RecLockLatencyMs</title>
<para>Default: 1000</para>
When using a reclock file for split brain prevention, if set
to non-zero this tunable will make the recovery daemon log a
message if the fcntl() call to lock/testlock the recovery file
takes longer than this number of milliseconds.
<title>RecoverInterval</title>
<para>Default: 1</para>
How frequently, in seconds, the recovery daemon should perform the
consistency checks to determine if it should perform a recovery.
<title>RecoverTimeout</title>
<para>Default: 120</para>
This is the default setting for timeouts for controls when sent
from the recovery daemon. We allow longer control timeouts from
the recovery daemon than from normal use since the recovery
daemon often uses controls that can take a lot longer than normal
controls.
<title>RecoveryBanPeriod</title>
<para>Default: 300</para>
The duration in seconds for which a node is banned if the node
fails during recovery. After this time has elapsed the node will
automatically get unbanned and will attempt to rejoin the cluster.
A node usually gets banned due to real problems with the node.
Don't set this value too small. Otherwise, a problematic node
will try to rejoin the cluster too soon, causing unnecessary recoveries.
<title>RecoveryDropAllIPs</title>
<para>Default: 120</para>
If a node is stuck in recovery, or stopped, or banned, for this
many seconds, then ctdb will release all public addresses on
that node.
<title>RecoveryGracePeriod</title>
<para>Default: 120</para>
During recoveries, if a node has not caused recovery failures
during the last grace period in seconds, any records of
recovery failures caused by that node will be forgiven.
This resets the ban-counter back to zero for that node.
<title>RepackLimit</title>
<para>Default: 10000</para>
During vacuuming, if the number of freelist records is more than
<varname>RepackLimit</varname>, then the database is repacked
to get rid of the freelist records to avoid fragmentation.
Databases are repacked only if both <varname>RepackLimit</varname>
and <varname>VacuumLimit</varname> are exceeded.
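The repack condition can be written as a single predicate. This sketch only restates the rule on this page. The function name is hypothetical, and the two counts would come from a vacuuming run.

```python
REPACK_LIMIT = 10000  # freelist records
VACUUM_LIMIT = 5000   # deleted records


def should_repack(freelist_records, deleted_records,
                  repack_limit=REPACK_LIMIT, vacuum_limit=VACUUM_LIMIT):
    # Both limits must be exceeded; exceeding only one is not enough.
    return freelist_records > repack_limit and deleted_records > vacuum_limit


print(should_repack(12000, 6000))  # True
print(should_repack(12000, 100))   # False: VacuumLimit not exceeded
```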
<title>RerecoveryTimeout</title>
<para>Default: 10</para>
Once a recovery has completed, no additional recoveries are
permitted until this timeout in seconds has expired.
<title>SeqnumInterval</title>
<para>Default: 1000</para>
Some databases have seqnum tracking enabled, so that samba will
be able to detect asynchronously when there have been updates
to the database. Every time a database is updated its sequence
number is increased.
This tunable is used to specify in milliseconds how frequently
ctdb will send out updates to remote nodes to inform them that
the sequence number has been increased.
<title>StatHistoryInterval</title>
<para>Default: 1</para>
Granularity of the statistics collected in the statistics
history. This is reported by the 'ctdb stats' command.
<title>StickyDuration</title>
<para>Default: 600</para>
Once a record has been marked STICKY, this is the duration, in
seconds, for which the record remains flagged as a STICKY record.
<title>StickyPindown</title>
<para>Default: 200</para>
Once a STICKY record has been migrated onto a node, it will be
pinned down on that node for this number of milliseconds. Any
request from other nodes to migrate the record off the node will
be deferred.
<title>TakeoverTimeout</title>
<para>Default: 9</para>
This is the duration in seconds in which ctdb tries to complete IP
failover.
<title>TDBMutexEnabled</title>
<para>Default: 1</para>
This parameter enables the TDB_MUTEX_LOCKING feature on volatile
databases if robust mutexes are supported. This optimizes record
locking by using robust mutexes, which is much more efficient
than using POSIX locks.
<title>TickleUpdateInterval</title>
<para>Default: 20</para>
Every <varname>TickleUpdateInterval</varname> seconds, ctdb
synchronizes the client connection information across nodes.
<title>TraverseTimeout</title>
<para>Default: 20</para>
This is the duration in seconds for which a database traverse
is allowed to run. If the traverse does not complete during
this interval, ctdb will abort the traverse.
<title>VacuumFastPathCount</title>
<para>Default: 60</para>
During a vacuuming run, ctdb usually processes only the records
marked for deletion; this is called fast path vacuuming. After
finishing <varname>VacuumFastPathCount</varname> fast path
vacuuming runs, ctdb will trigger a scan of the complete database
for any empty records that need to be deleted.
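Here is a minimal sketch of the resulting schedule. It assumes a simple counter of completed fast path runs that is reset after each full scan, and that a full scan takes the place of a fast path run. The function name is hypothetical.

```python
VACUUM_FAST_PATH_COUNT = 60


def vacuum_schedule(runs, fast_path_count=VACUUM_FAST_PATH_COUNT):
    """Return the kind of vacuuming done on each of `runs` consecutive runs."""
    kinds = []
    fast_runs = 0
    for _ in range(runs):
        if fast_runs >= fast_path_count:
            kinds.append("full")  # scan the complete database
            fast_runs = 0
        else:
            kinds.append("fast")  # only records marked for deletion
            fast_runs += 1
    return kinds


schedule = vacuum_schedule(125)
print([i for i, kind in enumerate(schedule) if kind == "full"])  # [60, 121]
```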
<title>VacuumInterval</title>
<para>Default: 10</para>
Periodic interval in seconds when vacuuming is triggered for
volatile databases.
<title>VacuumLimit</title>
<para>Default: 5000</para>
During vacuuming, if the number of deleted records is more than
<varname>VacuumLimit</varname>, then databases are repacked to
avoid fragmentation.
Databases are repacked only if both <varname>RepackLimit</varname>
and <varname>VacuumLimit</varname> are exceeded.
<title>VacuumMaxRunTime</title>
<para>Default: 120</para>
The maximum time in seconds for which the vacuuming process is
allowed to run. If the vacuuming process takes longer than this
value, then it is terminated.
<title>VerboseMemoryNames</title>
<para>Default: 0</para>
When set to non-zero, ctdb assigns verbose names for some of
the talloc allocated memory objects. These names are visible
in the talloc memory report generated by 'ctdb dumpmemory'.
<title>SEE ALSO</title>
<citerefentry><refentrytitle>ctdb</refentrytitle>
<manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ctdbd</refentrytitle>
<manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
<manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ctdb</refentrytitle>
<manvolnum>7</manvolnum></citerefentry>,
<ulink url="http://ctdb.samba.org/"/>
This documentation was written by
<holder>Andrew Tridgell</holder>
<holder>Ronnie Sahlberg</holder>
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public
License along with this program; if not, see
<ulink url="http://www.gnu.org/licenses"/>.