1 <?xml version="1.0" encoding="iso-8859-1"?>
3 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
4 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
8 <refentrytitle>ctdb</refentrytitle>
9 <manvolnum>7</manvolnum>
10 <refmiscinfo class="source">ctdb</refmiscinfo>
11 <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
16 <refname>ctdb</refname>
17 <refpurpose>Clustered TDB</refpurpose>
21 <title>DESCRIPTION</title>
24 CTDB is a clustered database component in clustered Samba that
25 provides a high-availability load-sharing CIFS server cluster.
29 The main functions of CTDB are:
35 Provide a clustered version of the TDB database with automatic
36 rebuild/recovery of the databases upon node failures.
42 Monitor nodes in the cluster and services running on each node.
48 Manage a pool of public IP addresses that are used to provide
49 services to clients. Alternatively, CTDB can be used with
56 Combined with a cluster filesystem CTDB provides a full
57 high-availability (HA) environment for services such as clustered
58 Samba and NFS.
63 <title>ANATOMY OF A CTDB CLUSTER</title>
66 A CTDB cluster is a collection of nodes with 2 or more network
67 interfaces. All nodes provide network (usually file/NAS) services
68 to clients. Data served by file services is stored on shared
69 storage (usually a cluster filesystem) that is accessible by all
73 CTDB provides an "all active" cluster, where services are load
74 balanced across all nodes.
79 <title>Private vs Public addresses</title>
82 Each node in a CTDB cluster has multiple IP addresses assigned
88 A single private IP address that is used for communication
94 One or more public IP addresses that are used to provide
95 NAS or other services.
102 <title>Private address</title>
105 Each node is configured with a unique, permanently assigned
106 private address. This address is configured by the operating
107 system. This address uniquely identifies a physical node in
108 the cluster and is the address that CTDB daemons will use to
109 communicate with the CTDB daemons on other nodes.
112 Private addresses are listed in the file specified by the
113 <varname>CTDB_NODES</varname> configuration variable (see
114 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
115 <manvolnum>5</manvolnum></citerefentry>, default
116 <filename>/etc/ctdb/nodes</filename>). This file contains the
117 list of private addresses for all nodes in the cluster, one
118 per line. This file must be the same on all nodes in the
122 Private addresses should not be used by clients to connect to
123 services provided by the cluster.
126 It is strongly recommended that the private addresses are
127 configured on a private network that is separate from client
132 Example <filename>/etc/ctdb/nodes</filename> for a four node
135 <screen format="linespecific">
144 <title>Public addresses</title>
147 Public addresses are used to provide services to clients.
148 Public addresses are not configured at the operating system
149 level and are not permanently associated with a particular
150 node. Instead, they are managed by CTDB and are assigned to
151 interfaces on physical nodes at runtime.
154 The CTDB cluster will assign/reassign these public addresses
155 across the available healthy nodes in the cluster. When one
156 node fails, its public addresses will be taken over by one or
157 more other nodes in the cluster. This ensures that services
158 provided by all public addresses are always available to
159 clients, as long as there are nodes available capable of
160 hosting this address.
163 The public address configuration is stored in a file on each
164 node specified by the <varname>CTDB_PUBLIC_ADDRESSES</varname>
165 configuration variable (see
166 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
167 <manvolnum>5</manvolnum></citerefentry>, recommended
168 <filename>/etc/ctdb/public_addresses</filename>). This file
169 contains a list of the public addresses that the node is
170 capable of hosting, one per line. Each entry also contains
171 the netmask and the interface to which the address should be
176 Example <filename>/etc/ctdb/public_addresses</filename> for a
177 node that can host 4 public addresses, on 2 different
180 <screen format="linespecific">
188 In many cases the public addresses file will be the same on
189 all nodes. However, it is possible to use different public
190 address configurations on different nodes.
194 Example: 4 nodes partitioned into two subgroups:
196 <screen format="linespecific">
197 Node 0:/etc/ctdb/public_addresses
201 Node 1:/etc/ctdb/public_addresses
205 Node 2:/etc/ctdb/public_addresses
209 Node 3:/etc/ctdb/public_addresses
214 In this example nodes 0 and 1 host two public addresses on the
215 10.1.1.x network while nodes 2 and 3 host two public addresses
216 for the 10.1.2.x network.
219 Public address 10.1.1.1 can be hosted by either of nodes 0 or
220 1 and will be available to clients as long as at least one of
221 these two nodes is available.
224 If both nodes 0 and 1 become unavailable then public address
225 10.1.1.1 also becomes unavailable. 10.1.1.1 can not be failed
226 over to nodes 2 or 3 since these nodes do not have this public
230 The <command>ctdb ip</command> command can be used to view the
231 current assignment of public addresses to physical nodes.
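<para>
For example, the current assignment can be listed from any node as
shown below; the <command>all</command> argument, which includes
addresses configured only on other nodes, is available in most CTDB
versions:
</para>
<screen format="linespecific">
# Addresses known to this node and their current hosts
ctdb ip
# The same view, including addresses configured only on other nodes
ctdb ip all
</screen>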
238 <title>Node status</title>
241 The current status of each node in the cluster can be viewed by the
242 <command>ctdb status</command> command.
246 A node can be in one of the following states:
254 This node is healthy and fully functional. It hosts public
255 addresses to provide services.
261 <term>DISCONNECTED</term>
264 This node is not reachable by other nodes via the private
265 network. It is not currently participating in the cluster.
266 It <emphasis>does not</emphasis> host public addresses to
267 provide services. It might be shut down.
273 <term>DISABLED</term>
276 This node has been administratively disabled. This node is
277 partially functional and participates in the cluster.
278 However, it <emphasis>does not</emphasis> host public
279 addresses to provide services.
285 <term>UNHEALTHY</term>
288 A service provided by this node has failed a health check
289 and should be investigated. This node is partially
290 functional and participates in the cluster. However, it
291 <emphasis>does not</emphasis> host public addresses to
292 provide services. Unhealthy nodes should be investigated
293 and may require an administrative action to rectify.
302 CTDB is not behaving as designed on this node. For example,
303 it may have failed too many recovery attempts. Such nodes
304 are banned from participating in the cluster for a
305 configurable time period before they attempt to rejoin the
306 cluster. A banned node <emphasis>does not</emphasis> host
307 public addresses to provide services. All banned nodes
308 should be investigated and may require an administrative
318 This node has been administratively excluded from the
319 cluster. A stopped node does not participate in the cluster
320 and <emphasis>does not</emphasis> host public addresses to
321 provide services. This state can be used while performing
322 maintenance on a node.
328 <term>PARTIALLYONLINE</term>
331 A node that is partially online participates in a cluster
332 like a healthy (OK) node. Some interfaces to serve public
333 addresses are down, but at least one interface is up. See
334 also <command>ctdb ifaces</command>.
343 <title>CAPABILITIES</title>
346 Cluster nodes can have several different capabilities enabled.
347 These are listed below.
353 <term>RECMASTER</term>
356 Indicates that a node can become the CTDB cluster recovery
357 master. The current recovery master is decided via an
358 election held by all active nodes with this capability.
370 Indicates that a node can be the location master (LMASTER)
371 for database records. The LMASTER always knows which node
372 has the latest copy of a record in a volatile database.
384 Indicates that a node is configured in Linux Virtual Server
385 (LVS) mode. In this mode the entire CTDB cluster uses a
386 single public address, instead of using multiple public
387 addresses in failover mode. This is
388 an alternative to using a load-balancing layer-4 switch.
389 See the <citetitle>LVS</citetitle> section for more
399 Indicates that this node is configured to become the NAT
400 gateway master in a NAT gateway group. See the
401 <citetitle>NAT GATEWAY</citetitle> section for more
410 The RECMASTER and LMASTER capabilities can be disabled when CTDB
411 is used to create a cluster spanning across WAN links. In this
412 case CTDB acts as a WAN accelerator.
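<para>
For example, such a node is typically configured with the following
settings, which are described further in the
<citetitle>REMOTE CLUSTER NODES</citetitle> section below:
</para>
<screen format="linespecific">
CTDB_CAPABILITY_LMASTER=no
CTDB_CAPABILITY_RECMASTER=no
</screen>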
421 LVS is a mode where CTDB presents one single IP address for the
422 entire cluster. This is an alternative to using public IP
423 addresses and round-robin DNS to load balance clients across the
428 This is similar to using a layer-4 load-balancing switch but with
433 In this mode the cluster selects a set of nodes and load
434 balances all client access to the LVS address across this
435 set. This set consists of all LVS capable nodes that are
436 HEALTHY or, if no HEALTHY nodes exist, all LVS capable nodes
437 regardless of health status. LVS will, however, never load
438 balance traffic to nodes that are BANNED, STOPPED, DISABLED or
439 DISCONNECTED. The <command>ctdb lvs</command> command is used
440 to show which nodes are currently load-balanced across.
444 One of these nodes is elected as the LVSMASTER. This node
445 receives all traffic from clients coming in to the LVS address
446 and multiplexes it across the internal network to one of the
447 nodes that LVS is using. When responding to the client, that
448 node will send the data back directly to the client, bypassing
449 the LVSMASTER node. The command <command>ctdb
450 lvsmaster</command> will show which node is the current
455 The path used for a client I/O is:
459 Client sends request packet to LVSMASTER.
464 LVSMASTER passes the request on to one node across the
470 Selected node processes the request.
475 Node responds back to client.
482 This means that all incoming traffic to the cluster will pass
483 through one physical node, which limits scalability. You can
484 not send more data to the LVS address than that one physical
485 node can multiplex. Therefore, you should not use LVS if your
486 I/O pattern is write-intensive, since you will be limited by
487 the network bandwidth that node can handle. LVS works very
488 well for read-intensive workloads, where only smallish READ
489 requests pass through the LVSMASTER bottleneck and the
490 majority of the traffic volume (the data in the read replies)
491 goes straight from the processing node back to the clients. For
492 read-intensive I/O patterns you can achieve very high throughput rates.
497 Note: you can use LVS and public addresses at the same time.
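<para>
The LVS state described above can be inspected from any node, for
example:
</para>
<screen format="linespecific">
# Nodes currently participating in LVS load balancing
ctdb lvs
# Node currently acting as the LVSMASTER
ctdb lvsmaster
</screen>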
501 If you use LVS, you must have a permanent address configured for
502 the public interface on each node. This address must be routable
503 and the cluster nodes must be configured so that all traffic
504 back to client hosts is routed through this interface. This is
505 also required in order to allow samba/winbind on the node to
506 talk to the domain controller. This LVS IP address can not be
507 used to initiate outgoing traffic.
510 Make sure that the domain controller and the clients are
511 reachable from a node <emphasis>before</emphasis> you enable
512 LVS. Also ensure that outgoing traffic to these hosts is routed
513 out through the configured public interface.
517 <title>Configuration</title>
520 To activate LVS on a CTDB node you must specify the
521 <varname>CTDB_PUBLIC_INTERFACE</varname> and
522 <varname>CTDB_LVS_PUBLIC_IP</varname> configuration variables.
523 Setting the latter variable also enables the LVS capability on
529 <screen format="linespecific">
530 CTDB_PUBLIC_INTERFACE=eth1
531 CTDB_LVS_PUBLIC_IP=10.1.1.237
539 <title>NAT GATEWAY</title>
542 NAT gateway (NATGW) is an optional feature that is used to
543 configure fallback routing for nodes. This allows cluster nodes
544 to connect to external services (e.g. DNS, AD, NIS and LDAP)
545 when they do not host any public addresses (e.g. when they are
549 This also applies to node startup because CTDB marks nodes as
550 UNHEALTHY until they have passed a "monitor" event. In this
551 context, NAT gateway helps to avoid a "chicken and egg"
552 situation where a node needs to access an external service to
556 Another way of solving this type of problem is to assign an
557 extra static IP address to a public interface on every node.
558 This is simpler but it uses an extra IP address per node, while
559 NAT gateway generally uses only one extra IP address.
563 <title>Operation</title>
566 One extra NATGW public address is assigned on the public
567 network to each NATGW group. Each NATGW group is a set of
568 nodes in the cluster that shares the same NATGW address to
569 talk to the outside world. Normally there would only be one
570 NATGW group spanning an entire cluster, but in situations
571 where one CTDB cluster spans multiple physical sites it might
572 be useful to have one NATGW group for each site.
575 There can be multiple NATGW groups in a cluster but each node
576 can only be a member of one NATGW group.
579 In each NATGW group, one of the nodes is selected by CTDB to
580 be the NATGW master and the other nodes are considered to be
581 NATGW slaves. NATGW slaves establish a fallback default route
582 to the NATGW master via the private network. When a NATGW
583 slave hosts no public IP addresses then it will use this route
584 for outbound connections. The NATGW master hosts the NATGW
585 public IP address and routes outgoing connections from
586 slave nodes via this IP address. It also establishes a
587 fallback default route.
592 <title>Configuration</title>
595 NATGW is usually configured similarly to the following example:
597 <screen format="linespecific">
598 CTDB_NATGW_NODES=/etc/ctdb/natgw_nodes
599 CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
600 CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
601 CTDB_NATGW_PUBLIC_IFACE=eth0
602 CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
606 Normally any node in a NATGW group can act as the NATGW
607 master. Some configurations may have special nodes that lack
608 connectivity to a public network. In such cases,
609 <varname>CTDB_NATGW_SLAVE_ONLY</varname> can be used to limit the
610 NATGW functionality of those nodes.
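<para>
For example, such a node would add the following to the
configuration shown above (the value "yes" is an assumption; see the
reference below for the exact syntax):
</para>
<screen format="linespecific">
CTDB_NATGW_SLAVE_ONLY=yes
</screen>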
614 See the <citetitle>NAT GATEWAY</citetitle> section in
615 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
616 <manvolnum>5</manvolnum></citerefentry> for more details of
623 <title>Implementation details</title>
626 When the NATGW functionality is used, one of the nodes is
627 selected to act as a NAT gateway for all the other nodes in
628 the group when they need to communicate with the external
629 services. The NATGW master is selected to be a node that is
630 most likely to have usable networks.
634 The NATGW master hosts the NATGW public IP address
635 <varname>CTDB_NATGW_PUBLIC_IP</varname> on the configured public
636 interface <varname>CTDB_NATGW_PUBLIC_IFACE</varname> and acts as
637 a router, masquerading outgoing connections from slave nodes
638 via this IP address. If
639 <varname>CTDB_NATGW_DEFAULT_GATEWAY</varname> is set then it
640 also establishes a fallback default route to this configured
641 gateway with a metric of 10. A metric 10 route is used
642 so it can co-exist with other default routes that may be
647 A NATGW slave establishes its fallback default route to the
648 NATGW master via the private network
649 <varname>CTDB_NATGW_PRIVATE_NETWORK</varname> with a metric of 10.
650 This route is used for outbound connections when no other
651 default route is available because the node hosts no public
652 addresses. A metric 10 route is used so that it can co-exist
653 with other default routes that may be available when the node
654 is hosting public addresses.
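<para>
Conceptually, the fallback route established on a slave is
equivalent to the following command, where 192.168.1.1 is assumed to
be the NATGW master's private address in the example configuration
above:
</para>
<screen format="linespecific">
ip route add default via 192.168.1.1 metric 10
</screen>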
658 <varname>CTDB_NATGW_STATIC_ROUTES</varname> can be used to
659 have NATGW create more specific routes instead of just default
664 This is implemented in the <filename>11.natgw</filename>
665 eventscript. Please see the eventscript file and the
666 <citetitle>NAT GATEWAY</citetitle> section in
667 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
668 <manvolnum>5</manvolnum></citerefentry> for more details.
675 <title>POLICY ROUTING</title>
678 Policy routing is an optional CTDB feature to support complex
679 network topologies. Public addresses may be spread across
680 several different networks (or VLANs) and it may not be possible
681 to route packets from these public addresses via the system's
682 default route. Therefore, CTDB has support for policy routing
683 via the <filename>13.per_ip_routing</filename> eventscript.
684 This allows routing to be specified for packets sourced from
685 each public address. The routes are added and removed as CTDB
686 moves public addresses between nodes.
690 <title>Configuration variables</title>
693 There are 4 configuration variables related to policy routing:
694 <varname>CTDB_PER_IP_ROUTING_CONF</varname>,
695 <varname>CTDB_PER_IP_ROUTING_RULE_PREF</varname>,
696 <varname>CTDB_PER_IP_ROUTING_TABLE_ID_LOW</varname>,
697 <varname>CTDB_PER_IP_ROUTING_TABLE_ID_HIGH</varname>. See the
698 <citetitle>POLICY ROUTING</citetitle> section in
699 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
700 <manvolnum>5</manvolnum></citerefentry> for more details.
705 <title>Configuration</title>
708 The format of each line of
709 <varname>CTDB_PER_IP_ROUTING_CONF</varname> is:
713 <public_address> <network> [ <gateway> ]
717 Leading whitespace is ignored and arbitrary whitespace may be
718 used as a separator. Lines that have a "public address" item
719 that doesn't match an actual public address are ignored. This
720 means that comment lines can be added using a leading
721 character such as '#', since this will never match an IP
726 A line without a gateway indicates a link local route.
730 For example, consider the configuration line:
734 192.168.1.99 192.168.1.1/24
738 If the corresponding public_addresses line is:
742 192.168.1.99/24 eth2,eth3
746 <varname>CTDB_PER_IP_ROUTING_RULE_PREF</varname> is 100, and
747 CTDB adds the address to eth2 then the following routing
748 information is added:
752 ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
753 ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99
757 This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go via eth2.
762 The <command>ip rule</command> command will show (something
763 like - depending on other public addresses and other routes on
768 0: from all lookup local
769 100: from 192.168.1.99 lookup ctdb.192.168.1.99
770 32766: from all lookup main
771 32767: from all lookup default
775 <command>ip route show table ctdb.192.168.1.99</command> will show:
779 192.168.1.0/24 dev eth2 scope link
783 The usual use for a line containing a gateway is to add a
784 default route corresponding to a particular source address.
785 Consider this line of configuration:
789 192.168.1.99 0.0.0.0/0 192.168.1.1
793 In the situation described above this will cause an extra
794 routing command to be executed:
798 ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99
802 With both configuration lines, <command>ip route show table
803 ctdb.192.168.1.99</command> will show:
807 192.168.1.0/24 dev eth2 scope link
808 default via 192.168.1.1 dev eth2
813 <title>Sample configuration</title>
816 Here is a more complete example configuration.
820 /etc/ctdb/public_addresses:
822 192.168.1.98 eth2,eth3
823 192.168.1.99 eth2,eth3
825 /etc/ctdb/policy_routing:
827 192.168.1.98 192.168.1.0/24
828 192.168.1.98 192.168.200.0/24 192.168.1.254
829 192.168.1.98 0.0.0.0/0 192.168.1.1
830 192.168.1.99 192.168.1.0/24
831 192.168.1.99 192.168.200.0/24 192.168.1.254
832 192.168.1.99 0.0.0.0/0 192.168.1.1
836 This routes local packets as expected; the default route is as
837 previously discussed, but packets to 192.168.200.0/24 are
838 routed via the alternate gateway 192.168.1.254.
845 <title>NOTIFICATION SCRIPT</title>
848 When certain state changes occur in CTDB, it can be configured
849 to perform arbitrary actions via a notification script. For
850 example, sending SNMP traps or emails when a node becomes
851 unhealthy or similar.
854 This is activated by setting the
855 <varname>CTDB_NOTIFY_SCRIPT</varname> configuration variable.
856 The specified script must be executable.
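<para>
For example:
</para>
<screen format="linespecific">
CTDB_NOTIFY_SCRIPT=/etc/ctdb/notify.sh
</screen>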
859 Use of the provided <filename>/etc/ctdb/notify.sh</filename>
860 script is recommended. It executes files in
861 <filename>/etc/ctdb/notify.d/</filename>.
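<para>
As a sketch, a hook dropped into that directory might look like the
following. It assumes the event name is passed as the first
argument; the log tag is purely illustrative.
</para>
<screen format="linespecific">
#!/bin/sh
# Hypothetical notification hook: log interesting state changes.
event="$1"
case "$event" in
healthy|unhealthy)
    logger -t ctdb-notify "node is now $event"
    ;;
esac
exit 0
</screen>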
864 CTDB currently generates notifications after CTDB changes to
869 <member>init</member>
870 <member>setup</member>
871 <member>startup</member>
872 <member>healthy</member>
873 <member>unhealthy</member>
879 <title>DEBUG LEVELS</title>
882 Valid values for DEBUGLEVEL are:
886 <member>EMERG (-3)</member>
887 <member>ALERT (-2)</member>
888 <member>CRIT (-1)</member>
889 <member>ERR (0)</member>
890 <member>WARNING (1)</member>
891 <member>NOTICE (2)</member>
892 <member>INFO (3)</member>
893 <member>DEBUG (4)</member>
899 <title>REMOTE CLUSTER NODES</title>
901 It is possible to have a CTDB cluster that spans across a WAN link.
902 For example, you might have a CTDB cluster in your datacentre but also
903 want one additional CTDB node located at a remote branch site.
904 This is similar to how a WAN accelerator works, but with the difference
905 that while a WAN accelerator often acts as a proxy or a MitM, in
906 the ctdb remote cluster node configuration the Samba instance at the remote site
907 IS the genuine server, not a proxy and not a MitM, and thus provides 100%
908 correct CIFS semantics to clients.
912 Think of the cluster as one single multihomed Samba server where one of
913 the NICs (the remote node) is very far away.
917 NOTE: This requires that the cluster filesystem you use can cope
918 with WAN-link latencies. Not all cluster filesystems can handle
919 WAN-link latencies! Whether this provides good WAN-accelerator
920 performance or performs very poorly depends entirely on how
921 well your cluster filesystem handles high latency for data and
922 metadata operations.
926 To configure a node as a remote cluster node you need to set
927 the following two parameters in /etc/sysconfig/ctdb for the remote node:
928 <screen format="linespecific">
929 CTDB_CAPABILITY_LMASTER=no
930 CTDB_CAPABILITY_RECMASTER=no
935 Verify with the command "ctdb getcapabilities" that the node no longer
936 has the recmaster or the lmaster capabilities.
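<para>
For example, if the remote node is node 3 (a hypothetical node
number), the check can be run from any cluster node with:
</para>
<screen format="linespecific">
onnode 3 ctdb getcapabilities
</screen>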
943 <title>SEE ALSO</title>
946 <citerefentry><refentrytitle>ctdb</refentrytitle>
947 <manvolnum>1</manvolnum></citerefentry>,
949 <citerefentry><refentrytitle>ctdbd</refentrytitle>
950 <manvolnum>1</manvolnum></citerefentry>,
952 <citerefentry><refentrytitle>ctdbd_wrapper</refentrytitle>
953 <manvolnum>1</manvolnum></citerefentry>,
955 <citerefentry><refentrytitle>ltdbtool</refentrytitle>
956 <manvolnum>1</manvolnum></citerefentry>,
958 <citerefentry><refentrytitle>onnode</refentrytitle>
959 <manvolnum>1</manvolnum></citerefentry>,
961 <citerefentry><refentrytitle>ping_pong</refentrytitle>
962 <manvolnum>1</manvolnum></citerefentry>,
964 <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
965 <manvolnum>5</manvolnum></citerefentry>,
967 <citerefentry><refentrytitle>ctdb-tunables</refentrytitle>
968 <manvolnum>7</manvolnum></citerefentry>,
970 <ulink url="http://ctdb.samba.org/"/>
977 This documentation was written by
986 <holder>Andrew Tridgell</holder>
987 <holder>Ronnie Sahlberg</holder>
991 This program is free software; you can redistribute it and/or
992 modify it under the terms of the GNU General Public License as
993 published by the Free Software Foundation; either version 3 of
994 the License, or (at your option) any later version.
997 This program is distributed in the hope that it will be
998 useful, but WITHOUT ANY WARRANTY; without even the implied
999 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
1000 PURPOSE. See the GNU General Public License for more details.
1003 You should have received a copy of the GNU General Public
1004 License along with this program; if not, see
1005 <ulink url="http://www.gnu.org/licenses"/>.