ctdb/NEWS

   1 Changes in CTDB 2.5.1
   2 =====================
   3
   4 Important bug fixes
   5 -------------------
   6
   7 * The locking code now correctly implements a per-database active
   8   locks limit.  Whole database lock requests can no longer be denied
   9   because there are too many active locks - this is particularly
  10   important for freezing databases during recovery.
  11
  12 * The debug_locks.sh script locks against itself.  If it is already
  13   running then subsequent invocations will exit immediately.
  14
  15 * ctdb tool commands that operate on databases now work correctly when
  16   a database ID is given.
  17
  18 * Various code fixes for issues found by Coverity.
  19
  20 Important internal changes
  21 --------------------------
  22
  23 * statd-callout has been updated so that statd client information is
  24   always up-to-date across the cluster.  This is implemented by
  25   storing the client information in a persistent database using a new
  26   "ctdb ptrans" command.
  27
  28 * The transaction code for persistent databases now retries until it
  29   is able to take the transaction lock.  This makes the transation
  30   semantics compatible with Samba's implementation.
  31
  32 * Locking helpers are created with vfork(2) instead of fork(2),
  33   providing a performance improvement.
  34
  35 * config.guess has been updated to the latest upstream version so CTDB
  36   should build on more platforms.
  37
  38
  39 Changes in CTDB 2.5
  40 ===================
  41
  42 User-visible changes
  43 --------------------
  44
  45 * The default location of the ctdbd socket is now:
  46
  47     /var/run/ctdb/ctdbd.socket
  48
  49   If you currently set CTDB_SOCKET in configuration then unsetting it
  50   will probably do what you want.
  51
  52 * The default location of CTDB TDB databases is now:
  53
  54     /var/lib/ctdb
  55
  56   If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  57   you probably want to move your databases to /var/lib/ctdb, drop your
  58   setting of CTDB_DBDIR and just use the default.
  59
  60   To maintain the database files in /var/ctdb you will need to set
  61   CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  62   these have moved.
  63
  64 * Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  65   supported.  Please use individual configuration variables instead.
  66
  67 * Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  68   VacuumMaxInterval have been removed.  Setting them had no effect but
  69   if you now try to set them in a configuration files via CTDB_SET_X=Y
  70   then CTDB will not start.
  71
  72 * Much improved manual pages.  Added new manpages ctdb(7),
  73   ctdbd.conf(5), ctdb-tunables(7).  Still some work to do.
  74
  75 * Most CTDB-specific configuration can now be set in
  76   /etc/ctdb/ctdbd.conf.
  77
  78   This avoids cluttering distribution-specific configuration files,
  79   such as /etc/sysconfig/ctdb.  It also means that we can say: see
  80   ctdbd.conf(5) for more details.  :-)
  81
  82 * Configuration variable NFS_SERVER_MODE is deprecated and has been
  83   replaced by CTDB_NFS_SERVER_MODE.  See ctdbd.conf(5) for more
  84   details.
  85
  86 * "ctdb reloadips" is much improved and should be used for reloading
  87   the public IP configuration.
  88
  89   This commands attempts to yield much more predictable IP allocations
  90   than using sequences of delip and addip commands.  See ctdb(1) for
  91   details.
  92
  93 * Ability to pass comma-separated string to ctdb(1) tool commands via
  94   the -n option is now documented and works for most commands.  See
  95   ctdb(1) for details.
  96
  97 * "ctdb rebalancenode" is now a debugging command and should not be
  98   used in normal operation.  See ctdb(1) for details.
  99
 100 * "ctdb ban 0" is now invalid.
 101
 102   This was documented as causing a permanent ban.  However, this was
 103   not implemented and caused an "unban" instead.  To avoid confusion,
 104   0 is now an invalid ban duration.  To administratively "ban" a node
 105   use "ctdb stop" instead.
 106
 107 * The systemd configuration now puts the PID file in /run/ctdb (rather
 108   than /run/ctdbd) for consistency with the initscript and other uses
 109   of /var/run/ctdb.
 110
 111 Important bug fixes
 112 -------------------
 113
 114 * Traverse regression fixed.
 115
 116 * The default recovery method for persistent databases has been
 117   changed to use database sequence numbers instead of doing
 118   record-by-record recovery (using record sequence numbers).  This
 119   fixes issues including registry corruption.
 120
 121 * Banned nodes are no longer told to run the "ipreallocated" event
 122   during a takeover run, when in fallback mode with nodes that don't
 123   support the IPREALLOCATED control.
 124
 125 Important internal changes
 126 --------------------------
 127
 128 * Persistent transactions are now compatible with Samba and work
 129   reliably.
 130
 131 * The recovery master role has been made more stable by resetting the
 132   priority time each time a node becomes inactive.  This means that
 133   nodes that are active for a long time are more likely to retain the
 134   recovery master role.
 135
 136 * The incomplete libctdb library has been removed.
 137
 138 * Test suite now starts ctdbd with the --sloppy-start option to speed
 139   up startup.  However, this should not be done in production.
 140
 141
 142 Changes in CTDB 2.4
 143 ===================
 144
 145 User-visible changes
 146 --------------------
 147
 148 * A missing network interface now causes monitoring to fail and the
 149   node to become unhealthy.
 150
 151 * Changed ctdb command's default control timeout from 3s to 10s.
 152
 153 * debug-hung-script.sh now includes the output of "ctdb scriptstatus"
 154   to provide more information.
 155
 156 Important bug fixes
 157 -------------------
 158
 159 * Starting CTDB daemon by running ctdbd directly should not remove
 160   existing unix socket unconditionally.
 161
 162 * ctdbd once again successfully kills client processes on releasing
 163   public IPs.  It was checking for them as tracked child processes
 164   and not finding them, so wasn't killing them.
 165
 166 * ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
 167   ctdbd (such as uses of ctdb in eventscripts) use the correct socket.
 168
 169 * Always use Jenkins hash when creating volatile databases.  There
 170   were a few places where TDBs would be attached with the wrong flags.
 171
 172 * Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
 173   which led to header corruption for empty records.  This resulted
 174   in inconsistent headers on two nodes and a request for such a record
 175   keeps bouncing between nodes indefinitely and logs "High hopcount"
 176   messages in the log. This also caused performance degradation.
 177
 178 * ctdbd was losing log messages at shutdown because they weren't being
 179   given time to flush.  ctdbd now sleeps for a second during shutdown
 180   to allow time to flush log messages.
 181
 182 * Improved socket handling introduced in CTDB 2.2 caused ctdbd to
 183   process a large number of packets available on single FD before
 184   polling other FDs.  Use fixed size queue buffers to allow fair
 185   scheduling across multiple FDs.
 186
 187 Important internal changes
 188 --------------------------
 189
 190 * A node that fails to take/release multiple IPs will only incur a
 191   single banning credit.  This makes a brief failure less likely to
 192   cause node to be banned.
 193
 194 * ctdb killtcp has been changed to read connections from stdin and
 195   10.interface now uses this feature to improve the time taken to kill
 196   connections.
 197
 198 * Improvements to hot records statistics in ctdb dbstatistics.
 199
 200 * Recovery daemon now assembles up-to-date node flags information
 201   from remote nodes before checking if any flags are inconsistent and
 202   forcing a recovery.
 203
 204 * ctdbd no longer creates multiple lock sub-processes for the same
 205   key.  This reduces the number of lock sub-processes substantially.
 206
 207 * Changed the nfsd RPC check failure policy to failover quickly
 208   instead of trying to repair a node first by restarting NFS.  Such
 209   restarts would often hang if the cause of the RPC check failure was
 210   the cluster filesystem or storage.
 211
 212 * Logging improvements relating to high hopcounts and sticky records.
 213
 214 * Make sure lower level tdb messages are logged correctly.
 215
 216 * CTDB commands disable/enable/stop/continue are now resilient to
 217   individual control failures and retry in case of failures.
 218
 219
 220 Changes in CTDB 2.3
 221 ===================
 222
 223 User-visible changes
 224 --------------------
 225
 226 * 2 new configuration variables for 60.nfs eventscript:
 227
 228   - CTDB_MONITOR_NFS_THREAD_COUNT
 229   - CTDB_NFS_DUMP_STUCK_THREADS
 230
 231   See ctdb.sysconfig for details.
 232
 233 * Removed DeadlockTimeout tunable.  To enable debug of locking issues set
 234
 235    CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh
 236
 237 * In overall statistics and database statistics, lock buckets have been
 238   updated to use following timings:
 239
 240    < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s
 241
 242 * Initscript is now simplified with most CTDB-specific functionality
 243   split out to ctdbd_wrapper, which is used to start and stop ctdbd.
 244
 245 * Add systemd support.
 246
 247 * CTDB subprocesses are now given informative names to allow them to
 248   be easily distinguished when using programs like "top" or "perf".
 249
 250 Important bug fixes
 251 -------------------
 252
 253 * ctdb tool should not exit from a retry loop if a control times out
 254   (e.g. under high load).  This simple fix will stop an exit from the
 255   retry loop on any error.
 256
 257 * When updating flags on all nodes, use the correct updated flags.  This
 258   should avoid wrong flag change messages in the logs.
 259
 260 * The recovery daemon will not ban other nodes if the current node
 261   is banned.
 262
 263 * ctdb dbstatistics command now correctly outputs database statistics.
 264
 265 * Fixed a panic with overlapping shutdowns (regression in 2.2).
 266
 267 * Fixed 60.ganesha "monitor" event (regression in 2.2).
 268
 269 * Fixed a buffer overflow in the "reloadips" implementation.
 270
 271 * Fixed segmentation faults in ping_pong (called with incorrect
 272   argument) and test binaries (called when ctdbd not running).
 273
 274 Important internal changes
 275 --------------------------
 276
 277 * The recovery daemon on stopped or banned node will stop participating in any
 278   cluster activity.
 279
 280 * Improve cluster wide database traverse by sending the records directly from
 281   traverse child process to requesting node.
 282
 283 * TDB checking and dropping of all IPs moved from initscript to "init"
 284   event in 00.ctdb.
 285
 286 * To avoid "rogue IPs" the release IP callback now fails if the
 287   released IP is still present on an interface.
 288
 289
 290 Changes in CTDB 2.2
 291 ===================
 292
 293 User-visible changes
 294 --------------------
 295
 296 * The "stopped" event has been removed.
 297
 298   The "ipreallocated" event is now run when a node is stopped.  Use
 299   this instead of "stopped".
 300
 301 * New --pidfile option for ctdbd, used by initscript
 302
 303 * The 60.nfs eventscript now uses configuration files in
 304   /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
 305   hardcoding them into the script.
 306
 307 * Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.
 308
 309 * The NoIPTakeoverOnDisabled tunable has been renamed to
 310   NoIPHostOnAllDisabled and now works properly when set on individual
 311   nodes.
 312
 313 * New ctdb subcommand "runstate" prints the current internal runstate.
 314   Runstates are used for serialising startup.
 315
 316 Important bug fixes
 317 -------------------
 318
 319 * The Unix domain socket is now set to non-blocking after the
 320   connection succeeds.  This avoids connections failing with EAGAIN
 321   and not being retried.
 322
 323 * Fetching from the log ringbuffer now succeeds if the buffer is full.
 324
 325 * Fix a severe recovery bug that can lead to data corruption for SMB clients.
 326
 327 * The statd-callout script now runs as root via sudo.
 328
 329 * "ctdb delip" no longer fails if it is unable to move the IP.
 330
 331 * A race in the ctdb tool's ipreallocate code was fixed.  This fixes
 332   potential bugs in the "disable", "enable", "stop", "continue",
 333   "ban", "unban", "ipreallocate" and "sync" commands.
 334
 335 * The monitor cancellation code could sometimes hang indefinitely.
 336   This could cause "ctdb stop" and "ctdb shutdown" to fail.
 337
 338 Important internal changes
 339 --------------------------
 340
 341 * The socket I/O handling has been optimised to improve performance.
 342
 343 * IPs will not be assigned to nodes during CTDB initialisation.  They
 344   will only be assigned to nodes that are in the "running" runstate.
 345
 346 * Improved database locking code.  One improvement is to use a
 347   standalone locking helper executable - the avoids creating many
 348   forked copies of ctdbd and potentially running a node out of memory.
 349
 350 * New control CTDB_CONTROL_IPREALLOCATED is now used to generate
 351   "ipreallocated" events.
 352
 353 * Message handlers are now indexed, providing a significant
 354   performance improvement.