share/man/man5/hammer.5

   1 .\"
   2 .\" Copyright (c) 2008
   3 .\"     The DragonFly Project.  All rights reserved.
   4 .\"
   5 .\" Redistribution and use in source and binary forms, with or without
   6 .\" modification, are permitted provided that the following conditions
   7 .\" are met:
   8 .\"
   9 .\" 1. Redistributions of source code must retain the above copyright
  10 .\"    notice, this list of conditions and the following disclaimer.
  11 .\" 2. Redistributions in binary form must reproduce the above copyright
  12 .\"    notice, this list of conditions and the following disclaimer in
  13 .\"    the documentation and/or other materials provided with the
  14 .\"    distribution.
  15 .\" 3. Neither the name of The DragonFly Project nor the names of its
  16 .\"    contributors may be used to endorse or promote products derived
  17 .\"    from this software without specific, prior written permission.
  18 .\"
  19 .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  20 .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  21 .\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  22 .\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
  23 .\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
  24 .\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
  25 .\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  26 .\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
  27 .\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  28 .\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
  29 .\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  30 .\" SUCH DAMAGE.
  31 .\"
  32 .Dd July 7, 2017
  33 .Dt HAMMER 5
  34 .Os
  35 .Sh NAME
  36 .Nm HAMMER
  37 .Nd HAMMER file system
  38 .Sh SYNOPSIS
  39 To compile this driver into the kernel,
  40 place the following line in your
  41 kernel configuration file:
  42 .Bd -ragged -offset indent
  43 .Cd "options HAMMER"
  44 .Ed
  45 .Pp
  46 Alternatively, to load the driver as a
  47 module at boot time, place the following line in
  48 .Xr loader.conf 5 :
  49 .Bd -literal -offset indent
  50 hammer_load="YES"
  51 .Ed
  52 .Pp
  53 To mount via
  54 .Xr fstab 5 :
  55 .Bd -literal -offset indent
  56 /dev/ad0s1d[:/dev/ad1s1d:...]   /mnt hammer rw 2 0
  57 .Ed
  58 .Sh DESCRIPTION
  59 The
  60 .Nm
  61 file system provides facilities to store file system data onto disk devices
  62 and is intended to replace
  63 .Xr ffs 5
  64 as the default file system for
  65 .Dx .
  66 .Pp
  67 Among its features are instant crash recovery,
  68 large file systems spanning multiple volumes,
  69 data integrity checking,
  70 data deduplication,
  71 fine grained history retention and snapshots,
  72 pseudo-filesystems (PFSs),
  73 mirroring capability and
  74 unlimited number of files and links.
  75 .Pp
  76 All functions related to managing
  77 .Nm
  78 file systems are provided by the
  79 .Xr newfs_hammer 8 ,
  80 .Xr mount_hammer 8 ,
  81 .Xr hammer 8 ,
  82 .Xr sysctl 8 ,
  83 .Xr chflags 1 ,
  84 and
  85 .Xr undo 1
  86 utilities.
  87 .Pp
  88 For a more detailed introduction refer to the paper and slides listed in the
  89 .Sx SEE ALSO
  90 section.
  91 For some common usages of
  92 .Nm
  93 see the
  94 .Sx EXAMPLES
  95 section below.
  96 .Pp
  97 Description of
  98 .Nm
  99 features:
 100 .Ss Instant Crash Recovery
 101 After a non-graceful system shutdown,
 102 .Nm
 103 file systems will be brought back into a fully coherent state
 104 when mounting the file system, usually within a few seconds.
 105 .Pp
 106 In the unlikely case
 107 .Nm
 108 mount fails due redo recovery (stage 2 recovery) being corrupted, a
 109 workaround to skip this stage can be applied by setting the following tunable:
 110 .Bd -literal -offset indent
 111 vfs.hammer.skip_redo=<value>
 112 .Ed
 113 .Pp
 114 Possible values are:
 115 .Bl -tag -width indent
 116 .It 0
 117 Run redo recovery normally and fail to mount in the case of error (default).
 118 .It 1
 119 Run redo recovery but continue mounting if an error appears.
 120 .It 2
 121 Completely bypass redo recovery.
 122 .El
 123 .Pp
 124 Related commands:
 125 .Xr mount_hammer 8
 126 .Ss Large File Systems & Multi Volume
 127 A
 128 .Nm
 129 file system can be up to 1 Exabyte in size.
 130 It can span up to 256 volumes,
 131 each volume occupies a
 132 .Dx
 133 disk slice or partition, or another special file,
 134 and can be up to 4096 TB in size.
 135 Minimum recommended
 136 .Nm
 137 file system size is 50 GB.
 138 For volumes over 2 TB in size
 139 .Xr gpt 8
 140 and
 141 .Xr disklabel64 8
 142 normally need to be used.
 143 .Pp
 144 Related
 145 .Xr hammer 8
 146 commands:
 147 .Cm volume-add ,
 148 .Cm volume-del ,
 149 .Cm volume-list ,
 150 .Cm volume-blkdevs ;
 151 see also
 152 .Xr newfs_hammer 8
 153 .Ss Data Integrity Checking
 154 .Nm
 155 has high focus on data integrity,
 156 CRC checks are made for all major structures and data.
 157 .Nm
 158 snapshots implements features to make data integrity checking easier:
 159 The atime and mtime fields are locked to the ctime
 160 for files accessed via a snapshot.
 161 The
 162 .Fa st_dev
 163 field is based on the PFS
 164 .Ar shared-uuid
 165 and not on any real device.
 166 This means that archiving the contents of a snapshot with e.g.\&
 167 .Xr tar 1
 168 and piping it to something like
 169 .Xr md5 1
 170 will yield a consistent result.
 171 The consistency is also retained on mirroring targets.
 172 .Ss Data Deduplication
 173 To save disk space data deduplication can be used.
 174 Data deduplication will identify data blocks which occur multiple times
 175 and only store one copy, multiple reference will be made to this copy.
 176 .Pp
 177 Related
 178 .Xr hammer 8
 179 commands:
 180 .Cm dedup ,
 181 .Cm dedup-simulate ,
 182 .Cm cleanup ,
 183 .Cm config
 184 .Ss Transaction IDs
 185 The
 186 .Nm
 187 file system uses 64-bit transaction ids to refer to historical
 188 file or directory data.
 189 Transaction ids used by
 190 .Nm
 191 are monotonically increasing over time.
 192 In other words:
 193 when a transaction is made,
 194 .Nm
 195 will always use higher transaction ids for following transactions.
 196 A transaction id is given in hexadecimal format
 197 .Li 0x016llx ,
 198 such as
 199 .Li 0x00000001061a8ba6 .
 200 .Pp
 201 Related
 202 .Xr hammer 8
 203 commands:
 204 .Cm snapshot ,
 205 .Cm snap ,
 206 .Cm snaplo ,
 207 .Cm snapq ,
 208 .Cm snapls ,
 209 .Cm synctid
 210 .Ss History & Snapshots
 211 History metadata on the media is written with every sync operation, so that
 212 by default the resolution of a file's history is 30-60 seconds until the next
 213 prune operation.
 214 Prior versions of files and directories are generally accessible by appending
 215 .Ql @@
 216 and a transaction id to the name.
 217 The common way of accessing history, however, is by taking snapshots.
 218 .Pp
 219 Snapshots are softlinks to prior versions of directories and their files.
 220 Their data will be retained across prune operations for as long as the
 221 softlink exists.
 222 Removing the softlink enables the file system to reclaim the space
 223 again upon the next prune & reblock operations.
 224 In
 225 .Nm
 226 Version 3+ snapshots are also maintained as file system meta-data.
 227 .Pp
 228 Related
 229 .Xr hammer 8
 230 commands:
 231 .Cm cleanup ,
 232 .Cm history ,
 233 .Cm snapshot ,
 234 .Cm snap ,
 235 .Cm snaplo ,
 236 .Cm snapq ,
 237 .Cm snaprm ,
 238 .Cm snapls ,
 239 .Cm config ,
 240 .Cm viconfig ;
 241 see also
 242 .Xr undo 1
 243 .Ss Pruning & Reblocking
 244 Pruning is the act of deleting file system history.
 245 By default only history used by the given snapshots
 246 and history from after the latest snapshot will be retained.
 247 By setting the per PFS parameter
 248 .Cm prune-min ,
 249 history is guaranteed to be saved at least this time interval.
 250 All other history is deleted.
 251 Reblocking will reorder all elements and thus defragment the file system and
 252 free space for reuse.
 253 After pruning a file system must be reblocked to recover all available space.
 254 Reblocking is needed even when using the
 255 .Cm nohistory
 256 .Xr mount_hammer 8
 257 option or
 258 .Xr chflags 1
 259 flag.
 260 .Pp
 261 Related
 262 .Xr hammer 8
 263 commands:
 264 .Cm cleanup ,
 265 .Cm snapshot ,
 266 .Cm prune ,
 267 .Cm prune-everything ,
 268 .Cm rebalance ,
 269 .Cm reblock ,
 270 .Cm reblock-btree ,
 271 .Cm reblock-inodes ,
 272 .Cm reblock-dirs ,
 273 .Cm reblock-data
 274 .Ss Pseudo-Filesystems (PFSs)
 275 A pseudo-filesystem, PFS for short, is a sub file system in a
 276 .Nm
 277 file system.
 278 All disk space in a
 279 .Nm
 280 file system is shared between all PFSs in it,
 281 so each PFS is free to use all remaining space.
 282 A
 283 .Nm
 284 file system supports up to 65536 PFSs.
 285 The root of a
 286 .Nm
 287 file system is PFS# 0, it is called the root PFS and is always a master PFS.
 288 .Pp
 289 A non-root PFS can be either master or slave.
 290 Slaves are always read-only,
 291 so they can't be updated by normal file operations, only by
 292 .Xr hammer 8
 293 operations like mirroring and pruning.
 294 Upgrading slaves to masters and downgrading masters to slaves are supported.
 295 .Pp
 296 It is recommended to use a
 297 .Nm null
 298 mount to access a PFS, except for root PFS;
 299 this way no tools are confused by the PFS root being a symlink
 300 and inodes not being unique across a
 301 .Nm
 302 file system.
 303 .Pp
 304 Many
 305 .Xr hammer 8
 306 operations operates per PFS,
 307 this includes mirroring, offline deduping, pruning, reblocking and rebalancing.
 308 .Pp
 309 Related
 310 .Xr hammer 8
 311 commands:
 312 .Cm pfs-master ,
 313 .Cm pfs-slave ,
 314 .Cm pfs-status ,
 315 .Cm pfs-update ,
 316 .Cm pfs-destroy ,
 317 .Cm pfs-upgrade ,
 318 .Cm pfs-downgrade ;
 319 see also
 320 .Xr mount_null 8
 321 .Ss Mirroring
 322 Mirroring is copying of all data in a file system, including snapshots
 323 and other historical data.
 324 In order to allow inode numbers to be duplicated on the slaves
 325 .Nm
 326 mirroring feature uses PFSs.
 327 A master or slave PFS can be mirrored to a slave PFS.
 328 I.e.\& for mirroring multiple slaves per master are supported,
 329 but multiple masters per slave are not.
 330 .Nm
 331 does not support multi-master clustering and mirroring.
 332 .Pp
 333 Related
 334 .Xr hammer 8
 335 commands:
 336 .Cm mirror-copy ,
 337 .Cm mirror-stream ,
 338 .Cm mirror-read ,
 339 .Cm mirror-read-stream ,
 340 .Cm mirror-write ,
 341 .Cm mirror-dump
 342 .Ss Fsync Flush Modes
 343 The
 344 .Nm
 345 file system implements several different
 346 .Fn fsync
 347 flush modes, the mode used is set via the
 348 .Va vfs.hammer.flush_mode
 349 sysctl, see
 350 .Xr hammer 8
 351 for details.
 352 .Ss Unlimited Number of Files and Links
 353 There is no limit on the number of files or links in a
 354 .Nm
 355 file system, apart from available disk space.
 356 .Ss NFS Export
 357 .Nm
 358 file systems support NFS export.
 359 NFS export of PFSs is done using
 360 .Nm null
 361 mounts (for file/directory in root PFS
 362 .Nm null
 363 mount is not needed).
 364 For example, to export the PFS
 365 .Pa /hammer/pfs/data ,
 366 create a
 367 .Nm null
 368 mount, e.g.\& to
 369 .Pa /hammer/data
 370 and export the latter path.
 371 .Pp
 372 Don't export a directory containing a PFS (e.g.\&
 373 .Pa /hammer/pfs
 374 above).
 375 Only
 376 .Nm null
 377 mount for PFS root
 378 (e.g.\&
 379 .Pa /hammer/data
 380 above) should be exported (subdirectory may be escaped if exported).
 381 .Ss File System Versions
 382 As new features have been introduced to
 383 .Nm
 384 a version number has been bumped.
 385 Each
 386 .Nm
 387 file system has a version, which can be upgraded to support new features.
 388 .Pp
 389 Related
 390 .Xr hammer 8
 391 commands:
 392 .Cm version ,
 393 .Cm version-upgrade ;
 394 see also
 395 .Xr newfs_hammer 8
 396 .Sh EXAMPLES
 397 .Ss Preparing the File System
 398 To create and mount a
 399 .Nm
 400 file system use the
 401 .Xr newfs_hammer 8
 402 and
 403 .Xr mount_hammer 8
 404 commands.
 405 Note that all
 406 .Nm
 407 file systems must have a unique name on a per-machine basis.
 408 .Bd -literal -offset indent
 409 newfs_hammer -L HOME /dev/ad0s1d
 410 mount_hammer /dev/ad0s1d /home
 411 .Ed
 412 .Pp
 413 Similarly, multi volume file systems can be created and mounted by
 414 specifying additional arguments.
 415 .Bd -literal -offset indent
 416 newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
 417 mount_hammer /dev/ad0s1d /dev/ad1s1d /home
 418 .Ed
 419 .Pp
 420 Once created and mounted,
 421 .Nm
 422 file systems need periodic clean up making snapshots, pruning and reblocking,
 423 in order to have access to history and file system not to fill up.
 424 For this it is recommended to use the
 425 .Xr hammer 8
 426 .Cm cleanup
 427 metacommand.
 428 .Pp
 429 By default,
 430 .Dx
 431 is set up to run
 432 .Nm hammer Cm cleanup
 433 nightly via
 434 .Xr periodic 8 .
 435 .Pp
 436 It is also possible to perform these operations individually via
 437 .Xr crontab 5 .
 438 For example, to reblock the
 439 .Pa /home
 440 file system every night at 2:15 for up to 5 minutes:
 441 .Bd -literal -offset indent
 442 15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
 443         >/dev/null 2>&1
 444 .Ed
 445 .Ss Snapshots
 446 The
 447 .Xr hammer 8
 448 utility's
 449 .Cm snapshot
 450 command provides several ways of taking snapshots.
 451 They all assume a directory where snapshots are kept.
 452 .Bd -literal -offset indent
 453 mkdir /snaps
 454 hammer snapshot /home /snaps/snap1
 455 (...after some changes in /home...)
 456 hammer snapshot /home /snaps/snap2
 457 .Ed
 458 .Pp
 459 The softlinks in
 460 .Pa /snaps
 461 point to the state of the
 462 .Pa /home
 463 directory at the time each snapshot was taken, and could now be used to copy
 464 the data somewhere else for backup purposes.
 465 .Pp
 466 By default,
 467 .Dx
 468 is set up to create nightly snapshots of all
 469 .Nm
 470 file systems via
 471 .Xr periodic 8
 472 and to keep them for 60 days.
 473 .Ss Pruning
 474 A snapshot directory is also the argument to the
 475 .Xr hammer 8
 476 .Cm prune
 477 command which frees historical data from the file system that is not
 478 pointed to by any snapshot link and is not from after the latest snapshot
 479 and is older than
 480 .Cm prune-min .
 481 .Bd -literal -offset indent
 482 rm /snaps/snap1
 483 hammer prune /snaps
 484 .Ed
 485 .Ss Mirroring
 486 Mirroring is set up using
 487 .Nm
 488 pseudo-filesystems (PFSs).
 489 To associate the slave with the master its shared UUID should be set to
 490 the master's shared UUID as output by the
 491 .Nm hammer Cm pfs-master
 492 command.
 493 .Bd -literal -offset indent
 494 hammer pfs-master /home/pfs/master
 495 hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
 496 .Ed
 497 .Pp
 498 The
 499 .Pa /home/pfs/slave
 500 link is unusable for as long as no mirroring operation has taken place.
 501 .Pp
 502 To mirror the master's data, either pipe a
 503 .Cm mirror-read
 504 command into a
 505 .Cm mirror-write
 506 or, as a short-cut, use the
 507 .Cm mirror-copy
 508 command (which works across a
 509 .Xr ssh 1
 510 connection as well).
 511 Initial mirroring operation has to be done to the PFS path (as
 512 .Xr mount_null 8
 513 can't access it yet).
 514 .Bd -literal -offset indent
 515 hammer mirror-copy /home/pfs/master /home/pfs/slave
 516 .Ed
 517 .Pp
 518 It is also possible to have the target PFS auto created
 519 by just issuing the same
 520 .Cm mirror-copy
 521 command, if the target PFS doesn't exist you will be prompted
 522 if you would like to create it.
 523 You can even omit the prompting by using the
 524 .Fl y
 525 flag:
 526 .Bd -literal -offset indent
 527 hammer -y mirror-copy /home/pfs/master /home/pfs/slave
 528 .Ed
 529 .Pp
 530 After this initial step
 531 .Nm null
 532 mount can be setup for
 533 .Pa /home/pfs/slave .
 534 Further operations can use
 535 .Nm null
 536 mounts.
 537 .Bd -literal -offset indent
 538 mount_null /home/pfs/master /home/master
 539 mount_null /home/pfs/slave /home/slave
 540
 541 hammer mirror-copy /home/master /home/slave
 542 .Ed
 543 .Ss NFS Export
 544 To NFS export from the
 545 .Nm
 546 file system
 547 .Pa /hammer
 548 the directory
 549 .Pa /hammer/non-pfs
 550 without PFSs, and the PFS
 551 .Pa /hammer/pfs/data ,
 552 the latter is
 553 .Nm null
 554 mounted to
 555 .Pa /hammer/data .
 556 .Pp
 557 Add to
 558 .Pa /etc/fstab
 559 (see
 560 .Xr fstab 5 ) :
 561 .Bd -literal -offset indent
 562 /hammer/pfs/data /hammer/data null rw
 563 .Ed
 564 .Pp
 565 Add to
 566 .Pa /etc/exports
 567 (see
 568 .Xr exports 5 ) :
 569 .Bd -literal -offset indent
 570 /hammer/non-pfs
 571 /hammer/data
 572 .Ed
 573 .Sh DIAGNOSTICS
 574 .Bl -diag
 575 .It "hammer: System has insuffient buffers to rebalance the tree.  nbuf < %d"
 576 Rebalancing a
 577 .Nm
 578 PFS uses quite a bit of memory and
 579 can't be done on low memory systems.
 580 It has been reported to fail on 512MB systems.
 581 Rebalancing isn't critical for
 582 .Nm
 583 file system operation;
 584 it is done by
 585 .Nm hammer
 586 .Cm rebalance ,
 587 often as part of
 588 .Nm hammer
 589 .Cm cleanup .
 590 .El
 591 .Sh SEE ALSO
 592 .Xr chflags 1 ,
 593 .Xr md5 1 ,
 594 .Xr tar 1 ,
 595 .Xr undo 1 ,
 596 .Xr exports 5 ,
 597 .Xr ffs 5 ,
 598 .Xr fstab 5 ,
 599 .Xr disklabel64 8 ,
 600 .Xr gpt 8 ,
 601 .Xr hammer 8 ,
 602 .Xr mount_hammer 8 ,
 603 .Xr mount_null 8 ,
 604 .Xr newfs_hammer 8 ,
 605 .Xr periodic 8 ,
 606 .Xr sysctl 8
 607 .Rs
 608 .%A Matthew Dillon
 609 .%D June 2008
 610 .%O http://www.dragonflybsd.org/hammer/hammer.pdf
 611 .%T "The HAMMER Filesystem"
 612 .Re
 613 .Rs
 614 .%A Matthew Dillon
 615 .%D October 2008
 616 .%O http://www.dragonflybsd.org/presentations/nycbsdcon08/
 617 .%T "Slideshow from NYCBSDCon 2008"
 618 .Re
 619 .Rs
 620 .%A Michael Neumann
 621 .%D January 2010
 622 .%O http://www.ntecs.de/talks/HAMMER.pdf
 623 .%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
 624 .Re
 625 .Sh FILESYSTEM PERFORMANCE
 626 The
 627 .Nm
 628 file system has a front-end which processes VNOPS and issues necessary
 629 block reads from disk, and a back-end which handles meta-data updates
 630 on-media and performs all meta-data write operations.
 631 Bulk file write operations are handled by the front-end.
 632 Because
 633 .Nm
 634 defers meta-data updates virtually no meta-data read operations will be
 635 issued by the frontend while writing large amounts of data to the file system
 636 or even when creating new files or directories, and even though the
 637 kernel prioritizes reads over writes the fact that writes are cached by
 638 the drive itself tends to lead to excessive priority given to writes.
 639 .Pp
 640 There are four bioq sysctls, shown below with default values,
 641 which can be adjusted to give reads a higher priority:
 642 .Bd -literal -offset indent
 643 kern.bioq_reorder_minor_bytes: 262144
 644 kern.bioq_reorder_burst_bytes: 3000000
 645 kern.bioq_reorder_minor_interval: 5
 646 kern.bioq_reorder_burst_interval: 60
 647 .Ed
 648 .Pp
 649 If a higher read priority is desired it is recommended that the
 650 .Va kern.bioq_reorder_minor_interval
 651 be increased to 15, 30, or even 60, and the
 652 .Va kern.bioq_reorder_burst_bytes
 653 be decreased to 262144 or 524288.
 654 .Sh HISTORY
 655 The
 656 .Nm
 657 file system first appeared in
 658 .Dx 1.11 .
 659 .Sh AUTHORS
 660 .An -nosplit
 661 The
 662 .Nm
 663 file system was designed and implemented by
 664 .An Matthew Dillon Aq Mt dillon@backplane.com ,
 665 data deduplication was added by
 666 .An Ilya Dryomov .
 667 This manual page was written by
 668 .An Sascha Wildner
 669 and updated by
 670 .An Thomas Nikolajsen .