doc/storage.texi

   1 @c Copyright (c) 1998 - 2001 Kungliga Tekniska Högskolan
   2 @c (Royal Institute of Technology, Stockholm, Sweden).
   3 @c All rights reserved.
   4
   5 @c $Id$
   6
   7 @node Organization of data, AFS and the real world, AFS infrastructure, Top
   8 @comment  node-name,  next,  previous,  up
   9
  10 @chapter Organization of data
  11
  12 This chapter describes how data is stored and how AFS differs from,
  13 for example, NFS. It also describes how data is kept consistent, what
  14 the requirements were, and how they influenced the design.
  15
  16 @menu
  17 * Requirements::
  18 * Data organization::
  19 * Callbacks::
  20 * Volume management::
  21 * Relationship between pts uid and unix uid::
  22 @end menu
  23
  24 @node Requirements, Data organization, Organization of data, Organization of data
  25 @comment  node-name,  next,  previous,  up
  26 @section Requirements
  27
  28 @itemize @bullet
  29 @item Scalability
  30
  31 It should be possible to use AFS with hundreds of thousands of users
  32 without problems.
  33
  34 Writes that are done to different parts of the filesystem should not
  35 affect each other. It should be possible to distribute the reads and
  36 writes over many fileservers. If you have a file that is accessed by
  37 many clients, it should be possible to distribute the load.
  38
  39 @comment What has this to do with requirements?
  40 @comment If there is multiple writes to the same file, are you sure that isn't a
  41 @comment database.
  42
  43 @item Transparent to users
  44
  45 Users should not need to know where their files are stored. It should be
  46 possible to move their files while they are using them.
  47
  48 @item Easy to administer
  49
  50 It should be easy for an administrator to make changes to the
  51 filesystem, for example to change the quota for a user or project. It
  52 should also be possible to move a user's data from one fileserver to a
  53 less loaded one, or to one with more disk space available.
  54
  55 Some benefits of using AFS are:
  56
  57 @itemize @bullet
  58 @item user-transparent data migration
  59 @item the ability to make on-line backups
  60 @item data replication that provides both load balancing and robustness of
  61 critical data
  62 @item a global name space without automounters and other add-ons
  63 @item @@sys variables for platform-independent paths to binaries
  64 @item enhanced security
  65 @item client-side caching
  66 @end itemize
  67 @end itemize
  68
  69 @section Anti-requirements
  70
  71 @itemize @bullet
  72 @item No databases
  73
  74 AFS isn't constructed for storing databases. It would be possible to use
  75 AFS for storing a database if a layer above it provided locking and
  76 synchronization of the data.
  77
  78 One of the problems is that AFS doesn't include mandatory byte-range
  79 locks. AFS uses advisory locking on whole files.
  80
  81 If you need a real database, use one; they are much more efficient at
  82 solving database problems. Don't use AFS.
  83
  84 @end itemize
  85
  86 @node Data organization, Callbacks, Requirements, Organization of data
  87 @comment  node-name,  next,  previous,  up
  88 @section Volume
  89
  90 A volume is a unit that is smaller than a partition. It is usually (or should
  91 be) a well-defined area, like a user's home directory, a project work
  92 area, or a program distribution.
  93
  94 Quota is controlled at the volume level. All day-to-day management is done
  95 on volumes.
  96
  97 @section Partition
  98
  99 In AFS a partition is what is normally called a partition. All partitions
 100 that AFS uses are named in a special way, @file{/vicepNN}, where NN
 101 ranges from a to z, continuing with aa to zz. The fileserver (and
 102 volser) automatically picks up all partitions starting with @file{/vicep}.
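
The naming scheme can be sketched in a few lines of Python. This is only
an illustration of the rule as stated here (a to z, then aa to zz); the
names @code{partition_names} and @code{is_afs_partition} are made up for
the example, and a real fileserver may accept fewer partitions than this
enumerates.

```python
import re
import string

# Assumption taken from the text: suffixes run a..z, then aa..zz.
# Any lower limit a real fileserver may impose is not modelled here.
VICEP_RE = re.compile(r"/vicep([a-z]{1,2})")


def partition_names():
    """Return /vicepa .. /vicepz followed by /vicepaa .. /vicepzz."""
    letters = string.ascii_lowercase
    singles = ["/vicep" + a for a in letters]
    doubles = ["/vicep" + a + b for a in letters for b in letters]
    return singles + doubles


def is_afs_partition(path):
    """True if path is named the way an AFS partition would be."""
    return VICEP_RE.fullmatch(path) is not None
```

A server-side scan would then amount to keeping only the mount points for
which @code{is_afs_partition} returns true.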
 103
 104 Volumes are stored in a partition. Volumes can't span several
 105 partitions. Partitions are added when the fileserver is created or when
 106 a new disk is added to a filesystem.
 107
 108 @section Volume cloning and read-only clones
 109
 110 A clone of a volume is often needed for volume operations. A clone is
 111 a copy-on-write copy of a volume; the clone is the read-only version.
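
To make the copy-on-write idea concrete, here is a toy model in Python.
It is not how the fileserver stores volumes; @code{ToyVolume} and its
methods are hypothetical names, and the model works on whole files purely
for simplicity.

```python
class ToyVolume:
    """Toy model of a volume: a mapping from file name to file contents."""

    def __init__(self, files=None):
        # Copy the name table only; the contents themselves are shared.
        self.files = dict(files or {})

    def clone(self):
        # The clone gets its own name table, but every entry still refers
        # to the same contents as the original until the original changes.
        return ToyVolume(self.files)

    def write(self, name, data):
        # Writing rebinds the entry in this volume only; a clone taken
        # earlier keeps referring to the old contents.
        self.files[name] = data


rw = ToyVolume({"README": b"version 1"})
ro = rw.clone()
rw.write("README", b"version 2")
```

After the write, @code{ro} still holds the old contents of
@file{README} while @code{rw} holds the new ones, which is the property
that makes a clone usable as a snapshot.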
 112
 113 Two special versions of a clone are the read-only volume and the backup
 114 volume. The read-only volume is a snapshot of a read-write volume (that
 115 is what a clone is) that can be replicated to several fileservers to
 116 distribute the load. Each fileserver plus partition where a read-only
 117 clone is located is called a replication-site. It usually does not make
 118 sense to have more than one read-only clone on each fileserver.
 119
 120 The backup volume is a clone that typically is made (with @code{vos
 121 backupsys}) each night to enable the user to retrieve yesterday's data
 122 when they happen to remove a file. This is a very useful feature, since
 123 it reduces how often system administrators must restore files from
 124 backup. The volume is usually mounted in the root of the user's home
 125 directory under the name OldFiles. A special feature of the backup
 126 volume is that you can't follow mountpoints out of a backup volume.
 127
 128 @section Mountpoints
 129
 130 Volumes are independent of each other. To glue together the file tree
 131 there are @samp{mountpoint}s. Mountpoints are really symlinks that are
 132 formatted in a special way so that they point to a volume and an
 133 optional cell. An AFS cache manager will show a mountpoint as a directory;
 134 in fact, it will be the root directory of the target volume.
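
As an illustration, the following Python sketch splits such a link target
into its parts. The exact format is not spelled out in this chapter; the
sketch assumes the commonly used convention of a leading @samp{#} (regular)
or @samp{%} (read-write), an optional @samp{cell:} prefix, the volume
name, and a trailing dot. The function name and the cell and volume names
in the usage note are hypothetical.

```python
def parse_mountpoint(target):
    """Split a mountpoint link target into (kind, cell, volume).

    Assumed format (not stated in this chapter): '#' or '%',
    an optional 'cell:' prefix, the volume name, and a trailing '.'.
    Returns None if the target does not look like a mountpoint.
    """
    if len(target) < 3 or target[0] not in "#%" or not target.endswith("."):
        return None
    kind = "regular" if target[0] == "#" else "read-write"
    body = target[1:-1]
    # Everything before the last ':' is the cell; the rest is the volume.
    cell, sep, volume = body.rpartition(":")
    if not volume:
        return None
    return kind, (cell if sep else None), volume
```

Under these assumptions, @code{parse_mountpoint("#root.cell.")} yields a
regular mountpoint with no cell, and
@code{parse_mountpoint("%example.org:user.lha.")} yields a read-write
mountpoint into the cell @samp{example.org}.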
 135
 136 @node Callbacks, Volume management, Data organization, Organization of data
 137 @comment  node-name,  next,  previous,  up
 138 @section Callbacks
 139
 140 Callbacks are messages that enable the AFS cache manager to keep using
 141 cached files without asking the server whether there is a newer version.
 142
 143 A callback is a promise from the fileserver that it will notify the
 144 client if the file (or directory) changes within the time limit of the
 145 callback.
 146
 147 For contents of read-only volumes there is only one callback per volume,
 148 called a volume callback, and it will be broken when the read-only volume
 149 is updated.
 150
 151 Callbacks last from 5 to 60 minutes, depending on how many users
 152 of the file exist.
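
From the client's point of view, the rule can be sketched as follows. This
is a simplified model, not the cache manager's actual data structure;
@code{CacheEntry} and its methods are hypothetical names, and the lifetime
is simply passed in by the caller.

```python
import time


class CacheEntry:
    """Cached file data plus the expiry time of its callback."""

    def __init__(self, data, callback_seconds, now=None):
        now = time.time() if now is None else now
        self.data = data
        self.expires = now + callback_seconds
        self.broken = False

    def break_callback(self):
        # Called when the fileserver reports that the file has changed.
        self.broken = True

    def usable(self, now=None):
        # The cached copy may be used without contacting the server only
        # while the callback is unbroken and has not yet expired.
        now = time.time() if now is None else now
        return not self.broken and now < self.expires
```

When @code{usable} returns false, the client has to go back to the
fileserver, either to fetch the new data or to obtain a fresh callback.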
 153
 154 @node Volume management, Relationship between pts uid and unix uid, Callbacks, Organization of data
 155 @comment  node-name,  next,  previous,  up
 156 @section Volume management
 157
 158 All volume management is done with the @code{vos} command. To get a list
 159 of all commands, use @code{vos help}. For help on a specific vos
 160 subcommand, use @code{vos subcommand -h}.
 161
 162 @itemize @bullet
 163 @item Create
 164
 165 @example
 166 vos create mim c HO.staff.lha.fluff -quota 400000
 167 @end example
 168
 169 @item Move
 170
 171 Volumes can be moved from one server to another, even when users are
 172 using the volume.
 173
 174 @item Replicate
 175
 176 Read-only volumes can be replicated over several servers. The sites are
 177 first added with @code{vos addsite}, and the volume is then replicated
 178 to those servers with @code{vos release}.
 179
 180 @item Release
 181
 182 Use @code{vos release} when you want to distribute the changes in the
 183 read-write volume to the read-only clones.
 184
 185 @item Remove
 186
 187 Volumes can be removed.
 188
 189 Note that you shouldn't remove the last read-only volume since this makes
 190 clients misbehave. If you are moving the volume, you should instead add a
 191 new RO to the new server and then remove the RO from the old server.
 192
 193 @item Backup and restoration of volumes
 194
 195 @code{vos backup} and @code{vos backupsys} create the backup volume.
 196
 197 To stream a volume out to a @file{file} or to @file{stdout}, use
 198 @code{vos dump}. The opposite command is @code{vos restore}.
 199
 200 @end itemize
 201
 202 @node Relationship between pts uid and unix uid, , Volume management, Organization of data
 203 @comment  node-name,  next,  previous,  up
 204 @section Relationship between pts uid and unix uid
 205
 206 @cindex pts
 207 @cindex uid
 208
 209 Files in AFS are created with the pts uid of the token that was valid at
 210 the time. Commands like @code{ls -l} then interpret the pts uid number
 211 as a unix uid and translate it into a username. If the pts and
 212 the unix uids differ, this might confuse the user, as it looks as if
 213 her files are owned by someone else. This is, however, not the case.
 214 Complications can occur if programs impose further access restrictions
 215 based on these wrongly interpreted uids instead of using the
 216 @code{access()} system call for that purpose. Graphical file browsers
 217 are typically prone to that problem, with the effect that users are
 218 not able to see their own files in these tools.
 217 are typically prone to that problem with the effect that the users are
 218 not able to see their own files in these tools.