docs/Samba3-HOWTO/TOSHARG-HighAvailability.xml

   1 <?xml version="1.0" encoding="iso-8859-1"?>
   2 <!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
   3 <chapter id="SambaHA">
   4 <chapterinfo>
   5         &author.jht;
   6         &author.jeremy;
   7 </chapterinfo>
   8
   9 <title>High Availability</title>
  10
  11 <sect1>
  12 <title>Features and Benefits</title>
  13
  14 <para>
  15 Network administrators are often concerned about the availability of file and print
  16 services. Network users are inclined toward intolerance of the services they depend
  17 on to perform vital task responsibilities.
  18 </para>
  19
  20 <para>
  21 A sign in a computer room served to remind staff of their responsibilities. It read:
  22 </para>
  23
  24 <blockquote>
  25 <para>
  26 All humans fail, in both great and small ways we fail continually. Machines fail too.
  27 Computers are machines that are managed by humans, the fallout from failure
  28 can be spectacular. Your responsibility is to deal with failure, to anticipate it
  29 and to eliminate it as far as is humanly and economically wise to achieve.
  30 Are your actions part of the problem or part of the solution?
  31 </para>
  32 </blockquote>
  33
  34 <para>
  35 If we are to deal with failure in a planned and productive manner, then first we must
  36 understand the problem. That is the purpose of this chapter.
  37 </para>
  38
  39 <para>
  40 Parenthetically, in the following discussion there are seeds of information on how to
  41 provision a network infrastructure against failure. Our purpose here is not to provide
  42 a lengthy dissertation on the subject of high availability. Additionally, we have made
  43 a conscious decision to not provide detailed working examples of high availability
  44 solutions; instead we present an overview of the issues in the hope that someone will
  45 rise to the challenge of providing a detailed document that is focused purely on
  46 presentation of the current state of knowledge and practice in high availability as it
  47 applies to the deployment of Samba and other CIFS/SMB technologies.
  48 </para>
  49
  50 </sect1>
  51
  52 <sect1>
  53 <title>Technical Discussion</title>
  54
  55 <para>
  56 The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
  57 conference that was held at Goettingen, Germany, in April 2003. Material has been added
  58 from other sources, but it was Jeremy who inspired the structure that follows.
  59 </para>
  60
  61         <sect2>
  62         <title>The Ultimate Goal</title>
  63
  64         <para>
  65         All clustering technologies aim to achieve one or more of the following:
  66         </para>
  67
  68         <itemizedlist>
  69                 <listitem><para>Obtain the maximum affordable computational power.</para></listitem>
  70                 <listitem><para>Obtain faster program execution.</para></listitem>
  71                 <listitem><para>Deliver unstoppable services.</para></listitem>
  72                 <listitem><para>Avert points of failure.</para></listitem>
  73                 <listitem><para>Exact most effective utilization of resources.</para></listitem>
  74         </itemizedlist>
  75
  76         <para>
  77         A clustered file server ideally has the following properties:
  78         </para>
  79
  80         <itemizedlist>
  81                 <listitem><para>All clients can connect transparently to any server.</para></listitem>
  82                 <listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
  83                 <listitem><para>All servers server out the same set of files.</para></listitem>
  84                 <listitem><para>All file changes are immediately seen on all servers.</para>
  85                         <itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
  86                 <listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
  87         </itemizedlist>
  88
  89         </sect2>
  90
  91         <sect2>
  92         <title>Why Is This So Hard?</title>
  93
  94         <para>
  95         In short, the problem is one of <emphasis>state</emphasis>.
  96         </para>
  97
  98         <itemizedlist>
  99                 <listitem>
 100                         <para>
 101                         All TCP/IP connections are dependent on state information.
 102                         </para>
 103                         <para>
 104                         The TCP connection involves a packet sequence number. This
 105                         sequence number would need to be dynamically updated on all
 106                         machines in the cluster to effect seamless TCP fail-over.
 107                         </para>
 108                 </listitem>
 109                 <listitem>
 110                         <para>
 111                         CIFS/SMB (the Windows networking protocols) uses TCP connections.
 112                         </para>
 113                         <para>
 114                         This means that from a basic design perspective, fail-over is not
 115                         seriously considered.
 116                         <itemizedlist>
 117                                 <listitem><para>
 118                                 All current SMB clusters are fail-over solutions
 119                                 &smbmdash; they rely on the clients to reconnect. They provide server
 120                                 fail-over, but clients can lose information due to a server failure.
 121                                 </para></listitem>
 122                         </itemizedlist>
 123                         </para>
 124                 </listitem>
 125                 <listitem>
 126                         <para>
 127                         Servers keep state information about client connections.
 128                         <itemizedlist>
 129                                 <listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
 130                                 <listitem><para>Every file open must be compared with other file opens
 131                                                 to check share modes.</para></listitem>
 132                         </itemizedlist>
 133                         </para>
 134                 </listitem>
 135         </itemizedlist>
 136
 137                 <sect3>
 138                 <title>The Front-End Challenge</title>
 139
 140                 <para>
 141                 To make it possible for a cluster of file servers to appear as a single server that has one
 142                 name and one IP address, the incoming TCP data streams from clients must be processed by the
 143                 front end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
 144                 layer level and then feed the SMB packet to different servers in the cluster.
 145                 </para>
 146
 147                 <para>
 148                 One could split all IPC$ connections and RPC calls to one server to handle printing and user
 149                 lookup requirements. RPC Printing handles are shared between different IPC4 sessions &smbmdash; it is
 150                 hard to split this across clustered servers!
 151                 </para>
 152
 153                 <para>
 154                 Conceptually speaking, all other servers would then provide only file services. This is a simpler
 155                 problem to concentrate on.
 156                 </para>
 157
 158                 </sect3>
 159
 160                 <sect3>
 161                 <title>De-multiplexing SMB Requests</title>
 162
 163                 <para>
 164                 De-multiplexing of SMB requests requires knowledge of SMB state information,
 165                 all of which must be held by the front-end <emphasis>virtual</emphasis> server.
 166                 This is a perplexing and complicated problem to solve.
 167                 </para>
 168
 169                 <para>
 170                 Windows XP and later have changed semantics so state information (vuid, tid, fid)
 171                 must match for a successful operation. This makes things simpler than before and is a
 172                 positive step forward.
 173                 </para>
 174
 175                 <para>
 176                 SMB requests are sent by vuid to their associated server. No code exists today to
 177                 affect this solution. This problem is conceptually similar to the problem of
 178                 correctly handling requests from multiple requests from Windows 2000
 179                 Terminal Server in Samba.
 180                 </para>
 181
 182                 <para>
 183                 One possibility is to start by exposing the server pool to clients directly.
 184                 This could eliminate the de-multiplexing step.
 185                 </para>
 186
 187                 </sect3>
 188
 189                 <sect3>
 190                 <title>The Distributed File System Challenge</title>
 191
 192                 <para>
 193 <indexterm><primary>Distributed File Systems</primary></indexterm>
 194                 There exists many distributed file systems for UNIX and Linux.
 195                 </para>
 196
 197                 <para>
 198                 Many could be adopted to backend our cluster, so long as awareness of SMB
 199                 semantics is kept in mind (share modes, locking and oplock issues in particular).
 200                 Common free distributed file systems include:
 201 <indexterm><primary>NFS</primary></indexterm>
 202 <indexterm><primary>AFS</primary></indexterm>
 203 <indexterm><primary>OpenGFS</primary></indexterm>
 204 <indexterm><primary>Lustre</primary></indexterm>
 205                 </para>
 206
 207                 <itemizedlist>
 208                         <listitem><para>NFS</para></listitem>
 209                         <listitem><para>AFS</para></listitem>
 210                         <listitem><para>OpenGFS</para></listitem>
 211                         <listitem><para>Lustre</para></listitem>
 212                 </itemizedlist>
 213
 214                 <para>
 215                 The server pool (cluster) can use any distributed file system backend if all SMB
 216                 semantics are performed within this pool.
 217                 </para>
 218
 219                 </sect3>
 220
 221                 <sect3>
 222                 <title>Restrictive Constraints on Distributed File Systems</title>
 223
 224                 <para>
 225                 Where a clustered server provides purely SMB services, oplock handling
 226                 may be done within the server pool without imposing a need for this to
 227                 be passed to the backend file system pool.
 228                 </para>
 229
 230                 <para>
 231                 On the other hand, where the server pool also provides NFS or other file services,
 232                 it will be essential that the implementation be oplock aware so it can
 233                 interoperate with SMB services. This is a significant challenge today. A failure
 234                 to provide this will result in a significant loss of performance that will be
 235                 sorely noted by users of Microsoft Windows clients.
 236                 </para>
 237
 238                 <para>
 239                 Last, all state information must be shared across the server pool.
 240                 </para>
 241
 242                 </sect3>
 243
 244                 <sect3>
 245                 <title>Server Pool Communications</title>
 246
 247                 <para>
 248                 Most backend file systems support POSIX file semantics. This makes it difficult
 249                 to push SMB semantics back into the file system. POSIX locks have different properties
 250                 and semantics from SMB locks.
 251                 </para>
 252
 253                 <para>
 254                 All <command>smbd</command> processes in the server pool must of necessity communicate
 255                 very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
 256                 uses is not suitable for use across a network. Clustered <command>smbd</command>'s must use something else.
 257                 </para>
 258
 259                 </sect3>
 260
 261                 <sect3>
 262                 <title>Server Pool Communications Demands</title>
 263
 264                 <para>
 265                 High speed inter-server communications in the server pool is a design prerequisite
 266                 for a fully functional system. Possibilities for this include:
 267                 </para>
 268
 269                 <itemizedlist>
 270                         <listitem><para>
 271                         Proprietary shared memory bus (example: Myrinet or SCI [Scalable Coherent Interface]).
 272                         These are high cost items.
 273                         </para></listitem>
 274
 275                         <listitem><para>
 276                         Gigabit ethernet (now quite affordable).
 277                         </para></listitem>
 278
 279                         <listitem><para>
 280                         Raw ethernet framing (to bypass TCP and UDP overheads).
 281                         </para></listitem>
 282                 </itemizedlist>
 283
 284                 <para>
 285                 We have yet to identify metrics for  performance demands to enable this to happen
 286                 effectively.
 287                 </para>
 288
 289                 </sect3>
 290
 291                 <sect3>
 292                 <title>Required Modifications to Samba</title>
 293
 294                 <para>
 295                 Samba needs to be significantly modified to work with a high-speed server inter-connect
 296                 system to permit transparent fail-over clustering.
 297                 </para>
 298
 299                 <para>
 300                 Particular functions inside Samba that will be affected include:
 301                 </para>
 302
 303                 <itemizedlist>
 304                         <listitem><para>
 305                         The locking database, oplock notifications,
 306                         and the share mode database.
 307                         </para></listitem>
 308
 309                         <listitem><para>
 310                         Failure semantics need to be defined. Samba behaves the same way as Windows.
 311                         When oplock messages fail, a file open request is allowed, but this is
 312                         potentially dangerous in a clustered environment. So how should inter-server
 313                         pool failure semantics function and how should this be implemented?
 314                         </para></listitem>
 315
 316                         <listitem><para>
 317                         Should this be implemented using a point-to-point lock manager, or can this
 318                         be done using multicast techniques?
 319                         </para></listitem>
 320
 321                 </itemizedlist>
 322
 323                 </sect3>
 324         </sect2>
 325
 326         <sect2>
 327         <title>A Simple Solution</title>
 328
 329         <para>
 330         Allowing fail-over servers to handle different functions within the exported file system
 331         removes the problem of requiring a distributed locking protocol.
 332         </para>
 333
 334         <para>
 335         If only one server is active in a pair, the need for high speed server interconnect is avoided.
 336         This allows the use of existing high availability solutions, instead of inventing a new one.
 337         This simpler solution comes at a price &smbmdash; the cost of which is the need to manage a more
 338         complex file name space. Since there is now not a single file system, administrators
 339         must remember where all services are located &smbmdash; a complexity not easily dealt with.
 340         </para>
 341
 342         <para>
 343         The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
 344         servers. Backend file space integrity is the responsibility of the administrator.
 345         </para>
 346
 347         </sect2>
 348
 349         <sect2>
 350         <title>High Availability Server Products</title>
 351
 352         <para>
 353         Fail-over servers must communicate in order to handle resource fail-over. This is essential
 354         for high availability services. The use of a dedicated heartbeat is a common technique to
 355         introduce some intelligence into the fail-over process. This is often done over a dedicated
 356         link (LAN or serial).
 357         </para>
 358
 359         <para>
 360 <indexterm><primary>SCSI</primary></indexterm>
 361         Many fail-over solutions (like Red Hat Cluster Manager, as well as Microsoft Wolfpack)
 362         can use a shared SCSI of Fiber Channel disk storage array for fail-over communication.
 363         Information regarding Red Hat high availability solutions for Samba may be obtained from:
 364         <ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com.</ulink>
 365         </para>
 366
 367         <para>
 368         The Linux High Availability project is a resource worthy of consultation if your desire is
 369         to build a highly available Samba file server solution. Please consult the home page at
 370         <ulink url="http://www.linux-ha.org/">www.linux-ha.org/.</ulink>
 371         </para>
 372
 373         <para>
 374         Front-end server complexity remains a challenge for high availability as it needs to deal
 375         gracefully with backend failures, while at the same time it needs to provide continuity of service
 376         to all network clients.
 377         </para>
 378
 379         </sect2>
 380
 381         <sect2>
 382         <title>MS-DFS: The Poor Man's Cluster</title>
 383
 384         <para>
 385 <indexterm><primary>MS-DFS</primary></indexterm>
 386 <indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
 387         MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
 388         complexity back to the network client, something already included by Microsoft.
 389         MS-DFS creates the illusion of a simple, continuous file system name space, that even
 390         works at the file level.
 391         </para>
 392
 393         <para>
 394         Above all, at the cost of complexity of management, a distributed (pseudo-cluster) can
 395         be created using existing Samba functionality.
 396         </para>
 397
 398         </sect2>
 399
 400         <sect2>
 401         <title>Conclusions</title>
 402
 403         <itemizedlist>
 404                 <listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
 405                 <listitem><para>Client fail-over is the best we can do today.</para></listitem>
 406                 <listitem><para>Much more work is needed before a practical and manageable high
 407                                 availability transparent cluster solution will be possible.</para></listitem>
 408                 <listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
 409         </itemizedlist>
 410
 411         </sect2>
 412
 413 </sect1>
 414 </chapter>