
   1 %
   2 %
   3 % FUSD - Framework for User-Space Devices
   4 % Programming Manual & Tutorial
   5 %
   6 % Jeremy Elson, (c) 2001 Sensoria Corporation, 2003 UCLA
   7 % Released under open-source, BSD license
   8 % See LICENSE file for full license
   9 %
  10 % $Id: fusd.tex,v 1.63 2003/08/20 22:00:55 jelson Exp $
  11
  12 \documentclass{article}
  13 \addtolength{\topmargin}{-.5in}        % repairing LaTeX's huge margins...
  14 \addtolength{\textheight}{1in}         % more margin hacking
  15 \addtolength{\textwidth}{1.5in}
  16 \addtolength{\oddsidemargin}{-0.75in}
  17 \addtolength{\evensidemargin}{-0.75in}
  18
  19 \usepackage{graphicx,float,alltt,tabularx}
  20 \usepackage{wrapfig,floatflt}
  21 \usepackage{amsmath}
  22 \usepackage{latexsym}
  23 \usepackage{moreverb}
  24 \usepackage{times}
  25 \usepackage{html}
  26 %\usepackage{draftcopy}
  27
  28 %\setcounter{bottomnumber}{3}
  29 %\renewcommand{\topfraction}{0}
  30 %\renewcommand{\bottomfraction}{0.7}
  31 %\renewcommand{\textfraction}{0}
  32 %\renewcommand{\floatpagefraction}{2.0}
  33
  34 \renewcommand{\topfraction}{1.0}
  35 \renewcommand{\bottomfraction}{1.0}
  36 \renewcommand{\textfraction}{0.0}
  37 \renewcommand{\floatpagefraction}{0.9}
  38
  39 \floatstyle{ruled}
  40 \newfloat{Program}{tp}{lop}
  41
  42
  43 \title{FUSD:
  44 A Linux {\bf F}ramework for {\bf U}ser-{\bf S}pace {\bf D}evices}
  45
  46 \author{Jeremy Elson\\
  47 jelson@circlemud.org\\
http://www.circlemud.org/\~{}jelson/software/fusd}
  49 \date{19 August 2003, Documentation for FUSD 1.10}
  50
  51 \begin{document}
  52
  53 %%%%%%%%%%%%%%%%%%%%%%%%% Title Page %%%%%%%%%%%%%%%%%%%%%%%%%
  54
  55 \begin{center}
  56 \begin{latexonly}\vspace*{2in}\end{latexonly}
  57 {\Huge FUSD:} \\
  58 \vspace{2\baselineskip}
  59 {\huge A Linux {\bf F}ramework for {\bf U}ser-{\bf S}pace {\bf D}evices}
  60
  61 \begin{latexonly}\vspace{2in}\end{latexonly}
  62 \vspace{\baselineskip}
  63
  64 \vfill
  65
  66 {\large Jeremy Elson \\
  67 \begin{latexonly}\vspace{.5\baselineskip}\end{latexonly}}
  68 \vspace{\baselineskip}
  69 {\tt jelson@circlemud.org\\
  70 http://www.circlemud.org/jelson/software/fusd}
  71
  72 \vspace{2\baselineskip}
  73 19 August 2003\\
  74 Documentation for FUSD 1.10\\
  75
  76 \end{center}
  77 \thispagestyle{empty}
  78 \clearpage
  79
  80
  81
  82 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  83
  84 \begin{latexonly}
  85 \pagenumbering{roman}
  86
  87 \tableofcontents
  88 \bigskip
  89 \listof{Program}{List of Example Programs}
  90 \setlength{\parskip}{10pt}
  91
  92 \clearpage
  93 \end{latexonly}
  94
  95 % This resets the page counter to 1
  96 \pagenumbering{arabic}
  97 \addtolength{\parskip}{0.5\baselineskip}
  98
  99 \section{Introduction}
 100
 101 \subsection{What is FUSD?}
 102
 103 FUSD (pronounced {\em fused}) is a Linux framework for proxying device
 104 file callbacks into user-space, allowing device files to be
 105 implemented by daemons instead of kernel code.  Despite being
 106 implemented in user-space, FUSD devices can look and act just like any
 107 other file under /dev which is implemented by kernel callbacks.
 108
 109 A user-space device driver can do many of the things that kernel
 110 drivers can't, such as perform a long-running computation, block while
 111 waiting for an event, or read files from the file system.  Unlike
 112 kernel drivers, a user-space device driver can {\em use other device
 113 drivers}---that is, access the network, talk to a serial port, get
 114 interactive input from the user, pop up GUI windows, or read from
disks.  User-space drivers implemented using FUSD can be much easier
to debug: they cannot crash the machine, are easily
traceable using tools such as {\tt gdb}, and can be killed and
restarted without rebooting even if they become corrupted.  FUSD
 119 drivers don't have to be in C---Perl, Python, or any other language
 120 that knows how to read from and write to a file descriptor can work
 121 with FUSD.  User-space drivers can be swapped out, whereas kernel
 122 drivers lock physical memory.
 123
 124 Of course, as with almost everything, there are trade-offs.
 125 User-space drivers are slower than kernel drivers because they require
three times as many system calls and additional memory copies (see
Section~\ref{performance}).  User-space drivers cannot receive
 128 interrupts, and do not have the full power to modify arbitrary kernel
 129 data structures as kernel drivers do.  Despite these limitations, we
 130 have found user-space device drivers to be a powerful programming
 131 paradigm with a wide variety of uses (see Section~\ref{use-cases}).
 132
 133 FUSD is free software, distributed under a GPL-compatible license (the
 134 ``new'' BSD license, with the advertising clause removed).
 135
 136 \subsection{How does FUSD work?}
 137
 138 FUSD drivers are conceptually similar to kernel drivers: a set of
 139 callback functions called in response to system calls made on file
 140 descriptors by user programs.  FUSD's C library provides a device
 141 registration function, similar to the kernel's {\tt
 142 devfs\_register\_chrdev()} function, to create new devices.  {\tt
 143 fusd\_register()} accepts the device name and a structure full of
 144 pointers.  Those pointers are callback functions which are called in
 145 response to certain user system calls---for example, when a process
 146 tries to open, close, read from, or write to the device file.  The
 147 callback functions should conform to the standard definitions of POSIX
 148 system call behavior.  In many ways, the user-space FUSD callback
 149 functions are identical to their kernel counterparts.
 150
 151 Perhaps the best way to show what FUSD does is by example.
 152 Program~\ref{helloworld.c} is a simple FUSD device driver.  When the
 153 program is run, a device called {\tt /dev/hello-world} appears under
 154 the {\tt /dev} directory.  If that device is read (e.g., using {\tt
 155 cat}), the read returns {\tt Hello, world!} followed by an EOF.
 156 Finally, when the driver is stopped (e.g., by hitting Control-C), the
 157 device file disappears.
 158
 159 \begin{Program}
 160 \listinginput[5]{1}{helloworld.c.example}
 161 \caption{helloworld.c: A simple program using FUSD to
 162         create {\tt /dev/hello-world}}
 163 \label{helloworld.c}
 164 \end{Program}
 165
 166 On line 40 of the source, we use {\tt fusd\_register()} to create the
 167 {\tt /dev/hello-world} device, passing pointers to callbacks for the
 168 open(), close() and read() system calls.  (Lines 36--39 use the GNU C
extension that allows initializer field naming; the 2.4 series of
Linux kernels also uses that extension for the same purpose.)  The
 171 ``Hello, World'' read() callback itself is virtually identical to what
 172 a kernel driver for this device would look like.  It can inspect and
 173 modify the user's file pointer, copy data into the user-provided
 174 buffer, control the system call return value (either positive, EOF, or
 175 error), and so forth.
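
As a rough illustration of those responsibilities, the following
standalone sketch mimics the logic of a ``Hello, world'' read()
callback without depending on the FUSD headers.  The function name
{\tt hello\_read}, its plain C signature, and the {\tt GREETING} macro
are assumptions made for this sketch; the real callback in
Program~\ref{helloworld.c} receives its arguments from the FUSD
library.

\begin{verbatim}
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define GREETING "Hello, world!\n"

/* Hypothetical stand-in for a read() callback: copies the greeting on
 * the first call, then reports EOF.  Not FUSD's actual prototype. */
ssize_t hello_read(char *user_buffer, size_t user_length, off_t *offset)
{
  size_t len = strlen(GREETING);

  if (*offset != 0)
    return 0;                 /* file pointer already advanced: EOF */
  if (user_length < len)
    return -EINVAL;           /* buffer too small: report an error */

  memcpy(user_buffer, GREETING, len);  /* copy data to the caller */
  *offset += (off_t) len;              /* advance the file pointer */
  return (ssize_t) len;                /* positive value: bytes read */
}
\end{verbatim}

The three return paths correspond to the three kinds of return value
mentioned above: a positive byte count, zero for EOF, and a negative
error code.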
 176
 177 The proxying of kernel system calls that makes this kind of program
 178 possible is implemented by FUSD, using a combination of a kernel
 179 module and cooperating user-space library.  The kernel module
 180 implements a character device, {\tt /dev/fusd}, which is used as a
 181 control channel between the two.  fusd\_register() uses this channel
to send a message to the FUSD kernel module, telling it the name of the
 183 device the user wants to register.  The kernel module, in turn,
 184 registers that device with the kernel proper using devfs.  devfs and
 185 the kernel don't know anything unusual is happening; it appears from
 186 their point of view that the registered devices are simply being
 187 implemented by the FUSD module.
 188
Later, when the kernel makes a callback due to a system call (e.g.\ when
 190 the character device file is opened or read), the FUSD kernel module's
 191 callback blocks the calling process, marshals the arguments of the
 192 callback into a message and sends it to user-space.  Once there, the
 193 library half of FUSD unmarshals it and calls whatever user-space
 194 callback the FUSD driver passed to fusd\_register().  When that
 195 user-space callback returns a value, the process happens in reverse:
 196 the return value and its side-effects are marshaled by the library
 197 and sent to the kernel.  The FUSD kernel module unmarshals this
 198 message, matches it up with a corresponding outstanding request, and
 199 completes the system call.  The calling process is completely unaware
 200 of this trickery; it simply enters the kernel once, blocks, unblocks,
 201 and returns from the system call---just as it would for any other
 202 blocking call.
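
The matching of replies to outstanding requests can be pictured with a
small standalone sketch.  Everything in it is hypothetical: the {\tt
struct call\_msg} layout, the fixed-size pending table, the opcode
values, and the function names are invented for illustration and do
not reflect the actual FUSD message format or kernel code.

\begin{verbatim}
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical message: one marshaled call or its reply. */
struct call_msg {
  int  id;        /* ties a reply to its outstanding request */
  int  opcode;    /* which callback: open, read, ... (made-up codes) */
  long retval;    /* filled in by the reply */
};

static struct call_msg pending[MAX_PENDING];
static int in_use[MAX_PENDING];
static int next_id = 1;

/* "Kernel side": record an outstanding request; returns its id, or -1
 * if the table is full. */
int submit_request(int opcode)
{
  for (int i = 0; i < MAX_PENDING; i++) {
    if (!in_use[i]) {
      in_use[i] = 1;
      pending[i].id = next_id++;
      pending[i].opcode = opcode;
      pending[i].retval = 0;
      return pending[i].id;
    }
  }
  return -1;
}

/* "Kernel side": match a reply to its request and complete it.
 * Returns 0 on success, -1 if no outstanding request has that id. */
int complete_request(const struct call_msg *reply, long *retval_out)
{
  for (int i = 0; i < MAX_PENDING; i++) {
    if (in_use[i] && pending[i].id == reply->id) {
      *retval_out = reply->retval;
      in_use[i] = 0;          /* slot is free again */
      return 0;
    }
  }
  return -1;                  /* unknown id: reject the reply */
}
\end{verbatim}

In this sketch a reply whose id matches no outstanding request is
simply rejected, so a confused driver cannot complete a system call
that was never issued.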
 203
 204 One of the primary design goals of FUSD is {\em stability}.  It should
 205 not be possible for a FUSD driver to corrupt or crash the kernel,
either due to error or malice.  Of course, a buggy driver may
corrupt itself (e.g., due to a buffer overrun).  However, strict error
 208 checking is implemented at the user-kernel boundary which should
 209 prevent drivers from corrupting the kernel or any other user-space
 210 process---including the errant driver's own clients, and other FUSD
 211 drivers.
 212
 213
 214 \subsection{What FUSD {\em Isn't}}
 215
 216 FUSD looks similar to certain other Linux facilities that are already
 217 available.  It also skirts near a few of the kernel's hot-button
 218 political issues.  So, to avoid confusion, we present a list of
 219 things that FUSD is {\em not}.
 220
 221 \begin{itemize}
 222
 223 \item {\bf A FUSD driver is not a kernel module.}  Kernel modules
 224 allow---well, modularity of kernel code.  They let you insert and
 225 remove kernel modules dynamically after the kernel boots.  However,
 226 once inserted, the kernel modules are actually part of the kernel
 227 proper.  They run in the kernel's address space, with all the same
 228 privileges and restrictions that native kernel code does.  A FUSD
 229 device driver, in contrast, is more similar to a daemon---a program
 230 that runs as a user-space process, with a process ID.
 231
 232 \item {\bf FUSD is not, and doesn't replace, devfs.}  When a FUSD
 233 driver registers a FUSD device, it automatically creates a device file
 234 in {\tt /dev}.  However, FUSD is not a replacement for devfs---quite
 235 the contrary, FUSD creates those device files by {\em using} devfs.
 236 In a normal Linux system, only kernel modules proper---not user-space
 237 programs---can register with devfs (see above).
 238
 239 \item {\bf FUSD is not UDI.}  UDI, the \htmladdnormallinkfoot{Uniform
 240 Driver Interface}{http://www.projectudi.org}, aims to create a binary
 241 API for drivers that is uniform across operating systems.  It's true
 242 that FUSD could conceivably be used for a similar purpose (inasmuch as
 243 it defines a system call messaging structure).  However, this was not
 244 the goal of FUSD as much as an accidental side effect.  We do not
 245 advocate publishing drivers in binary-only form, even though FUSD does
 246 make this possible in some cases.
 247
 248 \item {\bf FUSD is not an attempt to turn Linux into a microkernel.}
 249 We aren't trying to port existing drivers into user-space for a
 250 variety of reasons (not the least of which is performance).  We've
used FUSD as a tool to write new drivers that are much easier to write in
user-space than they would be in the kernel; see
 253 Section~\ref{use-cases} for use cases.
 254
 255
 256 \end{itemize}
 257
 258
 259 \subsection{Related Work}
 260
 261 FUSD is a new implementation, but certainly not a new idea---the
 262 theory of its operation is the same as any microkernel operating
 263 system.  A microkernel (roughly speaking) is one that implements only
 264 very basic resource protection and message passing in the kernel.
 265 Implementation of device drivers, file systems, network stacks, and so
forth is relegated to userspace.  Patrick Bridges maintains a list of
 267 such \htmladdnormallinkfoot{microkernel operating systems}{http://www.cs.arizona.edu/people/bridges/os/microkernel.html}.
 268
 269 Also related is the idea of a user-space filesystem, which has been
 270 implemented in a number of contexts.  Some examples include Klaus
 271 Schauser's \htmladdnormallinkfoot{UFO
 272 Project}{http://www.cs.ucsb.edu/projects/ufo/index.html} for Solaris,
 273 and Jeremy Fitzhardinge's (no longer maintained)
 274 \htmladdnormallinkfoot{UserFS}{http://www.goop.org/~jeremy/userfs/}
 275 for Linux 1.x.  The \htmladdnormallinkfoot{UFO
 276 paper}{http://www.cs.ucsb.edu/projects/ufo/97-usenix-ufo.ps} is also
 277 notable because it has a good survey of similar projects that
 278 integrate user-space code with system calls.
 279
 280 \subsection{Limitations and Future Work}
 281
 282 In its current form, FUSD is useful and has proven to be quite
 283 stable---we use it in production systems.  However, it does have some
 284 limitations that could benefit from the attention of developers.
 285 Contributions to correct any of these deficiencies are welcomed!
 286 (Many of these limitations will not make sense without having read the
 287 rest of the documentation first.)
 288
 289
 290 \begin{itemize}
 291 \item Currently, FUSD only supports implementation of character
 292 devices.  Block devices and network devices are not supported yet.
 293
 294 \item The kernel has 15 different callbacks in its {\tt
 295 file\_operations} structure.  The current version of FUSD does not
 296 proxy some of the more obscure ones out to userspace.
 297
 298 \item Currently, all system calls that FUSD understands are proxied
 299 from the FUSD kernel module to userspace.  Only the userspace library
 300 knows which callbacks have actually been registered by the FUSD
 301 driver.  For example, the kernel may proxy a write() system call to
 302 user-space even if the driver has not registered a write() callback
 303 with fusd\_register().
 304
 305 fusd\_register() should, but currently does not, tell the kernel
 306 module which callbacks it wants to receive, per-device.  This will be
 307 more efficient because it will prevent useless system calls for
 308 unsupported operations.  In addition, it will lead to more logical and
 309 consistent behavior by allowing the kernel to use its default
 310 implementations of certain functions such as writev(), instead of
 311 being fooled into thinking the driver has an implementation of it in
 312 cases where it doesn't.
 313
 314 \item It should be possible to write a FUSD library in any language
 315 that supports reads and writes on raw file descriptors.  In the
 316 future, it might be possible to write FUSD device drivers in a variety
 317 of languages---Perl, Python, maybe even Java.  However, the current
 318 implementation has only a C library.
 319
 320 \item It's possible for drivers that use FUSD to deadlock---for
 321 example, if a driver tries to open itself.  In this one case, FUSD
 322 returns {\tt -EDEADLOCK}.  However, deadlock protection should be
 323 expanded to more general detection of cycles of arbitrary length.
 324
 325 \item FUSD should provide a /proc interface that gives debugging and
 326 status information, and allows parameter tuning.
 327
 328 \item FUSD was written with efficiency in mind, but a number of
 329 important optimizations have not yet been implemented.  Specifically,
 330 we'd like to try to reduce the number of memory copies by using a
 331 buffer shared between user and kernel space to pass messages.
 332
 333 \item FUSD currently requires devfs, which is used to dynamically
 334 create device files under {\tt /dev} when a FUSD driver registers
 335 itself.   This is, perhaps, the most convenient and useful paradigm
 336 for FUSD.  However, some users have asked if it's possible to use FUSD
 337 without devfs.  This should be possible if FUSD drivers bind to device
 338 major numbers instead of device file names.
 339
 340 \end{itemize}
 341
 342
 343
 344
 345 \subsection{Author Contact Information and Acknowledgments}
 346
 347 The original version of FUSD was written by Jeremy Elson
 348 \htmladdnormallink{(jelson@circlemud.org)}{mailto:jelson@circlemud.org}
 349 and Lewis Girod at Sensoria Corporation.
 350 Sensoria no longer maintains public releases of FUSD, but the same
 351 authors have since forked the last public release and continue to
 352 maintain FUSD from the University of California, Los Angeles.
 353
 354 If you have bug reports, patches, suggestions, or any other comments,
 355 please feel free to contact the authors.
 356
FUSD has two
\htmladdnormallinkfoot{SourceForge}{http://www.sourceforge.net}-hosted
mailing lists: a low-traffic list for announcements ({\tt fusd-announce})
and a list for general discussion ({\tt fusd-devel}).  Subscription
information for both lists is available at
\htmladdnormallink{SourceForge's FUSD mailing list
page}{http://sourceforge.net/mail/?group_id=36326}.
 364
 365 For the latest releases and information about FUSD, please see the
 366 \htmladdnormallinkfoot{official FUSD home
 367 page}{http://www.circlemud.org/jelson/software/fusd}.
 368
 369
 370
 371 \subsection{Licensing Information}
 372
 373 FUSD is free software, distributed under a GPL-compatible license (the
 374 ``new'' BSD license, with the advertising clause removed).  The
license is reproduced in its entirety below.
 376
 377 Copyright (c) 2001, Sensoria Corporation; (c) 2003 University of
 378 California, Los Angeles.  All rights reserved.
 379
 380 Redistribution and use in source and binary forms, with or without
 381 modification, are permitted provided that the following conditions are
 382 met:
 383 \begin{itemize}
 384 \item Redistributions of source code must retain the above copyright
 385 notice, this list of conditions and the following disclaimer.
 386
 387 \item Redistributions in binary form must reproduce the above
 388 copyright notice, this list of conditions and the following disclaimer
 389 in the documentation and/or other materials provided with the
 390 distribution.
 391
 392 \item Neither the names of Sensoria Corporation or UCLA, nor the
 393 names of other contributors may be used to endorse or promote products
 394 derived from this software without specific prior written permission.
 395 \end{itemize}
 396
 397 THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
 398 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 399 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 400 PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS
 401 BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 402 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 403 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
 404 BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 405 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
 406 OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
 407 IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 408
 409 \section{Why use FUSD?}
 410 \label{use-cases}
 411
 412 One basic question about FUSD that one might ask is: what is it good
 413 for?  Why use it?  In this section, we describe some of the situations
 414 in which FUSD has been the solution for us.
 415
 416 \subsection{Device Driver Layering}
 417
 418 A problem that comes up frequently in modern operating systems is
 419 contention for a single resource by multiple competing processes.  In
 420 UNIX, it's the job of a device driver to coordinate access to such
 421 resources.  By accepting requests from user processes and (for
 422 example) queuing and serializing them, it becomes safe for processes
 423 that know nothing about each other to make requests in parallel to the
 424 same resource.  Of course, kernel drivers do this job already, but
 425 they typically operate on top of hardware directly.  However, kernel
 426 drivers can't easily be layered on top of {\em other device drivers}.
 427
 428 For example, consider a device such as a modem that is connected to a
 429 host via a serial port.  Let's say we want to implement a device
 430 driver that allows multiple users to dial the telephone (e.g., {\tt
 431 echo 1-310-555-1212 > /dev/phone-dialer}).  Such a driver should be
 432 layered {\em on top of} the serial port driver---that is, it most
 433 likely wants to write to {\tt /dev/ttyS0}, not directly to the UART
 434 hardware itself.
 435
 436 While it is possible to write to a logical file from within a kernel
 437 device driver, it is both tricky and considered bad practice.  In the
 438 \htmladdnormallinkfoot{words of kernel hacker Dick Johnson}
 439 {http://www.uwsg.indiana.edu/hypermail/linux/kernel/0005.3/0061.html},
 440 ``You should never write a [kernel] module that requires reading or
 441 writing to any logical device. The kernel is the thing that translates
 442 physical I/O to logical I/O. Attempting to perform logical I/O in the
 443 kernel is effectively going backwards.''
 444
 445 With FUSD, it's possible to layer device drivers because the driver is
 446 a user-space process, not a kernel module.  A FUSD implementation of
 447 our hypothetical {\tt /dev/phone-dialer} can open {\tt /dev/ttyS0}
 448 just as any other process would.
 449
 450 Typically, such layering is accomplished by system daemons.  For
 451 example, the {\tt lpd} daemon manages printers at a high level.  Since
 452 it is a user-space process, it can access the physical printer devices
 453 using kernel device drivers (for example, using printer or network
drivers).  There are a number of advantages to using FUSD instead:
 455 \begin{itemize}
 456 \item Using FUSD, a daemon/driver can create a standard device file
 457 which is accessible by any program that knows how to use the POSIX
 458 system call interface.  Some trickery is possible using named
 459 pipes and FIFOs, but quickly becomes difficult because of multiplexed
 460 writes from multiple processes.
 461 \item FUSD drivers receive the UID, GID, and process ID along with
 462 every file operation, allowing the same sorts of security policies to
 463 be implemented as would be possible with a real kernel driver.  In
 464 contrast, writes to a named pipe, UDP, and so forth are ``anonymous.''
 465 \end{itemize}
 466
 467 \subsection{Use of User-Space Libraries}
 468
 469 Since a FUSD driver is just a regular user-space program, it can
naturally use any of the enormous body of libraries that
exist for almost any task.  FUSD drivers can easily incorporate user
 472 interfaces, encryption, network protocols, threads, and almost
 473 anything else.  In contrast, porting arbitrary C code into the kernel
 474 is difficult and usually a bad idea.
 475
 476 \subsection{Driver Memory Protection}
 477
 478 Since FUSD drivers run in their own process space, the rest of the
 479 system is protected from them.  A buggy or malicious FUSD driver, at
 480 the very worst, can only corrupt itself.  It's not possible for it to
 481 corrupt the kernel, other FUSD drivers, or even the processes that are
 482 using its devices.  In contrast, a buggy kernel module can bring down
 483 any process in the system, or the entire kernel itself.
 484
 485 \subsection{Giving libraries language independence and standard
 486 notification interfaces}
 487
 488 One particularly interesting application of FUSD that we've found very
 489 useful is as a way to let regular user-space libraries export device
 490 file APIs.  For example, imagine you had a library which factored
 491 large composite numbers.  Typically, it might have a C
 492 interface---say, a function called {\tt int\ *factorize(int\ bignum)}.
 493 With FUSD, it's possible to create a device file interface---say, a
 494 device called {\tt /dev/factorize} to which clients can {\tt write(2)}
 495 a big number, then {\tt read(2)} back its factors.
 496
 497 This may sound strange, but device file APIs have at least three
advantages over a typical library API.  First, the library becomes much more
 499 language independent---any language that can make system calls can
 500 access the factorization library.  Second, the factorization code is
 501 running in a different address space; if it crashes, it won't crash or
 502 corrupt the caller.  Third, and most interestingly, it is possible to
 503 use {\tt select(2)} to wait for the factorization to complete.  {\tt
 504 select(2)} would make it easy for a client to factor a large number
 505 while remaining responsive to {\em other} events that might happen in
 506 the meantime.  In other words, FUSD allows normal user-space libraries
 507 to integrate seamlessly with UNIX's existing, POSIX-standard event
 508 notification interface: {\tt select(2)}.
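
A client might wait for such a device roughly as follows.  This is a
minimal sketch that uses only the standard {\tt select(2)} call; the
helper name {\tt wait\_readable} is an assumption, and {\tt
/dev/factorize} remains the hypothetical device from the example
above.  The helper works on any file descriptor.

\begin{verbatim}
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable.  Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
  fd_set readfds;
  struct timeval tv;

  FD_ZERO(&readfds);
  FD_SET(fd, &readfds);
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  int n = select(fd + 1, &readfds, NULL, NULL, &tv);
  if (n < 0)
    return -1;
  return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
\end{verbatim}

A client would open the device, {\tt write(2)} the number to factor,
and then poll with a helper like this one from its event loop.  A real
event loop would usually place several descriptors in the same {\tt
fd\_set}, so that the factorization result is just one more event
source.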
 509
 510 \subsection{Development and Debugging Convenience}
 511
 512 FUSD processes can be developed and debugged with all the normal
 513 user-space tools.  Buggy drivers won't crash the system, but instead
 514 dump cores that can be analyzed.  All of your favorite visual
 515 debuggers, memory bounds checkers, leak detectors, profilers, and
 516 other tools can be applied to FUSD drivers as they would to any other
 517 program.
 518
 519 \section{Installing FUSD}
 520
 521 This section describes the installation procedure for FUSD.  It
 522 assumes a good working knowledge of Linux system administration.
 523
 524
 525 \subsection{Prerequisites}
 526
 527 Before installing FUSD, make sure you have all of the following
 528 packages installed and working correctly:
 529
 530 \begin{itemize}
 531 \item {\bf Linux kernel 2.4.0 or later}.  FUSD was developed under
 532 2.4.0 and should work with any kernel in the 2.4 series.
 533
 534 \item {\bf devfs installed and running.}  FUSD dynamically registers
 535 devices using devfs, the Linux device filesystem by Richard Gooch.
 536 For FUSD to work, devfs must be installed and running on your system.
 537 For more information about devfs installation, see the
 538 \htmladdnormallinkfoot{devfs home
 539 page}{http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html}.
 540
Note that some distributions make installation of devfs easier.  RedHat
 542 7.1, for example, already has all of the necessary daemons and
 543 configuration changes integrated.  devfs can be installed simply by
 544 recompiling the kernel with devfs support enabled and reconfiguring
 545 LILO to pass {\tt "devfs=mount"} to the kernel.
 546 \end{itemize}
 547
 548
 549 \subsection{Compiling FUSD as a Kernel Module}
 550
 551 Before compiling anything, take a look at the Makefile in FUSD's home
 552 directory.  Adjust any constants that are not correct.  In particular,
 553 make sure {\tt KERNEL\_HOME} correctly reflects the place where your
 554 kernel sources are installed, if they aren't in the default location
 555 of {\tt /usr/src/linux}.
 556
 557 Then, type {\tt make}.  It should generate a directory whose name
 558 looks something like {\tt obj.i686-linux}, or some variation depending
 559 on your architecture.  Inside of that directory will be a number of
 560 files, including:
 561 \begin{itemize}
 562 \item kfusd.o -- The FUSD kernel module
 563 \item libfusd.a -- The C library used to talk to the kernel module
 564 \item Example programs -- linked against libfusd.a
 565 \end{itemize}
 566
 567 Compilation of the kernel module will fail if the dependencies
described in the previous section are not satisfied.  The module must
be compiled against Linux kernel v2.4.0 or later, and the kernel
must have devfs support enabled.
 571
 572
 573 \subsection{Testing and Troubleshooting}
 574
 575 Once everything has been compiled, give it a try to see if it actually
 576 does something.  First, use {\tt insmod} to insert the FUSD kernel
 577 module, e.g. {\tt insmod obj.i686-linux/kfusd.o}.  A greeting message
 578 similar to ``{\tt fusd: starting, Revision: 1.50}'' should appear in
 579 the kernel log (accessed using the {\tt dmesg} command, or by typing
 580 {\tt cat /proc/kmsg}).  You can verify the module has been inserted by
 581 typing {\tt lsmod}, or alternatively {\tt cat /proc/modules}.
 582
Once the module has been inserted successfully, try running the
 584 {\tt helloworld} example program.  When run, the program should print
 585 a greeting message similar to {\tt /dev/hello-world should now exist -
 586 calling fusd\_run}.  This means everything is working; the daemon is
 587 now blocked, waiting for requests to the new device.  From another
 588 shell, type {\tt cat /dev/hello-world}.  You should see {\tt Hello,
 589 world!} printed in response.  Try killing the test program; the
 590 corresponding device file should disappear.
 591
 592 If nothing seems to be working, try looking at the kernel message log
 593 (type {\tt dmesg} or {\tt cat /proc/kmsg}) to see if there are any
 594 errors.  If nothing seems obviously wrong, try turning on FUSD kernel
 595 module debugging by defining {\tt CONFIG\_FUSD\_DEBUG} in kfusd.c,
 596 then recompiling and reinserting the module.
 597
 598
 599 \subsection{Installation}
 600
 601 Typing {\tt make install} will copy the FUSD library, header files,
 602 and man pages into {\tt /usr/local}.  The FUSD kernel module is {\em
 603 not} installed automatically because of variations among different
 604 Linux distributions in how this is accomplished.  You may want to
 605 arrange to have the module start automatically on boot by (for
 606 example) copying it into {\tt /lib/modules/your-kernel-version}, and
 607 adding it to {\tt /etc/modules.conf}.
 608
 609
 610 \subsection{Making FUSD Part of the Kernel Proper}
 611
 612 The earlier instructions, by default, create a FUSD kernel module.
 613 If desired, it's also very easy to build FUSD right into the kernel,
 614 instead:
 615 \begin{enumerate}
 616 \item Unpack the 2.4 kernel sources and copy all the files in the {\tt
 617 include} and {\tt kfusd} directories into your kernel source tree,
 618 under {\tt drivers/char}.  For example, if FUSD is in
 619 your home directory, and your kernel is in {\tt /usr/src/linux}:
 620 \begin{verbatim}
 621         cp ~/fusd/kfusd/* ~/fusd/include/* /usr/src/linux/drivers/char
 622 \end{verbatim}
 623
 624 \item Apply the patch found in FUSD's {\tt patches} directory to your
 625 kernel source tree.  For example:
 626 \begin{verbatim}
 627         cd /usr/src/linux
 628         patch -p0 < ~/fusd/patches/fusd-inkernel.patch
 629 \end{verbatim}
 630 The FUSD in-kernel patch doesn't actually change any kernel sources
 631 proper; it just adds FUSD to the kernel configuration menu and
 632 Makefile.
 633 \item Using your kernel configurator of choice (e.g. {\tt make
 634 menuconfig}), turn on the FUSD options.  It will be under the
 635 ``Character devices'' menu.
 636 \item Build and install the kernel as usual.
 637 \end{enumerate}
 638
 639
 640 \section{Basic Device Creation}
 641
 642 Enough introduction---it's time to actually create a basic device
 643 driver using FUSD!
 644
The following sections will illustrate various techniques using
 646 example programs.  To save space, interesting excerpts are shown
 647 instead of entire programs.  However, the {\tt examples} directory
 648 of the FUSD distribution contains all the examples in their
 649 entirety.  They can actually be compiled and run on a system with the
 650 FUSD kernel module installed.
 651
 652 Where this text refers to example program line numbers, it refers to
 653 the line numbers printed alongside the excerpts in the manual---not
 654 the line numbers of the actual programs in the {\tt examples}
 655 directory.
 656
 657
 658 \subsection{Using {\tt fusd\_register} to create a new device}
 659 \label{using-fusd-register}
 660
 661 We saw an example of a simple driver, helloworld.c, in
 662 Program~\ref{helloworld.c} on page~\pageref{helloworld.c}.  Let's go
 663 back and examine that program now in more detail.
 664
 665 The FUSD ball starts rolling when the {\tt fusd\_register} function is
 666 called, as shown on line 40.  This function tells the FUSD kernel
 667 module:
 668 \begin{itemize}
 669 \item {\tt char *name}---The name of the device being created.  The
 670 prefix (such as {\tt /dev/}) must match the location where devfs has
 671 been mounted.  Names containing slashes (e.g., {\tt
 672 /dev/my-devices/dev1}) are legal; devfs creates subdirectories
 673 automatically.
 674 \item {\tt mode\_t mode}---The device's default permissions.  This is
 675 usually specified using an octal constant with a leading 0---{\tt 0666}
 676 (readable and writable by everyone) instead of the incorrect decimal
 677 constant {\tt 666}.
 678 \item {\tt void *device\_info}---Private data that should be passed to
 679 callback functions for this device.  The use of this field is
 680 described in Section~\ref{device-info}.
 681 \item {\tt struct fusd\_file\_operations *fops}---A structure containing
 682 pointers to the callback functions that should be called by FUSD
 683 in response to certain events.
 684 \end{itemize}
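
To make the octal-versus-decimal point about {\tt mode} concrete, the following minimal sketch renders both constants as permission strings.  It uses only standard headers; {\tt describe\_mode} and {\tt show\_modes} are hypothetical helpers written for this illustration and are not part of FUSD.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not part of FUSD): render the nine permission
 * bits of a mode as an "rwxrwxrwx"-style string in out[10]. */
static void describe_mode(mode_t mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++) {
        /* Bit 8 is the owner's read bit; bit 0 is "other" execute. */
        out[i] = (mode & (1u << (8 - i))) ? letters[i] : '-';
    }
    out[9] = '\0';
}

/* Print both constants so the difference is visible. */
static void show_modes(void)
{
    char buf[10];

    describe_mode(0666, buf);   /* octal: the intended value */
    printf("0666 -> %s\n", buf);

    describe_mode(666, buf);    /* decimal 666 is octal 01232 */
    printf(" 666 -> %s\n", buf);
}
```

Decimal {\tt 666} is octal {\tt 01232}, so passing it as a mode would request write-only access for the owner rather than read and write access for everyone.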
 685
 686 If device registration is successful, {\tt fusd\_register} returns a
 687 {\em device handle}---a small integer $\ge0$.  On errors, it returns
 688 -1 and sets the global variable {\tt errno} appropriately.  In
 689 reality, the device handle you get is a plain old file descriptor,
 690 as we'll see in Section~\ref{selecting}.
 691
 692 Although Program~\ref{helloworld.c} only calls {\tt fusd\_register}
 693 once, it can be called multiple times if the FUSD driver is handling
 694 more than one device as we'll see in Program~\ref{drums.c}.
 695
 696 There is intentional similarity between {\tt fusd\_register()} and the
 697 kernel's device registration functions, such as {\tt
 698 devfs\_register()} and {\tt register\_chrdev()}.  In many ways, FUSD's
 699 interface is meant to mirror the kernel interface as closely as
 700 possible.
 701
 702 The {\tt fusd\_file\_operations} structure, defined in {\tt fusd.h},
 703 contains a list of callbacks that are used in response to different
 704 system calls executed on a file.  It is similar to the kernel's {\tt
 705 file\_operations} structure, accepting callbacks for system calls such
 706 as {\tt open()}, {\tt close()}, {\tt read()}, {\tt write()}, and {\tt
 707 ioctl()}.  For the most part, the prototypes of FUSD file operation
 708 callbacks are the same as their kernel cousins, with one important
 709 exception.  The first argument of FUSD callbacks is always a pointer
 710 to a {\tt fusd\_file\_info} structure; it contains information that
 711 can be used to identify the file.  This structure is used instead of
 712 the kernel's {\tt file} and {\tt inode} structures, and will be
 713 described in more detail later.
 714
 715 In lines 35--38 of Program~\ref{helloworld.c}, we create and
 716 initialize a {\tt fusd\_file\_operations} structure.  A GCC-specific C
 717 extension allows us to name structure fields explicitly in the
 718 initializer.  This style may look strange, but it guards against
 719 errors in the future in case the order of fields in the structure ever
 720 changes.  The 2.4 kernel series uses the same trick.
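
The same named-field style can also be written with the designated-initializer syntax standardized in C99 ({\tt .field = value}).  The sketch below is not taken from helloworld.c: {\tt struct ops\_sketch} and the {\tt sketch\_} functions are hypothetical stand-ins, used only to show that the initializer does not depend on member order.

```c
#include <stddef.h>

/* Hypothetical stand-in for a table of callbacks; the real
 * fusd_file_operations structure is defined in fusd.h. */
struct ops_sketch {
    int (*open)(void);
    int (*close)(void);
    int (*read)(void);
    int (*write)(void);
};

static int sketch_open_or_close(void) { return 0; }
static int sketch_read(void) { return 1; }

/* Fields are named explicitly, so the initializer stays correct even if
 * the members of struct ops_sketch are later reordered.  Members that
 * are not named (here, write) are initialized to NULL. */
static struct ops_sketch sketch_ops = {
    .read  = sketch_read,
    .open  = sketch_open_or_close,
    .close = sketch_open_or_close,
};
```

Because {\tt write} is never named, it is left as a null pointer, which is a convenient way to express that a callback is not provided.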
 721
 722 After calling {\tt fusd\_register()} on line 40, the example program
 723 calls {\tt fusd\_run()} on line 44.  This function turns control over
 724 to the FUSD framework.  {\tt fusd\_run} blocks the driver until one of the
 725 devices it registered needs to be serviced.  Then, it calls the
 726 appropriate callback and blocks again until the next event.
 727
 728 Now, imagine that a user types {\tt cat /dev/hello-world}.  What
 729 happens?  Recall first what the {\tt cat} program itself does: opens a
 730 file, reads from it until it receives an EOF (printing whatever it
 731 reads to stdout), then closes it.  {\tt cat} works the same way
 732 regardless of what it's reading---be it a FUSD device, a regular
 733 file, a serial port, or anything else.  The {\tt strace} program is a
 734 great way to see this in action; see Appendix~\ref{strace} for
 735 details.
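
The open, read-until-EOF, close sequence described above can be sketched in a few lines of C.  This is not the source of {\tt cat}; {\tt cat\_sketch} is a hypothetical function that only illustrates which system calls a client makes, and therefore which driver callbacks would be triggered.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy everything readable from `path` to file descriptor `out_fd`,
 * the way cat does.  Returns the number of bytes copied, or -1 on error
 * (with errno set by the failing system call). */
static long cat_sketch(const char *path, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);          /* triggers the open callback */

    if (fd < 0)
        return -1;

    /* read() returns 0 at EOF; each call triggers the read callback. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;

        while (done < n) {                  /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));

            if (w < 0) {
                int saved = errno;
                close(fd);
                errno = saved;
                return -1;
            }
            done += w;
        }
        total += n;
    }

    if (n < 0) {                            /* read() itself failed */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    close(fd);                              /* triggers the close callback */
    return total;
}
```

Calling {\tt cat\_sketch("/dev/hello-world", 1)} on a system running the helloworld driver would exercise the {\tt open}, {\tt read}, and {\tt close} callbacks in that order.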
 736
 737 \subsection{The {\tt open} and {\tt close} callbacks}
 738 \label{open-close}
 739
 740 The first two callbacks that most drivers typically implement are {\tt
 741 open} and {\tt close}.  Each of these two functions is passed just
 742 one argument---the {\tt fusd\_file\_info} structure that describes the
 743 instance of the file being opened or closed.  Use of the information
 744 in that structure will be covered in more detail in
 745 Section~\ref{fusd-file-info}.
 746
 747 The semantics of an {\tt open} callback's return value are exactly the
 748 same as inside the kernel:
 749 \begin{itemize}
 750 \item 0 means success, and the file is opened.  If the file is allowed
 751 to open, the kernel returns a valid file descriptor to the client.
 752 Using that descriptor, other callbacks may be called for that file,
 753 including (at least) a {\tt close} callback.
 754
 755 \item A negative number indicates a failure, and that the file should
 756 not be opened.  Such return values should {\em always} be
 757 specified as a negative {\tt errno} value such as {\tt -EPERM}, {\tt
 758 -EBUSY}, {\tt -ENODEV}, {\tt -ENOMEM}, and so on.  For example, if the
 759 callback returns {\tt -EPERM}, the caller's {\tt open()} will return
 760 -1, with {\tt errno} set to {\tt EPERM}.  A complete list of possible
 761 return values can be found in the Linux kernel sources, under {\tt
 762 include/asm/errno.h}.
 763 \end{itemize}
 764
 765 If an {\tt open} callback returns 0 (success), a driver is {\em
 766 guaranteed} to receive exactly one {\tt close} callback for that file
 767 later.  By the same token, the close callback {\em will not} be called
 768 if the open fails.  Therefore, {\tt open} callbacks that can return
 769 failure must be sure to deallocate any resources they might have
 770 allocated before returning a failure.
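
The cleanup rule can be sketched as follows.  Everything here is an assumption made for illustration: {\tt struct file\_info\_sketch} stands in for the one field of {\tt fusd\_file\_info} that is used (the {\tt private\_data} pointer covered in Section~\ref{fusd-file-info}), and the single-opener policy exists only to provide a failure path that is easy to observe.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the one field of fusd_file_info used here;
 * the real structure is defined in fusd.h. */
struct file_info_sketch {
    void *private_data;
};

/* Hypothetical per-open state kept by the driver. */
struct open_state_sketch {
    char *scratch;
};

#define SKETCH_SCRATCH_SIZE 256

static int device_in_use = 0;   /* this sketch allows one opener at a time */

/* Returns 0 on success or a negative errno value on failure. */
static int open_sketch(struct file_info_sketch *file)
{
    struct open_state_sketch *state;

    if (device_in_use)
        return -EBUSY;                  /* nothing allocated yet */

    state = malloc(sizeof *state);
    if (state == NULL)
        return -ENOMEM;

    state->scratch = malloc(SKETCH_SCRATCH_SIZE);
    if (state->scratch == NULL) {
        free(state);                    /* no close callback will follow */
        return -ENOMEM;
    }

    file->private_data = state;
    device_in_use = 1;
    return 0;
}

/* Releases what a successful open_sketch() allocated. */
static int close_sketch(struct file_info_sketch *file)
{
    struct open_state_sketch *state = file->private_data;

    if (state != NULL) {
        free(state->scratch);
        free(state);
    }
    file->private_data = NULL;
    device_in_use = 0;
    return 0;
}
```

Each failure path releases exactly what was allocated before it, so nothing is leaked when the matching {\tt close} callback never arrives.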
 771
 772 Let's return to our example in Program~\ref{helloworld.c}, which
 773 creates the {\tt /dev/hello-world} device.  If a user types {\tt cat
 774 /dev/hello-world}, {\tt cat} will use the {\tt open(2)} system
 775 call to open the file.  FUSD will then proxy that system call to the
 776 driver and activate the callback that was registered as the {\tt open}
 777 callback.  Recall from line 36 of Program~\ref{helloworld.c} that we
 778 registered {\tt do\_open\_or\_close}, which appears on line 8.
 779
 780 In {\tt helloworld.c}, the {\tt open} callback always returns 0, or
 781 success.  However, in a real driver, something more interesting will
 782 probably happen---permissions checks, memory allocation for
 783 state-keeping, and so forth.  The corresponding {\em de}-allocation of
 784 those resources should occur in the {\tt close} callback, which is
 785 called when a user application calls {\tt close} on their file
 786 descriptor.  {\tt close} callbacks are allowed to return error values,
 787 but this does not prevent the file from actually closing.
 788
 789
 790
 791 \subsection{The {\tt read} callback}
 792 \label{read-callback}
 793
 794 Returning to our {\tt cat /dev/hello-world} example, what happens
 795 after the {\tt open} is successful?  Next, {\tt cat} will try to use
 796 {\tt read(2)}, which will get proxied by FUSD to the function {\tt
 797 do\_read} on line 13.  This function takes some additional arguments
 798 that we didn't see in the open and close callbacks:
 799 \begin{itemize}
 800 \item {\tt struct fusd\_file\_info *file}---The first argument to all
 801 callbacks, containing information which describes the file; see
 802 Section~\ref{fusd-file-info}.
 803 \item {\tt char *user\_buffer}---The buffer that the callback should use to
 804 write data that it is returning to the user.
 805 \item {\tt size\_t user\_length}---The maximum number of bytes
 806 requested by the user.  The driver is allowed to return fewer bytes,
 807 but should never write more than {\tt user\_length} bytes into {\tt
 808 user\_buffer}.
 809 \item {\tt loff\_t *offset}---A pointer to an integer which represents
 810 the caller's offset into the file (i.e., the user's file pointer).
 811 This value can be modified by the callback; any change will be
 812 propagated back to the user's file pointer inside the kernel.
 813 \end{itemize}
 814
 815 The semantics of the return value are the same as if the
 816 callback were being written inside the kernel itself:
 817 \begin{itemize}
 818 \item Positive return values indicate success.  If the call is
 819 successful, and the driver has copied data into {\tt user\_buffer}, the
 820 return value indicates how many bytes were copied.  This number should
 821 never be greater than the {\tt user\_length} argument.
 822 \item A 0 return value indicates EOF has been reached on the file.
 823 \item As in the {\tt open} and {\tt close} callbacks, negative values
 824 (such as -EPERM, -EPIPE, or -ENOMEM) indicate errors. Such values will
 825 cause the user's {\tt read()} to return -1 with errno set
 826 appropriately.
 827 \end{itemize}
 828
 829 The first time a read is done on a device file, the user's file
 830 pointer ({\tt *offset}) is 0.  In the case of this first read, a
 831 greeting message of {\tt Hello, world!} is copied back to the user, as
 832 seen on line 24.  The user's file pointer is then advanced.  The next
 833 read therefore fails the comparison at line 20, falling straight
 834 through to return 0, or EOF.
 835
 836 In this simple program, we also see an example of an error return on
 837 line 22: if the user tries to do a read smaller than the length of the
 838 greeting message, the read will fail with -EINVAL.  (In an actual
 839 driver, it would normally not be an error for a user to provide a
 840 smaller read buffer than the size of the available data.  The right
 841 way for drivers to handle this situation is to return partial data,
 842 then move {\tt *offset} forward so that the remainder is returned on
 843 the next {\tt read()}.  We see an example of this in
 844 Program~\ref{echo.c}.)
 845
 846 \subsection{The {\tt write} callback}
 847
 848 Program~\ref{helloworld.c} illustrated how a driver could return data
 849 {\em to} a client using the {\tt read} callback.  As you might expect, there
 850 is a corresponding {\tt write} callback that allows the driver to
 851 receive data {\em from} a client.  {\tt write} takes four arguments,
 852 similar to the {\tt read} callback:
 853
 854 \begin{itemize}
 855 \item {\tt struct fusd\_file\_info *file}---The first argument to all
 856 callbacks, containing information which describes the file; see
 857 Section~\ref{fusd-file-info}.
 858 \item {\tt const char *user\_buffer}---Pointer to data being written
 859 by the client (read-only).
 860 \item {\tt size\_t user\_length}---The number of bytes pointed to by
 861 {\tt user\_buffer}.
 862 \item {\tt loff\_t *offset}---A pointer to an integer which represents
 863 the caller's offset into the file (i.e., the user's file pointer).
 864 This value can be modified by the callback; any change will be
 865 propagated back to the user's file pointer inside the kernel.
 866 \end{itemize}
 867
 868 The semantics of {\tt write}'s return value are the same as in a
 869 kernel callback:
 870 \begin{itemize}
 871 \item Positive return values indicate success and indicate how many
 872 bytes of the user's buffer were successfully written (i.e.,
 873 successfully processed by the driver in some way).  The return value
 874 may be less than or equal to the {\tt user\_length} argument, but
 875 should never be greater.
 876 \item 0 should only be returned in response to a {\tt write} of length
 877 0.
 878 \item Negative values (such as -EPERM, -EPIPE, or -ENOMEM) indicate
 879 errors.  Such values will cause the user's {\tt write()} to return -1
 880 with errno set appropriately.
 881 \end{itemize}
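
Because a {\tt write} callback may accept fewer bytes than were offered, a careful client retries with the remainder.  The following client-side sketch shows one common way to do that; {\tt write\_all} is a hypothetical helper, not a FUSD function.

```c
#include <errno.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, retrying after short writes.
 * Returns 0 on success, or -1 with errno set on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        if (n == 0) {
            /* A driver should not return 0 for a non-empty write;
             * give up rather than loop forever. */
            errno = EIO;
            return -1;
        }
        buf += n;               /* skip what the driver accepted */
        len -= (size_t)n;
    }
    return 0;
}
```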
 882
 883 Program~\ref{echo.c}, echo.c, is an example implementation of a device
 884 ({\tt /dev/echo}) that uses both {\tt read()} and {\tt write()}
 885 callbacks.  A client that tries to {\tt read()} from this device will
 886 get the contents of the most recent {\tt write()}.  For example:\\
 887 \begin{minipage}{\textwidth}
 888 \vspace{\baselineskip}
 889 \begin{verbatim}
 890 % echo Hello there > /dev/echo
 891 % cat /dev/echo
 892 Hello there
 893 % echo Device drivers are fun > /dev/echo
 894 % cat /dev/echo
 895 Device drivers are fun
 896
 897 \end{verbatim}
 898 \end{minipage}
 899
 900 \begin{Program}
 901 \listinginput[5]{1}{echo.c.example}
 902 \caption{echo.c: Using both {\tt read} and {\tt write} callbacks}
 903 \label{echo.c}
 904 \end{Program}
 905
 906 The implementation of {\tt /dev/echo} keeps a global variable, {\tt
 907 data}, which serves as a cache for the data most recently written to
 908 the driver by a client program.  The driver does not assume the data
 909 is null-terminated, so it also keeps track of the number of bytes of
 910 data available.  (These two variables appear on lines 1--2.)
 911
 912 The driver's {\tt write} callback first frees any data which might
 913 have been allocated by a previous call to write (lines 26--29).  Next,
 914 on line 33, it attempts to allocate new memory for the new data
 915 arriving.  If the allocation fails, {\tt -ENOMEM} is returned to the
 916 client.  If the allocation is successful, the driver copies the data
 917 into its local buffer and stores its length (lines 37--38).  Finally,
 918 the driver tells the user that the entire buffer was consumed by
 919 returning a value equal to the number of bytes the user tried to write
 920 ({\tt user\_length}).
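
The steps just described can be sketched without {\tt fusd.h} as follows.  This is an approximation, not a copy of echo.c: the variable names are illustrative, {\tt file} is declared as {\tt void *} because it is unused, and {\tt off\_t} stands in for the {\tt loff\_t} used by the real prototype.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Cache of the most recent write; these names are illustrative and
 * need not match those in echo.c. */
static char  *cached_data = NULL;
static size_t cached_length = 0;

/* Sketch of a write callback: replace the cache with the new data. */
static ssize_t echo_write_sketch(void *file, const char *user_buffer,
                                 size_t user_length, off_t *offset)
{
    char *copy;

    (void)file;
    (void)offset;

    /* Discard whatever an earlier write left behind. */
    free(cached_data);
    cached_data = NULL;
    cached_length = 0;

    if (user_length == 0)
        return 0;               /* zero-length write: nothing to store */

    copy = malloc(user_length);
    if (copy == NULL)
        return -ENOMEM;         /* the cache stays empty */

    memcpy(copy, user_buffer, user_length);
    cached_data = copy;
    cached_length = user_length;

    /* Tell the client that the whole buffer was consumed. */
    return (ssize_t)user_length;
}
```

Resetting the cache before allocating means that a failed allocation leaves the device empty rather than pointing at freed memory.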
 921
 922 The {\tt read} callback has some extra features that we did not see in
 923 Program~\ref{helloworld.c}'s {\tt read()} callback.  The most
 924 important is that it allows the driver to read the available data {\em
 925 incrementally}, instead of requiring that the first {\tt read()}
 926 executed by the client has enough space for all the data the driver
 927 has available.  In other words, a client can do two 50-byte reads,
 928 and expect the same effect as if it had done a single 100-byte read.
 929
 930 This is implemented using {\tt *offset}, the user's file pointer.  If
 931 the user is trying to read past the amount of data we have available,
 932 the driver returns EOF (lines 8--9).  Normally, this happens after the
 933 client has finished reading data.  However, in this driver, it might
 934 happen on a client's first read if nothing has been written to the
 935 driver yet or if the most recent write's memory allocation failed.
 936
 937 If there is data to return, the driver computes the number of bytes
 938 that should be copied back to the client---the minimum of the number
 939 of bytes the user asked for, and the number of bytes of data that this
 940 client hasn't seen yet (line 12).  This data is copied back to the
 941 user's buffer (line 15), and the user's file pointer is advanced
 942 accordingly (line 16).  Finally, on line 19, the client is told how
 943 many bytes were copied to its buffer.
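
The offset arithmetic can be sketched in isolation.  As before, this is an approximation rather than the text of echo.c: the data is a fixed string instead of a cache filled by {\tt write}, {\tt file} is an unused {\tt void *}, and {\tt off\_t} stands in for {\tt loff\_t}.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Illustrative data; in echo.c this would be the cached write. */
static const char available_data[] = "Device drivers are fun\n";
static const size_t available_length = sizeof available_data - 1;

/* Sketch of a read callback that supports incremental reads. */
static ssize_t echo_read_sketch(void *file, char *user_buffer,
                                size_t user_length, off_t *offset)
{
    size_t remaining, count;

    (void)file;

    if (*offset < 0)
        return -EINVAL;

    /* Reading at or past the end of the data: report EOF. */
    if ((size_t)*offset >= available_length)
        return 0;

    /* Copy the smaller of what was asked for and what is left. */
    remaining = available_length - (size_t)*offset;
    count = (user_length < remaining) ? user_length : remaining;

    memcpy(user_buffer, available_data + *offset, count);
    *offset += (off_t)count;    /* advance the caller's file pointer */

    return (ssize_t)count;
}
```

With 23 bytes available, successive 10-byte reads return 10, 10, and 3 bytes, and the next read returns 0 to signal EOF.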
 944
 945
 946 \subsection{Unregistering a device with {\tt fusd\_unregister()}}
 947
 948 All devices registered by a driver are unregistered automatically when
 949 the program exits (or crashes).  However, the {\tt fusd\_unregister()}
 950 function can be used to unregister a device without terminating the
 951 entire driver.  {\tt fusd\_unregister} takes one argument: a device
 952 handle (i.e., the return value from {\tt fusd\_register()}).
 953
 954 A device can be unregistered at any time.  Any client system calls
 955 that are pending when a device is unregistered will return immediately
 956 with an error.  In this case, {\tt errno} will be set to {\tt EPIPE}.
 957
 958
 959 \section{Using Information in {\tt fusd\_file\_info}}
 960
 961 \label{fusd-file-info}
 962
 963 We mentioned in the previous sections that the first argument to every
 964 callback is a pointer to a {\tt fusd\_file\_info} structure.  This
 965 structure contains information that can be useful to driver
 966 implementers in deciding how to respond to a system call request.
 967
 968 The fields of {\tt fusd\_file\_info} structures fall into several
 969 categories:
 970 \begin{itemize}
 971 \item {\em Read-only.}  The driver can inspect the value, but changing
 972 it will have no effect.
 973 \begin{itemize}
 974 \item {\tt pid\_t pid}: The process ID of the process making the
 975 request
 976 \item {\tt uid\_t uid}: The user ID of the owner of the process making
 977 the request
 978 \item {\tt gid\_t gid}: The group ID of the owner of the process making
 979 the request
 980 \end{itemize}
 981 \item {\em Read-write.}  Any changes to the value will be propagated
 982 back to the kernel and be written to the appropriate in-kernel
 983 structure.
 984 \begin{itemize}
 985 \item {\tt unsigned int flags}: A copy of the {\tt f\_flags} field in
 986 the kernel's {\tt file} structure.  The flags are an or'd-together set
 987 of the kernel's {\tt O\_} series of flags: {\tt O\_NONBLOCK}, {\tt
 988 O\_APPEND}, {\tt O\_SYNC}, etc.
 989 \item {\tt void *device\_info}: The data passed to {\tt
 990 fusd\_register} when the device was registered; see
 991 Section~\ref{device-info} for details
 992 \item {\tt void *private\_data}: A generic per-file-descriptor pointer
 993 usable by the driver for its own purposes, such as to keep state (or a
 994 pointer to state) that should be maintained between operations on the
 995 same instance of an open file.  It is guaranteed to be NULL when the
 996 file is first opened.  See Section~\ref{private-data} for more
 997 details.
 998 \end{itemize}
 999 \item {\em Hidden fields.}  The driver should not touch these fields
1000 (such as {\tt fd}).  They contain state used by the FUSD library to
1001 generate the reply sent to the kernel.
1002 \end{itemize}
1003
1004 {\bf Important note:} the value of the {\tt fusd\_file\_info} pointer
1005 itself has {\em no meaning}.  Repeated requests on the same file
1006 descriptor {\em will not} generate callbacks with identical {\tt
1007 fusd\_file\_info} pointer values, as would be the case with an
1008 in-kernel driver.  In other words, if a driver needs to keep state in
1009 between successive system calls on a user's file descriptor, it {\em
1010 must} store that state using the {\tt private\_data} field.  The {\tt
1011 fusd\_file\_info} pointer itself is ephemeral; the data to which it
1012 points is persistent.
1013
1014 Program~\ref{uid-filter.c} shows an example of how a driver might make
1015 use of the data in the {\tt fusd\_file\_info} structure.  Much of the
1016 driver is identical to helloworld.c.  However, instead of printing a
1017 static greeting, this new program generates a custom message each time
1018 the device file is read, as seen on line 25.  The message contains the
1019 PID of the user process that requested the read ({\tt file->pid}).
1020
1021 \begin{Program}
1022 \listinginput[5]{1}{uid-filter.c.example}
1023 \caption{uid-filter.c: Inspecting data in {\tt fusd\_file\_info} such
1024 as UID and PID of the calling process}
1025 \label{uid-filter.c}
1026 \end{Program}
1027
1028 In addition, Program~\ref{uid-filter.c}'s {\tt open} callback does not
1029 return 0 (success) unconditionally as it did in
1030 Program~\ref{helloworld.c}.  Instead, it checks (on line 7) to make
1031 sure the UID of the process trying to open the device ({\tt
1032 file->uid}) matches the UID under which the driver itself is running
1033 ({\tt getuid()}).  If they don't match, -EPERM is returned.  In other
1034 words, only the user who ran the driver is allowed to read from the
1035 device that it creates.  If any other user---including root!---tries
1036 to open it, a ``Permission denied'' error will be generated.
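
The check itself is a one-line comparison.  The sketch below reproduces it with a hypothetical {\tt struct file\_info\_sketch} that carries only the read-only identity fields listed earlier, so that it does not depend on {\tt fusd.h}.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the read-only identity fields of
 * fusd_file_info; the real structure is defined in fusd.h. */
struct file_info_sketch {
    pid_t pid;
    uid_t uid;
    gid_t gid;
};

/* Sketch of an open callback that admits only the user who is running
 * the driver.  Returns 0 on success or -EPERM otherwise. */
static int uid_filter_open_sketch(struct file_info_sketch *file)
{
    if (file->uid != getuid())
        return -EPERM;
    return 0;
}
```

Note that the comparison is against the driver's own UID, not against 0, which is why root is refused as well.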
1037
1038
1039 \subsection{Registration of Multiple Devices, and Passing Data to Callbacks}
1040
1041 \label{device-info}
1042
1043 Device drivers frequently expose several different ``flavors'' of a
1044 device.  For example, a single magnetic tape drive will often have
1045 many different device files in {\tt /dev}.  Each device file
1046 represents a different combination of options such as
1047 rewind/no-rewind, or compressed/uncompressed.  However, they access
1048 the same physical tape drive.
1049
1050 Traditionally, the device file's {\em minor number} was used to
1051 communicate the desired options with device drivers.  But, since devfs
1052 dynamically (and unpredictably) generates both major and minor numbers
1053 every time a device is registered, a different technique was
1054 developed.  When using devfs, drivers are allowed to associate a value
1055 (of type {\tt void *}) with each device they register.  This facility
1056 takes the place of the minor number.
1057
1058 The devfs solution is also used by FUSD.  The mysterious third
1059 argument to {\tt fusd\_register} that we mentioned in
1060 Section~\ref{using-fusd-register} is an arbitrary piece of data that
1061 can be passed to FUSD when a device is registered.  Later, when a
1062 callback is activated, the contents of that argument are available in
1063 the {\tt device\_info} member of the {\tt fusd\_file\_info} structure.
1064
1065 Program~\ref{drums.c} shows an example of this technique, inspired by
1066 Alessandro Rubini's similar devfs tutorial
1067 \htmladdnormallinkfoot{published in Linux
1068 Magazine}{http://www.linux.it/kerneldocs/devfs/}.  It creates a number
1069 of devices in the {\tt /dev/drums} directory, each of which is useful
1070 for generating a different kind of ``sound''---{\tt /dev/drums/bam},
1071 {\tt /dev/drums/boom}, and so on.  Reading from any of these devices
1072 will return a string equal to the device's name.
1073
1074 \begin{Program}
1075 \listinginput[5]{1}{drums.c.example}
1076 \caption{drums.c: Passing private data to {\tt fusd\_register} and
1077 retrieving it from {\tt device\_info}}
1078 \label{drums.c}
1079 \end{Program}
1080
1081 The first thing to notice about {\tt drums.c} is that it registers
1082 more than one FUSD device.  In the loop starting in line 31, it calls
1083 {\tt fusd\_register()} once for every device named in {\tt
1084 drums\_strings} on line 1.  When {\tt fusd\_run()} is called, it
1085 automatically watches every device the driver registered, and
1086 activates the callbacks associated with each device as needed.
1087 Although {\tt drums.c} uses the same set of callbacks for every device
1088 it registers (as can be seen on line 33), each device could have
1089 different callbacks if desired.  (Not shown is the initialization of
1090 {\tt drums\_fops}, which assigns {\tt drums\_read} to be the {\tt
1091 read} callback.)
1092
1093 If {\tt drums\_read} is called for all 6 types of drums, how does it
1094 know which device it's supposed to be servicing when it gets called?
1095 The answer is in the third argument of {\tt fusd\_register()}, which
1096 we were previously ignoring.  Whatever value is passed to {\tt
1097 fusd\_register()} will be passed back to the callback in the {\tt
1098 device\_info} field of the {\tt fusd\_file\_info} structure.  The name
1099 of the drum sound is passed to {\tt fusd\_register} on line 33, and
1100 later retrieved by the driver on line 12.
1101
1102 Although this example uses a string as its {\tt device\_info}, the
1103 pointer can be used for anything---a mode number, a pointer to a
1104 configuration structure, and so on.
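
As an illustration of the configuration-structure case, the sketch below models the tape-drive example from the start of this section.  All names are hypothetical, including the device paths and {\tt struct file\_info\_sketch}; in a real driver, a pointer to one element of {\tt tape\_modes} would be passed as the third argument of {\tt fusd\_register} for each device.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one field of fusd_file_info used here. */
struct file_info_sketch {
    void *device_info;
};

/* Hypothetical per-device configuration. */
struct tape_mode_sketch {
    const char *name;
    int rewind_on_close;
    int compressed;
};

/* One entry per device file; each device would be registered with a
 * pointer to its own entry. */
static struct tape_mode_sketch tape_modes[] = {
    { "/dev/tape/rewind",      1, 0 },
    { "/dev/tape/norewind",    0, 0 },
    { "/dev/tape/rewind-comp", 1, 1 },
};

static int rewind_count = 0;    /* stands in for real tape motion */

/* A single close callback serves every device and consults
 * device_info to decide how to behave. */
static int tape_close_sketch(struct file_info_sketch *file)
{
    const struct tape_mode_sketch *mode = file->device_info;

    if (mode->rewind_on_close)
        rewind_count++;
    return 0;
}
```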
1105
1106
1107 \subsection{The difference between {\tt device\_info} and {\tt
1108 private\_data}}
1109
1110 \label{private-data}
1111
1112 As we mentioned in Section~\ref{fusd-file-info}, the {\tt
1113 fusd\_file\_info} structure has two seemingly similar fields, both of
1114 which can be used by drivers to store their own data: {\tt
1115 device\_info} and {\tt private\_data}.  However, there is an important
1116 difference between them:
1117
1118 \begin{itemize}
1119
1120 \item {\tt private\_data} is stored {\em per file descriptor}.  If 20
1121 processes open a FUSD device (or, one process opens a FUSD device 20
1122 times), each of those 20 file descriptors will have its own copy of
1123 {\tt private\_data} associated with it.  This field is therefore
1124 useful to drivers that need to differentiate multiple requests to a
1125 single device that might be serviced in parallel.  (Note that most
1126 UNIX variants, including Linux, do allow multiple processes to share a
1127 single file descriptor---specifically, if a process {\tt open}s a
1128 file, then {\tt fork}s.  In this case, processes will also share a
1129 single copy of {\tt private\_data}.)
1130
1131 The first time a FUSD driver sees {\tt private\_data} (in the {\tt
1132 open} callback), it is guaranteed to be NULL.  Any changes to it by a
1133 driver callback will only affect the state associated with that single
1134 file descriptor.
1135
1136 \item {\tt device\_info} is kept {\em per device}.  That is, {\em all}
1137 clients of a device share a {\em single} copy of {\tt device\_info}.
1138 Unlike {\tt private\_data}, which is always initialized to NULL, {\tt
1139 device\_info} is always initialized to whatever value the driver
1140 passed to {\tt fusd\_register} as described in the previous section.
1141 If a callback changes the copy of {\tt device\_info} in the {\tt
1142 fusd\_file\_info} structure, this has no effect; {\tt device\_info}
1143 can only be set at registration time, with {\tt fusd\_register}.
1144
1145 \end{itemize}
1146
1147 In short, {\tt device\_info} is used to differentiate {\em devices}.
1148 {\tt private\_data} is used to differentiate {\em users of those
1149 devices}.
1150
1151 Program~\ref{drums2.c}, drums2.c, illustrates the difference between
1152 {\tt device\_info} and {\tt private\_data}.  Like the original
1153 drums.c, it creates a bunch of devices in {\tt /dev/drums/}, each of
1154 which ``plays'' a different sound.  However, it also does something
1155 new: keeps track of how many times each device has been opened.  Every
1156 read to any drum gives you the name of its sound as well as your
1157 unique ``user number''.  And, instead of returning just a single line
1158 (as drums.c did), it will keep generating more ``sound'' every time a
1159 {\tt read()} system call arrives.
1160
1161 \begin{Program}
1162 \listinginput[5]{1}{drums2.c.example}
1163 \caption{drums2.c: Using both {\tt device\_info} and {\tt private\_data}}
1164 \label{drums2.c}
1165 \end{Program}
1166
1167 The trick is that we want to keep users separate from each other.  For
1168 example, user one might type:\\
1169 \begin{minipage}{\textwidth}
1170 \vspace{\baselineskip}
1171 \begin{verbatim}
1172 % more /dev/drums/bam
1173 You are user 1 to hear a drum go 'bam'!
1174 You are user 1 to hear a drum go 'bam'!
1175 You are user 1 to hear a drum go 'bam'!
1176 ...
1177
1178 \end{verbatim}
1179 \end{minipage}
1180
1181 Meanwhile, another user in a different shell might type the same
1182 command at the same time, and get different results:\\
1183 \begin{minipage}{\textwidth}
1184 \vspace{\baselineskip}
1185 \begin{verbatim}
1186 % more /dev/drums/bam
1187 You are user 2 to hear a drum go 'bam'!
1188 You are user 2 to hear a drum go 'bam'!
1189 You are user 2 to hear a drum go 'bam'!
1190 ...
1191
1192 \end{verbatim}
1193 \end{minipage}
1194
1195 The idea is that no matter how long those two users go on reading
1196 their devices, the driver always generates a message that is specific
1197 to that user.  The two users' data are not intermingled.
1198
1199 To implement this, Program~\ref{drums2.c} introduces a new {\tt
1200 drum\_info} structure (lines 1--4), which keeps track of both the
1201 drum's name, and the number of times each drum device has been opened.
1202 An instance of this structure, {\tt drums}, is initialized on lines
1203 4--8.  Note that the call to {\tt fusd\_register} (line 45) now passes
1204 a pointer to a {\tt drum\_info} structure.  (This {\tt drum\_info *}
1205 pointer is shared by every instance of a client that opens a
1206 particular type of drum.)
1207
1208 Each time a drum device is opened, its {\tt drum\_info} structure is
1209 retrieved from {\tt device\_info} (line 15).  Then, on line 18, the
1210 {\tt num\_users} field is incremented and the new user number is
1211 stored in {\tt fusd\_file\_info}'s {\tt private\_data} field.  To
1212 reiterate our earlier point: {\em {\tt device\_info} contains
1213 information global to all users of a device, while {\tt private\_data}
1214 has information specific to a particular user of the device.}
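
The division of labor between the two fields can be sketched as follows.  The structures are hypothetical stand-ins, and storing the user number directly in the pointer by casting through {\tt intptr\_t} is only one possible approach; it is an assumption of this sketch, not a statement about how drums2.c stores it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two fields of fusd_file_info used here. */
struct file_info_sketch {
    void *device_info;      /* shared by all opens of one device */
    void *private_data;     /* distinct for each open */
};

/* Hypothetical per-device record. */
struct drum_info_sketch {
    const char *name;
    int num_users;
};

/* Open: bump the per-device counter and remember this opener's number
 * in the per-open field. */
static int drums2_open_sketch(struct file_info_sketch *file)
{
    struct drum_info_sketch *drum = file->device_info;

    drum->num_users++;
    file->private_data = (void *)(intptr_t)drum->num_users;
    return 0;
}

/* Build the message a read would return for this particular opener.
 * Returns the value of snprintf(). */
static int drums2_message_sketch(const struct file_info_sketch *file,
                                 char *out, size_t out_size)
{
    const struct drum_info_sketch *drum = file->device_info;
    int user_number = (int)(intptr_t)file->private_data;

    return snprintf(out, out_size,
                    "You are user %d to hear a drum go '%s'!\n",
                    user_number, drum->name);
}
```

Two opens of the same device see the same {\tt drum\_info\_sketch} record but receive different user numbers.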
1215
1216 It's also worthwhile to note that when we increment {\tt num\_users}
1217 on line 18, a simple {\tt num\_users++} is correct.  If this were a
1218 driver inside the kernel, we'd have to use something like {\tt
1219 atomic\_inc()} because a plain {\tt i++} is not atomic.  Such a
1220 non-atomic statement will result in a race condition on SMP platforms,
1221 if an interrupt handler also touches {\tt num\_users}, or in some
1222 future Linux kernel that is preemptive.  Since this FUSD driver is
1223 just a plain, single-threaded user-space application, good old {\tt
1224 ++} still works.
1225
1226
1227 \section{Writing {\tt ioctl} Callbacks}
1228
1229 The POSIX API provides for a function called {\tt ioctl}, which allows
1230 ``out-of-band'' configuration information to be passed to a device
1231 driver through a file descriptor.  Using FUSD, you can write a device
1232 driver with a callback to handle {\tt ioctl} requests from clients.
1233 For the most part, it's just like writing a callback for {\tt read} or
1234 {\tt write}, as we've seen in previous sections.  From the client's
1235 point of view, {\tt ioctl} traditionally takes three arguments: a file
1236 descriptor, a command number, and a pointer to any additional data
1237 that might be required for the command.
1238
1239 \subsection{Using macros to generate {\tt ioctl} command numbers}
1240
1241 The Linux header file {\tt /usr/include/asm/ioctl.h} defines macros
1242 that {\em must} be used to create the {\tt ioctl} command number.
1243 These macros take various combinations of three arguments:
1244
1245 \begin{itemize}
1246
1247 \item {\tt type}---an 8-bit integer selected to be specific to the
1248 device driver.  {\tt type} should be chosen so as not to conflict with
1249 other drivers that might be ``listening'' to the same file descriptor.
1250 (Inside the kernel, for example, the TCP and IP stacks use distinct
1251 numbers since an {\tt ioctl} sent to a socket file descriptor might be
1252 examined by both stacks.)
1253
1254 \item {\tt number}---an 8-bit integer ``command number.''  Within a
1255 driver, distinct numbers should be chosen for each different kind of
1256 {\tt ioctl} command that the driver services.
1257
1258 \item {\tt data\_type}---The name of a type used to compute how many
1259 bytes are exchanged between the client and the driver.  This argument
1260 is, for example, the name of a structure.
1261
1262 \end{itemize}
1263
1264 The macros used to generate command numbers are:
1265
1266 \begin{itemize}
1267
1268 \item {\tt \_IO(int type, int number)} -- used for a simple ioctl that
1269 sends nothing but the type and number, and receives back nothing but
1270 an integer return value.
1271
1272 \item {\tt \_IOR(int type, int number, data\_type)} -- used for an
1273 ioctl that reads data {\em from} the device driver.  The driver will
1274 be allowed to return {\tt sizeof(data\_type)} bytes to the user.
1275
\item {\tt \_IOW(int type, int number, data\_type)} -- similar to
{\tt \_IOR}, but used to write data {\em to} the driver.

\item {\tt \_IOWR(int type, int number, data\_type)} -- a combination
of {\tt \_IOR} and {\tt \_IOW}.  That is, data is both written to the
driver and then read back from the driver by the client.
1282 \end{itemize}
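
As a rough illustration of how these macros fit together, the
following sketch defines four command numbers and two small helpers
that decode them again.  The names {\tt DEMO\_TYPE}, {\tt struct
demo\_args}, and the {\tt DEMO\_} commands are hypothetical and are not
taken from the FUSD examples.  The sketch assumes a Linux system where
{\tt <linux/ioctl.h>} provides the macros; note that the Linux headers
spell the combined read/write macro {\tt \_IOWR}.

\begin{verbatim}
#include <linux/ioctl.h>  /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical payload exchanged between client and driver. */
struct demo_args {
    char in[16];
    char out[16];
};

#define DEMO_TYPE  0xD3   /* hypothetical driver-specific "type" byte */

#define DEMO_RESET _IO(DEMO_TYPE, 1)                     /* no data          */
#define DEMO_GET   _IOR(DEMO_TYPE, 2, struct demo_args)  /* driver -> client */
#define DEMO_SET   _IOW(DEMO_TYPE, 3, struct demo_args)  /* client -> driver */
#define DEMO_SWAP  _IOWR(DEMO_TYPE, 4, struct demo_args) /* both directions  */

/* Returns 1 if cmd carries this driver's type byte, 0 otherwise. */
static int demo_owns(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == DEMO_TYPE;
}

/* Number of bytes the command number says will be exchanged. */
static unsigned int demo_payload_size(unsigned int cmd)
{
    return _IOC_SIZE(cmd);
}
\end{verbatim}

Because the type, number, direction, and size are all packed into the
command number, a driver can reject commands that were meant for some
other driver before it touches the data buffer.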
1283
1284 \begin{Program}
1285 \listinginput[5]{1}{ioctl.h.example}
1286 \caption{ioctl.h: Using the {\tt \_IO} macros to generate {\tt ioctl}
1287 command numbers}
1288 \label{ioctl.h}
1289 \end{Program}
1290
1291 Program~\ref{ioctl.h} is an example header file showing the use of
1292 these macros.  In real programs, the client executing an ioctl and the
1293 driver that services it must share the same header file.
1294
1295 \subsection{Example client calls and driver callbacks}
1296
1297 Program~\ref{ioctl-client.c} shows a client program that executes {\tt
1298 ioctl}s using the ioctl command numbers defined in
1299 Program~\ref{ioctl.h}.  The {\tt ioctl\_data\_t} is
1300 application-specific; our simple test program defines it as a
1301 structure containing two arrays of characters.  The first {\tt ioctl}
1302 call (line 10) sends the command {\tt IOCTL\_TEST3}, which retrieves
1303 strings {\em from} the driver.  The second {\tt ioctl} uses the
1304 command {\tt IOCTL\_TEST4} (line 18), which sends strings {\em to} the
1305 driver.
1306
1307 \begin{Program}
1308 \listinginput[5]{1}{ioctl-client.c.example}
1309 \caption{ioctl-client.c: A program that makes {\tt ioctl} requests on
1310 a file descriptor}
1311 \label{ioctl-client.c}
1312 \end{Program}
1313
1314 The portion of the FUSD driver that services these calls is shown in
1315 Program~\ref{ioctl-server.c}.
1316
1317 \begin{Program}
1318 \listinginput[5]{1}{ioctl-server.c.example}
1319 \caption{ioctl-server.c: A driver that handles {\tt ioctl} requests}
1320 \label{ioctl-server.c}
1321 \end{Program}
1322
1323 The ioctl example header file and test programs shown in this document
1324 (Programs~\ref{ioctl.h}, \ref{ioctl-client.c}, and
1325 \ref{ioctl-server.c}) are actually contained in a larger, single
1326 example program included in the FUSD distribution called {\tt
1327 ioctl.c}.  That source code shows other variations on calling and
1328 servicing {\tt ioctl} commands.
1329
1330
1331 \section{Integrating FUSD With Your Application Using {\tt fusd\_dispatch()}}
1332 \label{selecting}
1333
1334 The example applications we've seen so far have something in common:
1335 after initialization and device registration, they call {\tt
1336 fusd\_run()}.  This gives up control of the program's flow, turning it
1337 over to the FUSD library instead.  This worked fine for our simple
1338 example programs, but doesn't work in a real program that needs to
1339 wait for events other than FUSD callbacks.  For this reason, our
1340 framework provides another way to activate callbacks that does not
1341 require the driver to give up control of its {\tt main()}.
1342
1343 \subsection{Using {\tt fusd\_dispatch()}}
1344
1345 Recall from Section~\ref{using-fusd-register} that {\tt
1346 fusd\_register} returns a {\em file descriptor} for every device that
1347 is successfully registered.  This file descriptor can be used to
1348 activate device callbacks ``manually,'' without passing control of the
1349 application to {\tt fusd\_run()}.  Whenever the file descriptor
1350 becomes readable according to {\tt select(2)}, it should be passed to
1351 {\tt fusd\_dispatch()}, which in turn will activate callbacks in the
1352 same way that {\tt fusd\_run()} does.  In other words, an application
1353 can:
1354 \begin{enumerate}
1355 \item Save the file descriptors returned by {\tt fusd\_register()};
1356 \item Add those FUSD file descriptors to an {\tt fd\_set} that is
1357 passed to {\tt select}, along with any other file
1358 descriptors that might be interesting to the application; and
1359 \item Pass every FUSD file descriptor that {\tt select} indicates is
1360 readable to {\tt fusd\_dispatch}.
1361 \end{enumerate}
1362
1363 {\tt fusd\_dispatch()} returns 0 if at least one callback was
1364 successfully activated.  On error, -1 is returned with {\tt errno} set
1365 appropriately.  {\tt fusd\_dispatch()} will never block---if no
1366 messages are available from the kernel, it will return -1 with {\tt
1367 errno} set to {\tt EAGAIN}.
1368
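The three steps listed earlier in this section can be sketched as a
single pass of a {\tt select} loop.  This is a minimal sketch, not code
from the FUSD distribution: so that it compiles without the FUSD
library, the dispatch call is passed in as a function pointer, and
{\tt demo\_dispatch}, {\tt dispatch\_fn}, and {\tt wait\_and\_dispatch}
are hypothetical names.  A real driver would pass {\tt fusd\_dispatch}
instead, assuming it has the {\tt int}-returning form described above.

\begin{verbatim}
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Stand-in type for the dispatch call (fusd_dispatch in a real driver). */
typedef int (*dispatch_fn)(int fd);

static int dispatched;   /* number of dispatch calls made so far */

/* Hypothetical dispatcher used only to exercise the loop: eats one byte. */
static int demo_dispatch(int fd)
{
    char c;
    dispatched++;
    return read(fd, &c, 1) == 1 ? 0 : -1;
}

/*
 * Block until a driver fd or other_fd is readable, then dispatch every
 * readable driver fd.  Returns 1 if other_fd is readable, 0 if it is
 * not, and -1 if select() or a dispatch call fails.
 */
static int wait_and_dispatch(const int *fds, int nfds, int other_fd,
                             dispatch_fn dispatch)
{
    fd_set readable;
    int max = other_fd;
    int i;

    FD_ZERO(&readable);
    FD_SET(other_fd, &readable);
    for (i = 0; i < nfds; i++) {        /* step 2: build the fd_set */
        FD_SET(fds[i], &readable);
        if (fds[i] > max)
            max = fds[i];
    }

    if (select(max + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    for (i = 0; i < nfds; i++) {        /* step 3: dispatch readable fds */
        if (FD_ISSET(fds[i], &readable) &&
            dispatch(fds[i]) < 0 && errno != EAGAIN)
            return -1;                  /* EAGAIN only means "nothing to do" */
    }
    return FD_ISSET(other_fd, &readable) ? 1 : 0;
}
\end{verbatim}

The caller keeps ownership of its main loop: it can call this function
repeatedly and handle {\tt other\_fd} itself whenever the return value
is 1.
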
1369 \subsection{Helper Functions for Constructing an {\tt fd\_set}}
1370
1371 The FUSD library provides two (optional) utility functions that can
1372 make it easier to write applications that integrate FUSD into their
1373 own {\tt select()} loops.  Specifically:
1374 \begin{itemize}
1375 \item {\tt void fusd\_fdset\_add(fd\_set *set, int *max)}---is meant
1376 to help construct an {\tt fd\_set} that will be passed as the
1377 ``readable fds'' set to select.  This function adds the file
1378 descriptors of all previously registered FUSD devices to the fd\_set
1379 {\tt set}.  It assumes that {\tt set} has already been initialized by
1380 the caller.  The integer {\tt max} is updated to reflect the largest
1381 file descriptor number in the set.  {\tt max} is not changed if the
1382 value passed to {\tt fusd\_fdset\_add} is already larger than the
1383 largest FUSD file descriptor added to the set.
1384
1385 \item {\tt void fusd\_dispatch\_fdset(fd\_set *set)}---is meant to be
1386 called on the {\tt fd\_set} that is {\em returned} by select.  It
assumes that {\tt set} contains a set of file descriptors that {\tt
1388 select()} has indicated are readable.  {\tt fusd\_dispatch\_fdset()}
1389 calls {\tt fusd\_dispatch} on every descriptor in {\tt set} that is a
1390 valid FUSD descriptor.  Non-FUSD descriptors in {\tt set} are
1391 ignored.
1392 \end{itemize}
1393
1394
1395 \begin{Program}
1396 \listinginput[5]{1}{drums3.c.example}
1397 \caption{drums3.c: Waiting for both FUSD and non-FUSD events in a
1398 {\tt select} loop}
1399 \label{drums3.c}
1400 \end{Program}
1401
1402 The excerpt of {\tt drums3.c} shown in Program~\ref{drums3.c}
1403 demonstrates the use of these helper functions.  This program is
similar to the earlier {\tt drums.c} example: it creates a number of musical
instruments such as {\tt /dev/drums/bam} and {\tt /dev/drums/boom}.
However, in addition to servicing its musical callbacks, the driver
also prints a prompt to standard output asking how ``loud'' the drums
1408 should be.  Instead of turning control of {\tt main()} over to {\tt
1409 fusd\_run()} as in the previous examples, {\tt drums3} uses {\tt
1410 select()} to simultaneously watch its FUSD file descriptors and standard
1411 input.  It responds to input from both sources.
1412
1413 On lines 2--5, an {\tt fd\_set} and its associated ``max'' value are
1414 initialized to contain stdin's file descriptor.  On line 9, we use
1415 {\tt fusd\_fdset\_add} to add the FUSD file descriptors for all
1416 registered devices.  (Not shown in this excerpt is the device
1417 registration, which is the same as the registration code we saw in
1418 {\tt drums.c}.)  On line 13 we call select, which blocks until one of
1419 the fd's in the set is readable.  On lines 17 and 18, we check to see
1420 if standard input is readable; if so, a function is called which reads
1421 the user's response from standard input and prints a new prompt.
1422 Finally, on line 21, we call {\tt fusd\_dispatch\_fdset}, which in
1423 turn will activate the callbacks for devices that have pending system
1424 calls waiting to be serviced.
1425
1426 It's worth reiterating that drivers are not required to use the FUSD
1427 helper functions {\tt fusd\_fdset\_add} and {\tt
1428 fusd\_dispatch\_fdset}.  If it's more convenient, a driver can
1429 manually save all of the file descriptors returned by {\tt
1430 fusd\_register}, construct its own {\tt fd\_set}, and then call {\tt
1431 fusd\_dispatch} on each descriptor that is readable.  This method is
1432 sometimes required for integration with other frameworks that want to
1433 take over your {\tt main()}.  For example, the
1434 \htmladdnormallinkfoot{GTK user interface
1435 framework}{http://www.gtk.org} is event-driven and requires that you
1436 pass control of your {\tt main} to it.  However, it does allow you to
1437 give it a file descriptor and a function pointer, saying ``Call this
1438 callback when {\tt select} indicates this file descriptor has become
1439 readable.''  A GTK application that implements FUSD devices can work
1440 by giving GTK all the FUSD file descriptors individually, and calling
1441 {\tt fusd\_dispatch()} when GTK calls the associated callbacks.
1442
1443
1444
1445 \section{Implementing Blocking System Calls}
1446
1447 All of the example drivers that we've seen until now have had an
1448 important feature missing: they never had to {\em wait} for anything.
So far, a driver's response to a system call has always been
immediately available, so the driver could reply right away.
1451 However, real devices are often not that lucky: they usually have to
1452 wait for something to happen before completing a client's system call.
1453 For example, a driver might be waiting for data to arrive from the
1454 serial port or over the network, or even waiting for a user action.
1455
1456 In situations like this, a basic capability most device drivers must
1457 have is the ability to {\em block} the caller.  Blocking operations
1458 are important because they provide a simple interface to user programs
1459 that does flow control, rather than something more expensive like
1460 continuous polling.  For example, user programs expect to be able to
1461 execute a statement like {\tt read(fd, buf, sizeof(buf))}, and expect
1462 the read call to block (stop the flow of the calling program) until
1463 data is available.  This is much simpler and more efficient than
1464 polling repeatedly.
1465
1466 In the following sections, we'll describe how to block and unblock
1467 system calls for devices that use FUSD.
1468
1469
1470 \subsection{Blocking the caller by blocking the driver}
1471
1472 The easiest (but least useful) way to block a client's system call is
1473 simply to block the driver, too.  For example, consider
1474 Program~\ref{console-read.c}, which implements a device called {\tt
1475 /dev/console-read}.  Whenever a process tries to read from this
device, the driver prints a prompt to standard output, asking for a
1477 reply.  (The prompt appears in the shell the driver was run in, not
1478 the shell that's trying to read from the device.)  When the user
1479 enters a line of text, the response is returned to the client that did
1480 the original {\tt read()}.  By blocking the driver waiting for the
1481 reply, the client that issued the system call is blocked as well.
1482
1483 \begin{Program}
1484 \listinginput[5]{1}{console-read.c.example}
1485 \caption{console-read.c: A simple blocking system call}
1486 \label{console-read.c}
1487 \end{Program}
1488
1489 Blocking the driver this way is safe---unlike programming in the
1490 kernel proper, where doing something like this would block the entire
1491 system.  It's also easy to implement, as seen from the example above.
1492 However, it makes the driver unresponsive to system call requests that
1493 might be coming from other clients.  If another process tries to do
1494 anything at all with a blocked driver's device---even an {\tt
1495 open()}---it will block until the driver wakes up again.  This
1496 limitation makes blocking drivers inappropriate for any device driver
1497 that expects to service more than one client at a time.
1498
1499
1500 \subsection{Blocking the caller using {\tt -FUSD\_NOREPLY};
1501 unblocking it using {\tt fusd\_return()}}
1502 \label{fusd-noreply}
1503
1504 If a device driver expects more than one client at a time---as is
1505 often the case---a slightly different programming model is needed for
1506 system calls that can potentially block.  Instead of blocking, the
1507 driver immediately sends a message to the FUSD framework that says, in
1508 essence, ``Don't unblock the client that issued this system call, but
1509 continue sending additional system call requests that might be coming
1510 from other clients.''  Driver callbacks can send this message to FUSD
1511 by returning the special value {\tt -FUSD\_NOREPLY} instead of a
1512 normal system call return value.
1513
1514 Before a callback blocks the caller by returning {\tt -FUSD\_NOREPLY},
1515 it must save the {\tt fusd\_file\_info} pointer that was provided to
1516 the callback as its first argument.  Later, when an event occurs which
1517 allows the client's blocked system call to complete, the driver should
1518 call {\tt fusd\_return()}, which will unblock the calling process and
1519 complete its system call.  {\tt fusd\_return()} takes two arguments:
1520 \begin{itemize}
1521 \item The {\tt fusd\_file\_info} pointer that the callback saved
1522 earlier; and
1523 \item The system call's return value (in other words, the value that
1524 would have been returned by the callback function had it not returned
1525 {\tt -FUSD\_NOREPLY}).  FUSD itself {\em does not} examine the return
1526 value passed as the second argument to {\tt fusd\_return}; it simply
1527 propagates that value back to the kernel as the return value of the
1528 blocked system call.
1529 \end{itemize}
1530
1531 Drivers should never call {\tt fusd\_return} more than once on a
1532 single {\tt fusd\_file\_info} pointer.  Doing so will have undefined
1533 results, similar to calling {\tt free()} twice on the same pointer.
1534
It also bears repeating that a callback can {\em either} call
{\tt fusd\_return()} explicitly {\em or} return a normal return value (i.e.,
not {\tt -FUSD\_NOREPLY}), but not both.
1538
1539 {\tt -FUSD\_NOREPLY} and {\tt fusd\_return()} make it easy for a
1540 driver to block a process, then unblock it later when data becomes
1541 available.  When the callback returns {\tt -FUSD\_NOREPLY}, the driver
1542 is freed up to wait for other events, even though the process making
1543 the system call is still blocked.  The driver can then wait for
1544 something to happen that unblocks the original caller---for example,
1545 another FUSD event, data from a serial port, or data from the network.
1546 (Recall from Section~\ref{selecting} that a FUSD driver can
1547 simultaneously wait for both FUSD and non-FUSD events.)
1548
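The bookkeeping this model requires is small.  The sketch below shows
one way to save a blocked call and complete it exactly once.  It is
written against stand-ins so that it compiles without the FUSD
library: {\tt struct demo\_call} plays the role of {\tt
fusd\_file\_info}, {\tt demo\_return} the role of {\tt fusd\_return},
and {\tt DEMO\_NOREPLY} (with an arbitrary value) the role of {\tt
FUSD\_NOREPLY}.  All of these names are hypothetical.

\begin{verbatim}
#include <stddef.h>

#define DEMO_NOREPLY 0x1000   /* stand-in for FUSD_NOREPLY; value is made up */

struct demo_call {            /* stand-in for struct fusd_file_info */
    int completed;            /* set once the reply has been sent */
    int retval;               /* value handed back to the client */
};

static struct demo_call *pending;   /* the one blocked call, or NULL */

/* Stand-in for fusd_return(): deliver the reply to the client. */
static void demo_return(struct demo_call *call, int retval)
{
    call->retval = retval;
    call->completed = 1;
}

/* Read callback: nothing to return yet, so park the call. */
static int demo_read(struct demo_call *call)
{
    pending = call;           /* save the pointer before returning */
    return -DEMO_NOREPLY;
}

/* Called later, when the event the client was waiting for occurs. */
static void demo_event(int retval)
{
    if (pending == NULL)
        return;               /* nobody is blocked */
    demo_return(pending, retval);
    pending = NULL;           /* forget the pointer: never reply twice */
}
\end{verbatim}

Clearing the saved pointer immediately after the reply is one simple
way to respect the rule that a call must not be completed twice.
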
1549 FUSD includes an example program, {\tt pager.c}, which demonstrates
1550 these techniques.  The pager driver implements a simple notification
1551 interface which lets any number of ``waiters'' wait for a signal from
1552 a ``notifier.''  All the waiters wait by trying to read from {\tt
1553 /dev/pager/notify}.  Those reads will block until a notifier writes
1554 the string {\tt page} to {\tt /dev/pager/input}.  It's easy to try
1555 the application out---run the driver, and then open three other
1556 shells.  In two of them, type {\tt cat /dev/pager/notify}.  The reads
1557 will block.  Then, in the third shell, type {\tt echo page >
1558 /dev/pager/input}---the other two shells should become unblocked.
1559
1560 Let's take a look at how this application is implemented, step by
1561 step.
1562
1563 \subsubsection{Keeping Per-Client State}
1564
1565 The first thing to notice about {\tt pager.c} is that it keeps {\em
1566 per-client state}.  That is, for every file descriptor open to the
1567 driver, a structure is allocated that has information relating to that
1568 file descriptor.  Previous driver examples were, for the most part,
1569 {\em reactive}---they received requests, and immediately generated
1570 responses.  Since there was never more than one request outstanding,
1571 there was no need to keep a list of them.  The pager application is
1572 the first one that must keep track of an arbitrary number of requests
1573 that might be outstanding at the same time.  The first excerpt of {\tt
1574 pager.c}, which appears in Program~\ref{pager-open.c}, shows the code
1575 which creates this per-client state.  Lines 1--6 define a structure,
1576 {\tt pager\_client}, which keeps all the information we need about
1577 each client attached to the driver.  The {\tt open} callback for {\tt
1578 /dev/pager/notify}, shown on lines 12--31, allocates memory for an
1579 instance of this structure and adds it to a linked list.  (If the
1580 memory allocation fails, an error is returned to the client on line
1581 18; this will prevent the file from opening.)  Note on line 25 that we
1582 use the {\tt private\_data} field to store a pointer to the client
1583 state; this allows the structure to be retrieved when later callbacks
1584 on this file descriptor arrive.  The memory is deallocated when the
1585 file is closed; we'll see that in a later section.
1586
1587 \begin{Program}
1588 \listinginput[5]{1}{pager-open.c.example}
1589 \caption{pager.c (Part 1): Creating state for every client using the
1590 driver}
1591 \label{pager-open.c}
1592 \end{Program}
1593
1594 Another thing to notice about the open callback is the use of the {\tt
1595 last\_page\_seen} variable.  The driver gives a sequence number to
1596 every page it receives; {\tt last\_page\_seen} stores the number of
1597 the most recent page seen by a client.  When a new client arrives
1598 (i.e., it opens {\tt /dev/pager/notify}), its {\tt last\_page\_seen}
1599 state is set equal to the page that has most recently arrived; this
1600 forces a new client to wait for the {\em next} page, rather than
1601 immediately being notified of a page that has arrived in the past.
1602
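The same idea can be sketched independently of FUSD.  The code below
is not the code from {\tt pager.c}; {\tt struct demo\_client}, {\tt
demo\_open}, and {\tt demo\_close} are hypothetical names, and the
callbacks are reduced to plain functions so the example compiles
without the FUSD library.

\begin{verbatim}
#include <stdlib.h>

struct demo_client {
    int last_page_seen;            /* sequence number of newest page seen */
    struct demo_client *next;
};

static struct demo_client *clients;   /* head of the list of open clients */
static int last_page;                 /* sequence number of the newest page */

/* What an open callback might do: returns the new state, or NULL. */
static struct demo_client *demo_open(void)
{
    struct demo_client *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;               /* caller would fail the open */
    c->last_page_seen = last_page; /* make the client wait for the next page */
    c->next = clients;
    clients = c;
    return c;                      /* a driver would keep this in private_data */
}

/* What a close callback might do: unlink the state and free it. */
static void demo_close(struct demo_client *c)
{
    struct demo_client **p;
    for (p = &clients; *p != NULL; p = &(*p)->next) {
        if (*p == c) {
            *p = c->next;
            free(c);
            return;
        }
    }
}
\end{verbatim}
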
1603 \subsubsection{Blocking and completing reads}
1604
1605 The next part of {\tt pager.c} is shown in Program~\ref{pager-read.c}.
1606 The {\tt pager\_notify\_read} function seen on line 1 is registered as
1607 the {\tt read} callback for the {\tt /dev/pager/notify} device.  It
1608 blocks the read request using the technique we described earlier: it
1609 stores the {\tt fusd\_file\_info} pointer in that client's state
1610 structure, and returns {\tt -FUSD\_NOREPLY}.  (Note that the pointer
1611 to the client's state structure comes from the {\tt private\_data}
1612 field of {\tt fusd\_file\_info}, where the open callback stored it.)
1613
1614 \begin{Program}
1615 \listinginput[5]{1}{pager-read.c.example}
\caption{pager.c (Part 2): Blocking clients' {\tt read} requests, and later
completing the blocked reads}
1618 \label{pager-read.c}
1619 \end{Program}
1620
1621
1622 {\tt pager\_notify\_complete\_read} {\em unblocks} previously blocked
1623 reads.  This function first checks to see that there is, in fact, a blocked
1624 read (line 19).  It then checks to see if a page has arrived that the
1625 client hasn't seen yet (line 23).  Finally, it updates the client
1626 state and unblocks the blocked read by calling {\tt fusd\_return}.
1627 Note the second argument to {\tt fusd\_return} is a 0; as we
1628 saw in Section~\ref{read-callback}, a 0 return value to a {\tt read}
1629 system call means EOF.  (The system call will be unblocked regardless
1630 of the return value.)
1631
1632 {\tt pager\_notify\_complete\_read} is called every time a new page
1633 arrives.  New pages are processed by {\tt pager\_input\_write} (line
1634 34), which is the {\tt write} callback for {\tt /dev/pager/input}.
1635 After recording the fact that a new page has arrived, it calls {\tt
1636 pager\_notify\_complete\_read} for each client that has an open file
1637 descriptor.  This will complete the reads of any clients who have not
1638 yet seen this new data, and have no effect on clients that don't have
1639 outstanding reads.
1640
There is another interesting point to notice about {\tt
pager\_notify\_read}.  On line 12, after it stores the blocked system
call's pointer but before it returns {\tt -FUSD\_NOREPLY}, it calls
the completion function.  This has the effect of returning any data
that might already be available back to the caller immediately.  If
that happens, we will end up calling {\tt fusd\_return} {\em before}
we return {\tt -FUSD\_NOREPLY}.  This probably seems strange, but it's
legal.  Recall that a callback can call {\tt fusd\_return()} explicitly {\em
or} return a normal (not {\tt -FUSD\_NOREPLY}) return value, but not
both; the order of the {\tt fusd\_return} call and the {\tt
-FUSD\_NOREPLY} return doesn't matter.
1651
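The control flow described in this section is compact enough to model
in a few lines.  The following is a simplified sketch rather than the
actual {\tt pager.c} code: the clients live in an array instead of a
linked list, a counter stands in for the call to {\tt fusd\_return},
and every {\tt demo\_} name, along with the value of {\tt
DEMO\_NOREPLY}, is hypothetical.

\begin{verbatim}
#include <stddef.h>

#define DEMO_NOREPLY 0x1000   /* stand-in for FUSD_NOREPLY; value is made up */

struct demo_client {
    int last_page_seen;       /* newest page this client has been told about */
    int read_blocked;         /* nonzero while a read is waiting */
    int replies;              /* number of reads completed so far */
};

static int last_page;         /* sequence number of the newest page */

/* Finish the blocked read, but only if a new page has arrived. */
static void demo_complete_read(struct demo_client *c)
{
    if (!c->read_blocked)
        return;               /* no read is waiting */
    if (c->last_page_seen >= last_page)
        return;               /* nothing new for this client */
    c->last_page_seen = last_page;
    c->read_blocked = 0;
    c->replies++;             /* a driver would call fusd_return(file, 0) here */
}

/* The read callback on the notify device. */
static int demo_notify_read(struct demo_client *c)
{
    c->read_blocked = 1;      /* a driver would save the fusd_file_info pointer */
    demo_complete_read(c);    /* reply at once if a page is already waiting */
    return -DEMO_NOREPLY;
}

/* The write callback on the input device: a new page has arrived. */
static void demo_new_page(struct demo_client *clients, size_t n)
{
    size_t i;
    last_page++;
    for (i = 0; i < n; i++)
        demo_complete_read(&clients[i]);
}
\end{verbatim}
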
1652 \subsubsection{Using {\tt fusd\_destroy()} to clean up client state}
1653 \label{fusd-destroy}
1654
1655 Finally, let's take a look at one last aspect of the pager program:
1656 how it cleans up the per-client state when a client leaves.  This is
1657 mostly straightforward, with one exception: a client may have an
outstanding read request when a close request comes in.  Normally,
1659 a client can't make another system call request while a previous
1660 system call is still blocked.  However, the {\tt close} system call is
1661 an exception: it gets called when a client dies (for example, if it
1662 receives an interrupt signal).  If a {\tt close} comes in while
1663 another system call is still outstanding, the state associated with
1664 the outstanding request should be freed to avoid a memory leak.  The
{\tt fusd\_destroy} function is used to do this, as seen on lines 12--14
1666 of Program~\ref{pager-close.c}.
1667
1668 \begin{Program}
1669 \listinginput[5]{1}{pager-close.c.example}
1670 \caption{pager.c (Part 3): Cleaning up when a client leaves}
1671 \label{pager-close.c}
1672 \end{Program}
1673
1674
1675 \subsection{Retrieving a blocked system call's arguments from a {\tt
1676 fusd\_file\_info} pointer}
1677
1678 \label{logring}
1679
1680 In the previous section, we showed how the {\tt fusd\_return} function
1681 can be used to specify the return value of a system call that was
1682 previously blocked.  However, many system calls have side effects in
1683 addition to returning a value---for example, in a {\tt read()}
1684 request, the data being returned has to be copied into the caller's
1685 buffer.  To facilitate this, FUSD provides accessor functions that let
1686 drivers retrieve the arguments that had been passed to its callbacks
1687 at the time the call was originally issued.  For example, the {\tt
1688 fusd\_get\_read\_buffer()} function will return a pointer to the data
1689 buffer that is provided with {\tt read()} callbacks.  Drivers can use
these accessor functions to make such changes on a client's behalf {\em before}
1691 calling {\tt fusd\_return()}.
1692
1693 The following accessor functions are available, all of which take a
1694 single {\tt fusd\_file\_info *} argument:
1695 \begin{itemize}
\item {\tt char *fusd\_get\_read\_buffer}---The destination buffer
1697 for data that a driver is returning to a process doing a {\tt read()}.
1698 \item {\tt const char *fusd\_get\_write\_buffer}---The source buffer
1699 containing data sent to the driver by a process doing a {\tt write()}.
1700 \item {\tt fusd\_get\_length}---The length (in bytes) of the buffer
1701 for either a {\tt read()} or a {\tt write()}.
1702 \item {\tt loff\_t fusd\_get\_offset}---The file descriptor's byte
1703 offset, typically used in {\tt read()} and {\tt write()} callbacks.
\item {\tt int fusd\_get\_ioctl\_request}---An ioctl's request
``command number'' (i.e., the argument that follows the file
descriptor in the client's {\tt ioctl} call).
\item {\tt int fusd\_get\_ioctl\_arg}---The final argument of an
ioctl for non-data-bearing {\tt ioctl} requests (i.e., {\tt \_IO}
commands).
1709 \item {\tt void *fusd\_get\_ioctl\_buffer}---The data buffer for
1710 data-bearing {\tt ioctl} requests ({\tt \_IOR}, {\tt \_IOW}, and
{\tt \_IOWR} commands).
1712 \item {\tt int fusd\_get\_poll\_diff\_cached\_state}---See
1713 Section~\ref{selectable}.
1714 \end{itemize}
1715
1716 We got away without using these accessor functions in our {\tt
1717 pager.c} example because the pager doesn't actually return data---it
1718 just blocks and unblocks {\tt read} calls.  However, the FUSD
1719 distribution contains another example program, {\tt logring}, that
1720 demonstrates their use.
1721
1722 {\tt logring} makes it easy to access the most recent (and only the most
1723 recent) output from a process. It works just like {\tt tail -f} on a
1724 log file, except that the storage required never grows. This can be
1725 useful in embedded systems where there isn't enough memory or disk
1726 space for keeping complete log files, but the most recent debugging
1727 messages are sometimes needed (e.g., after an error is observed).
1728
1729 {\tt logring} uses FUSD to implement a character device, {\tt
1730 /dev/logring}, that acts like a named pipe that has a finite, circular
1731 buffer.  The size of the buffer is given as a command-line argument.
1732 As more data is written into the buffer, the oldest data is discarded.
1733 A process that reads from the logring device will first read the
1734 existing buffer, then block and see new data as it's written, similar
1735 to monitoring a log file using {\tt tail -f}.
1736
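The discard-oldest behavior described above can be sketched with a
fixed-size ring.  This is not the code from {\tt logring}; the names
{\tt struct ring}, {\tt ring\_write}, and {\tt ring\_snapshot} are
hypothetical, and {\tt RING\_SIZE} is a compile-time constant here,
whereas {\tt logring} takes the size from its command line.

\begin{verbatim}
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical; kept tiny so wraparound is easy to see */

struct ring {
    char data[RING_SIZE];
    size_t head;      /* index where the next byte will be stored */
    size_t used;      /* number of valid bytes, at most RING_SIZE */
};

/* Append len bytes, overwriting the oldest data once the ring is full. */
static void ring_write(struct ring *r, const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        r->data[r->head] = buf[i];
        r->head = (r->head + 1) % RING_SIZE;
        if (r->used < RING_SIZE)
            r->used++;
    }
}

/* Copy the retained bytes, oldest first, into out; returns the count. */
static size_t ring_snapshot(const struct ring *r, char *out)
{
    size_t start = (r->head + RING_SIZE - r->used) % RING_SIZE;
    size_t i;
    for (i = 0; i < r->used; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    return r->used;
}
\end{verbatim}

A new reader would first be handed the snapshot and would then block
until the next write, which matches the {\tt tail -f} behavior
described above.
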
1737 You can run this example program by typing {\tt logring <logsize>},
1738 where {\tt logsize} is the size of the circular buffer in bytes.
1739 Then, type {\tt cat /dev/logring} in a shell.  The {\tt cat} process
1740 will block, waiting for data.  From another shell, write to the
1741 logring (e.g., {\tt echo Hi there > /dev/logring}).  The {\tt cat}
1742 process will see the message appear.
1743
1744 (This example program is based on {\em emlog}, a (real) Linux kernel
1745 module with identical functionality.  If you find logring useful, but
1746 want to use it on a system that does not have FUSD, check out the
1747 original
1748 \htmladdnormallinkfoot{emlog}{http://www.circlemud.org/jelson/software/emlog}.)
1749
1750
1751
1752
1753
1754 \section{Implementing {\tt select}able Devices}
1755 \label{selectable}
1756
1757 One important feature that almost every device driver in a system
1758 should have is support for the {\tt select(2)} system call.  {\tt
1759 select} allows clients to assemble a set of file descriptors and ask
1760 to be notified when one of them becomes readable or writable.  This
1761 simple feature is deceptively powerful---it allows clients to wait for
1762 any number of a set of possible events to occur.  This is
1763 fundamentally different than (for example) a blocking read, which only
1764 unblocks on one kind of event.  In this section, we'll describe how
1765 FUSD can be used to create a device whose state can be queried by a
1766 client's call to {\tt select(2)}.
1767
This section is limited to a discussion of what a FUSD driver writer
needs to know to implement a selectable device.  Details of the FUSD
implementation required to support this feature are described in
Section~\ref{poll-diff-implementation}.
1772
1773
1774 \subsection{Poll state and the {\tt poll\_diff} callback}
1775
1776 FUSD's implementation of selectable devices depends on the concept of
1777 {\em poll state}.  A file descriptor's poll state is a bitmask that
1778 describes its current properties---readable, writable, or exception
1779 raised.  These three states correspond to {\tt select(2)}'s three
1780 {\tt fd\_set}s.  FUSD has constants used to describe these states:
1781 \begin{itemize}
1782 \item {\tt FUSD\_NOTIFY\_INPUT}---Input is available; a read will not
1783 block.
1784 \item {\tt FUSD\_NOTIFY\_OUTPUT}---Output space is available; a write
1785 will not block.
1786 \item {\tt FUSD\_NOTIFY\_EXCEPT}---An exception has occurred.
1787 \end{itemize}
1788
1789 These constants can be combined with C's bitwise-or operator.  For
1790 example, a descriptor that is both readable and writable is expressed
1791 as {\tt FUSD\_NOTIFY\_INPUT | FUSD\_NOTIFY\_OUTPUT}.  0 means a file
1792 descriptor is not readable, not writable, and not in the exception
1793 set.
1794
1795 For a FUSD device to be selectable, its driver must implement a
1796 callback called {\tt poll\_diff}.  This callback is very different
1797 than the others; it is not a ``direct line'' between the client and
1798 the driver as is the case with a call such as {\tt ioctl}.  A driver's
1799 response to {\tt poll\_diff} is {\em not} the return value seen by a
1800 client's call to {\tt select}.  When a client tries to {\tt select} on
1801 a set of file descriptors, the kernel collects the responses from all
1802 the appropriate callbacks---{\tt poll} for file descriptors managed by
kernel drivers, and {\tt poll\_diff} for those managed by FUSD
1804 drivers---and synthesizes all of that information into the return
1805 value seen by the client.
1806
1807 FUSD keeps a cache of the poll state it has most recently received
1808 from each FUSD device driver, initially assumed to be 0.  This state
1809 is returned to clients trying to {\tt select()} on devices managed by
1810 those drivers.  Under certain conditions, FUSD sends a query to the
1811 driver in order to ensure that the kernel's poll state cache is up to
1812 date.  This query takes the form of a {\tt poll\_diff} callback
1813 activation, which is given a single argument: the poll state that FUSD
1814 currently has cached.  The driver should consult its internal data
1815 structures to determine the actual, current poll state (i.e., whether
1816 or not buffers have readable data).  Then:
1817 \begin{itemize}
1818 \item If the FUSD cache is incorrect (that is, the current true poll
1819 state is different than FUSD's cached state), the current poll state
1820 should be returned immediately.
1821 \item If the FUSD cache is up to date (that is, it matches the real
1822 current state), the callback should save the {\tt fusd\_file\_info}
1823 pointer and return {\tt -FUSD\_NOREPLY}.  Later, when the poll
1824 state changes, the driver can call {\tt fusd\_return()} to update
1825 FUSD's cache.
1826 \end{itemize}
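
The two cases above amount to a single comparison.  The sketch below
shows that decision in isolation, assuming a device whose readiness
can be derived from two counters.  The {\tt DEMO\_} constants are
stand-ins for {\tt FUSD\_NOTIFY\_INPUT}, {\tt FUSD\_NOTIFY\_OUTPUT},
{\tt FUSD\_NOTIFY\_EXCEPT}, and {\tt FUSD\_NOREPLY}; their numeric
values, like {\tt struct demo\_dev}, are hypothetical.

\begin{verbatim}
/* Stand-ins for FUSD's constants; the numeric values are made up. */
#define DEMO_NOTIFY_INPUT   1
#define DEMO_NOTIFY_OUTPUT  2
#define DEMO_NOTIFY_EXCEPT  4
#define DEMO_NOREPLY        0x1000

struct demo_dev {
    int bytes_buffered;   /* data waiting to be read */
    int space_free;       /* room left for writes */
};

/* Compute the device's true poll state from its own bookkeeping. */
static int demo_poll_state(const struct demo_dev *d)
{
    int state = 0;
    if (d->bytes_buffered > 0)
        state |= DEMO_NOTIFY_INPUT;
    if (d->space_free > 0)
        state |= DEMO_NOTIFY_OUTPUT;
    return state;
}

/*
 * Core of a poll_diff callback: answer at once if the cached state is
 * stale, otherwise defer the answer until the state changes.  (A real
 * callback would also save its fusd_file_info pointer before deferring.)
 */
static int demo_poll_diff(const struct demo_dev *d, int cached_state)
{
    int actual = demo_poll_state(d);
    if (actual != cached_state)
        return actual;        /* cache is wrong: correct it now */
    return -DEMO_NOREPLY;     /* cache is right: answer later */
}
\end{verbatim}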
1827
1828 In other words, when a driver's {\tt poll\_diff} callback is
1829 activated, the kernel is effectively saying to the driver, ``Here is
1830 what I think the current poll state of this file descriptor is; let me
1831 know when that state {\em changes}.''  The driver can either respond
1832 immediately (if the kernel's cache is already known to be out of
1833 date), or return {\tt -FUSD\_NOREPLY} if no update is immediately
1834 necessary.  Later, when the poll state changes (for example, if new
data arrives that makes a device readable), the driver can use its
1836 saved {\tt fusd\_file\_info} pointer to send a poll state update to
1837 the kernel.
1838
1839 When a FUSD driver sends a poll state update, it might (or might not)
1840 have the effect of waking up a client that was blocked in {\tt
1841 select(2)}.  On the same note, it's worth reiterating that a {\tt
1842 -FUSD\_NOREPLY} response to a {\tt poll\_diff} callback {\em does not}
1843 necessarily block the client---other descriptors in the client's {\tt
1844 select} set might be readable, for example.
1845
1846 \subsection{Receiving a {\tt poll\_diff} request when the previous one
1847 has not been returned yet}
1848 \label{multiple-polldiffs}
1849
1850 Calls such as {\tt read} and {\tt write} are synchronous from the
1851 standpoint of an individual client---a request is made, and the
1852 requester blocks until a reply is received.  This means that there
1853 can't ever be more than a single {\tt read} request outstanding for a
1854 single client at a time.  (The driver as a whole may be keeping track
1855 of many outstanding {\tt read} requests in parallel, but no two of them will
1856 be from the same client file descriptor.)
1857
1858 As we mentioned in the previous section, the {\tt poll\_diff} callback
1859 is different from other callbacks.  It is not part of a synchronous
1860 request/reply sequence that causes the client to block.  It is also an
1861 interface to the {\em kernel}, not directly to the client.  So, it
1862 {\em is} possible to receive a {\tt poll\_diff} request while there is
1863 already one outstanding.  This happens if the kernel's poll state
1864 cache changes, causing it to notify the driver that it has a new
1865 cached value.
1866
1867 This is easy to handle; the driver should simply
1868 \begin{enumerate}
1869 \item Destroy the old (now out-of-date) {\tt poll\_diff} request
1870 using the {\tt fusd\_destroy} function we saw in
1871 Section~\ref{fusd-destroy}.
1872 \item Either respond to or save the new {\tt poll\_diff} request,
1873 exactly as described in the previous section.
1874 \end{enumerate}
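
The two steps above can be sketched as a small, self-contained C model.
The types {\tt struct request} and {\tt struct client\_state}, and the
functions {\tt destroy\_request} and {\tt save\_polldiff}, are
hypothetical stand-ins invented for this illustration and are not part
of the FUSD API.  In a real driver the saved request would be a {\tt
fusd\_file\_info} pointer, and the stale one would be discarded with
{\tt fusd\_destroy}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a saved poll_diff request. */
struct request {
    int id;
    int destroyed;   /* set once the request has been discarded */
};

/* Stand-in for fusd_destroy(): here it only marks the request. */
static void destroy_request(struct request *r)
{
    r->destroyed = 1;
}

/* Per-client state: at most one saved poll_diff request. */
struct client_state {
    struct request *polldiff;
};

/* Called whenever a new poll_diff request arrives for this client. */
static void save_polldiff(struct client_state *c, struct request *incoming)
{
    assert(incoming != NULL);
    if (c->polldiff != NULL && c->polldiff != incoming)
        destroy_request(c->polldiff);   /* step 1: drop the stale request */
    c->polldiff = incoming;             /* step 2: save the new one */
}
```

Keeping a single slot per client is what makes the rule simple: whatever
request occupies the slot is, by construction, the most recent one.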
1875
1876 The next section will show an example of this technique.
1877
1878
1879 \subsection{Adding {\tt select} support to {\tt pager.c}}
1880
1881 Given the explanation of {\tt poll\_diff} in the previous sections, it
1882 might seem that implementing a selectable device is a daunting task.
1883 It's actually not as bad as it sounds---the example code may well be
1884 shorter than its explanation!
1885
1886 \begin{Program}
1887 \listinginput[5]{1}{pager-polldiff.c.example}
1888 \caption{pager.c (Part 4): Supporting {\tt select(2)} by implementing a
1889 {\tt poll\_diff} callback}
1890 \label{pager-polldiff.c}
1891 \end{Program}
1892
1893 Program~\ref{pager-polldiff.c} shows the implementation of {\tt
1894 poll\_diff} in {\tt pager.c}, which makes its notification interface
1895 ({\tt /dev/pager/notify}) selectable.  It is decomposed into a ``top
1896 half'' and ``bottom half'' function, exactly as we did for the
1897 blocking {\tt read} implementation in Program~\ref{pager-read.c}.
1898 First, on lines 1--20, we see the {\tt poll\_diff} callback
1899 itself.  It is virtually identical to the {\tt read} callback
1900 in Program~\ref{pager-read.c}.  The main difference is that it first
1901 checks (on line 12) to see if a {\tt poll\_diff} request is already
1902 outstanding when a new request comes in.  If so, the out-of-date
1903 request is destroyed using {\tt fusd\_destroy}, as we described in
1904 Section~\ref{multiple-polldiffs}.
1905
1906 The bottom half is shown on lines 22--46.  First, on lines 32--35, it
1907 computes the current poll state---if a page has arrived that the
1908 client hasn't seen yet, the file is readable; otherwise, it isn't.
1909 Next, the driver compares the current poll state with the poll state
1910 that the kernel has cached.  If the kernel's cache is out of date, the
1911 current state is returned to the kernel.  Otherwise, it does nothing.
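
The compare-and-reply logic of the bottom half can be modeled in
isolation.  This is only a sketch under stated assumptions, not the
code from {\tt pager.c}: the structure {\tt pager\_client}, its fields,
the constant {\tt NOTIFY\_INPUT}, and the function {\tt
complete\_polldiff} are all hypothetical names chosen for the example,
and the reply to the kernel is represented by storing a value rather
than by calling into FUSD.

```c
#include <assert.h>

#define NOTIFY_INPUT 1u   /* hypothetical stand-in for a "readable" flag */

struct pager_client {
    int last_page_seen;      /* last page this client has read */
    int polldiff_pending;    /* nonzero if a poll_diff request is saved */
    unsigned cached_state;   /* poll state the kernel says it has cached */
    unsigned replied_state;  /* last state sent back (illustration only) */
};

/* Returns 1 if an update was sent to the kernel, 0 otherwise. */
static int complete_polldiff(struct pager_client *c, int last_page)
{
    unsigned curr;

    assert(c != 0);
    if (!c->polldiff_pending)
        return 0;                          /* nothing to answer */

    /* Readable only if a page has arrived that the client hasn't seen. */
    curr = (c->last_page_seen != last_page) ? NOTIFY_INPUT : 0u;

    if (curr == c->cached_state)
        return 0;                          /* cache is current: stay silent */

    c->replied_state = curr;               /* a real driver replies here */
    c->polldiff_pending = 0;               /* the request has been consumed */
    return 1;
}
```

Note that the request stays pending when the states match; this is the
case in which the callback would return {\tt -FUSD\_NOREPLY}.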
1912
1913 As with the {\tt read} callback we saw previously, notice that {\tt
1914 pager\_notify\_complete\_polldiff} is called in two different cases:
1915 \begin{enumerate}
1916 \item It is called immediately from the {\tt pager\_notify\_polldiff}
1917 callback itself.  This causes the current poll state to be returned to
1918 the kernel immediately when the request arrives if the driver already
1919 knows the kernel's state needs to be updated.
1920 \item It is called when new data arrives that causes the poll state to
1921 change.  Refer back to Program~\ref{pager-read.c} on
1922 page~\pageref{pager-read.c}; in the callback that receives new pages,
1923 notice on line 45 that the {\tt poll\_diff} completion function is called
1924 alongside the {\tt read} completion function.
1925 \end{enumerate}
1926
1927 With this {\tt poll\_diff} implementation, it is possible for a client
1928 to open {\tt /dev/pager/notify}, and block in a {\tt select(2)} system
1929 call.  If another client writes {\tt page} to {\tt /dev/pager/input},
1930 the first client's {\tt select} will unblock, indicating the file has
1931 become readable.
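
From the client's side, waiting for the device to become readable is
an ordinary {\tt select(2)} call.  The helper below, {\tt
wait\_readable}, is a hypothetical name used only for this sketch.  It
takes any file descriptor, so it can be tried with a pipe on a machine
without FUSD; with the pager driver running, the descriptor would
instead come from opening {\tt /dev/pager/notify}.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

Passing a timeout of zero turns the call into a non-blocking check of
the current poll state.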
1932
1933 For additional example code, take a look at the {\tt logring} example
1934 program we first mentioned in Section~\ref{logring}.  It also supports
1935 {\tt select} by implementing a similar {\tt poll\_diff} callback.
1936
1937 \section{Performance of User-Space Devices}
1938 \label{performance}
1939
1940 This section hasn't been written yet.  I have some pretty graphs and
1941 whatnot, but no time to write about them here before the release.
1942
1943 \section{FUSD Implementation Notes}
1944
1945 In this section, we describe some of the details of how FUSD is
1946 implemented.  It's not necessary to understand these details in order
1947 to use FUSD.  However, these notes can be useful for people who are
1948 trying to understand the FUSD framework itself---hackers, debuggers,
1949 or the generally curious.
1950
1951 \subsection{The situation with {\tt poll\_diff}}
1952 \label{poll-diff-implementation}
1953
1954
1955 In-kernel device drivers support select by implementing a callback
1956 called {\tt poll}.  This callback is supposed to do two
1957 things.  First, it should return the current state of a file
1958 descriptor---a combination of being readable, writable, or having
1959 exceptions.  Second, it should provide a pointer to one of the
1960 driver's internal wait queues that will be awakened whenever the state
1961 changes.  The {\tt poll} call itself should never block---it should
1962 just instantaneously report what the {\em current} state is.
1963
1964 FUSD's implementation of selectable devices is different, but attempts
1965 to maintain three properties that we thought to be most important from
1966 the point of view of a client using {\tt select}.  Specifically:
1967 \begin{enumerate}
1968 \item The {\tt select(2)} call itself should never become blocked.
1969 For example, if one file descriptor in its set isn't readable, that
1970 shouldn't prevent it from reporting other file descriptors that are.
1971 \item If {\tt select(2)} indicates a file descriptor is readable (or
1972 writable), a read (or write) on that file descriptor shouldn't block.
1973 \item Clients should be allowed to seamlessly {\tt select} on any set
1974 of file descriptors, even if that set contains a mix of both FUSD and
1975 non-FUSD devices.
1976 \end{enumerate}
1977
1978
1979 The FUSD kernel module keeps a cache of the driver's most recent
1980 answer for each file descriptor, initially assumed to be 0.  When the
1981 kernel module's internal {\tt poll} callback is activated, it:
1982 \begin{enumerate}
1983 \item Dispatches a {\em non-}blocking {\tt poll\_diff} to the
1984 associated user-space driver, asking for a cache update.  It does this
1985 only if there isn't already an outstanding {\tt poll\_diff} request
1986 that carries the same cached value.
1987 \item Immediately returns the cached value to the kernel.
1988 \end{enumerate}
1989
1990 In addition, the cached value's readable bit is cleared on every read;
1991 the writable bit is cleared on every write.  This is necessary to
1992 prevent old poll state---which says ``device is readable''---from
1993 being returned out of the cache when it might be invalid.  FUSD
1994 assumes that any read to a device can make it potentially unreadable.
1995 This mechanism is what causes an updated poll diff to be sent to a
1996 driver before the previous one has been returned.
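
The caching rules described above can be summarized in a toy model.
This is not the kernel module's code; every name in it ({\tt
poll\_cache}, {\tt model\_poll}, {\tt model\_driver\_update}, {\tt
model\_read}, {\tt model\_write}, and the two flag constants) is
hypothetical, and the assumption that a driver reply clears the
outstanding request is ours.  The sketch is meant only to show why a
read can trigger a second {\tt poll\_diff} while the first is still
unanswered.

```c
#include <assert.h>

#define NOTIFY_INPUT  1u   /* hypothetical "readable" bit */
#define NOTIFY_OUTPUT 2u   /* hypothetical "writable" bit */

struct poll_cache {
    unsigned cached;       /* driver's most recent answer; starts at 0 */
    int outstanding;       /* is a poll_diff request in flight? */
    unsigned sent_value;   /* cached value carried by that request */
    int dispatched;        /* requests sent so far (illustration only) */
};

/* Model of the poll callback: maybe dispatch, then answer from cache. */
static unsigned model_poll(struct poll_cache *pc)
{
    assert(pc != 0);
    if (!pc->outstanding || pc->sent_value != pc->cached) {
        pc->outstanding = 1;
        pc->sent_value = pc->cached;
        pc->dispatched++;
    }
    return pc->cached;     /* never blocks */
}

/* Model of the driver answering a poll_diff with a new state. */
static void model_driver_update(struct poll_cache *pc, unsigned state)
{
    pc->cached = state;
    pc->outstanding = 0;
}

/* A read clears the readable bit; a write clears the writable bit. */
static void model_read(struct poll_cache *pc)  { pc->cached &= ~NOTIFY_INPUT; }
static void model_write(struct poll_cache *pc) { pc->cached &= ~NOTIFY_OUTPUT; }
```

In this model, a poll that follows a read finds that the cached value no
longer matches the one carried by the outstanding request, so a fresh
request is dispatched even though the earlier one was never returned.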
1997
1998 (this section isn't finished yet; fancy time diagrams coming someday)
1999
2000 \subsection{Restartable System Calls}
2001
2002 No time to write this section yet...
2003
2004
2005 \appendix
2006
2007 \section{Using {\tt strace}}
2008 \label{strace}
2009
2010 This section hasn't been written yet.  Contributions are welcome.
2011
2012 \end{document}
2013