% FUSD - Framework for User-Space Devices
% Programming Manual & Tutorial

% Jeremy Elson, (c) 2001 Sensoria Corporation, 2003 UCLA
% Released under open-source, BSD license
% See LICENSE file for full license

% $Id: fusd.tex,v 1.63 2003/08/20 22:00:55 jelson Exp $

\documentclass{article}
\addtolength{\topmargin}{-.5in} % repairing LaTeX's huge margins...
\addtolength{\textheight}{1in} % more margin hacking
\addtolength{\textwidth}{1.5in}
\addtolength{\oddsidemargin}{-0.75in}
\addtolength{\evensidemargin}{-0.75in}

\usepackage{graphicx,float,alltt,tabularx}
\usepackage{wrapfig,floatflt}
\usepackage{amsmath}
\usepackage{latexsym}
\usepackage{moreverb}
\usepackage{times}
\usepackage{html}
%\usepackage{draftcopy}

%\setcounter{bottomnumber}{3}
%\renewcommand{\topfraction}{0}
%\renewcommand{\bottomfraction}{0.7}
%\renewcommand{\textfraction}{0}
%\renewcommand{\floatpagefraction}{2.0}

\renewcommand{\topfraction}{1.0}
\renewcommand{\bottomfraction}{1.0}
\renewcommand{\textfraction}{0.0}
\renewcommand{\floatpagefraction}{0.9}

\floatstyle{ruled}
\newfloat{Program}{tp}{lop}

\title{FUSD:
A Linux {\bf F}ramework for {\bf U}ser-{\bf S}pace {\bf D}evices}

\author{Jeremy Elson\\
jelson@circlemud.org\\
http://www.circlemud.org/\tilde{}jelson/software/fusd}
\date{19 August 2003, Documentation for FUSD 1.10}

\begin{document}

%%%%%%%%%%%%%%%%%%%%%%%%% Title Page %%%%%%%%%%%%%%%%%%%%%%%%%

\begin{center}
\begin{latexonly}\vspace*{2in}\end{latexonly}
{\Huge FUSD:} \\
\vspace{2\baselineskip}
{\huge A Linux {\bf F}ramework for {\bf U}ser-{\bf S}pace {\bf D}evices}

\begin{latexonly}\vspace{2in}\end{latexonly}
\vspace{\baselineskip}

\vfill

{\large Jeremy Elson \\
\begin{latexonly}\vspace{.5\baselineskip}\end{latexonly}}
\vspace{\baselineskip}
{\tt jelson@circlemud.org\\
http://www.circlemud.org/jelson/software/fusd}

\vspace{2\baselineskip}
19 August 2003\\
Documentation for FUSD 1.10\\

\end{center}
\thispagestyle{empty}
\clearpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{latexonly}
\pagenumbering{roman}

\tableofcontents
\bigskip
\listof{Program}{List of Example Programs}
\setlength{\parskip}{10pt}

\clearpage
\end{latexonly}

% This resets the page counter to 1
\pagenumbering{arabic}
\addtolength{\parskip}{0.5\baselineskip}

\section{Introduction}

\subsection{What is FUSD?}

FUSD (pronounced {\em fused}) is a Linux framework for proxying device
file callbacks into user-space, allowing device files to be
implemented by daemons instead of kernel code. Despite being
implemented in user-space, FUSD devices can look and act just like any
other file under {\tt /dev} that is implemented by kernel callbacks.

A user-space device driver can do many of the things that kernel
drivers can't, such as perform a long-running computation, block while
waiting for an event, or read files from the file system. Unlike
kernel drivers, a user-space device driver can {\em use other device
drivers}---that is, access the network, talk to a serial port, get
interactive input from the user, pop up GUI windows, or read from
disks. User-space drivers implemented using FUSD can also be much
easier to debug: they cannot crash the machine, they are easily
traceable using tools such as {\tt gdb}, and they can be killed and
restarted without rebooting even if they become corrupted. FUSD
drivers don't have to be in C---Perl, Python, or any other language
that knows how to read from and write to a file descriptor can work
with FUSD. User-space drivers can also be swapped out, whereas kernel
drivers lock physical memory.

Of course, as with almost everything, there are trade-offs.
User-space drivers are slower than kernel drivers because they require
three times as many system calls, as well as additional memory copies
(see Section~\ref{performance}). User-space drivers cannot receive
interrupts, and do not have the full power to modify arbitrary kernel
data structures as kernel drivers do. Despite these limitations, we
have found user-space device drivers to be a powerful programming
paradigm with a wide variety of uses (see Section~\ref{use-cases}).

FUSD is free software, distributed under a GPL-compatible license (the
``new'' BSD license, with the advertising clause removed).

\subsection{How does FUSD work?}

FUSD drivers are conceptually similar to kernel drivers: a set of
callback functions called in response to system calls made on file
descriptors by user programs. FUSD's C library provides a device
registration function, similar to the kernel's {\tt
devfs\_register\_chrdev()} function, to create new devices. {\tt
fusd\_register()} accepts the device name and a structure full of
pointers. Those pointers are callback functions which are called in
response to certain user system calls---for example, when a process
tries to open, close, read from, or write to the device file. The
callback functions should conform to the standard definitions of POSIX
system call behavior. In many ways, the user-space FUSD callback
functions are identical to their kernel counterparts.

Perhaps the best way to show what FUSD does is by example.
Program~\ref{helloworld.c} is a simple FUSD device driver. When the
program is run, a device called {\tt /dev/hello-world} appears under
the {\tt /dev} directory. If that device is read (e.g., using {\tt
cat}), the read returns {\tt Hello, world!} followed by an EOF.
Finally, when the driver is stopped (e.g., by hitting Control-C), the
device file disappears.

\begin{Program}
\listinginput[5]{1}{helloworld.c.example}
\caption{helloworld.c: A simple program using FUSD to
create {\tt /dev/hello-world}}
\label{helloworld.c}
\end{Program}

On line 40 of the source, we use {\tt fusd\_register()} to create the
{\tt /dev/hello-world} device, passing pointers to callbacks for the
open(), close() and read() system calls. (Lines 36--39 use the GNU C
extension that allows initializer field naming; the 2.4 series of
Linux kernels also uses that extension for the same purpose.) The
``Hello, World'' read() callback itself is virtually identical to what
a kernel driver for this device would look like. It can inspect and
modify the user's file pointer, copy data into the user-provided
buffer, control the system call return value (positive, EOF, or
error), and so forth.
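
To make these read() semantics concrete without requiring the FUSD
library, the following sketch re-creates the logic of a hello-world
read callback as a plain C function. It is illustrative only.
{\tt struct demo\_file\_info} is a hypothetical stand-in for FUSD's
{\tt struct fusd\_file\_info}, and {\tt off\_t} replaces the
kernel-style {\tt loff\_t}. The return convention is assumed here to
mirror the kernel's: a byte count, 0 for EOF, or a negative
{\tt errno} value for an error.

\begin{verbatim}
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define GREETING "Hello, world!\n"

/* Hypothetical stand-in for FUSD's struct fusd_file_info. */
struct demo_file_info {
  void *device_info;
};

/* Sketch of a hello-world style read() callback.  Returns the number
 * of bytes copied, 0 to signal EOF, or a negative errno value. */
static ssize_t demo_read(struct demo_file_info *file, char *buffer,
                         size_t length, off_t *offset)
{
  size_t n = strlen(GREETING);

  (void) file;                 /* not needed in this sketch */
  if (*offset != 0)
    return 0;                  /* greeting already delivered: EOF */
  if (length < n)
    return -EINVAL;            /* caller's buffer is too small */
  memcpy(buffer, GREETING, n); /* copy data into the caller's buffer */
  *offset += (off_t) n;        /* advance the caller's file pointer */
  return (ssize_t) n;
}
\end{verbatim}

Suppose a client calls read() twice with a large enough buffer. The
first call returns the 14-byte greeting and the second returns 0. That
EOF is what lets {\tt cat} terminate.
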

The proxying of kernel system calls that makes this kind of program
possible is implemented by FUSD, using a combination of a kernel
module and a cooperating user-space library. The kernel module
implements a character device, {\tt /dev/fusd}, which is used as a
control channel between the two. fusd\_register() uses this channel
to send a message to the FUSD kernel module, telling it the name of
the device the user wants to register. The kernel module, in turn,
registers that device with the kernel proper using devfs. devfs and
the kernel don't know anything unusual is happening; it appears from
their point of view that the registered devices are simply being
implemented by the FUSD module.

Later, when the kernel makes a callback due to a system call (e.g.\ when
the character device file is opened or read), the FUSD kernel module's
callback blocks the calling process, marshals the arguments of the
callback into a message and sends it to user-space. Once there, the
library half of FUSD unmarshals it and calls whatever user-space
callback the FUSD driver passed to fusd\_register(). When that
user-space callback returns a value, the process happens in reverse:
the return value and its side-effects are marshaled by the library
and sent to the kernel. The FUSD kernel module unmarshals this
message, matches it up with a corresponding outstanding request, and
completes the system call. The calling process is completely unaware
of this trickery; it simply enters the kernel once, blocks, unblocks,
and returns from the system call---just as it would for any other
blocking call.
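
The library half of this round trip can be sketched in ordinary C.
Everything in the sketch is hypothetical. {\tt struct demo\_msg}, its
opcodes, and {\tt demo\_dispatch()} are invented for illustration and
do not reproduce FUSD's actual message format or library internals.
The sketch starts from a request that has already been read from the
control channel. It unmarshals the request and hands it to whichever
callback the driver registered. It then writes the return value and
the side effects (here, the data and the updated file offset) back
into the same message, which would then be returned to the kernel.

\begin{verbatim}
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical opcodes for proxied system calls. */
enum demo_op { DEMO_OPEN, DEMO_CLOSE, DEMO_READ };

/* Hypothetical stand-ins for struct fusd_file_info and the
 * driver's callback table. */
struct demo_file_info { void *device_info; };

struct demo_fops {
  int (*open)(struct demo_file_info *file);
  int (*close)(struct demo_file_info *file);
  ssize_t (*read)(struct demo_file_info *file, char *buf, size_t len,
                  off_t *offset);
};

/* Hypothetical message: request fields on the way up to user space,
 * result fields on the way back down to the kernel. */
struct demo_msg {
  enum demo_op op;   /* which system call is being proxied */
  size_t length;     /* read(): size of the caller's buffer */
  off_t offset;      /* read(): the caller's file pointer */
  ssize_t retval;    /* result: becomes the system call's return value */
  char data[256];    /* result: bytes to copy into the caller's buffer */
};

/* A trivial callback a driver might register for open(). */
static int demo_open_ok(struct demo_file_info *file)
{
  (void) file;
  return 0;
}

/* Unmarshal a request, invoke the registered callback, and marshal
 * the result.  Returning -ENOSYS for a missing callback is this
 * sketch's own choice, not a claim about FUSD's behavior. */
static void demo_dispatch(const struct demo_fops *fops,
                          struct demo_file_info *file, struct demo_msg *m)
{
  switch (m->op) {
  case DEMO_OPEN:
    m->retval = fops->open ? fops->open(file) : -ENOSYS;
    break;
  case DEMO_CLOSE:
    m->retval = fops->close ? fops->close(file) : -ENOSYS;
    break;
  case DEMO_READ: {
    size_t len = m->length < sizeof m->data ? m->length : sizeof m->data;
    m->retval = fops->read ? fops->read(file, m->data, len, &m->offset)
                           : -ENOSYS;
    break;
  }
  default:
    m->retval = -EINVAL;
  }
}
\end{verbatim}
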

One of the primary design goals of FUSD is {\em stability}. It should
not be possible for a FUSD driver to corrupt or crash the kernel,
either due to error or malice. Of course, a buggy driver may corrupt
itself (e.g., due to a buffer overrun). However, strict error
checking is implemented at the user-kernel boundary which should
prevent drivers from corrupting the kernel or any other user-space
process---including the errant driver's own clients, and other FUSD
drivers.

\subsection{What FUSD {\em Isn't}}

FUSD looks similar to certain other Linux facilities that are already
available. It also skirts near a few of the kernel's hot-button
political issues. So, to avoid confusion, we present a list of
things that FUSD is {\em not}.

\begin{itemize}

\item {\bf A FUSD driver is not a kernel module.} Kernel modules
allow---well, modularity of kernel code. They let you insert and
remove kernel modules dynamically after the kernel boots. However,
once inserted, the kernel modules are actually part of the kernel
proper. They run in the kernel's address space, with all the same
privileges and restrictions that native kernel code does. A FUSD
device driver, in contrast, is more similar to a daemon---a program
that runs as a user-space process, with a process ID.

\item {\bf FUSD is not, and doesn't replace, devfs.} When a FUSD
driver registers a FUSD device, it automatically creates a device file
in {\tt /dev}. However, FUSD is not a replacement for devfs---quite
the contrary, FUSD creates those device files by {\em using} devfs.
In a normal Linux system, only kernel modules proper---not user-space
programs---can register with devfs (see above).

\item {\bf FUSD is not UDI.} UDI, the \htmladdnormallinkfoot{Uniform
Driver Interface}{http://www.projectudi.org}, aims to create a binary
API for drivers that is uniform across operating systems. It's true
that FUSD could conceivably be used for a similar purpose (inasmuch as
it defines a system call messaging structure). However, this was not
the goal of FUSD as much as an accidental side effect. We do not
advocate publishing drivers in binary-only form, even though FUSD does
make this possible in some cases.

\item {\bf FUSD is not an attempt to turn Linux into a microkernel.}
We aren't trying to port existing drivers into user-space for a
variety of reasons (not the least of which is performance). We've
used FUSD as a tool to build new drivers that are much easier to
write in user-space than they would be in the kernel; see
Section~\ref{use-cases} for use cases.

\end{itemize}

\subsection{Related Work}

FUSD is a new implementation, but certainly not a new idea---the
theory of its operation is the same as that of any microkernel
operating system. A microkernel (roughly speaking) is one that
implements only very basic resource protection and message passing in
the kernel. Device drivers, file systems, network stacks, and so
forth are relegated to userspace. Patrick Bridges maintains a list of
such \htmladdnormallinkfoot{microkernel operating systems}{http://www.cs.arizona.edu/people/bridges/os/microkernel.html}.

Also related is the idea of a user-space filesystem, which has been
implemented in a number of contexts. Some examples include Klaus
Schauser's \htmladdnormallinkfoot{UFO
Project}{http://www.cs.ucsb.edu/projects/ufo/index.html} for Solaris,
and Jeremy Fitzhardinge's (no longer maintained)
\htmladdnormallinkfoot{UserFS}{http://www.goop.org/~jeremy/userfs/}
for Linux 1.x. The \htmladdnormallinkfoot{UFO
paper}{http://www.cs.ucsb.edu/projects/ufo/97-usenix-ufo.ps} is also
notable because it has a good survey of similar projects that
integrate user-space code with system calls.

\subsection{Limitations and Future Work}

In its current form, FUSD is useful and has proven to be quite
stable---we use it in production systems. However, it does have some
limitations that could benefit from the attention of developers.
Contributions to correct any of these deficiencies are welcome!
(Many of these limitations will not make sense without having read the
rest of the documentation first.)

\begin{itemize}
\item Currently, FUSD only supports implementation of character
devices. Block devices and network devices are not supported yet.

\item The kernel has 15 different callbacks in its {\tt
file\_operations} structure. The current version of FUSD does not
proxy some of the more obscure ones out to userspace.

\item Currently, all system calls that FUSD understands are proxied
from the FUSD kernel module to userspace. Only the userspace library
knows which callbacks have actually been registered by the FUSD
driver. For example, the kernel may proxy a write() system call to
user-space even if the driver has not registered a write() callback
with fusd\_register().

fusd\_register() should, but currently does not, tell the kernel
module which callbacks it wants to receive, per-device. This would be
more efficient because it would prevent useless system calls for
unsupported operations. In addition, it would lead to more logical and
consistent behavior by allowing the kernel to use its default
implementations of certain functions such as writev(), instead of
being fooled into thinking the driver has an implementation of it in
cases where it doesn't.

\item It should be possible to write a FUSD library in any language
that supports reads and writes on raw file descriptors. In the
future, it might be possible to write FUSD device drivers in a variety
of languages---Perl, Python, maybe even Java. However, the current
implementation has only a C library.

\item It's possible for drivers that use FUSD to deadlock---for
example, if a driver tries to open itself. In this one case, FUSD
returns {\tt -EDEADLOCK}. However, deadlock protection should be
expanded to more general detection of cycles of arbitrary length.

\item FUSD should provide a {\tt /proc} interface that gives debugging
and status information, and allows parameter tuning.

\item FUSD was written with efficiency in mind, but a number of
important optimizations have not yet been implemented. Specifically,
we'd like to try to reduce the number of memory copies by using a
buffer shared between user and kernel space to pass messages.

\item FUSD currently requires devfs, which is used to dynamically
create device files under {\tt /dev} when a FUSD driver registers
itself. This is, perhaps, the most convenient and useful paradigm
for FUSD. However, some users have asked if it's possible to use FUSD
without devfs. This should be possible if FUSD drivers bind to device
major numbers instead of device file names.

\end{itemize}
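
The per-device callback registration proposed in the list above could,
for instance, take the form of a bit mask. The mask would be computed
from the driver's callback table and sent to the kernel along with the
device name. The following sketch is purely hypothetical. Neither
{\tt demo\_callback\_mask()} nor the {\tt DEMO\_WANT\_*} flags exist in
FUSD, and only four callbacks are modeled.

\begin{verbatim}
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flags: one bit per callback a driver may register. */
enum {
  DEMO_WANT_OPEN  = 1 << 0,
  DEMO_WANT_CLOSE = 1 << 1,
  DEMO_WANT_READ  = 1 << 2,
  DEMO_WANT_WRITE = 1 << 3
};

/* Hypothetical callback table; the file argument is left opaque. */
struct demo_fops {
  int (*open)(void *file);
  int (*close)(void *file);
  ssize_t (*read)(void *file, char *buf, size_t len, off_t *offset);
  ssize_t (*write)(void *file, const char *buf, size_t len, off_t *offset);
};

/* A trivial open() callback, used to populate a table. */
static int demo_open_ok(void *file)
{
  (void) file;
  return 0;
}

/* Report which callbacks are present, so that the kernel could answer
 * unsupported calls (such as write()) itself instead of proxying them. */
static unsigned demo_callback_mask(const struct demo_fops *fops)
{
  unsigned mask = 0;

  if (fops->open)  mask |= DEMO_WANT_OPEN;
  if (fops->close) mask |= DEMO_WANT_CLOSE;
  if (fops->read)  mask |= DEMO_WANT_READ;
  if (fops->write) mask |= DEMO_WANT_WRITE;
  return mask;
}
\end{verbatim}
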

\subsection{Author Contact Information and Acknowledgments}

The original version of FUSD was written by Jeremy Elson
\htmladdnormallink{(jelson@circlemud.org)}{mailto:jelson@circlemud.org}
and Lewis Girod at Sensoria Corporation.
Sensoria no longer maintains public releases of FUSD, but the same
authors have since forked the last public release and continue to
maintain FUSD from the University of California, Los Angeles.

If you have bug reports, patches, suggestions, or any other comments,
please feel free to contact the authors.

FUSD has two
\htmladdnormallinkfoot{SourceForge}{http://www.sourceforge.net}-hosted
mailing lists: a low-traffic list for announcements ({\tt fusd-announce})
and a list for general discussion ({\tt fusd-devel}). Subscription
information for both lists is available on
\htmladdnormallink{SourceForge's FUSD mailing list
page}{http://sourceforge.net/mail/?group_id=36326}.

For the latest releases and information about FUSD, please see the
\htmladdnormallinkfoot{official FUSD home
page}{http://www.circlemud.org/jelson/software/fusd}.

\subsection{Licensing Information}

FUSD is free software, distributed under a GPL-compatible license (the
``new'' BSD license, with the advertising clause removed). The
license is enumerated in its entirety below.

Copyright (c) 2001, Sensoria Corporation; (c) 2003 University of
California, Los Angeles. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
\begin{itemize}
\item Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

\item Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.

\item Neither the names of Sensoria Corporation or UCLA, nor the
names of other contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
\end{itemize}

THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\section{Why use FUSD?}
\label{use-cases}

One basic question about FUSD that one might ask is: what is it good
for? Why use it? In this section, we describe some of the situations
in which FUSD has been the solution for us.

\subsection{Device Driver Layering}

A problem that comes up frequently in modern operating systems is
contention for a single resource by multiple competing processes. In
UNIX, it's the job of a device driver to coordinate access to such
resources. By accepting requests from user processes and (for
example) queuing and serializing them, a driver makes it safe for
processes that know nothing about each other to make requests in
parallel to the same resource. Kernel drivers do this job already, of
course, but they typically operate directly on top of hardware, and
they can't easily be layered on top of {\em other device drivers}.

For example, consider a device such as a modem that is connected to a
host via a serial port. Let's say we want to implement a device
driver that allows multiple users to dial the telephone (e.g., {\tt
echo 1-310-555-1212 > /dev/phone-dialer}). Such a driver should be
layered {\em on top of} the serial port driver---that is, it most
likely wants to write to {\tt /dev/ttyS0}, not directly to the UART
hardware itself.

While it is possible to write to a logical file from within a kernel
device driver, it is both tricky and considered bad practice. In the
\htmladdnormallinkfoot{words of kernel hacker Dick Johnson}
{http://www.uwsg.indiana.edu/hypermail/linux/kernel/0005.3/0061.html},
``You should never write a [kernel] module that requires reading or
writing to any logical device. The kernel is the thing that translates
physical I/O to logical I/O. Attempting to perform logical I/O in the
kernel is effectively going backwards.''

With FUSD, it's possible to layer device drivers because the driver is
a user-space process, not a kernel module. A FUSD implementation of
our hypothetical {\tt /dev/phone-dialer} can open {\tt /dev/ttyS0}
just as any other process would.
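
As an illustration of this layering, the following sketch shows what
the write path of the hypothetical {\tt /dev/phone-dialer} driver might
do with a number a client has written. It forwards the number to a
serial port that the driver opened earlier (for example,
{\tt /dev/ttyS0}), using nothing more than write(2). The function name
{\tt forward\_dial()} is invented. The sketch strips the trailing
newline that {\tt echo} appends, and its {\tt ATDT} dial prefix assumes
a Hayes-compatible modem.

\begin{verbatim}
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Forward a dial request to an already-open serial port descriptor.
 * Returns len (all client bytes consumed) or a negative errno value. */
static ssize_t forward_dial(int serial_fd, const char *number, size_t len)
{
  char cmd[128];
  size_t digits = len;
  size_t done = 0;
  int n;

  /* Drop a trailing newline such as the one "echo" appends. */
  while (digits > 0 &&
         (number[digits - 1] == '\n' || number[digits - 1] == '\r'))
    digits--;

  /* "ATDT" is the Hayes dial-with-tone command (assumed modem type). */
  n = snprintf(cmd, sizeof cmd, "ATDT%.*s\r", (int) digits, number);
  if (n < 0 || (size_t) n >= sizeof cmd)
    return -EINVAL;            /* number too long for this sketch */

  while (done < (size_t) n) {  /* a serial write may be partial */
    ssize_t w = write(serial_fd, cmd + done, (size_t) n - done);
    if (w < 0) {
      if (errno == EINTR)
        continue;              /* interrupted by a signal: retry */
      return -errno;
    }
    done += (size_t) w;
  }
  return (ssize_t) len;
}
\end{verbatim}

The serial port's own kernel driver still handles the UART. The FUSD
driver only ever sees a file descriptor.
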

Typically, such layering is accomplished by system daemons. For
example, the {\tt lpd} daemon manages printers at a high level. Since
it is a user-space process, it can access the physical printer devices
using kernel device drivers (for example, using printer or network
drivers). There are a number of advantages to using FUSD instead:
\begin{itemize}
\item Using FUSD, a daemon/driver can create a standard device file
which is accessible by any program that knows how to use the POSIX
system call interface. Some trickery is possible using named pipes
(FIFOs), but it quickly becomes difficult because writes from multiple
processes are interleaved into a single stream.
\item FUSD drivers receive the UID, GID, and process ID along with
every file operation, allowing the same sorts of security policies to
be implemented as would be possible with a real kernel driver. In
contrast, writes to a named pipe, a UDP socket, and so forth are
``anonymous.''
\end{itemize}
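
For example, a driver could use the caller's credentials to restrict
who may dial out. In the sketch below, {\tt struct demo\_caller} is a
hypothetical stand-in for wherever FUSD delivers the caller's PID, UID,
and GID; the real field names are not shown here. The policy itself
(allow root or members of one designated group) is only an example.

\begin{verbatim}
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for the caller credentials FUSD passes along
 * with each file operation. */
struct demo_caller {
  pid_t pid;
  uid_t uid;
  gid_t gid;
};

/* Example open() policy: allow root and callers whose primary group
 * is dial_gid; refuse everyone else with -EACCES. */
static int demo_check_open(const struct demo_caller *caller, gid_t dial_gid)
{
  if (caller->uid == 0 || caller->gid == dial_gid)
    return 0;
  return -EACCES;
}
\end{verbatim}
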

\subsection{Use of User-Space Libraries}

Since a FUSD driver is just a regular user-space program, it can
naturally use any of the enormous body of existing libraries for
almost any task. FUSD drivers can easily incorporate user
interfaces, encryption, network protocols, threads, and almost
anything else. In contrast, porting arbitrary C code into the kernel
is difficult and usually a bad idea.

\subsection{Driver Memory Protection}

Since FUSD drivers run in their own process space, the rest of the
system is protected from them. A buggy or malicious FUSD driver, at
the very worst, can only corrupt itself. It's not possible for it to
corrupt the kernel, other FUSD drivers, or even the processes that are
using its devices. In contrast, a buggy kernel module can bring down
any process in the system, or the entire kernel itself.

\subsection{Giving Libraries Language Independence and Standard
Notification Interfaces}

One particularly interesting application of FUSD that we've found very
useful is as a way to let regular user-space libraries export device
file APIs. For example, imagine you had a library which factored
large composite numbers. Typically, it might have a C
interface---say, a function called {\tt int\ *factorize(int\ bignum)}.
With FUSD, it's possible to create a device file interface---say, a
device called {\tt /dev/factorize} to which clients can {\tt write(2)}
a big number, then {\tt read(2)} back its factors.
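
A minimal sketch of the data path behind such a device follows. It
assumes the driver's write() callback has already parsed the client's
number, and it shows only the text that the read() callback might hand
back. Trial division on an {\tt unsigned long long} stands in for a
real big-number factoring library. {\tt format\_factors()} is a
hypothetical name and is not the {\tt factorize()} interface mentioned
above.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Append one factor followed by a space; -1 if out is too small. */
static int append_factor(char *out, size_t len, size_t *used,
                         unsigned long long f)
{
  int n = snprintf(out + *used, len - *used, "%llu ", f);

  if (n < 0 || (size_t) n >= len - *used)
    return -1;
  *used += (size_t) n;
  return 0;
}

/* Write the prime factors of n as "p1 p2 ... pk\n" into out.  Returns
 * the number of bytes written, or -1 if out is too small. */
static long format_factors(unsigned long long n, char *out, size_t len)
{
  size_t used = 0;

  if (len == 0)
    return -1;
  out[0] = '\0';
  for (unsigned long long d = 2; d <= n / d; d++)
    while (n % d == 0) {       /* divide out each factor completely */
      if (append_factor(out, len, &used, d) < 0)
        return -1;
      n /= d;
    }
  if (n > 1 && append_factor(out, len, &used, n) < 0)
    return -1;                 /* whatever remains is prime */
  if (used > 0)
    out[used - 1] = '\n';      /* turn the final space into a newline */
  return (long) used;
}
\end{verbatim}
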

This may sound strange, but device file APIs have at least three
advantages over a typical library API. First, the API becomes much
more language independent---any language that can make system calls
can access the factorization library. Second, the factorization code
is running in a different address space; if it crashes, it won't crash
or corrupt the caller. Third, and most interestingly, it is possible to
use {\tt select(2)} to wait for the factorization to complete. {\tt
select(2)} would make it easy for a client to factor a large number
while remaining responsive to {\em other} events that might happen in
the meantime. In other words, FUSD allows normal user-space libraries
to integrate seamlessly with UNIX's existing, POSIX-standard event
notification interface: {\tt select(2)}.
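
On the client side, the {\tt select(2)} pattern described above might
look like the following sketch. The helper name
{\tt wait\_readable()} is hypothetical. The helper waits, with a
timeout, for a single descriptor such as an open {\tt /dev/factorize}
to become readable. How a FUSD driver signals that its device has
become readable is not shown here. A real client would also add its
other descriptors (sockets, terminals, and so on) to the same
{\tt fd\_set}.

\begin{verbatim}
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, or -1 on error (errno set).
 * On EINTR the full timeout is restarted, which keeps the sketch short. */
static int wait_readable(int fd, long timeout_ms)
{
  for (;;) {
    fd_set rfds;
    struct timeval tv;
    int r;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0 && errno == EINTR)
      continue;                /* interrupted by a signal: retry */
    if (r <= 0)
      return r;                /* 0 on timeout, -1 on error */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
  }
}
\end{verbatim}
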

\subsection{Development and Debugging Convenience}

FUSD processes can be developed and debugged with all the normal
user-space tools. Buggy drivers won't crash the system, but instead
dump cores that can be analyzed. All of your favorite visual
debuggers, memory bounds checkers, leak detectors, profilers, and
other tools can be applied to FUSD drivers just as they would be to
any other program.

\section{Installing FUSD}

This section describes the installation procedure for FUSD. It
assumes a good working knowledge of Linux system administration.

\subsection{Prerequisites}

Before installing FUSD, make sure you have all of the following
packages installed and working correctly:

\begin{itemize}
\item {\bf Linux kernel 2.4.0 or later}. FUSD was developed under
2.4.0 and should work with any kernel in the 2.4 series.

\item {\bf devfs installed and running.} FUSD dynamically registers
devices using devfs, the Linux device filesystem by Richard Gooch.
For FUSD to work, devfs must be installed and running on your system.
For more information about devfs installation, see the
\htmladdnormallinkfoot{devfs home
page}{http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html}.

Note that some distributions make installing devfs easier. RedHat
7.1, for example, already has all of the necessary daemons and
configuration changes integrated. devfs can be installed simply by
recompiling the kernel with devfs support enabled and reconfiguring
LILO to pass {\tt "devfs=mount"} to the kernel.
\end{itemize}

\subsection{Compiling FUSD as a Kernel Module}

Before compiling anything, take a look at the Makefile in FUSD's
top-level directory. Adjust any constants that are not correct. In
particular, make sure {\tt KERNEL\_HOME} correctly reflects the place
where your kernel sources are installed, if they aren't in the default
location of {\tt /usr/src/linux}.

Then, type {\tt make}. It should generate a directory whose name
looks something like {\tt obj.i686-linux}, or some variation depending
on your architecture. Inside of that directory will be a number of
files, including:
\begin{itemize}
\item kfusd.o -- The FUSD kernel module
\item libfusd.a -- The C library used to talk to the kernel module
\item Example programs -- linked against libfusd.a
\end{itemize}

Compilation of the kernel module will fail if the dependencies
described in the previous section are not satisfied. The module must
be compiled against a Linux kernel that is v2.4.0 or later and that
has devfs support enabled.

\subsection{Testing and Troubleshooting}

Once everything has been compiled, give it a try to see if it actually
does something. First, use {\tt insmod} to insert the FUSD kernel
module, e.g.\ {\tt insmod obj.i686-linux/kfusd.o}. A greeting message
similar to ``{\tt fusd: starting, Revision: 1.50}'' should appear in
the kernel log (accessed using the {\tt dmesg} command, or by typing
{\tt cat /proc/kmsg}). You can verify the module has been inserted by
typing {\tt lsmod}, or alternatively {\tt cat /proc/modules}.

Once the module has been inserted successfully, try running the
{\tt helloworld} example program. When run, the program should print
a greeting message similar to {\tt /dev/hello-world should now exist -
calling fusd\_run}. This means everything is working; the daemon is
now blocked, waiting for requests to the new device. From another
shell, type {\tt cat /dev/hello-world}. You should see {\tt Hello,
world!} printed in response. Try killing the test program; the
corresponding device file should disappear.
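
The same check can be made from C instead of {\tt cat}. The sketch
below uses a hypothetical helper named {\tt read\_until\_eof()}. It
opens a device file and calls read() until that call returns 0, which
is how the hello-world driver signals EOF after delivering its
greeting. Nothing in it is FUSD-specific: to the client,
{\tt /dev/hello-world} is an ordinary file.

\begin{verbatim}
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read the file at path until EOF (read() returning 0), as cat does,
 * stopping early if buf fills up.  Returns the number of bytes read,
 * or -1 with errno set. */
static ssize_t read_until_eof(const char *path, char *buf, size_t len)
{
  size_t total = 0;
  int fd = open(path, O_RDONLY);

  if (fd < 0)
    return -1;
  while (total < len) {
    ssize_t r = read(fd, buf + total, len - total);
    if (r < 0) {
      int saved = errno;
      if (saved == EINTR)
        continue;              /* interrupted by a signal: retry */
      close(fd);
      errno = saved;           /* close() must not clobber the error */
      return -1;
    }
    if (r == 0)
      break;                   /* EOF: the driver has nothing more */
    total += (size_t) r;
  }
  close(fd);
  return (ssize_t) total;
}
\end{verbatim}
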

If nothing seems to be working, try looking at the kernel message log
(type {\tt dmesg} or {\tt cat /proc/kmsg}) to see if there are any
errors. If nothing seems obviously wrong, try turning on FUSD kernel
module debugging by defining {\tt CONFIG\_FUSD\_DEBUG} in kfusd.c,
then recompiling and reinserting the module.

\subsection{Installation}

Typing {\tt make install} will copy the FUSD library, header files,
and man pages into {\tt /usr/local}. The FUSD kernel module is {\em
not} installed automatically because of variations among different
Linux distributions in how this is accomplished. You may want to
arrange to have the module start automatically on boot by (for
example) copying it into {\tt /lib/modules/your-kernel-version}, and
adding it to {\tt /etc/modules.conf}.

\subsection{Making FUSD Part of the Kernel Proper}

The earlier instructions, by default, create a FUSD kernel module.
If desired, it's also very easy to build FUSD right into the kernel,
instead:
\begin{enumerate}
\item Unpack the 2.4 kernel sources and copy all the files in the {\tt
include} and {\tt kfusd} directories into your kernel source tree,
under {\tt drivers/char}. For example, if FUSD is in
your home directory, and your kernel is in {\tt /usr/src/linux}:
\begin{verbatim}
cp ~/fusd/kfusd/* ~/fusd/include/* /usr/src/linux/drivers/char
\end{verbatim}

\item Apply the patch found in FUSD's {\tt patches} directory to your
kernel source tree. For example:
\begin{verbatim}
cd /usr/src/linux
patch -p0 < ~/fusd/patches/fusd-inkernel.patch
\end{verbatim}
The FUSD in-kernel patch doesn't actually change any kernel sources
proper; it just adds FUSD to the kernel configuration menu and
Makefile.
\item Using your kernel configurator of choice (e.g.\ {\tt make
menuconfig}), turn on the FUSD options. They are located under the
``Character devices'' menu.
\item Build and install the kernel as usual.
\end{enumerate}

\section{Basic Device Creation}

Enough introduction---it's time to actually create a basic device
driver using FUSD!

The following sections illustrate various techniques using
example programs. To save space, interesting excerpts are shown
instead of entire programs. However, the {\tt examples} directory
of the FUSD distribution contains all the examples in their
entirety. They can actually be compiled and run on a system with the
FUSD kernel module installed.

Where this text refers to example program line numbers, it refers to
the line numbers printed alongside the excerpts in the manual---not
the line numbers of the actual programs in the {\tt examples}
directory.
658 \subsection{Using {\tt fusd\_register} to create a new device}
659 \label{using-fusd-register}
661 We saw an example of a simple driver, helloworld.c, in
662 Program~\ref{helloworld.c} on page~\pageref{helloworld.c}. Let's go
663 back and examine that program now in more detail.
665 The FUSD ball starts rolling when the {\tt fusd\_register} function is
666 called, as shown on line 40. This function tells the FUSD kernel
667 module:
668 \begin{itemize}
669 \item {\tt char *name}---The name of the device being created. The
670 prefix (such as {\tt /dev/}) must match the location where devfs has
671 been mounted. Names containing slashes (e.g., {\tt
672 /dev/my-devices/dev1}) are legal; devfs creates subdirectories
673 automatically.
674 \item {\tt mode\_t mode}---The device's default permissions. This is
675 usually specified using an octal constant with a leading 0---{\tt 0666}
676 (readable and writable by everyone) instead of the incorrect decimal
677 constant {\tt 666}.
678 \item {\tt void *device\_info}---Private data that should be passed to
679 callback functions for this device. The use of this field is
680 described in Section~\ref{device-info}.
681 \item {\tt struct fusd\_file\_operations *fops}---A structure containing
682 pointers to the callback functions that should be called by FUSD
683 in response to certain events.
684 \end{itemize}
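The leading 0 matters because C reads {\tt 666} as a decimal number, which is octal 01232. The following sketch uses a hypothetical helper, {\tt perm\_string} (not part of FUSD), that renders the nine permission bits of a {\tt mode\_t} the way {\tt ls -l} does, so the two constants can be compared directly:

\begin{verbatim}
#include <sys/stat.h>   /* mode_t, S_ISVTX */

/* Hypothetical helper: write the nine rwx permission bits of `mode`
 * into `out` (which must hold 10 chars), in the style of `ls -l`. */
static void perm_string(mode_t mode, char out[10])
{
    const char *letters = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (0400 >> i)) ? letters[i] : '-';
    out[9] = '\0';
}

/* perm_string(0666, buf) -> "rw-rw-rw-"  (octal: what was intended)
 * perm_string(666,  buf) -> "-w--wx-w-"  (decimal 666 == octal 01232) */
\end{verbatim}

With the decimal constant, the owner gets write access only, the group gets write and execute, and the sticky bit ({\tt S\_ISVTX}, octal 01000) is set as a side effect.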
686 If device registration is successful, {\tt fusd\_register} returns a
687 {\em device handle}---a small integer $\ge0$. On errors, it returns
688 -1 and sets the global variable {\tt errno} appropriately. In
689 reality, the device handle you get is a plain old file descriptor,
690 as we'll see in Section~\ref{selecting}.
Although Program~\ref{helloworld.c} only calls {\tt fusd\_register}
once, it can be called multiple times if the FUSD driver is handling
more than one device, as we'll see in Program~\ref{drums.c}.
696 There is intentional similarity between {\tt fusd\_register()} and the
697 kernel's device registration functions, such as {\tt
698 devfs\_register()} and {\tt register\_chrdev()}. In many ways, FUSD's
699 interface is meant to mirror the kernel interface as closely as
700 possible.
702 The {\tt fusd\_file\_operations} structure, defined in {\tt fusd.h},
703 contains a list of callbacks that are used in response to different
704 system calls executed on a file. It is similar to the kernel's {\tt
705 file\_operations} structure, accepting callbacks for system calls such
706 as {\tt open()}, {\tt close()}, {\tt read()}, {\tt write()}, and {\tt
707 ioctl()}. For the most part, the prototypes of FUSD file operation
708 callbacks are the same as their kernel cousins, with one important
709 exception. The first argument of FUSD callbacks is always a pointer
710 to a {\tt fusd\_file\_info} structure; it contains information that
711 can be used to identify the file. This structure is used instead of
712 the kernel's {\tt file} and {\tt inode} structures, and will be
713 described in more detail later.
715 In lines 35--38 of Program~\ref{helloworld.c}, we create and
716 initialize a {\tt fusd\_file\_operations} structure. A GCC-specific C
717 extension allows us to name structure fields explicitly in the
718 initializer. This style may look strange, but it guards against
719 errors in the future in case the order of fields in the structure ever
720 changes. The 2.4 kernel series uses the same trick.
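For reference, GCC's original form of this extension writes {\tt member: value}; C99 later standardized the equivalent {\tt .member = value} syntax. The sketch below uses the C99 form on a hypothetical three-member callback table, {\tt struct demo\_ops}, which stands in for {\tt fusd\_file\_operations} (the real structure in {\tt fusd.h} has more members and different prototypes):

\begin{verbatim}
#include <stddef.h>   /* size_t, NULL */

/* Hypothetical callback table standing in for fusd_file_operations. */
struct demo_ops {
    int  (*open)(void *file);
    int  (*close)(void *file);
    long (*read)(void *file, char *buf, size_t len);
};

/* Hypothetical callback that accepts every open and close. */
static int demo_open_or_close(void *file)
{
    (void)file;
    return 0;
}

/* Each member is named explicitly, so this initializer stays correct
 * even if the members of struct demo_ops are reordered. */
static struct demo_ops ops = {
    .open  = demo_open_or_close,
    .close = demo_open_or_close,
};
\end{verbatim}

Members that are not named in the initializer, such as {\tt read} here, are implicitly initialized to NULL.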
After calling {\tt fusd\_register()} on line 40, the example program
calls {\tt fusd\_run()} on line 44. This function turns control over
to the FUSD framework. {\tt fusd\_run()} blocks the driver until one of the
devices it registered needs to be serviced. Then, it calls the
appropriate callback and blocks again until the next event.
Now, imagine that a user types {\tt cat /dev/hello-world}. What
happens? Recall first what the {\tt cat} program itself does: it opens a
file, reads from it until it reaches EOF (printing whatever it
reads to stdout), then closes it. {\tt cat} works the same way
regardless of what it's reading---be it a FUSD device, a regular
file, a serial port, or anything else. The {\tt strace} program is a
great way to see this in action; see Appendix~\ref{strace} for
details.
737 \subsection{The {\tt open} and {\tt close} callbacks}
738 \label{open-close}
The first two callbacks that most drivers implement are {\tt
open} and {\tt close}. Each of these two functions is passed just
one argument---the {\tt fusd\_file\_info} structure that describes the
instance of the file being opened or closed. Use of the information
in that structure will be covered in more detail in
Section~\ref{fusd-file-info}.
747 The semantics of an {\tt open} callback's return value are exactly the
748 same as inside the kernel:
749 \begin{itemize}
750 \item 0 means success, and the file is opened. If the file is allowed
751 to open, the kernel returns a valid file descriptor to the client.
752 Using that descriptor, other callbacks may be called for that file,
753 including (at least) a {\tt close} callback.
\item A negative number indicates a failure, and that the file should
not be opened. Such return values should {\em always} be
specified as a negative {\tt errno} value such as {\tt -EPERM}, {\tt
-EBUSY}, {\tt -ENODEV}, {\tt -ENOMEM}, and so on. For example, if the
callback returns {\tt -EPERM}, the caller's {\tt open()} will return
-1, with {\tt errno} set to {\tt EPERM}. A complete list of possible
return values can be found in the Linux kernel sources, under {\tt
include/asm/errno.h}.
763 \end{itemize}
765 If an {\tt open} callback returns 0 (success), a driver is {\em
766 guaranteed} to receive exactly one {\tt close} callback for that file
767 later. By the same token, the close callback {\em will not} be called
768 if the open fails. Therefore, {\tt open} callbacks that can return
769 failure must be sure to deallocate any resources they might have
770 allocated before returning a failure.
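The rule above suggests a pattern in which every early return from {\tt open} first undoes whatever that call has already allocated. The sketch below assumes a hypothetical {\tt struct demo\_file} containing only a {\tt private\_data} pointer, standing in for {\tt fusd\_file\_info}; the callback names, buffer size, and state structure are likewise illustrative:

\begin{verbatim}
#include <errno.h>    /* ENOMEM */
#include <stdlib.h>   /* malloc, free */

#define DEMO_BUFSIZE 4096   /* hypothetical per-open buffer size */

/* Hypothetical stand-in for struct fusd_file_info (see fusd.h),
 * reduced to the one member this sketch uses. */
struct demo_file {
    void *private_data;   /* NULL when the file is first opened */
};

/* Hypothetical per-open state. */
struct demo_state {
    char *buffer;
};

/* open: allocate state in two steps. If step 2 fails, free step 1
 * before returning, because no close callback follows a failed open. */
static int demo_open(struct demo_file *file)
{
    struct demo_state *st = malloc(sizeof *st);
    if (st == NULL)
        return -ENOMEM;

    st->buffer = malloc(DEMO_BUFSIZE);
    if (st->buffer == NULL) {
        free(st);                 /* undo step 1 */
        return -ENOMEM;
    }

    file->private_data = st;
    return 0;                     /* exactly one close will follow */
}

/* close: release everything a successful open allocated. */
static int demo_close(struct demo_file *file)
{
    struct demo_state *st = file->private_data;
    free(st->buffer);
    free(st);
    file->private_data = NULL;
    return 0;
}
\end{verbatim}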
Let's return to our example in Program~\ref{helloworld.c}, which
creates the {\tt /dev/hello-world} device. If a user types {\tt cat
/dev/hello-world}, {\tt cat} will use the {\tt open(2)} system
call to open the file. FUSD will then proxy that system call to the
driver and activate the callback that was registered as the {\tt open}
callback. Recall from line 36 of Program~\ref{helloworld.c} that we
registered {\tt do\_open\_or\_close}, which appears on line 8.
780 In {\tt helloworld.c}, the {\tt open} callback always returns 0, or
781 success. However, in a real driver, something more interesting will
782 probably happen---permissions checks, memory allocation for
783 state-keeping, and so forth. The corresponding {\em de}-allocation of
784 those resources should occur in the {\tt close} callback, which is
785 called when a user application calls {\tt close} on their file
786 descriptor. {\tt close} callbacks are allowed to return error values,
787 but this does not prevent the file from actually closing.
791 \subsection{The {\tt read} callback}
792 \label{read-callback}
794 Returning to our {\tt cat /dev/hello-world} example, what happens
795 after the {\tt open} is successful? Next, {\tt cat} will try to use
796 {\tt read(2)}, which will get proxied by FUSD to the function {\tt
797 do\_read} on line 13. This function takes some additional arguments
798 that we didn't see in the open and close callbacks:
799 \begin{itemize}
800 \item {\tt struct fusd\_file\_info *file}---The first argument to all
801 callbacks, containing information which describes the file; see
802 Section~\ref{fusd-file-info}.
803 \item {\tt char *user\_buffer}---The buffer that the callback should use to
804 write data that it is returning to the user.
\item {\tt size\_t user\_length}---The maximum number of bytes
requested by the user. The driver is allowed to return fewer bytes,
but should never write more than {\tt user\_length} bytes into {\tt
user\_buffer}.
809 \item {\tt loff\_t *offset}---A pointer to an integer which represents
810 the caller's offset into the file (i.e., the user's file pointer).
811 This value can be modified by the callback; any change will be
812 propagated back to the user's file pointer inside the kernel.
813 \end{itemize}
815 The semantics of the return value are the same as if the
816 callback were being written inside the kernel itself:
817 \begin{itemize}
\item Positive return values indicate success. If the call is
successful, and the driver has copied data into {\tt user\_buffer}, the
return value indicates how many bytes were copied. This number should
never be greater than the {\tt user\_length} argument.
822 \item A 0 return value indicates EOF has been reached on the file.
823 \item As in the {\tt open} and {\tt close} callbacks, negative values
824 (such as -EPERM, -EPIPE, or -ENOMEM) indicate errors. Such values will
825 cause the user's {\tt read()} to return -1 with errno set
826 appropriately.
827 \end{itemize}
829 The first time a read is done on a device file, the user's file
830 pointer ({\tt *offset}) is 0. In the case of this first read, a
831 greeting message of {\tt Hello, world!} is copied back to the user, as
832 seen on line 24. The user's file pointer is then advanced. The next
833 read therefore fails the comparison at line 20, falling straight
834 through to return 0, or EOF.
836 In this simple program, we also see an example of an error return on
837 line 22: if the user tries to do a read smaller than the length of the
838 greeting message, the read will fail with -EINVAL. (In an actual
839 driver, it would normally not be an error for a user to provide a
840 smaller read buffer than the size of the available data. The right
841 way for drivers to handle this situation is to return partial data,
842 then move {\tt *offset} forward so that the remainder is returned on
843 the next {\tt read()}. We see an example of this in
844 Program~\ref{echo.c}.)
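The partial-read approach described above comes down to a few lines of arithmetic on {\tt *offset}. This sketch factors it into a hypothetical helper, {\tt read\_partial}, that takes the driver's data explicitly; it uses {\tt long long} where a real FUSD callback would receive a {\tt loff\_t *}, and it omits the {\tt fusd\_file\_info} argument:

\begin{verbatim}
#include <errno.h>       /* EINVAL */
#include <string.h>      /* memcpy */
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper: copy at most user_length bytes of data,
 * starting at *offset, and advance *offset so that the next read
 * continues where this one stopped. Returns the number of bytes
 * copied, 0 at EOF, or a negative errno value. */
static ssize_t read_partial(const char *data, size_t data_len,
                            char *user_buffer, size_t user_length,
                            long long *offset)
{
    if (*offset < 0)
        return -EINVAL;
    if ((unsigned long long)*offset >= data_len)
        return 0;                                   /* EOF */

    size_t remaining = data_len - (size_t)*offset;
    size_t n = user_length < remaining ? user_length : remaining;

    memcpy(user_buffer, data + *offset, n);
    *offset += n;                                   /* advance file pointer */
    return (ssize_t)n;
}
\end{verbatim}

For the 14-byte message {\tt "Hello, world!\textbackslash{}n"}, an 8-byte read returns 8, the next returns the remaining 6, and a third returns 0 (EOF).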
846 \subsection{The {\tt write} callback}
848 Program~\ref{helloworld.c} illustrated how a driver could return data
849 {\em to} a client using the {\tt read} callback. As you might expect, there
850 is a corresponding {\tt write} callback that allows the driver to
851 receive data {\em from} a client. {\tt write} takes four arguments,
852 similar to the {\tt read} callback:
854 \begin{itemize}
855 \item {\tt struct fusd\_file\_info *file}---The first argument to all
856 callbacks, containing information which describes the file; see
857 Section~\ref{fusd-file-info}.
858 \item {\tt const char *user\_buffer}---Pointer to data being written
859 by the client (read-only).
860 \item {\tt size\_t user\_length}---The number of bytes pointed to by
861 {\tt user\_buffer}.
862 \item {\tt loff\_t *offset}---A pointer to an integer which represents
863 the caller's offset into the file (i.e., the user's file pointer).
864 This value can be modified by the callback; any change will be
865 propagated back to the user's file pointer inside the kernel.
866 \end{itemize}
868 The semantics of {\tt write}'s return value are the same as in a
869 kernel callback:
870 \begin{itemize}
871 \item Positive return values indicate success and indicate how many
872 bytes of the user's buffer were successfully written (i.e.,
873 successfully processed by the driver in some way). The return value
874 may be less than or equal to the {\tt user\_length} argument, but
875 should never be greater.
\item 0 should only be returned in response to a {\tt write} of length 0.
878 \item Negative values (such as -EPERM, -EPIPE, or -ENOMEM) indicate
879 errors. Such values will cause the user's {\tt write()} to return -1
880 with errno set appropriately.
881 \end{itemize}
883 Program~\ref{echo.c}, echo.c, is an example implementation of a device
884 ({\tt /dev/echo}) that uses both {\tt read()} and {\tt write()}
885 callbacks. A client that tries to {\tt read()} from this device will
886 get the contents of the most recent {\tt write()}. For example:\\
887 \begin{minipage}{\textwidth}
888 \vspace{\baselineskip}
889 \begin{verbatim}
890 % echo Hello there > /dev/echo
891 % cat /dev/echo
892 Hello there
893 % echo Device drivers are fun > /dev/echo
894 % cat /dev/echo
895 Device drivers are fun
897 \end{verbatim}
898 \end{minipage}
900 \begin{Program}
901 \listinginput[5]{1}{echo.c.example}
902 \caption{echo.c: Using both {\tt read} and {\tt write} callbacks}
903 \label{echo.c}
904 \end{Program}
906 The implementation of {\tt /dev/echo} keeps a global variable, {\tt
907 data}, which serves as a cache for the data most recently written to
908 the driver by a client program. The driver does not assume the data
909 is null-terminated, so it also keeps track of the number of bytes of
910 data available. (These two variables appear on lines 1--2.)
912 The driver's {\tt write} callback first frees any data which might
913 have been allocated by a previous call to write (lines 26--29). Next,
914 on line 33, it attempts to allocate new memory for the new data
915 arriving. If the allocation fails, {\tt -ENOMEM} is returned to the
916 client. If the allocation is successful, the driver copies the data
917 into its local buffer and stores its length (lines 37--38). Finally,
918 the driver tells the user that the entire buffer was consumed by
919 returning a value equal to the number of bytes the user tried to write
920 ({\tt user\_length}).
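The steps of echo.c's {\tt write} callback can be sketched as a standalone function. The names {\tt cache\_data}, {\tt cache\_length}, and {\tt cache\_write} are hypothetical, and the function takes only the buffer and its length rather than FUSD's full callback arguments. Unlike the description above, this sketch special-cases a zero-length write, since {\tt malloc(0)} may legitimately return NULL:

\begin{verbatim}
#include <errno.h>       /* ENOMEM */
#include <stdlib.h>      /* malloc, free */
#include <string.h>      /* memcpy */
#include <sys/types.h>   /* ssize_t */

/* Hypothetical cache of the most recent write. The data is not assumed
 * to be NUL-terminated, so its length is kept separately. */
static char  *cache_data = NULL;
static size_t cache_length = 0;

/* Hypothetical write handler following the steps described for echo.c:
 * drop the old contents, allocate room for the new bytes, copy them,
 * and report the whole buffer as consumed. */
static ssize_t cache_write(const char *user_buffer, size_t user_length)
{
    free(cache_data);                /* discard the previous write */
    cache_data = NULL;
    cache_length = 0;

    if (user_length == 0)
        return 0;                    /* 0 only for a zero-length write */

    cache_data = malloc(user_length);
    if (cache_data == NULL)
        return -ENOMEM;

    memcpy(cache_data, user_buffer, user_length);
    cache_length = user_length;
    return (ssize_t)user_length;     /* entire buffer consumed */
}
\end{verbatim}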
922 The {\tt read} callback has some extra features that we did not see in
923 Program~\ref{helloworld.c}'s {\tt read()} callback. The most
924 important is that it allows the driver to read the available data {\em
925 incrementally}, instead of requiring that the first {\tt read()}
926 executed by the client has enough space for all the data the driver
927 has available. In other words, a client can do two 50-byte reads,
928 and expect the same effect as if it had done a single 100-byte read.
930 This is implemented using {\tt *offset}, the user's file pointer. If
931 the user is trying to read past the amount of data we have available,
932 the driver returns EOF (lines 8--9). Normally, this happens after the
933 client has finished reading data. However, in this driver, it might
934 happen on a client's first read if nothing has been written to the
935 driver yet or if the most recent write's memory allocation failed.
937 If there is data to return, the driver computes the number of bytes
938 that should be copied back to the client---the minimum of the number
939 of bytes the user asked for, and the number of bytes of data that this
940 client hasn't seen yet (line 12). This data is copied back to the
941 user's buffer (line 15), and the user's file pointer is advanced
942 accordingly (line 16). Finally, on line 19, the client is told how
943 many bytes were copied to its buffer.
946 \subsection{Unregistering a device with {\tt fusd\_unregister()}}
948 All devices registered by a driver are unregistered automatically when
949 the program exits (or crashes). However, the {\tt fusd\_unregister()}
950 function can be used to unregister a device without terminating the
951 entire driver. {\tt fusd\_unregister} takes one argument: a device
952 handle (i.e., the return value from {\tt fusd\_register()}).
A device can be unregistered at any time. Any client system calls
that are pending when a device is unregistered will return immediately
with an error. In this case, {\tt errno} will be set to {\tt EPIPE}.
959 \section{Using Information in {\tt fusd\_file\_info}}
961 \label{fusd-file-info}
963 We mentioned in the previous sections that the first argument to every
964 callback is a pointer to a {\tt fusd\_file\_info} structure. This
965 structure contains information that can be useful to driver
966 implementers in deciding how to respond to a system call request.
968 The fields of {\tt fusd\_file\_info} structures fall into several
969 categories:
970 \begin{itemize}
971 \item {\em Read-only.} The driver can inspect the value, but changing
972 it will have no effect.
973 \begin{itemize}
974 \item {\tt pid\_t pid}: The process ID of the process making the
975 request
976 \item {\tt uid\_t uid}: The user ID of the owner of the process making
977 the request
978 \item {\tt gid\_t gid}: The group ID of the owner of the process making
979 the request
980 \end{itemize}
981 \item {\em Read-write.} Any changes to the value will be propagated
982 back to the kernel and be written to the appropriate in-kernel
983 structure.
984 \begin{itemize}
985 \item {\tt unsigned int flags}: A copy of the {\tt f\_flags} field in
986 the kernel's {\tt file} structure. The flags are an or'd-together set
987 of the kernel's {\tt O\_} series of flags: {\tt O\_NONBLOCK}, {\tt
988 O\_APPEND}, {\tt O\_SYNC}, etc.
989 \item {\tt void *device\_info}: The data passed to {\tt
990 fusd\_register} when the device was registered; see
991 Section~\ref{device-info} for details
992 \item {\tt void *private\_data}: A generic per-file-descriptor pointer
993 usable by the driver for its own purposes, such as to keep state (or a
994 pointer to state) that should be maintained between operations on the
995 same instance of an open file. It is guaranteed to be NULL when the
996 file is first opened. See Section~\ref{private-data} for more
997 details.
998 \end{itemize}
999 \item {\em Hidden fields.} The driver should not touch these fields
1000 (such as {\tt fd}). They contain state used by the FUSD library to
1001 generate the reply sent to the kernel.
1002 \end{itemize}
1004 {\bf Important note:} the value of the {\tt fusd\_file\_info} pointer
1005 itself has {\em no meaning}. Repeated requests on the same file
1006 descriptor {\em will not} generate callbacks with identical {\tt
1007 fusd\_file\_info} pointer values, as would be the case with an
1008 in-kernel driver. In other words, if a driver needs to keep state in
1009 between successive system calls on a user's file descriptor, it {\em
1010 must} store that state using the {\tt private\_data} field. The {\tt
1011 fusd\_file\_info} pointer itself is ephemeral; the data to which it
1012 points is persistent.
1014 Program~\ref{uid-filter.c} shows an example of how a driver might make
1015 use of the data in the {\tt fusd\_file\_info} structure. Much of the
1016 driver is identical to helloworld.c. However, instead of printing a
1017 static greeting, this new program generates a custom message each time
1018 the device file is read, as seen on line 25. The message contains the
1019 PID of the user process that requested the read ({\tt file->pid}).
1021 \begin{Program}
1022 \listinginput[5]{1}{uid-filter.c.example}
1023 \caption{uid-filter.c: Inspecting data in {\tt fusd\_file\_info} such
1024 as UID and PID of the calling process}
1025 \label{uid-filter.c}
1026 \end{Program}
1028 In addition, Program~\ref{uid-filter.c}'s {\tt open} callback does not
1029 return 0 (success) unconditionally as it did in
1030 Program~\ref{helloworld.c}. Instead, it checks (on line 7) to make
1031 sure the UID of the process trying to read from the device ({\tt
1032 file->uid}) matches the UID under which the driver itself is running
1033 ({\tt getuid()}). If they don't match, -EPERM is returned. In other
1034 words, only the user who ran the driver is allowed to read from the
1035 device that it creates. If any other user---including root!---tries
1036 to open it, a ``Permission denied'' error will be generated.
1039 \subsection{Registration of Multiple Devices, and Passing Data to Callbacks}
1041 \label{device-info}
1043 Device drivers frequently expose several different ``flavors'' of a
1044 device. For example, a single magnetic tape drive will often have
1045 many different device files in {\tt /dev}. Each device file
1046 represents a different combination of options such as
1047 rewind/no-rewind, or compressed/uncompressed. However, they access
1048 the same physical tape drive.
Traditionally, the device file's {\em minor number} was used to
communicate the desired options to device drivers. However, because devfs
dynamically (and unpredictably) generates both major and minor numbers
every time a device is registered, a different technique was
developed. When using devfs, drivers are allowed to associate a value
(of type {\tt void *}) with each device they register. This facility
takes the place of the minor number.
1058 The devfs solution is also used by FUSD. The mysterious third
1059 argument to {\tt fusd\_register} that we mentioned in
1060 Section~\ref{using-fusd-register} is an arbitrary piece of data that
1061 can be passed to FUSD when a device is registered. Later, when a
1062 callback is activated, the contents of that argument are available in
1063 the {\tt device\_info} member of the {\tt fusd\_file\_info} structure.
1065 Program~\ref{drums.c} shows an example of this technique, inspired by
1066 Alessandro Rubini's similar devfs tutorial
1067 \htmladdnormallinkfoot{published in Linux
1068 Magazine}{http://www.linux.it/kerneldocs/devfs/}. It creates a number
1069 of devices in the {\tt /dev/drums} directory, each of which is useful
1070 for generating a different kind of ``sound''---{\tt /dev/drums/bam},
1071 {\tt /dev/drums/boom}, and so on. Reading from any of these devices
1072 will return a string equal to the device's name.
1074 \begin{Program}
1075 \listinginput[5]{1}{drums.c.example}
1076 \caption{drums.c: Passing private data to {\tt fusd\_register} and
1077 retrieving it from {\tt device\_info}}
1078 \label{drums.c}
1079 \end{Program}
The first thing to notice about {\tt drums.c} is that it registers
more than one FUSD device. In the loop starting on line 31, it calls
1083 {\tt fusd\_register()} once for every device named in {\tt
1084 drums\_strings} on line 1. When {\tt fusd\_run()} is called, it
1085 automatically watches every device the driver registered, and
1086 activates the callbacks associated with each device as needed.
1087 Although {\tt drums.c} uses the same set of callbacks for every device
1088 it registers (as can be seen on line 33), each device could have
1089 different callbacks if desired. (Not shown is the initialization of
1090 {\tt drums\_fops}, which assigns {\tt drums\_read} to be the {\tt
1091 read} callback.)
1093 If {\tt drums\_read} is called for all 6 types of drums, how does it
1094 know which device it's supposed to be servicing when it gets called?
1095 The answer is in the third argument of {\tt fusd\_register()}, which
1096 we were previously ignoring. Whatever value is passed to {\tt
1097 fusd\_register()} will be passed back to the callback in the {\tt
1098 device\_info} field of the {\tt fusd\_file\_info} structure. The name
1099 of the drum sound is passed to {\tt fusd\_register} on line 33, and
1100 later retrieved by the driver on line 12.
1102 Although this example uses a string as its {\tt device\_info}, the
1103 pointer can be used for anything---a mode number, a pointer to a
1104 configuration structure, and so on.
1107 \subsection{The difference between {\tt device\_info} and {\tt
1108 private\_data}}
1110 \label{private-data}
1112 As we mentioned in Section~\ref{fusd-file-info}, the {\tt
1113 fusd\_file\_info} structure has two seemingly similar fields, both of
1114 which can be used by drivers to store their own data: {\tt
1115 device\_info} and {\tt private\_data}. However, there is an important
1116 difference between them:
1118 \begin{itemize}
1120 \item {\tt private\_data} is stored {\em per file descriptor}. If 20
1121 processes open a FUSD device (or, one process opens a FUSD device 20
1122 times), each of those 20 file descriptors will have their own copy of
1123 {\tt private\_data} associated with them. This field is therefore
1124 useful to drivers that need to differentiate multiple requests to a
1125 single device that might be serviced in parallel. (Note that most
1126 UNIX variants, including Linux, do allow multiple processes to share a
1127 single file descriptor---specifically, if a process {\tt open}s a
1128 file, then {\tt fork}s. In this case, processes will also share a
1129 single copy of {\tt private\_data}.)
1131 The first time a FUSD driver sees {\tt private\_data} (in the {\tt
1132 open} callback), it is guaranteed to be NULL. Any changes to it by a
1133 driver callback will only affect the state associated with that single
1134 file descriptor.
1136 \item {\tt device\_info} is kept {\em per device}. That is, {\em all}
1137 clients of a device share a {\em single} copy of {\tt device\_info}.
1138 Unlike {\tt private\_data}, which is always initialized to NULL, {\tt
1139 device\_info} is always initialized to whatever value the driver
1140 passed to {\tt fusd\_register} as described in the previous section.
1141 If a callback changes the copy of {\tt device\_info} in the {\tt
1142 fusd\_file\_info} structure, this has no effect; {\tt device\_info}
1143 can only be set at registration time, with {\tt fusd\_register}.
1145 \end{itemize}
1147 In short, {\tt device\_info} is used to differentiate {\em devices}.
1148 {\tt private\_data} is used to differentiate {\em users of those
1149 devices}.
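The two scopes can be simulated without FUSD. In the sketch below, {\tt struct demo\_file\_info} is a hypothetical stand-in holding only the two members under discussion, and {\tt struct drum} plays the role of a per-device record. Two opens of the same device share one counter through {\tt device\_info}, but each receives its own user number through {\tt private\_data}. Storing the number directly in the pointer (via {\tt intptr\_t}) is a choice made for this sketch, not necessarily what drums2.c does:

\begin{verbatim}
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* intptr_t */

/* Hypothetical stand-in for struct fusd_file_info (see fusd.h). */
struct demo_file_info {
    void *device_info;    /* per device: the pointer given at registration */
    void *private_data;   /* per file descriptor: starts out NULL */
};

/* Hypothetical per-device record, shared by all clients of one device. */
struct drum {
    const char *name;
    int num_users;
};

/* Open handler: bump the shared per-device counter, then remember this
 * descriptor's user number in its own private_data slot. */
static int demo_drum_open(struct demo_file_info *file)
{
    struct drum *d = file->device_info;
    d->num_users++;
    file->private_data = (void *)(intptr_t)d->num_users;
    return 0;
}

/* Recover the user number stored by demo_drum_open(). */
static int demo_user_number(const struct demo_file_info *file)
{
    return (int)(intptr_t)file->private_data;
}
\end{verbatim}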
1151 Program~\ref{drums2.c}, drums2.c, illustrates the difference between
1152 {\tt device\_info} and {\tt private\_data}. Like the original
1153 drums.c, it creates a bunch of devices in {\tt /dev/drums/}, each of
1154 which ``plays'' a different sound. However, it also does something
1155 new: keeps track of how many times each device has been opened. Every
1156 read to any drum gives you the name of its sound as well as your
1157 unique ``user number''. And, instead of returning just a single line
1158 (as drums.c did), it will keep generating more ``sound'' every time a
1159 {\tt read()} system call arrives.
1161 \begin{Program}
1162 \listinginput[5]{1}{drums2.c.example}
1163 \caption{drums2.c: Using both {\tt device\_info} and {\tt private\_data}}
1164 \label{drums2.c}
1165 \end{Program}
1167 The trick is that we want to keep users separate from each other. For
1168 example, user one might type:\\
1169 \begin{minipage}{\textwidth}
1170 \vspace{\baselineskip}
1171 \begin{verbatim}
1172 % more /dev/drums/bam
1173 You are user 1 to hear a drum go 'bam'!
1174 You are user 1 to hear a drum go 'bam'!
1175 You are user 1 to hear a drum go 'bam'!
1178 \end{verbatim}
1179 \end{minipage}
1181 Meanwhile, another user in a different shell might type the same
1182 command at the same time, and get different results:\\
1183 \begin{minipage}{\textwidth}
1184 \vspace{\baselineskip}
1185 \begin{verbatim}
1186 % more /dev/drums/bam
1187 You are user 2 to hear a drum go 'bam'!
1188 You are user 2 to hear a drum go 'bam'!
1189 You are user 2 to hear a drum go 'bam'!
1192 \end{verbatim}
1193 \end{minipage}
1195 The idea is that no matter how long those two users go on reading
1196 their devices, the driver always generates a message that is specific
1197 to that user. The two users' data are not intermingled.
To implement this, Program~\ref{drums2.c} introduces a new {\tt
drum\_info} structure (lines 1--4), which keeps track of both the
drum's name and the number of times each drum device has been opened.
An instance of this structure, {\tt drums}, is initialized on lines
4--8. Note that the call to {\tt fusd\_register} (line 45) now passes
1204 a pointer to a {\tt drum\_info} structure. (This {\tt drum\_info *}
1205 pointer is shared by every instance of a client that opens a
1206 particular type of drum.)
1208 Each time a drum device is opened, its {\tt drum\_info} structure is
1209 retrieved from {\tt device\_info} (line 15). Then, on line 18, the
1210 {\tt num\_users} field is incremented and the new user number is
1211 stored in {\tt fusd\_file\_info}'s {\tt private\_data} field. To
1212 reiterate our earlier point: {\em {\tt device\_info} contains
1213 information global to all users of a device, while {\tt private\_data}
1214 has information specific to a particular user of the device.}
1216 It's also worthwhile to note that when we increment {\tt num\_users}
on line 18, a simple {\tt num\_users++} is correct. If this were a
1218 driver inside the kernel, we'd have to use something like {\tt
1219 atomic\_inc()} because a plain {\tt i++} is not atomic. Such a
1220 non-atomic statement will result in a race condition on SMP platforms,
1221 if an interrupt handler also touches {\tt num\_users}, or in some
1222 future Linux kernel that is preemptive. Since this FUSD driver is
1223 just a plain, single-threaded user-space application, good old {\tt
1224 ++} still works.
1227 \section{Writing {\tt ioctl} Callbacks}
1229 The POSIX API provides for a function called {\tt ioctl}, which allows
1230 ``out-of-band'' configuration information to be passed to a device
1231 driver through a file descriptor. Using FUSD, you can write a device
1232 driver with a callback to handle {\tt ioctl} requests from clients.
1233 For the most part, it's just like writing a callback for {\tt read} or
1234 {\tt write}, as we've seen in previous sections. From the client's
1235 point of view, {\tt ioctl} traditionally takes three arguments: a file
1236 descriptor, a command number, and a pointer to any additional data
1237 that might be required for the command.
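The client side of this three-argument call can be seen with a standard command that needs no FUSD driver: Linux's {\tt FIONREAD}, which reports how many bytes are waiting to be read on a descriptor. The wrapper function {\tt bytes\_pending\_in\_pipe} is hypothetical; {\tt pipe}, {\tt write}, and {\tt ioctl} are the ordinary system calls:

\begin{verbatim}
#include <sys/ioctl.h>   /* ioctl(), FIONREAD */
#include <unistd.h>      /* pipe(), write(), close(), ssize_t */

/* Hypothetical demo: write len bytes of msg into a fresh pipe, then ask
 * the kernel how many bytes are waiting on the read end. Returns that
 * count, or -1 on any error. */
static int bytes_pending_in_pipe(const char *msg, size_t len)
{
    int fds[2];
    int pending = -1;

    if (pipe(fds) == -1)
        return -1;

    if (write(fds[1], msg, len) == (ssize_t)len) {
        /* file descriptor, command number, pointer to command data */
        if (ioctl(fds[0], FIONREAD, &pending) == -1)
            pending = -1;
    }

    close(fds[0]);
    close(fds[1]);
    return pending;
}
\end{verbatim}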
1239 \subsection{Using macros to generate {\tt ioctl} command numbers}
1241 The Linux header file {\tt /usr/include/asm/ioctl.h} defines macros
1242 that {\em must} be used to create the {\tt ioctl} command number.
1243 These macros take various combinations of three arguments:
1245 \begin{itemize}
1247 \item {\tt type}---an 8-bit integer selected to be specific to the
1248 device driver. {\tt type} should be chosen so as not to conflict with
1249 other drivers that might be ``listening'' to the same file descriptor.
1250 (Inside the kernel, for example, the TCP and IP stacks use distinct
1251 numbers since an {\tt ioctl} sent to a socket file descriptor might be
1252 examined by both stacks.)
1254 \item {\tt number}---an 8-bit integer ``command number.'' Within a
1255 driver, distinct numbers should be chosen for each different kind of
1256 {\tt ioctl} command that the driver services.
1258 \item {\tt data\_type}---The name of a type used to compute how many
1259 bytes are exchanged between the client and the driver. This argument
1260 is, for example, the name of a structure.
1262 \end{itemize}
1264 The macros used to generate command numbers are:
1266 \begin{itemize}
1268 \item {\tt \_IO(int type, int number)} -- used for a simple ioctl that
1269 sends nothing but the type and number, and receives back nothing but
1270 an (integer) retval.
1272 \item {\tt \_IOR(int type, int number, data\_type)} -- used for an
1273 ioctl that reads data {\em from} the device driver. The driver will
1274 be allowed to return {\tt sizeof(data\_type)} bytes to the user.
1276 \item {\tt \_IOW(int type, int number, data\_type)} -- similar to
1277 \_IOR, but used to write data {\em to} the driver.
\item {\tt \_IOWR(int type, int number, data\_type)} -- a combination
1280 of {\tt \_IOR} and {\tt \_IOW}. That is, data is both written to the
1281 driver and then read back from the driver by the client.
1282 \end{itemize}
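The sketch below defines one command of each kind and notes how the type, number, direction, and size can be recovered with the decoding macros {\tt \_IOC\_TYPE}, {\tt \_IOC\_NR}, {\tt \_IOC\_DIR}, and {\tt \_IOC\_SIZE}, which Linux's ioctl headers define alongside the encoding macros. The type byte {\tt 'x'}, the command names, and {\tt demo\_ioctl\_data\_t} are hypothetical:

\begin{verbatim}
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR, _IOC_* decoders */

/* Hypothetical payload exchanged by the data-carrying commands. */
typedef struct {
    char string1[60];
    char string2[60];
} demo_ioctl_data_t;

/* Hypothetical 8-bit type byte identifying this driver's commands. */
#define DEMO_IOC_TYPE 'x'

#define DEMO_IOCTL_RESET _IO(DEMO_IOC_TYPE, 0)                      /* no data          */
#define DEMO_IOCTL_GET   _IOR(DEMO_IOC_TYPE, 1, demo_ioctl_data_t)  /* driver -> client */
#define DEMO_IOCTL_SET   _IOW(DEMO_IOC_TYPE, 2, demo_ioctl_data_t)  /* client -> driver */
#define DEMO_IOCTL_XCHG  _IOWR(DEMO_IOC_TYPE, 3, demo_ioctl_data_t) /* both directions  */

/* Each command number packs four fields, which can be unpacked again:
 *   _IOC_TYPE(DEMO_IOCTL_SET) == 'x'
 *   _IOC_NR(DEMO_IOCTL_SET)   == 2
 *   _IOC_DIR(DEMO_IOCTL_SET)  == _IOC_WRITE
 *   _IOC_SIZE(DEMO_IOCTL_SET) == sizeof(demo_ioctl_data_t)
 */
\end{verbatim}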
1284 \begin{Program}
1285 \listinginput[5]{1}{ioctl.h.example}
1286 \caption{ioctl.h: Using the {\tt \_IO} macros to generate {\tt ioctl}
1287 command numbers}
1288 \label{ioctl.h}
1289 \end{Program}
1291 Program~\ref{ioctl.h} is an example header file showing the use of
1292 these macros. In real programs, the client executing an ioctl and the
1293 driver that services it must share the same header file.
1295 \subsection{Example client calls and driver callbacks}
1297 Program~\ref{ioctl-client.c} shows a client program that executes {\tt
1298 ioctl}s using the ioctl command numbers defined in
1299 Program~\ref{ioctl.h}. The {\tt ioctl\_data\_t} is
1300 application-specific; our simple test program defines it as a
1301 structure containing two arrays of characters. The first {\tt ioctl}
1302 call (line 10) sends the command {\tt IOCTL\_TEST3}, which retrieves
1303 strings {\em from} the driver. The second {\tt ioctl} uses the
1304 command {\tt IOCTL\_TEST4} (line 18), which sends strings {\em to} the
1305 driver.
1307 \begin{Program}
1308 \listinginput[5]{1}{ioctl-client.c.example}
1309 \caption{ioctl-client.c: A program that makes {\tt ioctl} requests on
1310 a file descriptor}
1311 \label{ioctl-client.c}
1312 \end{Program}
1314 The portion of the FUSD driver that services these calls is shown in
1315 Program~\ref{ioctl-server.c}.
1317 \begin{Program}
1318 \listinginput[5]{1}{ioctl-server.c.example}
1319 \caption{ioctl-server.c: A driver that handles {\tt ioctl} requests}
1320 \label{ioctl-server.c}
1321 \end{Program}
1323 The ioctl example header file and test programs shown in this document
1324 (Programs~\ref{ioctl.h}, \ref{ioctl-client.c}, and
1325 \ref{ioctl-server.c}) are actually contained in a larger, single
1326 example program included in the FUSD distribution called {\tt
1327 ioctl.c}. That source code shows other variations on calling and
1328 servicing {\tt ioctl} commands.
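On the driver side, servicing such commands usually reduces to a {\tt switch} on the command number. This sketch uses a simplified, hypothetical handler signature (command number and argument pointer only) rather than FUSD's actual {\tt ioctl} callback prototype, and its command names and payload type are illustrative. It returns {\tt -ENOTTY} for unrecognized commands, the conventional Linux error for an unsupported {\tt ioctl}:

\begin{verbatim}
#include <errno.h>         /* ENOTTY */
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW */
#include <string.h>        /* memcpy */

/* Hypothetical payload and command numbers. */
typedef struct {
    char string1[60];
    char string2[60];
} demo_strings_t;

#define DEMO_TYPE 'x'
#define DEMO_GET  _IOR(DEMO_TYPE, 1, demo_strings_t)   /* driver -> client */
#define DEMO_SET  _IOW(DEMO_TYPE, 2, demo_strings_t)   /* client -> driver */

/* Driver-side state returned by DEMO_GET and replaced by DEMO_SET. */
static demo_strings_t stored = { "first", "second" };

/* Hypothetical handler: dispatch on the command number. */
static int demo_ioctl(unsigned int cmd, void *arg)
{
    switch (cmd) {
    case DEMO_GET:                          /* copy state out to the client */
        memcpy(arg, &stored, sizeof stored);
        return 0;
    case DEMO_SET:                          /* copy client data in */
        memcpy(&stored, arg, sizeof stored);
        stored.string1[sizeof stored.string1 - 1] = '\0';   /* keep strings */
        stored.string2[sizeof stored.string2 - 1] = '\0';   /* terminated   */
        return 0;
    default:
        return -ENOTTY;                     /* unknown command */
    }
}
\end{verbatim}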
1331 \section{Integrating FUSD With Your Application Using {\tt fusd\_dispatch()}}
1332 \label{selecting}
1334 The example applications we've seen so far have something in common:
1335 after initialization and device registration, they call {\tt
1336 fusd\_run()}. This gives up control of the program's flow, turning it
over to the FUSD library instead. That works for simple
example programs, but not for a real program that needs to
1339 wait for events other than FUSD callbacks. For this reason, our
1340 framework provides another way to activate callbacks that does not
1341 require the driver to give up control of its {\tt main()}.
1343 \subsection{Using {\tt fusd\_dispatch()}}
1345 Recall from Section~\ref{using-fusd-register} that {\tt
1346 fusd\_register} returns a {\em file descriptor} for every device that
1347 is successfully registered. This file descriptor can be used to
1348 activate device callbacks ``manually,'' without passing control of the
1349 application to {\tt fusd\_run()}. Whenever the file descriptor
1350 becomes readable according to {\tt select(2)}, it should be passed to
1351 {\tt fusd\_dispatch()}, which in turn will activate callbacks in the
1352 same way that {\tt fusd\_run()} does. In other words, an application
1353 can:
1354 \begin{enumerate}
1355 \item Save the file descriptors returned by {\tt fusd\_register()};
1356 \item Add those FUSD file descriptors to an {\tt fd\_set} that is
1357 passed to {\tt select}, along with any other file
1358 descriptors that might be interesting to the application; and
1359 \item Pass every FUSD file descriptor that {\tt select} indicates is
1360 readable to {\tt fusd\_dispatch}.
1361 \end{enumerate}
1363 {\tt fusd\_dispatch()} returns 0 if at least one callback was
1364 successfully activated. On error, -1 is returned with {\tt errno} set
1365 appropriately. {\tt fusd\_dispatch()} will never block---if no
1366 messages are available from the kernel, it will return -1 with {\tt
1367 errno} set to {\tt EAGAIN}.
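The three steps above can be sketched with plain POSIX calls. To keep
the sketch runnable without the FUSD library, it makes two
substitutions that a real driver would not: a pipe stands in for a
descriptor returned by {\tt fusd\_register()}, and the hypothetical
{\tt fake\_dispatch()} stands in for {\tt fusd\_dispatch()}. In a real
driver, {\tt other\_fd} would typically be standard input or a socket.

\begin{verbatim}
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static int dispatch_count;   /* how many times the stand-in ran */

/* Hypothetical stand-in for fusd_dispatch(fd): consume one pending
 * "message"; fail if nothing was pending. */
static int fake_dispatch(int fd)
{
  char buf[64];
  if (read(fd, buf, sizeof(buf)) <= 0)
    return -1;
  dispatch_count++;
  return 0;
}

/* One pass of a select() loop over a non-FUSD fd and the saved device
 * fds. Returns the number of device fds dispatched, or -1 on error. */
static int run_once(int other_fd, const int *dev_fds, int ndev,
                    struct timeval *timeout)
{
  fd_set readable;
  int max = other_fd, i, n = 0;

  FD_ZERO(&readable);
  FD_SET(other_fd, &readable);
  for (i = 0; i < ndev; i++) {          /* step 2: add the saved fds */
    FD_SET(dev_fds[i], &readable);
    if (dev_fds[i] > max)
      max = dev_fds[i];
  }
  if (select(max + 1, &readable, NULL, NULL, timeout) < 0)
    return -1;

  /* A real program would also service other_fd here if readable. */
  for (i = 0; i < ndev; i++)            /* step 3: dispatch readable fds */
    if (FD_ISSET(dev_fds[i], &readable) && fake_dispatch(dev_fds[i]) == 0)
      n++;
  return n;
}

int main(void)
{
  int dev[2], other[2];
  struct timeval tv = { 0, 0 };         /* poll once; don't block */

  if (pipe(dev) < 0 || pipe(other) < 0)
    return 1;
  assert(run_once(other[0], &dev[0], 1, &tv) == 0);  /* nothing pending */

  if (write(dev[1], "x", 1) != 1)       /* make the "device" readable */
    return 1;
  tv.tv_sec = 0;                        /* select() may modify tv */
  tv.tv_usec = 0;
  assert(run_once(other[0], &dev[0], 1, &tv) == 1);
  assert(dispatch_count == 1);
  return 0;
}
\end{verbatim}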
1369 \subsection{Helper Functions for Constructing an {\tt fd\_set}}
1371 The FUSD library provides two (optional) utility functions that can
1372 make it easier to write applications that integrate FUSD into their
1373 own {\tt select()} loops. Specifically:
1374 \begin{itemize}
1375 \item {\tt void fusd\_fdset\_add(fd\_set *set, int *max)}---is meant
1376 to help construct an {\tt fd\_set} that will be passed as the
1377 ``readable fds'' set to select. This function adds the file
descriptors of all previously registered FUSD devices to the {\tt fd\_set}
1379 {\tt set}. It assumes that {\tt set} has already been initialized by
1380 the caller. The integer {\tt max} is updated to reflect the largest
1381 file descriptor number in the set. {\tt max} is not changed if the
1382 value passed to {\tt fusd\_fdset\_add} is already larger than the
1383 largest FUSD file descriptor added to the set.
1385 \item {\tt void fusd\_dispatch\_fdset(fd\_set *set)}---is meant to be
1386 called on the {\tt fd\_set} that is {\em returned} by select. It
assumes that {\tt set} contains a set of file descriptors that {\tt
1388 select()} has indicated are readable. {\tt fusd\_dispatch\_fdset()}
1389 calls {\tt fusd\_dispatch} on every descriptor in {\tt set} that is a
1390 valid FUSD descriptor. Non-FUSD descriptors in {\tt set} are
1391 ignored.
1392 \end{itemize}
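To make the contract of these two helpers concrete, the sketch below
re-implements the behavior described above over a hypothetical table of
saved descriptors. The names {\tt demo\_fdset\_add()},
{\tt demo\_dispatch\_fdset()}, and {\tt registered[]} are illustrative
only; the real functions are internal to the FUSD library, which keeps
its own record of registered devices and may be implemented quite
differently.

\begin{verbatim}
#include <assert.h>
#include <sys/select.h>

/* Hypothetical record of descriptors returned by fusd_register(). */
static int registered[] = { 5, 9, 7 };
static const int nregistered = sizeof(registered) / sizeof(registered[0]);
static int dispatched[16];     /* which fds the stand-in "dispatched" */

static void demo_dispatch(int fd) { dispatched[fd]++; }

/* Like fusd_fdset_add(): add every registered fd to an already
 * initialized set; raise *max only if a larger fd was added. */
static void demo_fdset_add(fd_set *set, int *max)
{
  int i;
  for (i = 0; i < nregistered; i++) {
    FD_SET(registered[i], set);
    if (registered[i] > *max)
      *max = registered[i];
  }
}

/* Like fusd_dispatch_fdset(): dispatch each registered fd that select()
 * marked readable; non-FUSD fds in the set are ignored. */
static void demo_dispatch_fdset(fd_set *set)
{
  int i;
  for (i = 0; i < nregistered; i++)
    if (FD_ISSET(registered[i], set))
      demo_dispatch(registered[i]);
}

int main(void)
{
  fd_set set;
  int max = 0;

  FD_ZERO(&set);
  FD_SET(0, &set);                 /* stdin, added by the caller */
  demo_fdset_add(&set, &max);
  assert(max == 9);

  max = 12;                        /* already larger: left unchanged */
  demo_fdset_add(&set, &max);
  assert(max == 12);

  /* Pretend select() reported only stdin and fd 7 as readable. */
  FD_ZERO(&set);
  FD_SET(0, &set);
  FD_SET(7, &set);
  demo_dispatch_fdset(&set);
  assert(dispatched[7] == 1 && dispatched[0] == 0 && dispatched[5] == 0);
  return 0;
}
\end{verbatim}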
1395 \begin{Program}
1396 \listinginput[5]{1}{drums3.c.example}
1397 \caption{drums3.c: Waiting for both FUSD and non-FUSD events in a
1398 {\tt select} loop}
1399 \label{drums3.c}
1400 \end{Program}
1402 The excerpt of {\tt drums3.c} shown in Program~\ref{drums3.c}
1403 demonstrates the use of these helper functions. This program is
1404 similar to the earlier drums.c example: it creates a number of musical
1405 instruments such as {\tt /dev/drums/bam} and {\tt /dev/drums/boom}.
1406 However, in addition to servicing its musical callbacks, the driver
also prints a prompt to standard output asking how ``loud'' the drums
1408 should be. Instead of turning control of {\tt main()} over to {\tt
1409 fusd\_run()} as in the previous examples, {\tt drums3} uses {\tt
1410 select()} to simultaneously watch its FUSD file descriptors and standard
1411 input. It responds to input from both sources.
1413 On lines 2--5, an {\tt fd\_set} and its associated ``max'' value are
1414 initialized to contain stdin's file descriptor. On line 9, we use
1415 {\tt fusd\_fdset\_add} to add the FUSD file descriptors for all
1416 registered devices. (Not shown in this excerpt is the device
1417 registration, which is the same as the registration code we saw in
{\tt drums.c}.) On line 13 we call {\tt select}, which blocks until one of
the file descriptors in the set is readable. On lines 17 and 18, we check to see
1420 if standard input is readable; if so, a function is called which reads
1421 the user's response from standard input and prints a new prompt.
1422 Finally, on line 21, we call {\tt fusd\_dispatch\_fdset}, which in
1423 turn will activate the callbacks for devices that have pending system
1424 calls waiting to be serviced.
1426 It's worth reiterating that drivers are not required to use the FUSD
1427 helper functions {\tt fusd\_fdset\_add} and {\tt
1428 fusd\_dispatch\_fdset}. If it's more convenient, a driver can
1429 manually save all of the file descriptors returned by {\tt
1430 fusd\_register}, construct its own {\tt fd\_set}, and then call {\tt
1431 fusd\_dispatch} on each descriptor that is readable. This method is
1432 sometimes required for integration with other frameworks that want to
1433 take over your {\tt main()}. For example, the
1434 \htmladdnormallinkfoot{GTK user interface
1435 framework}{http://www.gtk.org} is event-driven and requires that you
1436 pass control of your {\tt main} to it. However, it does allow you to
1437 give it a file descriptor and a function pointer, saying ``Call this
1438 callback when {\tt select} indicates this file descriptor has become
1439 readable.'' A GTK application that implements FUSD devices can work
1440 by giving GTK all the FUSD file descriptors individually, and calling
1441 {\tt fusd\_dispatch()} when GTK calls the associated callbacks.
1445 \section{Implementing Blocking System Calls}
1447 All of the example drivers that we've seen until now have had an
1448 important feature missing: they never had to {\em wait} for anything.
So far, a driver's response to a system call has always been
available at once, allowing the driver to respond immediately.
1451 However, real devices are often not that lucky: they usually have to
1452 wait for something to happen before completing a client's system call.
1453 For example, a driver might be waiting for data to arrive from the
1454 serial port or over the network, or even waiting for a user action.
1456 In situations like this, a basic capability most device drivers must
1457 have is the ability to {\em block} the caller. Blocking operations
1458 are important because they provide a simple interface to user programs
1459 that does flow control, rather than something more expensive like
1460 continuous polling. For example, user programs expect to be able to
1461 execute a statement like {\tt read(fd, buf, sizeof(buf))}, and expect
1462 the read call to block (stop the flow of the calling program) until
1463 data is available. This is much simpler and more efficient than
1464 polling repeatedly.
1466 In the following sections, we'll describe how to block and unblock
1467 system calls for devices that use FUSD.
1470 \subsection{Blocking the caller by blocking the driver}
1472 The easiest (but least useful) way to block a client's system call is
1473 simply to block the driver, too. For example, consider
1474 Program~\ref{console-read.c}, which implements a device called {\tt
1475 /dev/console-read}. Whenever a process tries to read from this
device, the driver prints a prompt to standard output, asking for a
1477 reply. (The prompt appears in the shell the driver was run in, not
1478 the shell that's trying to read from the device.) When the user
1479 enters a line of text, the response is returned to the client that did
1480 the original {\tt read()}. By blocking the driver waiting for the
1481 reply, the client that issued the system call is blocked as well.
1483 \begin{Program}
1484 \listinginput[5]{1}{console-read.c.example}
1485 \caption{console-read.c: A simple blocking system call}
1486 \label{console-read.c}
1487 \end{Program}
1489 Blocking the driver this way is safe---unlike programming in the
1490 kernel proper, where doing something like this would block the entire
1491 system. It's also easy to implement, as seen from the example above.
1492 However, it makes the driver unresponsive to system call requests that
1493 might be coming from other clients. If another process tries to do
1494 anything at all with a blocked driver's device---even an {\tt
1495 open()}---it will block until the driver wakes up again. This
1496 limitation makes blocking drivers inappropriate for any device driver
1497 that expects to service more than one client at a time.
1500 \subsection{Blocking the caller using {\tt -FUSD\_NOREPLY};
1501 unblocking it using {\tt fusd\_return()}}
1502 \label{fusd-noreply}
1504 If a device driver expects more than one client at a time---as is
1505 often the case---a slightly different programming model is needed for
1506 system calls that can potentially block. Instead of blocking, the
1507 driver immediately sends a message to the FUSD framework that says, in
1508 essence, ``Don't unblock the client that issued this system call, but
1509 continue sending additional system call requests that might be coming
1510 from other clients.'' Driver callbacks can send this message to FUSD
1511 by returning the special value {\tt -FUSD\_NOREPLY} instead of a
1512 normal system call return value.
1514 Before a callback blocks the caller by returning {\tt -FUSD\_NOREPLY},
1515 it must save the {\tt fusd\_file\_info} pointer that was provided to
1516 the callback as its first argument. Later, when an event occurs which
1517 allows the client's blocked system call to complete, the driver should
1518 call {\tt fusd\_return()}, which will unblock the calling process and
1519 complete its system call. {\tt fusd\_return()} takes two arguments:
1520 \begin{itemize}
1521 \item The {\tt fusd\_file\_info} pointer that the callback saved
1522 earlier; and
1523 \item The system call's return value (in other words, the value that
1524 would have been returned by the callback function had it not returned
1525 {\tt -FUSD\_NOREPLY}). FUSD itself {\em does not} examine the return
1526 value passed as the second argument to {\tt fusd\_return}; it simply
1527 propagates that value back to the kernel as the return value of the
1528 blocked system call.
1529 \end{itemize}
1531 Drivers should never call {\tt fusd\_return} more than once on a
1532 single {\tt fusd\_file\_info} pointer. Doing so will have undefined
1533 results, similar to calling {\tt free()} twice on the same pointer.
It also bears repeating that a callback can {\em either} call
{\tt fusd\_return()} explicitly {\em or} return a normal return value (i.e.,
not {\tt -FUSD\_NOREPLY}), but not both.
1539 {\tt -FUSD\_NOREPLY} and {\tt fusd\_return()} make it easy for a
1540 driver to block a process, then unblock it later when data becomes
1541 available. When the callback returns {\tt -FUSD\_NOREPLY}, the driver
1542 is freed up to wait for other events, even though the process making
1543 the system call is still blocked. The driver can then wait for
1544 something to happen that unblocks the original caller---for example,
1545 another FUSD event, data from a serial port, or data from the network.
1546 (Recall from Section~\ref{selecting} that a FUSD driver can
1547 simultaneously wait for both FUSD and non-FUSD events.)
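The bookkeeping rules above (save the pointer before returning
{\tt -FUSD\_NOREPLY}, reply exactly once, and reply later from the
event loop) can be modeled without the library. In this sketch,
{\tt struct request}, {\tt DEMO\_NOREPLY}, {\tt demo\_return()},
{\tt slow\_read()}, and {\tt on\_data\_arrived()} are hypothetical
stand-ins for {\tt struct fusd\_file\_info}, {\tt -FUSD\_NOREPLY},
{\tt fusd\_return()}, a driver's {\tt read} callback, and an event
handler; the sentinel's value is made up. Clearing the saved pointer
after replying is one simple way to make sure the reply is never sent
twice.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

#define DEMO_NOREPLY (-1000)      /* hypothetical sentinel, not FUSD's */

struct request {                  /* stand-in for struct fusd_file_info */
  int replied;                    /* has a reply been sent yet? */
  long retval;                    /* value delivered to the client */
};

static struct request *parked;    /* the saved pointer */
static int data_ready;            /* the event the client waits for */

/* Stand-in for fusd_return(): deliver retval exactly once. */
static void demo_return(struct request *req, long retval)
{
  assert(!req->replied);          /* a second reply would be a bug */
  req->replied = 1;
  req->retval = retval;
}

/* A read-like callback: answer now if possible, else park the request. */
static long slow_read(struct request *req)
{
  if (data_ready)
    return 5;                     /* normal return: FUSD replies for us */
  parked = req;                   /* save the pointer before returning */
  return DEMO_NOREPLY;
}

/* Called later from the event loop when the awaited event happens. */
static void on_data_arrived(void)
{
  data_ready = 1;
  if (parked != NULL) {
    demo_return(parked, 5);
    parked = NULL;                /* forget it so we never reply twice */
  }
}

int main(void)
{
  struct request r = { 0, 0 };

  assert(slow_read(&r) == DEMO_NOREPLY);  /* client stays blocked */
  assert(!r.replied);
  on_data_arrived();                      /* driver unblocks it */
  assert(r.replied && r.retval == 5);
  return 0;
}
\end{verbatim}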
1549 FUSD includes an example program, {\tt pager.c}, which demonstrates
1550 these techniques. The pager driver implements a simple notification
1551 interface which lets any number of ``waiters'' wait for a signal from
1552 a ``notifier.'' All the waiters wait by trying to read from {\tt
1553 /dev/pager/notify}. Those reads will block until a notifier writes
1554 the string {\tt page} to {\tt /dev/pager/input}. It's easy to try
1555 the application out---run the driver, and then open three other
1556 shells. In two of them, type {\tt cat /dev/pager/notify}. The reads
1557 will block. Then, in the third shell, type {\tt echo page >
1558 /dev/pager/input}---the other two shells should become unblocked.
1560 Let's take a look at how this application is implemented, step by
1561 step.
1563 \subsubsection{Keeping Per-Client State}
1565 The first thing to notice about {\tt pager.c} is that it keeps {\em
1566 per-client state}. That is, for every file descriptor open to the
1567 driver, a structure is allocated that has information relating to that
1568 file descriptor. Previous driver examples were, for the most part,
1569 {\em reactive}---they received requests, and immediately generated
1570 responses. Since there was never more than one request outstanding,
1571 there was no need to keep a list of them. The pager application is
1572 the first one that must keep track of an arbitrary number of requests
1573 that might be outstanding at the same time. The first excerpt of {\tt
1574 pager.c}, which appears in Program~\ref{pager-open.c}, shows the code
1575 which creates this per-client state. Lines 1--6 define a structure,
1576 {\tt pager\_client}, which keeps all the information we need about
1577 each client attached to the driver. The {\tt open} callback for {\tt
1578 /dev/pager/notify}, shown on lines 12--31, allocates memory for an
1579 instance of this structure and adds it to a linked list. (If the
1580 memory allocation fails, an error is returned to the client on line
1581 18; this will prevent the file from opening.) Note on line 25 that we
1582 use the {\tt private\_data} field to store a pointer to the client
1583 state; this allows the structure to be retrieved when later callbacks
1584 on this file descriptor arrive. The memory is deallocated when the
1585 file is closed; we'll see that in a later section.
1587 \begin{Program}
1588 \listinginput[5]{1}{pager-open.c.example}
1589 \caption{pager.c (Part 1): Creating state for every client using the
1590 driver}
1591 \label{pager-open.c}
1592 \end{Program}
1594 Another thing to notice about the open callback is the use of the {\tt
1595 last\_page\_seen} variable. The driver gives a sequence number to
1596 every page it receives; {\tt last\_page\_seen} stores the number of
1597 the most recent page seen by a client. When a new client arrives
1598 (i.e., it opens {\tt /dev/pager/notify}), its {\tt last\_page\_seen}
1599 state is set equal to the page that has most recently arrived; this
1600 forces a new client to wait for the {\em next} page, rather than
1601 immediately being notified of a page that has arrived in the past.
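A pared-down model of this open-time bookkeeping appears below. It is
not the code of {\tt pager.c}: {\tt struct demo\_file} (standing in for
{\tt fusd\_file\_info} and its {\tt private\_data} field),
{\tt struct demo\_client}, {\tt demo\_open()}, {\tt has\_unseen\_page()},
and the global {\tt last\_page} counter are hypothetical, and the
negative {\tt errno} return on allocation failure is an assumption of
the sketch. It illustrates the two ideas from the text: one
heap-allocated record per open descriptor, reachable through
{\tt private\_data}, and a {\tt last\_page\_seen} value initialized so
that a new client waits for the {\em next} page.

\begin{verbatim}
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_file {                 /* stand-in for fusd_file_info */
  void *private_data;
};

struct demo_client {               /* per-client state */
  int last_page_seen;
  struct demo_client *next;
};

static struct demo_client *client_list;  /* all open clients */
static int last_page;                    /* sequence number of newest page */

/* open callback: allocate state, remember the newest page, link it in. */
static int demo_open(struct demo_file *file)
{
  struct demo_client *c = malloc(sizeof(*c));
  if (c == NULL)
    return -ENOMEM;                /* assumed convention: negative errno */
  c->last_page_seen = last_page;   /* wait for the *next* page */
  c->next = client_list;
  client_list = c;
  file->private_data = c;          /* found again by later callbacks */
  return 0;
}

/* Would a read by this client complete right now? */
static int has_unseen_page(const struct demo_file *file)
{
  const struct demo_client *c = file->private_data;
  return c->last_page_seen < last_page;
}

int main(void)
{
  struct demo_file f = { NULL };

  last_page = 3;                   /* three pages arrived before open */
  assert(demo_open(&f) == 0);
  assert(!has_unseen_page(&f));    /* old pages are not reported */
  last_page++;                     /* a new page arrives */
  assert(has_unseen_page(&f));

  free(f.private_data);            /* normally done by the close callback */
  client_list = NULL;
  return 0;
}
\end{verbatim}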
1603 \subsubsection{Blocking and completing reads}
1605 The next part of {\tt pager.c} is shown in Program~\ref{pager-read.c}.
1606 The {\tt pager\_notify\_read} function seen on line 1 is registered as
1607 the {\tt read} callback for the {\tt /dev/pager/notify} device. It
1608 blocks the read request using the technique we described earlier: it
1609 stores the {\tt fusd\_file\_info} pointer in that client's state
1610 structure, and returns {\tt -FUSD\_NOREPLY}. (Note that the pointer
1611 to the client's state structure comes from the {\tt private\_data}
1612 field of {\tt fusd\_file\_info}, where the open callback stored it.)
1614 \begin{Program}
1615 \listinginput[5]{1}{pager-read.c.example}
\caption{pager.c (Part 2): Blocking clients' {\tt read} requests, and later
1617 completing the blocked reads}
1618 \label{pager-read.c}
1619 \end{Program}
1622 {\tt pager\_notify\_complete\_read} {\em unblocks} previously blocked
1623 reads. This function first checks to see that there is, in fact, a blocked
1624 read (line 19). It then checks to see if a page has arrived that the
1625 client hasn't seen yet (line 23). Finally, it updates the client
1626 state and unblocks the blocked read by calling {\tt fusd\_return}.
Note that the second argument to {\tt fusd\_return} is 0; as we
1628 saw in Section~\ref{read-callback}, a 0 return value to a {\tt read}
1629 system call means EOF. (The system call will be unblocked regardless
1630 of the return value.)
1632 {\tt pager\_notify\_complete\_read} is called every time a new page
1633 arrives. New pages are processed by {\tt pager\_input\_write} (line
1634 34), which is the {\tt write} callback for {\tt /dev/pager/input}.
1635 After recording the fact that a new page has arrived, it calls {\tt
1636 pager\_notify\_complete\_read} for each client that has an open file
1637 descriptor. This will complete the reads of any clients who have not
1638 yet seen this new data, and have no effect on clients that don't have
1639 outstanding reads.
1641 There is another interesting point to notice about {\tt
1642 pager\_notify\_read}. On line 12, after it stores the blocked system
call's pointer, but before it returns {\tt -FUSD\_NOREPLY}, it calls
the completion function. This has the effect of returning any data
that might already be available back to the caller immediately. If
that happens, the callback ends up calling {\tt fusd\_return} {\em before}
it returns {\tt -FUSD\_NOREPLY}. This probably seems strange, but it's
legal. Recall that a callback can call {\tt fusd\_return()} explicitly {\em
or} return a normal (not {\tt -FUSD\_NOREPLY}) return value, but not
both; the order doesn't matter.
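The top-half/bottom-half split can be modeled in a few dozen lines.
This is a simplified sketch, not the code in
Program~\ref{pager-read.c}: all names are hypothetical, a
{\tt read\_blocked} flag replaces the saved {\tt fusd\_file\_info}
pointer, clearing that flag replaces {\tt fusd\_return()}, and a fixed
array replaces the linked list of clients. It follows the same ordering
as the text: {\tt notify\_read()} parks the request and then
immediately tries to complete it, while {\tt input\_write()} tries to
complete every client's read.

\begin{verbatim}
#include <assert.h>

#define DEMO_NOREPLY (-1000)       /* hypothetical sentinel */
#define NCLIENTS 3

struct client {
  int last_page_seen;
  int read_blocked;                /* stands in for a saved fusd_file_info */
  int completed_reads;             /* how many reads were unblocked */
};

static struct client clients[NCLIENTS];
static int last_page;

/* Bottom half: finish a blocked read if there is a page it hasn't seen. */
static void complete_read(struct client *c)
{
  if (!c->read_blocked)            /* no read outstanding */
    return;
  if (c->last_page_seen >= last_page)
    return;                        /* nothing new yet; stay blocked */
  c->last_page_seen = last_page;
  c->read_blocked = 0;             /* stands in for fusd_return(file, 0) */
  c->completed_reads++;
}

/* Top half: the read callback for the notify device. */
static int notify_read(struct client *c)
{
  c->read_blocked = 1;             /* save the request first...        */
  complete_read(c);                /* ...then complete it if possible */
  return DEMO_NOREPLY;             /* legal even if already completed */
}

/* write callback for the input device: a new page arrives. */
static void input_write(void)
{
  int i;
  last_page++;
  for (i = 0; i < NCLIENTS; i++)
    complete_read(&clients[i]);
}

int main(void)
{
  notify_read(&clients[0]);        /* two waiters block */
  notify_read(&clients[1]);
  assert(clients[0].read_blocked && clients[1].read_blocked);

  input_write();                   /* like "echo page > /dev/pager/input" */
  assert(clients[0].completed_reads == 1 && clients[1].completed_reads == 1);
  assert(clients[2].completed_reads == 0);  /* had no read outstanding */
  return 0;
}
\end{verbatim}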
1652 \subsubsection{Using {\tt fusd\_destroy()} to clean up client state}
1653 \label{fusd-destroy}
1655 Finally, let's take a look at one last aspect of the pager program:
1656 how it cleans up the per-client state when a client leaves. This is
mostly straightforward, with one exception: a client may still have a
read request outstanding when a close request comes in. Normally,
1659 a client can't make another system call request while a previous
1660 system call is still blocked. However, the {\tt close} system call is
1661 an exception: it gets called when a client dies (for example, if it
1662 receives an interrupt signal). If a {\tt close} comes in while
1663 another system call is still outstanding, the state associated with
1664 the outstanding request should be freed to avoid a memory leak. The
{\tt fusd\_destroy} function does this, as seen on lines 12--14
of Program~\ref{pager-close.c}.
1668 \begin{Program}
1669 \listinginput[5]{1}{pager-close.c.example}
1670 \caption{pager.c (Part 3): Cleaning up when a client leaves}
1671 \label{pager-close.c}
1672 \end{Program}
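The close path can be sketched as follows, assuming a driver that tracks
at most one blocked read per client. The names are hypothetical:
{\tt demo\_destroy()} stands in for {\tt fusd\_destroy()} and
{\tt struct demo\_request} for {\tt struct fusd\_file\_info}. What the
sketch tries to capture is the ordering: dispose of any still-blocked
request first, then unlink and free the client's state.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

struct demo_request { int destroyed; };   /* stand-in for fusd_file_info */

struct demo_client {
  struct demo_request *read_pending;      /* blocked read, if any */
  struct demo_client *next;
};

static struct demo_client *client_list;

/* Stand-in for fusd_destroy(): release a request without replying. */
static void demo_destroy(struct demo_request *req) { req->destroyed = 1; }

/* close callback: clean up an outstanding request, unlink, and free. */
static int demo_close(struct demo_client *c)
{
  struct demo_client **pp;

  if (c->read_pending != NULL) {          /* client went away mid-read */
    demo_destroy(c->read_pending);
    c->read_pending = NULL;
  }
  for (pp = &client_list; *pp != NULL; pp = &(*pp)->next)
    if (*pp == c) {                       /* unlink from the list */
      *pp = c->next;
      break;
    }
  free(c);
  return 0;
}

int main(void)
{
  struct demo_request blocked = { 0 };
  struct demo_client *c = calloc(1, sizeof(*c));

  assert(c != NULL);
  c->read_pending = &blocked;             /* a read is still outstanding */
  c->next = client_list;
  client_list = c;

  demo_close(c);
  assert(blocked.destroyed == 1);         /* request state not leaked */
  assert(client_list == NULL);
  return 0;
}
\end{verbatim}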
1675 \subsection{Retrieving a blocked system call's arguments from a {\tt
1676 fusd\_file\_info} pointer}
1678 \label{logring}
1680 In the previous section, we showed how the {\tt fusd\_return} function
1681 can be used to specify the return value of a system call that was
1682 previously blocked. However, many system calls have side effects in
1683 addition to returning a value---for example, in a {\tt read()}
1684 request, the data being returned has to be copied into the caller's
1685 buffer. To facilitate this, FUSD provides accessor functions that let
1686 drivers retrieve the arguments that had been passed to its callbacks
1687 at the time the call was originally issued. For example, the {\tt
1688 fusd\_get\_read\_buffer()} function will return a pointer to the data
1689 buffer that is provided with {\tt read()} callbacks. Drivers can use
these accessor functions to carry out a call's side effects (for
example, filling the client's read buffer) {\em before}
1691 calling {\tt fusd\_return()}.
1693 The following accessor functions are available, all of which take a
1694 single {\tt fusd\_file\_info *} argument:
1695 \begin{itemize}
\item {\tt char *fusd\_get\_read\_buffer}---The destination buffer
1697 for data that a driver is returning to a process doing a {\tt read()}.
1698 \item {\tt const char *fusd\_get\_write\_buffer}---The source buffer
1699 containing data sent to the driver by a process doing a {\tt write()}.
1700 \item {\tt fusd\_get\_length}---The length (in bytes) of the buffer
1701 for either a {\tt read()} or a {\tt write()}.
1702 \item {\tt loff\_t fusd\_get\_offset}---The file descriptor's byte
1703 offset, typically used in {\tt read()} and {\tt write()} callbacks.
1704 \item {\tt int fusd\_get\_ioctl\_request}---An ioctl's request
1705 ``command number'' (i.e., the first argument of an ioctl).
1706 \item {\tt int fusd\_get\_ioctl\_arg}---The second argument of an
1707 ioctl for non-data-bearing {\tt ioctl} requests (i.e., {\tt \_IO}
1708 commands).
1709 \item {\tt void *fusd\_get\_ioctl\_buffer}---The data buffer for
1710 data-bearing {\tt ioctl} requests ({\tt \_IOR}, {\tt \_IOW}, and
1711 {\tt \_IORW} commands).
1712 \item {\tt int fusd\_get\_poll\_diff\_cached\_state}---See
1713 Section~\ref{selectable}.
1714 \end{itemize}
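For example, once data arrives, a driver that parked a {\tt read()}
request could complete it along the lines of the sketch below. To
compile without FUSD, it uses a hypothetical {\tt struct demo\_file}
whose stand-in accessors, {\tt demo\_get\_read\_buffer()} and
{\tt demo\_get\_length()}, mimic what the list above says
{\tt fusd\_get\_read\_buffer} and {\tt fusd\_get\_length} provide;
{\tt demo\_return()} plays the role of {\tt fusd\_return()}. The
essential order is to copy into the caller's buffer (never more than its
length) and only then report the byte count.

\begin{verbatim}
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for what fusd_file_info records about a read. */
struct demo_file {
  char *read_buffer;                 /* caller's destination buffer */
  size_t length;                     /* its size in bytes */
  ssize_t returned;                  /* what "fusd_return" delivered */
};

static char *demo_get_read_buffer(struct demo_file *f) { return f->read_buffer; }
static size_t demo_get_length(struct demo_file *f) { return f->length; }
static void demo_return(struct demo_file *f, ssize_t rv) { f->returned = rv; }

/* Complete a previously blocked read with the data now available. */
static void finish_read(struct demo_file *f, const char *data, size_t avail)
{
  size_t n = demo_get_length(f);

  if (n > avail)
    n = avail;                                /* never overrun either side */
  memcpy(demo_get_read_buffer(f), data, n);   /* side effect first... */
  demo_return(f, (ssize_t) n);                /* ...then the return value */
}

int main(void)
{
  char buf[4];
  struct demo_file f = { buf, sizeof(buf), -1 };

  finish_read(&f, "hello", 5);       /* only 4 bytes fit */
  assert(f.returned == 4 && memcmp(buf, "hell", 4) == 0);
  return 0;
}
\end{verbatim}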
1716 We got away without using these accessor functions in our {\tt
1717 pager.c} example because the pager doesn't actually return data---it
1718 just blocks and unblocks {\tt read} calls. However, the FUSD
1719 distribution contains another example program, {\tt logring}, that
1720 demonstrates their use.
1722 {\tt logring} makes it easy to access the most recent (and only the most
1723 recent) output from a process. It works just like {\tt tail -f} on a
1724 log file, except that the storage required never grows. This can be
1725 useful in embedded systems where there isn't enough memory or disk
1726 space for keeping complete log files, but the most recent debugging
1727 messages are sometimes needed (e.g., after an error is observed).
1729 {\tt logring} uses FUSD to implement a character device, {\tt
1730 /dev/logring}, that acts like a named pipe that has a finite, circular
1731 buffer. The size of the buffer is given as a command-line argument.
1732 As more data is written into the buffer, the oldest data is discarded.
1733 A process that reads from the logring device will first read the
1734 existing buffer, then block and see new data as it's written, similar
1735 to monitoring a log file using {\tt tail -f}.
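The core data structure can be sketched independently of FUSD. The ring
below is an illustrative implementation, not code from {\tt logring}:
{\tt struct ring}, {\tt ring\_write()}, and {\tt ring\_read\_all()} are
hypothetical, the size is fixed at compile time rather than taken from
the command line, and per-reader positions (needed for the blocking,
{\tt tail -f}-like behavior) are omitted. Once the buffer is full, each
new byte overwrites the oldest one, so storage never grows.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 8

struct ring {
  char data[RING_SIZE];
  size_t start;                      /* index of the oldest byte */
  size_t used;                       /* number of valid bytes */
};

/* Append bytes; once full, each new byte replaces the oldest one. */
static void ring_write(struct ring *r, const char *buf, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++) {
    r->data[(r->start + r->used) % RING_SIZE] = buf[i];
    if (r->used < RING_SIZE)
      r->used++;
    else
      r->start = (r->start + 1) % RING_SIZE;   /* drop the oldest byte */
  }
}

/* Copy the current contents, oldest first, into out (size >= used). */
static size_t ring_read_all(const struct ring *r, char *out)
{
  size_t i;
  for (i = 0; i < r->used; i++)
    out[i] = r->data[(r->start + i) % RING_SIZE];
  return r->used;
}

int main(void)
{
  struct ring r = { {0}, 0, 0 };
  char out[RING_SIZE + 1];
  size_t n;

  ring_write(&r, "Hi there\n", 9);   /* 9 bytes into an 8-byte ring */
  n = ring_read_all(&r, out);
  out[n] = '\0';
  printf("%s", out);                 /* prints "i there" and a newline */
  assert(n == 8 && strcmp(out, "i there\n") == 0);
  return 0;
}
\end{verbatim}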
1737 You can run this example program by typing {\tt logring <logsize>},
1738 where {\tt logsize} is the size of the circular buffer in bytes.
1739 Then, type {\tt cat /dev/logring} in a shell. The {\tt cat} process
1740 will block, waiting for data. From another shell, write to the
1741 logring (e.g., {\tt echo Hi there > /dev/logring}). The {\tt cat}
1742 process will see the message appear.
1744 (This example program is based on {\em emlog}, a (real) Linux kernel
1745 module with identical functionality. If you find logring useful, but
1746 want to use it on a system that does not have FUSD, check out the
1747 original
1748 \htmladdnormallinkfoot{emlog}{http://www.circlemud.org/jelson/software/emlog}.)
1754 \section{Implementing {\tt select}able Devices}
1755 \label{selectable}
1757 One important feature that almost every device driver in a system
1758 should have is support for the {\tt select(2)} system call. {\tt
1759 select} allows clients to assemble a set of file descriptors and ask
1760 to be notified when one of them becomes readable or writable. This
1761 simple feature is deceptively powerful---it allows clients to wait for
any one of a set of possible events to occur. This is
fundamentally different from (for example) a blocking read, which only
unblocks on one kind of event. In this section, we'll describe how
1765 FUSD can be used to create a device whose state can be queried by a
1766 client's call to {\tt select(2)}.
This section is limited to a discussion of what a FUSD driver writer
needs to know to implement a selectable device. Details of the FUSD
implementation required to support this feature are described in
Section~\ref{poll-diff-implementation}.
1777 {\em poll state}. A file descriptor's poll state is a bitmask that
1778 describes its current properties---readable, writable, or exception
1779 raised. These three states correspond to {\tt select(2)}'s three
1780 {\tt fd\_set}s. FUSD has constants used to describe these states:
1781 \begin{itemize}
1782 \item {\tt FUSD\_NOTIFY\_INPUT}---Input is available; a read will not
1783 block.
1784 \item {\tt FUSD\_NOTIFY\_OUTPUT}---Output space is available; a write
1785 will not block.
1786 \item {\tt FUSD\_NOTIFY\_EXCEPT}---An exception has occurred.
1787 \end{itemize}
1789 These constants can be combined with C's bitwise-or operator. For
1790 example, a descriptor that is both readable and writable is expressed
1791 as {\tt FUSD\_NOTIFY\_INPUT | FUSD\_NOTIFY\_OUTPUT}. 0 means a file
1792 descriptor is not readable, not writable, and not in the exception
1793 set.
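Since the states are independent bits, a driver can compute each one
separately and combine them. The sketch below uses hypothetical
{\tt DEMO\_NOTIFY\_*} constants with made-up values and a hypothetical
{\tt poll\_state()} helper; a real driver would use the
{\tt FUSD\_NOTIFY\_*} constants from FUSD's header and derive the
inputs from its own buffers.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>

/* Hypothetical values; FUSD's own header defines the real ones. */
#define DEMO_NOTIFY_INPUT  0x1
#define DEMO_NOTIFY_OUTPUT 0x2
#define DEMO_NOTIFY_EXCEPT 0x4

/* Build a poll state from a driver's (hypothetical) buffer levels. */
static int poll_state(size_t bytes_queued, size_t space_free, int error)
{
  int state = 0;
  if (bytes_queued > 0)
    state |= DEMO_NOTIFY_INPUT;      /* a read will not block */
  if (space_free > 0)
    state |= DEMO_NOTIFY_OUTPUT;     /* a write will not block */
  if (error)
    state |= DEMO_NOTIFY_EXCEPT;
  return state;
}

int main(void)
{
  int s = poll_state(10, 0, 0);

  printf("readable=%d writable=%d except=%d\n",
         (s & DEMO_NOTIFY_INPUT) != 0, (s & DEMO_NOTIFY_OUTPUT) != 0,
         (s & DEMO_NOTIFY_EXCEPT) != 0);   /* readable=1 writable=0 except=0 */
  assert(poll_state(1, 1, 0) == (DEMO_NOTIFY_INPUT | DEMO_NOTIFY_OUTPUT));
  assert(poll_state(0, 0, 0) == 0);
  return 0;
}
\end{verbatim}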
1795 For a FUSD device to be selectable, its driver must implement a
1796 callback called {\tt poll\_diff}. This callback is very different
from the others; it is not a ``direct line'' between the client and
1798 the driver as is the case with a call such as {\tt ioctl}. A driver's
1799 response to {\tt poll\_diff} is {\em not} the return value seen by a
1800 client's call to {\tt select}. When a client tries to {\tt select} on
1801 a set of file descriptors, the kernel collects the responses from all
1802 the appropriate callbacks---{\tt poll} for file descriptors managed by
kernel drivers, and {\tt poll\_diff} callbacks for those managed by FUSD
1804 drivers---and synthesizes all of that information into the return
1805 value seen by the client.
1807 FUSD keeps a cache of the poll state it has most recently received
1808 from each FUSD device driver, initially assumed to be 0. This state
1809 is returned to clients trying to {\tt select()} on devices managed by
1810 those drivers. Under certain conditions, FUSD sends a query to the
1811 driver in order to ensure that the kernel's poll state cache is up to
1812 date. This query takes the form of a {\tt poll\_diff} callback
1813 activation, which is given a single argument: the poll state that FUSD
1814 currently has cached. The driver should consult its internal data
1815 structures to determine the actual, current poll state (i.e., whether
1816 or not buffers have readable data). Then:
1817 \begin{itemize}
1818 \item If the FUSD cache is incorrect (that is, the current true poll
1819 state is different than FUSD's cached state), the current poll state
1820 should be returned immediately.
1821 \item If the FUSD cache is up to date (that is, it matches the real
1822 current state), the callback should save the {\tt fusd\_file\_info}
1823 pointer and return {\tt -FUSD\_NOREPLY}. Later, when the poll
1824 state changes, the driver can call {\tt fusd\_return()} to update
1825 FUSD's cache.
1826 \end{itemize}
1828 In other words, when a driver's {\tt poll\_diff} callback is
1829 activated, the kernel is effectively saying to the driver, ``Here is
1830 what I think the current poll state of this file descriptor is; let me
1831 know when that state {\em changes}.'' The driver can either respond
1832 immediately (if the kernel's cache is already known to be out of
1833 date), or return {\tt -FUSD\_NOREPLY} if no update is immediately
1834 necessary. Later, when the poll state changes (for example, if new
data arrives that makes a device readable), the driver can use its
1836 saved {\tt fusd\_file\_info} pointer to send a poll state update to
1837 the kernel.
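The decision described above reduces to a single comparison. In the
model below, {\tt struct demo\_file}, {\tt DEMO\_NOREPLY},
{\tt demo\_poll\_diff()}, and {\tt demo\_state\_changed()} are
hypothetical stand-ins for {\tt fusd\_file\_info},
{\tt -FUSD\_NOREPLY}, a driver's {\tt poll\_diff} callback, and the code
path that calls {\tt fusd\_return()}; the bit values are made up. Note
that the saved request is answered only when the state actually
changes.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

#define DEMO_NOREPLY      (-1000)   /* hypothetical sentinel */
#define DEMO_NOTIFY_INPUT 0x1       /* hypothetical bit value */

struct demo_file { int replied_state; };   /* stand-in for fusd_file_info */

static int current_state;                  /* driver's true poll state */
static struct demo_file *saved_polldiff;   /* parked poll_diff request */

/* poll_diff callback: answer only if the kernel's cache is stale. */
static int demo_poll_diff(struct demo_file *file, int cached_state)
{
  if (cached_state != current_state)
    return current_state;                  /* cache wrong: update now */
  saved_polldiff = file;                   /* cache right: wait for change */
  return DEMO_NOREPLY;
}

/* Called when the driver's state may have changed, e.g. new data. */
static void demo_state_changed(int new_state)
{
  int changed = (new_state != current_state);

  current_state = new_state;
  if (changed && saved_polldiff != NULL) {
    saved_polldiff->replied_state = new_state;  /* i.e. fusd_return() */
    saved_polldiff = NULL;
  }
}

int main(void)
{
  struct demo_file f = { -1 };

  current_state = DEMO_NOTIFY_INPUT;
  assert(demo_poll_diff(&f, 0) == DEMO_NOTIFY_INPUT);   /* stale cache */
  assert(demo_poll_diff(&f, DEMO_NOTIFY_INPUT) == DEMO_NOREPLY);
  demo_state_changed(0);                                /* data consumed */
  assert(f.replied_state == 0);
  return 0;
}
\end{verbatim}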
1839 When a FUSD driver sends a poll state update, it might (or might not)
1840 have the effect of waking up a client that was blocked in {\tt
1841 select(2)}. On the same note, it's worth reiterating that a {\tt
1842 -FUSD\_NOREPLY} response to a {\tt poll\_diff} callback {\em does not}
1843 necessarily block the client---other descriptors in the client's {\tt
1844 select} set might be readable, for example.
1846 \subsection{Receiving a {\tt poll\_diff} request when the previous one
1847 has not been returned yet}
1848 \label{multiple-polldiffs}
1850 Calls such as {\tt read} and {\tt write} are synchronous from the
1851 standpoint of an individual client---a request is made, and the
1852 requester blocks until a reply is received. This means that there
1853 can't ever be more than a single {\tt read} request outstanding for a
1854 single client at a time. (The driver as a whole may be keeping track
1855 of many outstanding {\tt read} requests in parallel, but no two of them will
1856 be from the same client file descriptor.)
1858 As we mentioned in the previous section, the {\tt poll\_diff} callback
1859 is different from other callbacks. It is not part of a synchronous
1860 request/reply sequence that causes the client to block. It is also an
1861 interface to the {\em kernel}, not directly to the client. So, it
1862 {\em is} possible to receive a {\tt poll\_diff} request while there is
1863 already one outstanding. This happens if the kernel's poll state
1864 cache changes, causing it to notify the driver that it has a new
1865 cached value.
This is easy to handle; the driver should simply
1868 \begin{enumerate}
1869 \item Destroy the old (now out-of-date) {\tt poll\_diff} request
1870 using the {\tt fusd\_destroy} function we saw in
1871 Section~\ref{fusd-destroy}.
1872 \item Either respond to or save the new {\tt poll\_diff} request,
1873 exactly as described in the previous section.
1874 \end{enumerate}
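Handling a superseded request adds one check at the top of the
callback. The sketch is self-contained and uses hypothetical stand-ins
({\tt demo\_destroy()} for {\tt fusd\_destroy()},
{\tt struct demo\_file} for {\tt fusd\_file\_info}, and a made-up
{\tt DEMO\_NOREPLY} value); it shows only the ordering of the two steps
listed above.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

#define DEMO_NOREPLY (-1000)                /* hypothetical sentinel */

struct demo_file { int destroyed; };        /* stand-in for fusd_file_info */

static int current_state;
static struct demo_file *saved_polldiff;

/* Stand-in for fusd_destroy(): discard a request without replying. */
static void demo_destroy(struct demo_file *f) { f->destroyed = 1; }

static int demo_poll_diff(struct demo_file *file, int cached_state)
{
  /* Step 1: a newer request supersedes the old one; discard the old. */
  if (saved_polldiff != NULL) {
    demo_destroy(saved_polldiff);
    saved_polldiff = NULL;
  }
  /* Step 2: respond to or save the new request as usual. */
  if (cached_state != current_state)
    return current_state;
  saved_polldiff = file;
  return DEMO_NOREPLY;
}

int main(void)
{
  struct demo_file first = { 0 }, second = { 0 };

  assert(demo_poll_diff(&first, 0) == DEMO_NOREPLY);
  assert(demo_poll_diff(&second, 0) == DEMO_NOREPLY);
  assert(first.destroyed && !second.destroyed);   /* only newest kept */
  assert(saved_polldiff == &second);
  return 0;
}
\end{verbatim}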
1876 The next section will show an example of this technique.
1879 \subsection{Adding {\tt select} support to {\tt pager.c}}
1881 Given the explanation of {\tt poll\_diff} in the previous sections, it
1882 might seem that implementing a selectable device is a daunting task.
1883 It's actually not as bad as it sounds---the example code may well be
1884 shorter than its explanation!
1886 \begin{Program}
1887 \listinginput[5]{1}{pager-polldiff.c.example}
1888 \caption{pager.c (Part 4): Supporting {\tt select(2)} by implementing a
1889 {\tt poll\_diff} callback}
1890 \label{pager-polldiff.c}
1891 \end{Program}
1893 Program~\ref{pager-polldiff.c} shows the implementation of {\tt
1894 poll\_diff} in {\tt pager.c}, which makes its notification interface
1895 ({\tt /dev/pager/notify}) selectable. It is decomposed into a ``top
1896 half'' and ``bottom half'' function, exactly as we did for the
1897 blocking {\tt read} implementation in Program~\ref{pager-read.c}.
First, on lines 1--20, we see the {\tt poll\_diff}
callback itself. It is virtually identical to the {\tt read} callback
1900 in Program~\ref{pager-read.c}. The main difference is that it first
1901 checks (on line 12) to see if a {\tt poll\_diff} request is already
1902 outstanding when a new request comes in. If so, the out-of-date
1903 request is destroyed using {\tt fusd\_destroy}, as we described in
1904 Section~\ref{multiple-polldiffs}.
The bottom half is shown on lines 22--46. First, on lines 32--35, it
1907 computes the current poll state---if a page has arrived that the
1908 client hasn't seen yet, the file is readable; otherwise, it isn't.
1909 Next, the driver compares the current poll state with the poll state
1910 that the kernel has cached. If the kernel's cache is out of date, the
1911 current state is returned to the kernel. Otherwise, it does nothing.
As with the {\tt read} callback we saw previously, notice that {\tt
pager\_notify\_complete\_polldiff} is called in two different cases:
\begin{enumerate}
\item It is called immediately from the {\tt pager\_notify\_polldiff}
callback itself. If the driver already knows that the kernel's cached
state is out of date, this returns the current poll state to the
kernel as soon as the request arrives.
\item It is called when new data arrives that causes the poll state to
change. Refer back to Program~\ref{pager-read.c} on
page~\pageref{pager-read.c}; in the callback that receives new pages,
notice on line 45 that the {\tt poll\_diff} completion function is called
alongside the {\tt read} completion function.
\end{enumerate}

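
The interplay between the two halves and their two call sites can be
sketched as a small, self-contained simulation. This is not the code
of Program~\ref{pager-polldiff.c} and it does not use the FUSD
library: the types {\tt request\_t} and {\tt client\_t} and the
functions {\tt sketch\_destroy} and {\tt sketch\_return} are
hypothetical stand-ins for a pending FUSD request, {\tt fusd\_destroy},
and replying to the kernel, and the page-counter test for ``a page the
client hasn't seen'' is an assumption made for illustration. Only one
client is modelled; a real driver would walk its whole client list
when a page arrives.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FUSD's "readable" poll bit. */
#define SKETCH_NOTIFY_INPUT 0x1u

/* A pending poll_diff request (stand-in for a FUSD request). */
typedef struct {
    unsigned cached_state; /* poll state the kernel has cached */
    int destroyed;         /* set by sketch_destroy() */
    int returned;          /* set by sketch_return() */
    unsigned result;       /* state handed back to the kernel */
} request_t;

/* Per-client state kept by the driver (hypothetical fields). */
typedef struct {
    int last_page_seen;    /* pages this client has already seen */
    request_t *polldiff;   /* outstanding poll_diff, or NULL */
} client_t;

static int pages_arrived = 0; /* total pages written to the device */

/* Stand-in for fusd_destroy(): discard a request without replying. */
static void sketch_destroy(request_t *req) { req->destroyed = 1; }

/* Stand-in for replying to the kernel with the new poll state. */
static void sketch_return(request_t *req, unsigned state)
{
    req->returned = 1;
    req->result = state;
}

/* Bottom half: reply only if the kernel's cached state is stale. */
static void complete_polldiff(client_t *c)
{
    unsigned current;

    if (c->polldiff == NULL)
        return;                  /* no request pending */
    current = (pages_arrived > c->last_page_seen) ? SKETCH_NOTIFY_INPUT : 0;
    if (current != c->polldiff->cached_state) {
        sketch_return(c->polldiff, current);
        c->polldiff = NULL;
    }                            /* else: leave it outstanding */
}

/* Top half: destroy any out-of-date request, keep the new one,
 * then try to complete it immediately (call site 1). */
static void on_polldiff(client_t *c, request_t *req)
{
    assert(req != NULL);
    if (c->polldiff != NULL)
        sketch_destroy(c->polldiff);
    c->polldiff = req;
    complete_polldiff(c);
}

/* A new page was written to the input device (call site 2). */
static void on_new_page(client_t *c)
{
    pages_arrived++;
    /* the real driver would also complete pending reads here */
    complete_polldiff(c);
}
\end{verbatim}

In this sketch, a request whose cached state is already current stays
pending, a second request destroys and replaces the first, and a new
page completes whichever request is pending.
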
With this {\tt poll\_diff} implementation, a client can open
{\tt /dev/pager/notify} and block in a {\tt select(2)} system call.
If another client writes {\tt page} to {\tt /dev/pager/input}, the
first client's {\tt select} will unblock, indicating that the file has
become readable.

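
From the client's side, nothing FUSD-specific is needed. A minimal
sketch of a client that waits in {\tt select(2)} and then reads might
look like the following. The helper names {\tt wait\_readable} and
{\tt read\_when\_ready} are hypothetical, and the sketch assumes the
caller has already opened {\tt /dev/pager/notify}, for example with
{\tt open(2)} and {\tt O\_RDONLY}.

\begin{verbatim}
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block in select(2) until fd is readable or timeout_ms elapses.
 * A negative timeout_ms means wait indefinitely.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 */
static int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv, *tvp = NULL;
    int rc;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (timeout_ms >= 0) {   /* select() may modify tv; reset it */
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            tvp = &tv;
        }
        rc = select(fd + 1, &rfds, NULL, NULL, tvp);
    } while (rc < 0 && errno == EINTR); /* retry after a signal */

    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/*
 * Wait for fd to become readable, then read up to len bytes.
 * Returns the byte count from read(2), or -1 on error; a timeout
 * is reported as -1 with errno set to ETIMEDOUT.
 */
static ssize_t read_when_ready(int fd, void *buf, size_t len, long timeout_ms)
{
    int rc = wait_readable(fd, timeout_ms);

    if (rc == 0)
        errno = ETIMEDOUT;
    if (rc != 1)
        return -1;
    return read(fd, buf, len);
}
\end{verbatim}

Passing a negative timeout blocks indefinitely, which matches the
scenario described above; the retry on {\tt EINTR} keeps a signal
from being mistaken for an error.
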
For additional example code, take a look at the {\tt logring} example
program we first mentioned in Section~\ref{logring}. It also supports
{\tt select} by implementing a similar {\tt poll\_diff} callback.

\section{Performance of User-Space Devices}
\label{performance}

This section hasn't been written yet. I have some pretty graphs and
whatnot, but no time to write about them here before the release.

\section{FUSD Implementation Notes}

In this section, we describe some of the details of how FUSD is
implemented. It's not necessary to understand these details in order
to use FUSD. However, these notes can be useful for people who are
trying to understand the FUSD framework itself---hackers, debuggers,
or the generally curious.

\subsection{The situation with {\tt poll\_diff}}
\label{poll-diff-implementation}

In-kernel device drivers support {\tt select} by implementing a
callback called {\tt poll}. This callback is supposed to do two
things. First, it should return the current state of a file
descriptor---a combination of being readable, writable, or having
exceptions. Second, it should provide a pointer to one of the
driver's internal wait queues that will be awakened whenever the state
changes. The {\tt poll} call itself should never block---it should
just instantaneously report what the {\em current} state is.

FUSD's implementation of selectable devices is different, but it
attempts to preserve three properties that we consider most important
from the point of view of a client using {\tt select}. Specifically:
\begin{enumerate}
\item The {\tt select(2)} call itself should never become blocked.
For example, if one file descriptor in its set isn't readable, that
shouldn't prevent it from reporting other file descriptors that are.
\item If {\tt select(2)} indicates that a file descriptor is readable
(or writable), a read (or write) on that file descriptor shouldn't
block.
\item Clients should be allowed to seamlessly {\tt select} on any set
of file descriptors, even if that set contains a mix of both FUSD and
non-FUSD devices.
\end{enumerate}

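
Properties 1 and 3 are what let an ordinary client write
descriptor-agnostic code. The sketch below, whose function name
{\tt readable\_set} is hypothetical, checks a mixed array of
descriptors in a single {\tt select(2)} call and reports exactly which
ones are readable; nothing in it depends on whether a given descriptor
belongs to a FUSD device.

\begin{verbatim}
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Check a mixed set of descriptors (FUSD or not) with one select(2)
 * call, waiting at most timeout_ms. On success ready[i] is 1 if
 * fds[i] is readable and 0 otherwise; returns the number of
 * readable descriptors, or -1 on error.
 */
static int readable_set(const int *fds, int *ready, int n, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int i, rc, maxfd = -1;

    assert(n >= 0 && timeout_ms >= 0);
    FD_ZERO(&rfds);
    for (i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    /* A descriptor that isn't ready never hides one that is. */
    for (i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rfds) ? 1 : 0;
    return rc;
}
\end{verbatim}
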
The FUSD kernel module keeps a cache of the driver's most recent
answer for each file descriptor, initially assumed to be 0. When the
kernel module's internal {\tt poll} callback is activated, it:
\begin{enumerate}
\item Dispatches a {\em non-}blocking {\tt poll\_diff} to the
associated user-space driver, asking for a cache update---if and only
if there isn't already an outstanding {\tt poll\_diff} request that
carries the same cached value.
\item Immediately returns the cached value to the kernel.
\end{enumerate}

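
The two steps above can be modelled in a few lines of ordinary C.
This is a simulation of the decision logic only, not the kernel
module's source: the structure {\tt poll\_cache\_t} and the functions
{\tt dispatch\_polldiff}, {\tt module\_poll}, and {\tt driver\_replied}
are hypothetical, and ``dispatching'' a request is reduced to recording
which cached value it carried.

\begin{verbatim}
#include <assert.h>
#include <stdbool.h>

/* Per-descriptor state kept by the kernel module (modelled). */
typedef struct {
    unsigned cached;          /* driver's most recent answer, initially 0 */
    bool outstanding;         /* is a poll_diff request in flight? */
    unsigned outstanding_for; /* cached value that request carried */
    int dispatched;           /* poll_diffs sent to the driver so far */
} poll_cache_t;

/* Hypothetical: send a non-blocking poll_diff carrying the cached value. */
static void dispatch_polldiff(poll_cache_t *pc)
{
    pc->outstanding = true;
    pc->outstanding_for = pc->cached;
    pc->dispatched++;
}

/* Model of the module's poll callback: it never blocks. */
static unsigned module_poll(poll_cache_t *pc)
{
    /* Step 1: ask for an update unless one with this value is pending. */
    if (!(pc->outstanding && pc->outstanding_for == pc->cached))
        dispatch_polldiff(pc);
    /* Step 2: answer immediately from the cache. */
    return pc->cached;
}

/* The driver's reply to a poll_diff refreshes the cache. */
static void driver_replied(poll_cache_t *pc, unsigned new_state)
{
    assert(pc->outstanding);
    pc->cached = new_state;
    pc->outstanding = false;
}
\end{verbatim}

Calling {\tt module\_poll} twice in a row sends only one request,
because the first is still outstanding with the same value; once the
driver replies, the next call returns the new state and sends a fresh
request.
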
In addition, the cached value's readable bit is cleared on every read,
and its writable bit is cleared on every write. This prevents stale
poll state (for example, ``device is readable'') from being returned
out of the cache after it may have become invalid: FUSD assumes that
any read from a device can potentially make it unreadable. This
mechanism is also why an updated {\tt poll\_diff} can be sent to the
driver before the previous one has been returned. Clearing a bit
changes the cached value, so the outstanding request no longer
carries the same value as the cache.

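
A similar model shows how clearing bits on {\tt read} and {\tt write}
leads to a second request being sent while the first is still
pending. As before, the names and bit values ({\tt SKETCH\_READABLE},
{\tt SKETCH\_WRITABLE}, {\tt on\_read}, {\tt on\_write},
{\tt module\_poll}) are hypothetical and do not correspond to FUSD's
actual constants or functions.

\begin{verbatim}
#include <assert.h>

#define SKETCH_READABLE 0x1u   /* hypothetical bit values */
#define SKETCH_WRITABLE 0x2u

typedef struct {
    unsigned cached;           /* cached poll state for this descriptor */
    int outstanding;           /* poll_diff in flight? */
    unsigned outstanding_for;  /* cached value that request carried */
    int dispatched;            /* poll_diffs sent to the driver so far */
} poll_cache_t;

/* Any read may make the device unreadable: drop the readable bit. */
static void on_read(poll_cache_t *pc)  { pc->cached &= ~SKETCH_READABLE; }

/* Likewise, any write may make it unwritable. */
static void on_write(poll_cache_t *pc) { pc->cached &= ~SKETCH_WRITABLE; }

/* Dispatch a poll_diff unless one carrying this value is pending. */
static unsigned module_poll(poll_cache_t *pc)
{
    if (!(pc->outstanding && pc->outstanding_for == pc->cached)) {
        pc->outstanding = 1;
        pc->outstanding_for = pc->cached;
        pc->dispatched++;
    }
    return pc->cached;
}
\end{verbatim}

Starting from a cache that says ``readable and writable'' with a
matching request already outstanding, a read changes the cached value,
so the next {\tt module\_poll} sends a new request even though the
first has not been answered.
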
(this section isn't finished yet; fancy time diagrams coming someday)

\subsection{Restartable System Calls}

No time to write this section yet...

\appendix

\section{Using {\tt strace}}
\label{strace}

This section hasn't been written yet. Contributions are welcome.

\end{document}