From 1a72298aa6595b906aed96336ae7feeadabc2507 Mon Sep 17 00:00:00 2001
From: zrj
Date: Thu, 28 Jan 2016 17:04:37 +0200
Subject: [PATCH] docs: Prune stale nqnfs papers. No code in DragonFly.

---
 share/doc/papers/Makefile       |    2 +-
 share/doc/papers/nqnfs/Makefile |   12 -
 share/doc/papers/nqnfs/nqnfs.me | 2005 ---------------------------------------
 3 files changed, 1 insertion(+), 2018 deletions(-)
 delete mode 100644 share/doc/papers/nqnfs/Makefile
 delete mode 100644 share/doc/papers/nqnfs/nqnfs.me

diff --git a/share/doc/papers/Makefile b/share/doc/papers/Makefile
index 53f42b8367..400993946b 100644
--- a/share/doc/papers/Makefile
+++ b/share/doc/papers/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/share/doc/papers/Makefile,v 1.7.2.1 2000/08/20 13:16:28 mpp Exp $

 SUBDIR=	beyond4.3 diskperf fsinterface jail kernmalloc kerntune malloc memfs \
-	newvm nqnfs relengr sysperf \
+	newvm relengr sysperf \
 	contents

 .include <bsd.subdir.mk>
diff --git a/share/doc/papers/nqnfs/Makefile b/share/doc/papers/nqnfs/Makefile
deleted file mode 100644
index 886d4f7be4..0000000000
--- a/share/doc/papers/nqnfs/Makefile
+++ /dev/null
@@ -1,12 +0,0 @@
-# From: @(#)Makefile	8.1 (Berkeley) 4/20/94
-# $FreeBSD: src/share/doc/papers/nqnfs/Makefile,v 1.5 1999/08/28 00:18:15 peter Exp $
-# $DragonFly: src/share/doc/papers/nqnfs/Makefile,v 1.2 2003/06/17 04:36:56 dillon Exp $
-
-VOLUME=	papers
-DOC=	nqnfs
-SRCS=	nqnfs.me
-MACROS=	-me
-USE_PIC=	yes
-USE_TBL=	yes
-
-.include <bsd.doc.mk>
diff --git a/share/doc/papers/nqnfs/nqnfs.me b/share/doc/papers/nqnfs/nqnfs.me
deleted file mode 100644
index c84687673a..0000000000
--- a/share/doc/papers/nqnfs/nqnfs.me
+++ /dev/null
@@ -1,2005 +0,0 @@
-.\" Copyright (c) 1993 The Usenix Association. All rights reserved.
-.\"
-.\" This document is derived from software contributed to Berkeley by
-.\" Rick Macklem at The University of Guelph with the permission of
-.\" the Usenix Association.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\"    notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\"    notice, this list of conditions and the following disclaimer in the
-.\"    documentation and/or other materials provided with the distribution.
-.\" 3. All advertising materials mentioning features or use of this software
-.\"    must display the following acknowledgement:
-.\"	This product includes software developed by the University of
-.\"	California, Berkeley and its contributors.
-.\" 4. Neither the name of the University nor the names of its contributors
-.\"    may be used to endorse or promote products derived from this software
-.\"    without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED.
-.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)nqnfs.me	8.1 (Berkeley) 4/20/94
-.\" $FreeBSD: src/share/doc/papers/nqnfs/nqnfs.me,v 1.2.6.1 2000/11/30 13:10:34 ru Exp $
-.\" $DragonFly: src/share/doc/papers/nqnfs/nqnfs.me,v 1.2 2003/06/17 04:36:56 dillon Exp $
-.\"
-.lp
-.nr PS 12
-.ps 12
-Reprinted with permission from the "Proceedings of the Winter 1994 Usenix
-Conference", January 1994, San Francisco, CA, Copyright The Usenix
-Association.
-.nr PS 14
-.ps 14
-.sp
-.ce
-\fBNot Quite NFS, Soft Cache Consistency for NFS\fR
-.nr PS 12
-.ps 12
-.sp
-.ce
-\fIRick Macklem\fR
-.ce
-\fIUniversity of Guelph\fR
-.sp
-.nr PS 12
-.ps 12
-.ce
-\fBAbstract\fR
-.nr PS 10
-.ps 10
-.pp
-There are some constraints inherent in the NFS\(tm\(mo protocol
-that result in performance limitations
-for high-performance workstation environments.
-This paper discusses an NFS-like protocol named Not Quite NFS (NQNFS),
-designed to address some of these limitations.
-This protocol provides full cache consistency during normal
-operation, while permitting more effective client-side caching in an
-effort to improve performance.
-There are also a variety of minor protocol changes that resolve
-various NFS issues.
-The emphasis is on observed performance of a
-preliminary implementation of the protocol, in order to show
-how well this design works
-and to suggest possible areas for further improvement.
-.sh 1 "Introduction"
-.pp
-It has been observed that
-overall workstation performance has not been scaling with
-processor speed and that file system I/O is a limiting factor [Ousterhout90].
-Ousterhout
-notes
-that a principal challenge for operating system developers is the
-decoupling of system calls from their underlying I/O operations, in order
-to improve average system call response times.
-For distributed file systems, every synchronous Remote Procedure Call (RPC)
-takes a minimum of a few milliseconds and, as such, is analogous to an
-underlying I/O operation.
-This suggests that client caching with a very good
-hit ratio for read-type operations, along with asynchronous writing, is
-required in order to avoid delays waiting for RPC replies.
-However, the NFS protocol requires that the server be stateless\**
-.(f
-\**The server must not require, in order to function correctly, any state
-that may be lost due to a crash.
-.)f
-and does not provide any explicit mechanism for client cache
-consistency, putting
-constraints on how the client may cache data.
-This paper describes an NFS-like protocol that includes a cache consistency
-component designed to enhance client caching performance. It provides
-full consistency under normal operation, without requiring that hard
-state information be maintained on the server.
-The design trades cache consistency under abnormal conditions for
-simplicity and high performance.
-The protocol design uses a variation of Leases [Gray89]
-to provide state on the server that does not need to be recovered after a
-crash.
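-.pp
-To put rough numbers on the caching requirement above (the figures here
-are illustrative assumptions, not measurements from this paper): if a
-synchronous RPC takes 2 msec and a local cache hit takes 50 usec, then even
-a 95% read hit ratio yields an average access time of
-.sp
-.nf
-	0.95 * 50 usec + 0.05 * 2000 usec \(eq 147.5 usec
-.fi
-.sp
-nearly three times the local access time, which is why both a very good hit
-ratio and asynchronous writing are needed.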
-.pp
-The protocol also includes changes designed to address other limitations
-of NFS in a modern workstation environment.
-TCP transport is optionally available, to avoid the pitfalls of Sun RPC
-over UDP transport when running across an internetwork [Nowicki89].
-Kerberos [Steiner88] support is available for proper user authentication,
-in order to provide improved security and
-arbitrary client-to-server user ID mappings.
-There are also a variety of other changes to accommodate large file systems,
-such as 64bit file sizes and offsets, as well as lifting the 8Kbyte I/O size
-limit.
-The remainder of this paper gives an overview of the protocol, highlighting
-performance-related components, followed by an evaluation of the resultant
-performance of the 4.4BSD implementation.
-.sh 1 "Distributed File Systems and Caching"
-.pp
-Clients using distributed file systems cache recently-used data in order
-to reduce the number of synchronous server operations, and therefore improve
-average response times for system calls.
-Unfortunately, maintaining consistency between these caches is a problem
-whenever write sharing occurs; that is, when a process on a client writes
-to a file and one or more processes on other client(s) read the file.
-If the writer closes the file before any reader(s) open the file for reading,
-this is called sequential write sharing. Both the Andrew ITC file system
-[Howard88] and NFS [Sandberg85] maintain consistency for sequential write
-sharing by requiring the writer to push all the writes through to the
-server on close and having readers check to see if the file has been
-modified upon open. If the file has been modified, the client throws away
-all cached data for that file, as it is now stale.
-NFS implementations typically detect file modification by checking a cached
-copy of the file's modification time; since this cached value is often
-several seconds out of date and only has a resolution of one second, an NFS
-client often uses stale cached data for some time after the file has
-been updated on the server.
-.pp
-A more difficult case is concurrent write sharing, where write operations
-are intermixed with read operations.
-Consistency for this case, often referred to as "full cache consistency,"
-requires that a reader always receives the most recently written data.
-Neither NFS nor the Andrew ITC file system maintains consistency for this
-case.
-The simplest mechanism for maintaining full cache consistency is the one
-used by Sprite [Nelson88], which disables all client caching of the
-file whenever concurrent write sharing might occur.
-There are other mechanisms described in the literature [Kent87a,
-Burrows88], but they appeared to be too elaborate for incorporation
-into NQNFS (for example, Kent's requires specialized hardware).
-NQNFS differs from Sprite in the way it
-detects write sharing. The Sprite server maintains a list of files currently
-open by the various clients and detects write sharing when a file open
-request for writing is received and the file is already open for reading
-(or vice versa).
-This list of open files is hard state information that must be recovered
-after a server crash, which is a significant problem in its own
-right [Mogul93, Welch90].
-.pp
-The approach used by NQNFS is a variant of the Leases mechanism [Gray89].
-In this model, the server issues to a client a promise, referred to as a
-"lease," that the client may cache a specific object without fear of
-conflict.
-A lease has a limited duration and must be renewed by the client if it
-wishes to continue to cache the object.
-In NQNFS, clients hold short-term (up to one minute) leases on files
-for reading or writing.
-The leases are analogous to entries in the open file list, except that
-they expire after the lease term unless renewed by the client.
-As such, one minute after issuing the last lease there are no current
-leases and therefore no lease records to be recovered after a crash, hence
-the term "soft server state."
-.pp
-A related design consideration is the way client writing is done.
-Synchronous writing requires that all writes be pushed through to the server
-during the write system call.
-This is the simplest variant, from a consistency point of view, since the
-server always has the most recently written data. It also permits any write
-errors, such as "file system out of space," to be propagated back to the
-client's process via the write system call return.
-Unfortunately, this approach limits the client write rate, based on server
-write performance and client/server RPC round trip time (RTT).
-.pp
-An alternative to this is delayed writing, where the write system call
-returns as soon as the data is cached on the client and the data is written
-to the server sometime later.
-This permits client writing to occur at the rate of local storage access
-up to the size of the local cache.
-Also, for cases where file truncation/deletion occurs shortly after writing,
-the write to the server may be avoided since the data has already been
-deleted, reducing server write load.
-There are some obvious drawbacks to this approach.
-For any Sprite-like system to maintain
-full consistency, the server must "call back" to the client to cause the
-delayed writes to be written back to the server when write sharing is about
-to occur.
-There are also problems with the propagation of errors
-back to the client process that issued the write system call.
-The reason for this is that
-the system call has already returned without reporting an error and the
-process may also have already terminated.
-As well, there is a risk of the loss of recently written data if the client
-crashes before the data is written back to the server.
-.pp
-A compromise between these two alternatives is asynchronous writing, where
-the write to the server is initiated during the write system call but the
-write system call returns before the write completes.
-This approach minimizes the risk of data loss due to a client crash, but
-negates the possibility of reducing server write load by throwing writes
-away when a file is truncated or deleted.
-.pp
-NFS implementations usually do a mix of asynchronous and delayed writing
-but push all writes to the server upon close, in order to maintain open/close
-consistency.
-Pushing the delayed writes on close
-negates much of the performance advantage of delayed writing, since the
-delays that were avoided in the write system calls are observed in the close
-system call.
-Like Sprite, the NQNFS protocol does delayed writing in an effort to achieve
-good client performance and uses a callback mechanism to maintain full cache
-consistency.
-.sh 1 "Related Work"
-.pp
-There has been a great deal of effort put into improving the performance and
-consistency of the NFS protocol. This work falls into two categories:
-implementation enhancements for the NFS protocol, and modifications to the
-protocol itself.
-.pp
-The work on implementation enhancements has attacked two problem areas:
-NFS server write performance and RPC transport problems.
-Server write performance is a major problem for NFS, in part due to the
-requirement to push all writes to the server upon close and in part due
-to the fact that, for writes, all data and meta-data must be committed to
-non-volatile storage before the server replies to the write RPC.
-The Prestoserve\(tm\(dg
-[Moran90]
-system uses non-volatile RAM as a buffer for recently written data on the
-server, so that the write RPC replies can be returned to the client before
-the data is written to the disk surface.
-Write gathering [Juszczak94] is a software technique used on the server
-where a write RPC request is delayed for a short time in the hope that
-another contiguous write request will arrive, so that they can be merged
-into one write operation.
-Since the replies to all of the merged writes are not returned to the client
-until the write operation is completed, this delay does not violate the
-protocol.
-When write operations are merged, the number of disk writes can be reduced,
-improving server write performance.
-Although either of these techniques reduces write RPC response time for the
-server, it cannot be reduced to zero, and so any client-side caching
-mechanism that reduces write RPC load or client dependence on server RPC
-response time should still improve overall performance.
-Good client-side caching should be complementary to these server techniques,
-although client performance improvements as a result of caching may be less
-dramatic when these techniques are used.
-.pp
-In NFS, each Sun RPC request is packaged in a UDP datagram for transmission
-to the server. A timer is started, and if a timeout occurs before the
-corresponding RPC reply is received, the RPC request is retransmitted.
-There are two problems with this model.
-First, when a retransmit timeout occurs, the RPC may be redone, instead of
-simply retransmitting the RPC request message to the server. A recent-request
-cache can be used on the server to minimize the negative impact of redoing
-RPCs [Juszczak89].
-The second problem is that a large UDP datagram, such as a read reply or
-write request, must be fragmented by IP, and if any one IP fragment is lost
-in transit, the entire UDP datagram is lost [Kent87]. Since entire requests
-and replies are packaged in a single UDP datagram, this puts an upper bound
-on the read/write data size (8 kbytes).
-.pp
-Adjusting the retransmit timeout interval dynamically and applying a
-congestion window on outstanding requests has been shown to be of some help
-[Nowicki89] with the retransmission problem.
-An alternative is to use TCP transport to deliver the RPC messages
-reliably [Macklem90], and one of the performance results in this paper
-further shows the effects of this.
-.pp
-Srinivasan and Mogul [Srinivasan89] enhanced the NFS protocol to use the
-Sprite cache consistency algorithm in an effort to improve performance and
-to provide full client cache consistency.
-This experimental implementation demonstrated significantly better
-performance than NFS, but suffered from a lack of crash recovery support.
-The NQNFS protocol design borrowed heavily from this work, but differed
-from the Sprite algorithm by using Leases instead of file open state
-to detect write sharing.
-The decision to use Leases was made primarily to avoid the crash recovery
-problem.
-More recent work by the Sprite group [Baker91] and Mogul [Mogul93] has
-addressed the crash recovery problem, making this design tradeoff more
-questionable now.
-.pp
-Sun has recently updated the NFS protocol to Version 3 [SUN93], using some
-changes similar to NQNFS to address various issues. The Version 3 protocol
-uses 64bit file sizes and offsets, provides a Readdir_and_Lookup RPC and
-an Access RPC.
-It also provides cache hints, to permit a client to determine
-whether a file modification is the result of that client's write or some
-other client's write.
-It would be possible to add either Spritely NFS or NQNFS support for cache
-consistency to the NFS Version 3 protocol.
-.sh 1 "NQNFS Consistency Protocol and Recovery"
-.pp
-The NQNFS cache consistency protocol uses a somewhat Sprite-like [Nelson88]
-mechanism, but is based on Leases [Gray89] instead of hard server state
-information about open files.
-The basic principle is that the server disables client caching of files
-whenever concurrent write sharing could occur, by performing a
-server-to-client callback,
-forcing the client to flush its caches and to do all subsequent I/O on the
-file with synchronous RPCs.
-A Sprite server maintains a record of the open state of files for
-all clients and uses this to determine when concurrent write sharing might
-occur.
-This \fIopen state\fR information might also be referred to as an
-infinite-term lease for the file, with explicit lease cancellation.
-NQNFS, on the other hand, uses a short-term lease that expires due to timeout
-after a maximum of one minute, unless explicitly renewed by the client.
-The fundamental difference is that an NQNFS client must keep renewing
-a lease to use cached data, whereas a Sprite client assumes the data is
-valid until canceled by the server
-or the file is closed.
-Using leases permits the server to remain "stateless," since the soft
-state information, which consists of the set of current leases, is
-moot after one minute, when all the leases expire.
-.pp
-Whenever a client wishes to access a file's data, it must hold one of
-three types of lease: read-caching, write-caching or non-caching.
-The latter type requires that all file operations be done synchronously with
-the server via the appropriate RPCs.
-.pp
-A read-caching lease allows client data caching, but no modifications
-may be done.
-It may, however, be shared between multiple clients. Diagram 1 shows a
-typical read-caching scenario. The vertical solid black lines depict the
-lease records.
-Note that the time lines are not drawn to scale, since a client/server
-interaction will normally take less than one hundred milliseconds, whereas
-the normal lease duration is thirty seconds.
-Every lease includes a \fImodrev\fR value, which changes upon every
-modification of the file. It may be used to check whether data cached on
-the client is still current.
-.pp
-A write-caching lease permits delayed write caching,
-but requires that all data be pushed to the server when the lease expires
-or is terminated by an eviction callback.
-When a write-caching lease has almost expired, the client will attempt to
-extend the lease if the file is still open, but is required to push the
-delayed writes to the server if renewal fails (as depicted by diagram 2).
-The writes may not arrive at the server until after the write lease has
-expired on the client, but this does not result in a consistency problem,
-so long as the write lease is still valid on the server.
-Note that, in diagram 2, the lease record on the server remains current
-after the expiry time, due to the conditions mentioned in section 5.
-If a write RPC is done on the server after the write lease has expired on
-the server, this could be considered an error, since consistency could be
-lost, but it is not handled as such by NQNFS.
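-.pp
-As a rough sketch of the bookkeeping a lease implies (a minimal example in
-C; the structure and names below are invented for illustration and are not
-taken from the 4.4BSD sources):
-.sp
-.nf
-#include <time.h>
-
-/* Hypothetical per-file lease record held by a client. */
-enum lease_type { LT_NONCACHING, LT_READCACHING, LT_WRITECACHING };
-
-struct lease {
-	enum lease_type	type;	/* kind of caching permitted */
-	time_t		expiry;	/* end of the current lease term */
-	unsigned long	modrev;	/* modification revision of the file */
-};
-
-/*
- * Cached data may be used only while the lease term has not ended
- * and the server's modrev still matches the cached copy's.
- */
-static int
-lease_usable(const struct lease *lp, time_t now, unsigned long srv_modrev)
-{
-	return (now < lp->expiry && lp->modrev == srv_modrev);
-}
-.fi
-.sp
-A client that finds the lease term ended must renew the lease (and recheck
-the modrev) before trusting its cached data again.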
-.pp
-Diagram 3 depicts how read and write leases are replaced by a non-caching
-lease when there is the potential for write sharing.
-.(z
-.sp
-.\" [pic source elided: time line of read-caching lease traffic among
-.\" Client A, the Server and Client B; read requests with piggybacked
-.\" lease requests, lease renewal after a timeout, cache misses, and
-.\" read syscalls satisfied from the cache while the modrev is unchanged]
-.sp
-.ce
-Diagram #1: Read Caching Leases
-.sp
-.)z
-.(z
-.sp
-.\" [pic source elided: time line of write-caching lease traffic between
-.\" Client A and the Server; delayed write syscalls, lease renewal before
-.\" expiry, delayed writes pushed when the lease expires after a close,
-.\" and server-side expiry delayed by write activity until write_slack
-.\" seconds after the last write]
-.sp
-.ce
-Diagram #2: Write Caching Lease
-.sp
-.)z
-.(z
-.sp
-.\" [pic source elided: time line of the write-sharing case among Client A,
-.\" the Server and Client B; an eviction notice terminates Client B's
-.\" write-caching lease, the delayed writes are flushed to the server and
-.\" a vacated message is sent, after which both clients are issued
-.\" non-caching leases and perform synchronous, uncached reads and writes]
-.sp
-.ce
-Diagram #3: Write sharing case
-.sp
-.)z
-A write-caching lease is not used in the Stanford V Distributed System
-[Gray89], since synchronous
-writing is always used. A side effect of this change
-is that the five to ten second lease duration recommended by Gray was found
-to be insufficient to achieve good performance for the write-caching lease.
-Experimentation showed that thirty seconds was about optimal for cases where
-the client and server are connected to the same local area network, so
-thirty seconds is the default lease duration for NQNFS.
-A maximum of twice that value is permitted, since Gray showed that for some
-network topologies, a larger lease duration performs better.
-Although there is an explicit get_lease RPC defined for the protocol,
-most lease requests are piggybacked onto the other RPCs to minimize the
-additional overhead introduced by leasing.
-.sh 2 "Rationale"
-.pp
-Leasing was chosen over hard server state information for the following
-reasons:
-.ip 1.
-The server must maintain state information about all current
-client leases.
-Since at most one lease is allocated for each RPC and the leases expire
-after their lease term,
-the upper bound on the number of current leases is the product of the
-lease term and the server RPC rate.
-In practice, it has been observed that fewer than 10% of RPCs request new
-leases, and since most leases have a term of thirty seconds, the following
-rule of thumb should estimate the number of server lease records:
-.sp
-.nf
-	Number of Server Lease Records \(eq 0.1 * 30 * RPC rate
-.fi
-.sp
-Since each lease record occupies 64 bytes of server memory, storing the
-lease records should not be a serious problem.
-For example, a server handling 100 RPCs per second would hold roughly
-0.1 * 30 * 100 \(eq 300 lease records, occupying about 19Kbytes of memory.
-If a server has exhausted lease storage, it can simply wait a few seconds
-for a lease to expire and free up a record.
-On the other hand, a Sprite-like server must store records for all files
-currently open by all clients, which can require significant storage for
-a large, heavily loaded server.
-In [Mogul93], it is proposed that a mechanism vaguely similar to paging
-could be used to deal with this for Spritely NFS, but this
-appears to introduce a fair amount of complexity and may limit the
-usefulness of open records for storing other state information, such
-as file locks.
-.ip 2.
-After a server crashes, it would in principle have to recover lease records
-for the current outstanding leases; but if it simply waits until all leases
-have expired, there is no state to recover.
-The server must wait for the maximum lease duration of one minute, and it
-must serve all outstanding write requests resulting from terminated
-write-caching leases before issuing new leases. The one-minute delay can be
-overlapped with file system consistency checking (e.g. fsck).
-Because no state must be recovered, a lease-based server, like an NFS server,
-avoids the problem of state recovery after a crash.
-.sp
-There can, however, be problems during crash recovery
-because of a potentially large number of write-backs due to terminated
-write-caching leases.
-One of these problems is a "recovery storm" [Baker91], which could occur
-when the server is overloaded by the number of write RPC requests.
-The NQNFS protocol deals with this by replying
-with a return status code called
-try_again_later to all
-RPC requests (except write) until the write requests subside.
-At this time, there has not been sufficient testing of server crash
-recovery while under heavy server load to determine whether the
-try_again_later reply is a sufficient solution to the problem.
-The other problem is that consistency will be lost if other RPCs are
-performed before all of the write-backs for terminated write-caching leases
-have completed.
-This is handled by servicing only write RPCs until
-no write RPC request has arrived
-for write_slack seconds, where write_slack is set to several times
-the client timeout retransmit interval,
-at which time it is assumed all clients have had an opportunity to send
-their writes to the server.
-.ip 3.
-Another advantage of leasing is that, since leases are required at times
-when other I/O operations occur,
-lease requests can almost always be piggybacked on other RPCs, avoiding some
-of the overhead associated with the explicit open and close RPCs required by
-a Sprite-like system.
-Compared with Sprite cache consistency,
-this can result in a significantly lower RPC load (see table #1).
-.sh 1 "Limitations of the NQNFS Protocol"
-.pp
-There is a serious risk when leasing is used for delayed write
-caching.
-If the server is simply too busy to service a lease renewal before a
-write-caching lease terminates, the client will not be able to push the
-write data to the server before the lease has terminated, resulting in
-inconsistency.
-Note that the danger of inconsistency occurs when the server assumes that
-a write-caching lease has terminated before the client has
-had the opportunity to write the data back to the server.
-In an effort to avoid this problem, the NQNFS server does not assume that
-a write-caching lease has terminated until three conditions are met:
-.sp
-.(l
-1 - clock time > (expiry time + clock skew)
-2 - there is at least one server daemon (nfsd) waiting for an RPC request
-3 - no write RPCs received for leased file within write_slack after the corrected expiry time
-.)l
-.lp
-The first condition ensures that the lease has expired on the client.
-The clock_skew, by default three seconds, must be
-set to a value larger than the maximum time-of-day clock error that is
-likely to occur during the maximum lease duration.
-The second condition attempts to ensure that the client
-is not waiting for replies to any writes that are still queued for service
-by an nfsd. The third condition tries to guarantee that the client has
-transmitted all write requests to the server, since write_slack is set to
-several times the client's timeout retransmit interval.
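-.pp
-The three conditions can be restated as a simple check. The following C
-sketch uses invented names and one plausible reading of the third condition
-(write activity keeps pushing the deadline back); it is not code from the
-4.4BSD implementation:
-.sp
-.nf
-#include <time.h>
-
-#define CLOCK_SKEW	3	/* seconds, the default */
-
-/*
- * A write-caching lease is treated as terminated on the server
- * only when all three conditions hold.
- */
-static int
-lease_terminated(time_t now, time_t expiry, int nfsd_waiting,
-    time_t last_write_rpc, int write_slack)
-{
-	time_t corrected = expiry + CLOCK_SKEW;
-
-	if (now <= corrected)
-		return (0);	/* 1: may not have expired on the client */
-	if (!nfsd_waiting)
-		return (0);	/* 2: writes may still be queued for an nfsd */
-	if (now < last_write_rpc + write_slack)
-		return (0);	/* 3: write RPCs are still arriving */
-	return (1);
-}
-.fi
-.sp
-Until all three conditions hold, the server keeps the lease record current
-and continues to accept the client's delayed writes.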
-.pp
-There are also certain file system semantics that are problematic for both
-NFS and NQNFS, due to the lack of state information maintained by the
-server. If a file is unlinked on one client while open on another, it will
-be removed from the file server, resulting in failed file accesses on the
-client that has the file open.
-If the file system on the server is out of space or the client user's disk
-quota has been exceeded, a delayed write can fail long after the write
-system call was successfully completed.
-With NFS this error will be detected by the close system call, since
-the delayed writes are pushed upon close. With NQNFS, however, the delayed
-write RPC may not occur until after the close system call, possibly even
-after the process has exited.
-Therefore,
-if a process must check for write errors,
-a system call such as \fIfsync\fR must be used.
-.pp
-Another problem occurs when a process on one client is
-running an executable file
-and a process on another client starts to write to the file.
-The read lease on
-the first client is terminated by the server, but the client has no recourse
-but to terminate the process, since the process is already running the old
-executable.
-.pp
-The NQNFS protocol does not support file locking, since a file lock would
-have to involve hard state information that must be recovered after a crash.
-.sh 1 "Other NQNFS Protocol Features"
-.pp
-NQNFS also includes a variety of minor modifications to the NFS protocol,
-in an attempt to address various limitations.
-The protocol uses 64bit file sizes and offsets in order to handle large
-files.
-TCP transport may be used as an alternative to UDP
-for cases where UDP does not perform well.
-Transport mechanisms
-such as TCP also permit the use of much larger read/write data sizes,
-which might improve performance in certain environments.
-.pp
-The NQNFS protocol replaces the Readdir RPC with a Readdir_and_Lookup
-RPC that returns the file handle and attributes for each file in the
-directory, as well as the name and file ID number.
-This additional information may then be loaded into the lookup and
-file-attribute caches on the client.
-Thus, for cases such as "ls -l", the \fIstat\fR system calls can be
-performed locally without doing any lookup or getattr RPCs.
-Another addition is the Access RPC, which checks file
-accessibility against the server. This is necessary since in some cases the
-client user ID is mapped to a different user on the server, so doing the
-access check locally on the client, using file attributes and client
-credentials, is not correct.
-One case where this becomes necessary is when the NQNFS mount point is using
-Kerberos authentication, where the Kerberos authentication ticket is
-translated to credentials on the server that are mapped to the client-side
-user ID.
-For further details on the protocol, see [Macklem93].
-.sh 1 "Performance"
-.pp
-In order to evaluate the effectiveness of the NQNFS protocol,
-a benchmark designed to typify
-real work on a client workstation was used.
-Benchmarks, such as Laddis [Wittle93], that perform server load
-characterization are not appropriate for this work, since it is primarily
-client caching efficiency that needs to be evaluated.
-Since these tests measure overall client system performance and
-not just the performance of the file system,
-each sequence of runs was performed on identical hardware and operating
-system in order to factor out the system
-components affecting performance other than the file system protocol.
-.pp
-The machines used for all the benchmarks are members of the DECstation\(tm\(dg
-family of workstations using the MIPS\(tm\(sc RISC architecture.
-The operating system running on these systems was a pre-release version of
-4.4BSD Unix\(tm\(dd.
-For all benchmarks, the file server was a DECstation 2100 (10 MIPS) with
-8Mbytes of memory and a local RZ23 SCSI disk (27msec average access time).
-The clients range in speed from DECstation 2100s
-to a DECstation 5000/25, and always run with six block I/O daemons
-and a 4Mbyte buffer cache, except for the test runs where the
-buffer cache size was the independent variable.
-In all cases /tmp is mounted on the local SCSI disk\**, all machines were
-attached to the same uncongested Ethernet, and all ran in single-user mode
-during the benchmarks.
-.(f
-\**Testing using the 4.4BSD MFS [McKusick90] resulted in slightly degraded
-performance, probably since the machines only had 16Mbytes of memory, and
-so paging increased.
-.)f
-Unless noted otherwise, test runs used UDP RPC transport
-and the results given are the average values of four runs.
-.pp
-The benchmark used is the Modified Andrew Benchmark (MAB)
-[Ousterhout90],
-which is a slightly modified version of the benchmark used to characterize
-performance of the Andrew ITC file system [Howard88].
-The MAB was set up with the executable binaries in the remote-mounted file
-system, and the final load step was commented out due to a linkage problem
-during testing under 4.4BSD.
-Therefore, these results are not directly comparable to other reported MAB
-results.
-The MAB is made up of five distinct phases:
-.sp
-.ip "1." 10
-Make five directories (no significant cost)
-.ip "2." 10
-Copy a file system subtree to a working directory
-.ip "3." 10
-Get file attributes (stat) of all the working files
-.ip "4." 10
-Search for strings (grep) in the files
-.ip "5." 10
-Compile a library of C sources and archive them
-.lp
-Of the five phases, the fifth is by far the largest and is the one most
-affected by client caching mechanisms.
-The results for phase #1 are invariant over all
-the caching mechanisms.
-.sh 2 "Buffer Cache Size Tests"
-.pp
-The first experiment was done to see what effect changing the size of the
-buffer cache would have on client performance. A single DECstation 5000/25
-was used to do a series of runs of MAB with different buffer cache sizes
-for four variations of the file system protocol. The four variations are
-as follows:
-.ip "Case 1:" 10
-NFS - The NFS protocol as implemented in 4.4BSD
-.ip "Case 2:" 10
-Leases - The NQNFS protocol using leases for cache consistency
-.ip "Case 3:" 10
-Leases, Rdirlookup - The NQNFS protocol using leases for cache consistency,
-with the Readdir RPC replaced by Readdir_and_Lookup
-.ip "Case 4:" 10
-Leases, Attrib leases, Rdirlookup - The NQNFS protocol using leases for
-cache consistency, with the Readdir
-RPC replaced by Readdir_and_Lookup,
-and requiring a valid lease not only for file-data access, but also for
-file-attribute access.
-.lp
-As can be seen in figure 1, the buffer cache achieves near-optimal
-performance over the range of two to ten megabytes in size. At eleven
-megabytes in size, the system pages heavily and the runs did not
-complete in a reasonable time. Even at 64Kbytes, the buffer cache improves
-performance over no buffer cache by a significant margin of 136-148 seconds
-versus 239 seconds.
-This may be due, in part, to the fact that the compile phase of the MAB
-uses a rather small working set of file data.
-All variants of NQNFS achieve about
-the same performance, running around 30% faster than NFS, with a slightly
-larger difference for large buffer cache sizes.
-Based on these results, all remaining tests were run with the buffer cache
-size set to 4Mbytes.
-Although I do not know what causes the local peak in the curves between 0.5
-and 2 megabytes, there is some indication that contention for buffer cache
-blocks, between the update process (which pushes delayed writes to the
-server every thirty seconds) and the I/O system calls, may be involved.
-.(z
-.sp
-.\" [pic source elided: graph of MAB phase 5 elapsed time (sec, y axis,
-.\" 0-160) versus buffer cache size (MBytes, x axis, 0-10) for the four
-.\" variants: NFS; Leases; Leases, Rdirlookup; Leases, Attrib leases,
-.\" Rdirlookup]
-.sp
-.ce
-Figure #1: MAB Phase 5 (compile)
-.sp
-.)z
-.sh 2 "Multiple Client Load Tests"
-.pp
-During preliminary runs of the MAB, it was observed that the server RPC
-counts were reduced significantly by NQNFS as compared to NFS (table 1).
-(Spritely NFS and Ultrix\(tm4.3/NFS numbers were taken from [Mogul93]
-and are not directly comparable, due to numerous differences in the
-experimental setup, including deletion of the load step from phase 5.)
-This suggests
-that the NQNFS protocol might scale better with
-respect to the number of clients accessing the server.
-The experiment described in this section
-ran the MAB on one to ten clients concurrently, to observe the
-effects of heavier server load.
-The clients were started at roughly the same time by pressing all the
-<return> keys together and, although not synchronized beyond that point,
-all clients would finish the test run within about two seconds of each
-other.
-This was not a realistic load of N active clients, but it did
-result in a reproducible, increasing client load on the server.
-The results for the four variants
-are plotted in figures 2-5.
-.(z
-.ps -1
-.TS
-box, center;
-c s s s s s s s
-c c c c c c c c
-l | n n n n n n n.
-Table #1: MAB RPC Counts
-RPC	Getattr	Read	Write	Lookup	Other	GetLease/Open-Close	Total
-_
-BSD/NQNFS	277	139	306	575	294	127	1718
-BSD/NFS	1210	506	451	489	238	0	2894
-Spritely NFS	259	836	192	535	306	1467	3595
-Ultrix4.3/NFS	1225	1186	476	810	305	0	4002
-.TE
-.ps
-.)z
-.pp
-For the MAB benchmark, the NQNFS protocol reduces the RPC counts
-significantly, but with a minimum of extra overhead (the GetLease/Open-Close
-count).
-.(z
-.sp
-.\" [pic source elided: graph of MAB phase 2 elapsed time (sec, y axis,
-.\" 0-140) versus number of clients (x axis, 0-10) for the four variants:
-.\" NFS; Leases; Leases, Rdirlookup; Leases, Attrib leases, Rdirlookup]
-.sp
-.ce
-Figure #2: MAB Phase 2 (copying)
-.sp
-.)z
-.(z
-.sp
-.\" [pic source elided: graph of MAB phase 3 elapsed time (sec, y axis,
-.\" 0-40) versus number of clients (x axis, 0-10) for the same four
-.\" variants]
-.sp
-.ce
-Figure #3: MAB Phase 3 (stat/find)
-.sp
-.)z
-.(z
-.sp
-.\" [pic source elided: graph of MAB phase 4 elapsed time (sec, y axis,
-.\" 0-40) versus number of clients (x axis, 0-10) for the same four
-.\" variants]
-.sp
-.ce
-Figure #4: MAB Phase 4 (grep/wc/find)
-.sp
-.)z
-.(z
-.sp
-.\" [pic source elided: graph of MAB phase 5 elapsed time (sec, y axis,
-.\" 0-450) versus number of clients (x axis, 0-10) for the same four
-.\" variants]
-.sp
-.ce
-Figure #5: MAB Phase 5 (compile)
-.sp
-.)z
-.pp
-In figure 2, where a subtree of seventy small files is copied, the
-difference between the protocol variants is minimal,
-with the NQNFS variants performing slightly better.
-For this case, the Readdir_and_Lookup RPC is a slight hindrance under heavy
-load, possibly because it results in larger directory blocks in the buffer
-cache.
-.pp
-In figure 3, for the phase that gets file attributes for a large number
-of files, the leasing variants take about 50% longer, indicating that
-there are performance problems in this area.
For the case where valid current leases are required for every file when
attributes are returned, the performance is significantly worse than when
the attributes are allowed to go stale on the client by a few seconds.
I have not been able to explain the oscillation in the curves for the
Lease cases.
.pp
For the string searching phase depicted in figure 4, the leasing variants
that do not require valid leases for files when attributes are returned
appear to scale better with server load than NFS.
However, the effect appears to be negligible until the server load is
fairly heavy.
.pp
Most of the time in the MAB benchmark is spent in the compilation phase,
and this is where the differences between caching methods are most
pronounced.
In figure 5 it can be seen that any protocol variant using Leases performs
about a factor of two better than NFS at a load of ten clients.
This indicates that the use of NQNFS may allow servers to handle
significantly more clients for this type of workload.
.pp
Table 2 summarizes the MAB run times for all phases for the single client
DECstation 5000/25.
The \fILeases\fR case refers to using leases, whereas the \fILeases,
Rdirl\fR case uses the Readdir_and_Lookup RPC as well, and the \fIBCache
Only\fR case uses leases but only the buffer cache, not the attribute or
name caches.
The \fINo Caching\fR case does no client-side caching at all, performing
every system call via synchronous RPCs to the server.
.(z
.ps -1
.TS
box, center;
c s s s s s s
c c c c c c c c
l | n n n n n n n.
Table #2: Single DECstation 5000/25 Client Elapsed Times (sec)
Phase	1	2	3	4	5	Total	% Improvement over NFS
_
No Caching	6	35	41	40	258	380	-93
NFS	5	24	15	20	133	197	0
BCache Only	5	20	24	23	116	188	5
Leases, Rdirl	5	20	21	20	105	171	13
Leases	5	19	21	21	99	165	16
.TE
.ps
.)z
.sh 2 "Processor Speed Tests"
.pp
An important goal of client-side file system caching is to decouple the
I/O system calls from the underlying distributed file system, so that the
client's system performance might scale with processor speed.
To test this, a series of MAB runs was performed on three DECstations
that are similar except for processor speed.
In addition to the four protocol variants used for the above tests, runs
were done with the client caches turned off, to provide worst-case numbers
for a caching mechanism with a 100% miss rate.
The CPU utilization was measured as an indicator of how much the processor
was blocking on I/O system calls.
Note that since the systems were running in single user mode and otherwise
quiescent, almost all CPU activity was directly related to the MAB run.
The results are presented in table 3.
The CPU time is simply the product of the CPU utilization and the elapsed
running time and, as such, is an optimistic bound on the performance
achievable with an ideal client caching scheme that never blocks for I/O.
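.pp
As a quick arithmetic check of that bound against table 3 below: for the
\fILeases\fR case on the DECstation 5000/25, 99 seconds of elapsed time at
89% utilization gives 99 \(mu 0.89 \(ap 88 seconds of CPU time, the value
tabulated.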
.(z
.ps -1
.TS
box, center;
c s s s s s s s s s
c c s s c s s c s s
c c c c c c c c c c
c c c c c c c c c c
l | n n n n n n n n n.
Table #3: MAB Phase 5 (compile)
	DS2100 (10.5 MIPS)	DS3100 (14.0 MIPS)	DS5000/25 (26.7 MIPS)
	Elapsed	CPU	CPU	Elapsed	CPU	CPU	Elapsed	CPU	CPU
	time	Util(%)	time	time	Util(%)	time	time	Util(%)	time
_
Leases	143	89	127	113	87	98	99	89	88
Leases, Rdirl	150	89	134	110	91	100	105	88	92
BCache Only	169	85	144	129	78	101	116	75	87
NFS	172	77	132	135	74	100	133	71	94
No Caching	330	47	155	256	41	105	258	39	101
.TE
.ps
.)z
As can be seen in the table, any caching mechanism achieves significantly
better performance than when caching is disabled, roughly doubling the CPU
utilization with a corresponding reduction in run time.
For NFS, the CPU utilization drops as CPU speed increases, which suggests
that NFS performance is not scaling with CPU speed.
For the NQNFS variants, the CPU utilization remains just below 90%, which
suggests that the caching mechanism is working well and scaling within this
CPU range.
Note that for this benchmark, the ratio of CPU times between the
DECstation 3100 and the DECstation 5000/25 is quite different from what
the Dhrystone MIPS ratings would suggest.
.pp
Overall, the results seem encouraging, although it remains to be seen
whether or not the caching provided by NQNFS can continue to scale with
CPU performance.
There is a good indication that NQNFS permits a server to scale to more
clients than does NFS, at least for workloads akin to the MAB compile
phase.
A more difficult question is what happens if the server becomes much
faster at write RPCs as a result of some technology such as Prestoserve
or write gathering.
Since a significant part of the difference between NFS and NQNFS is the
synchronous writing, it is difficult to predict how much a server capable
of fast write RPCs will negate the performance improvements of NQNFS.
At the very least, table 1 indicates that the write RPC load on the server
has decreased by approximately 30%, and this reduced write load should
still result in some improvement.
.pp
Indications are that the Readdir_and_Lookup RPC has not improved
performance for these tests and may in fact be degrading performance
slightly.
The results in figure 3 indicate some problems, possibly with handling of
the attribute cache.
It seems logical that the Readdir_and_Lookup RPC should permit priming of
the attribute cache, improving its hit rate, but the results run counter
to that.
.sh 2 "Internetwork Delay Tests"
.pp
This experimental setup was used to explore how the different protocol
variants might perform over internetworks with larger RPC RTTs.
The server was moved to a separate Ethernet, using a MicroVAXII\(tm as an
IP router to the other Ethernet.
The 4.3Reno BSD Unix system running on the MicroVAXII was modified to
delay forwarded IP packets by a tunable N milliseconds (a minimal sketch
of the idea appears below).
The implementation was rather crude: it did not try to simulate a
distribution of delay times, nor was it programmed to drop packets at a
given rate, but it served as a simple emulation of a long, fat
network\** [Jacobson88].
.(f
\**Long fat networks are network interconnections with a bandwidth \(mu
RTT product > 10\u5\d bits; for example, a 10 Mbit/sec path with a 200
msec RTT has a product of 2 \(mu 10\u6\d bits.
.)f
The MAB was run using both UDP and TCP RPC transports for a variety of
RTT delays from five to two hundred milliseconds, to observe the effects
of RTT delay on RPC transport.
It was found that, due to high variability between runs, four runs did
not suffice, so eight runs were done at each value.
The results in figure 6 and table 4 are averages over the eight runs.
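.pp
To make the delay-router mechanism concrete, the following is a minimal
userspace sketch in C of the idea, assuming a simple FIFO hold queue; the
names and structure are illustrative only, not the actual 4.3Reno kernel
modification.
.(l
/*
 * Sketch of the delay router: instead of forwarding at once,
 * packets sit in a FIFO until a tunable delay has elapsed.
 */
#include <stdio.h>
#include <stdlib.h>

struct pkt {
	double		arrival;	/* when the packet reached the router */
	const char	*data;		/* stand-in for the IP datagram */
	struct pkt	*next;
};

static struct pkt *head, *tail;		/* FIFO of held packets */
static double delay_sec = 0.100;	/* the tunable N-millisecond delay */

/* Hold a packet instead of forwarding it immediately. */
static void
hold(double now, const char *data)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p == NULL)
		abort();
	p->arrival = now;
	p->data = data;
	p->next = NULL;
	if (tail != NULL)
		tail->next = p;
	else
		head = p;
	tail = p;
}

/* Forward (here, just print) every packet whose delay has expired. */
static void
forward_expired(double now)
{
	while (head != NULL && now - head->arrival >= delay_sec) {
		struct pkt *p = head;

		head = p->next;
		if (head == NULL)
			tail = NULL;
		printf("%.3fs: forward \"%s\" after %.0f msec\n",
		    now, p->data, (now - p->arrival) * 1000.0);
		free(p);
	}
}

int
main(void)
{
	hold(0.000, "RPC request");
	hold(0.020, "RPC reply");
	forward_expired(0.050);	/* too early; both packets still held */
	forward_expired(0.150);	/* both now past the 100 msec delay */
	return (0);
}
.)l
.pp
A kernel version would presumably queue mbufs at the forwarding hook and
drain the queue from a timer, but the principle is the same: every
forwarded packet is held for the tunable delay before it proceeds.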
.(z
.sp
\fIFigure #6: MAB Phase 5 (compile) \(em elapsed time (0-500 sec) versus round trip delay (0-200 msec) for Leases,UDP; Leases,TCP; NFS,UDP; and NFS,TCP.\fR
.sp
.)z
.(z
.ps -1
.TS
box, center;
c s s s s s s s s
c c s c s c s c s
c c c c c c c c c
c c c c c c c c c
l | n n n n n n n n.
Table #4: MAB Phase 5 (compile) for Internetwork Delays
	NFS,UDP	NFS,TCP	Leases,UDP	Leases,TCP
Delay	Elapsed	Standard	Elapsed	Standard	Elapsed	Standard	Elapsed	Standard
(msec)	time (sec)	Deviation	time (sec)	Deviation	time (sec)	Deviation	time (sec)	Deviation
_
5	139	2.9	139	2.4	112	7.0	108	6.0
40	175	5.1	208	44.5	150	23.8	139	4.3
80	207	3.9	213	4.7	180	7.7	210	52.9
120	276	29.3	273	17.1	221	7.7	238	5.8
160	304	7.2	328	77.1	275	21.5	274	10.1
200	372	35.0	506	235.1	338	25.2	379	69.2
.TE
.ps
.)z
.pp
I found these results somewhat surprising, since I had assumed that
stability across an internetwork connection would be primarily a function
of the RPC transport protocol.
Looking at the standard deviations observed across the eight runs, there
is an indication that the NQNFS protocol plays a larger role in
maintaining stability than the underlying RPC transport protocol does.
NFS over TCP transport appears to be the least stable variant tested.
It should be noted that the TCP implementation used was at roughly the
4.3BSD-Tahoe level, and that the 4.4BSD TCP implementation was far less
stable, failing intermittently due to a bug I was not able to isolate.
It would appear that some of the recent enhancements to the 4.4BSD TCP
implementation have a detrimental effect on the performance of RPC-type
traffic loads, which intermix small and large data transfers in both
directions.
It is obvious that more exploration of this area is needed before any
conclusion can be drawn beyond the fact that, over a local area network,
TCP transport provides performance comparable to UDP.
.sh 1 "Lessons Learned"
.pp
Evaluating the performance of a distributed file system is fraught with
difficulties, due to the many software and hardware factors involved.
The limited benchmarking presented here took a considerable amount of
time, and the results only give indications of what the performance might
be for a few scenarios.
.pp
The IP router with delay introduction proved to be a valuable tool for
protocol debugging\**,
.(f
\**It exposed two bugs in the 4.4BSD networking code: a problem in the
Lance chip driver for the DECstation, and a TCP window sizing problem
that I was not able to isolate.
.)f
and may be useful for a more extensive study of performance over
internetworks if enhanced to do a better job of simulating internetwork
delay and packet loss.
.pp
The Leases mechanism provided a simple model for the provision of cache
consistency (the core client-side check is sketched at the end of this
section) and did seem to improve performance in various scenarios.
Unfortunately, it does not provide the server state information required
for file system semantics, such as locking, that many software systems
demand.
In production environments on my campus, the need for file locking and
the correct generation of the ETXTBSY error code are far more important
than full cache consistency, and leasing does not satisfy these needs.
Another file system semantic that requires hard server state is the delay
of file removal until the last close system call.
Although Spritely NFS did not support this semantic either, the open file
state maintained by that system would make it easier to implement than
the Leases mechanism would.
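.pp
To illustrate the soft-state model that leasing provides, here is a
minimal sketch in C of the client-side validity check; the names, the
absolute-expiry representation and the fixed skew bound are assumptions
made for the example, not the 4.4BSD implementation.
.(l
/*
 * Cached data may be trusted only while the granted lease term,
 * shortened by an assumed bound on client/server clock drift,
 * has not run out.
 */
#include <stdbool.h>
#include <time.h>

#define NQ_MAXSKEW 2		/* assumed bound (sec) on clock drift */

struct lease {
	time_t	expiry;		/* local time at which the term ends */
};

static bool
lease_valid(const struct lease *lp)
{
	return (time(NULL) < lp->expiry - NQ_MAXSKEW);
}
.)l
.pp
When the check fails, the client would revalidate with the server before
trusting its cached data again; no per-client hard state has to survive a
server crash, since an unrenewed lease simply expires.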
.sh 1 "Further Work"
.pp
The current implementation uses a fixed, moderate-sized buffer cache
designed for the local UFS [McKusick84] file system.
The results in figure 1 suggest that this is adequate so long as the
cache is of an appropriate size.
However, a mechanism permitting the cache to vary in size has been shown
to outperform fixed-sized buffer caches [Nelson90] and could be
beneficial.
It could also be useful to let the buffer cache grow very large, by
making use of local backing store, for cases where server performance is
limited.
A very large buffer cache would in turn permit experimentation with much
larger read/write data sizes, facilitating bulk data transfers across the
long fat networks that will characterize the Internet of the near future.
A careful redesign of the buffer cache mechanism to support these
features would probably be the next implementation step.
.pp
The results in figure 3 indicate that the mechanics of caching file
attributes and maintaining the attribute cache's consistency need further
work.
More work is also needed on the interaction between the
Readdir_and_Lookup RPC and the name and attribute caches, in an effort to
reduce the Getattr and Lookup RPC loads.
.pp
The NQNFS protocol has never been used in a production environment, and
doing so would provide needed insight into how well the protocol
satisfies the needs of real workstation environments.
It is hoped that the distribution of the implementation in 4.4BSD will
facilitate use of the protocol in production environments elsewhere.
.pp
The big question that needs to be resolved is whether Leases are an
adequate mechanism for cache consistency or whether hard server state is
required.
Given the work presented here and in the papers related to Sprite and
Spritely NFS, there are clear indications that a cache consistency
algorithm can improve both performance and file system semantics.
As yet, however, it is unclear which approach to maintaining consistency
is best.
It would appear that hard state information is required for file locking
and other mechanisms and, if so, it seems appropriate to use it for cache
consistency as well.
.sh 1 "Acknowledgements"
.pp
I would like to thank the members of the CSRG at the University of
California, Berkeley for their continued support over the years.
Without their encouragement and assistance this software would never have
been implemented.
Prof. Jim Linders and Prof. Tom Wilson here at the University of Guelph
helped proofread this paper, and Jeffrey Mogul provided a great deal of
assistance, helping to turn my gibberish into something at least
moderately readable.
.sh 1 "References"
.ip [Baker91] 15
Mary Baker and John Ousterhout, Availability in the Sprite Distributed
File System, In \fIOperating System Review\fR, (25)2, pg. 95-98,
April 1991.
.ip [Baker91a] 15
Mary Baker, private communication, May 1991.
.ip [Burrows88] 15
Michael Burrows, Efficient Data Sharing, Technical Report #153,
Computer Laboratory, University of Cambridge, Dec. 1988.
.ip [Gray89] 15
Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant
Mechanism for Distributed File Cache Consistency, In \fIProc. of the
Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield
Park, AZ, Dec. 1989.
.ip [Howard88] 15
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols,
M. Satyanarayanan, Robert N. Sidebotham and Michael J. West,
Scale and Performance in a Distributed File System, \fIACM Trans. on
Computer Systems\fR, (6)1, pg. 51-81, Feb. 1988.
.ip [Jacobson88] 15
Van Jacobson and R. Braden, \fITCP Extensions for Long-Delay Paths\fR,
ARPANET Working Group Requests for Comment, DDN Network Information
Center, SRI International, Menlo Park, CA, October 1988, RFC-1072.
.ip [Jacobson89] 15
Van Jacobson, Sun NFS Performance Problems, \fIPrivate Communication\fR,
November 1989.
.ip [Juszczak89] 15
Chet Juszczak, Improving the Performance and Correctness of an NFS
Server, In \fIProc. Winter 1989 USENIX Conference\fR, pg. 53-63, San
Diego, CA, January 1989.
.ip [Juszczak94] 15
Chet Juszczak, Improving the Write Performance of an NFS Server, to
appear in \fIProc. Winter 1994 USENIX Conference\fR, San Francisco, CA,
January 1994.
.ip [Kazar88] 15
Michael L. Kazar, Synchronization and Caching Issues in the Andrew File
System, In \fIProc. Winter 1988 USENIX Conference\fR, pg. 27-36, Dallas,
TX, February 1988.
.ip [Kent87] 15
Christopher A. Kent and Jeffrey C. Mogul, \fIFragmentation Considered
Harmful\fR, Research Report 87/3, Digital Equipment Corporation Western
Research Laboratory, Dec. 1987.
.ip [Kent87a] 15
Christopher A. Kent, \fICache Coherence in Distributed Systems\fR,
Research Report 87/4, Digital Equipment Corporation Western Research
Laboratory, April 1987.
.ip [Macklem90] 15
Rick Macklem, Lessons Learned Tuning the 4.3BSD Reno Implementation of
the NFS Protocol, In \fIProc. Winter 1991 USENIX Conference\fR, pg.
53-64, Dallas, TX, January 1991.
.ip [Macklem93] 15
Rick Macklem, The 4.4BSD NFS Implementation, In \fIThe System Manager's
Manual\fR, 4.4 Berkeley Software Distribution, University of California,
Berkeley, June 1993.
.ip [McKusick84] 15
Marshall K. McKusick, William N. Joy, Samuel J. Leffler and Robert S.
Fabry, A Fast File System for UNIX, \fIACM Transactions on Computer
Systems\fR, (2)3, pg. 181-197, August 1984.
.ip [McKusick90] 15
Marshall K. McKusick, Michael J. Karels and Keith Bostic, A Pageable
Memory Based Filesystem, In \fIProc. Summer 1990 USENIX Conference\fR,
pg. 137-143, Anaheim, CA, June 1990.
.ip [Mogul93] 15
Jeffrey C. Mogul, Recovery in Spritely NFS, Research Report 93/2, Digital
Equipment Corporation Western Research Laboratory, June 1993.
.ip [Moran90] 15
Joseph Moran, Russel Sandberg, Don Coleman, Jonathan Kepecs and Bob Lyon,
Breaking Through the NFS Performance Barrier, In \fIProc. Spring 1990
EUUG Conference\fR, pg. 199-206, Munich, FRG, April 1990.
.ip [Nelson88] 15
Michael N. Nelson, Brent B. Welch, and John K. Ousterhout, Caching in the
Sprite Network File System, \fIACM Transactions on Computer Systems\fR,
(6)1, pg. 134-154, February 1988.
.ip [Nelson90] 15
Michael N. Nelson, \fIVirtual Memory vs. The File System\fR, Research
Report 90/4, Digital Equipment Corporation Western Research Laboratory,
March 1990.
.ip [Nowicki89] 15
Bill Nowicki, Transport Issues in the Network File System, In \fIComputer
Communication Review\fR, pg. 16-20, March 1989.
.ip [Ousterhout90] 15
John K. Ousterhout, Why Aren't Operating Systems Getting Faster As Fast
as Hardware? In \fIProc. Summer 1990 USENIX Conference\fR, pg. 247-256,
Anaheim, CA, June 1990.
.ip [Sandberg85] 15
Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon,
Design and Implementation of the Sun Network Filesystem, In \fIProc.
Summer 1985 USENIX Conference\fR, pg. 119-130, Portland, OR, June 1985.
.ip [Srinivasan89] 15
V. Srinivasan and Jeffrey C. Mogul, Spritely NFS: Experiments with
Cache-Consistency Protocols, In \fIProc. of the Twelfth ACM Symposium on
Operating Systems Principles\fR, Litchfield Park, AZ, Dec. 1989.
.ip [Steiner88] 15
J. G. Steiner, B. C. Neuman and J. I. Schiller, Kerberos: An
Authentication Service for Open Network Systems, In \fIProc. Winter 1988
USENIX Conference\fR, pg. 191-202, Dallas, TX, February 1988.
.ip [SUN89] 15
Sun Microsystems Inc., \fINFS: Network File System Protocol
Specification\fR, ARPANET Working Group Requests for Comment, DDN Network
Information Center, SRI International, Menlo Park, CA, March 1989,
RFC-1094.
.ip [SUN93] 15
Sun Microsystems Inc., \fINFS: Network File System Version 3 Protocol
Specification\fR, Sun Microsystems Inc., Mountain View, CA, June 1993.
.ip [Wittle93] 15
Mark Wittle and Bruce E. Keith, LADDIS: The Next Generation in NFS File
Server Benchmarking, In \fIProc. Summer 1993 USENIX Conference\fR, pg.
111-128, Cincinnati, OH, June 1993.
.(f
\(mo
NFS is believed to be a trademark of Sun Microsystems, Inc.
.)f
.(f
\(dg
Prestoserve is a trademark of Legato Systems, Inc.
.)f
.(f
\(sc
MIPS is a trademark of Silicon Graphics, Inc.
.)f
.(f
\(dg
DECstation, MicroVAXII and Ultrix are trademarks of Digital Equipment Corp.
.)f
.(f
\(dd
Unix is a trademark of Novell, Inc.
.)f
--
2.11.4.GIT