QEMU virtio-fs shared file system daemon
========================================

**virtiofsd** [*OPTIONS*]

Share a host directory tree with a guest through a virtio-fs device. This
program is a vhost-user backend that implements the virtio-fs device. Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops privileges where
possible during startup although it must be able to create and access files

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root. This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.

.. program:: virtiofsd

.. option:: -h, --help

.. option:: -V, --version

Print log messages to syslog instead of stderr.

Enable/disable flock. The default is ``no_flock``.

Modify the list of capabilities allowed; CAPLIST is a colon separated
list of capabilities, each preceded by either + or -, e.g.
``+sys_admin:-chown``.

Print only log messages matching LEVEL or more severe. LEVEL is one of
``err``, ``warn``, ``info``, or ``debug``. The default is ``info``.

* posix_lock|no_posix_lock -
  Enable/disable remote POSIX locks. The default is ``no_posix_lock``.

* readdirplus|no_readdirplus -
  Enable/disable readdirplus. The default is ``readdirplus``.

* sandbox=namespace|chroot -

  - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
    the shared directory.
  - chroot: chroot(2) into shared directory (use in containers).

  The default is "namespace".

Share host directory tree located at PATH. This option is required.

I/O timeout in seconds. The default depends on the ``cache=`` option.

* writeback|no_writeback -
  Enable/disable writeback cache. The cache allows the FUSE client to buffer
  and merge write requests. The default is ``no_writeback``.

Enable/disable extended attributes (xattr) on files and directories. The
default is ``no_xattr``.

* posix_acl|no_posix_acl -
  Enable/disable POSIX ACL support. POSIX ACLs are disabled by default.

* security_label|no_security_label -
  Enable/disable security label support. Security labels are disabled by
  default. When enabled, the client can send the MAC label of a file during
  file creation. Typically this is expected to be an SELinux security
  label. The server will try to set that label on the newly created file
  atomically wherever possible.

* killpriv_v2|no_killpriv_v2 -
  Enable/disable ``FUSE_HANDLE_KILLPRIV_V2`` support. KILLPRIV_V2 is enabled
  by default as long as the client supports it. Enabling this option helps
  with performance in the write path.

.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM. The default

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance. ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance. ``auto`` behaves similarly to NFS with a 1 second metadata cache
  timeout. ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

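
As a rough illustration of how these options combine on a command line, the
following Python sketch assembles an argument list. The helper name
``build_virtiofsd_args`` and its parameters are hypothetical. Only flags that
appear in this document are used, and the ``-o source=PATH`` spelling is taken
from the example command in the Examples section.

.. code-block:: python

  def build_virtiofsd_args(socket_path, source, cache="auto",
                           sandbox="namespace", extra_o=()):
      """Return an argv list for virtiofsd (hypothetical helper)."""
      if cache not in ("none", "auto", "always"):
          raise ValueError("cache must be none, auto or always")
      if sandbox not in ("namespace", "chroot"):
          raise ValueError("sandbox must be namespace or chroot")
      args = [
          "virtiofsd",
          f"--socket-path={socket_path}",
          "-o", f"source={source}",
          f"--cache={cache}",
          "-o", f"sandbox={sandbox}",
      ]
      # Each additional -o sub-option is passed as its own "-o NAME" pair.
      for opt in extra_o:
          args += ["-o", opt]
      return args


  args = build_virtiofsd_args(
      "/var/run/vm001-vhost-fs.sock",
      "/var/lib/fs/vm001",
      cache="none",
      extra_o=["writeback"],
  )
  print(" ".join(args))

The list form can be handed to ``subprocess.run`` without shell quoting
concerns; the validation only covers the values listed for ``--cache`` and
``sandbox=`` above.
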
Extended attribute (xattr) mapping
----------------------------------

By default the names of xattrs used by the client are passed through to the
server file system. This can be a problem where either those xattr names are
used by something on the server (e.g. SELinux client/server confusion) or if
``virtiofsd`` is running in a container with restricted privileges where it
cannot access some attributes.

A mapping of xattr names can be made using ``-o xattrmap=mapping`` where the
``mapping`` string consists of a series of rules.

The first matching rule terminates the mapping.
The set of rules must include a terminating rule to match any remaining attributes

Each rule consists of a number of fields separated with a separator that is the
first non-white space character in the rule. This separator must then be used

White space may be added before and after each rule.

Using ':' as the separator a rule is of the form:

``:type:scope:key:prepend:``

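
The field layout just described can be sketched in Python as follows. This is
a minimal illustration, not the daemon's parser: ``parse_xattrmap`` is a
hypothetical name, and the sketch assumes that a 'map' rule carries only a key
and a prepend field after its type, following the ``:map:key:prepend:`` form
given in this document.

.. code-block:: python

  def parse_xattrmap(mapping):
      """Split an xattrmap string into tuples of fields (hypothetical helper)."""
      rules = []
      pos = 0
      while True:
          # White space is allowed before and after each rule.
          while pos < len(mapping) and mapping[pos].isspace():
              pos += 1
          if pos == len(mapping):
              return rules
          # The first non-white-space character of a rule is its separator.
          sep = mapping[pos]
          pos += 1
          fields = []
          wanted = 1  # read the type first, then decide how many fields follow
          while len(fields) < wanted:
              end = mapping.find(sep, pos)
              if end == -1:
                  raise ValueError(
                      f"rule {len(rules) + 1}: missing separator {sep!r}")
              fields.append(mapping[pos:end])
              pos = end + 1
              if len(fields) == 1:
                  # Assumption: 'map' has key and prepend only; every other
                  # type has scope, key and prepend.
                  wanted = 3 if fields[0] == "map" else 4
          rules.append(tuple(fields))


  print(parse_xattrmap(":prefix:all::user.virtiofs.::bad:all:::"))
  # [('prefix', 'all', '', 'user.virtiofs.'), ('bad', 'all', '', '')]

Because the separator is re-read at the start of every rule, the same function
accepts rule sets written with ``/`` or any other character, including sets
spread over several lines.
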
**scope** is one of:

- 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr
- 'server' - match 'prepend' against an xattr name from the server
- 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

**type** is one of:

- 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes then being passed on to the client/server.

- 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged.
  It is intended both as a way of explicitly terminating
  the list of rules, and to allow some xattrs to skip following rules.

- 'bad' - If a client tries to use a name matching 'key' it's
  denied using EPERM; when the server passes an attribute
  name matching 'prepend' it's hidden. In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain

- 'unsupported' - If a client tries to use a name matching 'key' it's
  denied using ENOTSUP; when the server passes an attribute
  name matching 'prepend' it's hidden. In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain

**key** is a string tested as a prefix on an attribute name originating
on the client. It may be empty, in which case a 'client' rule
will always match on client names.

**prepend** is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix. It may be empty,
in which case a 'server' rule will always match on all names from
the server.

``:prefix:client:trusted.:user.virtiofs.:``

will match 'trusted.' attributes in client calls and prefix them before
passing them to the server.

``:prefix:server::user.virtiofs.:``

will strip 'user.virtiofs.' from all server replies.

``:prefix:all:trusted.:user.virtiofs.:``

combines the previous two cases into a single rule.

``:ok:client:user.::``

will allow get/set xattr for 'user.' xattrs and ignore
following rules.

``:ok:server::security.:``

will pass 'security.' xattrs in listxattr from the server
and ignore following rules.

will terminate the rule search passing any remaining attributes

``:bad:server::security.:``

would hide 'security.' xattrs in listxattr from the server.

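
The matching behaviour described so far can be modelled with a short Python
sketch. It is a simplified model of the documented semantics, not the daemon's
code: the function names and the ``RULES`` list are hypothetical, rules are
written as ``(type, scope, key, prepend)`` tuples, and the handling of names
that no rule matches is an assumption, since the rule set is expected to end
with a terminating rule.

.. code-block:: python

  import errno

  # Illustrative rule set: remap 'trusted.' and let everything else through.
  RULES = [
      ("prefix", "all", "trusted.", "user.virtiofs."),
      ("bad", "server", "", "trusted."),
      ("bad", "client", "user.virtiofs.", ""),
      ("ok", "all", "", ""),
  ]


  def map_client_name(rules, name):
      """Map a name used by the client; return the name used on the server."""
      for rtype, scope, key, prepend in rules:
          if scope not in ("client", "all") or not name.startswith(key):
              continue
          # The first matching rule terminates the mapping.
          if rtype == "prefix":
              return prepend + name
          if rtype == "ok":
              return name
          if rtype == "bad":
              raise OSError(errno.EPERM, "xattr name denied", name)
          if rtype == "unsupported":
              raise OSError(errno.ENOTSUP, "xattr name unsupported", name)
      # Assumption: a name that no rule matches is denied.
      raise OSError(errno.EPERM, "no rule matched", name)


  def map_server_name(rules, name):
      """Map a name listed by the server; return None when it is hidden."""
      for rtype, scope, key, prepend in rules:
          if scope not in ("server", "all") or not name.startswith(prepend):
              continue
          if rtype == "prefix":
              return name[len(prepend):]  # strip the prefix again
          if rtype == "ok":
              return name
          return None  # 'bad' and 'unsupported' hide the name
      return None  # assumption: unmatched names are hidden


  print(map_client_name(RULES, "trusted.foo"))
  # user.virtiofs.trusted.foo
  print(map_server_name(RULES, "user.virtiofs.trusted.foo"))
  # trusted.foo

With this rule set a client request for ``user.virtiofs.trusted.foo`` raises
``EPERM`` in the model, because the third rule matches before the final 'ok'
rule is reached.
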
A simpler 'map' type provides a shorter syntax for the common case:

``:map:key:prepend:``

The 'map' type adds a number of separate rules to add **prepend** as a prefix
to the matched **key** (or all attributes if **key** is empty).
There may be at most one 'map' rule and it must be the last rule in the set.

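
The expansion of a 'map' rule can be sketched in Python. The sketch mirrors
the two equivalences stated in the xattr-mapping examples of this document
(an empty key, and the key 'trusted.'); it is not taken from the daemon's
source, and ``expand_map`` and ``format_rules`` are hypothetical names.

.. code-block:: python

  def expand_map(key, prepend):
      """Expand a 'map' rule into (type, scope, key, prepend) tuples."""
      if key == "":
          # Prefix everything and hide whatever is not prefixed.
          return [
              ("prefix", "all", "", prepend),
              ("bad", "all", "", ""),
          ]
      return [
          ("prefix", "all", key, prepend),   # add and strip the prefix
          ("bad", "server", "", key),        # hide unprefixed names on the host
          ("bad", "client", prepend, ""),    # block direct use of the target
          ("ok", "all", "", ""),             # let everything else through
      ]


  def format_rules(rules, sep=":"):
      """Render rule tuples as an xattrmap string using separator sep."""
      return "".join(sep + sep.join(rule) + sep for rule in rules)


  print(format_rules(expand_map("", "user.virtiofs.")))
  # :prefix:all::user.virtiofs.::bad:all:::

Rendering the expansion back to a string makes it easy to compare against a
hand-written rule set before passing it to ``-o xattrmap=``.
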
Note: When the 'security.capability' xattr is remapped, the daemon has to do
extra work to remove it during many operations, which the host kernel normally

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

Operating systems typically partition the xattr namespace using
well defined name prefixes. Each partition may have different
access controls applied. For example, on Linux there are multiple
partitions:

* ``system.*`` - access varies depending on attribute & filesystem
* ``security.*`` - only processes with CAP_SYS_ADMIN
* ``trusted.*`` - only processes with CAP_SYS_ADMIN
* ``user.*`` - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to
ensure that the remapping does not allow a guest user to
evade the guest access control rules.

Consider if ``trusted.*`` from the guest was remapped to
``user.virtiofs.trusted.*`` in the host. An unprivileged
user in a Linux guest has the ability to write to xattrs
under ``user.*``. Thus the user can evade the access
control restriction on ``trusted.*`` by instead writing
to ``user.virtiofs.trusted.*``.

As noted above, the partitions used and access controls
applied will vary across guest OS, so it is not wise to
try to predict what the guest OS will use.

The simplest way to avoid an insecure configuration is
to remap all xattrs at once, to a given fixed prefix.
This is shown in example (1) below.

If selectively mapping only a subset of xattr prefixes,
then rules must be added to explicitly block direct
access to the target of the remapping. This is shown
in example (2) below.

1) Prefix all attributes with 'user.virtiofs.'

::

  -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

This uses two rules, using : as the field separator;
the first rule prefixes and strips 'user.virtiofs.',
the second rule hides any non-prefixed attributes that

This is equivalent to the 'map' rule:

::

  -o xattrmap=":map::user.virtiofs.:"

2) Prefix 'trusted.' attributes, allow others through

::

  "/prefix/all/trusted./user.virtiofs./
   /bad/server//trusted./
   /bad/client/user.virtiofs.//
   /ok/all///"

Here there are four rules, using / as the field
separator, and also demonstrating that new lines can
be included between rules.
The first rule is the prefixing of 'trusted.' and
stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes.
The third rule stops a guest from explicitly setting
the 'user.virtiofs.' path directly to prevent access
control bypass on the target of the earlier prefix
rule.
Finally, the fourth rule lets all remaining attributes
through.

This is equivalent to the 'map' rule:

::

  -o xattrmap="/map/trusted./user.virtiofs./"

3) Hide 'security.' attributes, and allow everything else

::

  "/bad/all/security./security./
   /ok/all///"

The first rule combines what could be separate client and server
rules into a single 'all' rule, matching 'security.' in either
client arguments or lists returned from the host. This stops
the client seeing any 'security.' attributes on the server and
stops it setting any.

One can enable support for SELinux by running virtiofsd with the option
``-o security_label``. However, this will try to save the guest's security
context in the xattr ``security.selinux`` on the host, and it might fail if
the host's SELinux policy does not permit virtiofsd to do this operation.

Hence, it is preferred to remap the guest's ``security.selinux`` xattr to, say,
``trusted.virtiofs.security.selinux`` on the host.

``-o xattrmap=:map:security.selinux:trusted.virtiofs.:``

This ensures that the guest's and the host's SELinux xattrs on the same file
remain separate and do not interfere with each other. It also allows both
host and guest to implement their own separate SELinux policies.

Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
capability needs to be added to the daemon.

``-o modcaps=+sys_admin``

Giving CAP_SYS_ADMIN increases the risk to the system. virtiofsd becomes more
powerful and, if it gets compromised, it can do a lot of damage to the host
system. Keep this trade-off in mind when making a decision.

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

.. parsed-literal::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# |qemu_system| \\
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \\
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \\
        -object memory-backend-memfd,id=mem,size=4G,share=on \\
        -numa node,memdev=mem \\
        ...
  guest# mount -t virtiofs myfs /mnt