Device Specification for Inter-VM shared memory device
------------------------------------------------------

The Inter-VM shared memory device is designed to share a memory region (created
on the host via the POSIX shared memory API) between multiple QEMU processes
running different guests. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing said memory
to the guest as a PCI BAR.
The memory region does not belong to any guest, but is a POSIX memory object on
the host. The host can access this shared memory if needed.

The device also provides an optional communication mechanism between guests
sharing the same memory object. More details are given in the 'Guest to guest
communication' section.

The Inter-VM PCI device
-----------------------

From the VM point of view, the ivshmem PCI device supports three BARs.

- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
  not used.
- BAR1 is used for MSI-X when it is enabled in the device.
- BAR2 is used to access the shared memory object.

It is your choice how to use the device but you must choose between two
behaviors:

- If you only need the shared memory part, you will map BAR2. This way, you
  have access to the shared memory in the guest and can use it as you see fit
  (memnic, for example, uses it in userland:
  http://dpdk.org/browse/memnic).

- BAR0 and BAR1 are used to implement an optional communication mechanism
  through interrupts in the guests. If you need an event mechanism between the
  guests accessing the shared memory, you will most likely want to write a
  kernel driver that will handle interrupts. See details in the 'Guest to
  guest communication' section.
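
As an illustration of the shared-memory-only use case, the sketch below maps
BAR2 from guest userland. It assumes a Linux guest, where the PCI core exposes
each BAR as a resourceN file in sysfs; the function name ivshmem_map_bar2, the
device address and the size are hypothetical and depend on your setup. It is
not taken from memnic or from QEMU.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of the file at `path` read/write and shared.
 * Assumption: in a Linux guest, `path` is the sysfs resource file of BAR2,
 * e.g. /sys/bus/pci/devices/<address>/resource2.
 * Returns NULL on failure.
 */
void *ivshmem_map_bar2(const char *path, size_t size)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    return mem == MAP_FAILED ? NULL : mem;
}
```

Every guest that maps the same object this way sees the writes of the others,
so any synchronisation has to be agreed on by the applications themselves.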

The behavior is chosen when starting your QEMU processes:
- no communication mechanism needed: the first QEMU to start creates the shared
  memory on the host, subsequent QEMU processes will use it.

- communication mechanism needed: an ivshmem server must be started before any
  QEMU processes, then each QEMU process connects to the server unix socket.

For more details on the QEMU ivshmem parameters, see qemu-doc documentation.


Guest to guest communication
----------------------------

This section details the communication mechanism between the guests accessing
the ivshmem shared memory.

The ivshmem server code is available in qemu.git/contrib/ivshmem-server.

The server must be started on the host before any guest.
It creates a shared memory object then waits for clients to connect on a unix
socket. All the messages are little-endian int64_t integers.
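
Since every message is a little-endian int64_t, a client can decode the wire
bytes independently of the host byte order. A minimal sketch (the function
name ivshmem_decode_msg is hypothetical, not part of the server or client
code):

```c
#include <stdint.h>

/* Decode one protocol message from its 8-byte little-endian wire form. */
int64_t ivshmem_decode_msg(const unsigned char buf[8])
{
    uint64_t v = 0;

    /* buf[7] is the most significant byte, so start from it. */
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | buf[i];
    }

    /* Relies on the usual two's complement conversion for negative values. */
    return (int64_t)v;
}
```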

For each client (QEMU process) that connects to the server:
- the server sends a protocol version; if the client does not support it, the
  client closes the communication,
- the server assigns an ID for this client and sends this ID to it as the first
  message,
- the server sends a fd to the shared memory object to this client,
- the server creates a new set of host eventfds associated to the new client and
  sends this set to all already connected clients,
- finally, the server sends all the eventfds sets for all clients to the new
  client.

The server signals all clients when one of them disconnects.

The client IDs are limited to 16 bits because of the current implementation (see
Doorbell register in 'PCI device registers' subsection). Hence at most 65536
clients are supported.

All the file descriptors (fd to the shared memory, eventfds for each client)
are passed to clients using SCM_RIGHTS over the server unix socket.
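
The following sketch shows how a client might receive one such message
together with an optional file descriptor. It assumes that each 8-byte
message arrives in a single recvmsg() call and carries at most one
descriptor; the function name ivshmem_recv_msg is hypothetical, and this is
not the code used by QEMU or by the contrib client.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Receive one int64_t message and, if present, the descriptor attached to it
 * via SCM_RIGHTS. On success returns 0, stores the decoded message in *value
 * and the descriptor in *fd (-1 when none was attached).
 * Returns -1 on error or short read.
 */
int ivshmem_recv_msg(int sock, int64_t *value, int *fd)
{
    unsigned char raw[8];
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces suitable alignment of buf */
    } control;
    struct iovec iov = { .iov_base = raw, .iov_len = sizeof(raw) };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* A robust client would loop on short reads; this sketch does not. */
    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(raw)) {
        return -1;
    }

    *fd = -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS &&
            c->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(fd, CMSG_DATA(c), sizeof(int));
        }
    }

    /* Assemble the little-endian payload, most significant byte first. */
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | raw[i];
    }
    *value = (int64_t)v;
    return 0;
}
```

A client would call this in a loop after connecting to the server unix socket
and interpret each (value, fd) pair according to the step of the exchange it
is in.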

Apart from the current ivshmem implementation in QEMU, an ivshmem client has
been provided in qemu.git/contrib/ivshmem-client for debug.

*QEMU as an ivshmem client*

At initialisation, when creating the ivshmem device, QEMU first receives a
protocol version and closes communication with the server if it does not match.
Then, QEMU gets its ID from the server and makes it available through the BAR0
IVPosition register for the VM to use (see 'PCI device registers' subsection).
QEMU then uses the fd to the shared memory to map it to BAR2.
The eventfds for all other clients received from the server are stored to
implement the BAR0 Doorbell register (see 'PCI device registers' subsection).
Finally, the eventfds assigned to this QEMU process are used to send interrupts
in this VM.
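
To illustrate the role of the stored eventfds, here is a sketch of how a
Doorbell write could be turned into a notification of the destination client.
It assumes Linux eventfds and the Doorbell layout given in the 'PCI device
registers' subsection (destination ID in the high 16 bits, vector in the low
16 bits). The struct peer type, the MAX_VECTORS limit and the function name
are hypothetical; QEMU's actual implementation differs.

```c
#include <stdint.h>
#include <unistd.h>

#define MAX_VECTORS 8   /* hypothetical limit chosen for this sketch */

/* eventfds received from the server for one peer, indexed by vector */
struct peer {
    int nvectors;
    int eventfds[MAX_VECTORS];
};

/*
 * Handle a guest write of `value` to the Doorbell register: the high 16 bits
 * select the peer, the low 16 bits select the vector. The peer is notified
 * by adding 1 to the counter of the matching eventfd.
 * Returns 0 on success, -1 if the peer or vector is unknown or the write
 * fails.
 */
int doorbell_write(const struct peer *peers, int npeers, uint32_t value)
{
    uint16_t dest = value >> 16;
    uint16_t vector = value & 0xffff;

    if (dest >= npeers || vector >= peers[dest].nvectors) {
        return -1;
    }

    uint64_t one = 1;
    ssize_t n = write(peers[dest].eventfds[vector], &one, sizeof(one));

    return n == (ssize_t)sizeof(one) ? 0 : -1;
}
```

On the receiving side, the process that owns the eventfd sees it become
readable and can then inject the interrupt into its own guest.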

*PCI device registers*

From the VM point of view, the ivshmem PCI device supports 4 registers of

enum ivshmem_registers {

The first two registers are the interrupt mask and status registers. Mask and
status are only used with pin-based interrupts. They are unused with MSI.

Status Register: The status register is set to 1 when an interrupt occurs.

Mask Register: The mask register is bitwise ANDed with the interrupt status
and the result will raise an interrupt if it is non-zero. However, since 1 is
the only value the status will be set to, it is only the first bit of the mask
that has any effect. Therefore interrupts can be masked by setting the first
bit to 0 and unmasked by setting the first bit to 1.
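
The mask and status rule above amounts to a single bitwise test. A minimal
sketch, with a hypothetical function name and both registers treated as
32-bit values (an assumption of this sketch):

```c
#include <stdbool.h>
#include <stdint.h>

/* A pin-based interrupt is raised when (status AND mask) is non-zero. */
bool ivshmem_irq_asserted(uint32_t status, uint32_t mask)
{
    return (status & mask) != 0;
}
```

Because the status is only ever set to 1, a mask such as 0xfffffffe masks the
interrupt just as 0 does.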

IVPosition Register: The IVPosition register is read-only and reports the
guest's ID number. The guest IDs are non-negative integers. When using the
server, since the server is a separate process, the VM ID will only be set when
the device is ready (shared memory is received from the server and accessible
via the device). If the device is not ready, the IVPosition will return -1.
Applications should ensure that they have a valid VM ID before accessing the
shared memory.
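
A guest-side check for a valid ID could look like the sketch below. It
assumes that IVPosition is read as a 32-bit value, so that -1 appears as
0xffffffff; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Interpret a raw 32-bit read of the IVPosition register.
 * Returns the guest ID (>= 0), or -1 while the device is not ready.
 */
int32_t ivshmem_guest_id(uint32_t raw)
{
    /* Values above INT32_MAX are negative when seen as a signed integer. */
    return raw > INT32_MAX ? -1 : (int32_t)raw;
}
```

An application would typically re-read the register until this returns a
non-negative value.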

Doorbell Register: To interrupt another guest, a guest must write to the
Doorbell register. The Doorbell register is 32 bits wide, logically divided
into two 16-bit fields. The high 16 bits are the guest ID to interrupt and the
low 16 bits are the interrupt vector to trigger. The semantics of the value
written to the Doorbell depend on whether the device is using MSI or a regular
pin-based interrupt. In short, MSI uses vectors while regular interrupts set
the status register.

If regular interrupts are used (due to either a guest not supporting MSI or the
user specifying not to use MSI on startup) then the value written to the lower
16 bits of the Doorbell register is arbitrary and will trigger an interrupt in
the destination guest.

Message Signalled Interrupts

An ivshmem device may support multiple MSI vectors. If so, the lower 16 bits
written to the Doorbell register must be between 0 and the maximum number of
vectors the guest supports. The lower 16 bits written to the Doorbell are the
MSI vector that will be raised in the destination guest. The number of MSI
vectors is configurable but it is set when the VM is started.
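
As a sketch of the guest side, the helper below builds the value to write to
the Doorbell register and rejects out-of-range vectors. It assumes that valid
vectors are numbered from 0 to nvectors - 1; the function name and the
nvectors parameter are hypothetical.

```c
#include <stdint.h>

/*
 * Compute the Doorbell value that raises `vector` in guest `peer_id`.
 * `nvectors` is the number of MSI vectors the destination supports.
 * Returns 0 and stores the value in *out, or -1 if the vector is out of range.
 */
int ivshmem_doorbell_value(uint16_t peer_id, uint16_t vector,
                           uint16_t nvectors, uint32_t *out)
{
    if (vector >= nvectors) {
        return -1;
    }

    /* guest ID in the high 16 bits, vector in the low 16 bits */
    *out = ((uint32_t)peer_id << 16) | vector;
    return 0;
}
```

For example, raising vector 1 in guest 3 gives the value 0x00030001.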

The important thing to remember with MSI is that it is only a signal, no status
is set (since MSI interrupts are not shared). All information other than the
interrupt itself should be communicated via the shared memory region. Devices
supporting multiple MSI vectors can use different vectors to indicate different
events have occurred. The semantics of interrupt vectors are left to the