(derived from ftp://ftp.tik.ee.ethz.ch/pub/students/2011-FS/MA-2011-01.pdf)

For performance evaluation and debugging purposes, we have implemented a fast
zero-copy traffic generator named trafgen. trafgen uses the PF_PACKET
(packet(7)) socket interface of Linux, which hands complete control over
packet data and packet headers to user space. Since Linux 2.6.31, the mainline
kernel has included a PF_PACKET extension known as the zero-copy TX_RING [4].

TX_RING is a ring buffer in virtual memory that is mapped directly into both
address spaces (figure D.1). Thus, kernel space and user space can access this
buffer without performing system calls or additional context switches and
without copying buffers between address spaces. The TX_RING buffer is
configurable in size, and each ring buffer slot has a header with control
information such as a status flag. The status flag indicates the current usage
of the slot, so that (i) the kernel knows whether the slot is ready for
transmission and (ii) user space knows whether the slot can be filled with a
new packet.

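The status handshake described above can be sketched in C as follows. This is
a minimal illustration, not trafgen's actual code: it assumes the TPACKET_V1
slot layout from <linux/if_packet.h>, and the helper name tx_slot_try_fill and
the macro TX_DATA_OFFSET are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Offset of the packet data inside a TPACKET_V1 TX slot (assumed layout). */
#define TX_DATA_OFFSET (TPACKET_HDRLEN - sizeof(struct sockaddr_ll))

/*
 * Hypothetical helper: try to place one packet into a TX_RING slot.
 * Returns 0 if the slot was marked ready for the kernel, -1 if the slot
 * is not available to user space or the packet does not fit.
 */
int tx_slot_try_fill(void *slot, size_t slot_size,
                     const void *pkt, size_t len)
{
    struct tpacket_hdr *hdr = slot;

    assert(slot != NULL && pkt != NULL);
    if (slot_size < TX_DATA_OFFSET || len > slot_size - TX_DATA_OFFSET)
        return -1;

    /* (ii) user space may only fill a slot the kernel has released */
    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;

    memcpy((unsigned char *)slot + TX_DATA_OFFSET, pkt, len);
    hdr->tp_len = (unsigned int)len;

    /* make the payload visible before the status flag changes */
    __sync_synchronize();

    /* (i) tell the kernel that this slot is ready for transmission */
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

A second call on the same slot returns -1 until the kernel has transmitted the
frame and reset the status to TP_STATUS_AVAILABLE.
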
When the kernel is triggered to process the TX_RING data, it allocates a new
socket buffer structure for each filled ring slot, sets the TX_RING pages of
the current slot as data fragments, and finally calls dev_queue_xmit for
transmission (section 3.1.2).

To use the TX_RING at high packet rates, network device drivers should have
NAPI (section 3.1.1) enabled to perform interrupt load mitigation. In trafgen,
a real-time timer calls sendto(2) every 10 microseconds (the default, which
can be changed via a command line option) in order to trigger the kernel to
process the frames in the TX_RING.

Via a command line option, trafgen can also be bound to a specific CPU. This
avoids the overhead of process and cache-line migration that would occur if
the Linux process scheduler decided to migrate trafgen to a different CPU.
Further, if trafgen is bound to a specific CPU, it automatically moves the
NIC’s interrupt affinity to that CPU as well. This avoids cache-line migration
to the NIC’s interrupt CPU and hence keeps data CPU-local.

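As an illustration of CPU binding, the following sketch pins the calling
thread with sched_setaffinity(2) and formats the hexadecimal CPU mask that
Linux accepts in /proc/irq/<n>/smp_affinity. It is not taken from trafgen; the
function names are hypothetical, and actually writing the mask to procfs
(which requires root and the NIC's IRQ number) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Return the lowest CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) < 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin the calling thread to exactly one CPU. Returns 0 on success. */
int bind_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Format the hex CPU mask for /proc/irq/<n>/smp_affinity.
 * Only CPUs 0..31 are handled in this sketch.
 */
int irq_affinity_mask(int cpu, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    if (cpu < 0 || cpu >= 32)
        return -1;
    n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

For CPU 2, irq_affinity_mask produces the string "4", which is the value one
would write to the NIC's smp_affinity file to steer its interrupts to the
same CPU.
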
The TX_RING size can also be configured via a command line option, with values
ranging from megabytes to gigabytes. Furthermore, trafgen makes use of our
own assembler-optimized memcpy for x86/x86_64 architectures with MMX
registers in order to speed up copying the generated packet template into the
TX_RING.

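How a requested ring size maps onto the ring geometry can be sketched as
follows. The sketch fills a struct tpacket_req from <linux/if_packet.h> and,
as a simplifying assumption, uses one memory page per block; trafgen's own
layout policy may differ, and the function name tx_ring_layout is
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <unistd.h>

/*
 * Hypothetical helper: derive a TX_RING layout for roughly ring_bytes of
 * memory. Assumes one page per block. Returns 0 on success, -1 if the
 * parameters cannot form a valid ring.
 */
int tx_ring_layout(size_t ring_bytes, unsigned int frame_size,
                   struct tpacket_req *req)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t blocks, frames_per_block;

    assert(req != NULL);
    if (page <= 0)
        return -1;
    /* a frame must hold the slot header and fit into one block */
    if (frame_size <= TPACKET_HDRLEN || frame_size > (unsigned long)page)
        return -1;
    if (frame_size % TPACKET_ALIGNMENT != 0)
        return -1;

    frames_per_block = (size_t)page / frame_size;
    blocks = ring_bytes / (size_t)page;
    if (blocks == 0 || blocks > UINT_MAX / frames_per_block)
        return -1;

    req->tp_block_size = (unsigned int)page;
    req->tp_block_nr = (unsigned int)blocks;
    req->tp_frame_size = frame_size;
    /* the kernel requires frame_nr == frames_per_block * block_nr */
    req->tp_frame_nr = (unsigned int)(frames_per_block * blocks);
    return 0;
}
```

The resulting request would then be passed to setsockopt(2) with SOL_PACKET
and PACKET_TX_RING, followed by mmap(2) over tp_block_size * tp_block_nr
bytes.
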
By exploiting the TX_RING for transmission, trafgen generated small packets at
approx. 1.25 million pps on an Intel Core 2 Quad CPU with 2.40 GHz, 4 GB RAM
and an Intel 82566DC-2 Gigabit Ethernet card (figure D.2). trafgen was bound
to a single CPU and trafgen’s interrupt migration was activated, so NIC
interrupts were received on the same CPU to which trafgen was bound. An
identical machine was used for packet reception; both machines were directly
connected, and ifpps (section D.2) was used on the receive side for
measurement. Since we have already published the source code of trafgen, other
users have run further benchmarks with trafgen on their hardware. We found
that the results depend heavily on the Gigabit Ethernet adapter used. For
instance, Ronald W. Henderson wrote a wiki article [115] about trafgen in
which he reached the physical line rate of 1.488 million pps with 64-byte
packets.

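The line-rate figure follows from simple arithmetic: on the wire, every
Ethernet frame is accompanied by a 7-byte preamble, a 1-byte start-of-frame
delimiter and a 12-byte inter-frame gap. A small sketch of the calculation
(the function name line_rate_pps is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD 20u

/* Maximum frames per second on a link, for frames of frame_bytes each. */
uint64_t line_rate_pps(uint64_t link_bps, unsigned int frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u);
}
```

For Gigabit Ethernet and 64-byte frames this gives 10^9 / (84 * 8) = 1488095
pps, so the 1.25 million pps measured above corresponds to about 84 percent of
line rate.
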
With our test setup, we compared trafgen with two other packet generators,
namely mausezahn [116] and pktgen [117]. mausezahn is a fast user space packet
generator that uses libnet [118], a framework for low-level network packet
construction. The second traffic generator, pktgen, is part of the Linux
mainline kernel and resides in the core of the networking subsystem. In
contrast to trafgen and mausezahn, pktgen must be configured via procfs.
pktgen’s configuration options are limited to basic protocols like IPv4 or
IPv6. As a transport layer protocol, only UDP is supported, and the packet
payload cannot be configured at all. Figure D.2 shows that even for small
packets, the kernel space pktgen is able to transmit up to 1.38 million pps.

The kernel source code shows that pktgen avoids one packet copy in comparison
to trafgen. In the case of trafgen, the kernel does not copy the TX_RING slot
data to skb->data, but sets the data pages as socket buffer fragments. Hence,
in dev_hard_start_xmit the buffer might in some cases need to be linearized
through __skb_linearize (section 3.1.2) into DMA-capable memory. pktgen, by
contrast, can directly allocate an already linearized and DMA-capable buffer,
which may be the cause of trafgen’s performance penalty.

However, trafgen is still up to 40 percent faster than mausezahn and, in
contrast to pktgen, offers more degrees of freedom regarding packet
configuration. For larger packet sizes, all three traffic generators show
similar pps performance in the test setup. We assume that this is mainly due
to hardware and bandwidth limitations of the underlying system. On better
equipped systems with e.g. 10 Gigabit Ethernet, we assume that the order of
performance of the benchmarked tools looks similar to the part with smaller
packet sizes.

Besides the TX_RING, trafgen has a second working mode that allows the
definition of inter-packet departure times, which is mainly used for debugging
purposes in LANA. This mode incurs system calls and copies of packet buffers
for transmission, since inter-packet departure times are not supported by the
TX_RING. It is also realized using PF_PACKET sockets, but instead of
allocating a TX_RING, packets are transmitted directly with sendto(2).

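A paced transmit loop of this kind might be sketched as follows. The function
send_paced is hypothetical and deliberately generic: it works on any datagram
socket, whereas for a PF_PACKET socket the destination would be a struct
sockaddr_ll that names the outgoing interface. The use of nanosleep(2) for the
gap is an assumption; trafgen may pace packets differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: send the same packet count times with sendto(2),
 * sleeping gap_usec microseconds between consecutive packets.
 * Returns the number of packets that were sent successfully.
 */
int send_paced(int fd, const void *pkt, size_t len,
               const struct sockaddr *dst, socklen_t dst_len,
               unsigned int count, long gap_usec)
{
    struct timespec gap;
    unsigned int sent;

    assert(pkt != NULL && gap_usec >= 0);
    gap.tv_sec = gap_usec / 1000000;
    gap.tv_nsec = (gap_usec % 1000000) * 1000;

    for (sent = 0; sent < count; sent++) {
        /* one system call and one buffer copy per packet */
        if (sendto(fd, pkt, len, 0, dst, dst_len) < 0)
            break;
        /* inter-packet departure time; no sleep after the last packet */
        if (sent + 1 < count)
            nanosleep(&gap, NULL);
    }
    return (int)sent;
}
```

In contrast to the TX_RING path, every packet here costs a system call and a
copy into the kernel, which is the price for controlling the departure time.
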
Furthermore, trafgen provides its own packet configuration language. With it,
multiple packets can be defined in a single packet configuration file, where
packet headers and packet payload are specified byte-wise. Such a packet
configuration can contain elements like counters or random number generators,
so that e.g. bytes of a source MAC address can be randomized or incremented.

trafgen is published under the GNU GPL version 2 and has been added to the
netsniff-ng toolkit [119]. The netsniff-ng toolkit also ships an example packet
configuration file that can be used for high-speed transmissions:

  trafgen --dev eth0 --conf trafgen2.txf --bind 0

See src/examples/trafgen for trafgen configuration examples! A simple txf file

# A more simple example for trafgen
  0x10, 0x01, 0x3b, 0xba, 0x22, 0x0f,
  0xa0, 0x23, 0xfc, 0xaa, 0x01, 0x3a,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,