.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD: src/share/man/man4/bpf.4,v 1.21.2.11 2002/04/07 04:57:13 dd Exp $
.\" $DragonFly: src/share/man/man4/bpf.4,v 1.5 2006/05/26 19:39:39 swildner Exp $
.\"
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol-independent
fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
.Nm .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and
can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily
truncated.
.Pp
The packet filter will support any link-level protocol that has fixed-length
headers.
Currently, only Ethernet,
.Tn SLIP ,
and
.Tn PPP
drivers have been modified to interact with
.Nm .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
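.Pp
As a minimal sketch of the byte-order advice above, the helpers below
copy a field out of a packet buffer and convert it to host order.
The names sk_get_be16 and sk_get_be32 are hypothetical and are not part of the
.Nm
interface; only ntohs and ntohl come from the byte-order macros.
Copying with memcpy first is a design choice of this sketch so that the
access is safe even when the field is not aligned.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 16-bit network-order field at p. */
uint16_t sk_get_be16(const unsigned char *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof v);    /* avoids an unaligned load */
    return ntohs(v);            /* network order -> host order */
}

/* Hypothetical helper: read a 32-bit network-order field at p. */
uint32_t sk_get_be32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

For an Ethernet frame, sk_get_be16(frame + 12) would yield the type field
in host order, assuming frame points at the start of the link-level header.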
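.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
The writes are unbuffered, meaning only one
packet can be processed per write.
Currently, only writes to Ethernets and
.Tn SLIP
links are supported.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
All commands require
these includes:
.Bd -literal
	#include <sys/types.h>
	#include <sys/time.h>
	#include <sys/ioctl.h>
	#include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
In addition to
.Dv FIONREAD
and
.Dv SIOCGIFADDR ,
the following commands may be applied to any open
.Nm
file.
The (third) argument to
.Xr ioctl 2
should be a pointer to the type indicated.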
.Bl -tag -width BIOCGRTIMEOUT
.It Dv BIOCGBLEN
.Pq Li u_int
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN
.Pq Li u_int
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface
with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT
.Pq Li u_int
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq Li DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an
appropriate filter.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF
.Pq Li "struct ifreq"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Li ifr_name
field of
the
.Li ifreq
structure.
All other fields are undefined.
.It Dv BIOCSETIF
.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Li ifr_name
field of the
.Li ifreq
structure.
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT
.It Dv BIOCGRTIMEOUT
.Pq Li "struct timeval"
Set or get the read timeout parameter.
The argument
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS
.Pq Li "struct bpf_stat"
Returns the following structure of packet statistics:
.Bd -literal
struct bpf_stat {
	u_int bs_recv;    /* number of packets received */
	u_int bs_drop;    /* number of packets dropped */
};
.Ed
.Pp
The fields are:
.Bl -hang -offset indent
.It Li bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
and
.It Li bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads aren't keeping up with the packet traffic).
.El
.It Dv BIOCIMMEDIATE
.Pq Li u_int
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF
.Pq Li "struct bpf_program"
Sets the filter program used by the kernel to discard uninteresting
packets.
An array of instructions and its length are passed in using
the following structure:
.Bd -literal
struct bpf_program {
	int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Li bf_insns
field while its length in units of
.Sq Li struct bpf_insn
is given by the
.Li bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
See section
.Sx "FILTER MACHINE"
for an explanation of the filter language.
.It Dv BIOCVERSION
.Pq Li "struct bpf_version"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version
numbers are compatible if the major numbers match and the application minor
is less than or equal to the kernel minor.
The kernel version number is
returned in the following structure:
.Bd -literal
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Fn ioctl
or haphazard packet matching).
.It Dv BIOCSHDRCMPLT
.It Dv BIOCGHDRCMPLT
.Pq Li u_int
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link-level source address should be filled in automatically
by the interface output routine.
Set to one if the link-level source
address will be written, as provided, to the wire.
This flag is initialized
to zero by default.
.It Dv BIOCSSEESENT
.It Dv BIOCGSEESENT
.Pq Li u_int
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming
packets on the interface.
Set to one to see packets originating
locally and remotely on the interface.
This flag is initialized to one by
default.
.El
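.Pp
The version compatibility rule described for the version ioctl above
(major numbers equal, application minor no greater than kernel minor)
can be sketched as a small predicate.
The function name sk_version_compatible is hypothetical; a real program
would pass in the compile-time version constants from the header and the
pair returned by the kernel.

```c
/*
 * Hypothetical helper: returns nonzero when a filter built against
 * (app_major, app_minor) may be installed on a kernel reporting
 * (kern_major, kern_minor), following the rule stated above.
 */
int sk_version_compatible(unsigned app_major, unsigned app_minor,
                          unsigned kern_major, unsigned kern_minor)
{
    return app_major == kern_major && app_minor <= kern_minor;
}
```

An application would typically refuse to install its filter when this
predicate is false, rather than risk haphazard packet matching.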
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal
struct bpf_hdr {
	struct timeval bh_tstamp;     /* time stamp */
	u_long bh_caplen;             /* length of captured portion */
	u_long bh_datalen;            /* original length of packet */
	u_short bh_hdrlen;            /* length of bpf header (this struct
					 plus alignment padding) */
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Pp
.Bl -tag -compact -width bh_datalen
.It Li bh_tstamp
The time at which the packet was processed by the packet filter.
.It Li bh_caplen
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Li bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Li bh_hdrlen
The length of the
.Nm
header, which may not be equal to
.\" XXX - not really a function call
.Fn sizeof "struct bpf_hdr" .
.El
.Pp
The
.Li bh_hdrlen
field exists to account for
padding between the header and the link-level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment-sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Vt bpf_hdr
and the network layer
header will be word aligned.
Suitable precautions
must be taken when accessing the link layer protocol fields on
alignment-restricted machines.
(This isn't a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.
This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate
this process.
It rounds up its argument
to the nearest word-aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
.Pp
For example, if
.Sq Li p
points to the start of a packet, this expression
will advance it to the next packet:
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
The
.Xr malloc 3
function
will always return an aligned buffer.
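.Pp
The packet-to-packet stepping described above can be sketched as a loop
over one read buffer.
Everything prefixed with sk_ or SK_ is a hypothetical stand-in so that the
sketch compiles without the system header: struct sk_hdr is a simplified
header (the time stamp is omitted), and SK_WORDALIGN assumes a word is
sizeof(long) bytes and uses the usual round-up idiom.
Real code would use the header structure and alignment macro described in
this section instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: a "word" is sizeof(long) bytes. */
#define SK_ALIGNMENT    (sizeof(long))
#define SK_WORDALIGN(x) (((x) + (SK_ALIGNMENT - 1)) & ~(SK_ALIGNMENT - 1))

/* Simplified stand-in for the per-packet header (no time stamp). */
struct sk_hdr {
    uint32_t caplen;    /* length of captured portion */
    uint32_t datalen;   /* original length of packet */
    uint16_t hdrlen;    /* header length including padding */
};

/* Count the complete packet records in the first nread bytes of buf. */
size_t sk_count_packets(const unsigned char *buf, size_t nread)
{
    size_t off = 0, count = 0;

    while (off + sizeof(struct sk_hdr) <= nread) {
        struct sk_hdr h;
        size_t rec;

        memcpy(&h, buf + off, sizeof h);
        if (h.hdrlen < sizeof h)
            break;                      /* malformed header */
        rec = (size_t)h.hdrlen + h.caplen;
        if (rec > nread - off)
            break;                      /* truncated record */
        count++;
        /* Packet data starts at buf + off + h.hdrlen. */
        off += SK_WORDALIGN(rec);       /* step to the next record */
    }
    return count;
}
```

The bounds checks are a choice of this sketch; they keep the walk inside
the buffer even if a record is damaged.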
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Em return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal
struct bpf_insn {
	u_short	code;
	u_char	jt;
	u_char	jf;
	u_long	k;
};
.Ed
.Pp
The
.Li k
field is used in different ways by different instructions,
and the
.Li jt
and
.Li jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and
operator bits are ORed into the class to give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
.Pp
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS
- 1.
.Li k ,
.Li jt ,
and
.Li jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Pp
.Bl -tag -width BPF_STXx
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the
source operand is specified by an
.Dq addressing mode
and can be a constant
.Pq Dv BPF_IMM ,
packet data at a fixed offset
.Pq Dv BPF_ABS ,
packet data at a variable offset
.Pq Dv BPF_IND ,
the packet length
.Pq Dv BPF_LEN ,
or a word in the scratch memory store
.Pq Dv BPF_MEM .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pq Dv BPF_W ,
halfword
.Pq Dv BPF_H ,
or byte
.Pq Dv BPF_B .
The semantics of all the recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width "BPF_LD+BPF_W+BPF_IND" -compact
.It Li BPF_LD+BPF_W+BPF_ABS
A <- P[k:4]
.It Li BPF_LD+BPF_H+BPF_ABS
A <- P[k:2]
.It Li BPF_LD+BPF_B+BPF_ABS
A <- P[k:1]
.It Li BPF_LD+BPF_W+BPF_IND
A <- P[X+k:4]
.It Li BPF_LD+BPF_H+BPF_IND
A <- P[X+k:2]
.It Li BPF_LD+BPF_B+BPF_IND
A <- P[X+k:1]
.It Li BPF_LD+BPF_W+BPF_LEN
A <- len
.It Li BPF_LD+BPF_IMM
A <- k
.It Li BPF_LD+BPF_MEM
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that
the addressing modes are more restrictive than those of the accumulator loads,
but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width "BPF_LDX+BPF_W+BPF_MEM" -compact
.It Li BPF_LDX+BPF_W+BPF_IMM
X <- k
.It Li BPF_LDX+BPF_W+BPF_MEM
X <- M[k]
.It Li BPF_LDX+BPF_W+BPF_LEN
X <- len
.It Li BPF_LDX+BPF_B+BPF_MSH
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Pp
.Bl -tag -width "BPF_ST" -compact
.It Li BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width "BPF_STX" -compact
.It Li BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Po
.Dv BPF_K
or
.Dv BPF_X
.Pc .
.Pp
.Bl -tag -width "BPF_ALU+BPF_MUL+BPF_K" -compact
.It Li BPF_ALU+BPF_ADD+BPF_K
A <- A + k
.It Li BPF_ALU+BPF_SUB+BPF_K
A <- A - k
.It Li BPF_ALU+BPF_MUL+BPF_K
A <- A * k
.It Li BPF_ALU+BPF_DIV+BPF_K
A <- A / k
.It Li BPF_ALU+BPF_AND+BPF_K
A <- A & k
.It Li BPF_ALU+BPF_OR+BPF_K
A <- A | k
.It Li BPF_ALU+BPF_LSH+BPF_K
A <- A << k
.It Li BPF_ALU+BPF_RSH+BPF_K
A <- A >> k
.It Li BPF_ALU+BPF_ADD+BPF_X
A <- A + X
.It Li BPF_ALU+BPF_SUB+BPF_X
A <- A - X
.It Li BPF_ALU+BPF_MUL+BPF_X
A <- A * X
.It Li BPF_ALU+BPF_DIV+BPF_X
A <- A / X
.It Li BPF_ALU+BPF_AND+BPF_X
A <- A & X
.It Li BPF_ALU+BPF_OR+BPF_X
A <- A | X
.It Li BPF_ALU+BPF_LSH+BPF_X
A <- A << X
.It Li BPF_ALU+BPF_RSH+BPF_X
A <- A >> X
.It Li BPF_ALU+BPF_NEG
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps
compare the accumulator against a constant
.Pq Dv BPF_K
or the index register
.Pq Dv BPF_X .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pq Dv BPF_JA
opcode uses the 32-bit
.Li k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width "BPF_JMP+BPF_KSET+BPF_X" -compact
.It Li BPF_JMP+BPF_JA
pc += k
.It Li BPF_JMP+BPF_JGT+BPF_K
pc += (A > k) ? jt : jf
.It Li BPF_JMP+BPF_JGE+BPF_K
pc += (A >= k) ? jt : jf
.It Li BPF_JMP+BPF_JEQ+BPF_K
pc += (A == k) ? jt : jf
.It Li BPF_JMP+BPF_JSET+BPF_K
pc += (A & k) ? jt : jf
.It Li BPF_JMP+BPF_JGT+BPF_X
pc += (A > X) ? jt : jf
.It Li BPF_JMP+BPF_JGE+BPF_X
pc += (A >= X) ? jt : jf
.It Li BPF_JMP+BPF_JEQ+BPF_X
pc += (A == X) ? jt : jf
.It Li BPF_JMP+BPF_JSET+BPF_X
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return
value of zero indicates that the packet should be ignored.
The return value is either a constant
.Pq Dv BPF_K
or the accumulator
.Pq Dv BPF_A .
.Pp
.Bl -tag -width "BPF_RET+BPF_K" -compact
.It Li BPF_RET+BPF_A
accept A bytes
.It Li BPF_RET+BPF_K
accept k bytes
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't
fit into the above classes, and for any new instructions that might need to
be added.
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Pp
.Bl -tag -width "BPF_MISC+BPF_TAX" -compact
.It Li BPF_MISC+BPF_TAX
X <- A
.It Li BPF_MISC+BPF_TXA
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate
array initializers:
.Fn BPF_STMT opcode operand
and
.Fn BPF_JUMP opcode operand true_offset false_offset .
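.Pp
To make the pseudo-machine semantics above concrete, here is a small
user-space interpreter for a handful of instructions.
It is only a sketch and not the kernel implementation.
The SK_ opcodes are hypothetical flat values rather than the class, size,
and mode bits from the header, struct sk_insn and the SK_STMT and SK_JUMP
initializers merely imitate the real structure and macros, and rejecting a
packet on an out-of-range load is an assumption of this sketch.
Note that jump offsets are applied after the program counter has already
moved past the jump instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, valid for this sketch only. */
enum {
    SK_LD_H_ABS,    /* A <- P[k:2] */
    SK_LD_B_ABS,    /* A <- P[k:1] */
    SK_LD_H_IND,    /* A <- P[X+k:2] */
    SK_LDX_MSH,     /* X <- 4*(P[k:1]&0xf) */
    SK_JEQ_K,       /* pc += (A == k) ? jt : jf */
    SK_JSET_K,      /* pc += (A & k) ? jt : jf */
    SK_RET_K        /* accept k bytes */
};

/* Stand-in for the instruction format, using fixed-width types. */
struct sk_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Initializer helpers modelled on the statement and jump macros. */
#define SK_STMT(code, k)         { (code), 0, 0, (k) }
#define SK_JUMP(code, k, jt, jf) { (code), (jt), (jf), (k) }

/* Load n (1 or 2) bytes at off in network order; 0 if out of range. */
static int sk_load(const uint8_t *p, uint32_t len, uint32_t off,
                   uint32_t n, uint32_t *out)
{
    if (off > len || len - off < n)
        return 0;
    *out = (n == 2) ? ((uint32_t)p[off] << 8 | p[off + 1]) : p[off];
    return 1;
}

/* Returns the number of bytes to accept, or 0 to ignore the packet. */
uint32_t sk_filter(const struct sk_insn *prog, size_t ninsns,
                   const uint8_t *p, uint32_t len)
{
    uint32_t A = 0, X = 0;
    size_t pc = 0;

    while (pc < ninsns) {
        const struct sk_insn *in = &prog[pc++];

        switch (in->code) {
        case SK_LD_H_ABS:
            if (!sk_load(p, len, in->k, 2, &A)) return 0;
            break;
        case SK_LD_B_ABS:
            if (!sk_load(p, len, in->k, 1, &A)) return 0;
            break;
        case SK_LD_H_IND:
            if (!sk_load(p, len, X + in->k, 2, &A)) return 0;
            break;
        case SK_LDX_MSH:
            if (!sk_load(p, len, in->k, 1, &X)) return 0;
            X = 4 * (X & 0xf);
            break;
        case SK_JEQ_K:
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case SK_JSET_K:
            pc += (A & in->k) ? in->jt : in->jf;
            break;
        case SK_RET_K:
            return in->k;
        default:
            return 0;       /* unknown opcode: reject */
        }
    }
    return 0;               /* no return instruction reached: reject */
}
```

A four-instruction program that loads the halfword at offset 12, compares
it with 0x0800, and returns either a large constant or zero is enough to
exercise the load, jump, and return paths of this sketch.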
.Sh FILES
.Bl -tag -compact -width /dev/bpfXXX
.It Pa /dev/bpf Ns Sy n
the packet filter device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.
It accepts
only Reverse ARP requests.
.Bd -literal
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	/* A <- ARP operation field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	/* accept the Ethernet header plus the ARP packet */
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
		 sizeof(struct ether_header)),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	/* A <- IP source address */
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	/* A <- IP destination address */
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	/* A still holds the source address here */
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	/* accept the whole packet */
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
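.Pp
The constants 0x8003700f and 0x80037023 in the filter above are the two
host addresses written as 32-bit values, which is how a word load from
the packet presents them in the accumulator.
The helper below, whose name sk_ipv4 is hypothetical, shows the
arithmetic for a dotted-quad address a.b.c.d.

```c
#include <stdint.h>

/* Hypothetical helper: pack a.b.c.d into the value a word load yields. */
uint32_t sk_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
}
```

With this helper, sk_ipv4(128, 3, 112, 15) evaluates to 0x8003700f and
sk_ipv4(128, 3, 112, 35) to 0x80037023.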
.Pp
Finally, this filter returns only TCP finger packets.
We must parse
the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction
checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	/* A <- IP protocol field */
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	/* A <- IP flags and fragment offset; reject fragments */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	/* X <- IP header length in bytes */
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	/* A <- TCP source port */
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	/* A <- TCP destination port */
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	/* accept the whole packet */
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
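.Pp
The index-register load at offset 14 in the filter above takes the low
four bits of the first IP header byte and multiplies them by four, which
gives the IP header length in bytes.
The indexed loads then find the TCP ports at X+14 and X+16.
The sketch below reproduces just that arithmetic in plain C.
The names sk_msh and sk_has_port are hypothetical, a 14-byte Ethernet
header is assumed, and unlike the filter this helper does not check the
Ethernet type, the IP protocol, or the fragment offset.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_HDRLEN 14      /* assumed link-level header length */

/* X <- 4*(P[k:1]&0xf): IP header length when k is the IP header offset. */
uint32_t sk_msh(const uint8_t *pkt, size_t k)
{
    return 4u * (pkt[k] & 0x0fu);
}

/* Nonzero if the TCP source or destination port equals port. */
int sk_has_port(const uint8_t *pkt, size_t len, uint16_t port)
{
    size_t x, src, dst;
    uint16_t sp, dp;

    if (len <= SK_ETHER_HDRLEN)
        return 0;
    x = sk_msh(pkt, SK_ETHER_HDRLEN);
    src = x + 14;               /* P[X+14:2] */
    dst = x + 16;               /* P[X+16:2] */
    if (dst + 2 > len)
        return 0;               /* ports lie beyond the captured data */
    sp = (uint16_t)(pkt[src] << 8 | pkt[src + 1]);
    dp = (uint16_t)(pkt[dst] << 8 | pkt[dst + 1]);
    return sp == port || dp == port;
}
```

For a first IP header byte of 0x45 the header length is 20 bytes, so the
ports are read from offsets 34 and 36 of the frame.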
.Sh SEE ALSO
.Xr tcpdump 1 ,
.Xr ioctl 2 ,
.Xr byteorder 3 ,
.Xr ng_bpf 4
.Rs
.%A McCanne, S.
.%A Jacobson V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Jeffrey Mogul, at
Stanford, ported the code to
.Bx
and continued its development from
1983 on.
Since then, it has evolved into the Ultrix Packet Filter
at
.Tn DEC ,
a
.Tn STREAMS
.Tn NIT
module under
.Tn SunOS 4.1 ,
and
.Tn BPF .
.Sh AUTHORS
.An -nosplit
.An Steven McCanne ,
of Lawrence Berkeley Laboratory, implemented BPF in
Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel
with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must utilize a filter to reject foreign packets.
.Pp
Data link protocols with variable-length headers are not currently supported.