.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.\" $DragonFly: src/share/man/man4/tcp.4,v 1.9 2008/10/17 11:30:24 swildner Exp $
.Nd Internet Transmission Control Protocol
.Fn socket AF_INET SOCK_STREAM 0
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
Thus, each address is composed
of an Internet address specifying the host and network, with
port on the host identifying the peer entity.
Sockets utilizing the TCP protocol are either
Active sockets initiate connections to passive
sockets are created active; to create a
system call must be used
after binding the socket with the
passive sockets may use the
call to accept incoming connections. Only active sockets may
call to initiate connections.
their location to match
incoming connection requests from multiple networks. This
.Dq wildcard addressing ,
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location. The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally,
this address corresponds to the peer entity's network.
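.Pp
As an illustration of wildcard addressing, the following minimal sketch
creates a passive socket that listens on all networks and reports the port
the system assigned.
The helper name open_wildcard_listener, the backlog of 5, and the error
handling are assumptions made for this example only.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a passive TCP socket bound to the wildcard address.
 * port is in host byte order; 0 asks the system to assign a port.
 * On success the listening descriptor is returned and *assigned holds
 * the bound port in host byte order; on failure -1 is returned.
 * (open_wildcard_listener is a hypothetical helper name.)
 */
int
open_wildcard_listener(unsigned short port, unsigned short *assigned)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	assert(assigned != NULL);

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(port);

	/* bind, then listen to make the socket passive, then read the port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*assigned = ntohs(sin.sin_port);
	return (s);
}
```

A caller would typically pass the returned descriptor to accept(2) in a loop
and close it when the service shuts down.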
supports a number of socket options which can be set with
.Bl -tag -width TCP_NODELAYx
Under most circumstances,
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
defeats this algorithm.
By default, a sender\- and receiver-TCP
will negotiate between themselves to determine the maximum segment size
to be used for each connection. The
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
usually sends a number of options in each packet, corresponding to
extensions which are provided in this implementation. The boolean
is provided to disable
option use on a per-connection basis.
By convention, the sender-TCP
bit and begin transmission immediately (if permitted) at the end of
option is set to a non-zero value,
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_SIGNATURE_ENABLE
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
One common use for this is to enable DragonFlyBSD-based routers
to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
connection cannot be established within a period of time,
will time out the connection attempt.
option specifies the number of milliseconds to wait
before the connection attempt times out.
The default value for
is tcp.keepinit milliseconds.
For the accepted sockets, the
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
specifies the number of milliseconds before
will send the initial keepalive probe.
The default value for
is tcp.keepidle milliseconds.
For the accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the number of milliseconds to wait
before retransmitting a keepalive probe.
The default value for
is tcp.keepintvl milliseconds.
For the accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the maximum number of keepalive
probes to be sent before dropping the connection.
The default value for
is tcp.keepcnt.
For the accepted sockets,
option value is inherited from the listening socket.
The option level for the
call is the protocol number for
.Xr getprotobyname 3 ,
All options are declared in
transport level may be used with
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
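.Pp
The socket options described above could be applied as in the following
sketch, which turns off the small-packet gathering algorithm with
TCP_NODELAY and limits the number of keepalive probes with TCP_KEEPCNT.
The helper name tune_interactive_socket is hypothetical, and the use of
SO_KEEPALIVE to turn keepalive probing on for the socket is an assumption
of this example rather than something this section states.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Disable small-packet gathering on socket s and cap the number of
 * keepalive probes at maxprobes.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 * (tune_interactive_socket is a hypothetical helper name.)
 */
int
tune_interactive_socket(int s, int maxprobes)
{
	int on = 1;

	assert(maxprobes > 0);

	/* IPPROTO_TCP is the protocol number used as the option level. */
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1)
		return (-1);

	/* Assumption: request keepalive probing on this socket. */
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);

	/* Maximum number of probes before the connection is dropped. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &maxprobes,
	    sizeof(maxprobes)) == -1)
		return (-1);

	return (0);
}
```

Setting the options on a listening socket before accepting connections
lets accepted sockets inherit the keepalive values, as described above.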
protocol implements a number of variables in the
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
Implement the window scaling and timestamp options of RFC 1323
.It Dv TCPCTL_MSSDFLT
The default value used for the maximum segment size
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
Maximum TCP receive window.
Log any connection attempts to ports where no socket is
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
A value of 2 logs any TCP packet sent to a closed port.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
The Maximum Segment Lifetime for a packet.
Timeout for new, non-established TCP connections.
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
The interval between keepalive probes sent to remote machines.
(default 8) probes are sent, with no response, the connection is dropped.
The maximum number of keepalive probes to be sent
before dropping the connection.
.It tcp.always_keepalive
connections, the kernel will
periodically send a packet to the remote host to verify the connection
unreachable messages may abort connections in
reassembly queue if the system is low on mbufs.
If enabled, no RST is sent when a connection is attempted
to a port where no socket is accepting connections.
Delay ACK to try and piggyback it onto a data packet.
Maximum amount of time before a delayed ACK is sent.
Enable TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring large amounts of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
control-block hashtable
This may be tuned using the kernel option
.Va net.inet.tcp.tcbhashsize
Number of active protocol control blocks
Determines whether or not SYN cookies should be generated for
outbound SYN-ACK packets. SYN cookies are a great help during
SYN flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP. The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum. While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason, we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging. 6000-16000 is more typical
in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
This value stabilizes the bwnd (write window) calculation at high speeds
by increasing the bandwidth calculation in 1/10% increments. The default
value of 50 represents a +5% increase. In addition, bwnd is further increased
by a fixed 2*maxseg bytes to stabilize the algorithm at low speeds.
Changing the stab value is not recommended, but you may come across
situations where tuning is beneficial.
However, our recommendation for tuning is to stick with only adjusting
Reducing tcp.inflight_stab too much can lead to upwards of a 20%
underutilization of the link and prevent the algorithm from properly adapting
to changing situations. Increasing tcp.inflight_stab too much can lead to
an excessive packet buffering situation.
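.Pp
To make the interaction of the inflight variables concrete, the following
sketch models the window calculation as the text above describes it: the
bandwidth estimate is raised in 1/10% steps, multiplied by the delay,
increased by 2*maxseg, and then held between the lower and upper bounds.
This is an illustrative reading of the description, not the kernel's code;
the function name, parameter names, and units are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model only: estimate the bandwidth delay product window
 * (bwnd) in bytes.
 *   bw      estimated bandwidth, bytes per second
 *   rtt_ms  round trip time, milliseconds
 *   stab    stabilizer in 1/10% units (50 means +5%)
 *   maxseg  maximum segment size, bytes
 *   lo, hi  lower and upper bounds on the window, bytes
 * All names and units here are hypothetical.
 */
int64_t
inflight_bwnd(int64_t bw, int64_t rtt_ms, int stab, int maxseg,
    int64_t lo, int64_t hi)
{
	int64_t bwnd;

	assert(bw >= 0 && rtt_ms >= 0 && stab >= 0);
	assert(maxseg > 0 && lo <= hi);

	bw += bw * stab / 1000;		/* stab of 50 adds 5% */
	bwnd = bw * rtt_ms / 1000;	/* bandwidth times delay */
	bwnd += 2 * (int64_t)maxseg;	/* fixed low-speed stabilizer */

	if (bwnd < lo)			/* lower bound on the window */
		bwnd = lo;
	if (bwnd > hi)			/* upper bound on the window */
		bwnd = hi;
	return (bwnd);
}
```

With these assumptions, a 1,000,000 byte per second path with a 100 ms round
trip time, a stab of 50, and a 1460 byte maxseg yields 105,000 bytes for the
scaled product plus 2920 bytes, for a window of 107,920 bytes.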
A socket operation may fail with one of the following errors returned:
when trying to establish a connection on a socket which
when the system runs out of memory for
an internal data structure;
when a connection was dropped
due to excessive retransmissions;
forces the connection to be closed;
.It Bq Er ECONNREFUSED
peer actively refuses connection establishment (usually because
no process is listening to the port);
is made to create a socket with a port which has already been
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
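.Pp
A caller might distinguish these errors as in the following sketch, which
attempts a connection and hands the errno value of a failed call back to
the caller.
The helper name try_connect and its calling convention are assumptions
made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to connect to addr:port (both in network byte order).
 * Returns a connected descriptor, or -1 with *err set to the errno
 * value reported by the failed call.
 * (try_connect is a hypothetical helper name.)
 */
int
try_connect(in_addr_t addr, in_port_t port, int *err)
{
	struct sockaddr_in sin;
	int s;

	assert(err != NULL);
	*err = 0;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1) {
		*err = errno;
		return (-1);
	}

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = addr;
	sin.sin_port = port;

	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		*err = errno;	/* saved before close() can change errno */
		close(s);
		return (-1);
	}
	return (s);
}
```

A caller could then retry on one error value and give up on another, for
example by treating ECONNREFUSED as a sign that no process is listening.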
.%T "TCP Extensions for High Performance"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
The RFC 1323 extensions for window scaling and timestamps were added