.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.Nd Internet Transmission Control Protocol
.Fn socket AF_INET SOCK_STREAM 0
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
Thus, each address is composed
of an Internet address specifying the host and network, with
port on the host identifying the peer entity.
Sockets utilizing the TCP protocol are either
Active sockets initiate connections to passive
sockets are created active; to create a
system call must be used
after binding the socket with the
passive sockets may use the
call to accept incoming connections. Only active sockets may
call to initiate connections.
their location to match
incoming connection requests from multiple networks. This
.Dq wildcard addressing ,
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location. The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally
this address corresponds to the peer entity's network.
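.Pp
As a hedged illustration of the wildcard addressing described above,
the following C sketch creates a passive socket bound to INADDR_ANY
and lets the system assign the port when the caller passes 0.
The helper name wildcard_listener and the backlog of 5 are assumptions
made for this example only; they are not part of the TCP interface.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive TCP socket bound to the wildcard address.
 * If *port is 0 the system assigns a port.  On success the port
 * actually in use is written back to *port (host byte order) and
 * the listening descriptor is returned; on failure -1 is returned.
 * wildcard_listener is a hypothetical helper name.
 */
int
wildcard_listener(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* listen on all networks */
	sin.sin_port = htons(*port);			/* 0: system picks a port */

	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*port = ntohs(sin.sin_port);
	return (s);
}
```

A caller would normally hand the returned descriptor to accept(2);
the local address of each accepted connection is then fixed as
described above.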
supports a number of socket options which can be set with
.Bl -tag -width TCP_NODELAYx
Under most circumstances,
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
defeats this algorithm.
By default, a sender\- and receiver-TCP
will negotiate between themselves to determine the maximum segment size
to be used for each connection. The
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
usually sends a number of options in each packet, corresponding to
extensions which are provided in this implementation. The boolean
is provided to disable
option use on a per-connection basis.
By convention, the sender-TCP
bit and begin transmission immediately (if permitted) at the end of
option is set to a non-zero value,
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.\".It Dv TCP_SIGNATURE_ENABLE
.\"This option enables the use of MD5 digests (also known as TCP-MD5)
.\"on writes to the specified socket.
.\"In the current release, only outgoing traffic is digested;
.\"digests on incoming traffic are not verified.
.\"The current default behavior for the system is to respond to a system
.\"advertising this option with TCP-MD5; this may change.
.\"One common use for this in a DragonFlyBSD router deployment is to enable
.\"based routers to interwork with Cisco equipment at peering points.
.\"Support for this feature conforms to RFC 2385.
.\"Only IPv4 (AF_INET) sessions are supported.
.\"In order for this option to function correctly, it is necessary for the
.\"administrator to add a tcp-md5 key entry to the system's security
.\"associations database (SADB) using the
.\"This entry must have an SPI of 0x1000 and can therefore only be specified
.\"on a per-host basis at this time.
.\"If an SADB entry cannot be found for the destination, the outgoing traffic
.\"will have an invalid digest option prepended, and the following error message
.\"will be visible on the system console:
.\".Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
connection cannot be established within a period of time,
will time out the connection attempt.
option specifies the number of milliseconds to wait
before the connection attempt times out.
The default value for
is tcp.keepinit milliseconds.
For accepted sockets, the
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
specifies the number of milliseconds before
will send the initial keepalive probe.
The default value for
is tcp.keepidle milliseconds.
For accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the number of milliseconds to wait
before retransmitting a keepalive probe.
The default value for
is tcp.keepintvl milliseconds.
For accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the maximum number of keepalive
probes to be sent before dropping the connection.
The default value for
is tcp.keepcnt.
For accepted sockets,
option value is inherited from the listening socket.
The option level for the
call is the protocol number for
.Xr getprotobyname 3 ,
All options are declared in
transport level may be used with
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
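.Pp
The socket option interface described above can be sketched in C as follows.
This is a minimal example under stated assumptions: it uses the IPPROTO_TCP
constant from <netinet/in.h> as the option level in place of a
getprotobyname(3) lookup, and the helper names set_nodelay and get_nodelay
are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable (on != 0) or disable (on == 0) TCP_NODELAY on socket s.
 * Returns 0 on success, -1 on error with errno set by setsockopt(2).
 * set_nodelay is a hypothetical helper name.
 */
int
set_nodelay(int s, int on)
{
	return (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/*
 * Report whether TCP_NODELAY is set on socket s.
 * Returns 1 if set, 0 if clear, -1 on error.
 * get_nodelay is a hypothetical helper name.
 */
int
get_nodelay(int s)
{
	int on = 0;
	socklen_t len = sizeof(on);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, &len) == -1)
		return (-1);
	return (on != 0);
}
```

The same pattern applies to the other options in the list; only the option
name and the meaning of the integer value change.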
protocol implements a number of variables in the
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
Implement the window scaling and timestamp options of RFC 1323
.It Dv TCPCTL_MSSDFLT
The default value used for the maximum segment size
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
Maximum TCP receive window.
Log any connection attempts to ports where no socket is
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
A value of 2 results in any TCP packets to closed ports being logged.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
The Maximum Segment Lifetime for a packet.
Timeout for new, non-established TCP connections.
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
The interval between keepalive probes sent to remote machines.
(default 8) probes are sent, with no response, the connection is dropped.
The maximum number of keepalive probes to be sent
before dropping the connection.
.It tcp.always_keepalive
connections, the kernel will
periodically send a packet to the remote host to verify the connection
unreachable messages may abort connections in
reassembly queue if the system is low on mbufs.
If enabled, disable sending of RST when a connection is attempted
to a port where no socket is accepting connections.
Delay ACK to try to piggyback it onto a data packet.
Maximum amount of time before a delayed ACK is sent.
Enable TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring lots of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
control-block hashtable
This may be tuned using the kernel option
.Va net.inet.tcp.tcbhashsize
Number of active process control blocks
Determines whether SYN cookies should be generated for
outbound SYN-ACK packets. SYN cookies are a great help during
SYN flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP. The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum. While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging. Values of 6000-16000 are more
typical in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
This value stabilizes the bwnd (write window) calculation at high speeds
by increasing the bandwidth calculation in 1/10% increments. The default
value of 50 represents a +5% increase. In addition, bwnd is further increased
by a fixed 2*maxseg bytes to stabilize the algorithm at low speeds.
Changing the stab value is not recommended, but you may come across
situations where tuning is beneficial.
However, our recommendation for tuning is to stick with only adjusting
Reducing tcp.inflight_stab too much can lead to upwards of a 20%
underutilization of the link and prevent the algorithm from properly adapting
to changing situations. Increasing tcp.inflight_stab too much can lead to
an excessive packet buffering situation.
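.Pp
The arithmetic behind the rexmit_{min,slop} recommendation in the list above
can be checked with a small C sketch.
It assumes a deliberately simplified model in which the timeout is the raw
calculation plus the slop, clamped from below by the minimum.
This model and the function name rexmit_timeout_ms are assumptions made for
illustration; they are not taken from the kernel source.

```c
/*
 * Simplified, assumed model of the retransmit timeout (milliseconds):
 *
 *	timeout = max(raw + slop, min)
 *
 * raw_ms  - the raw SRTT-based calculation
 * slop_ms - the slop added to the raw calculation
 * min_ms  - the absolute minimum
 *
 * rexmit_timeout_ms is a hypothetical name, not a kernel function.
 */
int
rexmit_timeout_ms(int raw_ms, int slop_ms, int min_ms)
{
	int t = raw_ms + slop_ms;

	return (t < min_ms ? min_ms : t);
}
```

Under this model, a slop of 200ms and a minimum of 20ms can never yield a
timeout below 200ms, whatever the raw value, which matches the effective
minimum quoted above.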
A socket operation may fail with one of the following errors returned:
when trying to establish a connection on a socket which
when the system runs out of memory for
an internal data structure;
when a connection was dropped
due to excessive retransmissions;
forces the connection to be closed;
.It Bq Er ECONNREFUSED
peer actively refuses connection establishment (usually because
no process is listening to the port);
is made to create a socket with a port which has already been
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
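.Pp
One of the errors listed above, the failure reported when a port has already
been allocated, can be reproduced with a short C sketch.
It assumes that neither socket sets SO_REUSEADDR or SO_REUSEPORT, and the
helper name second_bind_errno is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind two TCP sockets to the same loopback address and port.
 * Returns the errno value from the second bind(2), 0 if that bind
 * unexpectedly succeeded, or -1 if the setup itself failed.
 * second_bind_errno is a hypothetical helper name.
 */
int
second_bind_errno(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, b, err = -1;

	a = socket(AF_INET, SOCK_STREAM, 0);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a == -1 || b == -1)
		goto done;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0);		/* let the system pick a port */

	if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(a, 1) == -1 ||
	    getsockname(a, (struct sockaddr *)&sin, &len) == -1)
		goto done;

	/* sin now names the port held by the first socket. */
	err = 0;
	if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err = errno;
done:
	if (a != -1)
		close(a);
	if (b != -1)
		close(b);
	return (err);
}
```

With the assumptions above, the second bind(2) is expected to fail with
EADDRINUSE.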
.%T "TCP Extensions for High Performance"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
The RFC 1323 extensions for window scaling and timestamps were added