From cbf1c68379a1d74ef9d4dbe358a523761f405e41 Mon Sep 17 00:00:00 2001 From: Bert Hubert Date: Mon, 17 Dec 2001 12:46:00 +0000 Subject: [PATCH] lots of changes I should commit more often --- 2.4routing.sgml | 477 +++++++++++++++++++++++++++++++++++++++++++--------- index.php3 | 16 +- manpages/Makefile | 13 +- manpages/index.php3 | 9 +- manpages/tc-cbq.8 | 17 +- manpages/tc-red.8 | 131 +++++++++++++++ manpages/tc-tbf.8 | 35 ++-- manpages/tc.8 | 49 +++++- 8 files changed, 631 insertions(+), 116 deletions(-) create mode 100644 manpages/tc-red.8 diff --git a/2.4routing.sgml b/2.4routing.sgml index 9b197d8..847f942 100755 --- a/2.4routing.sgml +++ b/2.4routing.sgml @@ -44,7 +44,7 @@ Welcome, gentle reader. This document hopes to enlighten you on how to do more with Linux 2.2/2.4 routing. Unbeknownst to most users, you already run tools which allow you to do spectacular things. Commands like 'route' and 'ifconfig' are actually -very thin wrappers for the very powerful iproute2 infrastructure +very thin wrappers for the very powerful iproute2 infrastructure.

I hope that this HOWTO will become as readable as the ones by Rusty Russell of (amongst other things) netfilter fame. @@ -834,7 +834,7 @@ subnet notation works just like with regular IP adresses. Your IPv4 address is 145.100.24.181 and the 6bone router has IPv4 address 145.100.1.5 -# ip tunnel add sixbone mode sit remote 145.100.1.5 [local 145.100.24.181 ttl 225] +# ip tunnel add sixbone mode sit remote 145.100.1.5 [local 145.100.24.181 ttl 255] # ip link set sixbone up # ip addr add 3FFE:604:6:7::2/126 dev sixbone # ip route add 3ffe::0/16 dev sixbone @@ -989,6 +989,12 @@ dedicated bandwidth management systems. Linux even goes far beyond what Frame and ATM provide. +Just to prevent confusion, tc uses the following rules for bandwidth +specification: + +mb = 1024 kb = 1024 * 1024 b => byte/s +mbit = 1024 kbit = 1024 * 1024 bit => bit/s. + Queues and Queueing Disciplines explained

With queueing we determine the way in which data is sent. It is @@ -1014,6 +1020,13 @@ If you have a router and wish to prevent certain hosts within your network from downloading too fast, you need to do your shaping on the *inner* interface of your router, the one that sends data to your own computers. +You also have to be sure you are controlling the bottleneck of the link. +If you have a 100Mbit NIC and you have a router that has a 256kbit link, +you have to make sure you are not sending more data than your router can +handle. Otherwise, it will be the router that is controlling the link and +shaping the available bandwidth. We need to 'own the queue' so to speak, and +be the slowest link in the chain. Luckily this is easily possible. + Simple, classless Queueing Disciplines

As said, with queueing disciplines, we change the way data is sent. @@ -1270,7 +1283,7 @@ destroys interactivity. This is because uploading will fill the queue in the modem, which is probably *huge* because this helps actually achieving good data throughput uploading. But this is not what you want, you want to have the queue not too -big so interactivity remains and you can stil do other stuff while sending +big so interactivity remains and you can still do other stuff while sending data. The line above slows down sending to a rate that does not lead to a queue in @@ -1374,8 +1387,10 @@ youth of the subject, a lot of different words are used when people in fact mean the same thing. The following is loosely based on draft-ietf-diffserv-model-06.txt, 'An -Informal Management Model for Diffserv Routers'. +Informal Management Model for Diffserv Routers'. It can currently be found +at http://www.ietf.org/internet-drafts/draft-ietf-diffserv-model-06.txt. +Read it for the strict definitions of the terms used. Queueing Discipline An algorithm that manages the queue of a device, either incoming (ingress) @@ -1405,11 +1420,12 @@ performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing. Shaping The process of delaying packets before they go out to make traffic confirm -to a configured maximum rate. Shaping is performed on egress. +to a configured maximum rate. Shaping is performed on egress. Colloquially, +dropping packets to slow traffic down is also often called Shaping. Policing -The inverse of Shaping, which is performed on incoming traffic, on ingress. -In Linux, policing can only drop a packet and not delay it, as there is no -real 'ingress queue'. +Delaying or dropping packets in order to make traffic stay below a +configured bandwidth. In Linux, policing can only drop a packet and not +delay it - there is no 'ingress queue'. 
Work-Conserving A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to @@ -1426,24 +1442,32 @@ are. Userspace programs - ^ - | - +----------------+------------------------+ - | \|/ | - | _ IP Stack | - | /| | /-qdisc1-\ | - | / | Egress /--qdisc2--\ | - -> | Ingress / | Classifier---qdisc3---- | -> - | Classifier | /|\ \__qdisc4__/ | - | | \|/ | \-qdiscN_/ | - | +-> Forwarding | - | | - +-----------------------------------------+ - + ^ + | + +---------------+-----------------------------------------+ + | Y | + | -------> IP Stack | + | | | | + | | Y | + | | Y | + | ^ | | + | | / ----------> Forwarding -> | + | ^ / | | + | |/ Y | + | | | | + | ^ Y /-qdisc1-\ | + | | Egress /--qdisc2--\ | + --->->Ingress Classifier ---qdisc3---- | -> + | Qdisc \__qdisc4__/ | + | \-qdiscN_/ | + | | + +----------------------------------------------------------+ +Thanks to Jamal Hadi Salim for this ascii representation. + The big block represents the kernel. The leftmost arrow represents traffic entering your machine from the network. It is then fed to the Ingress -Classifier which may apply Filters to a packet, and decide to drop it. This +Qdisc which may apply Filters to a packet, and decide to drop it. This is called 'Policing'. This happens at a very early stage, before it has seen a lot of the kernel. @@ -1504,13 +1528,15 @@ your actual link speed. There is no queue to schedule then. The qdisc family: roots, handles, siblings and parents

-Each interface has a 'root qdisc', by default the earlier mentioned +Each interface has one egress 'root qdisc', by default the earlier mentioned classless pfifo_fast queueing discipline. Each qdisc can be assigned a handle, which can be used by later configuration statements to refer to that -qdisc. +qdisc. Besides an egress qdisc, an interface may also have an ingress, which +polices traffic coming in. -These handles consist of two parts, a major number and a minor number. It is -habitual to name the root qdisc '1:', which is equal to '1:0'. +The handles of these qdiscs consist of two parts, a major number and a minor +number. It is habitual to name the root qdisc '1:', which is equal to '1:0'. +The minor number of a qdisc is always 0. Classes need to have the same major number as their parent. How filters are used to classify traffic @@ -1539,7 +1565,7 @@ A packet might get classified in a chain like this: The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each chosing a -branch to take next. This can make sense. However, tnis is also possible: +branch to take next. This can make sense. However, this is also possible: 1: -> 12:2 @@ -1808,8 +1834,8 @@ priorities and that lower priority numbers will be polled before the higher priority ones. Each time a packet is requested by the hardware layer to be sent out to the -network, a weighted round robin process starts, beginning with the lower -priority classes. +network, a weighted round robin process ('WRR') starts, beginning with the +lower priority classes. These are then grouped and queried if they have data available. If so, it is returned. After a class has been allowed to dequeue a number of bytes, the @@ -1865,22 +1891,24 @@ reverse of 'bounded'. 
A typical situation might be where you have two agencies on your link which are both 'isolated' and 'bounded', which means that they are really limited -to their assigened rate, and also won't allow each other to borrow. +to their assigned rate, and also won't allow each other to borrow. Within such an agency class, there might be other classes which are allowed to swap bandwidth. Sample configuration

This configuration limits webserver traffic to 5mbit and smtp traffic to 3 -mbit, and limits the sum to 5mbit: +mbit. Together, they may not get more than 6mbit. We have a 100mbit NIC and +the classes may borrow bandwidth from each other. # tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ avpkt 1000 cell 8 # tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ - rate 5Mbit weight 0.5Mbit prio 8 allot 1514 cell 8 maxburst 20 \ - avpkt 1000 + rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ + avpkt 1000 bounded -This part installs the root and the customary 1:0 class. +This part installs the root and the customary 1:0 class. The 1:1 class is +bounded, so the total bandwidth can't exceed 6mbit. As said before, CBQ requires a *lot* of knobs. All parameters are explained above, however. The corresponding HTB configuration is lots simpler. @@ -1888,28 +1916,25 @@ above, however. The corresponding HTB configuration is lots simpler. # tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ - avpkt 1000 bounded + avpkt 1000 # tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ - avpkt 1000 bounded + avpkt 1000 These are our two classes. Note how we scale the weight with the configured -rate. Also note that both classes are bounded and won't therefore try to -borrow traffic. The classid's need to be within the same major number as the -parent CBQ, by the way! +rate. Both classes are not bounded, but they are connected to class 1:1 +which is bounded. So the sum of bandwidth of the 2 classes will never be +more than 6mbit. The classid's need to be within the same major number as +the parent CBQ, by the way!
-# tc qdisc add dev eth0 parent 1:3 tbf rate 5Mbit buffer 10Kb/8 limit \ - 15Kb mtu 1540 -# tc qdisc add dev eth0 parent 1:4 tbf rate 3Mbit buffer 10Kb/8 limit \ - 15Kb mtu 1540 +# tc qdisc add dev eth0 parent 1:3 handle 30: sfq +# tc qdisc add dev eth0 parent 1:4 handle 40: sfq -Here we install token bucket filters in the two configured classes. The -/8 corresponds to the cell size we mentioned earlier for CBQ. We create a -bucket of 10kbytes of tokens, a maximum 'pre-bucket' backlog of 15kbyte. - +Both classes have a FIFO qdisc by default. But we replaced these with an SFQ +queue so each flow of data is treated equally. # tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 80 0xffff flowid 1:3 @@ -1927,10 +1952,12 @@ You may wonder what happens to traffic that is not classified by any of the two rules. It appears that in this case, data will then be processed within 1:0, and be unlimited. -If smtp+web together try to exceed the set limit of 5mbit/s, bandwidth will +If smtp+web together try to exceed the set limit of 6mbit/s, bandwidth will be divided according to the weight parameter, giving 5/8 of traffic to the webserver and 3/8 to the mailserver. +With this configuration you can also say that webserver traffic will always +get at minimum 5/8 * 6 mbit = 3.75 mbit. Other CBQ parameters: split & defmap

As said before, a classful qdisc needs to call filters to determine @@ -2063,11 +2090,11 @@ Functionally almost identical to the CBQ sample configuration above: # tc qdisc add dev eth0 root handle 1: htb default 30 -# tc class add dev eth0 parent 1: classid 1:1 htb rate 5mbit burst 15k +# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k # tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k -# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 5mbit burst 15k -# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 5mbit burst 15k +# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k +# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k The author then recommends SFQ for beneath these classes: @@ -2567,34 +2594,35 @@ TTL is the field starting just after 8-th byte of the IP header. FIXME: it has been pointed out that this syntax does not work currently. -Stuart DJ Lynne uses this to match ACKs: +Use this to match ACKs on packets smaller than 64 bytes: ## match acks the hard way, ## IP protocol 6, ## IP header length 0x5(32 bit words), -## IP Total length 0x34 +## IP Total length 0x34 (ACK + 12 bytes of TCP options) ## TCP ack set (bit 5, offset 33) # tc filter add dev ppp14 parent 1:0 protocol ip prio 10 u32 \ match ip protocol 6 0xff \ match u8 0x05 0x0f at 0 \ - match u8 0x34 0xff at 3 \ + match u16 0x0000 0xffc0 at 2 \ match u8 0x10 0xff at 33 \ flowid 1:3

-This rule will only match TCP packets with ACK bit set. Here we can see -an example of using two selectors, the final result will be logical AND -of their results. If we take a look at TCP header diagram, we can see -that the ACK bit is second older bit (0x10) in the 14-th byte of the TCP -header (at nexthdr+13). As for the second selector, if we'd like -to make our life harder, we could write match u8 0x06 0xff at 9 -instead of using the specific selector protocol tcp, because -6 is the number of TCP protocol, present in 10-th byte of the IP header. -On the other hand, in this example we couldn't use any specific selector -for the first match - simply because there's no specific selector to match -TCP ACK bits. + +This rule will only match TCP packets with ACK bit set, and no further +payload. Here we can see an example of using two selectors, the final result +will be logical AND of their results. If we take a look at TCP header +diagram, we can see that the ACK bit is second older bit (0x10) in the 14-th +byte of the TCP header (at nexthdr+13). As for the second +selector, if we'd like to make our life harder, we could write match u8 +0x06 0xff at 9 instead of using the specific selector protocol +tcp, because 6 is the number of TCP protocol, present in 10-th byte of +the IP header. On the other hand, in this example we couldn't use any +specific selector for the first match - simply because there's no specific +selector to match TCP ACK bits. Specific selectors

@@ -2618,6 +2646,8 @@ Some examples: flowid 1:4 +FIXME: tcp dst match does not work as described below: + The above rule will match packets which have the TOS field set to 0x10. The TOS field starts at second byte of the packet and is one byte big, so we could write an equivalent general selector: match u8 0x10 0xff @@ -3727,22 +3757,6 @@ relevant behavior for such a site is a central part of the WRR distribution. This section contains 'cookbook' entries which may help you solve problems. A cookbook is no replacement for understanding however, so try and comprehend what is going on. - Running multiple sites with different SLAs

@@ -3799,7 +3813,8 @@ FIXME: why no token bucket filter? is there a default pfifo_fast fallback somewhere? Protecting your host from SYN floods -

From Alexey's iproute documentation, adapted to netfilter and with more +

+From Alexey's iproute documentation, adapted to netfilter and with more plausible paths. If you use this, take care to adjust the numbers to reasonable values for your system. @@ -4335,6 +4350,299 @@ think that you know best, you can also do something like this: This sets the MSS of passing SYN packets to 128. Use this if you have VoIP with tiny packets, and huge http packets which are causing chopping in your voice calls. +The Ultimate Traffic Conditioner: Low Latency, Fast Up & Downloads +

+Note: This script has recently been upgraded and previously only worked for +Linux clients in your network! So you might want to update if you have +Windows machines or Macs in your network and noticed that they were not able +to download faster while others were uploading. + +I attempted to create the holy grail: + +Maintain low latency for interactive traffic at all times +This means that downloading or uploading files should not disturb SSH or +even telnet. These are the most important things, even 200ms latency is +sluggish to work over. +Allow 'surfing' at reasonable speeds while up or downloading +Even though http is 'bulk' traffic, other traffic should not drown it out +too much. +Make sure uploads don't harm downloads, and the other way around +This is a much observed phenomenon where upstream traffic simply destroys +download speed. + +It turns out that all this is possible, at the cost of a tiny bit of +bandwidth. The reason that uploads, downloads and ssh hurt each other is the +presence of large queues in many domestic access devices like cable or DSL +modems. + +The next section explains in depth what causes the delays, and how we can +fix them. You can safely skip it and head straight for the script if you +don't care how the magic is performed. +Why it doesn't work well by default +

+ISPs know that they are benchmarked solely on how fast people can download. +Besides available bandwidth, download speed is influenced heavily by packet +loss, which seriously hampers TCP/IP performance. Large queues can help +prevent packet loss, and speed up downloads. So ISPs configure large queues. + +These large queues however damage interactivity. A keystroke must first +travel the upstream queue, which may be seconds (!) long and go to your +remote host. It is then displayed, which leads to a packet coming back, which +must then traverse the downstream queue, located at your ISP, before it +appears on your screen. + +This HOWTO teaches you how to mangle and process the queue in many ways, but +sadly, not all queues are accessible to us. The queue over at the ISP is +completely off-limits, whereas the upstream queue probably lives inside your +cable modem or DSL device. You may or may not be able to configure it. Most +probably not. + +So, what next? As we can't control either of those queues, they must be +eliminated, and moved to your Linux router. Luckily this is possible. + + +Limit upload speed +By limiting our upload speed to slightly less than the truly available rate, +no queues are built up in our modem. The queue is now moved to Linux. +Limit download speed +This is slightly trickier as we can't really influence how fast the internet +ships us data. We can however drop packets that are coming in too fast, +which causes TCP/IP to slow down to just the rate we want. Because we don't +want to drop traffic unnecessarily, we configure a 'burst' size we allow at +higher speed. + + +Now, once we have done this, we have eliminated the downstream queue totally +(except for short bursts), and gain the ability to manage the upstream queue +with all the power Linux offers. + +What remains to be done is to make sure interactive traffic jumps to the +front of the upstream queue. 
To make sure that uploads don't hurt downloads, +we also move ACK packets to the front of the queue. This is what normally +causes the huge slowdown observed when generating bulk traffic both ways. +The ACKnowledgements for downstream traffic must compete with upstream +traffic, and get delayed in the process. + +If we do all this we get the following measurements using an excellent ADSL +connection from xs4all in the Netherlands: + + +Baseline latency: +round-trip min/avg/max = 14.4/17.1/21.7 ms + +Without traffic conditioner, while downloading: +round-trip min/avg/max = 560.9/573.6/586.4 ms + +Without traffic conditioner, while uploading: +round-trip min/avg/max = 2041.4/2332.1/2427.6 ms + +With conditioner, during 220kbit/s upload: +round-trip min/avg/max = 15.7/51.8/79.9 ms + +With conditioner, during 850kbit/s download: +round-trip min/avg/max = 20.4/46.9/74.0 ms + +When uploading, downloads proceed at ~80% of the available speed. Uploads +at around 90%. Latency then jumps to 850 ms, still figuring out why. + + +What you can expect from this script depends a lot on your actual uplink +speed. When uploading at full speed, there will always be a single packet +ahead of your keystroke. That is the lower limit to the latency you can +achieve - divide your MTU by your upstream speed to calculate. Typical +values will be somewhat higher than that. Lower your MTU for better effects! + +Next, two versions of this script, one with Devik's excellent HTB, the other +with CBQ which is in each Linux kernel, unlike HTB. Both are tested and work +well. +The actual script (CBQ) +

+Works on all kernels. Within the CBQ +qdisc we place two Stochastic Fairness Queues that make sure that multiple +bulk streams don't drown each other out. + +Downstream traffic is policed using a tc filter containing a Token Bucket +Filter. + +You might improve on this script by adding 'bounded' to the line that starts +with 'tc class add .. classid 1:20'. If you lowered your MTU, also lower the +allot & avpkt numbers! + + +#!/bin/sh + +# The Ultimate Setup For Your Internet Connection At Home +# +# +# Set the following values to somewhat less than your actual download +# and uplink speed. In kilobits +DOWNLINK=800 +UPLINK=220 + +# clean existing down- and uplink qdiscs, hide errors +tc qdisc del dev ppp0 root 2> /dev/null > /dev/null +tc qdisc del dev ppp0 ingress 2> /dev/null > /dev/null + +###### uplink + +# install root CBQ + +tc qdisc add dev ppp0 root handle 1: cbq avpkt 1000 bandwidth 10mbit + +# shape everything at $UPLINK speed - this prevents huge queues in your +# DSL modem which destroy latency: +# main class + +tc class add dev ppp0 parent 1: classid 1:1 cbq rate ${UPLINK}kbit \ +allot 1500 prio 5 bounded isolated + +# high prio class 1:10: + +tc class add dev ppp0 parent 1:1 classid 1:10 cbq rate ${UPLINK}kbit \ + allot 1600 prio 1 avpkt 1000 + +# bulk and default class 1:20 - gets slightly less traffic, +# and a lower priority: + +tc class add dev ppp0 parent 1:1 classid 1:20 cbq rate $[9*$UPLINK/10]kbit \ + allot 1600 prio 2 avpkt 1000 + +# both get Stochastic Fairness: +tc qdisc add dev ppp0 parent 1:10 handle 10: sfq perturb 10 +tc qdisc add dev ppp0 parent 1:20 handle 20: sfq perturb 10 + +# start filters +# TOS Minimum Delay (ssh, NOT scp) in 1:10: +tc filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \ + match ip tos 0x10 0xff flowid 1:10 + +# ICMP (ip protocol 1) in the interactive class 1:10 so we +# can do measurements & impress our friends: +tc filter add dev ppp0 parent 1:0 protocol ip prio 11 u32 \ + match ip protocol 1 0xff flowid 
1:10 + +# To speed up downloads while an upload is going on, put ACK packets in +# the interactive class: + +tc filter add dev ppp0 parent 1: protocol ip prio 12 u32 \ + match ip protocol 6 0xff \ + match u8 0x05 0x0f at 0 \ + match u16 0x0000 0xffc0 at 2 \ + match u8 0x10 0xff at 33 \ + flowid 1:10 + +# rest is 'non-interactive' ie 'bulk' and ends up in 1:20 + +tc filter add dev ppp0 parent 1: protocol ip prio 13 u32 \ + match ip dst 0.0.0.0/0 flowid 1:20 + +########## downlink ############# +# slow downloads down to somewhat less than the real speed to prevent +# queuing at our ISP. Tune to see how high you can set it. +# ISPs tend to have *huge* queues to make sure big downloads are fast +# +# attach ingress policer: + +tc qdisc add dev ppp0 handle ffff: ingress + +# filter *everything* to it (0.0.0.0/0), drop everything that's +# coming in too fast: + +tc filter add dev ppp0 parent ffff: protocol ip prio 50 u32 match ip src \ + 0.0.0.0/0 police rate ${DOWNLINK}kbit burst 10k drop flowid :1 + +If you want this script to be run by ppp on connect, copy it to +/etc/ppp/ip-up.d. + +If the last two lines give an error, update your tc tool to a newer version! +The actual script (HTB) +

+The following script achieves all goals using the wonderful HTB queue, see +the relevant chapter. Well worth patching your kernel for! + +#!/bin/sh + +# The Ultimate Setup For Your Internet Connection At Home +# +# +# Set the following values to somewhat less than your actual download +# and uplink speed. In kilobits +DOWNLINK=800 +UPLINK=220 +DEV=ppp0 + +# clean existing down- and uplink qdiscs, hide errors +tc qdisc del dev $DEV root 2> /dev/null > /dev/null +tc qdisc del dev $DEV ingress 2> /dev/null > /dev/null + +###### uplink + +# install root HTB, point default traffic to 1:20: + +tc qdisc add dev $DEV root handle 1: htb default 20 + +# shape everything at $UPLINK speed - this prevents huge queues in your +# DSL modem which destroy latency: + +tc class add dev $DEV parent 1: classid 1:1 htb rate ${UPLINK}kbit burst 6k + +# high prio class 1:10: + +tc class add dev $DEV parent 1:1 classid 1:10 htb rate ${UPLINK}kbit \ + burst 6k prio 1 + +# bulk & default class 1:20 - gets slightly less traffic, +# and a lower priority: + +tc class add dev $DEV parent 1:1 classid 1:20 htb rate $[9*$UPLINK/10]kbit \ + burst 6k prio 2 + +# both get Stochastic Fairness: +tc qdisc add dev $DEV parent 1:10 handle 10: sfq perturb 10 +tc qdisc add dev $DEV parent 1:20 handle 20: sfq perturb 10 + +# TOS Minimum Delay (ssh, NOT scp) in 1:10: +tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \ + match ip tos 0x10 0xff flowid 1:10 + +# ICMP (ip protocol 1) in the interactive class 1:10 so we +# can do measurements & impress our friends: +tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \ + match ip protocol 1 0xff flowid 1:10 + +# To speed up downloads while an upload is going on, put ACK packets in +# the interactive class: + +tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \ + match ip protocol 6 0xff \ + match u8 0x05 0x0f at 0 \ + match u16 0x0000 0xffc0 at 2 \ + match u8 0x10 0xff at 33 \ + flowid 1:10 + +# rest is 'non-interactive' ie 'bulk' and ends up 
in 1:20 + + +########## downlink ############# +# slow downloads down to somewhat less than the real speed to prevent +# queuing at our ISP. Tune to see how high you can set it. +# ISPs tend to have *huge* queues to make sure big downloads are fast +# +# attach ingress policer: + +tc qdisc add dev $DEV handle ffff: ingress + +# filter *everything* to it (0.0.0.0/0), drop everything that's +# coming in too fast: + +tc filter add dev $DEV parent ffff: protocol ip prio 50 u32 match ip src \ + 0.0.0.0/0 police rate ${DOWNLINK}kbit burst 10k drop flowid :1 + + +If you want this script to be run by ppp on connect, copy it to +/etc/ppp/ip-up.d. + +If the last two lines give an error, update your tc tool to a newer version! Building bridges, and pseudo-bridges with Proxy ARP

Bridges are devices which can be installed in a network without any @@ -4725,6 +5033,7 @@ helping. Nadeem Hasan <nhasan@usa.net> Vik Heyndrickx <vik.heyndrickx@edchq.com> Koos van den Hout <koos@kzdoos.xs4all.nl> +Gareth John <gdjohn%zepler.org> Martin Josefsson <gandalf%wlug.westbo.se> Andi Kleen <ak%suse.de> Pawel Krawczyk <kravietz%alfa.ceti.pl> @@ -4743,6 +5052,7 @@ helping. Ram Narula <ram@princess1.net> Jorge Novo <jnovo@educanet.net> Patrik <ph@kurd.nu> +Lutz Preßler <Lutz.Pressler%SerNet.DE> Jason Pyeron <jason%pyeron.com> Rusty Russell <rusty%rustcorp.com.au> Jamal Hadi Salim <hadi%cyberus.ca> @@ -4756,7 +5066,6 @@ helping. Charles Tassell <ctassell%isn.net> Glen Turner <glen.turner%aarnet.edu.au> Song Wang <wsong@ece.uci.edu> - diff --git a/index.php3 b/index.php3 index d9700c4..a42a9f1 100755 --- a/index.php3 +++ b/index.php3 @@ -26,7 +26,8 @@ Paul B Schroeder <paulsch@us.ibm.com>
(mailing list/archive, the only place to send questions!)
-#lartc on irc.openprojects.net +#lartc on irc.openprojects.net (archives) @@ -58,13 +59,18 @@ This site attempts to document how to configure and use these features.

News

+ + - +
2001-12-10Added The Wonder +Shaper, which allows you to retain very low latency while doing very +fast up- and downloads. You can even do both at the same time, but then +latency suffers. Added logs +for the irc channel which is already seeing some traffic.
2001-12-10The manpages now include a huge CBQ page. Read it and weep.
2001-12-09We now have an IRC channel, #lartc on irc.openprojects.net. Join #lartc to chat about Linux & Routing & Shaping!
2001-12-08Started work on the manpages for tc and everything related to it. Some -interesting material is already there. If you can help, please do, it is -very hard work.
2001-12-06Finished documenting policing filters, added short piece on Generic Random Early Detection queueing. diff --git a/manpages/Makefile b/manpages/Makefile index 29d1aa6..72352b7 100644 --- a/manpages/Makefile +++ b/manpages/Makefile @@ -1,6 +1,9 @@ -all: tc.txt tc.dvi tc.pdf tc.ps tc.ps.gz \ - tc-sfq.txt tc-sfq.dvi tc-sfq.pdf tc-sfq.ps tc-sfq.ps.gz \ - tc-tbf.txt tc-tbf.dvi tc-tbf.pdf tc-tbf.ps tc-tbf.ps.gz +all: tc.txt tc.dvi tc.pdf tc.ps tc.ps.gz tc.html \ + tc-sfq.txt tc-sfq.dvi tc-sfq.pdf tc-sfq.ps tc-sfq.ps.gz tc-sfq.html \ + tc-tbf.txt tc-tbf.dvi tc-tbf.pdf tc-tbf.ps tc-tbf.ps.gz tc-tbf.html \ + tc-prio.txt tc-prio.dvi tc-prio.pdf tc-prio.ps tc-prio.ps.gz tc-prio.html \ + tc-red.txt tc-red.dvi tc-red.pdf tc-red.ps tc-red.ps.gz tc-red.html \ + tc-cbq.txt tc-cbq.dvi tc-cbq.pdf tc-cbq.ps tc-cbq.ps.gz tc-cbq.html clean: rm *.txt *.dvi *.pdf *.ps *.ps.gz *~ @@ -11,6 +14,10 @@ clean: %.dvi: %.8 man -l -Tdvi $< > $@ +%.html: %.8 + groff -man -mhtml - -Thtml < ./$< > $@ + + %.pdf: %.dvi dvipdfm $< diff --git a/manpages/index.php3 b/manpages/index.php3 index 0f93564..86fac54 100755 --- a/manpages/index.php3 +++ b/manpages/index.php3 @@ -42,6 +42,7 @@ I'm attempting, with your help, to write manpages for tc. These pages will complement the HOWTO and are intented to be donated to Alexey, the tc author, for inclusion in the distribution.

+HTML output is somewhat shoddy in places, but looks nice: "); if($there) { print(""); + print(""); print(""); print(""); print(""); print(""); } - else print(""); + else print(""); print(""); } manpage("tc","The main command",1); manpage("tc-filter","tc filters in depth",0); -manpage("tc-cbq","The Class Based Queueing qdisc"); +manpage("tc-cbq","The Class Based Queueing qdisc",1); manpage("tc-dsmark","The DiffServ qdisc"); manpage("tc-htb","The Hierarchy Token Bucket qdisc"); manpage("tc-sfq","Stochastic Fairness Queueing",1); -manpage("tc-red","Random Early Detection"); +manpage("tc-red","Random Early Detection",1); manpage("tc-tbf","Token Bucket Filter",1); manpage("tc-pfifo","Packet limited First In First Out"); manpage("tc-bfifo","Byte limited First In First Out"); +manpage("tc-prio","N-band classful scheduler",1); manpage("tc-pfifo_fast","Default three-band scheduler"); ?>
$name$desctxthtmlmanpdfps/ps.gzdviforthcomingforthcoming
diff --git a/manpages/tc-cbq.8 b/manpages/tc-cbq.8 index 1409e56..0a227b5 100644 --- a/manpages/tc-cbq.8 +++ b/manpages/tc-cbq.8 @@ -1,4 +1,4 @@ -.TH CBQ 8 "8 December 2001" "iproute2" "Linux" +.TH CBQ 8 "16 December 2001" "iproute2" "Linux" .SH NAME CBQ \- Class Based Queueing .SH SYNOPSIS @@ -147,7 +147,7 @@ prevented from doing so by declaring it 'bounded'. A class can also indicate its unwillingness to lend out bandwidth by being 'isolated'. .SH QDISC -The root qdisc of a CBQ class tree has the following parameters: +The root of a CBQ qdisc class tree has the following parameters: .TP parent major:minor | root @@ -157,7 +157,8 @@ of an interface or within an existing class. .TP handle major: Like all other qdiscs, the CBQ can be assigned a handle. Should consist only -of a major number, followed by a colon. Optional. +of a major number, followed by a colon. Optional, but very useful if classes +will be generated within this qdisc. .TP allot bytes This allotment is the 'chunkiness' of link sharing and is used for determining packet @@ -215,15 +216,19 @@ of this class is maximal, in which case it is set to 1. allot bytes Allot specifies how many bytes a qdisc can dequeue during each round of the process. This parameter is weighted using the -renormalized class weight described above. Is silently capped at a -minimum of 3/2 of avpkt. Mandatory. +renormalized class weight described above. Silently capped at a minimum of +3/2 avpkt. Mandatory. .TP -priority priority +prio priority In the round-robin process, classes with the lowest priority field are tried for packets first. Mandatory. .TP +avpkt +See the QDISC section. + +.TP rate rate Maximum rate this class and all its children combined can send at. Mandatory. 
diff --git a/manpages/tc-red.8 b/manpages/tc-red.8 new file mode 100644 index 0000000..a76584a --- /dev/null +++ b/manpages/tc-red.8 @@ -0,0 +1,131 @@ +.TH RED 8 "13 December 2001" "iproute2" "Linux" +.SH NAME +red \- Random Early Detection +.SH SYNOPSIS +.B tc qdisc ... red +.B limit +bytes +.B min +bytes +.B max +bytes +.B avpkt +bytes +.B burst +packets +.B [ ecn ] [ bandwidth +rate +.B ] probability +chance + +.SH DESCRIPTION +Random Early Detection is a classless qdisc which manages its queue size +smartly. Regular queues simply drop packets from the tail when they are +full, which may not be the optimal behaviour. RED also performs tail drop, +but does so in a more gradual way. + +Once the queue hits a certain average length, packets enqueued have a +configurable chance of being marked (which may mean dropped). This chance +increases linearly up to a point called the +.B max +average queue length, although the queue might get bigger. + +This has a host of benefits over simple taildrop, while not being processor +intensive. It prevents synchronous retransmits after a burst in traffic, +which cause further retransmits, etc. + +The goal is to have a small queue size, which is good for interactivity +while not disturbing TCP/IP traffic with too many sudden drops after a burst +of traffic. + +Depending on whether ECN is configured, marking either means dropping or +purely marking a packet as overlimit. +.SH ALGORITHM +The average queue size is used for determining the marking +probability. This is calculated using an Exponential Weighted Moving +Average, which can be more or less sensitive to bursts. + +When the average queue size is below +.B min +bytes, no packet will ever be marked. When it exceeds +.B min, +the probability of doing so climbs linearly up +to +.B probability, +until the average queue size hits +.B max +bytes. 
Because +.B probability +is normally not set to 100%, the queue size might +conceivably rise above +.B max +bytes, so the +.B limit +parameter is provided to set a hard maximum for the size of the queue. + +.SH PARAMETERS +.TP +min +Average queue size at which marking becomes a possibility. +.TP +max +At this average queue size, the marking probability is maximal. Should be at +least twice +.B min +to prevent synchronous retransmits, higher for low +.B min. +.TP +probability +Maximum probability for marking, specified as a floating point +number from 0.0 to 1.0. Suggested values are 0.01 or 0.02 (1 or 2%, +respectively). +.TP +limit +Hard limit on the real (not average) queue size in bytes. Further packets +are dropped. Should be set higher than max+burst. It is advised to set this +a few times higher than +.B max. +.TP +burst +Used for determining how fast the average queue size is influenced by the +real queue size. Larger values make the calculation more sluggish, allowing +longer bursts of traffic before marking starts. Real life experiments +support the following guideline: (min+min+max)/(3*avpkt). +.TP +avpkt +Specified in bytes. Used with burst to determine the time constant for +average queue size calculations. 1000 is a good value. +.TP +bandwidth +This rate is used for calculating the average queue size after some +idle time. Should be set to the bandwidth of your interface. Does not mean +that RED will shape for you! Optional. +.TP +ecn +As mentioned before, RED can either 'mark' or 'drop'. Explicit Congestion +Notification allows RED to notify remote hosts that their rate exceeds the +amount of bandwidth available. Non-ECN capable hosts can only be notified by +dropping a packet. If this parameter is specified, packets which indicate +that their hosts honor ECN will only be marked and not dropped, unless the +queue size hits +.B limit +bytes. Needs a tc binary with RED support compiled in. Recommended. 
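Reviewer's illustration, not part of the patch: the marking rule from the ALGORITHM section and the burst guideline from the PARAMETERS section can be sketched in a few lines of Python. This is a minimal sketch of the behaviour as the manpage text describes it, not the kernel's implementation. The function names are hypothetical, and holding the chance at `probability` once the average reaches `max` is an assumption read from the remark that the queue may still grow beyond `max`.

```python
def red_mark_probability(avg, min_bytes, max_bytes, probability):
    """Chance of marking a packet for a given average queue size in bytes.

    Follows the manpage text: zero below min, a linear climb between min
    and max, and (assumption) a plateau at `probability` from max onwards.
    """
    if avg < min_bytes:
        return 0.0
    if avg >= max_bytes:
        return probability
    return probability * (avg - min_bytes) / (max_bytes - min_bytes)


def suggested_burst(min_bytes, max_bytes, avpkt):
    """Burst guideline quoted in the manpage: (min+min+max)/(3*avpkt)."""
    return (min_bytes + min_bytes + max_bytes) / (3 * avpkt)


if __name__ == "__main__":
    # Hypothetical settings: min 10000, max 30000, avpkt 1000, probability 0.02
    for avg in (5000, 15000, 30000):
        print(avg, red_mark_probability(avg, 10000, 30000, 0.02))
    print("burst guideline:", suggested_burst(10000, 30000, 1000))
```

With these example values the guideline gives a burst of roughly 17 packets, and `limit` would then be chosen a few times higher than `max`, as the text advises.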
+ +.SH SEE ALSO +.BR tc (8) + +.SH SOURCES +.TP +o +Floyd, S., and Jacobson, V., Random Early Detection gateways for +Congestion Avoidance. http://www.aciri.org/floyd/papers/red/red.html +.TP +o +Some changes to the algorithm by Alexey N. Kuznetsov. + +.SH AUTHORS +Alexey N. Kuznetsov, Alexey Makarenko, +J Hadi Salim. +This manpage maintained by bert hubert + + diff --git a/manpages/tc-tbf.8 b/manpages/tc-tbf.8 index 1d21764..1191e84 100644 --- a/manpages/tc-tbf.8 +++ b/manpages/tc-tbf.8 @@ -1,4 +1,4 @@ -.TH TC 8 "8 December 2001" "iproute2" "Linux" +.TH TC 8 "13 December 2001" "iproute2" "Linux" .SH NAME tbf \- Token Bucket Filter .SH SYNOPSIS @@ -65,15 +65,20 @@ on HZ as 1/HZ. For perfect shaping, only a single packet can get sent per jiffy packets of on average 1000 bytes each, which roughly corresponds to 1mbit/s. .SH PARAMETERS +See +.BR tc (8) +for how to specify the units of these values. .TP limit or latency -Limit is the number of bytes that can be queued waiting for tokens to become available. You can also specify this -the other way around by setting the latency parameter, which specifies the maximum amount of time a packet can sit -in the TBF. The latter calculation takes into account the size of the bucket, the rate and possibly the peakrate (if set). - -These two parameters are mutually exclusive. +Limit is the number of bytes that can be queued waiting for tokens to become +available. You can also specify this the other way around by setting the +latency parameter, which specifies the maximum amount of time a packet can +sit in the TBF. The latter calculation takes into account the size of the +bucket, the rate and possibly the peakrate (if set). These two parameters +are mutually exclusive. .TP -burst (also known as buffer or maxburst). +burst +Also known as buffer or maxburst. Size of the bucket, in bytes. This is the maximum amount of bytes that tokens can be available for instantaneously. 
In general, larger shaping rates require a larger buffer. For 10mbit/s on Intel, you need at least 10kbyte buffer if you want to reach your configured rate! @@ -83,18 +88,20 @@ The minimum buffer size can be calculated by dividing the rate by HZ. Token usage calculations are performed using a table which by default has a resolution of 8 packets. This resolution can be changed by specifying the -.B -cell -size with the burst. For example, to specify a 6000 byte buffer with a 16 byte cell size, set a burst of 6000/16. You will -probably never have to set this. Must be an integral power of 2. +.B cell +size with the burst. For example, to specify a 6000 byte buffer with a 16 +byte cell size, set a burst of 6000/16. You will probably never have to set +this. Must be an integral power of 2. .TP mpu A zero-sized packet does not use zero bandwidth. For ethernet, no packet uses less than 64 bytes. The Minimum Packet Unit -determines the minimal token usage for a packet. Defaults to zero. +determines the minimal token usage (specified in bytes) for a packet. Defaults to zero. .TP rate -The speedknob. See remarks above about limits! -.P +The speedknob. See remarks above about limits! See +.BR tc (8) +for units. +.PP Furthermore, if a peakrate is desired, the following parameters are available: .TP diff --git a/manpages/tc.8 b/manpages/tc.8 index f1a4c29..b24e22e 100644 --- a/manpages/tc.8 +++ b/manpages/tc.8 @@ -1,4 +1,4 @@ -.TH TC 8 "8 December 2001" "iproute2" "Linux" +.TH TC 8 "16 December 2001" "iproute2" "Linux" .SH NAME tc \- show / manipulate traffic control settings .SH SYNOPSIS @@ -244,6 +244,53 @@ FILTERS Filters have a three part ID, which is only needed when using a hashed filter hierarchy, for which see .BR tc-filters (8). +.SH UNITS +All parameters accept a floating point number, possibly followed by a unit. 
+.P +Bandwidths or rates can be specified in: +.TP +kbps +Kilobytes per second +.TP +mbps +Megabytes per second +.TP +kbit +Kilobits per second +.TP +mbit +Megabits per second +.TP +bps +Bytes per second; a bare number is taken as bits per second +.P +Amounts of data can be specified in: +.TP +kb or k +Kilobytes +.TP +mb or m +Megabytes +.TP +mbit +Megabits +.TP +kbit +Kilobits +.TP +b or a bare number +Bytes. +.P +Lengths of time can be specified in: +.TP +s, sec or secs +Whole seconds +.TP +ms, msec or msecs +Milliseconds +.TP +us, usec, usecs or a bare number +Microseconds. .SH TC COMMANDS The following commands are available for qdiscs, classes and filter: -- 2.11.4.GIT
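Reviewer's illustration, not part of the patch: the data-amount and time tables added to the UNITS section of tc.8 can be read as simple lookup tables. The Python below is a rough sketch of that reading, not tc's actual parser. The function names are hypothetical, and the 1024 multipliers are an assumption taken from the HOWTO text changed in this same patch ("mb = 1024 kb = 1024 * 1024 b"). Rates are left out on purpose.

```python
import re

# Data amounts, normalised to bytes (multipliers of 1024 are an assumption).
SIZE_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 * 1024, "mb": 1024 * 1024,
    "kbit": 1024 // 8,
    "mbit": 1024 * 1024 // 8,
}

# Lengths of time, normalised to microseconds.
TIME_UNITS = {
    "": 1, "us": 1, "usec": 1, "usecs": 1,
    "ms": 1000, "msec": 1000, "msecs": 1000,
    "s": 1_000_000, "sec": 1_000_000, "secs": 1_000_000,
}

# A floating point number, possibly followed by a unit.
_VALUE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]*)\s*$")


def _parse(text, table):
    match = _VALUE.match(text.lower())
    if match is None:
        raise ValueError(f"not a number with optional unit: {text!r}")
    number, unit = match.groups()
    if unit not in table:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * table[unit]


def size_to_bytes(text):
    """Convert an amount of data such as '10kb' or '1mbit' to bytes."""
    return _parse(text, SIZE_UNITS)


def time_to_usecs(text):
    """Convert a length of time such as '50ms' or '1.5s' to microseconds."""
    return _parse(text, TIME_UNITS)


if __name__ == "__main__":
    print(size_to_bytes("10kb"))   # bytes
    print(time_to_usecs("50ms"))   # microseconds
```

Keeping one table per quantity mirrors the layout of the manpage section, where a bare number means bytes for sizes and microseconds for times.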