patch #7219
[mldonkey.git] / docs / BitTorrent.html
blob0ebd1a32350341fe752b58f033f6077d39acd1c3
1 <html><head><title>BitTorrent Protocol Documentation</title></head><body><center><table border="1" cellpadding="10"><tbody><tr>
2 <td><a href="http://bitconjurer.org/BitTorrent/index.html">BitTorrent</a></td>
3 <td><a href="http://bitconjurer.org/BitTorrent/download.html">download</a></td>
4 <td><a href="http://bitconjurer.org/BitTorrent/FAQ.html">FAQ</a></td>
5 <td><a href="http://bitconjurer.org/BitTorrent/doc.html">documentation</a></td>
6 <td><a href="http://bitconjurer.org/BitTorrent/press.html">press</a></td>
7 <td><a href="http://bitconjurer.org/BitTorrent/donate.html">Donate!</a></td>
8 </tr></tbody></table></center><p> BitTorrent is a protocol for distributing
9 files. It identifies content by url and is designed to integrate seamlessly
10 with the web. Its advantage over plain http is that when multiple downloads
11 of the same file happen concurrently, the downloaders upload to each other,
12 making it possible for the file source to support very large numbers of downloaders
13 with only a modest increase in its load.</p><p>
15 The life cycle of a BitTorrent file distribution.</p><p>
17 A BitTorrent file distribution consists of these entities -</p><p>
19 </p><ul>
20 <li>An ordinary web server
21 </li><li>A static 'metainfo' file
22 </li><li>A BitTorrent tracker
23 </li><li>An 'origin' downloader
24 </li><li>The end user web browsers
25 </li><li>The end user downloaders
26 </li></ul><p>
28 There are ideally many end users for a single file.</p><p>
30 To start serving, a host goes through the following steps -</p><p>
32 </p><ol>
33 <li>Start running a tracker (or, more likely, have one running already).
34 </li><li>Start running an ordinary web server, such as apache, or have one already.
35 </li><li>Associate the extension .torrent with mimetype application/x-bittorrent on their web server (or have done so already).
36 </li><li>Generate a metainfo file using the complete file to be served and the url of the tracker.
37 </li><li>Put the metainfo file on the web server.
38 </li><li>Link to the metainfo file from some other web page.
39 </li><li>Start a downloader which already has the complete file (the 'origin').
40 </li></ol><p>
42 To start downloading, a user does the following -</p><p>
44 </p><ol>
45 <li>Run a BitTorrent installer (or have done so already).
46 </li><li>Web surf.
47 </li><li>Click on a link to a .torrent file.
48 </li><li>Select where to save the file locally, or select a partial download to resume.
49 </li><li>Wait for download to complete.
50 </li><li>Tell downloader to exit (it keeps uploading until this happens).
51 </li></ol><p>
53 The connectivity is as follows -</p><p>
55 </p><ul>
56 <li>The web site is serving up static files as normal, but kicking off the BitTorrent helper app on the clients.
57 </li><li>The tracker is receiving information from all downloaders and giving
58 them random lists of peers. This is done over http or https. </li><li>Downloaders are periodically checking in with the tracker to keep
59 it informed of their progress, and are uploading to and downloading from
60 each other via direct connections. These connections use the BitTorrent peer
61 protocol, which operates over TCP. </li><li>The origin is uploading but not downloading at all, since it has
62 the entire file. The origin is necessary to get the entire file into the
63 network. Often for popular downloads the origin can be taken down after a
64 while since several downloads may have completed and been left running indefinitely.
65 </li></ul><p> Metainfo file and tracker responses are both sent in a simple,
66 efficient, and extensible format called bencoding (pronounced 'bee encoding').
67 Bencoded messages are nested dictionaries and lists, which can contain strings
68 and integers. Extensibility is supported by ignoring unexpected dictionary
69 keys, so additional optional ones can be added later.</p><p>
71 Bencoding is done as follows -</p><p>
73 </p><ul>
74 <li>Strings are length-prefixed base ten followed by a colon and the string. For example '4:spam' corresponds to 'spam'.
75 </li><li>Integers are represented by an 'i' followed by the number in base
76 10 followed by an 'e'. For example 'i3e' corresponds to 3 and 'i-3e' corresponds
77 to -3. Integers have no size limitation. 'i-0e' is invalid. All encodings
78 with a leading zero, such as 'i03e', are invalid, other than 'i0e', which
79 of course corresponds to 0. </li><li>Lists are encoded as an 'l' followed by their elements (also bencoded)
80 followed by an 'e'. For example 'l4:spam4:eggse' corresponds to ['spam',
81 'eggs']. </li><li>Dictionaries are encoded as a 'd' followed by a list of alternating
82 keys and their corresponding values followed by an 'e'. For example, 'd3:cow3:moo4:spam4:eggse'
83 corresponds to {'cow': 'moo', 'spam': 'eggs'} and 'd4:spaml1:a1:bee' corresponds
84 to {'spam': ['a', 'b']} . Keys must be strings and appear in sorted order
85 (sorted as raw strings, not alphanumerics). </li></ul><p>
87 Metainfo files are bencoded dictionaries with the following keys -</p><p>
89 </p><dl><dt>'announce'</dt><dd>
90 The url of the tracker.<p>
92 </p></dd><dt>'info'</dt><dd>
93 This maps to a dictionary, with keys described below.<p>
95 The 'name' key maps to a string which is the suggested name to save the file (or directory) as. It is purely advisory. </p><p>
96 'piece length' maps to the number of bytes in each piece the file is split
97 into. For the purposes of transfer, files are split into fixed-size pieces
98 which are all the same length except for possibly the last one which may
99 be truncated. Piece length is almost always a power of two, most commonly
100 2<sup>20</sup>. </p><p>
101 'pieces' maps to a string whose length is a multiple of 20. It is to be
102 subdivided into strings of length 20, each of which is the sha1 hash of the
103 piece at the corresponding index.</p><p>
104 There is also a key 'length' or a key 'files', but not both or neither.
105 If 'length' is present then the download represents a single file, otherwise
106 it represents a set of files which go in a directory structure.</p><p>
108 In the single file case, 'length' maps to the length of the file in bytes.</p><p>
109 For the purposes of the other keys, the multi-file case is treated as only
110 having a single file by concatenating the files in the order they appear
111 in the files list. The files list is the value 'files' maps to, and is a
112 list of dictionaries containing the following keys -</p><p>
114 </p></dd><dl><dt>'length'
115 </dt><dd>The length of the file, in bytes.
116 </dd><dt>'path'
117 </dt><dd>A list of strings corresponding to subdirectory names, the last
118 of which is the actual file name (a zero length list is an error case). </dd></dl><p>
120 In the single file case, the 'name' key is the name of a file, in the muliple file case, it's the name of a directory.</p><p>
123 </p></dl> Tracker queries are two way. The tracker receives information
124 via GET parameters and returns a bencoded message. Note that although the
125 current tracker implementation has its own web server, the tracker could
126 run very nicely as, for example, an apache module.<p>
128 Tracker GET requests have the following keys -</p><p>
129 </p><dl><dt>'info_hash'</dt><dd> The 20 byte sha1 hash of the bencoded form
130 of the 'info' value from the metainfo file. Note that this is a substring
131 of the metainfo file. This value will almost certainly have to be escaped.<p>
133 </p></dd><dt>'peer_id'</dt><dd> A string of length 20 which this downloader
134 uses as its id. Each downloader generates its own id at random at the start
135 of a new download. This value will also almost certainly have to be escaped.<p>
137 </p></dd><dt>'ip'</dt><dd> An optional parameter giving the ip (or dns name)
138 which this peer is at. Generally used for the origin if it's on the same
139 machine as the tracker.<p>
141 </p></dd><dt>'port'</dt><dd> The port number this peer is listening on. Common
142 behavior is for a downloader to try to listen on port 6881 and if that port
143 is taken try 6882, then 6883, etc. and give up after 6889.<p>
145 </p></dd><dt>'uploaded'</dt><dd>
146 The total amount uploaded so far, encoded in base ten ascii.<p>
148 </p></dd><dt>'downloaded'</dt><dd>
149 The total amount downloaded so far, encoded in base ten ascii.<p>
151 </p></dd><dt>'left'</dt><dd> The number of bytes this peer still has to download,
152 encoded in base ten ascii. Note that this can't be computed from downloaded
153 and the file length since it might be a resume, and there's a chance that
154 some of the downloaded data failed an integrity check and had to be re-downloaded.<p>
156 </p></dd><dt>'event'</dt><dd> This is an optional key which maps to 'started',
157 'completed', or 'stopped' (or '', which is the same as not being present).
158 If not present, this is one of the announcements done at regular intervals.
159 An announcement using 'started' is sent when a download first begins, and
160 one using 'completed' is sent when the download is complete. No 'completed'
161 is sent if the file was complete when started. Downloaders send an announcement
162 using 'stopped' when they cease downloading.<p>
164 </p></dd></dl> Tracker responses are bencoded dictionaries. If a tracker
165 response has a key 'failure reason', then that maps to a human readable string
166 which explains why the query failed, and no other keys are required. Otherwise,
167 it must have two keys - 'interval', which maps to the number of seconds the
168 downloader should wait between regular rerequests, and 'peers'. 'peers' maps
169 to a list of dictionaries corresponding to peers, each of which contains
170 the keys 'peer id', 'ip', and 'port', which map to the peer's self-selected
171 id, ip address or dns name as a string, and port number, respectively. Note
172 that downloaders may rerequest on nonscheduled times if an event happens
173 or they need more peers.<p>
175 If you want to make any extensions to metainfo files or tracker queries, please coordinate with <a href="mailto:bram@bitconjurer.org">Bram Cohen</a> to make sure that all extensions are done compatibly.</p><p>
177 BitTorrent's peer protocol operates over TCP. It performs efficiently without setting any socket options.</p><p>
179 Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction.</p><p>
180 The peer protocol refers to pieces of the file by index as described in
181 the metainfo file, starting at zero. When a peer finishes downloading a piece
182 and checks that the hash matches, it announces that it has that piece to
183 all of its peers.</p><p>
184 Connections contain two bits of state on either end - choked or not, and
185 interested or not. Choking is a notification that no data will be sent until
186 unchoking happens. The reasoning and common techniques behind choking are
187 explained later in this document.</p><p>
188 Data transfer takes place whenever one side is interested and the other
189 side is not choking. Interest state must be kept up to date at all times
190 - whenever a downloader doesn't have something they currently would ask a
191 peer for in unchoked, they must express lack of interest, despite being choked.
192 Implementing this properly is tricky, but makes it possible for downloaders
193 to know which peers will start downloading immediately if unchoked.</p><p>
195 Connections start out choked and not interested.</p><p>
196 When data is being transferred, downloaders should keep several piece requests
197 queued up at once in order to get good TCP performance (this is called 'pipelining'.)
198 On the other side, requests which can't be written out to the TCP buffer
199 immediately should be queued up in memory rather than kept in an application-level
200 network buffer, so they can all be thrown out when a choke happens.</p><p>
201 The peer wire protocol consists of a handshake followed by a never-ending
202 stream of length-prefixed messages. The handshake starts with character ninteen
203 followed by the string 'BitTorrent protocol'. The leading character is a
204 length prefix, put there in the hope that other new protocols may do the
205 same and thus be trivially distinguishable from each other.</p><p>
207 All later integers sent in the protocol are encoded as four bytes big-endian.</p><p>
208 After the fixed headers come eight reserved bytes, which are all zero in
209 all current implementations. If you wish to extend the protocol using these
210 bytes, please coordinate with <a href="mailto:bram@bitconjurer.org">Bram Cohen</a> to make sure all extensions are done compatibly.</p><p>
211 Next comes the 20 byte sha1 hash of the bencoded form of the 'info' value
212 from the metainfo file. (This is the same value which is announced as info_hash
213 to the tracker, only here it's raw instead of quoted here). If both sides
214 don't send the same value, they sever the connection. The one possible exception
215 is if a downloader wants to do multiple downloads over a single port, they
216 may wait for incoming connections to give a download hash first, and respond
217 with the same one if it's in their list.</p><p>
218 After the download hash comes the 20-byte peer id which is reported in tracker
219 requests and contained in peer lists in tracker responses. If the receiving
220 side's peer id doesn't match the one the initiating side expects, it severs
221 the connection.</p><p>
222 That's it for handshaking, next comes an alternating stream of length prefixes
223 and messages. Messages of length zero are keepalives, and ignored. Keepalives
224 are generally sent once every two minutes, but note that timeouts can be
225 done much more quickly when data is expected.</p><p>
227 All non-keepalive messages start with a single byte which gives their type. The possible values are -</p><p>
228 </p><ul>
229 <li>0 - choke
230 </li><li>1 - unchoke
231 </li><li>2 - interested
232 </li><li>3 - not interested
233 </li><li>4 - have
234 </li><li>5 - bitfield
235 </li><li>6 - request
236 </li><li>7 - piece
237 </li><li>8 - cancel
238 </li></ul><p>
240 Choke, unchoke, interested, and not interested have no payload.</p><p>
241 Bitfield is only ever sent as the first message. Its payload is a bitfield
242 with each index that downloader has sent set to one and the rest set to zero.
243 Downloaders which don't have anything yet may skip the bitfield message.
244 The first byte of the bitfield corresponds to indices 0-7 from high bit to
245 low bit, respectively. The next one 8-15, etc. Spare bits at the end are
246 set to zero.</p><p>
248 The have message's payload is a single number, the index which that downloader just completed and checked the hash of.</p><p>
249 Request messages contain an index, begin, and length. The last two are byte
250 offsets. Length is generally a power of two unless it gets truncated by the
251 end of the file. All current implementations use 2<sup>15</sup>, and close connections which request an amount greater than 2<sup>17</sup>.</p><p>
252 Cancel messages have the same payload as request messages. They are generally
253 only sent towards the end of a download, during what's called 'endgame mode'.
254 When a download is almost complete, there's a tendency for the last few pieces
255 to all be downloaded off a single hosed modem line, taking a very long time.
256 To make sure the last few pieces come in quickly, once requests for all pieces
257 a given downloader doesn't have yet are currently pending, it sends requests
258 for everything to everyone it's downloading from. To keep this from becoming
259 horribly inefficient, it sends cancels to everyone else every time a piece
260 arrives.</p><p>
261 Piece messages contain an index, begin, and piece. Note that they are correlated
262 with request messages implicitly. It's possible for an unexpected piece to
263 arrive if choke and unchoke messages are sent in quick succession and/or
264 transfer is going very slowly.</p><p>
265 Downloaders generally download pieces in random order, which does a reasonably
266 good job of keeping them from having a strict subset or superset of the pieces
267 of any of their peers.</p><p>
268 Choking is done for several reasons. TCP congestion control behaves very
269 poorly when sending over many connections at once. Also, choking lets each
270 peer use a tit-for-tat-ish algorithm to ensure that they get a consistent
271 download rate.</p><p>
272 The choking algorithm described below is the currently deployed one. It
273 is very important that all new algorithms work well both in a network consisting
274 entirely of themselves and in a network consisting mostly of this one.</p><p>
275 There are several criteria a good choking algorithm should meet. It should
276 cap the number of simultaneous uploads for good TCP performance. It should
277 avoid choking and unchoking quickly, known as fibrillation. It should reciprocate
278 to peers who let it download. Finally, it should try out unused connections
279 once in a while to find out if they might be better than the currently used
280 ones, known as optimistic unchoking.</p><p>
281 The currently deployed choking algorithm avoids fibrillation by only changing
282 who's choked once every ten seconds. It does reciprocation and number of
283 uploads capping by unchoking the four peers which it has the best download
284 rates from and are interested. Peers which have a better upload rate but
285 aren't interested get unchoked and if they become interested the worst uploader
286 gets choked. If a downloader has a complete file, it uses its upload rate
287 rather than its download rate to decide who to unchoke.</p><p>
288 For optimistic unchoking, at any one time there is a single peer which is
289 unchoked regardless of it's upload rate (if interested, it counts as one
290 of the four allowed downloaders.) Which peer is optimistically unchoked rotates
291 every 30 seconds. To give them a decent chance of getting a complete piece
292 to upload, new connections are three times as likely to start as the current
293 optimistic unchoke as anywhere else in the rotation.</p><p></p></body></html>