Removed old template, fixed doc files
[ana-net.git] / doc / misc / 1 / main.tex
blob350d307ce361b980c7e1e4f74cf18c38bd8ef652
1 \documentclass[times,10pt,twocolumn]{article}
2 \usepackage{main}
3 \usepackage{times}
4 \usepackage[english]{babel}
5 \usepackage{hyperref}
6 \usepackage{listings}
7 \usepackage[utf8]{inputenc}
8 \usepackage{latexsym}
9 \pagestyle{empty}
11 \parindent0em
13 \begin{document}
14 \title{
15 Ideas on context-oriented networking on LANA\\\smallskip [DRAFT]
18 \author{
19 Daniel Borkmann\\
20 dborkma@tik.ee.ethz.ch\\
23 \maketitle
24 \thispagestyle{empty}
26 \begin{abstract}
27 This document includes notes about possible context-oriented networking
28 LANA (Lightweight Autonomic Network Architecture) regarding API design,
29 addressing, routing, stack setup and more; rather loosely organized and
30 (for now) more in the sense of expressing an idea than having a full
31 sophisticated specification.
32 \end{abstract}
34 \Section{Addressing Scheme}
35 \label{addr}
36 The addressing scheme which is used by userland applications must be
37 independent of IPv4 or IPv6 and should be very generic and simple.
38 However, it should also fit into the BSD socket API. A \texttt{communication
39 endpoint} in LANA is therefore defined as
40 \begin{quote}\texttt{peer}@\texttt{[segment]}\end{quote}
41 where
42 \begin{itemize}
43 \setlength{\itemsep}{-1mm}
44 \item \texttt{peer}$\in\mathcal{L}(peer)$ with arbitrary length
45 \item \texttt{[segment]} is a string concatenation of
46 \texttt{segment} separated by the character \texttt{.} (dot) to the right
47 \item \texttt{segment}$\in\mathcal{L}(segment)$ with arbitrary length
48 \end{itemize}
49 and
50 \begin{itemize}
51 \setlength{\itemsep}{-1mm}
52 \item $\Sigma_{1} =\{a,b,c,\ldots,z,0,1,2,\ldots,9\}$
53 \item $\Sigma_{2} =\{$-$\}$
54 \item $\Sigma_{3} =\{.\} \cup \Sigma_{2}$
55 \item $\mathcal{L}(peer)=\{\alpha^{+}(\beta\alpha^{+})^{*}\} \;\vert\; \alpha\in\Sigma_{1},\beta\in\Sigma_{3}$
56 \item $\mathcal{L}(segment)=\{\alpha^{+}(\beta\alpha^{+})^{*}\} \;\vert\; \alpha\in\Sigma_{1},\beta\in\Sigma_{2}$
57 \end{itemize}
59 The concatenation of \texttt{segment}s forms a namespace like in
60 \textit{Java} package namespaces for instance, but with a different
61 semantic. So the more \texttt{segment}s are concatenated to the
62 right, the more fine-grained networking \texttt{peer group}, in
63 the sense of 'all \texttt{peer}s within a single \texttt{[segment]}',
64 we have.\newline
66 An empty \texttt{[segment]} is not allowed, there must be at
67 least one \texttt{segment} and the dot character after the most
68 fine-grained \texttt{segment} must be left out.\newline
70 Examples of valid \texttt{communication endpoint}:
71 \begin{itemize}
72 \setlength{\itemsep}{-1mm}
73 \item \texttt{sensor42@csg.temp.floor1}
74 \end{itemize}
76 Furthermore, there are three special \texttt{peer}s:\newline
78 A \texttt{peer} called \texttt{*} which defines a broadcast \texttt{peer},
79 so that a message sent to \texttt{*}@\texttt{[segment]} will be received by
80 the whole \texttt{peer group} of the \texttt{[segment]}.\newline
82 Next to this, there are also multicast \texttt{peer}s, which have an address of
83 form \texttt{+peer}@\texttt{[segment]}, thus the multicast group name
84 language is in the same $\mathcal{L}$ as $\mathcal{L}(peer)$, but with a
85 \texttt{+} (plus) as prefix.\newline
87 And the last special \texttt{peer} is used for anycasts, which form an
88 address of \texttt{-peer}@\texttt{[segment]}. Similar to multicast group
89 names, they have a \texttt{-} (minus) instead of \texttt{+} (plus) as
90 prefix.\newline
92 Examples of special \texttt{communication endpoint}s:
93 \begin{itemize}
94 \setlength{\itemsep}{-1mm}
95 \item \texttt{*@csg.temp.floor1} (broadcast)
96 \item \texttt{+sensors@csg.temp.floor1} (multicast)
97 \item \texttt{-controller@csg.temp.floor1} (anycast)
98 \end{itemize}
100 However, the regular \texttt{communication endpoint}, a concrete instance of
101 \texttt{peer}@\texttt{[segment]}, forms an unique communication address and
102 is represented by a BSD socket object, thus one host can have applications
103 running that open sockets and communicate to others. In this addressing scheme
104 a \texttt{peer} can be seen (in traditional TCP/IP network architectures) as
105 a combination of an IP address within the \textit{current} LAN \textit{and}
106 a port associated to it that offeres or uses a service. The \texttt{[segment]}
107 part of the addressing scheme can be seen as a specific subnet the \texttt{peer}
108 belongs to. So by this, one host has always the same \texttt{[segment]},
109 and may have different \texttt{peer}s depending on its applications.\newline
111 The \texttt{[segment]}s namespace should be well-structured and
112 well-defined - similar as DNS for instance, but in this case to describe
113 subnets. Note that \texttt{csg.temp.floor1} is a different subnet as
114 \texttt{csg.temp}, thus routing is needed to communicate
115 with \texttt{peer1}@\texttt{csg.temp.floor1} and \texttt{peer2}@\texttt{csg.temp}.
116 Also, one and the same \texttt{peer} may be in different \texttt{[segment]}s,
117 i.e.
118 \begin{itemize}
119 \setlength{\itemsep}{-1mm}
120 \item \texttt{sensor42}@\texttt{csg.temp.floor1}
121 \item \texttt{sensor42}@\texttt{csg.temp.floor2}
122 \end{itemize}
124 With the syntax and semantic of \texttt{peer}@\texttt{[segment]},
125 IP and port numbers become redundant (conjecture 1).\newline
127 Now, since both the \texttt{peer} and the \texttt{[segment]} can be strings
128 according to their given $\mathcal{L}$ with arbitrary length, the underlying
129 operating system and hardware must have a proper way of handling such
130 strings.\newline
132 Therefore, the underlying representation to the kernel looks like:
133 \begin{itemize}
134 \setlength{\itemsep}{-1mm}
135 \item \underline{\texttt{peer}} $\rightarrow$ \texttt{SHA1(peer)}
136 \item \underline{\texttt{[segment]}} $\rightarrow$ \texttt{SHA1([segment])}
137 \end{itemize}
139 Thus we have now a fixed-length representation of
140 \underline{\texttt{peer}}@\underline{\texttt{[segment]}} of $2*20$ Byte per
141 \texttt{communication endpoint}. We \textit{assume} that \texttt{SHA1} is
142 safe enough to use, so that there is no chance to have a kollision of a pair
143 of different \texttt{peer}s and at the same time a kollision of a pair of
144 different \texttt{[segment]}s (conjecture 2).\newline
146 The benefit from this addressing scheme is that we will not ran out
147 of addresses respectively \texttt{peer}s as in IPv4 for instance, thus
148 there would be no need to have such a long-time transistion to IPv6.
149 Organizational entities can dynamically reorganize their \texttt{[segment]}
150 structure and we do not ran out of \texttt{[segment]}s, too. Furthermore,
151 the addressing scheme is more context oriented and consists not just of
152 'meaningless numbers'.
154 \Section{Basic Protocol Types}
155 With this addressing scheme we have two basic protocol layers on top of the
156 underlying hardware:
157 \begin{itemize}
158 \setlength{\itemsep}{-1mm}
159 \item Hardware address layer (can be MAC or similar)
160 \item \texttt{peer}@\texttt{[segment]} layer
161 \end{itemize}
163 Furthermore, we have three basic protocol types on top of an underlying
164 carrier like Ethernet, InfiniBand or Bluetooth:
165 \begin{itemize}
166 \setlength{\itemsep}{-1mm}
167 \item \texttt{ETH\_P\_LACM} $\rightarrow$ Intra-segment config messages
168 \item \texttt{ETH\_P\_LADM} $\rightarrow$ Intra-segment data messages
169 \item \texttt{ETH\_P\_LRDM} $\rightarrow$ Inter-segment data messages
170 \end{itemize}
172 \texttt{ETH\_P\_LACM} is for name resolution, route resolution and other
173 \texttt{peer}@\texttt{[segment]} management messages. \texttt{ETH\_P\_LADM} is for data
174 messages within a given \texttt{[segment]} thus the
175 \underline{\texttt{[segment]}} must \textit{not} be encoded into the packet.
176 \texttt{ETH\_P\_LRDM} is for data messages between different \texttt{[segment]}s,
177 so that the source and destination \underline{\texttt{[segment]}}
178 hashes must be encoded into the packet, since these packets need
179 routing.\newline
181 Concrete identifiers for \texttt{ETH\_P\_LACM}, \texttt{ETH\_P\_LADM} and
182 \texttt{ETH\_P\_LRDM} must be choosen in a way, that there is still compatiblity
183 respectively no interference of traditional Internet protocols such as IPv4,
184 IPv6 and others.
186 \Section{Host Information Management}
187 Since - as stated in \ref{addr} - one host can have multiple \texttt{peer}s,
188 there are two kernelspace \texttt{peer}@\texttt{[segment]}s per NIC with a
189 \underline{\texttt{peer}} each that has a truly randomly assigned value
190 (i.e. via \texttt{/dev/random}). These two \texttt{peer}s are used for a hosts
191 information management, conrete:
192 \begin{itemize}
193 \setlength{\itemsep}{-1mm}
194 \item \texttt{peer} to hardware address resolution of
195 \texttt{peer}s within the own \texttt{[segment]}
196 \item Inter-\texttt{[segment]} route resolution
197 \end{itemize}
199 Both \texttt{peer}@\texttt{[segment]}s may only exchange messages within the
200 \texttt{ETH\_P\_LACM} protocol.\newline
202 Now for each of these both special kernelspace \texttt{peer}s, a distributed
203 hash table (DHT) is assigned
204 \begin{itemize}
205 \setlength{\itemsep}{-1mm}
206 \item $f:$ \underline{\texttt{peer}}$\rightarrow$hardware
207 address of \texttt{peer}
208 \item $g:$ \underline{\texttt{[segment]}}$\rightarrow$hardware
209 address of \texttt{peer} that is a gateway to a different
210 \texttt{[segment]} and has a route to this destination
211 \end{itemize}
212 so that the information respectively mapping of $f$ and $g$ is autonomously
213 managed within the \texttt{peer group} of this \texttt{[segment]}. By this,
214 no broadcast on the hardware address layer is needed at all (conjencture 3),
215 thus \texttt{[segment]} may contain more participants as typical subnets of
216 a traditional LAN.\newline
218 The DHT information management from Kademlia \cite{PeterMaymounkovDavidMazieres}
219 is applied, but with the exception that messages are exchanged on the hardware
220 address layer instead of traditional IP. The identifier, which is used in the
221 Kademlia XOR routing metric will be the specific kernelspace
222 \underline{\texttt{peer}}.\newline
224 A resolution of the hardware address of \texttt{peer}s within a
225 \texttt{[segment]} then equals a lookup in the DHT of function $f$ on the
226 hardware address layer. Similar, the resolution of a route equals a lookup
227 in the DHT of function $g$ on the hardware address layer.\newline
229 Lookups in the DHT are performed in $\mathcal{O}(\log{n})$ where $n$ is the number
230 of \texttt{peer}s of the hosts \texttt{[segment]}. Local caches can optinally
231 be applied, so that a lookup in the DHT of $f$ or $g$ does not need to be
232 performed twice within a given period.\newline
234 Now if a host has multiple NICs and therefore multiple kernelspace
235 \texttt{peer}s, the kernel must of course be able to access the DHTs
236 of the other NICs.
238 \Section{Routing between \texttt{[segment]}s}
239 \label{route}
240 As mentioned previously, the communication of \texttt{peer}s within a
241 \texttt{peer group} involves no routing at all, so this is not further
242 being discussed within this section. Also, asking for a route always
243 equals a DHT lookup on $g$ within the hosts \texttt{[segment]}.\newline
245 The DHT for the function $g$ can only be filled, if a host has at least
246 two NICs connected to different \texttt{[segment]}s. This host then publishes
247 its possible routes to its connected \texttt{[segment]}s into the corresponding
248 DHTs $g$.\newline
250 If for instance a host $x$ with two NICs $NIC_a$, $NIC_b$ is connected to two
251 different \texttt{[segment]}s $S_a$, $S_b$, it announces to the DHT $g$
252 of $S_b$ via $NIC_b$ that a route to $S_a$ is reachable from $NIC_b$ and
253 vice versa to the DHT $g$ of $S_a$ via $NIC_a$ that a route to $S_b$ is
254 reachable from $NIC_a$, so that others can simply perform a lookup to their
255 \texttt{[segment]}s DHT $g$ to obtain the related NICs hardware address
256 of $x$ and then forward data packets of type \texttt{ETH\_P\_LRDM}.\newline
258 Now this mechanism works fine for routing between neighboring
259 \texttt{[segment]}s. However, what about routing between \texttt{[segment]}s
260 that are not directly connected to each other?\newline
262 One possibility would be to have transversely routes next to the
263 \texttt{[segment]}s tree structure. The transversely routes could then be
264 regarded as a way of skipping the usual communication path through the tree
265 by having a shorter path and therefore to reduce network load on the trees
266 upper and root nodes. This structure could be seen similar to a
267 \textit{skip list} as some form of a \textit{skip tree}. By having lots of
268 such transversely routes the network structure is then being meshed by
269 having a worst case route of traversing the \texttt{[segment]}s tree.\newline
271 A worst case routing example is provided next. Here, \texttt{peer1@csg.temp.floor1}
272 wants to communicate with \texttt{peer2@csg.staff.bureau1}. Since in the worst
273 case there are no transversely routes i.e. directly from \texttt{csg.temp.floor1} to
274 \texttt{csg.staff.bureau1}, we need to traverse the tree to its root \texttt{[segment]}
275 named \texttt{csg} and then downwards to \texttt{csg.staff.bureau1}:\newline
277 Lookup 1 from \texttt{csg.temp.floor1}:
278 \begin{enumerate}
279 \setlength{\itemsep}{-1mm}
280 \item \texttt{csg.staff.bureau1}, if failed
281 \begin{enumerate}
282 \setlength{\itemsep}{-1mm}
283 \item \texttt{csg.staff}, if failed
284 \begin{enumerate}
285 \setlength{\itemsep}{-1mm}
286 \item \texttt{csg}, if failed
287 \begin{enumerate}
288 \setlength{\itemsep}{-1mm}
289 \item \texttt{csg.temp}, must succeed, so forward to \texttt{csg.temp}
290 \end{enumerate}
291 \end{enumerate}
292 \end{enumerate}
293 \end{enumerate}
295 Lookup 2 from \texttt{csg.temp}:
296 \begin{enumerate}
297 \setlength{\itemsep}{-1mm}
298 \item \texttt{csg.staff.bureau1}, if failed
299 \begin{enumerate}
300 \setlength{\itemsep}{-1mm}
301 \item \texttt{csg.staff}, if failed
302 \begin{enumerate}
303 \setlength{\itemsep}{-1mm}
304 \item \texttt{csg}, must succeed, so forward to \texttt{csg}
305 \end{enumerate}
306 \end{enumerate}
307 \end{enumerate}
309 Lookup 3 from \texttt{csg}:
310 \begin{enumerate}
311 \setlength{\itemsep}{-1mm}
312 \item \texttt{csg.staff.bureau1}, if failed
313 \begin{enumerate}
314 \setlength{\itemsep}{-1mm}
315 \item \texttt{csg.staff}, must succeed, so forward to \texttt{csg.staff}
316 \end{enumerate}
317 \end{enumerate}
319 Lookup 4 from \texttt{csg.staff}:
320 \begin{enumerate}
321 \item \texttt{csg.staff.bureau1}, must succeed, so forward to \texttt{csg.staff.bureau1}
322 \end{enumerate}
324 With a more meshed network structure, we could have a best case routing for this specific example
325 of:\newline
327 Lookup 1 from \texttt{csg.temp.floor1}:
328 \begin{enumerate}
329 \item \texttt{csg.staff.bureau1}, succeeded, so forward to \texttt{csg.staff.bureau1}
330 \end{enumerate}
332 \Section{Firewalling / Filtering}
333 A \texttt{[segment]} can be protected from unwanted traffic by doing a
334 string pattern matching (i.e. in hardware) on the \texttt{[segment]}
335 string. Since this is represented as \underline{\texttt{[segment]}} one
336 can do blacklist or whitelist matching. This is also possible for allwing
337 traffic for concrete \underline{\texttt{peer}}@\underline{\texttt{[segment]}}
338 for instance. However, pattern matching in the sense of regular expression
339 matching will not be possible since \texttt{peer}@\texttt{[segment]} are
340 represented as \texttt{SHA1} hashes.
342 \Section{Multicast Groups}
343 A multicast group is handeled as an usual \texttt{peer} within the DHT and the
344 multicast group management is postponed to the hardware address layer, if the harware
345 address layer supports multicast. (still needs more explanation)
347 \Section{Anycast Groups}
348 An anycast group is handeled different within the DHT than the multicast group. If
349 a \texttt{peer} joins an anycast group, then the keys value needs to maintain a list
350 of \texttt{peer}s associated to a specific given key. One \texttt{peer} is then returned
351 from the list, for instance in a round-robin or random fashion. (still needs more explanation)
353 \Section{Communication Bootstrap}
354 (todo)
356 \Section{Userspace API}
357 The application developer interface for the userland is discussed here.
359 \SubSection{Addressing Structure}
360 The addressing structure defined in \texttt{linux/lana.h} is named
361 \texttt{struct sockaddr\_lana} and consists of the elements
362 \begin{itemize}
363 \setlength{\itemsep}{-1mm}
364 \item \texttt{sa\_family} which is always \texttt{AF\_LANA} or \texttt{PF\_LANA}
365 \item \texttt{sa\_address} which has the form
366 \texttt{peer}@\texttt{[segment]} as described in \ref{addr}
367 \end{itemize}
369 \SubSection{Segment Map}
370 The file \texttt{/etc/segment} must contain the \texttt{[segment]} to NIC
371 mapping of all NICs, i.e.:\newline
373 \scriptsize{
374 \begin{lstlisting}
375 eth0:csg.temp.floor1
376 eth1:csg.temp.admin
377 \end{lstlisting}
379 \normalsize
381 This example implies a direct gateway from \texttt{csg.temp.floor1} to
382 the \texttt{csg.temp.admin} \texttt{[segment]} and vice versa.
384 \SubSection{Systemcalls}
385 Supported BSD socket system calls for LANA are:
386 \begin{itemize}
387 \setlength{\itemsep}{-1mm}
388 \item \texttt{socket(2)}
389 \item \texttt{bind(2)}
390 \item \texttt{getsockopt(2)/setsockopt(2)}
391 \item \texttt{select(2)/poll(2)/epoll(2)}
392 \item \texttt{recvfrom(2)}
393 \item \texttt{sendto(2)}
394 \item \texttt{close(2)}
395 \end{itemize}
397 Note that the \texttt{bind(2)} system call must be done before any calls
398 to \texttt{recvfrom(2)} or \texttt{sendto(2)} in order to bind the socket
399 to a specific \texttt{peer}@\texttt{[segment]} otherwise the kernel will
400 assign a randomly choosen string for \texttt{peer} and a randomly selected
401 \texttt{[segment]} of \texttt{/etc/segment}. The latter assumes, that this
402 socket only wants to \textit{consume} services of other
403 \texttt{peer}@\texttt{[segment]}s, without offering some service for others.
404 Compare this to the notion of port numbers, where the kernel randomly
405 assigns a number for communication on client applications. Then, the kernel
406 will do internally a bind before a \texttt{recvfrom(2)} or \texttt{sendto(2)}
407 call. However, if a \texttt{bind(2)} is done explicitly and the \texttt{[segment]}
408 is different from \texttt{[segment]}s given in \texttt{/etc/segment}, then
409 the bind will fail and an error is returned.\newline
411 With \texttt{setsockopt(2)} one is able join or leave multicast and anycast
412 groups.
414 \SubSection{Message flags}
415 Message flags are statical, global and defined in \texttt{linux/lana.h} and
416 represented as bitvectors. They are applied to the \texttt{sendto(2)} call,
417 thus the kernel gets to know the applications needs for communication with
418 the remote \texttt{communication endpoint}, e.g. whether the communication
419 must be reliable or confidential. These needs are named \texttt{communication
420 features}.
422 \SubSection{Code Example}
423 Here is an example C code of how a typical LANA userspace program
424 for a temperature measuring entity could look like:\newline
426 \scriptsize{
427 \lstset{language=C}
428 \begin{lstlisting}
429 #include <linux/lana.h>
431 int main(void)
433 ssize_t ret;
434 int sock, err, flags;
435 struct sockaddr_lana own, remote;
436 char buff[256];
437 socklen_t remote_len;
439 sock = socket(AF_LANA, SOCK_RAW, 0);
441 own.sa_family = AF_LANA;
442 strlcpy(own.sa_address,
443 "sensor42@csg.temp.floor1",
444 sizeof(own.sa_address));
445 bind(sock, (struct sockaddr_in *) &own, sizeof(own));
447 remote_len = sizeof(remote);
448 recvfrom(sock, buff, sizeof(buff), 0,
449 (struct sockaddr *) remote,
450 &remote_len);
452 memset(&remote, 0, sizeof(remote));
453 remote.sa_family = AF_LANA;
454 strlcpy(remote.sa_address,
455 "-controller@csg.temp.floor1",
456 sizeof(remote.sa_address));
457 memset(buff, 0, sizeof(buff));
458 strlcpy(buff, "temp:20,scale:celsius", sizeof(buff));
459 flags = MSG_GLOBAL | MSG_RELIABLE;
460 sendto(sock, buff, strlen(buff) + 1, flags,
461 (struct sockaddr *) &remote, sizeof(remote));
463 memset(&remote, 0, sizeof(remote));
464 remote.sa_family = AF_LANA;
465 strlcpy(remote.sa_address, "*@csg.temp.floor1",
466 sizeof(remote.sa_address));
467 memset(buff, 0, sizeof(buff));
468 strlcpy(buff, "alive", sizeof(buff));
469 flags = MSG_GLOBAL | MSG_RELIABLE;
470 sendto(sock, buff, strlen(buff) + 1, flags,
471 (struct sockaddr *) &remote, sizeof(remote));
473 close(sock);
474 return 0;
476 \end{lstlisting}
478 \normalsize
480 \nocite{*}
481 \bibliographystyle{abbrv}
482 \bibliography{main}
484 \end{document}