doc/misc/1/main.tex

   1 \documentclass[times,10pt,twocolumn]{article}
   2 \usepackage{main}
   3 \usepackage{times}
   4 \usepackage[english]{babel}
   5 \usepackage{hyperref}
   6 \usepackage{listings}
   7 \usepackage[utf8]{inputenc}
   8 \usepackage{latexsym}
   9 \pagestyle{empty}
  10
  11 \parindent0em
  12
  13 \begin{document}
  14 \title{
  15 Ideas on context-oriented networking on LANA\\\smallskip [DRAFT]
  16 }
  17
  18 \author{
  19 Daniel Borkmann\\
  20 dborkma@tik.ee.ethz.ch\\
  21 }
  22
  23 \maketitle
  24 \thispagestyle{empty}
  25
  26 \begin{abstract}
  27 This document includes notes about possible context-oriented networking
  28 LANA (Lightweight Autonomic Network Architecture) regarding API design,
  29 addressing, routing, stack setup and more; rather loosely organized and
  30 (for now) more in the sense of expressing an idea than having a full
  31 sophisticated specification.
  32 \end{abstract}
  33
  34 \Section{Addressing Scheme}
  35 \label{addr}
  36 The addressing scheme which is used by userland applications must be
  37 independent of IPv4 or IPv6 and should be very generic and simple.
  38 However, it should also fit into the BSD socket API. A \texttt{communication
  39 endpoint} in LANA is therefore defined as
  40 \begin{quote}\texttt{peer}@\texttt{[segment]}\end{quote}
  41 where
  42 \begin{itemize}
  43         \setlength{\itemsep}{-1mm}
  44         \item \texttt{peer}$\in\mathcal{L}(peer)$ with arbitrary length
  45         \item \texttt{[segment]} is a string concatenation of
  46               \texttt{segment} separated by the character \texttt{.} (dot) to the right
  47         \item \texttt{segment}$\in\mathcal{L}(segment)$ with arbitrary length
  48 \end{itemize}
  49 and
  50 \begin{itemize}
  51         \setlength{\itemsep}{-1mm}
  52         \item $\Sigma_{1} =\{a,b,c,\ldots,z,0,1,2,\ldots,9\}$
  53         \item $\Sigma_{2} =\{$-$\}$
  54         \item $\Sigma_{3} =\{.\} \cup \Sigma_{2}$
  55         \item $\mathcal{L}(peer)=\{\alpha^{+}(\beta\alpha^{+})^{*}\} \;\vert\; \alpha\in\Sigma_{1},\beta\in\Sigma_{3}$
  56         \item $\mathcal{L}(segment)=\{\alpha^{+}(\beta\alpha^{+})^{*}\} \;\vert\; \alpha\in\Sigma_{1},\beta\in\Sigma_{2}$
  57 \end{itemize}
  58
  59 The concatenation of \texttt{segment}s forms a namespace like in
  60 \textit{Java} package namespaces for instance, but with a different
  61 semantic. So the more \texttt{segment}s are concatenated to the
  62 right, the more fine-grained networking \texttt{peer group}, in
  63 the sense of 'all \texttt{peer}s within a single \texttt{[segment]}',
  64 we have.\newline
  65
  66 An empty \texttt{[segment]} is not allowed, there must be at
  67 least one \texttt{segment} and the dot character after the most
  68 fine-grained \texttt{segment} must be left out.\newline
  69
  70 Examples of valid \texttt{communication endpoint}:
  71 \begin{itemize}
  72         \setlength{\itemsep}{-1mm}
  73         \item \texttt{sensor42@csg.temp.floor1}
  74 \end{itemize}
  75
  76 Furthermore, there are three special \texttt{peer}s:\newline
  77
  78 A \texttt{peer} called \texttt{*} which defines a broadcast \texttt{peer},
  79 so that a message sent to \texttt{*}@\texttt{[segment]} will be received by
  80 the whole \texttt{peer group} of the \texttt{[segment]}.\newline
  81
  82 Next to this, there are also multicast \texttt{peer}s, which have an address of
  83 form \texttt{+peer}@\texttt{[segment]}, thus the multicast group name
  84 language is in the same $\mathcal{L}$ as $\mathcal{L}(peer)$, but with a
  85 \texttt{+} (plus) as prefix.\newline
  86
  87 And the last special \texttt{peer} is used for anycasts, which form an
  88 address of \texttt{-peer}@\texttt{[segment]}. Similar to multicast group
  89 names, they have a \texttt{-} (minus) instead of \texttt{+} (plus) as
  90 prefix.\newline
  91
  92 Examples of special \texttt{communication endpoint}s:
  93 \begin{itemize}
  94         \setlength{\itemsep}{-1mm}
  95         \item \texttt{*@csg.temp.floor1} (broadcast)
  96         \item \texttt{+sensors@csg.temp.floor1} (multicast)
  97         \item \texttt{-controller@csg.temp.floor1} (anycast)
  98 \end{itemize}
  99
 100 However, the regular \texttt{communication endpoint}, a concrete instance of
 101 \texttt{peer}@\texttt{[segment]}, forms an unique communication address and
 102 is represented by a BSD socket object, thus one host can have applications
 103 running that open sockets and communicate to others. In this addressing scheme
 104 a \texttt{peer} can be seen (in traditional TCP/IP network architectures) as
 105 a combination of an IP address within the \textit{current} LAN \textit{and}
 106 a port associated to it that offeres or uses a service. The \texttt{[segment]}
 107 part of the addressing scheme can be seen as a specific subnet the \texttt{peer}
 108 belongs to. So by this, one host has always the same \texttt{[segment]},
 109 and may have different \texttt{peer}s depending on its applications.\newline
 110
 111 The \texttt{[segment]}s namespace should be well-structured and
 112 well-defined - similar as DNS for instance, but in this case to describe
 113 subnets. Note that \texttt{csg.temp.floor1} is a different subnet as
 114 \texttt{csg.temp}, thus routing is needed to communicate
 115 with \texttt{peer1}@\texttt{csg.temp.floor1} and \texttt{peer2}@\texttt{csg.temp}.
 116 Also, one and the same \texttt{peer} may be in different \texttt{[segment]}s,
 117 i.e.
 118 \begin{itemize}
 119         \setlength{\itemsep}{-1mm}
 120         \item \texttt{sensor42}@\texttt{csg.temp.floor1}
 121         \item \texttt{sensor42}@\texttt{csg.temp.floor2}
 122 \end{itemize}
 123
 124 With the syntax and semantic of \texttt{peer}@\texttt{[segment]},
 125 IP and port numbers become redundant (conjecture 1).\newline
 126
 127 Now, since both the \texttt{peer} and the \texttt{[segment]} can be strings
 128 according to their given $\mathcal{L}$ with arbitrary length, the underlying
 129 operating system and hardware must have a proper way of handling such
 130 strings.\newline
 131
 132 Therefore, the underlying representation to the kernel looks like:
 133 \begin{itemize}
 134         \setlength{\itemsep}{-1mm}
 135         \item \underline{\texttt{peer}} $\rightarrow$ \texttt{SHA1(peer)}
 136         \item \underline{\texttt{[segment]}} $\rightarrow$ \texttt{SHA1([segment])}
 137 \end{itemize}
 138
 139 Thus we have now a fixed-length representation of
 140 \underline{\texttt{peer}}@\underline{\texttt{[segment]}} of $2*20$ Byte per
 141 \texttt{communication endpoint}. We \textit{assume} that \texttt{SHA1} is
 142 safe enough to use, so that there is no chance to have a kollision of a pair
 143 of different \texttt{peer}s and at the same time a kollision of a pair of
 144 different \texttt{[segment]}s (conjecture 2).\newline
 145
 146 The benefit from this addressing scheme is that we will not ran out
 147 of addresses respectively \texttt{peer}s as in IPv4 for instance, thus
 148 there would be no need to have such a long-time transistion to IPv6.
 149 Organizational entities can dynamically reorganize their \texttt{[segment]}
 150 structure and we do not ran out of \texttt{[segment]}s, too. Furthermore,
 151 the addressing scheme is more context oriented and consists not just of
 152 'meaningless numbers'.
 153
 154 \Section{Basic Protocol Types}
 155 With this addressing scheme we have two basic protocol layers on top of the
 156 underlying hardware:
 157 \begin{itemize}
 158         \setlength{\itemsep}{-1mm}
 159         \item Hardware address layer (can be MAC or similar)
 160         \item \texttt{peer}@\texttt{[segment]} layer
 161 \end{itemize}
 162
 163 Furthermore, we have three basic protocol types on top of an underlying
 164 carrier like Ethernet, InfiniBand or Bluetooth:
 165 \begin{itemize}
 166         \setlength{\itemsep}{-1mm}
 167         \item \texttt{ETH\_P\_LACM} $\rightarrow$ Intra-segment config messages
 168         \item \texttt{ETH\_P\_LADM} $\rightarrow$ Intra-segment data messages
 169         \item \texttt{ETH\_P\_LRDM} $\rightarrow$ Inter-segment data messages
 170 \end{itemize}
 171
 172 \texttt{ETH\_P\_LACM} is for name resolution, route resolution and other
 173 \texttt{peer}@\texttt{[segment]} management messages. \texttt{ETH\_P\_LADM} is for data
 174 messages within a given \texttt{[segment]} thus the
 175 \underline{\texttt{[segment]}} must \textit{not} be encoded into the packet.
 176 \texttt{ETH\_P\_LRDM} is for data messages between different \texttt{[segment]}s,
 177 so that the source and destination \underline{\texttt{[segment]}}
 178 hashes must be encoded into the packet, since these packets need
 179 routing.\newline
 180
 181 Concrete identifiers for \texttt{ETH\_P\_LACM}, \texttt{ETH\_P\_LADM} and
 182 \texttt{ETH\_P\_LRDM} must be choosen in a way, that there is still compatiblity
 183 respectively no interference of traditional Internet protocols such as IPv4,
 184 IPv6 and others.
 185
 186 \Section{Host Information Management}
 187 Since - as stated in \ref{addr} - one host can have multiple \texttt{peer}s,
 188 there are two kernelspace \texttt{peer}@\texttt{[segment]}s per NIC with a
 189 \underline{\texttt{peer}} each that has a truly randomly assigned value
 190 (i.e. via \texttt{/dev/random}). These two \texttt{peer}s are used for a hosts
 191 information management, conrete:
 192 \begin{itemize}
 193         \setlength{\itemsep}{-1mm}
 194         \item \texttt{peer} to hardware address resolution of
 195               \texttt{peer}s within the own \texttt{[segment]}
 196         \item Inter-\texttt{[segment]} route resolution
 197 \end{itemize}
 198
 199 Both \texttt{peer}@\texttt{[segment]}s may only exchange messages within the
 200 \texttt{ETH\_P\_LACM} protocol.\newline
 201
 202 Now for each of these both special kernelspace \texttt{peer}s, a distributed
 203 hash table (DHT) is assigned
 204 \begin{itemize}
 205         \setlength{\itemsep}{-1mm}
 206         \item $f:$ \underline{\texttt{peer}}$\rightarrow$hardware
 207               address of \texttt{peer}
 208         \item $g:$ \underline{\texttt{[segment]}}$\rightarrow$hardware
 209               address of \texttt{peer} that is a gateway to a different
 210               \texttt{[segment]} and has a route to this destination
 211 \end{itemize}
 212 so that the information respectively mapping of $f$ and $g$ is autonomously
 213 managed within the \texttt{peer group} of this \texttt{[segment]}. By this,
 214 no broadcast on the hardware address layer is needed at all (conjencture 3),
 215 thus \texttt{[segment]} may contain more participants as typical subnets of
 216 a traditional LAN.\newline
 217
 218 The DHT information management from Kademlia \cite{PeterMaymounkovDavidMazieres}
 219 is applied, but with the exception that messages are exchanged on the hardware
 220 address layer instead of traditional IP. The identifier, which is used in the
 221 Kademlia XOR routing metric will be the specific kernelspace
 222 \underline{\texttt{peer}}.\newline
 223
 224 A resolution of the hardware address of \texttt{peer}s within a
 225 \texttt{[segment]} then equals a lookup in the DHT of function $f$ on the
 226 hardware address layer. Similar, the resolution of a route equals a lookup
 227 in the DHT of function $g$ on the hardware address layer.\newline
 228
 229 Lookups in the DHT are performed in $\mathcal{O}(\log{n})$ where $n$ is the number
 230 of \texttt{peer}s of the hosts \texttt{[segment]}. Local caches can optinally
 231 be applied, so that a lookup in the DHT of $f$ or $g$ does not need to be
 232 performed twice within a given period.\newline
 233
 234 Now if a host has multiple NICs and therefore multiple kernelspace
 235 \texttt{peer}s, the kernel must of course be able to access the DHTs
 236 of the other NICs.
 237
 238 \Section{Routing between \texttt{[segment]}s}
 239 \label{route}
 240 As mentioned previously, the communication of \texttt{peer}s within a
 241 \texttt{peer group} involves no routing at all, so this is not further
 242 being discussed within this section. Also, asking for a route always
 243 equals a DHT lookup on $g$ within the hosts \texttt{[segment]}.\newline
 244
 245 The DHT for the function $g$ can only be filled, if a host has at least
 246 two NICs connected to different \texttt{[segment]}s. This host then publishes
 247 its possible routes to its connected \texttt{[segment]}s into the corresponding
 248 DHTs $g$.\newline
 249
 250 If for instance a host $x$ with two NICs $NIC_a$, $NIC_b$ is connected to two
 251 different \texttt{[segment]}s $S_a$, $S_b$, it announces to the DHT $g$
 252 of $S_b$ via $NIC_b$ that a route to $S_a$ is reachable from $NIC_b$ and
 253 vice versa to the DHT $g$ of $S_a$ via $NIC_a$ that a route to $S_b$ is
 254 reachable from $NIC_a$, so that others can simply perform a lookup to their
 255 \texttt{[segment]}s DHT $g$ to obtain the related NICs hardware address
 256 of $x$ and then forward data packets of type \texttt{ETH\_P\_LRDM}.\newline
 257
 258 Now this mechanism works fine for routing between neighboring
 259 \texttt{[segment]}s. However, what about routing between \texttt{[segment]}s
 260 that are not directly connected to each other?\newline
 261
 262 One possibility would be to have transversely routes next to the
 263 \texttt{[segment]}s tree structure. The transversely routes could then be
 264 regarded as a way of skipping the usual communication path through the tree
 265 by having a shorter path and therefore to reduce network load on the trees
 266 upper and root nodes. This structure could be seen similar to a
 267 \textit{skip list} as some form of a \textit{skip tree}. By having lots of
 268 such transversely routes the network structure is then being meshed by
 269 having a worst case route of traversing the \texttt{[segment]}s tree.\newline
 270
 271 A worst case routing example is provided next. Here, \texttt{peer1@csg.temp.floor1}
 272 wants to communicate with \texttt{peer2@csg.staff.bureau1}. Since in the worst
 273 case there are no transversely routes i.e. directly from \texttt{csg.temp.floor1} to
 274 \texttt{csg.staff.bureau1}, we need to traverse the tree to its root \texttt{[segment]}
 275 named \texttt{csg} and then downwards to \texttt{csg.staff.bureau1}:\newline
 276
 277 Lookup 1 from \texttt{csg.temp.floor1}:
 278 \begin{enumerate}
 279         \setlength{\itemsep}{-1mm}
 280         \item \texttt{csg.staff.bureau1}, if failed
 281         \begin{enumerate}
 282                 \setlength{\itemsep}{-1mm}
 283                 \item \texttt{csg.staff}, if failed
 284                 \begin{enumerate}
 285                         \setlength{\itemsep}{-1mm}
 286                         \item \texttt{csg}, if failed
 287                         \begin{enumerate}
 288                                 \setlength{\itemsep}{-1mm}
 289                                 \item \texttt{csg.temp}, must succeed, so forward to \texttt{csg.temp}
 290                         \end{enumerate}
 291                 \end{enumerate}
 292         \end{enumerate}
 293 \end{enumerate}
 294
 295 Lookup 2 from \texttt{csg.temp}:
 296 \begin{enumerate}
 297         \setlength{\itemsep}{-1mm}
 298         \item \texttt{csg.staff.bureau1}, if failed
 299         \begin{enumerate}
 300                 \setlength{\itemsep}{-1mm}
 301                 \item \texttt{csg.staff}, if failed
 302                 \begin{enumerate}
 303                         \setlength{\itemsep}{-1mm}
 304                         \item \texttt{csg}, must succeed, so forward to \texttt{csg}
 305                 \end{enumerate}
 306         \end{enumerate}
 307 \end{enumerate}
 308
 309 Lookup 3 from \texttt{csg}:
 310 \begin{enumerate}
 311         \setlength{\itemsep}{-1mm}
 312         \item \texttt{csg.staff.bureau1}, if failed
 313         \begin{enumerate}
 314                 \setlength{\itemsep}{-1mm}
 315                 \item \texttt{csg.staff}, must succeed, so forward to \texttt{csg.staff}
 316         \end{enumerate}
 317 \end{enumerate}
 318
 319 Lookup 4 from \texttt{csg.staff}:
 320 \begin{enumerate}
 321         \item \texttt{csg.staff.bureau1}, must succeed, so forward to \texttt{csg.staff.bureau1}
 322 \end{enumerate}
 323
 324 With a more meshed network structure, we could have a best case routing for this specific example
 325 of:\newline
 326
 327 Lookup 1 from \texttt{csg.temp.floor1}:
 328 \begin{enumerate}
 329         \item \texttt{csg.staff.bureau1}, succeeded, so forward to \texttt{csg.staff.bureau1}
 330 \end{enumerate}
 331
 332 \Section{Firewalling / Filtering}
 333 A \texttt{[segment]} can be protected from unwanted traffic by doing a
 334 string pattern matching (i.e. in hardware) on the \texttt{[segment]}
 335 string. Since this is represented as \underline{\texttt{[segment]}} one
 336 can do blacklist or whitelist matching. This is also possible for allwing
 337 traffic for concrete \underline{\texttt{peer}}@\underline{\texttt{[segment]}}
 338 for instance. However, pattern matching in the sense of regular expression
 339 matching will not be possible since \texttt{peer}@\texttt{[segment]} are
 340 represented as \texttt{SHA1} hashes.
 341
 342 \Section{Multicast Groups}
 343 A multicast group is handeled as an usual \texttt{peer} within the DHT and the
 344 multicast group management is postponed to the hardware address layer, if the harware
 345 address layer supports multicast. (still needs more explanation)
 346
 347 \Section{Anycast Groups}
 348 An anycast group is handeled different within the DHT than the multicast group. If
 349 a \texttt{peer} joins an anycast group, then the keys value needs to maintain a list
 350 of \texttt{peer}s associated to a specific given key. One \texttt{peer} is then returned
 351 from the list, for instance in a round-robin or random fashion. (still needs more explanation)
 352
 353 \Section{Communication Bootstrap}
 354 (todo)
 355
 356 \Section{Userspace API}
 357 The application developer interface for the userland is discussed here.
 358
 359 \SubSection{Addressing Structure}
 360 The addressing structure defined in \texttt{linux/lana.h} is named
 361 \texttt{struct sockaddr\_lana} and consists of the elements
 362 \begin{itemize}
 363         \setlength{\itemsep}{-1mm}
 364         \item \texttt{sa\_family} which is always \texttt{AF\_LANA} or \texttt{PF\_LANA}
 365         \item \texttt{sa\_address} which has the form
 366               \texttt{peer}@\texttt{[segment]} as described in \ref{addr}
 367 \end{itemize}
 368
 369 \SubSection{Segment Map}
 370 The file \texttt{/etc/segment} must contain the \texttt{[segment]} to NIC
 371 mapping of all NICs, i.e.:\newline
 372
 373 \scriptsize{
 374 \begin{lstlisting}
 375 eth0:csg.temp.floor1
 376 eth1:csg.temp.admin
 377 \end{lstlisting}
 378 }
 379 \normalsize
 380
 381 This example implies a direct gateway from \texttt{csg.temp.floor1} to
 382 the \texttt{csg.temp.admin} \texttt{[segment]} and vice versa.
 383
 384 \SubSection{Systemcalls}
 385 Supported BSD socket system calls for LANA are:
 386 \begin{itemize}
 387         \setlength{\itemsep}{-1mm}
 388         \item \texttt{socket(2)}
 389         \item \texttt{bind(2)}
 390         \item \texttt{getsockopt(2)/setsockopt(2)}
 391         \item \texttt{select(2)/poll(2)/epoll(2)}
 392         \item \texttt{recvfrom(2)}
 393         \item \texttt{sendto(2)}
 394         \item \texttt{close(2)}
 395 \end{itemize}
 396
 397 Note that the \texttt{bind(2)} system call must be done before any calls
 398 to \texttt{recvfrom(2)} or \texttt{sendto(2)} in order to bind the socket
 399 to a specific \texttt{peer}@\texttt{[segment]} otherwise the kernel will
 400 assign a randomly choosen string for \texttt{peer} and a randomly selected
 401 \texttt{[segment]} of \texttt{/etc/segment}. The latter assumes, that this
 402 socket only wants to \textit{consume} services of other
 403 \texttt{peer}@\texttt{[segment]}s, without offering some service for others.
 404 Compare this to the notion of port numbers, where the kernel randomly
 405 assigns a number for communication on client applications. Then, the kernel
 406 will do internally a bind before a \texttt{recvfrom(2)} or \texttt{sendto(2)}
 407 call. However, if a \texttt{bind(2)} is done explicitly and the \texttt{[segment]}
 408 is different from \texttt{[segment]}s given in \texttt{/etc/segment}, then
 409 the bind will fail and an error is returned.\newline
 410
 411 With \texttt{setsockopt(2)} one is able join or leave multicast and anycast
 412 groups.
 413
 414 \SubSection{Message flags}
 415 Message flags are statical, global and defined in \texttt{linux/lana.h} and
 416 represented as bitvectors. They are applied to the \texttt{sendto(2)} call,
 417 thus the kernel gets to know the applications needs for communication with
 418 the remote \texttt{communication endpoint}, e.g. whether the communication
 419 must be reliable or confidential. These needs are named \texttt{communication
 420 features}.
 421
 422 \SubSection{Code Example}
 423 Here is an example C code of how a typical LANA userspace program
 424 for a temperature measuring entity could look like:\newline
 425
 426 \scriptsize{
 427 \lstset{language=C}
 428 \begin{lstlisting}
 429 #include <linux/lana.h>
 430
 431 int main(void)
 432 {
 433   ssize_t ret;
 434   int sock, err, flags;
 435   struct sockaddr_lana own, remote;
 436   char buff[256];
 437   socklen_t remote_len;
 438   ...
 439   sock = socket(AF_LANA, SOCK_RAW, 0);
 440   ...
 441   own.sa_family = AF_LANA;
 442   strlcpy(own.sa_address,
 443           "sensor42@csg.temp.floor1",
 444           sizeof(own.sa_address));
 445   bind(sock, (struct sockaddr_in *) &own, sizeof(own));
 446   ...
 447   remote_len = sizeof(remote);
 448   recvfrom(sock, buff, sizeof(buff), 0,
 449            (struct sockaddr *) remote,
 450            &remote_len);
 451   ...
 452   memset(&remote, 0, sizeof(remote));
 453   remote.sa_family = AF_LANA;
 454   strlcpy(remote.sa_address,
 455           "-controller@csg.temp.floor1",
 456           sizeof(remote.sa_address));
 457   memset(buff, 0, sizeof(buff));
 458   strlcpy(buff, "temp:20,scale:celsius", sizeof(buff));
 459   flags = MSG_GLOBAL | MSG_RELIABLE;
 460   sendto(sock, buff, strlen(buff) + 1, flags,
 461          (struct sockaddr *) &remote, sizeof(remote));
 462   ...
 463   memset(&remote, 0, sizeof(remote));
 464   remote.sa_family = AF_LANA;
 465   strlcpy(remote.sa_address, "*@csg.temp.floor1",
 466         sizeof(remote.sa_address));
 467   memset(buff, 0, sizeof(buff));
 468   strlcpy(buff, "alive", sizeof(buff));
 469   flags = MSG_GLOBAL | MSG_RELIABLE;
 470   sendto(sock, buff, strlen(buff) + 1, flags,
 471          (struct sockaddr *) &remote, sizeof(remote));
 472   ...
 473   close(sock);
 474   return 0;
 475 }
 476 \end{lstlisting}
 477 }
 478 \normalsize
 479
 480 \nocite{*}
 481 \bibliographystyle{abbrv}
 482 \bibliography{main}
 483
 484 \end{document}