<html>
<head>
<title>corvix GNU/Linux</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1"/>
<link href="style.css" rel="stylesheet" type="text/css">
<base target="main">
<meta name="author" content="Gerolf Ziegenhain">
<meta name="keywords" content="corvix linux gnu distribution debian egatrop">
</head>
<body>
<h1>corvix GNU/Linux</h1>
Welcome to the webpage of the corvix distribution.
<br>
<b>Current status:</b>
We are merging our working installation
into a reproducible framework. This project is under heavy development
and <i>not yet working</i> for the end user. Nevertheless, you may already
check out our source tree or add the repositories to your current debian installation.
<h2>What is corvix?</h2>
Corvus corax likes to steal; this distribution steals ideas :)
It steals ideas in a positive way - we contribute a set of
scripts which you may use to build up your own high-performance
cluster without having to search for all the packages and documentation
yourself. This distribution is the result of the lessons learned while building
up a 160+ CPU Linux cluster with several NFS servers from scratch.
<br>
In fact corvix is less a distribution and more a cheat sheet for installing
debian in a scientific environment. After you install the
corvix distribution, you have a debian/etch with additional
features.
<br>
Corvix GNU/Linux provides preconfigured profiles for both workstations
(i.e. your laptop) and complete clusters. Nevertheless: if you expect
a colorful eyecandy system... forget it. Corvix is an advanced
distribution; or call it a dirty hack... It works for us ;)
<h2>Introduction</h2>
You probably want to read this first, before you decide to try out
our stuff.
<h3>Motivation</h3>
Usually in a scientific environment one likes to have a stable
base system (this could also be FreeBSD) and then only update
a couple of really important packages to the bleeding edge.
Here, debian is chosen as the base system. The corvix
distribution will install a straight debian/etch with a set
of common packages selected.
<br>
Any additional software will be compiled from scratch using
a Portage-like (Gentoo) system called <i>egatrop</i>.
<h3>Concept</h3>
The main concept is: <i>console rocks</i>.
<br>
There is no further documentation than this file (which is included
in the distribution). For any changes / configuration you may want to
do: RTFM.
<br>
<ul>
<li>advanced users
<li>textmode
<li>full documentation: download the git source tree and
read <tt class=file>/doc</tt>
</ul>
<h2>Package Management</h2>

<h3>Debian</h3>
For the stable packages we rely completely on the Debian GNU/Linux
distribution.
<br>
You have to put the components into your <tt class=file>/etc/apt/sources.list</tt>.
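As an illustration, the resulting <tt class=file>/etc/apt/sources.list</tt> could look
roughly like this (assuming you use an official debian mirror together with the two
corvix components described below):
<pre>
# standard debian/etch mirror and security updates
deb http://ftp.debian.org/debian etch main contrib non-free
deb http://security.debian.org/ etch/updates main

# corvix components (see below)
deb http://corvix.eu testing meta
deb http://corvix.eu testing cluster
</pre>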
<h4>Key signatures:</h4>
<ol>
<li>from keyserver
<pre>
gpg --keyserver pgpkeys.mit.edu --recv-key 974E7D68
gpg -a --export 974E7D68 | sudo apt-key add -
</pre>
<li>from keyring
<pre>
apt-get -y --force-yes install corvix-keyring
</pre>
</ol>
<h4>Component: meta</h4>
<pre>
deb http://corvix.eu testing meta
</pre>
Contains virtual packages for quick configuration of many PCs.
<h4>Component: cluster</h4>
<pre>
deb http://corvix.eu testing cluster
</pre>
Contains cluster installation packages.
<h2>Installation</h2>
The installation is done over the network. You may download
a bunch of scripts which will build a bootable CD or USB stick
or set up a PXE boot server for you. If you want to save bandwidth
you may also choose to set up a mirror for the packages;
this can be a webserver, a CD or any other storage medium you
can access during the installation.
<h3>Features of the installer</h3>
Linux is about choice. Here, we restrict the features strictly
to a few working configurations:
<ul>
<li>Workstation
<ul>
<li>debootstrapped debian/etch
<li>usual debian/etch bootcd
</ul>
<li>Cluster nodes:
<ul>
<li>Head server
<li>Nodes: will install in under 4 minutes.
</ul>
</ul>
For everything you want to do differently you can work with
debian as usual. For custom compiled stuff have a look at the
ebuilds.
<h3>Obtaining the software</h3>
Download the source:

<h4>Corvix Bootstrap</h4>
<ul>
<li>Official releases<br>
<a href="http://gitweb.corvix.eu/corvix.git?a=snapshot;h=26ba9b9d6094f6abfe923cb34cecfe319e51d862;sf=tgz">v0.1.2</a>
<li>GIT repository<br>
You may download the
<a href="http://gitweb.corvix.eu/corvix.git?a=snapshot">latest snapshot</a>
via browser or clone the public repository:
<pre>
git clone git://git.corvix.eu/corvix.git
</pre>
The repositories are hosted on <a href=http://repo.or.cz>repo.or.cz</a>.
</ul>
<h3>Howto install?</h3>

<h4>Building the universal boot image</h4>
First you have to build a universal boot image:
<pre>
cd lib/install
./make_image
</pre>
Afterwards there will be a directory structure in <tt class=file>tmp/isoimage</tt>.
Now you can decide which boot method you'd like to use:
<ol>
<li>Boot from CD<br>
<pre>./makebootiso</pre>
The new boot image will reside in <tt class=file>~/corvix.iso</tt>.
If you have <tt class=file>qemu</tt> installed you may test the image using
<pre>./testbootiso</pre>
<li>Boot from USB stick<br>
<pre>./makebootstick /dev/sdWHATEVERYOURUSBSTICKIS</pre>
<li>PXE boot server<br>
</ol>

<h4>Mirror options</h4>
<ul>
<li>local (nfs, hdd, cd)
<li>mirror.corvix.eu
</ul>
<h3>Preconfigured installation profiles</h3>

<h4>debian/etch</h4>

<h4>debian/etch preseed</h4>

<h4>corvix</h4>

<h4>corvix - cluster head server</h4>

<h4>corvix - cluster node</h4>
<h2>High-Performance Cluster</h2>

<h3>Planning</h3>

<h4>Choosing the hardware</h4>
In our case we want to run molecular dynamics simulations; these systems are loosely
coupled, and if we choose a good partitioning (which of course pays off for any task), we can
use cheap hardware to build our cluster.
<br>
Most computers available today have two gigabit ports. As low-latency networking
is still much more costly (about the same cost as one additional node), we choose to
maximize the number of nodes instead. So our choice is dual-gigabit networking.
<br>
Another thing to consider is accessibility of the nodes. For a very small number of nodes
it is not necessary to have them mounted in a rack. There are very comfortable options
for terminal access (remote BIOS, KVM switches etc.). We go the hard way, drop the
cost per node again and have only one screen and one keyboard to plug in; more
has never been needed yet. Once a node boots from hdd with pxeboot as fallback,
there is no need for terminal access any more. Everything else is done via
remote access (why would one want to stay in the loud server room?).
<h4>Structural Layout</h4>
We choose the following setup for the server infrastructure. First, there is a head server
which provides dhcp, dns, a debian mirror, nis, ganglia and the queue. Furthermore
this is the node where the administrative work is done. It has two gigabit connections
to the inside and one connection to the outside.
<br>
Two identical login nodes provide user access; each has only one gigabit connection to the inside
and one to the outside.
<br>
A total of three NAS stations with hard-/software raid provide storage space. Each of them is connected
with two gigabit ports to the inside.
<h4>Dual Gigabit Networking</h4>
All nodes have two gigabit connections to the inside, each of them to a different
network. These two separate networks decrease the network load.
The dual gigabit network is used in a modulo-2 fashion: every 2nd node goes through the 2nd
network to the NAS stations; this way we circumvent the bottleneck at the NAS without losing
latency through channel bonding.
<br>
We can drop the costs even further by not using stackable switches. The only relevant drawback
is that one cannot use all nodes at once for one big task any more. By configuring
the partitions of nodes in the queue accordingly, this performance loss can be prevented.
<br>
Our cluster uses two 48-port and two 24-port switches. The switches are linked so that the servers
only have to inject into each network once. Two sets of nodes are thereby defined: one with
fast i/o to the NAS stations and one with slow (link-bottleneck) i/o.
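As an illustration, a node with one port in each internal network could be configured in
<tt class=file>/etc/network/interfaces</tt> roughly like this (the subnet numbers are made up
for illustration and are not part of corvix):
<pre>
# first internal network
auto eth0
iface eth0 inet static
    address 10.0.1.23
    netmask 255.255.255.0
    gateway 10.0.1.1

# second internal network
auto eth1
iface eth1 inet static
    address 10.0.2.23
    netmask 255.255.255.0
</pre>
Whether a node reaches the NAS stations via the first or the second network is then simply
decided by its node number modulo 2, e.g. in its <tt class=file>/etc/fstab</tt>.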
<h4>Network booting diskless clients</h4>
It may sound nice to have a complete root-over-nfs system booted via pxe. But the documentation
available is very poor. Besides, you may observe serious performance problems when booting more
than ten nodes at once, and even worse ones in the running state.
<br>
Our conclusion here: use pxeboot to install the nodes automatically. A small local hdd
can also serve as local scratch for fast i/o.
<h4>Hardware Installation</h4>
Briefly, we summarize some lessons learned. If you choose to use different networks: use
different cable colors (we didn't). As entropy increases dramatically with the number of particles,
use cable straps and try to fix the cables to the rack asap. As hdds notoriously tend to break,
consider using telescope mounts in the rack. It is also very useful to have spare hdds at hand.
<h4>Software Installation</h4>
We have chosen debian because we thought it is the most stable distribution and easy
to maintain. In the end it does not matter at all which distribution you choose; all
new nodes will be installed from a golden node image using tar.
<br>
Following
the saying <i>if it ain't broke, don't fix it</i>, our policy is to not change anything
after the primary installation. In particular: don't update the distribution. We kept
a snapshot mirror of our distribution on the head server for future software installations.
<br>
From a security point of view we have a locked rack with a trusted lan, so security doesn't
matter there. The login nodes have outward connections and of course get the recent security
updates.
<br>
All important configuration files can be found in corvix. As much of the setup in such a
cluster is hardware specific, we can only provide the important
(portable) configuration files.
<h3>Configurations</h3>

<h4>All Configuration from DHCP</h4>
Our dhcp server supports pxeboot and provides hostnames to the known nodes. Because
we don't want redundant configuration, other configuration files can be
generated automatically from <tt class=file>dhcpd.conf</tt>.
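As a rough sketch (MAC and IP addresses are placeholders), a node entry in
<tt class=file>dhcpd.conf</tt> could look like this; files such as <tt class=file>/etc/hosts</tt>
can then be generated from these entries with a small script:
<pre>
subnet 10.0.1.0 netmask 255.255.255.0 {
    next-server 10.0.1.1;      # head server providing TFTP / pxeboot
    filename "pxelinux.0";

    host node01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address 10.0.1.11;
        option host-name "node01";
    }
}
</pre>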
<h4>Queue system</h4>
We chose torque in combination with the maui scheduler. Jobs may have low or high
qos, and there is a standing reservation for quickshots in case of an overloaded queue.
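A hypothetical excerpt of a <tt class=file>maui.cfg</tt> expressing such a policy (names and
numbers are purely illustrative) might look like:
<pre>
# two qos levels with different priorities
QOSCFG[high]   PRIORITY=1000
QOSCFG[low]    PRIORITY=10

# standing reservation keeping a few slots free for quick jobs
SRCFG[quick]   TASKCOUNT=4 PERIOD=INFINITY QOSLIST=high
</pre>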
<h4>SSH server</h4>
We use hostbased authentication inside the cluster and public key authentication
from outside.
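For the cluster-internal side this boils down to roughly the following OpenSSH options
(a sketch; the host lists and key collection are of course site specific):
<pre>
# /etc/ssh/sshd_config on the nodes
HostbasedAuthentication yes

# /etc/ssh/ssh_config on the nodes (client side)
HostbasedAuthentication yes
EnableSSHKeysign yes

# additionally: list the cluster hosts in /etc/shosts.equiv and collect
# their host keys in /etc/ssh/ssh_known_hosts
</pre>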
<h4>PXE Bootserver</h4>
Using pxeboot under linux, the head server provides a debian network install and a custom
install script that adds new nodes automatically in under 4 minutes.
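A minimal <tt class=file>pxelinux.cfg/default</tt> in this spirit might look like the following
sketch (assuming pxelinux is used as the PXE boot loader; kernel paths and boot options are
illustrative):
<pre>
DEFAULT local
PROMPT 1
TIMEOUT 50

# boot from the local disk by default
LABEL local
    LOCALBOOT 0

# start the debian installer for (re)installing a node
LABEL install
    KERNEL debian-installer/amd64/linux
    APPEND initrd=debian-installer/amd64/initrd.gz auto=true priority=critical
</pre>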
<h3>Further Software</h3>

<h4>Ganglia</h4>

<h4>Rgang</h4>

<h3>Our Cluster</h3>
Here is a list of the hardware we use (not mentioning the switches etc.).
This cluster is property of the
<a href="http://www.physik.uni-kl.de/urbassek/">Computational Material Science</a> group.
<table>
<tr>
<td>#</td>
<td>Role</td>
<td>CPU</td>
<td>Memory</td>
<td>HDDs</td>
</tr>
<tr>
<td>1</td>
<td>Head server</td>
<td>1x AMD Athlon DualCore 1.8GHz</td>
<td>1GB</td>
<td>80GB, 250GB Raid-0</td>
</tr>
<tr>
<td>40</td>
<td>Nodes</td>
<td>2x AMD Opteron DualCore 2.4GHz</td>
<td>8GB</td>
<td>160GB</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x AMD Opteron DualCore 2GHz</td>
<td>4GB</td>
<td>250GB, 4.1TB Raid-5</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x Intel Pentium4 2.8GHz</td>
<td>2GB</td>
<td>80GB, 2.1TB Raid-5</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x AMD Opteron DualCore 2GHz</td>
<td>1GB</td>
<td>250GB, 1.1TB Raid-5</td>
</tr>
<tr>
<td>2</td>
<td>Login</td>
<td>1x Intel Pentium4 2GHz</td>
<td>1GB</td>
<td>250GB</td>
</tr>
</table>
<h4>Some Pictures</h4>
<table>
<tr valign=top>
<td><img src=img/Cluster_1stboot.jpg></td>
<td><img src=img/Cluster_Networking.jpg></td>
<td><img src=img/Cluster_Final.jpg></td>
<td><img src=img/Cluster_Waste.jpg></td>
</tr>
<tr>
<td>The cluster boots up for the 1st time</td>
<td>The more cables, the more entropy you have to fight...</td>
<td>Final cluster</td>
<td>Drawback besides energy consumption: a lot of packaging waste</td>
</tr>
</table>
<h3>Profile: Head Server</h3>

<h3>Profile: NFS Servers</h3>

<h3>Profile: Nodes</h3>
<h2>Legal Stuff</h2>
Of course we reject any kind of warranty. Expect your hardware to explode
and all your data to be deleted.
<br>
For the software snippets which are partly included, we have put the license
files under <tt class=file>/doc/licenses</tt>. All software will be downloaded during the build
process and it is your own responsibility not to upset any lawyers.
<div class=legal>
(C)opyleft G. Ziegenhain, Y. Rosandi 2008-2009<br>
Released under GPLv3 <a href="http://www.gnu.org">gnu.org</a>
</div>
</body>
</html>