<html>
<head>
<title>corvix GNU/Linux</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1"/>
<link href="style.css" rel="stylesheet" type="text/css">
<base target="main">
<meta name="author" content="Gerolf Ziegenhain">
<meta name="keywords" content="corvix linux gnu distribution debian egatrop">
</head>
<body>
<h1>corvix GNU/Linux</h1>
Welcome to the webpage of the corvix distribution.
<br>
<b>Current status:</b>
We are merging our working installation
into a reproducible framework. This project is under heavy development
and <i>not yet working</i> for the end user. Nevertheless, you may already
check out our source tree or add the repositories to your current debian installation.
<h2>What is corvix?</h2>
Corvus corax likes to steal; this distribution steals ideas :)
It steals ideas in a positive way - we contribute a set of
scripts which you may use to build up your own high-performance
cluster without having to search for all the packages and documentation
yourself. This distribution is the result of the lessons learned while building
up a 160+ CPU Linux cluster with several NFS servers from scratch.
<br>
In fact corvix is less a distribution and more a cheat sheet for installing
debian in a scientific environment. After you install the
corvix distribution, you have a debian/etch with additional
features.
<br>
Corvix GNU/Linux provides preconfigured profiles for both workstations
(i.e. your laptop) and complete clusters. Nevertheless: if you expect
a colorful eyecandy system... forget it. Corvix is an advanced
distribution; or call it a dirty hack... It works for us ;)
<h2>Introduction</h2>
You probably want to read this first, before you decide to try out
our stuff.
<h3>Motivation</h3>
Usually in a scientific environment one likes to have a stable
base system (this could also be FreeBSD) and then only update
a couple of really important packages to the bleeding edge.
Here, debian is chosen as the base system. The corvix
distribution will install a straight debian/etch with a set
of common packages selected.
<br>
Any additional software will be compiled from scratch using
a Portage-like (Gentoo) system called <i>egatrop</i>.
<h3>Concept</h3>
The main concept is: <i>console rocks</i>.
<br>
There is no further documentation than this file (which is included
in the distribution). For any changes / configuration you may want to
do: RTFM.
<br>
<ul>
<li>advanced users
<li>textmode
<li>full documentation: download the git source tree and
read <tt class=file>/doc</tt>
</ul>
<h2>Package Management</h2>

<h3>Debian</h3>
For the stable packages we rely completely on the Debian GNU/Linux
distribution.
<br>
You have to put the components into your <tt class=file>/etc/apt/sources.list</tt>.
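As an illustration, the resulting <tt class=file>/etc/apt/sources.list</tt> could look
roughly like this (assuming you use an official debian mirror together with the two
corvix components described below):
<pre>
# standard debian/etch mirror and security updates
deb http://ftp.debian.org/debian etch main contrib non-free
deb http://security.debian.org/ etch/updates main

# corvix components (see below)
deb http://corvix.eu testing meta
deb http://corvix.eu testing cluster
</pre>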
<h4>Key signatures:</h4>
<ol>
<li>from keyserver
<pre>
gpg --keyserver pgpkeys.mit.edu --recv-key 974E7D68
gpg -a --export 974E7D68 | sudo apt-key add -
</pre>
<li>from keyring
<pre>
apt-get -y --force-yes install corvix-keyring
</pre>
</ol>
<h4>Component: meta</h4>
<pre>
deb http://corvix.eu testing meta
</pre>
Contains virtual packages for quick configuration of many PCs.
<h4>Component: cluster</h4>
<pre>
deb http://corvix.eu testing cluster
</pre>
Contains cluster installation packages.
<h2>Installation</h2>
The installation is done over the network. You may download
a bunch of scripts which will build a bootable CD or USB stick
or set up a PXE boot server for you. If you want to save bandwidth
you may also choose to set up a mirror for the packages;
this can be a webserver, a CD or any other storage medium you
can access during the installation.
<h3>Features of the installer</h3>
Linux is about choice. Here, we restrict the features strictly
to a few working configurations:
<ul>
<li>Workstation
<ul>
<li>debootstrapped debian/etch
<li>usual debian/etch bootcd
</ul>
<li>Cluster nodes:
<ul>
<li>Head server
<li>Nodes: will install in under 4 minutes.
</ul>
</ul>
For everything you want to do differently you can work with
debian as usual. For custom compiled stuff have a look at the
ebuilds.
<h3>Obtaining the software</h3>
Download the source:

<h4>Corvix Bootstrap</h4>
<ul>
<li>Official releases<br>
<a href="http://gitweb.corvix.eu/corvix.git?a=snapshot;h=26ba9b9d6094f6abfe923cb34cecfe319e51d862;sf=tgz">v0.1.2</a>
<li>GIT repository<br>
You may download the
<a href="http://gitweb.corvix.eu/corvix.git?a=snapshot">latest snapshot</a>
via browser or clone the public repository:
<pre>
git clone git://git.corvix.eu/corvix.git
</pre>
The repositories are hosted on <a href=http://repo.or.cz>repo.or.cz</a>.
</ul>
<h3>Howto install?</h3>

<h4>Building the universal boot image</h4>
First you have to build a universal boot image:
<pre>
cd lib/install
./make_image
</pre>
Afterwards there will be a directory structure in <tt class=file>tmp/isoimage</tt>.
Now you can decide which boot method you'd like to use:
<ol>
<li>Boot from CD<br>
<pre>./makebootiso</pre>
The new boot image will reside in <tt class=file>~/corvix.iso</tt>.
If you have <tt class=file>qemu</tt> installed you may test the image using
<pre>./testbootiso</pre>
<li>Boot from USB stick<br>
<pre>./makebootstick /dev/sdWHATEVERYOURUSBSTICKIS</pre>
<li>PXE boot server<br>
</ol>

<h4>Mirror options</h4>
<ul>
<li>local (nfs, hdd, cd)
<li>mirror.corvix.eu
</ul>
<h3>Preconfigured installation profiles</h3>

<h4>debian/etch</h4>

<h4>debian/etch preseed</h4>

<h4>corvix</h4>

<h4>corvix - cluster head server</h4>

<h4>corvix - cluster node</h4>
<h2>High-Performance Cluster</h2>

<h3>Planning</h3>

<h4>Choosing the hardware</h4>
In our case we want to run molecular dynamics simulations; these systems are loosely
coupled, and if we choose a good partitioning (which of course pays off for any task), we can
use cheap hardware to build our cluster.
<br>
Most computers available today have two gigabit ports. As low-latency networking
is still much more costly (about the same cost as one additional node), we choose to
maximize the number of nodes instead. So our choice is dual-gigabit networking.
<br>
Another thing to consider is accessibility of the nodes. For a very small number of nodes
it is not necessary to have them mounted in a rack. There are very comfortable options
for terminal access (remote BIOS, KVM switches etc.). We go the hard way, drop the
cost per node again and have only one screen and one keyboard to plug in; more
has never been needed yet. Once a node boots from hdd with pxeboot as fallback,
there is no need for terminal access any more. Everything else is done via
remote access (why would one want to stay in the loud server room?).
<h4>Structural Layout</h4>
We choose the following setup for the server infrastructure. First, there is a head server
which provides dhcp, dns, a debian mirror, nis, ganglia and the queue. Furthermore
this is the node where the administrative work is done. It has two gigabit connections
to the inside and one connection to the outside.
<br>
Two identical login nodes provide user access; each has only one gigabit connection to the inside
and one to the outside.
<br>
A total of three NAS stations with hard-/software raid provide storage space. Each of them is connected
with two gigabit ports to the inside.
<h4>Dual Gigabit Networking</h4>
All nodes have two gigabit connections to the inside, each of them to a different
network. These two separate networks decrease the network load.
The dual gigabit network is used in a modulo-2 fashion: every 2nd node goes through the 2nd
network to the NAS stations; this way we circumvent the bottleneck at the NAS without losing
latency through channel bonding.
<br>
We can drop the costs even further by not using stackable switches. The only relevant drawback
is that one cannot use all nodes at once for one big task any more. By configuring
the partitions of nodes in the queue accordingly, this performance loss can be prevented.
<br>
Our cluster uses two 48-port and two 24-port switches. The switches are linked so that the servers
only have to inject into each network once. Two sets of nodes are thereby defined: one with
fast i/o to the NAS stations and one with slow (link-bottleneck) i/o.
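As an illustration, a node with one port in each internal network could be configured in
<tt class=file>/etc/network/interfaces</tt> roughly like this (the subnet numbers are made up
for illustration and are not part of corvix):
<pre>
# first internal network
auto eth0
iface eth0 inet static
    address 10.0.1.23
    netmask 255.255.255.0
    gateway 10.0.1.1

# second internal network
auto eth1
iface eth1 inet static
    address 10.0.2.23
    netmask 255.255.255.0
</pre>
Whether a node reaches the NAS stations via the first or the second network is then simply
decided by its node number modulo 2, e.g. in its <tt class=file>/etc/fstab</tt>.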
<h4>Network booting diskless clients</h4>
It may sound nice to have a complete root-over-nfs system booted via pxe. But the documentation
available is very poor. Besides, you may observe serious performance problems when booting more
than ten nodes at once, and even worse ones in the running state.
<br>
Our conclusion here: use pxeboot to install the nodes automatically. A small local hdd
can also serve as local scratch for fast i/o.
<h4>Hardware Installation</h4>
Briefly, we summarize some lessons learned. If you choose to use different networks: use
different cable colors (we didn't). As entropy increases dramatically with the number of particles,
use cable straps and try to fix the cables to the rack asap. As hdds notoriously tend to break,
consider using telescope mounts in the rack. It is also very useful to have spare hdds at hand.
<h4>Software Installation</h4>
We have chosen debian because we thought it is the most stable distribution and easy
to maintain. In the end it does not matter at all which distribution you choose; all
new nodes will be installed from a golden node image using tar.
<br>
Following
the saying <i>if it ain't broke, don't fix it</i>, our policy is to not change anything
after the primary installation. In particular: don't update the distribution. We kept
a snapshot mirror of our distribution on the head server for future software installations.
<br>
From a security point of view we have a locked rack with a trusted lan, so security doesn't
matter there. The login nodes have outward connections and of course get the recent security
updates.
<br>
All important configuration files can be found in corvix. As much of the setup in such a
cluster is hardware specific, we can only provide the important
(portable) configuration files.
<h3>Configurations</h3>

<h4>All Configuration from DHCP</h4>
Our dhcp server supports pxeboot and provides hostnames to the known nodes. Because
we don't want redundant configuration, other configuration files can be
generated automatically from <tt class=file>dhcpd.conf</tt>.
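As a rough sketch (MAC and IP addresses are placeholders), a node entry in
<tt class=file>dhcpd.conf</tt> could look like this; files such as <tt class=file>/etc/hosts</tt>
can then be generated from these entries with a small script:
<pre>
subnet 10.0.1.0 netmask 255.255.255.0 {
    next-server 10.0.1.1;      # head server providing TFTP / pxeboot
    filename "pxelinux.0";

    host node01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address 10.0.1.11;
        option host-name "node01";
    }
}
</pre>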
<h4>Queue system</h4>
We chose torque in combination with the maui scheduler. Jobs may have low or high
qos, and there is a standing reservation for quickshots in case of an overloaded queue.
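A hypothetical excerpt of a <tt class=file>maui.cfg</tt> expressing such a policy (names and
numbers are purely illustrative) might look like:
<pre>
# two qos levels with different priorities
QOSCFG[high]   PRIORITY=1000
QOSCFG[low]    PRIORITY=10

# standing reservation keeping a few slots free for quick jobs
SRCFG[quick]   TASKCOUNT=4 PERIOD=INFINITY QOSLIST=high
</pre>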
<h4>SSH server</h4>
We use hostbased authentication inside the cluster and public key authentication
from outside.
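For the cluster-internal side this boils down to roughly the following OpenSSH options
(a sketch; the host lists and key collection are of course site specific):
<pre>
# /etc/ssh/sshd_config on the nodes
HostbasedAuthentication yes

# /etc/ssh/ssh_config on the nodes (client side)
HostbasedAuthentication yes
EnableSSHKeysign yes

# additionally: list the cluster hosts in /etc/shosts.equiv and collect
# their host keys in /etc/ssh/ssh_known_hosts
</pre>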
<h4>PXE Bootserver</h4>
Using pxeboot under linux, the head server provides a debian network install and a custom
install script that adds new nodes automatically in under 4 minutes.
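A minimal <tt class=file>pxelinux.cfg/default</tt> in this spirit might look like the following
sketch (assuming pxelinux is used as the PXE boot loader; kernel paths and boot options are
illustrative):
<pre>
DEFAULT local
PROMPT 1
TIMEOUT 50

# boot from the local disk by default
LABEL local
    LOCALBOOT 0

# start the debian installer for (re)installing a node
LABEL install
    KERNEL debian-installer/amd64/linux
    APPEND initrd=debian-installer/amd64/initrd.gz auto=true priority=critical
</pre>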
<h3>Further Software</h3>

<h4>Ganglia</h4>

<h4>Rgang</h4>

<h3>Our Cluster</h3>
Here is a list of the hardware we use (not mentioning the switches etc.).
This cluster is property of the
<a href="http://www.physik.uni-kl.de/urbassek/">Computational Material Science</a> group.
<table>
<tr>
<td>#</td>
<td>Role</td>
<td>CPU</td>
<td>Memory</td>
<td>HDDs</td>
</tr>
<tr>
<td>1</td>
<td>Head server</td>
<td>1x AMD Athlon DualCore 1.8GHz</td>
<td>1GB</td>
<td>80GB, 250GB Raid-0</td>
</tr>
<tr>
<td>40</td>
<td>Nodes</td>
<td>2x AMD Opteron DualCore 2.4GHz</td>
<td>8GB</td>
<td>160GB</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x AMD Opteron DualCore 2GHz</td>
<td>4GB</td>
<td>250GB, 4.1TB Raid-5</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x Intel Pentium4 2.8GHz</td>
<td>2GB</td>
<td>80GB, 2.1TB Raid-5</td>
</tr>
<tr>
<td>1</td>
<td>NAS</td>
<td>1x AMD Opteron DualCore 2GHz</td>
<td>1GB</td>
<td>250GB, 1.1TB Raid-5</td>
</tr>
<tr>
<td>2</td>
<td>Login</td>
<td>1x Intel Pentium4 2GHz</td>
<td>1GB</td>
<td>250GB</td>
</tr>
</table>
<h4>Some Pictures</h4>
<table>
<tr valign=top>
<td><img src=img/Cluster_1stboot.jpg></td>
<td><img src=img/Cluster_Networking.jpg></td>
<td><img src=img/Cluster_Final.jpg></td>
<td><img src=img/Cluster_Waste.jpg></td>
</tr>
<tr>
<td>The cluster boots up for the 1st time</td>
<td>The more cables, the more entropy you have to fight...</td>
<td>Final cluster</td>
<td>Drawback besides energy consumption: a lot of packaging waste</td>
</tr>
</table>
<h3>Profile: Head Server</h3>

<h3>Profile: NFS Servers</h3>

<h3>Profile: Nodes</h3>
<h2>Legal Stuff</h2>
Of course we reject any kind of warranty. Expect your hardware to explode
and all your data to be deleted.
<br>
For the software snippets which are partly included, we have put the license
files under <tt class=file>/doc/licenses</tt>. All software will be downloaded during the build
process and it is your own responsibility not to upset any lawyers.
<div class=legal>
(C)opyleft G. Ziegenhain, Y. Rosandi 2008-2009<br>
Released under GPLv3 <a href="http://www.gnu.org">gnu.org</a>
</div>
</body>
</html>