bootstrap.md

   1 # Bootstrap/Boot
   2
   3 In general bootstrapping (from the idiom "pull yourself up by your bootstraps"), sometimes shortened to just *booting*, refers to a clever process of self-establishing some relatively complex system starting from something very small, without much external help. As an example imagine something like a "civilization bootstrapping kit" that contains only few primitive tools along with instructions on how to use those tools to mine ore, turn it into metal out of which one makes more tools which will be used to obtain more material and so on up until having basically all modern technology and factories set up in relatively short time ([civboot](civboot.md) is a project like this). The term *bootstrapping* is however especially relevant in relation to [computer](computer.md) technology -- here it has two main meanings:
   4
   5 - The process by which a computer starts and sets up the [operating system](operating_system.md) after power on, which often involves several stages of loading various modules, running several bootloaders etc. This is traditionally called **booting** (*rebooting* means restarting the computer).
   6 - Utilizing the principle of bootstrapping for making greatly independent [software](software.md), i.e. software that doesn't [depend](depend.md) on other software as it can set itself up. This is usually what **bootstrapping** (the longer term) means. This is also greatly related to **[self hosting](self_hosting.md)**, another principle whose idea is to "implement technology using itself".
   7
   8 ## Bootstrapping: Making Dependency-Free Software
   9
  10 Bootstrapping as a general principle can aid us in creation of extremely free technology by greatly minimizing all its [dependencies](dependency.md), we are able to create a small amount of code that will self-establish our whole computing environment with only very small effort during the process. The topic mostly revolves around designing [programming language](programming_language.md) [compilers](compiler.md), but in general we may be talking about bootstrapping whole computing environments, operating systems etc.
  11
  12 **Why be concerned with bootstrapping when we already have our systems set up?** There are many reasons, one of the notable ones is that we may lose our current technology due to societal [collapse](collapse.md), which is not improbable, it keeps happening throughout history over and over, so many people fear (rightfully so) that if by whatever disaster we lose our current computers, Internet etc., we will also lose with it all modern art, data, software we so painfully developed, digitized books, inventions and so on; not talking about the horrors that will follow if we're unable to quickly reestablish our computer networks we are so dependent on. Setting up what we currently have completely from scratch would be extremely difficult, a task for centuries -- just take a while to consider all the activity and knowledge that's required around the globe to create a single computer with all its billions of lines of code worth of software that makes it work. Knowledge of old technology gets lost -- to make modern computers we first needed older, primitive computers, but now that we only have modern computers no one remembers anymore how to make the older computers -- if we lose the current ones, we won't be able to make them, we will lack the tools. Another reason for bootstrapping is independence of technology which brings e.g. [freedom](freedom.md) (your operating system being able to be set up anywhere without some corporation's proprietary driver or hardware unit is pursued by many), robustness, simplicity, ability to bring existing software to new platforms and so on, i.e. things that are practical even in current world.
  13
  14 **[Forth](forth.md) has traditionally been used for making bootstrapping environments**; [Dusk OS](duskos.md) is an example of such project. Similarly simple language such as [Lisp](lisp.md) and [comun](comun.md) will probably work too.
  15
  16 **How to do this then?** To make a computing environment that can bootstrap itself you can do it like this:
  17
  18 1. **Make a [simple](kiss.md) [programming language](programming_language.md) L**. You can choose e.g. the mentioned [Forth](forth.md) but you can even make your own, just remember to keep it extremely simple -- simplicity of the base language is the key feature here. The language will serve as tool for writing software for your platform, i.e. it will provide some comfort in programming (so that you don't have to write in assembly) but mainly it will be an **[abstraction](abstraction.md) layer** for the programs, it will allow them to run on any hardware/platform. The language therefore has to be **[portable](portability.md)**; it should probably abstracts things like [endianness](byte_sex.md), native integer size, control structures etc., so as to work nicely on all [CPUs](cpu.md), but it also mustn't have too much abstraction (such as [OOP](oop.md)) otherwise it will quickly get complicated. The language can compile e.g. to some kind of very simple [bytecode](bytecode.md) that will be easy to translate to any [assembly](assembly.md). At first you'll have to temporarily implement L in some already existing language, e.g. [C](c.md). NOTE: in theory you could just make bytecode, without making L, and just write your software in that bytecode, but the bytecode has to focus on being simple to translate, i.e. it will e.g. likely have few opcodes, which will be in conflict with making it at least somewhat comfortable to program on your platform. However one can try to make some compromise and it will save the complexity of translating language to bytecode, so it can be considered ([uxn](uxn.md) seems to be doing this).
  19 2. **Write L in itself, i.e. [self host](self_hosting.md) it**. This means you'll use L to write a [compiler](compiler.md) of L that outputs L's bytecode. Once you do this, you have a completely independent language and can throw away the original compiler of L written in another language. Now compile L with itself -- you'll get the bytecode of L compiler. At this point you can bootstrap L on any platform as long as you can execute the L bytecode on it -- this is why it was crucial to make L and its bytecode very simple. In theory it's enough to just interpret the bytecode but it's better to translate it to the platform's native machine code so that you get maximum efficiency (the nature of bytecode should make it so that it isn't really more diffiult to translate it than to interpret it). If for example you want to bootstrap on an [x86](x86.md) CPU, you'll have to write a program that translates the bytecode to x86 assembly; if we suppose that at the time of bootstrapping you will only have this x86 computer, you will have to write the translator in x86 assembly manually. If your bytecode really is simple and well made, it shouldn't be hard though (you will mostly be replacing your bytecode opcodes with given platform's machine code opcodes).
  20 3. **Further help make L bootstrapable**. This means making it even easier to execute the L bytecode on any given platform -- you may for example write the bytecode translators for common platforms like x86, ARM, RISC-V and so on. At this point you have L bootstrappable without any [work](work.md) on the platform you have translators for and on others it will just take a tiny bit of work to write its own translator.
  21 4. **Write everything else in L**. This means writing the platform itself and software such as various tools and libraries. You can potentially even use L to write a higher level language for yet more comfort in programming. Since everything here is written in L and L can be bootstrapped, everything here can be bootstrapped as well.
  22
  23 ## Booting: Computer Starting Up
  24
  25 Booting as in "staring computer up" is also a kind of setting up a system from the ground up -- we take it for granted but remember it takes some [work](work.md) to get a computer from being powered off and having all RAM empty to having an operating system loaded, hardware checked and initialized, devices mounted etc.
  26
  27 Starting up a **simple computer** -- such as some [MCU](mcu.md)-based [embedded](embedded.md) [open console](open_console.md) that runs [bare metal](bare_metal.md) programs -- isn't as complicated as booting up a mainstream [PC](pc.md) with an [operating system](operating_system.md).
  28
  29 First let's take a look at the simple computer. It may work e.g. like this: upon start the [CPU](cpu.md) initializes its registers and simply starts executing instructions from some given memory address, let's suppose 0 (you will find this in your CPU's data sheet). Here the memory is often e.g. [flash](flash.md) [ROM](rom.md) to which we can externally upload a program from another computer before we turn the CPU on -- in game consoles this can often be done through [USB](usb.md). So we basically upload the program (e.g. a game) we want to run, turn the console on and it starts running it. However further steps are often added, for example there may really be some small, permanently flashed initial boot program at the initial execution address that will handle some things like initializing hardware (screen, speaker, ...), setting up [interrupts](interrupt.md) and so on (which otherwise would have to always be done by the main program itself) and it can also offer some functionality, for example a simple menu through which the user can select to actually load a program from SD card to flash memory (thanks to which we won't need external computer to reload programs). In this case we won't be uploading our main program to the initial execution address but rather somewhere else -- the initial bootloader will jump to this address once it's done its work.
  30
  31 Now for the PC (the "IBM compatibles"): here things are more complicated due to the complexity of the whole platform, i.e. because we have to load an [operating system](operating_system.md) first, of which there can be several, each of which may be loadable from different storages ([harddisk](hdd.md), USB stick, [network](network.md), ...), also we have more complex [CPU](cpu.md) that has to be set in certain operation mode, we have complex peripherals that need complex initializations etcetc. Generally there's a huge [bloated](bloat.md) boot sequence and PCs infamously take longer and longer to start up despite skyrocketing hardware improvements -- that says something about state of technology. Anyway, it usually it works like this:
  32
  33 { I'm not terribly experienced with this, verify everything. ~drummyfish }
  34
  35 1. Computer is turned on, the CPU starts executing at some initial address (same as with the simple computer).
  36 2. From here CPU jumps to an address at which **stage one [bootloader](bootloader.md)** is located (bootloader is just a program that does the booting and as this is the first one in a line of potentially multiple bootloaders, it's called *stage one*). This address is in the [motherboard](motherboard.md) [ROM](rom.md) and in there typically **[BIOS](bios.md)** (or something similar that may be called e.g. [UEFI](uefi.md), depending on what standard it adheres to) is uploaded, i.e. BIOS is stage one bootloader. BIOS is the first software (we may also call it [firmware](firmware.md)) that gets run, it's uploaded on the motherboard by the manufacturer and isn't supposed to be rewritten by the user, though some based people still rewrite it (ignoring the  "read only" label :D), often to replace it with something more [free](free_software.md) (e.g. [libreboot](libreboot.md)). BIOS is the most basic software that serves to make us able to use the computer at the most basic level without having to flash programs externally, i.e. to let us use keyboard and monitor, let us install an operating system from a CD drive etc. (It also offers a basic environment for programs that want to run before the operating system, but that's not important now.) BIOS is generally different on each computer model, it normally allows us to set up what (which device) the computer will try to load next -- for example we may choose to boot from harddisk or USB flash drive or from a CD. There is often some countdown during which if we don't intervene, the BIOS automatically tries to load what's in its current settings. Let's suppose it is set to boot from harddisk.
  37 3. BIOS performs the *power on self test* (POST) -- basically it makes sure everything is OK, that hardware works etc. If it's so, it continues on (otherwise halts).
  38 4. BIOS loads the **[master boot record](mbr.md)** (MBR, the first sector of the device) from harddisk (or from another mass storage device, depending on its settings) into [RAM](ram.md) and executes it, i.e. it passes control to it. This will typically lead to loading the **second stage bootloader**.
  39 5. The code loaded from MBR is limited by size as it has to fit in one HDD [sector](sector.md) (which used to be only 512 bytes for a long time), so this code is here usually just to load the bigger code of the second stage bootloader from somewhere else and then again pass control to it.
  40 5. Now the second stage bootloader starts -- this is a bootloader whose job it is normally to finally load the actual operating system. Unlike BIOS this bootloader may quite easily be reinstalled by the user -- oftentime installing an operating system will also cause installing some kind of second stage bootloader -- example may be **[GRUB](grub.md)** which is typically installed with [GNU](gnu.md)/[Linux](linux.md) systems. This kind of bootloader may offer the user a choice of multiple operating systems, and possibly have other settings. In any case here the OS [kernel](kernel.md) code is loaded and run.
  41 6. Voila, the kernel now starts running and here it's free to do its own initializations and manage everything, i.e. Linux will start the [PID 1](init.md) process, it will mount filesystems, run initial scripts etcetc.