README.md

   1 <div id="table-of-contents">
   2 <h2>Table of Contents</h2>
   3 <div id="text-table-of-contents">
   4 <ul>
   5 <li><a href="#sec-1">1. Building the project</a>
   6 <ul>
   7 <li><a href="#sec-1-1">1.1. Dependencies</a></li>
   8 </ul>
   9 </li>
  10 <li><a href="#sec-2">2. Running</a>
  11 <ul>
  12 <li><a href="#sec-2-1">2.1. Running in Qemu</a></li>
  13 <li><a href="#sec-2-2">2.2. Running on real hardware.</a></li>
  14 </ul>
  15 </li>
  16 <li><a href="#sec-3">3. Makefile</a>
  17 <ul>
  18 <li><a href="#sec-3-1">3.1. Targets</a></li>
  19 <li><a href="#sec-3-2">3.2. Aliased Rules</a></li>
  20 </ul>
  21 </li>
  22 <li><a href="#sec-4">4. Project structure</a>
  23 <ul>
  24 <li><a href="#sec-4-1">4.1. Most significant directories and files</a></li>
  25 </ul>
  26 </li>
  27 <li><a href="#sec-5">5. Boot Process</a>
  28 <ul>
  29 <li><a href="#sec-5-1">5.1. Loader</a></li>
  30 <li><a href="#sec-5-2">5.2. Kernel</a>
  31 <ul>
  32 <li><a href="#sec-5-2-1">5.2.1. Stage 1</a></li>
  33 <li><a href="#sec-5-2-2">5.2.2. Stage 2</a></li>
  34 </ul>
  35 </li>
  36 <li><a href="#sec-5-3">5.3. Notes</a></li>
  37 </ul>
  38 </li>
  39 <li><a href="#sec-6">6. MMU</a>
  40 <ul>
  41 <li><a href="#sec-6-1">6.1. Coprocessor 15</a></li>
  42 <li><a href="#sec-6-2">6.2. Translation table</a></li>
  43 <li><a href="#sec-6-3">6.3. Page Table</a></li>
  44 <li><a href="#sec-6-4">6.4. Project specific information</a></li>
  45 <li><a href="#sec-6-5">6.5. Setting up MMU and FlatMap</a></li>
  46 </ul>
  47 </li>
  48 <li><a href="#sec-7">7. Program Status Register</a></li>
  49 <li><a href="#sec-8">8. Ramfs</a>
  50 <ul>
  51 <li><a href="#sec-8-1">8.1. Specification</a></li>
  52 <li><a href="#sec-8-2">8.2. Implementations</a></li>
  53 </ul>
  54 </li>
  55 <li><a href="#sec-9">9. IRQ</a></li>
  56 <li><a href="#sec-10">10. Processor modes</a></li>
  57 <li><a href="#sec-11">11. Process management</a>
  58 <ul>
  59 <li><a href="#sec-11-1">11.1. Scheduler functions</a></li>
  60 </ul>
  61 </li>
  62 <li><a href="#sec-12">12. Linking</a></li>
  63 <li><a href="#sec-13">13. Miscellaneous topics</a>
  64 <ul>
  65 <li><a href="#sec-13-1">13.1. Supervisor calls</a></li>
  66 <li><a href="#sec-13-2">13.2. Utilities</a></li>
  67 <li><a href="#sec-13-3">13.3. Timers</a></li>
  68 <li><a href="#sec-13-4">13.4. UARTs</a></li>
  69 </ul>
  70 </li>
  71 <li><a href="#sec-14">14. Problems faced</a>
  72 <ul>
  73 <li><a href="#sec-14-1">14.1. Ramfs alignment</a></li>
  74 <li><a href="#sec-14-2">14.2. <i>COM</i> section</a></li>
<li><a href="#sec-14-3">14.3. Bare-metal position independent code</a></li>
  76 <li><a href="#sec-14-4">14.4. Linker section naming</a></li>
  77 <li><a href="#sec-14-5">14.5. Context switches</a></li>
  78 <li><a href="#sec-14-6">14.6. Different modes' sp register</a></li>
<li><a href="#sec-14-7">14.7. Switching between system mode and user mode</a></li>
  80 <li><a href="#sec-14-8">14.8. UART interrupt masking</a></li>
  81 <li><a href="#sec-14-9">14.9. Terminal stdin breaking</a></li>
  82 </ul>
  83 </li>
  84 <li><a href="#sec-15">15. Afterword</a></li>
  85 <li><a href="#sec-16">16. Sources of Information</a></li>
  86 </ul>
  87 </div>
  88 </div>
  89
  90
  91 # Building the project<a id="sec-1" name="sec-1"></a>
  92
  93 ## Dependencies<a id="sec-1-1" name="sec-1-1"></a>
  94
  95 1.  Native GCC (+ binutils)
  96 2.  ARM cross-compiler GCC (+ binutils) (arm-none-eabi works - others
  97     might or might not)
  98 3.  GNU Make
  99 4.  rpi-open-firmware (for running on the Pi)
 100 5.  GNU screen (for communicating with the kernel when running on the Pi)
 101 6.  socat (for communicating with the bootloader when running on the Pi)
 102 7.  Qemu ARM (for emulating the Pi).
 103
 104 For building rpi-open-firmware one will need more tools (not listed
 105 here).
 106
 107 The project has been tested only in Qemu emulating Pi 2 and on real Pi 3 model B.
 108
Running on Pis other than Pi 2 and Pi 3 will require changing the peripheral base address definition in global.h (peripheral base addresses differ between Pi versions) and might also require other modifications that are not known at this time.
 110
 111 Assuming make, gcc, arm-none-eabi-gcc and its binutils are in the PATH, the kernel can be built with:
 112
 113     $ make kernel.img
 114
 115 which is the same as:
 116
 117     $ make
 118
 119 The bootloader can be built with:
 120
 121     $ make loader.img
 122
 123 Both loader and kernel can then be found in build/
 124
 125 # Running<a id="sec-2" name="sec-2"></a>
 126
 127 ## Running in Qemu<a id="sec-2-1" name="sec-2-1"></a>
 128
 129 To run the kernel (passed as elf file) in qemu:
 130
 131     $ make qemu-elf
 132
To pass a binary image to qemu instead:
 134
 135     $ make qemu-bin
 136
 137 To pass loader image to qemu and pipe kernel to it through emulated uart:
 138
 139     $ make qemu-loader
 140
 141 With qemu-loader the kernel will run, but will be unable to receive any keyboard input.
 142
The timer used by this project is the ARM timer ("based on an ARM
AP804", with registers mapped at 0x7E00B000 in the GPU address space).
It's absent in the emulated environment, so no timer interrupts can be
observed in qemu.
 147
 148 ## Running on real hardware.<a id="sec-2-2" name="sec-2-2"></a>
 149
First, rpi-open-firmware has to be built. Then, kernel.img (or
loader.img) should be copied to the SD card (next to bootcode.bin) and renamed to
zImage. Also, the .dtb file corresponding to the Pi model (actually, any .dtb
would do, as it is not used right now) from the stock firmware files has to be put on the SD
card and renamed to rpi.dtb. Finally, a cmdline.txt has to be present on the SD card
(its content doesn't matter).

Now, the RaspberryPi can be connected via UART to the development machine. GPIO on the Pi works
with 3.3V, so one should make sure that the UART device on the other end
also works with 3.3V. This is the pinout of the RaspberryPi 3 model B
that has been used for testing so far:
 161
 162     Top left of the board is here
 163         |
 164         V
 165         +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 166         | 2| 4| 6| 8|10|12|14|16|18|20|22|24|26|28|30|32|34|36|38|40|
 167         +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 168         | 1| 3| 5| 7| 9|11|13|15|17|19|21|23|25|27|29|31|33|35|37|39|
 169         +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 170
 171 Under rpi-open-firmware (stock firmware might map UARTs differently):
 172
 173 1.  pin 6 is Ground
 174 2.  pin 8 is TX
 175 3.  pin 10 is RX
 176
 177 Once UART is connected, the board can be powered on.
 178
It is assumed that a USB to UART adapter is used and that the system sees it as /dev/ttyUSB0.
 180
 181 If one copied the kernel to the SD card, they can start communicating
 182 with the board by running:
 183
 184     $ screen /dev/ttyUSB0 115200,cs8,-parenb,-cstopb,-hupcl
 185
 186 If one copied the loader, they can send it the kernel image and start
 187 communicating with the system by running:
 188
 189     $ make run-on-rpi
 190
 191 To run again, one can replug USB to UART adapter and Pi's power supply (order
 192 matters!) and re-enter the command.
 193
Running under stock firmware has not been tested. In particular, the
default configuration on RaspberryPi 3 seems to map a different UART
(the so-called miniUART) than the one used by the kernel to pins 6, 8
and 10. This is supposed to be configurable through the use of overlays.
 198
 199 # Makefile<a id="sec-3" name="sec-3"></a>
 200
To maintain order, all files created by make, that is binaries, object
files, natively executed helper programs, etc., get placed in build/.

Our project contains 2 Makefiles: one in its root directory and one in
build/. The reason is that it is easier to use a Makefile to simply,
elegantly and efficiently produce files in the directory where it
resides. Producing files in a directory other than the Makefile's own
requires that directory to be specified in many rules across the Makefile
and generally complicates things. Also, a problem arises when trying to
link objects from outside the current directory. If an object is
referenced by name in a linker script (which is a frequent practice in our
scripts) and is passed to gcc with a path, then it would also need to appear
with that path in the linker script. Because of that, a Makefile is present
in build/ that produces files into its own directory, and the
Makefile in the project's root is used as a proxy to it - it
calls make recursively in build/ with the same target it was called
with. This arrangement keeps the Makefiles easier to read.
 218
 219 From now on only Makefile in build/ will be discussed.
 220
In the Makefile, variables with the names of certain tools and their
command line flags are defined (using ?= assignment, which only sets a
variable that is not already defined, so one can supply their own value
of that variable). In case a cross-compiler with a different triple
should be used, ARM_BASE, normally set to arm-none-eabi, can be set to
something like arm-linux-gnueabi or even /usr/local/bin/arm-none-eabi.
 227
 228 All variables discussed below are defined using := assignment, which
 229 causes them to only be evaluated once instead of on every reference to
 230 them.
 231
Objects that should be linked together to create each of the .elf files
are listed in their respective variables. E.g. objects to be used for
creating kernel_stage2.elf are all listed in KERNEL_STAGE2_OBJECTS.
When adding a new source file to the kernel, it is enough to add its
respective .o file to that list to make it compile and link properly. No
other Makefile modifications are needed. In a similar fashion, the
RAMFS_FILES variable specifies files that should be put in the ramfs
image that will be embedded in the kernel. Adding another file only
requires listing it there. However, if the file is to be found somewhere
else than build/, it might be useful to use the vpath directive to tell
make where to look for it.
 243
Variables dirs and dirs_colon are defined to store the list of all
directories within src/, separated with spaces and colons, respectively.
dirs_colon is used for the vpath directive. The dirs variable is used in
ARM_FLAGS to pass all the directories as include search paths to gcc.
empty and space are helper variables - defining dirs_colon could be
achieved without them (but it's clearer this way).
 250
 251 The vpath directive tells make to look for assembler sources, C sources
 252 and linker scripts in all direct and indirect subdirectories of src/
 253 (including itself). All other files shall be found/created in build/.
 254
 255 ## Targets<a id="sec-3-1" name="sec-3-1"></a>
 256
 257 The default target is the binary image of the kernel.
 258
The generic rule for compiling C sources uses the cross-compiler or the native
compiler with appropriate flags, depending on whether the source file is
located somewhere under the arm/ directory (which lies in src/) or anywhere
else.
 263
 264 The generic rules for making a stripped binary image out of elf file,
 265 for assembling an assembly file, for making an arbitrary file a linkable
 266 object and for linking objects are ARM-only.
 267
In the C world it is possible to embed a file in an executable by using
objcopy to create an object file from it and then linking that object
file into the executable. In this project, this is currently
used only for embedding the ramfs in the kernel (incbin is used for
embedding the kernel and loader second stages in their first stages).
A generic rule for making a binary image into an object file is present in
case it is needed somewhere else.
 275
 276 To link elf files, the generic rule is combined with a rule that
 277 specifies the elf's objects. Objects are listed in variables whenever
 278 more than one of them is needed.
 279
 280 At this point in the Makefile, the dependence of objects created from
 281 assembly on files referenced in the assembly source via incbin is
 282 marked.
 283
 284 Simple ram filesystem is created from files it should contain with the
 285 use of our own simple tool - makefs.
 286
 287 Another 2 rules specify how native programs (for the machine we're
 288 working on) are to be linked.
 289
 290 ## Aliased Rules<a id="sec-3-2" name="sec-3-2"></a>
 291
 292 Rule qemu-elf runs the kernel in qemu emulating RaspberryPi 2 with
 293 256MiB of memory by passing the elf file of the kernel to the emulator.
 294
 295 Rule qemu-bin does the same, but passes the binary image of the kernel
 296 to qemu.
 297
 298 Rule qemu-loader does the same, but first passes the binary image of the
 299 bootloader to qemu and the actual kernel is piped to qemu's standard
 300 input, received by bootloader as uart data and run. This method
 301 currently makes it impossible to pass any keyboard input to kernel once
 302 it's running.
 303
 304 Rule run-on-rpi pipes the kernel through uart, assuming it is available
 305 under /dev/ttyUSB0, and then opens a screen session on that interface.
 306 This allows for executing the kernel on the Pi connected through UART,
 307 provided that our bootloader is running on the board.
 308
 309 Rule clean removes all the files generated in build/.
 310
 311 Rules that don't generate files are marked as PHONY.
 312
 313 # Project structure<a id="sec-4" name="sec-4"></a>
 314
 315 Directory structure of the project:
 316
 317     doc/
 318     build/
 319           Makefile
 320     Makefile
 321     src/
 322         lib/
 323             rs232/
 324                   rs232.c
 325                   rs232.h
 326         host/
 327              pipe_image.c
 328              makefs.c
 329         arm/
 330             common/
 331                    svc_interface.h
 332                    strings.c
 333                    io.h
 334                    io.c
 335                    strings.h
 336             PL0/
 337                 PL0_utils.h
 338                 svc.S
 339                 PL0_utils.c
 340                 PL0_test.c
 341                 PL0_test.ld
 342             PL1/
 343                 loader/
 344                        loader_stage2.ld
 345                        loader_stage2.c
 346                        loader_stage1.S
 347                        loader.ld
 348                 kernel/
 349                        demo_functionality.c
 350                        paging.h
 351                        setup.c
 352                        interrupts.h
 353                        interrupt_vector.S
 354                        kernel.ld
 355                        scheduler.h
 356                        atags.c
 357                        translation_table_descriptors.h
 358                        bcmclock.h
 359                        ramfs.c
 360                        kernel_stage1.S
 361                        paging.c
 362                        ramfs.h
 363                        interrupts.c
 364                        armclock.h
 365                        atags.h
 366                        kernel_stage2.ld
 367                        cp_regs.h
 368                        psr.h
 369                        scheduler.c
 370                        memory.h
 371                        demo_functionality.h
 372                 PL1_common/
 373                            global.h
 374                            uart.h
 375                            uart.c
 376
 377 ## Most significant directories and files<a id="sec-4-1" name="sec-4-1"></a>
 378
 379 doc/ Contains documentation of the project.
 380
 381 build/ Contains main Makefile of the project. All objects created during
 382 the build process are placed there.
 383
 384 Makefile Proxies all calls to Makefile in build/.
 385
 386 src/ Contains all sources of the project.
 387
 388 src/host/ Contains sources of helper programs to be compiled using
 389 native GCC and run on the machine where development takes place.
 390
 391 src/arm/ Contains sources to be compiled using ARM cross-compiler GCC
 392 and run on the RaspberryPi.
 393
 394 src/arm/common Contains sources used in both: privileged mode and
 395 unprivileged mode.
 396
 397 src/arm/PL0 Contains sources used exclusively in unprivileged, user-mode
 398 (PL0) program, as well as the program's linker script.
 399
 400 src/arm/PL1 Contains sources used exclusively in privileged (PL1) mode.
 401
 402 src/arm/PL1/loader Contains sources used exclusively in the bootloader,
 403 as well as linker scripts for stages 1 and 2 of this bootloader.
 404
 405 src/arm/PL1/kernel Contains sources used exclusively in the kernel, as
 406 well as linker scripts for stages 1 and 2 of this kernel.
 407
src/arm/PL1/PL1_common Contains sources used in both the kernel and the
bootloader.
 410
 411 TODOs Contains what the name suggests, in plain text. It lists things
 412 that still can be implemented or improved, as well as tasks, that were
 413 once listed and have since been completed (in which case they're marked
 414 as done).
 415
 416 # Boot Process<a id="sec-5" name="sec-5"></a>
 417
When a RaspberryPi boots, it searches the first
partition on the SD card (which should be formatted as FAT) for its firmware
and configuration files, loads them and executes them. The firmware then
searches for the kernel image file. The name of the file it looks for can
be kernel.img, kernel7.img, kernel8.img (for 64-bit mode) or something
else, depending on the configuration and firmware used (rpi-open-firmware
looks for zImage).
 425
 426 The image is then copied to some address and jumped to on all cores.
 427 Address should be 0x8000 for 32-bit kernel, but in reality is 0x2000000
 428 in rpi-open-firmware and 0x10000 in qemu (version 2.9.1). 3 arguments
 429 are passed to the kernel: first (passed in r0) is 0; second (passed in
 430 r1) is machine type; third (passed in r2) is the address of FDT or ATAGS
 431 structure describing the system or 0 as default.
 432
Pis that support aarch64 can also boot directly into 64-bit mode. Then,
the image gets loaded at 0x80000. We're not using 64-bit mode in this
project.
 436
 437 Qemu can be used to emulate RaspberryPi, in which case kernel image and
 438 memory size are provided to the emulator on the command line. Qemu can
 439 also load kernel in the form of an elf file, in which case its load
 440 address is determined based on information in the elf.
 441
Our kernel has been executed in qemu emulating RaspberryPi 2 as well as
on a real RaspberryPi 3 running rpi-open-firmware (although not every
functionality works everywhere).
 445
 446 ## Loader<a id="sec-5-1" name="sec-5-1"></a>
 447
To speed up running new images of the
kernel on the board, we have written a simple bootloader, which
can be run from the SD card instead of the actual kernel. It reads the
kernel image from uart and executes it. The bootloader can also be used
within qemu, but there are several problems with passing keyboard input
to the kernel once it's running.

It is worth noting that a project named raspbootin (<https://github.com/mrvn/raspbootin>) exists, which does a very similar thing.
We chose, however, to write our own bootloader.
 457
The bootloader is split into 2 stages.

This is because the actual kernel
read by it from UART is supposed to be written at 0x8000. If the loader
also ran from 0x8000 or a nearby address, it could overwrite
its own code while writing the kernel to memory. To avoid this, the first
stage of the loader first copies its embedded second stage to
address 0x4000. Then, it jumps to that second stage, which reads the kernel
image from uart, writes it at 0x8000 and jumps to it. Arguments (r0, r1,
r2) are preserved and passed to the kernel. The second stage of the
bootloader is intended to be kept small enough to fit between 0x4000 and
0x8000. The ATAGS structure, if present, is guaranteed to end below 0x4000,
so it should not get overwritten by the loader's stage 2.
 471
The loader protocol is simple: first, the size of the kernel is sent through
UART (4 bytes, little endian), then the actual kernel image. Our
program pipe_image is used to prepend the kernel image with its size.
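The framing described above can be sketched in C as follows. This is only an illustration of the 4-byte little-endian size prefix, not the actual code of pipe_image or of the loader; the function names (encode_size_le, decode_size_le, frame_image) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit size as 4 bytes, least significant byte first. */
void encode_size_le(uint32_t size, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(size >> (8 * i));
}

/* Rebuild the size from the 4 bytes, as the receiving side would. */
uint32_t decode_size_le(const uint8_t in[4])
{
    uint32_t size = 0;
    for (int i = 0; i < 4; i++)
        size |= (uint32_t)in[i] << (8 * i);
    return size;
}

/*
 * Build the byte stream to send: size header followed by the image.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t frame_image(const uint8_t *image, uint32_t image_size,
                   uint8_t *out, size_t out_capacity)
{
    assert(image != NULL && out != NULL);
    if (out_capacity < 4 || out_capacity - 4 < image_size)
        return 0;
    encode_size_le(image_size, out);
    memcpy(out + 4, image, image_size);
    return (size_t)image_size + 4;
}
```

Encoding the size byte by byte, instead of writing a uint32_t directly, keeps the sketch independent of the endianness of the machine that sends the image.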
 475
 476 ## Kernel<a id="sec-5-2" name="sec-5-2"></a>
 477
The kernel is, just like the bootloader, split into 2 stages.
It is desirable to have the image run from 0x0, because that's where the exception vector table is under default
settings. This was the main reason for splitting the kernel into 2 parts.
 481
 482 ### Stage 1<a id="sec-5-2-1" name="sec-5-2-1"></a>
 483
Stage 1 is loaded at some higher address. It has the second stage
image embedded in it. It copies it to 0x0 and jumps to it. What gets
more complicated compared to the loader is the handling of the ATAGS structure.
Before copying stage 2 to 0x0, stage 1 first checks if ATAGS is present
and if so, copies it to some location high enough that it won't be
overwritten by the stage 2 image. Whenever the memory layout is modified, it
should be checked whether there is a danger of ATAGS being overwritten by
some kernel operations before it is used. In the current setup, the new location
chosen for ATAGS is always below the memory later used as the stack and
it might overlap memory later used for the translation table, which is not a
problem, since the kernel only uses ATAGS before filling that table.

When stage 1 of the kernel jumps to the second stage, it passes modified
arguments: the first argument (r0) remains 0 if ATAGS was found and is set
to 3 to indicate that ATAGS was not found. The second argument (r1) remains
unchanged. The third argument (r2) is the current address of ATAGS (or
remains unchanged if no ATAGS was found). If support for FDT is added in
the future, it must also be done carefully, so that FDT doesn't get
overwritten.
 503
 504 ### Stage 2<a id="sec-5-2-2" name="sec-5-2-2"></a>
 505
At the start of stage 2 of the kernel
there is the interrupt vector table. Its first entry is the reset
vector, which is normally unused. In our case, when stage 1 jumps to
0x0, the first instruction of stage 2, execution starts at that vector,
which then calls the setup routine.
 511
 512 ## Notes<a id="sec-5-3" name="sec-5-3"></a>
 513
 514 In both loader and the kernel, at the beginning of stage1 it is ensured,
 515 that only one ARM core is executing.
 516
 517 It's worth noting, that in first stages the loop that copies the
 518 embedded second stage is intentionally situated after the blob in the
 519 image. This way, this loop will not overwrite itself with the data it is
 520 copying, since the stage 2 is always copied to some lower address. It
 521 copies to 0x0 in case of kernel and to 0x4000 in case of loader - we
 522 assume stage 1 won't be loaded below 0x4000.
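The reason a copy to a lower address is safe can be shown with a small C sketch. The real first stages are written in assembly; the function below (copy_forward, a name chosen for this example) only models the copy order: when bytes are copied lowest address first and the destination lies below the source, every byte is read before it can be overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `len` bytes from `src` to `dst`, lowest address first.
 * For overlapping regions this is only safe when dst <= src, which is
 * the situation described above (stage 2 is copied to a lower address).
 */
void copy_forward(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert((uintptr_t)dst <= (uintptr_t)src);
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Since the write position never gets ahead of the read position, code placed after the blob (such as the copying loop itself) is not reached by the writes as long as the destination region ends below it.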
 523
 524 Qemu, stock RaspberryPi firmware and rpi-open-firmware all load image at
 525 different addresses. Although stock firmware is not used in this
 526 project, our loader loads kernel at 0x8000, where the stock firmware
 527 would. Because of that, it is desired, that image is able to run,
 528 regardless of where it was loaded at. This was realized by writing first
 529 stages of loader and kernel in careful, position-independent assembly.
 530 The starting address in corresponding linker scripts is irrelevant. The
 531 stage 2 blobs are embedded using .incbin assembly directive. Second
 532 stages are written normally in C and compiled as position-dependent for
 533 their respective addresses.
 534
 535 # MMU<a id="sec-6" name="sec-6"></a>
 536
 537 Here's an explanation of steps we did to enable the MMU and how the MMU
 538 works in general.
 539
 540 MMU stands for Memory Management Unit. It does 2 important things:
 541
 542 1.  It allows programs to use virtual memory addressing. Virtual
 543     addresses are translated by the MMU to physical addresses with the
 544     help of translation table.
2.  It guards against disallowed memory accesses. A unit that only
    implements this functionality is called an MPU (Memory Protection Unit)
    and is also found in some ARM cores.
 548
Without an MMU, code executing on a processor sees the memory as it really
is. When it tries to load data from address 0x00AA0F3C, it indeed loads data
from 0x00AA0F3C. This doesn't mean address 0x00AA0F3C is in RAM: RAM can
be mapped into the address space in an arbitrary way.
 555
 556 MMU can be configured to "redirect" some range of addresses to some
 557 other range. Let's assume we configured the MMU to translate address
 558 range 0x00A00000 - 0x00B00000 to range 0x00200000 - 0x00300000. Now,
 559 code trying to perform operation on address 0x00AA0F3C would have the
 560 address transparently translated to 0x002A0F3C, on which the operation
 561 would actually take place.
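The arithmetic of this example can be written down as a short C sketch. It is a toy model, not how the MMU is programmed: the names are made up, the end of the range (0x00B00000) is assumed to be exclusive, and addresses outside the range are simply returned unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define VIRT_BASE  0x00A00000u /* start of the virtual range from the example */
#define PHYS_BASE  0x00200000u /* start of the physical range it maps to */
#define RANGE_SIZE 0x00100000u /* both ranges are 1MB long */

/* Translate an address; addresses outside the range pass through unchanged. */
uint32_t translate_example(uint32_t vaddr)
{
    if (vaddr >= VIRT_BASE && vaddr - VIRT_BASE < RANGE_SIZE)
        return PHYS_BASE + (vaddr - VIRT_BASE);
    return vaddr;
}
```

Calling translate_example(0x00AA0F3C) gives 0x002A0F3C: the offset within the range (0xA0F3C) is kept and only the base changes.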
 562
 563 The translation affects all (stack and non-stack) data accesses as well
 564 as instruction fetches, hence an entire program can be made to work as
 565 if it was running from some memory address, while in fact it runs from a
 566 different one!
 567
 568 The addresses used by program code are referred to as virtual addresses,
 569 while addresses actually used by the processor - as physical addresses.
 570
This aids the operating system's memory management in several ways:

1.  A program may be compiled to run from some fixed address and the OS
    is still free to choose any physical location to store that program's
    code - only a translation of the program's required address to that
    location's address has to be configured. A problem of simultaneous
    execution of multiple programs compiled for the same address is also
    avoided in this way.
2.  A consecutive memory region might be required by some program, but
    due to earlier allocations and deallocations there might not be a
    big enough (no pun intended) free consecutive region of physical
    memory. Smaller regions can then be mapped to become accessible as a
    single region in virtual address space, thus avoiding the need for
    defragmentation.
 585
A given mapping can be made valid for only one execution mode (e.g. a
region only accessible from privileged mode) or only for certain types of
accesses. A memory region can be made non-executable, which guards
against program code accidentally jumping there. That is important for
countering buffer-overflow exploits. A disallowed access triggers a
processor exception, which passes control to an appropriate interrupt
service routine.
 593
 594 In RaspberryPi environments used by us, there are ARMv7-A compatible
 595 processors, which we currently use only in 32-bit mode. Information here
 596 is relevant to those systems (there are Pi boards with both older and
 597 newer processors, with more or less functionality and features
 598 available).
 599
If an MMU is present, its general configuration is done through registers
of the appropriate coprocessor (cp15). Translations are managed through
the translation table. It is an array of 32-bit or 64-bit entries (also
called descriptors) describing how their corresponding memory regions
should be mapped. A number of leftmost bits of a virtual address
constitutes an index into the translation table, selecting the entry used for
translating it. This way no virtual addresses need to be stored in the
table and the MMU can perform translations in O(1) time.
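A minimal C model of such an indexed lookup is sketched below. It assumes the 32-bit format with 4096 entries of 1MB each (described in the Translation table section), and it simplifies an entry to hold only the physical base address of its region; real descriptors carry more fields. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_SHIFT 20u /* 1MB regions: the low 20 bits are the offset */

/* Index of the translation table entry responsible for `vaddr`. */
uint32_t table_index(uint32_t vaddr)
{
    return vaddr >> SECTION_SHIFT;
}

/* Offset of `vaddr` within its 1MB region. */
uint32_t section_offset(uint32_t vaddr)
{
    return vaddr & ((1u << SECTION_SHIFT) - 1u);
}

/* Toy lookup: each entry holds only the physical base of its region. */
uint32_t translate(const uint32_t *table, uint32_t vaddr)
{
    assert(table != NULL);
    return table[table_index(vaddr)] + section_offset(vaddr);
}
```

The lookup is a single array access, with no searching through stored virtual addresses, which is what makes the translation constant-time.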
 608
 609 ## Coprocessor 15<a id="sec-6-1" name="sec-6-1"></a>
 610
Coprocessor 15 contains several registers that control the behaviour of
the MMU. They are all accessed through the mcr and mrc ARM instructions.

1.  SCTLR, System Control Register - "provides the top level control of
    the system, including its memory system". Bits of this register
    control, among other things, whether the following are enabled:
    1.  the MMU
    2.  data cache
    3.  instruction cache
    4.  TEX remap (changes how some translation table entry bit fields
        (called C, B and TEX) are used - not used in the project)
    5.  access flags (enabling causes one translation table descriptor bit
        normally used to specify access permissions of a region to be used
        as access flag - not used either)
 625
 626 2.  DACR, Domain Access Control Register - "defines the access permission
 627     for each of the sixteen memory domains". Entries in translation table
 628     define which of available 16 memory domains a memory region belongs
 629     to. Bits of DACR specify what permissions apply to each of the
 630     domains. Possible settings are to allow accesses to regions based on
 631     settings in translation table descriptor or to allow/disallow all
 632     accesses regardless of access permission bits in translation table.
 633
3.  TTBR0, Translation Table Base Register 0 - "holds the base address of
    translation table 0, and information about the memory it occupies".
    The system programmer can choose (subject to some alignment
    requirements) where in physical memory to put the translation
    table. The chosen address (actually, only a number of its leftmost bits)
    has to be put in TTBR0 for the MMU to know where the table lies. Other
    bits of this register control some memory attributes relevant for
    accesses to table entries by the MMU.

4.  TTBR1, Translation Table Base Register 1 - similar function to TTBR0
    (see below for an explanation of the dual TTBR)
5.  TTBCR, Translation Table Base Control Register, which controls:
    1.  How TLBs (Translation Lookaside Buffers) are used. TLBs are a
        mechanism of caching translation table entries.
    2.  Whether to use an extension feature that changes translation
        table entries and TTBR\* lengths to 64-bit (we're not using this,
        so we won't go into details)
    3.  How a translation table is selected.
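As an illustration of the DACR entry in the list above, here is a small C sketch of how the register value is composed. In ARMv7-A each of the 16 domains has a 2-bit field in DACR (domain n occupies bits 2n+1 and 2n). The helper names are made up for this example and are not taken from the project's headers; actually writing the value to cp15 requires an mcr instruction, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

enum domain_access {
    DOMAIN_NO_ACCESS = 0x0, /* every access generates a domain fault */
    DOMAIN_CLIENT    = 0x1, /* accesses are checked against descriptor permissions */
    DOMAIN_MANAGER   = 0x3  /* accesses are not checked against permissions */
};

/* Return `dacr` with the 2-bit field of `domain` (0-15) set to `access`. */
uint32_t dacr_set(uint32_t dacr, unsigned domain, enum domain_access access)
{
    assert(domain < 16);
    unsigned shift = domain * 2;
    return (dacr & ~(0x3u << shift)) | ((uint32_t)access << shift);
}

/* Read back the setting of `domain` from a DACR value. */
enum domain_access dacr_get(uint32_t dacr, unsigned domain)
{
    assert(domain < 16);
    return (enum domain_access)((dacr >> (domain * 2)) & 0x3u);
}
```

For example, dacr_set(0, 0, DOMAIN_CLIENT) yields 0x1, a value that makes domain 0 obey the access permission bits of translation table descriptors and blocks all other domains.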
 652
 653 There can be 2 translation tables and there are 2 cp15 registers (TTBR0
 654 and TTBR1) to hold their base addresses. When 2 tables are in use, then
 655 on each memory access some leftmost bits of virtual address determine
 656 which one should be used. If the bits are all 0s - TTBR0-pointed table
 657 is used. Otherwise - TTBR1 is used. This allows OS developer to use
 658 separate translation tables for kernelspace and userspace (i.e. by
 659 having the kernelspace code run from virtual addresses starting with 1
 660 and userspace code run from virtual addresses starting with 0). A field
 661 of TTBCR determines how many leftmost bits of virtual address are used
 662 for that (and also affects TTBR0 format). In the simplest setup (as in
 663 our project) this number is 0, so only the table specified in TTBR0 is
 664 used.
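The selection rule can be sketched in C as below. The TTBCR field in question is called N in ARM's documentation and is 3 bits wide; the function name select_ttbr is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 0 if the table pointed to by TTBR0 translates `vaddr`,
 * or 1 if the table pointed to by TTBR1 does, for a given value
 * of the N field of TTBCR.
 */
int select_ttbr(uint32_t vaddr, unsigned n)
{
    assert(n <= 7);
    if (n == 0)
        return 0; /* no bits are examined: TTBR0 is always used */
    /* Top n bits all zero -> TTBR0, otherwise -> TTBR1. */
    return (vaddr >> (32u - n)) != 0 ? 1 : 0;
}
```

With n equal to 1, for instance, the address space is split in half: addresses below 0x80000000 go through TTBR0 and the rest through TTBR1.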
 665
 666 ## Translation table<a id="sec-6-2" name="sec-6-2"></a>
 667
Translation table consists of 4096 entries, each describing a 1MB memory
region. An entry can be of several types:

1.  Invalid entry - the corresponding virtual addresses cannot be used
2.  Section - description of a mapping of a 1MB memory region
3.  Supersection - description of a mapping of a 16MB memory region, which
    has to be repeated in 16 consecutive table entries. This can
    be used to map to physical addresses higher than 2<sup>32</sup>.
4.  Page table - no mapping is given yet, but a page table is pointed to.
    See below.

Besides that, a translation table descriptor also specifies:
 680
 681 1.  Access permissions.
 682 2.  Other memory attributes (cacheability, shareability).
 683 3.  Which domain the memory belongs to.
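
A minimal sketch of how a translation table entry is picked for a virtual address and, for a section, how the physical address is formed. It assumes the 32-bit entry format. The enum and function names are hypothetical and do not come from the project.

```c
#include <stdint.h>

enum entry_type {
    ENTRY_INVALID,       /* bits [1:0] == 0b00 */
    ENTRY_PAGE_TABLE,    /* bits [1:0] == 0b01 */
    ENTRY_SECTION,       /* bits [1:0] == 0b10, bit 18 == 0 */
    ENTRY_SUPERSECTION,  /* bits [1:0] == 0b10, bit 18 == 1 */
    ENTRY_OTHER          /* bits [1:0] == 0b11, not handled in this sketch */
};

/* Index of the entry (out of 4096) describing the 1MB region containing va. */
uint32_t table_index(uint32_t va)
{
    return va >> 20;
}

enum entry_type classify_entry(uint32_t descriptor)
{
    switch (descriptor & 0x3u) {
    case 0x0: return ENTRY_INVALID;
    case 0x1: return ENTRY_PAGE_TABLE;
    case 0x2: return (descriptor & (1u << 18)) ? ENTRY_SUPERSECTION
                                               : ENTRY_SECTION;
    default:  return ENTRY_OTHER;
    }
}

/* Physical address for va, given the section descriptor found for it. */
uint32_t section_translate(uint32_t descriptor, uint32_t va)
{
    return (descriptor & 0xFFF00000u)   /* section base: bits [31:20] */
         | (va & 0x000FFFFFu);          /* offset inside the 1MB section */
}
```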
 684
 685 ## Page Table<a id="sec-6-3" name="sec-6-3"></a>
 686
A page table is similar to the translation table, but its entries
define smaller regions (called pages). When a translation table
descriptor describing a page table gets used for translation, an entry
in that page table is fetched, with some middle bits of the virtual
address used as the index. This allows for finer granularity of
mappings, while page tables only have to occupy space where small
pages are actually needed. We could say that 2-level translations are
performed. On some versions of ARM, translations can have more levels
than that. This means the MMU might sometimes need to fetch several
entries from tables of different levels to compute the physical address.
This is called a translation table walk.
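
The 2-level case can be illustrated by splitting a virtual address into the parts used during a translation table walk. The sketch below assumes 4KB small pages and 256-entry page tables, as in the 32-bit descriptor format. The names struct va_parts and split_va are invented for this example.

```c
#include <stdint.h>

struct va_parts {
    uint32_t l1_index;  /* bits [31:20]: index into the translation table */
    uint32_t l2_index;  /* bits [19:12]: index into the page table */
    uint32_t offset;    /* bits [11:0]: offset inside the 4KB page */
};

struct va_parts split_va(uint32_t va)
{
    struct va_parts parts;

    parts.l1_index = va >> 20;
    parts.l2_index = (va >> 12) & 0xFFu;
    parts.offset   = va & 0xFFFu;
    return parts;
}
```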
 698
 699 As of 15.01.2020 page tables and small pages are not used in the project
 700 (although programming them is on the TODO list).
 701
 702 ## Project specific information<a id="sec-6-4" name="sec-6-4"></a>
 703
Despite the overwhelming amount of configuration options available, most
can be left at their defaults and this is how it's done in this project.
Those default settings usually make the MMU behave like it did in older
ARM versions, when some options were not yet available and hence the
entire system was simpler.

Our project uses C bitfield structs for operating on SCTLR and TTBCR
contents and translation table descriptors. With DACR, bit shifts are
more appropriate, and with TTBCR, our default configuration means we're
writing '0' to that register. Bitfield structs are an elegant and
readable approach, yet not very portable across compilers. Current struct
definitions work properly with GCC.
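
To give an idea of the bitfield approach, below is a simplified sketch of how SCTLR could be described. It is not a copy of the project's definitions: the type and function names are hypothetical and only a few fields are named. It relies on GCC allocating bitfields starting from the least significant bit on little-endian targets, which is exactly the portability caveat mentioned above.

```c
#include <stdint.h>

typedef union {
    uint32_t raw;
    struct {
        uint32_t M          : 1;   /* bit 0: MMU enable */
        uint32_t A          : 1;   /* bit 1: alignment checking enable */
        uint32_t C          : 1;   /* bit 2: data cache enable */
        uint32_t bits_3_11  : 9;   /* not named in this sketch */
        uint32_t I          : 1;   /* bit 12: instruction cache enable */
        uint32_t bits_13_27 : 15;  /* not named in this sketch */
        uint32_t TRE        : 1;   /* bit 28: TEX remap enable */
        uint32_t AFE        : 1;   /* bit 29: access flag enable */
        uint32_t bits_30_31 : 2;   /* not named in this sketch */
    } fields;
} sctlr_sketch_t;

_Static_assert(sizeof(sctlr_sketch_t) == 4, "SCTLR is a 32-bit register");

/* Returns the given SCTLR value with the MMU and both caches enabled. */
uint32_t sctlr_enable_mmu_and_caches(uint32_t old_value)
{
    sctlr_sketch_t sctlr;

    sctlr.raw = old_value;
    sctlr.fields.M = 1;
    sctlr.fields.C = 1;
    sctlr.fields.I = 1;
    return sctlr.raw;
}
```

On the real hardware the old value would be read from, and the new one written to, cp15 with the mrc and mcr instructions.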
 716
Structs describing SCTLR, DACR and TTBCR are defined in
src/arm/PL1/kernel/cp_regs.h. Structs describing translation table
descriptors are defined in
src/arm/PL1/kernel/translation_table_descriptors.h.
 721
 722 Before the MMU is enabled, all memory is seen as it really is.
 723 Therefore, the only feasible way of enabling it is by initially setting
 724 the descriptors in translation table to map all addresses (mapping just
 725 addresses used by the kernel would be enough) to themselves. It is
 726 called a flat map.
 727
 728 ## Setting up MMU and FlatMap<a id="sec-6-5" name="sec-6-5"></a>
 729
This is how setting up a flat map, turning on the MMU and managing
memory sections are done in our project:

1.  Translation table is defined in the linker script
    src/arm/PL1/kernel/kernel_stage2.ld as a NOLOAD section. C code gets
    the table's start and end addresses from symbols defined in that
    linker script (see arm/PL1/kernel/memory.h).
2.  Function setup_flat_map() defined in arm/PL1/kernel/paging.c
    enables MMU with a flat map. It prints relevant information to UART
    while performing the following procedure:
 740     1.  In a loop write all descriptors to the translation table, set them
 741         as sections, accessible from PL1 only, belonging to domain 0.
 742     2.  Set DACR to allow domain 0 memory accesses, based on translation
 743         table descriptor permissions and block accesses to other domains,
 744         as only domain 0 is used in this project.
 745     3.  Make sure TEX remap, access flag, caches and the MMU are disabled
 746         in SCTLR. Disabling some of them might be unnecessary, because MMU
 747         is assumed to be disabled from the start and enabled caches might
 748         cause no problems as long as only flat map is used. Still, the way
 749         it is done right now is known to work well and optimizations are
 750         not needed.
 751     4.  Clear all caches and TLBs (again, it is suspected that some of
 752         this is unnecessary).
 753     5.  Write TTBCR setting such that only 32-bit translation table is
 754         used.
 755     6.  Make TTBR0 point to the start of translation table. Rest of
 756         attributes in TTBR0 (concerning how table entries are being
 757         accessed) are left as 0s (defaults).
 758     7.  Enable the MMU and caches by setting the appropriate bits in
 759         SCTLR.
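
The first two steps of the procedure can be sketched with plain bit shifts, which makes the sketch testable on a development machine. It is not the project's setup_flat_map(): the names below are hypothetical, the table is an ordinary array instead of the memory reserved in the linker script, and the cp15 register accesses are left out.

```c
#include <stdint.h>

#define TABLE_ENTRIES 4096u

/* Step 1: make every 1MB section of virtual memory map to itself. */
void fill_flat_map(uint32_t table[])
{
    for (uint32_t i = 0; i < TABLE_ENTRIES; i++) {
        table[i] = (i << 20)   /* bits [31:20]: physical base == virtual base */
                 | (1u << 10)  /* AP[1:0] = 0b01 (AP[2] = 0): PL1 access only */
                 | (0u << 5)   /* bits [8:5]: domain 0 */
                 | 0x2u;       /* bits [1:0] = 0b10: section */
    }
}

/*
 * Step 2: DACR holds a 2-bit field per domain. 0b01 ("client") makes
 * accesses be checked against descriptor permissions, 0b00 blocks them.
 */
uint32_t dacr_only_domain0(void)
{
    return 0x1u;               /* domain 0: client, domains 1-15: no access */
}
```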
 760
 761 After some cp15 register writes, the isb assembly instruction is used,
 762 which causes ARM core to wait until changes take effect. This is done to
 763 prevent some later instructions from being executed before the changes
 764 are applied.
 765
In arm/PL1/kernel/paging.c the function claim_and_map_section() can
be used to modify an entry in the translation table to create a new
mapping. Memory allocation, also done in that source file, uses some
lists to describe free and taken sections, but has nothing to do with
the MMU.
 771
 772 # Program Status Register<a id="sec-7" name="sec-7"></a>
 773
CPSR (Current Program Status Register) is a register whose bits contain
and/or determine various aspects of execution, e.g. condition flags,
execution state (ARM, Thumb or Jazelle), endianness state, execution mode
and interrupt mask. This register is readable and writable with the mrs
and msr instructions from any PL1 mode, thus it is possible to change
things like mode or interrupt mask by writing to this register.

Additionally, there are other registers with the same or similar bit
fields as CPSR. Those PSRs (Program Status Registers) are:
 783
 784 1.  APSR (Application Program Status Register)
 785 2.  SPSRs (Saved Program Status Registers)
 786
APSR can be considered the same as CPSR or a view of CPSR, with some
limitations - some bit fields from CPSR are missing (reserved) in APSR.
APSR can be accessed from PL0, while CPSR should only be accessed from
PL1. This way an application program executing in user mode can learn
some of the settings in CPSR without accessing CPSR directly.

SPSR is used for exception handling. Each exception-taking mode has its
own SPSR (they can be called SPSR_svc, SPSR_irq, etc.). On exception
entry, old contents of CPSR are backed up in the entered mode's SPSR.
Instructions used for exception return (subs and ldm ^), when writing
to the pc, have the important additional effect of copying the SPSR to
CPSR. This way, on return from an exception, the processor returns to
the state from before the exception. That includes endianness settings,
execution state, etc.
 801
 802 In our project, the structure of PSRs is defined in terms of C bitfield
 803 structs in src/arm/PL1/kernel/psr.h.
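
For illustration, the positions of the CPSR fields mentioned above can be written down as masks. The project itself uses bitfield structs for this, so the macro and function names below are only assumptions made for the example.

```c
#include <stdint.h>

#define PSR_FLAG_N   (1u << 31)  /* negative */
#define PSR_FLAG_Z   (1u << 30)  /* zero */
#define PSR_FLAG_C   (1u << 29)  /* carry */
#define PSR_FLAG_V   (1u << 28)  /* overflow */
#define PSR_JAZELLE  (1u << 24)  /* J: Jazelle state */
#define PSR_BIG_END  (1u << 9)   /* E: big-endian data accesses */
#define PSR_IRQ_MASK (1u << 7)   /* I: IRQs are masked when set */
#define PSR_FIQ_MASK (1u << 6)   /* F: FIQs are masked when set */
#define PSR_THUMB    (1u << 5)   /* T: Thumb state */
#define PSR_MODE     0x1Fu       /* M[4:0]: execution mode */

/* 1 if IRQs are not masked in the given PSR value, 0 otherwise. */
int psr_irqs_enabled(uint32_t psr)
{
    return (psr & PSR_IRQ_MASK) == 0;
}

/* The given PSR value with IRQs masked. */
uint32_t psr_mask_irqs(uint32_t psr)
{
    return psr | PSR_IRQ_MASK;
}

/* The mode field of the given PSR value. */
uint32_t psr_mode_bits(uint32_t psr)
{
    return psr & PSR_MODE;
}
```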
 804
 805 # Ramfs<a id="sec-8" name="sec-8"></a>
 806
 807 A simple ram file system has been introduced to avoid having to embed
 808 too many files in the kernel in the future.
 809
 810 The ram filesystem is created on the development machine and then
 811 embedded into the kernel. Kernel can then parse the ramfs and access
 812 files in it.
 813
Ramfs contains a mapping from a file's name to its size and contents.
Directories, file permissions, etc., as well as writing to the
filesystem, are not supported.
 817
 818 Currently this is used to access the code of PL0 test program by the
 819 kernel, which it then copies to the appropriate memory location. In case
 820 more user mode programs are later written, they can all be added to
 821 ramfs to enable the kernel to access them easily.
 822
 823 ## Specification<a id="sec-8-1" name="sec-8-1"></a>
 824
 825 When ramfs is accessed in memory, it MUST be aligned to a multiple of 4.
 826
 827 The filesystem itself consists of blocks of data, each containing one
 828 file. Blocks of data in the ramfs come one after another, with the
 829 requirement, that each block starts at a 4-aligned offset/address. If a
 830 block doesn't end at a 4-aligned address, there shall be up to 3
 831 null-bytes of padding after it, so that the next block is properly
 832 aligned.
 833
Each block starts with a C (null-terminated) string with the name of the
file it contains. At the first 4-aligned offset after the string, the
file size is stored in 4 bytes in little endian. Null-bytes are used for
padding between file name and file size if necessary. Immediately after
the file size reside the file contents, which take exactly the number of
bytes specified in file size.

As is obvious from the specification, files bigger than 4GB are not
supported, which is not a problem in the case of this project.
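
Reading this format could look roughly like the sketch below. It is written from the specification alone and is not the project's driver. The names ramfs_find and struct ramfs_file are hypothetical, and so is the assumption that the total size of the ramfs is known to the caller (the specification above does not say how the end of the filesystem is recognized).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ramfs_file {
    const char *name;         /* points at the name inside the ramfs */
    uint32_t size;            /* file size in bytes */
    const uint8_t *contents;  /* points at the contents inside the ramfs */
};

/* Rounds an offset up to the nearest multiple of 4. */
static size_t align4(size_t offset)
{
    return (offset + 3) & ~(size_t)3;
}

/*
 * Looks for a file called name in the ramfs occupying fs_size bytes at fs.
 * Returns 0 and fills *out on success, -1 if the file is not there or the
 * ramfs is malformed.
 */
int ramfs_find(const uint8_t *fs, size_t fs_size, const char *name,
               struct ramfs_file *out)
{
    size_t pos = 0;

    while (pos < fs_size) {
        /* The block starts with a null-terminated file name. */
        const uint8_t *nul = memchr(fs + pos, 0, fs_size - pos);
        if (nul == NULL)
            return -1;

        /* The size sits at the first 4-aligned offset after the name. */
        size_t size_pos = align4((size_t)(nul - fs) + 1);
        if (size_pos > fs_size || fs_size - size_pos < 4)
            return -1;

        uint32_t size = (uint32_t)fs[size_pos]
                      | (uint32_t)fs[size_pos + 1] << 8
                      | (uint32_t)fs[size_pos + 2] << 16
                      | (uint32_t)fs[size_pos + 3] << 24;

        size_t data_pos = size_pos + 4;
        if (size > fs_size - data_pos)
            return -1;

        if (strcmp((const char *)(fs + pos), name) == 0) {
            out->name = (const char *)(fs + pos);
            out->size = size;
            out->contents = fs + data_pos;
            return 0;
        }

        /* The next block starts at the next 4-aligned offset. */
        pos = align4(data_pos + size);
    }
    return -1;
}
```

Offsets are computed relative to the start of the ramfs, which works because the ramfs itself is required to be 4-aligned in memory.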
 843
 844 ## Implementations<a id="sec-8-2" name="sec-8-2"></a>
 845
Creation of ramfs is done by the makefs program (src/host/makefs.c). The
program accepts file names as command line arguments, creates a ramfs
containing all those files and writes it to stdout. As makefs is a very
simple tool (just as our ramfs is a simple format), it puts files in
ramfs under the names it got on the command line. No stripping or
normalizing of paths is performed. In case of errors (e.g. I/O errors)
makefs prints information to stderr and exits.
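
The core of such a tool is emitting one block per file. Below is a sketch of that step, written against the specification and not taken from makefs.c. To keep it easy to test, it writes into a caller-provided buffer instead of stdout. The function name ramfs_append is hypothetical and the caller is assumed to provide a big enough buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes one block (name, padding, size, contents, padding) into buf at
 * offset pos, which must be a multiple of 4. Returns the offset at which
 * the next block should start.
 */
size_t ramfs_append(uint8_t *buf, size_t pos, const char *name,
                    const uint8_t *data, uint32_t size)
{
    size_t name_len = strlen(name) + 1;  /* include the terminating null */

    memcpy(buf + pos, name, name_len);
    pos += name_len;
    while (pos % 4)                      /* pad up to the file size field */
        buf[pos++] = 0;

    for (int i = 0; i < 4; i++)          /* file size, little endian */
        buf[pos++] = (uint8_t)(size >> (8 * i));

    memcpy(buf + pos, data, size);
    pos += size;
    while (pos % 4)                      /* pad so the next block is aligned */
        buf[pos++] = 0;

    return pos;
}
```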
 853
Parsing/reading of ramfs is done by a kernel driver
(src/arm/PL1/kernel/ramfs.c). The driver allows for finding a file in
ramfs by name. File size and pointers to file name string and file
contents are returned through a structure from function find_file.
 858
As ramfs is embedded in kernel image, it is easily accessible to kernel
code. The alignment of ramfs to a multiple of 4 is assured in kernel's
linker script (src/arm/PL1/kernel/kernel_stage2.ld).

## Exceptions

Whenever some illegal operation (attempt to execute an undefined
instruction, attempt to access memory with insufficient permission,
etc.) happens or some peripheral device signals to the ARM core that
something important happened, an exception occurs. An exception is
something that pauses normal execution and passes control to the
(specific part of the) operating system. Upon an exception, several
things happen:
 869
1.  Change of processor mode.
 871 2.  CPSR gets saved into new mode's [SPSR](./PSRs-explained.txt).
 872 3.  pc (incremented by some value) is saved into new mode's lr.
 873 4.  Execution jumps to an entry in the exception vectors table specific
 874     to the exception.
 875
Each exception type is taken to its specific mode. Types and their
modes are:
 878
 879 1.  Reset and supervisor mode.
 880 2.  Undefined instruction and undefined mode.
 881 3.  Supervisor call and supervisor mode.
 882 4.  Prefetch abort and abort mode.
 883 5.  Data abort and abort mode.
 884 6.  Hypervisor trap and hypervisor mode (not used normally, only with
 885     extensions).
 886 7.  IRQ and IRQ mode.
 887 8.  FIQ and FIQ mode.
 888
The new value of the pc (the address to which the exception "jumps") is
the address of the nth instruction from the exception base address,
which, under the simplest settings, is 0x0 (bottom of virtual address
space). The value of n depends on the exception type. It is:
 892
 893 1.  reset
 894 2.  undefined instruction
 895 3.  supervisor call
 896 4.  prefetch abort
 897 5.  data abort
 898 6.  hypervisor trap (not used here)
 899 7.  IRQ
 900 8.  FIQ
 901
Those 8 instructions constitute the exception vectors table. As the
instructions follow one another, each of them should be a branch to some
exception-handling routine. In fact, on other architectures the
exception vector table often holds raw addresses of where to jump
instead of actual instructions, as here.
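
Since each entry of the table is one 4-byte instruction, the address execution jumps to can be computed from the exception's position in the list. A small sketch follows (the enum and function names are made up for this example and the enum counts from 0, unlike the list above):

```c
#include <stdint.h>

enum exception_type {
    EXC_RESET = 0,
    EXC_UNDEFINED_INSTRUCTION,
    EXC_SUPERVISOR_CALL,
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_HYPERVISOR_TRAP,
    EXC_IRQ,
    EXC_FIQ
};

/* Address of the vector table entry used for the given exception. */
uint32_t vector_address(uint32_t exception_base, enum exception_type type)
{
    return exception_base + 4u * (uint32_t)type;
}
```

With the base at 0x0, the IRQ entry therefore lies at 0x18 and the FIQ entry at 0x1C.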
 907
The exception base address can be changed from the bottom of virtual
address space to some other value by manipulating the contents of SCTLR
and VBAR coprocessor registers.
 910
On exception entry, the registers r0-r12 contain values used by the code
that was executing before. In order for the exception handler to perform
some action and return to that code, those registers have to be
preserved in memory. Some compilers can automatically generate an
appropriate prologue and epilogue for handler functions that will
preserve the right registers (we're not using this feature in our
project).
 917
Having old CPSR in SPSR and old pc in lr is helpful when, after handling
the exception, the handler needs to return to the code that was
executing before. There are 2 special instructions, subs and ldm ^
(load multiple with a caret ^), that, when used to change the pc (and
therefore perform a jump), cause the SPSR to be copied into CPSR. As bits
of CPSR determine the current execution mode, this causes the mode to be
changed to that from before the exception. In short, subs and ldm ^ are
the instructions to use to return from exceptions.
 926
As noted earlier, upon exception entry an incremented value of pc is
stored in lr. By how much it is incremented depends on the exception
type and execution state. For example, entering the undefined
instruction exception from Thumb state places in undef's lr the
problematic instruction's address + 2, while taking this exception from
ARM state places in undef's lr that instruction's address + 4 (see full
table in paragraph B1.8.3 of [ARMv7-ar_arm](https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf)).
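
For the undefined instruction case given above, a handler could recover the address of the problematic instruction as in the sketch below. The function name is hypothetical. In a real handler, whether the exception came from Thumb state would be read from the T bit of SPSR.

```c
#include <stdint.h>

/*
 * Address of the instruction that caused an undefined instruction
 * exception, computed from the value found in lr on handler entry.
 */
uint32_t undefined_instruction_address(uint32_t lr_und, int was_thumb)
{
    return lr_und - (was_thumb ? 2u : 4u);
}
```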
 934
It's worth noting that while our implementation of exception handlers
also sets the stack pointer (sp) upon each exception entry, a kernel
could be written where this wouldn't be done, as each mode enterable by
exception has its own sp.
 939
 940 # IRQ<a id="sec-9" name="sec-9"></a>
 941
Two of all the possible exceptions in ARM are IRQ (Interrupt Request)
and FIQ (Fast Interrupt Request). They can be caused by an external
source, such as peripheral devices, and they can be used to inform the
kernel about some event that happened.
 946
Interrupts offer an economic way of interacting with peripheral devices.
For example, code can probe UART memory-mapped registers in a loop to
see whether transmitting/receiving of a character finished. However,
this causes the processor to needlessly execute the loop and makes it
impossible or difficult to perform other tasks at the same time.
An interrupt can be used instead of probing to "notify" the kernel that
something it was waiting for just happened. While waiting for an
interrupt, the system can be put to halt (e.g. with the wfi
instruction), which helps save power, or it can perform other actions
without wasting processor cycles in a loop.
 957
An interrupt that is normally an IRQ can be made into a FIQ by ARM
system dependent means. FIQ is meant to be handled faster, by not
having to back up registers r8-r12, of which FIQ mode has its own
copies. This project only uses IRQ.
 962
Some peripheral devices can be configured (through their memory-mapped
registers) to generate an interrupt under certain conditions (e.g. UART
can generate an interrupt when the received characters queue fills). The
interrupt can then be either masked or unmasked (sometimes in more than
one peripheral register). If interrupts are enabled in CPSR and a
peripheral device tries to generate one that is not masked, an IRQ (or
FIQ) exception occurs (which causes interrupts to be temporarily masked
in CPSR). The code can usually check whether an interrupt of a given
kind from a given device is **pending** by looking at the appropriate
bit of the appropriate peripheral register (mmio). As long as an
interrupt is pending, re-enabling interrupts (for example via return
from the IRQ handler) shall cause the exception to occur again. Removing
the source of the interrupt (e.g. removing characters from the UART fifo
that filled) doesn't usually cause the interrupt to stop pending, in
which case a pending-bit has to be cleared, usually by writing to the
appropriate peripheral register (mmio).
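
The interplay of masking and pending described above can be modelled with a toy device. This is purely a host-side model for illustration. struct fake_device and both functions are invented for it and do not correspond to any real peripheral's registers.

```c
#include <stdint.h>

/* A made-up peripheral with a single interrupt source. */
struct fake_device {
    uint32_t enabled;  /* 1 if the device's interrupt is unmasked */
    uint32_t pending;  /* 1 while the interrupt is pending */
};

/* Would the core take an IRQ exception right now? */
int irq_taken(const struct fake_device *dev, int irqs_masked_in_cpsr)
{
    return !irqs_masked_in_cpsr && dev->enabled && dev->pending;
}

/* What a handler typically has to do: clear the pending bit explicitly. */
void acknowledge(struct fake_device *dev)
{
    dev->pending = 0;
}
```

If a handler returned without the acknowledge step, irq_taken() would be true again as soon as interrupts got re-enabled, which corresponds to the exception occurring again.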
 979
IRQs and FIQs can be configured as vectored - the processor then, upon
interrupt, jumps to a different location depending on which interrupt
occurred, instead of jumping to the standard IRQ/FIQ vector. This can be
used to speed up interrupt handling. Our simple project does not,
however, use this feature.
 985
Currently, IRQs from 2 sources are used:
[ARM timer IRQ](https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf) and UART IRQs. The kernel makes sure that the timer IRQ
only occurs when the processor is in user mode. The IRQ handler does not
return in this case - it calls the scheduler. The kernel makes sure that
a UART IRQ only occurs when a process is blocked and is waiting for a
UART IO operation. The interrupt handler, when called, checks what type
of UART action happened and tries (by calling the appropriate function
from scheduler.c) to handle that action and, possibly, to unblock the
waiting process. A UART IRQ might occur when another process is
executing (not possible now, with only one process, but shall be
possible when more processes are added to the project), in which case
the handler returns, or when the kernel is explicitly waiting for
interrupts (because all processes are blocked), in which case it calls
schedule() instead of returning.
1000
1001 # Processor modes<a id="sec-10" name="sec-10"></a>
1002
1003 ARMv7-A core can be executing in one of several modes (not to be
1004 confused with instruction set states or endianness execution state).
1005 Those are:
1006
1007 1.  User
1008 2.  FIQ
1009 3.  IRQ
1010 4.  Supervisor
1011 5.  Abort
1012 6.  Undefined
1013 7.  System
1014
1015 In fact, there are more if the processor implements some extensions, but
1016 this is irrelevant here.
1017
1018 Current processor mode is encoded in the lowest five bits of the CPSR register.
1019
1020 Processor can operate in one of 2 privilege levels (although, again,
1021 extensions exist, that add more levels):
1022
1023 1.  PL0 - privilege level 0
1024 2.  PL1 - privilege level 1
1025
Processor modes have their assigned privilege levels. User mode has
privilege level 0 and all other modes have privilege level 1. Code
executing in one of the privileged modes is allowed to do more things
than user mode code, e.g. writing and reading some of the coprocessor
registers, executing some privileged instructions (e.g. mrs and msr,
when used to reference CPSR, as well as other modes' registers),
accessing privileged memory and changing the mode (without causing an
exception). Attempts to perform those actions in user mode result either
in undefined (within some limits) behaviour or an exception (depending
on what action is considered).
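
As an illustration, the mode and privilege level can be decoded from a CPSR value as sketched below. The function names are made up for this example. The mode numbers are the architectural encodings of the 7 modes listed above.

```c
#include <stdint.h>

/* Name of the mode encoded in the lowest five bits of a CPSR value. */
const char *mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "other (extension-specific or invalid)";
    }
}

/*
 * 0 for user mode, 1 otherwise. Only meaningful for the 7 modes listed
 * above, as extensions add modes with other privilege levels.
 */
int privilege_level(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) == 0x10 ? 0 : 1;
}
```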
1036
1037 User mode is the one, in which application programs usually run. Other
1038 modes are usually used by the operating system's kernel. Lack of
1039 privileges in user mode allows PL1 code to control execution of PL0
1040 code.
1041
While code executing in PL1 can freely (except switching from system to
user mode, which produces undefined behaviour) change mode by either
writing the CPSR or executing the cps instruction, user mode can only be
exited by means of an exception.
1046
Some ARM core registers (e.g. r0 - r7) are shared between modes, while
some are not. In the latter case, separate modes have their private
copies of those registers. For example, lr and sp in supervisor mode are
different from lr and sp in user mode. For full information about shared
and not shared (banked) registers, see paragraph B9.2.1 in the
[armv7-a manual](https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf). The most important things are that user mode and
system mode share all registers with each other and they don't have
their own SPSR (which is used for returning from exceptions, and
exceptions are never taken to those 2 modes) and that all other modes
have their own SPSR, sp and lr.
1058
The reason for having multiple copies of the same register in different
modes is that it simplifies writing exception handlers. E.g. supervisor
mode code can safely use sp and lr without destroying the contents of
user mode's sp and lr.

The big number of PL1 modes is supposed to aid in handling of
exceptions. Each kind of exception is taken to its specific mode.
1066
1067 Supervisor mode, in addition to being the mode supervisor calls are
1068 taken to, is the mode the processor is in when the kernel boots.
1069
1070 System mode, which uses the same registers as user mode, is said to have
1071 been added to ARM architecture to ease accessing the unprivileged
1072 registers. For example, setting user mode's sp from supervisor mode can
1073 be done by switching to system mode, setting the sp and switching back
1074 to supervisor mode. Other modes' registers can alternatively be accessed
1075 with the use of mrs and msr assembly instructions (but not from user
1076 mode).
1077
1078 Despite the name, system mode doesn't have to be the mode used most
1079 often by operating system's kernel. In fact, prohibition of direct
1080 switching from system mode to user mode would make extensive use of
1081 system mode impractical. This project, for example, uses supervisor mode
1082 for most of the privileged tasks.
1083
1084 # Process management<a id="sec-11" name="sec-11"></a>
1085
An operating system has to manage user processes. Our system only has
one process right now, but usual actions, such as context saving or
context restoring, are implemented anyway. The following few paragraphs
contain information on what process management looks like in operating
systems in general.
1091
A process might return control to the system by executing the svc
(earlier called swi) instruction. The system would then perform some
action on behalf of the process and either return from the supervisor
call exception or attempt to schedule another process to run, in which
case the context of the old process would need to be saved for later and
the context of the new process would need to be restored.
1098
A process has data in memory (such as its stack and code) as well as
data in registers (r0-r15, CPSR). Together they constitute the process'
context. From the process' perspective, context should not unexpectedly
change, so when control is taken away from user mode code (via an
exception) and later (possibly after execution of some other processes)
given back, it should be transparent to the process (except when the
kernel does something for the process as part of a supervisor call). In
particular, the contents of core registers should be the same as before.
For this to be achievable, the operating system has to back up the
process' registers somewhere in memory and later restore them from that
memory.
1109
An operating system kernel maintains a queue of processes waiting for
execution. When a process blocks (for example by waiting for IO), it is
removed from the queue. If a process unblocks (for example because IO
completed) it is added back to the queue. In general, some systems might
complicate it, for example by having more queues, but discussing those
variations is out of scope of this documentation. When the processor is
free, one of the processes from the queue (determined by some scheduling
algorithm implemented in the kernel) gets chosen and run on the
processor.

As a process might never make a supervisor call, it could occupy the
processor forever. To remedy this, timer interrupts can be used by the
kernel to interrupt the execution of a process after some time. The
process would then have its context saved and go to the end of the
queue. Another process would be scheduled to run.
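
The general scheme from the last few paragraphs (saved context, a queue of runnable processes, preemption by timer) can be sketched as below. It describes operating systems in general and is not how our single-process scheduler.c works. All names, the queue capacity and the simple first-in-first-out policy are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 4

struct context {
    uint32_t r[13];           /* r0-r12 */
    uint32_t sp, lr, pc, psr;
};

struct process {
    struct context ctx;       /* registers saved while not running */
    int id;
};

struct ready_queue {
    struct process *slots[MAX_PROCS];
    size_t head, count;
};

/* Adds a process at the end of the queue; -1 if the queue is full. */
int rq_push(struct ready_queue *q, struct process *p)
{
    if (q->count == MAX_PROCS)
        return -1;
    q->slots[(q->head + q->count) % MAX_PROCS] = p;
    q->count++;
    return 0;
}

/* Removes and returns the first process; NULL if the queue is empty. */
struct process *rq_pop(struct ready_queue *q)
{
    if (q->count == 0)
        return NULL;
    struct process *p = q->slots[q->head];
    q->head = (q->head + 1) % MAX_PROCS;
    q->count--;
    return p;
}

/*
 * Timer tick: save the running process' registers, put it at the end of
 * the queue and pick the next process to run.
 */
struct process *preempt(struct ready_queue *q, struct process *running,
                        const struct context *live_registers)
{
    running->ctx = *live_registers;
    rq_push(q, running);
    return rq_pop(q);
}
```

The caller would then restore the registers from the returned process' ctx before returning to user mode.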
1125
1126 Other exceptions might occur when process is running. Depending on
1127 kernel design, handler of an exception (such as IRQ) might return to the
1128 process or cause another one to be scheduled.
1129
If at some time all processes are blocked waiting, the kernel can wait
for some interrupt to happen, which could possibly unblock some process
(e.g. because IO completed).
1133
1134 While not mentioned earlier, switching between processes' contexts
1135 involves not only saving and restoring of registers, but also changing
1136 the translation table entries to properly map memory regions used by
1137 current process.
1138
1139 In our project, process management is implemented in
1140 src/arm/PL1/kernel/scheduler.c.
1141
A "queue" contains data of the only process (variables PL0_regs[],
PL0_sp, PL0_lr and PL0_PSR).
1144
1145 ## Scheduler functions<a id="sec-11-1" name="sec-11-1"></a>
1146
Function setup_scheduler_structures is supposed to be called before the
scheduler is used in any way.

Function schedule_new() creates and runs a new process.

Function schedule_wait_for_output() causes the current process to
have its context saved and get blocked waiting for UART to send data.
It is called from the supervisor call handler. Function
schedule_wait_for_input() is similar, but the process waits for UART to
receive data.

Function schedule() attempts to select a process (currently the only
one) and run it. If the process cannot be run, schedule() waits for an
interrupt that could unblock the process. The interrupt handler would
not return in this case, but rather call schedule() again.

Function scheduler_try_output() is supposed to be called by the IRQ
handler when UART is ready to transmit more data. It can cause a process
to get unblocked. scheduler_try_input() is similar, but relates to
receiving data.
1167
1168 The following are assured in our design:
1169
1170 1.  When processor is in user mode, interrupts are enabled.
1171 2.  When processor is in system mode, interrupts are disabled, except
1172     when explicitly waiting for the interrupt when process is blocked.
1173 3.  When a process is waiting for input/output, the corresponding IRQ is
1174     unmasked. Otherwise, that IRQ is masked.
4.  If an interrupt from UART occurs during execution of user mode code
    (not possible here, as we only have one process, but shall become
    possible when proper processes are implemented), the handler shall
    return. If that interrupt occurs during execution of PL1 code, it
    means it occurred in the scheduler, which was explicitly waiting for
    it, and the handler calls schedule() again instead of returning.
1181 5.  Interrupt from timer is unmasked and set to come whenever a process
1182     gets scheduled to run. Timer interrupt is disabled when in PL1 (when
1183     scheduler is waiting for interrupt, only UART one can come).
6.  A supervisor call requesting a UART operation that cannot be
    completed immediately causes the process to block.
1186
1187 # Linking<a id="sec-12" name="sec-12"></a>
1188
[Linking](https://en.wikipedia.org/wiki/Linker_%28computing%29) is a process of creating an executable, library or another
object file out of object files.
During linking, values previously unknown to the compiler (e.g. what
will be the addresses of external functions/variables, from what address
will the code be executing) might be injected into the code.

A linker script is, among other things, used to tell the linker where in
memory the specific parts of the executable should lie.
1197
In a hosted environment (when building a program to run under a
full-featured operating system, like GNU/Linux), a linker script is
usually provided by the toolchain and used if no other script is
provided. In a bare-metal project, the developer usually has to write
their own linker script, in which they specify the binary image's **load
address** and section layout.
1204
Contents of an object code file or executable (our .o or .elf) are
grouped into sections. Sections have names. Common names are .text
(usually contains code), .data (usually contains statically-allocated
variables initialized to non-zero values), .bss (usually used to reserve
memory for statically-allocated variables initialized to zero), .rodata
(usually contains statically-allocated variables that are not going to
be modified).
1212
In a hosted environment, when an executable (say, of elf format) is
executed, contents of its sections are usually placed in different
memory segments with different access privileges, so that, for example,
code is not writable and variable contents are not executable. This
helps reduce the risk of buffer overflow exploits.

In a bare-metal environment like ours, we don't execute an elf file
directly (except in qemu, which is the unpreferred approach anyway), but
rather a raw binary image created from an elf file. Still, the notion of
section is used along the way.
1223
During linking, one or more object code files are combined into one file
(in our case an executable). Section contents of input files land in
some sections of the output file, in a way defined in the linker script.
In a hosted environment, a linker script would likely put contents of
input .text sections in a .text section, contents of input .data
sections in a .data section, etc. The developer can, however, use
sections with different names (although weird behaviour of some linkers
might occur) and assign their contents in their preferred way using a
linker script.
1233
1234 In linker script it is possible to specify a section as NOLOAD (usually
1235 used for .bss), which, in our case, causes that section not to be
1236 included in the binary image later created with objcopy.
1237
1238 It is also possible to treat same-named input sections differently
1239 depending on what file they came from and even use wildcards when
1240 specifying file names.
1241
Variables can be created, as well as new symbols, which can then be
referenced from C code.
1244
1245 Defining alignment of specific parts of future image is also easily
1246 achievable.
1247
1248 We made use of all those possibilities in our scripts.
1249
In src/arm/PL1/kernel/kernel_stage2.ld the physical memory layout of
the kernel is defined. Symbols defined there, such as _stack_end, are
referenced in C header src/arm/PL1/kernel/memory.h.
1253
While src/arm/PL1/kernel/kernel.ld and src/arm/PL1/loader/loader.ld
define the starting address, it is irrelevant, as the position-independent
assembly code for the first stages of the loader and the kernel does not depend on that address.
1257
1258 At the beginning of this project, we had very little understanding of
1259 linker scripts' syntax.
1260 [This article](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION) proved useful and allowed us to learn the required parts in a
1261 short time. As discussing the entire syntax of linker scripts is beyond
1262 the scope of this documentation, we refer the reader to that resource.
1263
1264 # Miscellaneous topics<a id="sec-13" name="sec-13"></a>
1265
1266 ## Supervisor calls<a id="sec-13-1" name="sec-13-1"></a>
1267
A supervisor call happens when the svc (previously called swi)
instruction gets executed. An exception is then entered. A supervisor call is
the standard way for a user process to ask the kernel for something. As
user code might request many different things, the kernel must somehow
know which one was requested. The svc instruction takes one immediate
operand. The supervisor call exception handler can check at what address
the execution was, read the svc instruction from there and inspect its
bytes. This way, by executing svc with different immediate values,
user mode code can request different things from the kernel - the value
in svc encodes the request's type.
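
The immediate-based approach described above can be sketched in C as follows. This is only an illustration, not code from this project: it assumes the svc was executed in ARM (A32) state, where the instruction is one 32-bit word with bits 27-24 set to 1111 and the immediate in the low 24 bits, and all function names are hypothetical.

```c
#include <stdint.h>

/* A32 encoding of svc: cond[31:28] 1111[27:24] imm24[23:0] */
static inline int is_svc_instruction(uint32_t insn)
{
    return (insn & 0x0F000000u) == 0x0F000000u;
}

/* Extract the 24-bit immediate operand of an A32 svc instruction. */
static inline uint32_t svc_immediate(uint32_t insn)
{
    return insn & 0x00FFFFFFu;
}

/*
 * Hypothetical helper: on exception entry lr holds the address of the
 * instruction following the svc, so the svc itself is one word earlier.
 */
static inline uint32_t svc_number_from_return_address(const uint32_t *lr)
{
    return svc_immediate(lr[-1]);
}
```

For example, the word 0xEF000042 is `svc #0x42` with the "always" condition, so a handler using this scheme would see request number 0x42.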
1278
To save time and for the sake of simplicity, we don't make use of
immediates in svc and instead encode the call's type in r0. In our
implementation we decided that a supervisor call preserves and
clobbers the same registers as a function call and returns values
through r0, just as a function call does. This enables us to
perform the supervisor call as a call to a function defined in
src/arm/PL0/svc.S. Calls from C are performed in
src/arm/PL0/PL0_utils.c and request type encodings are defined in
src/arm/common/svc_interface.h (they must be known to both user mode
code and handler code).
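
A minimal sketch of what dispatching on r0 might look like on the handler side is given below. The request codes, the error value and the stand-in service functions are all made up for illustration; the real encodings are the ones in svc_interface.h.

```c
#include <stdint.h>

/* Hypothetical request codes (the project's real ones are in svc_interface.h). */
enum example_svc_request {
    EXAMPLE_SVC_PUTCHAR = 1,
    EXAMPLE_SVC_GETCHAR = 2,
};

#define EXAMPLE_SVC_ERROR ((uint32_t)-1)

/* Stand-ins for the real services, so that the sketch runs on a host. */
static uint32_t last_char_written;

static uint32_t example_putchar(uint32_t c)
{
    last_char_written = c;
    return c;
}

static uint32_t example_getchar(void)
{
    return 'x';
}

/*
 * r0 carries the request type and r1 the first argument, mirroring the
 * function call convention; the result is what the caller receives in r0.
 */
uint32_t example_svc_dispatch(uint32_t r0, uint32_t r1)
{
    switch (r0) {
    case EXAMPLE_SVC_PUTCHAR:
        return example_putchar(r1);
    case EXAMPLE_SVC_GETCHAR:
        return example_getchar();
    default:
        return EXAMPLE_SVC_ERROR;
    }
}
```

Because arguments and the return value follow the ordinary calling convention, the user-side wrapper can look like a plain C function to its callers.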
1289
1290 ## Utilities<a id="sec-13-2" name="sec-13-2"></a>
1291
We've compiled useful utilities (e.g. memcpy(), strlen(), etc.) in
src/arm/common/strings.c. Those do not depend on the environment and can
be used by user mode code, kernel code and even bootloader code.
Functions used for io (like puts()) are also defined in a common way for
privileged and unprivileged code. They do, however, rely on the
existence of putchar() and getchar(). In PL0 code
(src/arm/PL0/PL0_utils.c), putchar() and getchar() are defined to
perform a supervisor call that does that operation. In the PL1 code,
they are defined as operations on UART.
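
The layering can be illustrated with the sketch below. It is not the project's code: the names are hypothetical, the character backend simply records output in a buffer so the example runs on a host, and appending a newline follows the standard C puts() rather than anything confirmed about our implementation.

```c
#include <stddef.h>

/*
 * Stand-in backend. In PL0 code this would perform a supervisor call,
 * in PL1 code it would write to the UART.
 */
static char sink[64];
static size_t sink_len;

static void example_putchar(char c)
{
    if (sink_len < sizeof(sink) - 1) {
        sink[sink_len++] = c;
        sink[sink_len] = '\0';
    }
}

/* Environment-independent part: needs nothing besides example_putchar(). */
void example_puts(const char *s)
{
    while (*s)
        example_putchar(*s++);
    example_putchar('\n');
}
```

Only the backend has to be provided separately for each privilege level; everything built on top of it can be shared.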
1301
1302 ## Timers<a id="sec-13-3" name="sec-13-3"></a>
1303
1304 Several timers are available on the RaspberryPi:
1305
1.  System Timer (with 4 interrupt lines, regarded as the most reliable,
    as it is not derived from the system clock and hence is not affected
    by processor power mode changes),
    [BCM2837 ARM Peripherals, Chapter 12](https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf)
2.  ARM side Timer (based on an ARM AP804),
1311     [BCM2837 ARM Peripherals, Chapter 14](https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf)
1312 3.  ARM Generic Timer (optional extension to ARMv7-A and ARMv7-R,
1313     configured through coprocessor registers)
1314
1315 At first, we attempted to use the System Timer, some code for which is
1316 still present in src/arm/PL1/kernel/bcmclock.h. The interrupts from that
1317 timer are not, however, routed to any ARM core under rpi-open-firmware,
but rather to the GPU. Because of that, we ended up using the ARM side
1319 Timer (programmed in src/arm/PL1/kernel/armclock.h). The ARM side Timer
1320 based on ARM AP804 is currently only available on real hardware and not
1321 in qemu. Programming the ARM Generic Timer (listed in TODOs) could
1322 enable the use of timer interrupts in qemu.
1323
1324 ## UARTs<a id="sec-13-4" name="sec-13-4"></a>
1325
src/arm/PL1/PL1_common/uart.c implements putchar() and getchar() in
terms of UART. Those implementations are blocking - they poll UART
peripheral registers in a loop, checking if the device is ready to
perform the operation. They are, however, accompanied by functions
getchar_non_blocking() and putchar_non_blocking(), which check **once**
if the device is ready and only perform the operation if it is.
Otherwise, they return an error value. Their purpose is to be used with
interrupts. In interrupt-driven UART we avoid waiting in a loop -
instead, an IRQ comes when the desired UART operation completes. The code
that wants to write to or read from UART does, however, need to tie its
operation to the IRQ handler and the scheduler. Blocking versions should not
be used once UART interrupts are enabled, nor in exception handlers, which
should always run quickly. However, doing this does not break UART and
might be justified for debugging purposes (like the error() function defined
in src/arm/common/io.c and used throughout the kernel code).
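
The difference between the two flavours can be sketched as follows. This is an illustration under assumptions, not a copy of uart.c: the registers are simulated by a plain struct so the code runs on a host, the function names and the error value are hypothetical, and only the flag register bits (RXFE is bit 4, TXFF is bit 5) are taken from the PL011 manual.

```c
#include <stdint.h>

/* Simulated subset of PL011 registers; on hardware these are memory-mapped. */
struct fake_pl011 {
    uint32_t dr; /* data register */
    uint32_t fr; /* flag register */
};

#define PL011_FR_RXFE (1u << 4) /* receive FIFO empty */
#define PL011_FR_TXFF (1u << 5) /* transmit FIFO full */

#define EXAMPLE_UART_WOULD_BLOCK (-1) /* hypothetical error value */

/* Check readiness once; write only if there is room in the transmit FIFO. */
int example_putchar_non_blocking(volatile struct fake_pl011 *uart, char c)
{
    if (uart->fr & PL011_FR_TXFF)
        return EXAMPLE_UART_WOULD_BLOCK;
    uart->dr = (uint8_t)c;
    return 0;
}

/* Check readiness once; read only if the receive FIFO holds data. */
int example_getchar_non_blocking(volatile struct fake_pl011 *uart)
{
    if (uart->fr & PL011_FR_RXFE)
        return EXAMPLE_UART_WOULD_BLOCK;
    return (int)(uart->dr & 0xFFu);
}

/* The blocking flavour keeps polling until the operation succeeds. */
void example_putchar_blocking(volatile struct fake_pl011 *uart, char c)
{
    while (example_putchar_non_blocking(uart, c) == EXAMPLE_UART_WOULD_BLOCK)
        ;
}
```

In an interrupt-driven setup, a caller that gets the error value back would give up the processor and retry after the corresponding IRQ instead of spinning.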
1341
There are 2 UARTs in the RaspberryPi: one mini UART (also called UART 1) and
one PL011 UART (also called UART 0). Only the PL011 UART is used
in this project. The hardware allows some degree of configuration of
which pins each UART is routed to (via so-called alternative
functions). In our project it is assumed that UART 0's TX and RX are
routed to GPIO pins 14 & 15 by the firmware, which is true for
rpi-open-firmware. With stock Broadcom firmware, either changing the
default configuration (config.txt) or selecting alternative functions
as part of UART initialization (present in the TODOs list) might be
required.
1352
Before UART can be used, GPIO pins 14 and 15 should have pull up/down
disabled. This is done as part of UART initialization in uart_init() in
src/arm/PL1/PL1_common/uart.c. There is a requirement that UART is
disabled when being configured, which is also fulfilled by uart_init().
The PL011 is thoroughly described in
[BCM2837 ARM Peripherals](https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf) as well as [PrimeCell UART (PL011) Technical Reference Manual](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183f/DDI0183.pdf).
1359
1360 # Problems faced<a id="sec-14" name="sec-14"></a>
1361
1362 ## Ramfs alignment<a id="sec-14-1" name="sec-14-1"></a>
1363
Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark its data section as requiring 2\*\*2 alignment. There has to be a .=ALIGN(4) line in the linker script before ramfs_embeddable.o. At some point we forgot about it, which caused the ramfs to misbehave.
Bugs located in a linker script, like this one, are often non-obvious. This makes them hard to trace.
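
The arithmetic behind ALIGN(4) is to round the current address up to the next multiple of 4. The sketch below shows it in C, together with a check that could be used to catch a misaligned ramfs early; both helper names are hypothetical and not part of the project.

```c
#include <stdint.h>

/* Round an address up to the next multiple of 4 (what ALIGN(4) does). */
static inline uintptr_t example_align_up_4(uintptr_t addr)
{
    return (addr + 3u) & ~(uintptr_t)3u;
}

/* Tell whether an address is already 4-aligned. */
static inline int example_is_4_aligned(uintptr_t addr)
{
    return (addr & 3u) == 0;
}
```

For instance, 0x8001 is rounded up to 0x8004, while 0x8004 stays unchanged.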
1366
1367 ## *COM* section<a id="sec-14-2" name="sec-14-2"></a>
1368
Many sources mention *COMMON* as the section in object files resulting from compilation that contains a specific kind of uninitialized (0-initialized) data (similar to .bss). Obviously, it has to be included in the linker script.
Unfortunately, gcc names this section differently, namely *COM*. This caused our linker script to not include it in the actual image. Instead, it was placed somewhere after the last section defined in the linker script. This happened to be after our NOLOAD stack section, where the first free MMU section is. Due to how our memory management algorithm works, this part of physical memory always gets allocated to the first process, which gets its code copied there.
This bug caused incredibly weird behaviour. The user space code would fail with either an abort or an undefined instruction exception, always on the second PL0 instruction. That was because some statically allocated scheduler variable in *COM* was getting mapped at that address. It took probably a few hours of analysing generated assembly in radare2 and modifying [scheduler.c](../src/arm/PL1/kernel/scheduler.c) and [PL0_test.c](../src/arm/PL0/PL0_test.c) to find that the problem lay in the linker script.
1372
## Bare-metal position independent code<a id="sec-14-3" name="sec-14-3"></a>
1374
We wanted to make the bootloader and the kernel able to run regardless of what address they are loaded at (also see the comment in [kernel's stage1 linker script](../src/arm/PL1/kernel/kernel.ld)).
To achieve this goal, we added -fPIC to the compilation options of all arm code. With this, we decided we could, instead of embedding code in other code using objcopy, put relevant pieces of code in separate linker script sections, link them together and then copy entire sections to some other address at runtime. E.g. the exception vector would be linked with the actual kernel (loaded at 0x8000), but then copied along with exception handling routines to 0x0. It did work in 2 cases (the exception vector and libkernel), but once most of the project was modified to use this method of code embedding, it turned out to be faulty and work had to be done to move back to the use of objcopy.
The problem is that -fPIC (as well as -fPIE) requires code to be loaded by some operating system or bootloader that can fill its GOT (global offset table). This is not being done in an environment like ours.
It is possible to generate ARM bare-metal position-independent code that would work without a GOT, but support for this is not implemented in gcc and is not a common feature in general.
The solution was to write stage1 of both the bootloader and the kernel in careful, position-independent assembly. This required more effort, but was ultimately successful.
1380
1381 ## Linker section naming<a id="sec-14-4" name="sec-14-4"></a>
1382
Weird behaviour occurs when trying to link object code files with nonstandard section names using the GNU linker. Output sections defined in the linker script didn't cause problems in our case. Problems occurred when input sections were nonstandard (such as sections generated by using \_\_attribute\_\_((section("name"))) in GCC-compiled C code), as they would not be included or would be included in the wrong place, despite being explicitly listed for inclusion in the linker script's SECTIONS command.
At some point, renaming a section from .boot to .text.boot would make the code work properly.
1385
1386 ## Context switches<a id="sec-14-5" name="sec-14-5"></a>
1387
This is a description of a mistake we made during work on the project.
At first, we didn't know about the special features of the SUBS pc, lr and ldm rn {pc} ^ instructions. Our code would switch to user mode by branching to code in a PL0-accessible memory section and having it execute the cps instruction. This worked, but was not good, because code executed by the kernel was in a memory section writable by userspace code.
The first improvement was separating that code into "libkernel". Libkernel would be in a PL0-executable but non-writable section and would perform the switch.
It did work; however, it was not the right way.
We later learned how to achieve the same with subs/ldm and removed libkernel, making the project a bit simpler.
1393
1394 ## Different modes' sp register<a id="sec-14-6" name="sec-14-6"></a>
1395
System mode has a separate stack pointer from supervisor mode, so upon a switch from supervisor to system mode it has to be set to point to the actual stack.
At first we didn't know about that and we had undefined behaviour occur. At some points during development, changing a line of code in one place would make a bug occur or not occur in some other, unrelated place in the kernel.
1398
## Switching between system mode and user mode<a id="sec-14-7" name="sec-14-7"></a>
1400
It is also not allowed (undefined behaviour) to switch from system mode directly to user mode, which we were not aware of and which also caused some problems and bugs.
1402
1403 ## UART interrupt masking<a id="sec-14-8" name="sec-14-8"></a>
1404
Both the BCM2835 ARM Peripherals manual and the manual for the PL011 UART itself say that writing 0s to PL011_UART_IMSC unmasks specific interrupts. Practical experiments showed that it's the opposite: writing 1s enables specific interrupts and writing 0s disables them.
UART code on wiki.osdev was also written to disable interrupts in the way described in the manuals. The interrupts were then unmasked instead of masked. This didn't cause problems in practice, as UART interrupts also have to be unmasked elsewhere (the register defined as ARM_ENABLE_IRQS_2 in [interrupts.h](../src/arm/PL1/kernel/interrupts.h)) to actually occur.
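
The polarity we observed can be summed up in a short sketch. The bit positions (RXIM is bit 4, TXIM is bit 5) come from the PL011 manual; the helper names are hypothetical, and the helpers only compute the value that would then be written to the IMSC register.

```c
#include <stdint.h>

#define PL011_IMSC_RXIM (1u << 4) /* receive interrupt */
#define PL011_IMSC_TXIM (1u << 5) /* transmit interrupt */

/* As observed in practice: a 1 bit lets the interrupt through. */
static inline uint32_t example_imsc_enable(uint32_t imsc, uint32_t bits)
{
    return imsc | bits;
}

/* As observed in practice: a 0 bit keeps the interrupt from occurring. */
static inline uint32_t example_imsc_disable(uint32_t imsc, uint32_t bits)
{
    return imsc & ~bits;
}
```

Even with a bit set here, the interrupt only reaches the core once UART interrupts are also enabled in the interrupt controller.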
1407
1408 ## Terminal stdin breaking<a id="sec-14-9" name="sec-14-9"></a>
1409
The very simple pipe_image program breaks stdin when run.
Even other programs run in that same (bash) shell after pipe_image cannot read from stdin.
In zsh, other commands run interactively after pipe_image do work, but commands executed after pipe_image inside a shell function still exhibit the problem.
1413
1414 # Afterword<a id="sec-15" name="sec-15"></a>
1415
This project has been done as part of the Embedded Systems course at
[AGH University of Science and Technology](https://www.agh.edu.pl/en/). The goal of the project was to investigate and program the
MMU (Memory Management Unit) of the RaspberryPi, but it ended up forming the
basis of a small operating system.
[RaspberryPi 3 model B](https://www.raspberrypi.org/products/raspberry-pi-3-model-b/) was the hardware platform used, with stock firmware replaced
with
[rpi-open-firmware](https://github.com/christinaa/rpi-open-firmware).
An emulator, [qemu](https://www.qemu.org/download/) (version 2.9.1),
capable of emulating an older RaspberryPi 2, was also used extensively.
1425
The project was written in the C programming language and ARM assembly.
Knowledge of C is required to understand the code. Knowledge of ARM
assembly is useful, but it should be considered something that can be
learned **while** working with it. Still, the reader should at least have
an idea of what assembly language is and how it is used.
1431
This documentation is intended to provide information on bare-metal
programming on the RaspberryPi and ARM in general, as well as a
description of our solutions and implementations. There is a lot of
information available on the topic in online sources, yet it is not always in an
easy-to-understand form and the number of different options described in
manuals might be overwhelming for people new to the topic. That's why we
attempted to describe our work in a way that newcomers to bare-metal
programming will find useful. External resources we used are listed at the end of the documentation.
1440
It is planned that students of the Embedded Systems course in future years
will have the option to continue or reuse previous projects, such as this
one. We hope this documentation will prove useful to our younger
colleagues who happen to work with the codebase.
1445
In case of any bugs or questions, the authors can be contacted at kwojtus@protonmail.com.
1447
1448 # Sources of Information<a id="sec-16" name="sec-16"></a>
1449
1450 -   wiki.osdev.org
1451 -   ARM GCC Inline Assembler Cookbook - <http://www.ethernut.de/en/documents/arm-inline-asm.html>
1452 -   ARM Architecture Reference Manual® ARMv7-A and ARMv7-R edition - <https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf> (probably the most useful document of all)
1453 -   dwelch67 repository - <https://github.com/dwelch67/raspberrypi>
1454 -   Booting ARM Linux - <http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html> - very good description of atags
1455 -   BCM2835 ARM Peripherals - <https://github.com/raspberrypi/documentation/blob/master/hardware/raspberrypi/bcm2835/BCM2835-ARM-Peripherals.pdf>
1456     -   BCM2835 datasheet errata -  <https://elinux.org/BCM2835_datasheet_errata>
1457 -   Device Tree Specification - <https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf>
-   online ARM Compiler toolchain Assembler Reference - <http://infocenter.arm.com/help/topic/com.arm.doc.dui0489c/index.html> - useful for its descriptions of arm instructions, often shows high in search results
1459 -   Christina Brook's rpi-open-firmware - <https://github.com/christinaa/rpi-open-firmware>
1460 -   PrimeCell UART (PL011) Technical Reference Manual - <http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G_uart_pl011_r1p5_trm.pdf>
1461 -   GNU Make Manual - <https://www.gnu.org/software/make/manual/>
-   Red Hat Enterprise Linux 4: Using ld, the GNU Linker - <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html>