Demilade Sonuga's blog

In the Beginning is the BIOS

2022-10-24 · 21 min read

What is a computer, exactly, and how does it operate? What sequence of steps was taken before these words you're reading got on the screen? You pressed the power button, then what? Your OS logo showed up, and in a few seconds, you were looking at your desktop wallpaper. But how was it even able to get that wallpaper to display so clearly on the glass attached to this chunk of metal before you?

I won't answer all of those questions in this post. I will present only what you need at the moment (and maybe a bit more).

So it begins when you press that power button. That physical action of yours completes a circuit inside the computer, which lets current flow and effectively powers the computer on. A component in this circuit has been wired in such a way that, when this event occurs, it sends electrical signals to some other part of the circuit to "read" its "contents", which are also just a bunch of electrical signals in some predetermined format that drives the operations of the component. We call this component the processor. It is the main part of the computer system. The manufacturers of this circuit have defined some patterns that groups of electrical signals can conform to. When the processor receives signals that conform to one of these patterns, it performs some "operations" that it has been wired to do. These operations are more along the lines of "send a signal to this line", "send a signal to that other line", and "send another signal to this other line", but all this happens millions and millions of times per second, so quickly that we don't even notice it happening.

The question that comes next: how do we get from "signal here and signal there" to "read contents from memory" and "execute this instruction"? Well, that's not so hard. This circuit does nothing but bounce signals from line to line, but the meaning of those signals is completely determined by us. The component in the circuit which the processor "reads contents" from is just another circuit that has been wired to hold electrical signals for some time. This circuit is the memory. Those signals which conform to the manufacturer-defined patterns that the processor "understands" are the instructions. A bunch of these instructions is what makes up a computer program. So the instructions have meaning because the processor manufacturers have given them meaning. Similarly, memory can hold groups of signals that conform to some other format defined, not by the processor manufacturers, but by us. They are just a bunch of signals that have no meaning except that given to them by us. These signals are often called data and they are what the instructions operate on.

And how exactly do the instructions operate on them? Well, that depends on the processor manufacturer's specification of the instruction patterns. The processor could receive an instruction from memory and, according to its wiring, interpret some of the signals in that instruction as a memory address. The memory address itself is also just a bunch of electrical signals that, when sent to the memory circuit, are interpreted as a unique location where some other signals are stored. The memory then sends those stored signals to the processor, which interprets them as instructions or data, depending on what it has been wired to do upon an encounter with the current instruction.

So I think you should be getting a glimpse of the picture by now. Power flows, signals bounce around, and the meaning of those signals depends completely on us (and the manufacturers) and on where exactly those signals find themselves in the bigger scheme of things.

Okay, that's just a brief merry-go-round on the intricacies of the physical workings of the computer. Now, the point where we begin to elevate into the realms of software is when we begin to symbolize the presence of electrical signals, or their absence, as binary numbers or bits. If you aren't familiar with the binary number system, there's no need to fear. All you need to know is that the binary number system is just another way of writing numbers using only 0s and 1s. So if we define a 1 to mean "presence of an electrical signal" and a 0 to mean "absence of an electrical signal", then any string of 0s and 1s, like 11011, can mean signal, signal, no signal, signal, signal. This is how the processor instructions are defined. Remember, instructions are just groups of signals in a manufacturer-specified pattern. Well, those patterns of electric signals are defined using binary numbers. So if the processor has been wired to recognize signal, signal, no signal, signal, signal as an instruction, then 11011 will be defined as that instruction.
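To make that mapping concrete, here is a small Rust sketch. The function names `describe` and `value` are made up for illustration; the code only mirrors the "1 means signal, 0 means no signal" convention from the paragraph above and says nothing about how real hardware is built.

```rust
/// Spells out a string of bits as a sequence of signals.
/// Convention from the text: '1' = signal present, '0' = signal absent.
pub fn describe(bits: &str) -> String {
    bits.chars()
        .map(|bit| match bit {
            '1' => "signal",
            '0' => "no signal",
            other => panic!("not a binary digit: {}", other),
        })
        .collect::<Vec<_>>()
        .join(", ")
}

/// Reads the same string as an ordinary number written in base 2.
pub fn value(bits: &str) -> u32 {
    u32::from_str_radix(bits, 2).expect("expected only 0s and 1s")
}
```

Calling `describe("11011")` gives back "signal, signal, no signal, signal, signal", and `value("11011")` gives 27, which is the same pattern read as a number.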

All the instructions of a processor pinned down in this binary format are the processor's machine language. At this point, we can consider all these signals in the computer, or their absence, as bits. So memory stores bits. Processors get bits from memory, interpret some as instructions and execute them, and interpret some as data and operate on them. It's bits all the way down.
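If it helps, the "fetch bits, treat some as instructions and some as data" idea can be sketched as a toy machine in Rust. Everything here is an assumption made for illustration: the four opcodes, the single accumulator, and the two-byte instruction layout are invented and do not belong to any real processor.

```rust
// Opcodes of a made-up machine (not a real instruction set).
// Every instruction except HALT is two bytes: the opcode, then a memory address.
const HALT: u8 = 0x00; // stop and report the accumulator
const LOAD: u8 = 0x01; // acc = memory[addr]
const ADD: u8 = 0x02; // acc = acc + memory[addr]
const STORE: u8 = 0x03; // memory[addr] = acc

/// Runs the program that starts at address 0 and returns the accumulator.
/// The same `memory` holds both the instructions and the data they work on.
pub fn run(memory: &mut [u8]) -> u8 {
    let mut ip: usize = 0; // instruction address register
    let mut acc: u8 = 0; // the machine's only working register
    loop {
        // Fetch: read the bits at the current instruction address.
        let opcode = memory[ip];
        if opcode == HALT {
            return acc;
        }
        // Decode: the byte after the opcode is interpreted as an address.
        let addr = memory[ip + 1] as usize;
        // Execute: what happens depends only on the opcode's pattern.
        match opcode {
            LOAD => acc = memory[addr],
            ADD => acc = acc.wrapping_add(memory[addr]),
            STORE => memory[addr] = acc,
            _ => panic!("unknown opcode {:#04x} at address {}", opcode, ip),
        }
        ip += 2;
    }
}

/// A sample program: load the byte at address 8, add the byte at
/// address 9, store the sum at address 10, then halt.
pub fn sample_program() -> Vec<u8> {
    vec![LOAD, 8, ADD, 9, STORE, 10, HALT, 0, 2, 3, 0]
}
```

Running `run` on `sample_program()` leaves 5 in the accumulator and in the byte at address 10. Notice that the 2 and 3 at addresses 8 and 9 are only "data" because no instruction address ever points at them; to the memory they are just more bits.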

(The description given above is just a simplified overview of what exactly goes on down there in the hardware. A real description would take us into the realms of physics and electrical engineering. And, as with many other things in the computer world, there are exceptions and complications to this simplified description, like microcode, for example.)

All this is wonderful. But there is a hole in the plot. If processors get instructions from memory after the computer powers on, how did those instructions get there in the first place? Memory holds electrical signals, but how could signals be there when power has only just begun to flow through the circuits? This is, again, elementary. At startup, the computer fetches instructions, not from normal memory, but from a special type of read-only memory put in the system by the computer manufacturers. This read-only memory has been pre-loaded with instructions specifically meant to be executed at this early stage in the computer's startup. These instructions make up the program that we call the firmware.

The firmware that we will be dealing with on our computer systems is the BIOS (Basic Input/Output System). The BIOS directs the processor to verify the working state of the computer system, then transfers control to another program, the OS. How exactly this control transfer occurs depends on the firmware. In the PC's bad old days, the BIOS transferred control by searching for a bootable drive with some special signature. When it located one, it loaded the first 512 bytes from that drive to some location in memory and loaded the processor's instruction address register (super quick processor memory) with the address of the loaded code. This code was typically written in assembly, and its job was to load an OS written in a higher-level language. This booting process is now called legacy booting. It's legacy because there's now a much better way.
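The legacy procedure just described can be simulated in a few lines of Rust. Two details below come from PC convention rather than from this post: the "special signature" is the byte pair 0x55, 0xAA at offsets 510 and 511 of the first sector, and the "some location in memory" is address 0x7C00. The function names and the use of plain byte slices as stand-ins for drives and memory are my own assumptions; a real BIOS obviously does not run Rust like this.

```rust
pub const SECTOR_SIZE: usize = 512;
/// Conventional address at which a PC BIOS places the boot sector.
pub const LOAD_ADDRESS: usize = 0x7C00;

/// A sector counts as bootable if its last two bytes are 0x55, 0xAA.
pub fn is_bootable(sector: &[u8; SECTOR_SIZE]) -> bool {
    sector[510] == 0x55 && sector[511] == 0xAA
}

/// Simulates the legacy hand-off: find the first bootable drive, copy its
/// first 512 bytes into memory, and return the address the processor's
/// instruction address register would be loaded with.
/// Each entry of `drives` stands for the first sector of one drive.
/// `memory` must be at least LOAD_ADDRESS + SECTOR_SIZE bytes long.
pub fn legacy_boot(drives: &[[u8; SECTOR_SIZE]], memory: &mut [u8]) -> Option<usize> {
    let sector = drives.iter().find(|s| is_bootable(s))?;
    memory[LOAD_ADDRESS..LOAD_ADDRESS + SECTOR_SIZE].copy_from_slice(sector);
    Some(LOAD_ADDRESS)
}
```

If no drive carries the signature, `legacy_boot` returns `None`, which corresponds to the familiar "no bootable device" situation.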

UEFI (Unified Extensible Firmware Interface) is a specification that defines an interface between the OS and the firmware. Firmware that conforms to this specification, rather than loading only 512 bytes of a bootloader, loads an EFI application of any size in a specific predefined format. This application could be the OS or a bootloader that will do the job of loading one. It could even be a standalone application, which is what we're aiming to create here. After loading the application, the firmware, which "knows" how to interpret files in this predefined format, locates the address of the entry point in the file and loads the processor's instruction address register with it, effectively transferring control to the application.
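For the curious, the predefined format UEFI uses is PE32+, the 64-bit flavor of the executable format used on Windows. The sketch below shows roughly how the entry point can be located in such a file. The function names are mine, and the code ignores nearly everything else a real loader must do (sections, relocations, validation), so treat it as an illustration of "finding the entry point field" and nothing more.

```rust
/// Returns `len` bytes starting at `offset`, or None if out of range.
fn bytes_at(image: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    image.get(offset..offset.checked_add(len)?)
}

/// Reads a little-endian 32-bit value at `offset`.
fn read_u32(image: &[u8], offset: usize) -> Option<u32> {
    let b = bytes_at(image, offset, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Finds the AddressOfEntryPoint field of a PE32+ image.
/// The value is an offset relative to wherever the image gets loaded,
/// not an absolute memory address.
pub fn entry_point_rva(image: &[u8]) -> Option<u32> {
    // The file starts with the old DOS header, marked "MZ".
    if bytes_at(image, 0, 2)? != &b"MZ"[..] {
        return None;
    }
    // Offset 0x3C of the DOS header says where the PE signature lives.
    let pe = read_u32(image, 0x3C)? as usize;
    if bytes_at(image, pe, 4)? != &b"PE\0\0"[..] {
        return None;
    }
    // Skip the 4-byte signature and the 20-byte COFF file header.
    let optional_header = pe.checked_add(24)?;
    // Magic 0x020B (stored little-endian) marks a PE32+ optional header.
    if bytes_at(image, optional_header, 2)? != &[0x0B, 0x02][..] {
        return None;
    }
    // AddressOfEntryPoint sits 16 bytes into the optional header.
    read_u32(image, optional_header.checked_add(16)?)
}
```

Once the image is in memory, adding this value to the address the image was loaded at gives the address that ends up in the instruction address register.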

Our Rust compiler, which does the actual code generation for us, is also familiar with this predefined file format, and an application in this format is what it produces when we pass the --target=x86_64-unknown-uefi option for compilation. The Rust compiler then determines the entry point of the application by looking for a function with the name efi_main. In the compiler's final output, the address of this function is placed in the position that the application's format defines to hold the entry point.
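For reference, a bare-bones efi_main could look like the sketch below. The parameter names are my own, and the raw `c_void` pointers are stand-ins for the image handle and system table types that the UEFI specification defines. A real freestanding build would typically also need the `#![no_std]` and `#![no_main]` crate attributes and a `#[panic_handler]` function; they are left out here to keep the focus on the entry point itself.

```rust
use core::ffi::c_void;

// The symbol must keep the exact name `efi_main` so it can be found as the
// entry point. On toolchains older than Rust 1.82, write `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "efiapi" fn efi_main(
    _image_handle: *mut c_void, // identifies this loaded application
    _system_table: *mut c_void, // pointer to the firmware's table of services
) -> usize {
    // Returning 0 reports success (EFI_SUCCESS) to the firmware.
    0
}
```

The `extern "efiapi"` part tells the compiler to use the calling convention the firmware expects when it jumps into the function.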

Take Away

  • A computer, upon start-up, loads firmware from a read-only memory and executes it.
  • The firmware checks if the computer's components are alright, loads an OS or application, then transfers control of the processor to the OS or application.
  • The entry point of an EFI application in Rust is a function named efi_main.

In the Next Post

We will be running the project in an emulator.

References