Demilade Sonuga's blog

All posts

Refactoring I

2022-11-28 · 55 min read

In the previous posts, we did a lot. We modeled the Graphics Output Protocol, switched to a graphics mode, designed a font and printed "Hello World!" with our font in our graphics mode. But, at this point, there are a lot of things that are just off with the code.

I mean, just take a look at the main.rs's efi_main. It's terrible. It's barely readable. It's repetitive. And worst of all, it uses unsafe in so many places that could have been avoided. This is not how we do things as programmers, and this is certainly not how we do things in Rust. If we continue like this, it's likely we will end up hanging ourselves.

We're going to resolve this with some refactoring. If you aren't familiar with this term, let's just say it means "systematic code cleaning".

Let's start by taking a good look at our messiest code, the efi_main.

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let boot_services = unsafe { (*sys_table).boot_services };
let gop_guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};
// ...OTHERS
}

This first line dereferences a raw pointer (unsafe) to get another raw pointer (which will be unsafe to dereference) to the Boot Services table. This second line instantiates a Guid, which never gets changed.

For now, we aren't going to bother ourselves with the unsafe stuff. We'll leave that for later. Taking a good look at this gop_guid, we need to ask ourselves: is there a better way of expressing this in our code? This gop_guid, once instantiated never changes. In fact, this GUID is nothing but a 128-bit value that always uniquely identifies the Graphics Output Protocol. There is no reason for it to change. Now, we ask ourselves: is there some construct in our code that expresses never changing values?

Of course, there is and we all know it. Rather than declaring gop_guid with let, which is used for regular local variables, we declare it with const, to indicate that this is a never changing value.

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let boot_services = unsafe { (*sys_table).boot_services };
/* DELETED:
let gop_guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};
*/

// NEW:
const gop_guid: Guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};
// ...OTHERS
}

Now, our gop_guid is declared as const, rather than let. This doesn't seem like much of an improvement at the moment, but an accumulation of several improvements will eventually show in the code, so let's not worry about that for now.

Let's check if our code still compiles:

cargo build

And it does, although with a number of lints, but no compiler errors.

As a matter of fact, now that our gop_guid is a const, we no longer need to declare it in the efi_main function. We can declare it outside, reducing the clog in the function.

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let boot_services = unsafe { (*sys_table).boot_services };
/* DELETED:
const gop_guid: Guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};
*/

// ...OTHERS
}

// NEW:
// The Graphics Output Protocol (GOP) GUID
const gop_guid: Guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};

Checking if our code still compiles:

cargo check

And it does. Now, our efi_main is a few lines lesser and the code is still working. We're making progress. There's one more thing we still have to do with our gop_guid. It has to be in uppercase letters like GOP_GUID, Why? Because, in Rust, this is the naming convention for constants.

/* DELETED:
const gop_guid: Guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};
*/

const GOP_GUID: Guid = Guid {
first_chunk: 0x9042a9de,
second_chunk: 0x23dc,
third_chunk: 0x4a38,
other_chunks: [0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a],
};

But now, for our code to compile again, we have to make another change.

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let boot_services = unsafe { (*sys_table).boot_services };
let mut gop: *mut core::ffi::c_void = core::ptr::null_mut();

// DELETED: let guid_ptr = &gop_guid as *const Guid;
let guid_ptr = &GOP_GUID as *const Guid;

let registration = core::ptr::null_mut();
// ...OTHERS
}

Checking if the code compiles:

cargo check

Okay. It does.

Another look at our code:

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

let guid_ptr = &GOP_GUID as *const Guid;
let registration = core::ptr::null_mut();
let gop_ptr = &mut gop as *mut _;
let locate_gop_status =
unsafe { ((*boot_services).locate_protocol)(guid_ptr, registration, gop_ptr) };

if locate_gop_status != 0 {
let mut string_u16 = [0u16; 22];
// The string as a string slice
let string = "Failed to locate GOP\n";
// Converting the string slice to UTF-16 characters and placing the characters
// in the array
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
// Getting the pointer to the Simple Text Output Protocol from the System Table
let simple_text_output = unsafe { (*sys_table).simple_text_output };
// Getting the output_string function from the Simple Text Output Protocol and
// calling it with the required parameters to print "Failed to locate GOP\n"
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}

// ...OTHERS
}

Okay, more messy code.

The locate_gop_status assignment involves dereferencing a raw pointer (unsafe) to the Boot Services table. We'll deal with this later.

This if condition: if locate_gop_status != 0 compares locate_gop_status to 0. But why? If you can't remember, 0, in the UEFI spec, means success. So, what this condition actually says is: "if locate_gop_status is not success". In fact, this comparison with status codes to check if their successful happens quite a few times in our code:

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

if locate_gop_status != 0 {
// ...DO THIS
}

// ...OTHERS

if query_status != 0 {
// ...DO THIS
}

// ...OTHERS

if set_mode_status != 0 {
// ...DO THIS
}

// ...OTHERS
}

You know, it's very easy to forget the meaning of that 0 we're comparing the status codes to. Our code is not immediately understandable from just looking at it. Rather than putting just a 0 that could mean anything, we'll be better off naming the 0 with a name that will instantly remind us of the 0's meaning and use that named 0 whenever we want to compare status codes.

// The status code that UEFI defines as a success
const STATUS_SUCCESS: Status = 0;

Then:

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

// DELETED: if locate_gop_status != 0 {
if locate_gop_status != STATUS_SUCCESS {
// ...DO THIS
}

// ...OTHERS

// DELETED: if query_status != 0 {
if query_status != STATUS_SUCCESS {
// ...DO THIS
}

// ...OTHERS

// DELETED: if set_mode_status != 0 {
if set_mode_status != STATUS_SUCCESS {
// ...DO THIS
}

// ...OTHERS
}

That's so much better. We're still comparing those status codes with 0. As a matter of fact, during compilation, Rust will replace every place we used STATUS_SUCCESS with 0, so it will be like STATUS_SUCCESS never existed. But this is beneficial to us because, now, the code doesn't just tell us what, but also, why. Before, from looking at the code, we knew that the code under those if conditions will execute if the status codes weren't 0, but with this little change, we can tell from the code that the if condition codes will only execute if the status codes weren't successful ones.

Continuing with our code analysis:

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

if locate_gop_status != STATUS_SUCCESS {
let mut string_u16 = [0u16; 22];
let string = "Failed to locate GOP\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}

// ...OTHERS
}

This if's body is clustered unnecessarily and it's not hard to see why. All the code in this body is saying:

  1. Print "Failed to locate GOP\n"
  2. Halt (Loop forever)

And that's it. That's what all the code in this if's body does, but it's so unclear from looking at it, with the array declaration and function calling and unsafe pointer dereferencing. Apart from that, this pattern:

let mut string_u16 = [0u16; STRING_LENGTH+1];
let string = "A string";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}

gets repeated so many times.

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

if locate_gop_status != STATUS_SUCCESS {
let mut string_u16 = [0u16; 22];
let string = "Failed to locate GOP\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}

// ...OTHERS

if query_status != STATUS_SUCCESS {
let mut string_u16 = [0u16; 19];
let string = "query_mode failed\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}

// ...OTHERS

if mode_number == max_mode - 1 {
let mut string_u16 = [0u16; 32];
let string = "Couldn't find the desired mode\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}

// ...OTHERS

if set_mode_status != STATUS_SUCCESS {
let mut string_u16 = [0u16; 32];
let string = "Failed to set the desired mode\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
loop {}
}


// ...OTHERS
}

That is a total of four almost complete duplicates of code and they all have the same meaning:

  1. Print string
  2. Halt

To clean this up, we need to ask ourselves this question: what construct does Rust have to offer that can help us to express these simple ideas more concisely, without so much convolution? Make sure you answer this question yourself.

Functions. That's all we need. Functions. Taking a good look at all these repetitions, you'll see that the only thing that differs throughout is the string that's being printed. So, we need a function that takes a string (string slice to be more specific) and prints out that string. Such a function will turn

let mut string_u16 = [0u16; STRING_LENGTH+1];
let string = "A string";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}

in several places into

print_str("A string")

In the last post, we created such a function, but that function can't be used here because it was written with the graphics mode in mind and the code we're looking at here is pre-graphics mode code.

Now, we need to ask ourselves: apart from the string itself, what else does this function need to carry out the print operation? After answering this question, we need to make a pre_graphics_print_str function that will take all that's needed to print and print out the string.

Stop for a moment, answer the relevant questions and come up with the function yourself. (Remember that the string must be null-terminated (end with 0) for the output_string function to print it).

Writing Our pre_graphics_print_str Function

To answer the question of what this function needs to print:

  1. The string, which is what we're printing.
  2. The Simple Text Output Protocol's output_string function, which is what we'll use to do the actual printing on the screen.

So the function will look something like this:

fn pre_graphics_print_str(simple_text_output, s)
    print the string with simple_text_output.output_string

Now, the next question: how exactly will we print this with the output_string function?

In our previous approach, we printed a string by first allocating a buffer of the string's length + 1. That +1 was necessary because the `output_string`` function expects null-terminated strings, so one more position was needed to add that last 0. But the problem now is the length of arrays needs to be determined at compile time. We can't just do what we did before, that is declare an array of length string-length + 1 and fill it with the UTF-16 encoded version of the string. We don't know the string length at compile time anymore.

Stop and think about how you'll resolve this yourself.

There are many ways we could resolve this, but this is the approach we'll be using here:

Firstly, you have to remember that a string is just a sequence of characters. So, to print "Hello World!", you first print 'H', then print 'e', then print 'l', then print another 'l', and so on until '!'.

So, from this information, it's clear that we don't need to know how to print out the whole string at a time but rather, we just need to know how to print a single character, then use that procedure for printing characters to print the string's characters one by one in order.

Now, we need a print_char procedure, or more specifically, a pre_graphics_print_char procedure.

Writing a pre_graphics_print_char procedure will be much easier than writing a pre_graphics_print_str because, in this case, we already know the length of all the inputs: 1. The length of a string with a single character is one. Since, the output_string function takes a null-terminated string as input, then printing a single character is simply a matter of creating a null-terminated string with that single character as its sole character.

fn pre_graphics_print_char(simple_text_output, c)
    utf16_encoded_buffer = [c encoded as UTF-16, 0]
    simple_text_output.output_string(simple_text_output, utf16_encoded_buffer)

Translating to Rust:

// Prints a single character before the switch to graphics mode
fn pre_graphics_print_char(simple_text_output: *mut SimpleTextOutput, c: char) {
// The UTF-16 encoded string which contains the single character to be printed
let mut utf16_encoded_buffer = [c as u16, 0];
unsafe { ((*simple_text_output).output_string)(simple_text_output, utf16_encoded_buffer.as_mut_ptr()) };
}

And our pre_graphics_print_str procedure will now look like this:

fn pre_graphics_print_str(simple_text_output, s)
    for each character c from left to right in s
        pre_graphics_print_char(simple_text_output, c)

Translating to Rust:

// Prints a string slice before the switch to a graphics mode
fn pre_graphics_print_str(simple_text_output: *mut SimpleTextOutput, s: &str) {
for c in s.chars() {
pre_graphics_print_char(simple_text_output, c);
}
}

And that's it for our pre_graphics_print_str function.

Before we refactor our code with our new pre_graphics_print_str function, let's check if it's working as expected. Temporarily comment out the current efi_main and create a new one.

/* COMMENTED OUT:
#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS
}
*/


// NEW:
#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "It's working!\n");
0
}

Upon building and running, your screen should look like this:

pre_graphics_print_str is Working

Now, delete that little experiment and uncomment our real efi_main function.

// UNCOMMENTED OUT:
#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS
}

/* DELETED:
#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "It's working!\n");
0
}
*/

Refactoring with the pre_graphics_print_str Function

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
// ...OTHERS

if locate_gop_status != STATUS_SUCCESS {
/* DELETED:
let mut string_u16 = [0u16; 22];
let string = "Failed to locate GOP\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
*/

let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}

// ...OTHERS

if query_status != STATUS_SUCCESS {
/* DELETED:
let mut string_u16 = [0u16; 19];
let string = "query_mode failed\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
*/

let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}

// ...OTHERS

if mode_number == max_mode - 1 {
/* DELETED:
let mut string_u16 = [0u16; 32];
let string = "Couldn't find the desired mode\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
*/

let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}

// ...OTHERS

if set_mode_status != STATUS_SUCCESS {
/* DELETED:
let mut string_u16 = [0u16; 32];
let string = "Failed to set the desired mode\n";
string
.encode_utf16()
.enumerate()
.for_each(|(i, letter)| string_u16[i] = letter);
let simple_text_output = unsafe { (*sys_table).simple_text_output };
unsafe {
((*simple_text_output).output_string)(simple_text_output, string_u16.as_mut_ptr());
}
*/

let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to set the desired mode\n");
loop {}
}

// ...OTHERS
}

Building and running, the code still compiles and prints "Hello World!".

Our code is so much better now.

Take Away

  • Refactoring is systematically making your code cleaner (readable, understandable, easier to see meaning).

Code till now:

Directory view:

blasterball/
| .cargo/
| | config.toml
| src/
| | font.rs
| | main.rs
| .gitignore
| Cargo.lock
| Cargo.toml

main.rs contents

// ... Before efi_main

#[no_mangle]
extern "efiapi" fn efi_main(
handle: *const core::ffi::c_void,
sys_table: *mut SystemTable,
) -> usize {
let boot_services = unsafe { (*sys_table).boot_services };
let mut gop: *mut core::ffi::c_void = core::ptr::null_mut();
let guid_ptr = &GOP_GUID as *const Guid;
let registration = core::ptr::null_mut();
let gop_ptr = &mut gop as *mut _;
let locate_gop_status =
unsafe { ((*boot_services).locate_protocol)(guid_ptr, registration, gop_ptr) };

if locate_gop_status != STATUS_SUCCESS {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}

let gop = gop as *mut GraphicsOutput;
let mode = unsafe { (*gop).mode };
let max_mode = unsafe { (*mode).max_mode };
let mut desired_mode = 0;
for mode_number in 0..max_mode {
let size_of_info = core::mem::size_of::<GraphicsModeInfo>();
let mut mode: *const GraphicsModeInfo = core::ptr::null_mut();
let query_mode = unsafe { (*gop).query_mode };
let query_status = (query_mode)(
gop,
mode_number,
&size_of_info as *const _,
&mut mode as *mut _,
);
if query_status != STATUS_SUCCESS {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}

let horizontal_resolution = unsafe { (*mode).horizontal_resolution };
let vertical_resolution = unsafe { (*mode).vertical_resolution };
let pixel_format = unsafe { (*mode).pixel_format };
if horizontal_resolution == 640
&& vertical_resolution == 480
&& pixel_format == PixelFormat::BlueGreenRedReserved
{
desired_mode = mode_number;
break;
}
if mode_number == max_mode - 1 {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to locate GOP\n");
loop {}
}
}

let set_mode_status = unsafe { ((*gop).set_mode)(gop, desired_mode) };

if set_mode_status != STATUS_SUCCESS {
let simple_text_output = unsafe { (*sys_table).simple_text_output };
pre_graphics_print_str(simple_text_output, "Failed to set the desired mode\n");
loop {}
}

let framebuffer_base = unsafe { (*mode).framebuffer_base };

let screen = framebuffer_base as *mut Screen;

print_str(screen, "Hello World!");

0
}

// ... Others

In the Next Post

We'll be taking a closer look at unsafety in Rust

References