The Global Descriptor Table II

2023-01-26

In the previous post, we took a detailed look at the GDT. At the end of that post, you were asked to come up with a Task State Segment descriptor for a segment that starts at address 0x123fc and is 13KiB in size. Before we get on to modeling the GDT, let's quickly look at one way of doing this:

## Creating a TSS descriptor

Firstly, the base is 0x123fc, since the base is the starting address. The TSS descriptor provides 64 bits of space for the base, so it expects a 64-bit base. This base address is 4 * 5 == 20 bits (1 hex digit is 4 bits), so the upper 64 - 20 == 44 bits will be 0. The base will then be 0x00000000000123fc. Bits 16..=39 will be set to 0x0123fc, the lower 24 bits of the base. Bits 56..=95 will be set to all 0s, the upper 40 bits of the base.

The segment size is 13KiB. The value of the limit depends on the granularity. Remember that the granularity can be either 1 byte or 4KiB. If the granularity is 1 byte, then the limit will be the segment size in bytes - 1. 1KiB == 1024 bytes, so the limit would be 13 * 1024 - 1 == 13311. If the granularity is 4KiB, then the segment size has to be a whole number of 4KiB blocks, and there is no whole number that gives 13 when multiplied by 4. In other words, a granularity of 4KiB can only be used when the segment size is a multiple of 4KiB. Because of this, we have to stick with the 1-byte granularity.
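The arithmetic above can be sketched as a small function. This is only an illustration: `limit_for` and `KIB` are hypothetical names that are not part of the project's code, and the sketch assumes the limit field stores the number of granularity units minus 1.

```rust
/// Bytes in one KiB.
const KIB: u64 = 1024;

/// Hypothetical helper: computes the 20-bit limit for a segment of
/// `size_bytes` bytes, or `None` if the size can't be expressed with
/// the chosen granularity.
fn limit_for(size_bytes: u64, page_granular: bool) -> Option<u32> {
    // Size of one unit: 4KiB when page-granular, otherwise 1 byte
    let unit = if page_granular { 4 * KIB } else { 1 };
    // The size must be a non-zero whole number of units
    if size_bytes == 0 || size_bytes % unit != 0 {
        return None;
    }
    let limit = size_bytes / unit - 1;
    // The descriptor only has room for a 20-bit limit
    if limit > 0xf_ffff {
        None
    } else {
        Some(limit as u32)
    }
}
```

Under these assumptions, `limit_for(13 * KIB, false)` gives `Some(13311)`, while `limit_for(13 * KIB, true)` gives `None` because 13KiB is not a multiple of 4KiB.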

From the information above, our limit is 13311, which is 0x33ff in hexadecimal (if you don't know how to convert from decimal to hexadecimal, just Google it), and the granularity is 1 byte. The descriptor expects a 20-bit limit, so 0x33ff is padded with 0s to the left to fill up the remaining space: 0x033ff. Bits 0..=15 are set to 0x33ff, the lower 16 bits of the limit. Bits 48..=51 are set to 0x0, the upper 4 bits of the limit.

As for the access byte, the first 4 bits, bits 0..=3, determine the type of the descriptor and for the TSS, this should be 0b1001, the binary equivalent of 9. This is a system segment, so bit 4 should be 0. Bits 5..=6, the privilege level, should be 0b00, the highest privilege level since the game is its own OS. Finally, bit 7 should be 1, to indicate the segment is valid. The access byte is, therefore, 0b1000_1001 == 0x89.
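As a rough sketch, the access byte can be assembled from the pieces just described. The function name `access_byte` and its parameter names are made up for this illustration.

```rust
/// Hypothetical helper: assembles an access byte from its parts.
fn access_byte(descriptor_type: u8, non_system: bool, privilege_level: u8, present: bool) -> u8 {
    (descriptor_type & 0b1111)             // bits 0..=3: descriptor type
        | ((non_system as u8) << 4)        // bit 4: 0 for a system segment
        | ((privilege_level & 0b11) << 5)  // bits 5..=6: privilege level
        | ((present as u8) << 7)           // bit 7: segment is valid
}
```

Calling `access_byte(0b1001, false, 0, true)` reproduces the 0x89 worked out above.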

Bit 52 is reserved, so it's left as 0. The TSS is neither a code nor data segment, so leaving a 0 in bit 53, which tells if a segment is a code segment, should do no harm. Bit 54 is left 0 because it's a 64-bit segment. Bit 55 is set to 0 to indicate that the granularity is 1 byte. Bits 52..=55 == 0b0000 == 0x0.

The reserved bits, bit 52 and bits 96..=127, are left as 0s because they don't mean anything.

From the breakdown just given, the descriptor will look like this:

| Bits | Descriptor Digits |
| --- | --- |
| 0..=15 | 0x33ff |
| 16..=39 | 0x0123fc |
| 40..=47 | 0x89 |
| 48..=51 | 0x0 |
| 52..=55 | 0x0 |
| 56..=95 | All 0s |
| 96..=127 | All 0s |

The descriptor will be 0x890123fc33ff with 0s padded to the left to make it 128-bit.
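To double-check the result, here is a minimal sketch that packs the fields into a 128-bit value using the bit positions from the breakdown. `tss_descriptor` is a hypothetical helper, and it hardcodes the access byte (0x89) and flags (0x0) derived above instead of taking them as parameters.

```rust
/// Hypothetical helper: packs a base and a byte-granular limit into a
/// 128-bit TSS descriptor, following the bit layout worked out above.
fn tss_descriptor(base: u64, limit: u32) -> u128 {
    // Access byte and flags as derived in the walkthrough
    const ACCESS_BYTE: u128 = 0x89;
    const FLAGS: u128 = 0x0;

    let base = base as u128;
    // Only the lower 20 bits of the limit fit in the descriptor
    let limit = (limit & 0xf_ffff) as u128;

    (limit & 0xffff)                     // bits 0..=15: lower 16 bits of the limit
        | ((base & 0xff_ffff) << 16)     // bits 16..=39: lower 24 bits of the base
        | (ACCESS_BYTE << 40)            // bits 40..=47: access byte
        | (((limit >> 16) & 0xf) << 48)  // bits 48..=51: upper 4 bits of the limit
        | (FLAGS << 52)                  // bits 52..=55: flags
        | ((base >> 24) << 56)           // bits 56..=95: upper 40 bits of the base
    // Bits 96..=127 are reserved and stay 0
}
```

With the values from the exercise, `tss_descriptor(0x123fc, 13 * 1024 - 1)` evaluates to `0x890123fc33ff`.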

## Some Things We Overlooked

Now that we understand the GDT, the next step is to model it. Before we do that, we need to get a grasp on a few small requirements, some dictated by x86, that we overlooked.

Firstly, the GDT must be "aligned on an 8-byte boundary", meaning that the GDT's address must be a multiple of 8.

Secondly, the first entry in the GDT must be a 64-bit 0 value. This is termed the null descriptor.

Lastly, for our purposes, we need a code segment descriptor, a data segment descriptor and a Task State Segment descriptor in the GDT.

That's it.

Before moving on, come up with a model for the GDT yourself.

## An Attempt to Model the GDT With An Array of Enums

The GDT can be viewed as an array of descriptors and there are two types of descriptors, so it seems like we could create an enum:

```rust
enum Descriptor {
    System(u128),
    NonSystem(u64)
}
```

to indicate the alternate descriptors and then make an array of these as the GDT:

```rust
#[repr(C)]
struct GDT {
    descriptors: [Descriptor; MAX_NO_OF_ENTRIES]
}
```

But this won't work. The reason is the data layout of an enum. In Rust, when you have an enum like this:

```rust
enum NoData {
    First,
    Second,
    Third
}
```

Its values are represented purely by integers, the size of which is determined by the compiler. But when you have an enum like this:

```rust
enum Data {
    First(u64),
    Second(u64, u64)
}
```

How this will be laid out in memory is unspecified and is determined by what the compiler thinks is the right way to arrange things. For one thing, an enum has to be at least as large as its largest variant, so when we use a `Data::First(n)`, there is bound to be padding to fill up the remaining space. Another thing is that the enum needs another integer value, the discriminant, which tells what variant of the enum is being used. For example, `Data`'s `First` will probably have a discriminant of 0 and `Second`, a discriminant of 1. But the size of the integer used is unspecified. If Rust thinks using a u8 is the right way to go, it'll do that. If it thinks using a u64 is the right way to go, it'll do that instead.

Although Rust provides ways to control the layout of an enum, such as `#[repr(u8)]` or `#[repr(C)]`, they still don't fix the problem: the discriminant and the padding still take up space inside every value.

In our case, the structure we're modeling here has very precise requirements. Each value has to be 128 bits or 64 bits, and there shouldn't be any padding or space between them.
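One way to see the problem is to ask the compiler how big the enum actually is. The sketch below is only an illustration: `TaggedDescriptor` and `enum_sizes` are hypothetical names, and the exact sizes depend on the compiler and target, so none are stated here.

```rust
use std::mem::size_of;

// The enum from the attempt above, with the default (unspecified) layout
enum Descriptor {
    System(u128),
    NonSystem(u64),
}

// The same enum, but with the discriminant forced to be a u8
#[repr(u8)]
enum TaggedDescriptor {
    System(u128),
    NonSystem(u64),
}

/// Returns (size of `Descriptor`, size of `TaggedDescriptor`) in bytes.
fn enum_sizes() -> (usize, usize) {
    (size_of::<Descriptor>(), size_of::<TaggedDescriptor>())
}
```

Both sizes come out larger than 16 bytes, the size of the biggest descriptor, because the discriminant has to live somewhere. Every element of an array of these enums would therefore carry extra bytes that the CPU would misread as descriptor data.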

## Modeling the GDT With An Array of u64s

Given these problems with modeling the GDT as an array of enums, another thing we could do is just model it as an array of u64s. (I say "another thing" and not "the thing" because this is programming and there is probably some other good way(s) of doing this, but this is just the way we're going here).

This way, when we want to put a non-system segment in the table, it's just a matter of putting a single u64 value. No padding or spacing problems. When we want to put a system segment in the table, we put in the lower u64 first, then the upper u64 in the next position in the array.
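Splitting a 128-bit system descriptor into its two halves could look something like this. `split_descriptor` is a hypothetical helper for illustration, and it returns the halves in (lower, upper) order, the order in which they go into the array.

```rust
/// Hypothetical helper: splits a 128-bit system descriptor into
/// (lower 64 bits, upper 64 bits).
fn split_descriptor(descriptor: u128) -> (u64, u64) {
    let lower = descriptor as u64;         // truncates to bits 0..=63
    let upper = (descriptor >> 64) as u64; // bits 64..=127
    (lower, upper)
}
```

For the TSS descriptor from the exercise, `split_descriptor(0x890123fc33ff)` gives `(0x890123fc33ff, 0)`, since everything above bit 47 is 0.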

In a new file, `gdt.rs`:

```rust
// The Global Descriptor Table
#[repr(C)]
struct GDT {
    // The segment descriptors
    descriptors: [u64; MAX_NO_OF_ENTRIES]
}
```

If you can't remember, the reason we put the `#[repr(C)]` there is because the field order of the struct matters. Those descriptors must be first for this to be a GDT.

At the end of this modeling process, we are eventually going to create 3 descriptors to be inserted into this table: 2 non-system and 1 system. This means that in an array of 64-bit values, to keep these descriptors, we only need 5 entries, 2 for the non-system, 2 for the system (because it's 128 bits, so it takes up 2 entries) and 1 for the null descriptor.

We can use a value of let's say 8 as our `MAX_NO_OF_ENTRIES`, keeping the extra space there just in case of something else.

In `gdt.rs`:

```rust
// An artificial limit placed on the number of entries that can be placed
// in the GDT's descriptor array, for convenience
const MAX_NO_OF_ENTRIES: usize = 8;
```

When a new GDT is created, entry 0 must hold the null descriptor, a 64-bit 0. The next index in the array to place a descriptor will be index 1.

If a non-system descriptor is placed in the `descriptors` array at index i, then the next descriptor to be put into the table has to be at index i + 1, since a non-system descriptor is 64 bits and this is an array of 64-bit values.

But if a system descriptor was placed at index i, then the next index to place a descriptor has to be index i + 2, because system descriptors are 128 bits in size and take up 2 entries in the array.

Because of this, we need to keep track of the next index position that a descriptor can be inserted into. When a system descriptor is added, this next index position will be increased by 2. When a non-system descriptor is added, this next index position will be increased by 1.

Adding a field to keep track of the next position to insert a descriptor:

```rust
#[repr(C)]
struct GDT {
    descriptors: [u64; MAX_NO_OF_ENTRIES],
    // NEW:
    // The next index available to place a descriptor in the GDT
    next_index: usize
}
```

The GDT's first value must always be 0. This means that we must start inserting into the array at index 1. To indicate this, we create a `new` associated function for `GDT` which will serve as its constructor for creating new GDTs and handling proper initialization.

```rust
impl GDT {
    // Creates a new GDT
    fn new() -> Self {
        Self {
            descriptors: [0; MAX_NO_OF_ENTRIES],
            // Start inserting at index 1 to keep the first entry
            // as a null descriptor
            next_index: 1
        }
    }
}
```

Whenever a GDT is created with this `new` function, entries will get added from index 1 till the end. This way, the first entry will remain 0 to fulfill the null descriptor requirement.

To add a descriptor, we create another function:

```rust
impl GDT {
    // ... Others

    // Adds a segment descriptor to the descriptors array
    fn add_descriptor(&mut self, descriptor: system or non-system descriptor) {

    }
}
```

Since adding a descriptor modifies the `GDT`'s `descriptors` array, the first argument to this function has to be `&mut self` indicating that this function is going to be mutating (modifying) the `GDT` instance.

The second argument, `descriptor`, is either a system or non-system descriptor.

To indicate the alternate descriptor types:

```rust
// A segment descriptor
enum Descriptor {
    NonSystem(u64),
    // (upper 64 bits, lower 64 bits) of segment descriptor
    System(u64, u64)
}
```

This is going to work because rather than storing the enum itself in the array, it's the associated u64s that we'll store.

```rust
impl GDT {
    // ... Others

    // DELETED: fn add_descriptor(&mut self, descriptor: system or non-system descriptor) {
    fn add_descriptor(&mut self, descriptor: Descriptor) {

    }
}
```

The sequence to follow to add a descriptor is also straightforward. Get it down yourself before reading on.

1. If `descriptor` is a non-system descriptor
    1. Check if the `descriptors` array is full
    2. If the `descriptors` array is full, return an error
    3. Else, place the descriptor's u64 value in the array's `next_index` position and increment `next_index` by 1
2. Else if `descriptor` is a system descriptor
    1. Check if the `descriptors` array has enough space for the descriptor (2 free entries)
    2. If it does, place the descriptor's lower u64 at `next_index` and its upper u64 at `next_index` + 1 (together, the 128-bit value) and increment `next_index` by 2
    3. Else, return an error

Translating this to Rust:

```rust
impl GDT {
    // ... Others

    fn add_descriptor(&mut self, descriptor: Descriptor) {
        match descriptor {
            Descriptor::NonSystem(value) => {
                // Is array full?
                if self.next_index >= self.descriptors.len() {
                    return an error saying "not enough space for descriptor";
                }
                self.descriptors[self.next_index] = value;
                self.next_index += 1;
            }
            Descriptor::System(higher, lower) => {
                // Is there enough space for a system descriptor?
                if self.next_index + 1 >= self.descriptors.len() {
                    return an error saying "not enough space for descriptor";
                }
                self.descriptors[self.next_index] = lower;
                self.descriptors[self.next_index + 1] = higher;
                self.next_index += 2;
            }
        }
    }
}
```

To indicate the possibility of a failure, the function should return a Result:

```rust
impl GDT {
    // ... Others

    // DELETED: fn add_descriptor(&mut self, descriptor: Descriptor) {
    fn add_descriptor(&mut self, descriptor: Descriptor) -> Result<(), &'static str> { // NEW
        match descriptor {
            Descriptor::NonSystem(value) => {
                if self.next_index >= self.descriptors.len() {
                    // DELETED: return an error saying "not enough space for descriptor";
                    return Err("not enough space for descriptor");
                }
                self.descriptors[self.next_index] = value;
                self.next_index += 1;
                Ok(()) // NEW
            }
            Descriptor::System(higher, lower) => {
                if self.next_index + 1 >= self.descriptors.len() {
                    // DELETED: return an error saying "not enough space for descriptor";
                    return Err("not enough space for descriptor");
                }
                self.descriptors[self.next_index] = lower;
                self.descriptors[self.next_index + 1] = higher;
                self.next_index += 2;
                Ok(()) // NEW
            }
        }
    }
}
```
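To see the index bookkeeping in action, here is a self-contained sketch that repeats the definitions from this post and fills a table with two non-system descriptors and one system descriptor. `build_example_gdt` is a hypothetical helper, and the descriptor values (0x1111 and so on) are placeholders chosen to be easy to spot, not real segment descriptors.

```rust
const MAX_NO_OF_ENTRIES: usize = 8;

enum Descriptor {
    NonSystem(u64),
    // (upper 64 bits, lower 64 bits) of segment descriptor
    System(u64, u64),
}

#[repr(C)]
struct GDT {
    descriptors: [u64; MAX_NO_OF_ENTRIES],
    next_index: usize,
}

impl GDT {
    fn new() -> Self {
        // Index 0 stays 0 as the null descriptor
        Self { descriptors: [0; MAX_NO_OF_ENTRIES], next_index: 1 }
    }

    fn add_descriptor(&mut self, descriptor: Descriptor) -> Result<(), &'static str> {
        match descriptor {
            Descriptor::NonSystem(value) => {
                if self.next_index >= self.descriptors.len() {
                    return Err("not enough space for descriptor");
                }
                self.descriptors[self.next_index] = value;
                self.next_index += 1;
                Ok(())
            }
            Descriptor::System(higher, lower) => {
                if self.next_index + 1 >= self.descriptors.len() {
                    return Err("not enough space for descriptor");
                }
                self.descriptors[self.next_index] = lower;
                self.descriptors[self.next_index + 1] = higher;
                self.next_index += 2;
                Ok(())
            }
        }
    }
}

/// Hypothetical helper: builds a GDT holding placeholder descriptors.
fn build_example_gdt() -> Result<GDT, &'static str> {
    let mut gdt = GDT::new();
    gdt.add_descriptor(Descriptor::NonSystem(0x1111))?;      // lands at index 1
    gdt.add_descriptor(Descriptor::NonSystem(0x2222))?;      // lands at index 2
    gdt.add_descriptor(Descriptor::System(0x4444, 0x3333))?; // lands at indices 3 and 4
    Ok(gdt)
}
```

After `build_example_gdt`, the array holds `[0, 0x1111, 0x2222, 0x3333, 0x4444, 0, 0, 0]` and `next_index` is 5, matching the 5 entries counted earlier.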

Just one more thing is missing here. To ensure that the GDT is always kept at an address that is a multiple of 8, we need to tell the compiler:

```rust
// DELETED: #[repr(C)]
#[repr(C, align(8))] // NEW
struct GDT {
    descriptors: [u64; MAX_NO_OF_ENTRIES],
    next_index: usize
}
```

The `repr(align(8))` attribute forces `GDT` to have an alignment of at least 8. With this attribute, the compiler will make sure that any `GDT` instance will always be kept in memory at an address that is a multiple of 8.
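If you'd like to confirm this, a quick sanity check is sketched below. `gdt_alignment` and `is_8_byte_aligned` are hypothetical helpers written for this illustration, and the struct is repeated so the snippet stands on its own.

```rust
use std::mem::align_of;

const MAX_NO_OF_ENTRIES: usize = 8;

#[repr(C, align(8))]
struct GDT {
    descriptors: [u64; MAX_NO_OF_ENTRIES],
    next_index: usize,
}

/// Hypothetical helper: the alignment the compiler guarantees for `GDT`.
fn gdt_alignment() -> usize {
    align_of::<GDT>()
}

/// Hypothetical helper: checks that a GDT instance sits at an address
/// that is a multiple of 8.
fn is_8_byte_aligned(gdt: &GDT) -> bool {
    (gdt as *const GDT as usize) % 8 == 0
}
```

`gdt_alignment()` should report at least 8, and `is_8_byte_aligned` should hold for any `GDT` instance, wherever the compiler decides to place it.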

## Requirements For The Code and Data Segment Descriptors

Now that we have a function for adding descriptors, we need the actual descriptors to add. Earlier, I mentioned that we'll need to create a code segment, a data segment and a Task State Segment.

The Task State Segment descriptor will come later, once we've modeled a Task State Segment. For now, it's the code and data segments we'll be adding.

For our purposes, the code and data segments have the following requirements:

1. The base should be 0 (starting from the first address in memory) and the size should be the biggest possible.
2. The code segment should be readable and the data segment, writable.

Come up with the values now yourself.

# Take Away

• The default layout of a Rust enum is unspecified

For the full code, go to the repo

# In The Next Post

We'll continue with the GDT.

# References

• https://rust-lang.github.io/unsafe-code-guidelines/layout/enums.html
• https://doc.rust-lang.org/nomicon/other-reprs.html