Demilade Sonuga's blog

All posts

Heap Allocator II

2023-03-21

In this post, we're going to implement a linked list allocator.

Aims

From the previous post and our current requirements, we can surmise that our allocator ought to do 4 things:

  1. Find a free region of a given size.
  2. Mark a region free.
  3. Offer an external interface for allocating and deallocating memory.

To achieve these aims, our allocator will have five functions: alloc for allocation, dealloc for deallocation, find_free_region for finding a free region, and add_free_region to mark a region as free.

Implementation

We need to remember that a linked list allocator uses a linked list and linked lists are not very easy to handle in Rust. We're going to get unsafe a lot.

Create a file allocator.rs and throw this in:

// An allocator for managing heap memory
// Keeps track of free regions with a linked list whose nodes
// are stored in the free regions themselves
pub struct LinkedListAllocator {

}

The allocator keeps track of free regions with a linked list. Because of this, the only field we'll be needing in the allocator is the head node of that linked list.

pub struct LinkedListAllocator {
    // The first free region
    head: *mut ListNode
}

The ListNode itself needs only two pieces of information: the size of the region and a pointer to the next region.

// A free region of memory
struct ListNode {
    // The size of the free region
    size: usize,
    // The next free region
    next: Option<*mut ListNode>
}

We don't have to keep track of its address because the nodes will be stored at the beginning of the free regions themselves. So, the address of a ListNode instance is the address of the free memory region it represents.

To retrieve that ListNode address, we can create a function:

impl ListNode {
    // The address of the ListNode, and hence, the free
    // region it represents
    fn addr(&self) -> usize {
        self as *const _ as usize
    }
}

To represent a chunk of memory, we can create a new struct:

// A chunk of memory
pub struct MemChunk {
    start_addr: usize,
    size: usize
}

Since alloc and dealloc will make use of find_free_region and add_free_region, we'll deal with those first.

find_free_region will take two arguments: the size of the region in bytes and the alignment. If you can't remember what alignment is all about, you can check this post.

The function may or may not find an appropriate free region of memory. To indicate this, the function will return an Option. Some(pointer) in case of success and None in case of failure.

impl LinkedListAllocator {
    // Finds a suitable memory chunk of size `size` and with an address
    // of alignment `alignment` and returns a pointer to the region
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {

    }
}

Finding a suitable region is just a matter of iterating over the free list, checking if a region is big enough and if it is, returning an aligned pointer to that region. The only slightly complicated thing is the alignment matter. The pointer returned from this function must be a multiple of alignment. To ensure this happens, we won't use the actual start address of the regions. We'll use the closest aligned address instead.

To understand this, consider the following:

       1 2 3 4 5 6 7 8
Heap: | | | | | | | | |

           -------
Allocator: | 8 B | -> Null
           -------

Suppose the above is a heap of 8 bytes with addresses starting at 1. And find_free_region is called with size set to 4 and alignment set to 1. An alignment of 1 indicates that any address will do since all numbers are multiples of 1. The closest aligned address in the region is address 1. The effective size of the region starting at address 1 is big enough to hold 4 bytes. So, the first 4 bytes of the region are allocated and address 1 (a pointer is an address) is returned.

After this return from find_free_region, we can envision the heap and allocator like this:

       1    2    3    4    5 6 7 8
Heap: |Used|Used|Used|Used| | | | |

           -------
Allocator: | 4 B | -> Null
           -------

Now, suppose the alignment was 2 instead of 1. This implies that the beginning address cannot be 1, because 1 is not a multiple of 2. From the beginning of the region, the closest aligned address will be 2, because 2 is a multiple of 2. The effective size of the region starting at address 2 is 7. The space at address 1 can no longer be used because of alignment. But this still works out and address 2 is returned.

After this return:

       1 2    3    4    5    6 7 8
Heap: | |Used|Used|Used|Used| | | |

           -------    -------
Allocator: | 1 B | -> | 3 B | -> Null
           -------    -------

Now, suppose the alignment was 8. In this case, the closest aligned address from the region's start address will be 8, which is the address of the last byte. This implies that there is not enough space for the 4 bytes we need to allocate. So, None is returned.

With this understanding, we can break down what exactly this find_free_region function is going to do. In each iteration:

  1. Compute the closest aligned pointer (CAP) is computed.
  2. If the region beginning at CAP is a perfect fit and CAP is the actual beginning of the region
    1. Remove the node from the free list and return the pointer to the region.
  3. If the region beginning at CAP is a perfect fit and CAP is not the actual region start
    1. Reduce the size of the region by the requested size.
    2. Return the CAP.
  4. If the region beginning at CAP is bigger than required and CAP is the actual region start
    1. Move the node to address node.address + requested size and reduce node.size by the requested size.
    2. Set the previous node's next field to the new node address.
    3. Return the CAP.
  5. If the region beginning at CAP is bigger than required and CAP is not the actual region start,
    1. Make the current node end right before CAP.
    2. Create a new node starting at CAP + size and ending at the actual end of the region.
    3. Set the current node's next to the new node and the new node's next to the previous node's next.
    4. Return the CAP.
  6. If the region beginning at CAP is too small, move to the next region.

Getting this down in code:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            
        }
        // No suitable region found
        None
    }
}

In the above code, self.head is never curr_node_ptr. self.head is assumed to be a dummy node because this makes writing this function so much easier. Since the node to be removed is never the head, we don't have to think about edge cases involving head manipulation.

Number 1, to compute the closest aligned pointer:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // Computing the closest aligned pointer
            let region_ptr = (*curr_node_ptr).addr() as *mut u8;
            let closest_aligned_ptr_distance = region_ptr.align_offset(alignment);
            let closest_aligned_ptr = region_ptr.offset(closest_aligned_ptr_distance as isize);

        }
        None
    }
}

The address of the current region is the address of the current node, hence: let region_ptr = (*curr_node_ptr).addr() as *mut u8; The pointer's align_offset function takes an alignment and returns the amount by which the pointer needs to be increased to become aligned.

Number 2:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Perfect fit and CAP is the region start
            if (*curr_node_ptr).size == size && closest_aligned_ptr == region_ptr {
                // Before: Node -> Node to Remove -> Node
                // After: Node -> Node
                core::mem::swap(&mut (*prev_node_ptr).next, &mut (*curr_node_ptr).next);
                return Some(region_ptr);
            }

        }
        None
    }
}

Since the current node is being removed, the previous node's next should point to the node after the one being removed. We do this with the core::mem::swap function, which swaps values in two locations.

Number 3:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Perfect fit and CAP is not the region start
            if (*curr_node_ptr).size == size && closest_aligned_ptr != region_ptr {
                (*curr_node_ptr).size -= size;
                return Some(closest_aligned_ptr);
            }

        }
        None
    }
}

The list node remains the same except for the size. It still points to the next free region. And the list node size reduces by the requested size.

Number 4:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Bigger and CAP is the region start
            if (*curr_node_ptr).size > size && closest_aligned_ptr == region_ptr {
                // Reduce by requested size
                (*curr_node_ptr).size -= size;
                // The new location for the node
                let new_location = (*curr_node_ptr).addr() + size;
                let mut new_location_ptr = new_location as *mut ListNode;
                // Moving the node to the new location
                new_location_ptr.write_unaligned(curr_node_ptr.read_unaligned());
                // Setting the previous node's next to the new location
                (*prev_node_ptr).next = Some(new_location_ptr);
                return Some(curr_node_ptr as *mut u8);
            }

        }
        None
    }
}

Number 5:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Bigger and CAP is not the region start
            if (*curr_node_ptr).size > size && closest_aligned_ptr != region_ptr {
                let region_end = (*curr_node_ptr).addr() + size;

                // Make the node end right before CAP
                let curr_node_size = closest_aligned_ptr as usize - region_ptr as usize;
                (*curr_node_ptr).size = curr_node_size;

                // Create a new node starting at CAP + size and ending at the original region end
                let mut new_node_ptr = closest_aligned_ptr.offset(size as isize) as *mut ListNode;
                (*new_node_ptr).size = region_end - new_node_ptr as usize;

                // Set the new node's next to the previous node's next
                (*new_node_ptr).next = (*curr_node_ptr).next;

                // Set the current node's next to the new node
                (*curr_node_ptr).next = Some(new_node_ptr);
                
                return Some(closest_aligned_ptr);
            }

        }
        None
    }
}

And number 6:

impl LinkedListAllocator {
    unsafe fn find_free_region(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Region beginning at CAP is too small
            // Moving to the next region
            prev_node_ptr = curr_node_ptr;
        }
        None
    }
}

As for the add_free_region function, a single memory chunk will be taken as the argument and either a new node will be created to represent it in the list, or an existing node will be enlarged. To fully get what I'm talking about, you can refer back to this post.

To represent a chunk of memory:

// A chunk of memory
pub struct MemChunk {
    start_addr: usize,
    size: usize
}

And functions for convenience:

impl MemChunk {
    fn start_addr(&self) -> usize {
        self.start_addr
    }

    fn end_addr(&self) -> usize {
        self.start_addr + self.size - 1
    }

    fn size(&self) -> usize {
        self.size
    }
}

Adding the heap memory to the allocator's list will just be a matter of creating a MemChunk that represents it and passing it as an argument to add_free_region.

To add a free region, there are 4 cases we need to consider:

  1. The memory region to be added comes immediately after the current free region.
  2. The memory region to be added comes immediately before the next free region.
  3. The memory region to be added comes immediately before the next and after the current free regions.
  4. The memory region to be added comes immediately before the current free region.
  5. The memory region, although it doesn't come immediately before or after another free region, still comes before a region.
  6. The memory region to be added comes after all other regions.

We iterate over all nodes and check for these conditions as we do so.

impl LinkedListAllocator {
    // Manage the free region described by `mem_chunk`
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {

        }
        // The region to be added comes after all other regions
        // Regions: ---NNN----------------
        // Chunk:   ----------MMM---------
        let mut new_node = mem_chunk.start_addr() as *mut ListNode;
        (*new_node).size = mem_chunk.size();
        (*new_node).next = None;
        (*prev_node_ptr).next = Some(new_node);
    }
}

If after all iterations, none of the first 4 conditions are found, then it must be the 5th.

For the first condition:

impl LinkedListAllocator {
    // Manage the free region described by `mem_chunk`
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            let mut chunk_comes_imm_after_curr_region = false;
            let mut chunk_comes_imm_before_next_region = false;

            let region_end_addr = curr_node_ptr as usize + (*curr_node_ptr).size - 1;

            // The memory chunk comes immediately after the current free region
            // Regions: -----NNNN--------
            // Chunk:   ---------MMM-----
            if mem_chunk.start_addr() == region_end_addr + 1 {
                chunk_comes_imm_after_curr_region = true;
            }

            if chunk_comes_imm_after_curr_region && !chunk_comes_imm_before_next_region {
                // Merge the new chunk with the current region
                (*curr_node_ptr).size += mem_chunk.size();
                return;
            }
        }
        // ...Others
    }
}

The new chunk is merged with the region that comes before it, making the region bigger.

For the second condition:

impl LinkedListAllocator {
    // Manage the free region described by `mem_chunk`
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            let mut chunk_comes_imm_after_curr_region = false;
            let mut chunk_comes_imm_before_next_region = false;

            let region_end_addr = curr_node_ptr as usize + (*curr_node_ptr).size - 1;
            let next_node_ptr_opt = (*curr_node_ptr).next; // NEW

            if mem_chunk.start_addr() == region_end_addr + 1 { /* ...Others */ }

            // NEW:
            // The memory chunk comes immediately before the next free region
            // Regions: ------NNNN-------
            // Chunk:   ---MMM-----------
            if let Some(next_node_ptr) = next_node_ptr_opt {
                if (*next_node_ptr).addr() == mem_chunk.end_addr() + 1 {
                    chunk_comes_imm_before_next_region = true;
                }
            }

            if chunk_comes_imm_after_curr_region && !chunk_comes_imm_before_next_region { /* ...Others */ }

            // NEW:
            if chunk_comes_imm_before_next_region && !chunk_comes_imm_after_curr_region {
                // Shift the node to the mem chunk's start address and increase the size
                let new_region_start_ptr = mem_chunk.start_addr() as *mut ListNode;
                let next_node_ptr = next_node_ptr_opt.unwrap();
                let new_region_size = mem_chunk.size() + (*next_node_ptr).size;
                new_region_start_ptr.write_unaligned(next_node_ptr.read_unaligned());
                (*new_region_start_ptr).size = new_region_size;
                return;
            }
        }
        // ...Others
    }
}

For the third condition:

impl LinkedListAllocator {
    // Manage the free region described by `mem_chunk`
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Regions: ----NNNN---NNNNNN
            // Chunk:   --------MMM------
            if chunk_comes_imm_before_next_region && chunk_comes_imm_after_curr_region {
                // Merge the chunk and the second node into the first node
                let next_node_ptr = next_node_ptr_opt.unwrap();
                let new_region_size = (*curr_node_ptr).size + mem_chunk.size() + (*next_node_ptr).size;
                (*curr_node_ptr).size = new_region_size;
                (*curr_node_ptr).next = (*next_node_ptr).next;
                return;
            }
        }
        // ...Others
    }
}

For the fourth condition:

impl LinkedListAllocator {
    // Manage the free region described by `mem_chunk`
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others

            // Memory chunk come immediately before the current free region
            if mem_chunk.end_addr() + 1 == (*curr_node_ptr).addr() {
                let new_region_start_ptr = mem_chunk.start_addr() as *mut ListNode;
                let new_region_size = mem_chunk.size() + (*curr_node_ptr).size;
                new_region_start_ptr.write_unaligned(curr_node_ptr.read_unaligned());
                (*new_region_start_ptr).size = new_region_size;
                (*prev_node_ptr).next = Some(new_region_start_ptr);
                return;
            }
        }
        // ...Others
    }
}

For the fifth condition:

impl LinkedListAllocator {
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others
            
            if mem_chunk.start_addr() < (*curr_node_ptr).addr() {
                // The region to be added doesn't come immediately before or after any free region
                // but still comes before the current region
                // The region must be placed before those that come after it to maintain
                // the order
                // Regions: ---NNN-----------NNNN-----
                // Chunk:   ----------MMM-------------
                let mut new_node = mem_chunk.start_addr() as *mut ListNode;
                (*new_node).size = mem_chunk.size();
                (*new_node).next = Some(curr_node_ptr);
                (*prev_node_ptr).next = Some(new_node);
                return;
            }
        }
        // ...Others
    }
}

Lastly, to avoid eternal loops:

impl LinkedListAllocator {
    unsafe fn add_free_region(&mut self, mem_chunk: MemChunk) {
        let mut prev_node_ptr = self.head;
        while let Some(curr_node_ptr) = (*prev_node_ptr).next {
            // ...Others
            
            prev_node_ptr = curr_node_ptr;
        }
        // ...Others
    }
}

For alloc, we need to take two arguments: size in bytes and alignment. alloc is essentially just a public interface that we'll use outside allocator.rs.

impl LinkedListAllocator {
    // Allocate heap memory
    pub fn alloc(&mut self, size: usize, alignment: usize) -> Option<*mut u8> {
        unsafe {
            self.find_free_region(size, alignment)
        }
    }
}

Similarly, dealloc will be like a wrapper around add_free_region for returning allocated memory. The arguments needed are the pointer and the size in bytes.

impl LinkedListAllocator {
    // Deallocate heap memory
    pub fn dealloc(&mut self, ptr: *mut u8, size_to_dealloc: usize) {
        unsafe {
            self.add_free_region(MemChunk {
                start_addr: ptr as usize,
                size: size_to_dealloc
            });
        }
    }
}

And that's it for the implementation. In the next post, we'll be testing it.

Take Away

For the full code, go to the repo

In The Next Post

We'll be testing the heap allocator