Multitasking OS for Xmega

Last year, I received a request to extend the firmware of an existing project for a new product. The code, which was already quite intricate, risked becoming an incomprehensible and bug-ridden mess. So, while chatting with a friend, we agreed that “if there were threads, the code would be much more linear”. And that’s how, after a series of searches on GitHub and similar platforms, I ended up implementing a multitasking system for AVR Xmega.
Why AVR Again?
With the advent of 32-bit ARM controllers, the AVR architecture has got a bad rap. So why, in 2021, develop a multitasking system for this platform? The answer is rather simple: the hardware was already using this particular microcontroller, and a significant portion of the software had already been written.
Features
As a starting point, I referred to the xmultitasking repository [1], which provided a very basic implementation that I extended to meet the following requirements:
- Non-preemptive: the running task releases the CPU voluntarily;
- Static: no dynamic memory allocation;
- “Systick” presence: a timer generates periodic interrupts (e.g., every 1 ms). A task can release the CPU to wait for a timeout;
- Simple synchronization primitives.
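The systick requirement can be pictured with a minimal sketch. The names systick_count, systick_isr and timeout_expired are assumptions made for illustration and do not come from the original project; on the target, systick_isr would be the body of the timer interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick counter, incremented once per timer interrupt.
 * On an 8-bit AVR a 16-bit read is not atomic, so real code would
 * read it with interrupts disabled. */
static volatile uint16_t systick_count;

/* Body of the periodic timer interrupt (e.g. every 1 ms). */
static void systick_isr(void)
{
    systick_count++;
}

/* True once at least `duration` ticks have elapsed since `start`.
 * The unsigned subtraction stays correct when the counter wraps. */
static bool timeout_expired(uint16_t start, uint16_t duration)
{
    return (uint16_t)(systick_count - start) >= duration;
}
```

A task waiting for a timeout would record the counter value once, then release the CPU repeatedly until timeout_expired returns true.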
Task Descriptor
In practice, a task corresponds to a function that is executed cyclically.
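A task handler might look like the following sketch. TASK_yield is the function named later in this article, but its signature here is an assumption, and blink_task, blink_task_step and led_toggle are hypothetical names. To keep the sketch observable off-target, TASK_yield is replaced by a stub that only counts calls.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t yield_count;   /* counts calls to the stub below */
static uint16_t toggle_count;  /* counts calls to led_toggle() */

/* Stub standing in for the real TASK_yield(); the signature is assumed. */
static void TASK_yield(void)
{
    yield_count++;
}

/* Hypothetical application work. */
static void led_toggle(void)
{
    toggle_count++;
}

/* One pass of the task body: do a little work, then release the CPU. */
static void blink_task_step(void)
{
    led_toggle();
    TASK_yield();
}

/* The task handler itself: it loops forever and never returns. */
void blink_task(void)
{
    for (;;)
        blink_task_step();
}
```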
Each task can be in one of the following three states:
- Running
- Ready
- Blocked
Since the AVR is a single-core processor, only one task can be running, and therefore using the CPU, at any given time. A blocked task is waiting for some kind of event, such as a timeout, a specific interrupt, or an event triggered by another task. A ready task sits in a list together with the other ready tasks, waiting to resume execution as soon as the CPU becomes available (context switch).
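The three states and their transitions can be sketched as a small descriptor table. The structure layout, the function names and the value of MAX_TASK are assumptions for illustration; the original project may encode the same information differently (for example in a bitmask).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASK 4  /* assumed value; the article does not state it */

/* TASK_BLOCKED is 0 so that zero-initialized descriptors start blocked. */
enum task_state { TASK_BLOCKED, TASK_READY, TASK_RUNNING };

struct task_descriptor {
    uint8_t *sp;            /* saved stack pointer while not running */
    enum task_state state;
};

static struct task_descriptor tasks[MAX_TASK];

/* Number of running tasks; on a single core it must never exceed one. */
static int running_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_TASK; i++)
        if (tasks[i].state == TASK_RUNNING)
            n++;
    return n;
}

/* An event (timeout, interrupt, other task) makes a blocked task ready. */
static void task_wake(int id)
{
    if (tasks[id].state == TASK_BLOCKED)
        tasks[id].state = TASK_READY;
}

/* The running task starts waiting for an event. */
static void task_block(int id)
{
    assert(tasks[id].state == TASK_RUNNING);
    tasks[id].state = TASK_BLOCKED;
}

/* A ready task gets the CPU; only allowed when nobody else is running. */
static void task_dispatch(int id)
{
    assert(tasks[id].state == TASK_READY);
    assert(running_count() == 0);
    tasks[id].state = TASK_RUNNING;
}
```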
Stack Partitioning

In a typical memory layout, static variables are allocated at the lowest memory addresses, at the beginning of the RAM. Traditionally, this area is divided into two sections, .data and .bss, with the latter reserved for uninitialized static variables. The sizes are known at compile-time and can be obtained by querying avr-size project.elf.
   text    data     bss     dec     hex filename
     58       2       2      62      3e project.elf
Automatic variables require a variable-sized area of memory, known as the stack. The stack starts at the end of the RAM (the highest address) and grows towards lower addresses. Its size changes at run time: memory is allocated when new variables come into scope and automatically released when they go out of scope. Additionally, the stack stores the return address on every function call. To keep track of the top of the stack, a dedicated processor register called the stack pointer (SP) is used.

Each task must have its own scope with its stack of automatic variables.
Therefore, it is necessary to partition the available memory to virtually have MAX_TASK stack areas.
In the current implementation, each task has the same stack size, STACK_SIZE, which is 200 bytes.
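As a rough illustration of the partitioning, the sketch below carves MAX_TASK equal slices out of one static array. The array, the helper names and the value of MAX_TASK are assumptions; the original project may instead carve the slices directly out of the end of the RAM.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASK   4    /* assumed value; not given in the article */
#define STACK_SIZE 200  /* bytes per task, as stated in the text */

/* One statically allocated area, split into equal slices. */
static uint8_t task_stacks[MAX_TASK * STACK_SIZE];

/* Lowest address of a task's slice: growing below it would
 * overwrite the slice of the neighbouring task. */
static uint8_t *stack_limit(unsigned id)
{
    assert(id < MAX_TASK);
    return &task_stacks[id * STACK_SIZE];
}

/* Initial stack pointer of a task: the highest address of its slice,
 * because an AVR push stores at SP and then decrements SP. */
static uint8_t *stack_top(unsigned id)
{
    assert(id < MAX_TASK);
    return &task_stacks[id * STACK_SIZE + STACK_SIZE - 1];
}
```

With these assumed values the stacks take 800 bytes of RAM in total, a cost that is fixed at compile time and therefore shows up in the .bss figure reported by avr-size.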
Stack Initialization
A task is created using the TASK_create function, which, except for a very short prologue, is an alter ego of the assembly routine TASK_init.
As explained in the next section, the context switch routine expects a specific stack layout.
When creating a new task, all registers are set to zero except the program counter, which must match the address of the task’s handler function.
Some Xmega models have a program memory larger than 64 kwords (128 KB).
In such cases, the program counter occupies three bytes instead of two, so an additional byte is needed on the stack.
In this simple multitasking system, it is assumed that the handler is not located in this “extended memory”, so the most significant byte is always set to zero.
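The initial stack layout can be sketched in C on a plain byte array. This is not the TASK_init routine from the project: stack_prepare, push, SAVED_REGS and PC_BYTES are hypothetical names, and the assumed context of r0-r31 plus the status register may differ from what the real routine saves.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 200  /* bytes per task, as stated in the text */
#define SAVED_REGS 33   /* assumed context: r0-r31 plus SREG */
#define PC_BYTES   3    /* assumed: 3 on large-memory devices, else 2 */

/* Emulates an AVR push: store at SP, then decrement SP. */
static uint8_t *push(uint8_t *sp, uint8_t value)
{
    *sp = value;
    return sp - 1;
}

/* Prepares a fresh stack so that restoring the context and executing
 * ret would start the handler. Returns the stack pointer to save for
 * the new task. `handler_word_addr` is the handler's program address. */
static uint8_t *stack_prepare(uint8_t *stack, uint16_t handler_word_addr)
{
    uint8_t *sp = &stack[STACK_SIZE - 1];

    assert(STACK_SIZE > PC_BYTES + SAVED_REGS);

    /* Return address, laid out as call would push it: low byte first. */
    sp = push(sp, (uint8_t)(handler_word_addr & 0xFF));
    sp = push(sp, (uint8_t)(handler_word_addr >> 8));
#if PC_BYTES == 3
    sp = push(sp, 0x00);  /* most significant PC byte, always zero here */
#endif

    /* Every saved register starts at zero. */
    for (int i = 0; i < SAVED_REGS; i++)
        sp = push(sp, 0x00);

    return sp;
}
```

With these assumptions a new task consumes 36 of its 200 bytes before it runs a single instruction, which is worth keeping in mind when sizing STACK_SIZE.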
Context Switch
When switching from one task to another, both the CPU register contents and the current state of the stack pointer must be preserved. The context switch routine is implemented within the TASK_yield() function, which will be discussed in the next section.

The operations to be performed, in order, are:
- Saving the program counter for task i (return address, or RA).
- Saving all general-purpose registers (r0-r31) and any auxiliary registers.
- Saving the current stack pointer, SPi.
- Loading the new stack pointer, SPj.
- Loading the auxiliary registers and general-purpose registers.
- Returning to the address of task j.
These are the only parts of the program that need to be implemented in pure assembly, as they directly modify the stack content and the respective pointer.
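The real routine has to be written in AVR assembly, but the six steps can be followed on a small C model that runs on a PC, with a simulated register file, RAM and stack pointer. Everything in this sketch (names, two tasks, a 16-bit program counter, no auxiliary registers) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASK   2
#define STACK_SIZE 200
#define NUM_REGS   32

/* Minimal CPU model: RAM holding the stacks, r0-r31, SP and PC. */
static uint8_t  ram[MAX_TASK * STACK_SIZE];
static uint8_t  regs[NUM_REGS];
static uint16_t sp = STACK_SIZE - 1;  /* task 0 starts on its own stack */
static uint16_t pc;
static uint16_t saved_sp[MAX_TASK];
static int      current;

static void    push(uint8_t v) { ram[sp--] = v; }
static uint8_t pop(void)       { return ram[++sp]; }

/* Step 1: call pushes the return address, low byte first. */
static void cpu_call(uint16_t return_addr)
{
    push((uint8_t)(return_addr & 0xFF));
    push((uint8_t)(return_addr >> 8));
}

/* Step 6: ret pops the return address into the program counter. */
static void cpu_ret(void)
{
    uint8_t hi = pop();
    uint8_t lo = pop();
    pc = (uint16_t)((hi << 8) | lo);
}

/* Steps 2-5: no return address is touched here. */
static void switch_context(int next)
{
    for (int i = 0; i < NUM_REGS; i++)      /* 2: save registers */
        push(regs[i]);
    saved_sp[current] = sp;                 /* 3: save SPi */
    current = next;
    sp = saved_sp[current];                 /* 4: load SPj */
    for (int i = NUM_REGS - 1; i >= 0; i--) /* 5: restore registers */
        regs[i] = pop();
}

/* What a yield looks like from the calling task. */
static void yield_to(int next, uint16_t return_addr)
{
    cpu_call(return_addr);
    switch_context(next);
    cpu_ret();
}

/* Builds the stack a task needs before it is switched to for the first time. */
static void task_init(int id, uint16_t entry)
{
    uint16_t old_sp = sp;
    sp = (uint16_t)((id + 1) * STACK_SIZE - 1);
    cpu_call(entry);
    for (int i = 0; i < NUM_REGS; i++)
        push(0);
    saved_sp[id] = sp;
    sp = old_sp;
}
```

After task_init(1, 0x0200), a call to yield_to(1, 0x0100) from task 0 leaves pc at 0x0200 with all registers cleared, and a later yield_to(0, ...) from task 1 resumes task 0 at 0x0100 with its registers restored.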
It is interesting to note that at no point in the routine are the return addresses manipulated, or at least not visibly.
This is because the call instruction, used here to call TASK_yield, already saves the return address on the stack by pushing the two or three bytes of the program counter.
Conversely, ret performs the inverse operation to return to the caller.
The trick that allows TASK_yield to perform the context switch lies in changing the stack pointer in the code that runs between call and ret, so that ret pops the return address of a different task.
Scheduler
The scheduler implements a simple round-robin algorithm without priorities or preemption, so it must be manually invoked using the aforementioned TASK_yield() function.
In short, the scheduler continues cycling as long as the enabled task mask task_enable_mask_AT is zero. When at least one active task is found, the first one after the current task is selected, and a context switch is performed.
If the current task is the only one that has been awakened, a context switch is avoided, saving some machine cycles.
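The selection rule can be sketched as a pure function over a bitmask. The function name next_task and the limit of eight tasks are assumptions; only the idea of a mask of enabled tasks comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASK 8  /* assumed: one bit per task in an 8-bit mask */

/* Returns the first enabled task after `current` in round-robin order,
 * or -1 if no task is enabled. `current` itself is examined last, so it
 * is returned only when it is the sole enabled task. */
static int next_task(uint8_t enable_mask, int current)
{
    assert(current >= 0 && current < MAX_TASK);

    for (int step = 1; step <= MAX_TASK; step++) {
        int candidate = (current + step) % MAX_TASK;
        if (enable_mask & (uint8_t)(1u << candidate))
            return candidate;
    }
    return -1;
}
```

A scheduler built on this helper would keep polling while the result is -1, since an interrupt may set a bit in the mask at any time, and would skip the context switch when the result equals the current task.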
Stack Smash!
The AVR architecture does not have any hardware memory protection mechanism. However, since a task could use more stack space than its segment provides, with disastrous consequences, I deemed it appropriate to implement at least a rudimentary software check. I took inspiration from canary bytes [2].
The principle is simple: at the end of each stack segment, I place a byte initialized with a known value (canary byte). During each context switch, I check whether its content is unchanged. If it has changed, I can conclude that the stack of the task about to release the CPU has grown too large and may have corrupted the stacks of other tasks.
In these unfortunate cases, there is little that can be done. I made the program abort, optionally flashing a menacing red LED. If I am debugging, I can inspect the register contents to find out which task caused the damage and save a memory dump.
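A minimal sketch of the canary check, assuming the stacks live in one static array and that the canary sits at the lowest address of each segment (the last byte a downward-growing stack reaches). CANARY_VALUE, MAX_TASK and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TASK     4     /* assumed value */
#define STACK_SIZE   200   /* bytes per task, as stated in the text */
#define CANARY_VALUE 0xA5  /* arbitrary known pattern, assumed */

static uint8_t task_stacks[MAX_TASK * STACK_SIZE];

/* The stack grows downwards, so the end of a segment is its lowest address. */
static uint8_t *canary_of(unsigned id)
{
    assert(id < MAX_TASK);
    return &task_stacks[id * STACK_SIZE];
}

/* Writes the known value at the end of every stack segment. */
static void canary_init(void)
{
    for (unsigned id = 0; id < MAX_TASK; id++)
        *canary_of(id) = CANARY_VALUE;
}

/* Meant to be called at each context switch for the task that is
 * releasing the CPU; false means its stack overran the segment. */
static bool canary_intact(unsigned id)
{
    return *canary_of(id) == CANARY_VALUE;
}
```

A single byte is a weak detector: an overrun that happens to write the same value, or that skips over the byte, goes unnoticed.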
Improvements
- Stack smash detector
- Watchdog
- Sleep mode