How to Write a Small RTOS



Complex system behaviors often require a real-time operating system (RTOS) that supports multiple running tasks under the coordination of a scheduler. A RTOS helps the software designer implement sophisticated systems as parallel interacting programs. In this blog, I will introduce the basics of RTOS and how a small RTOS is created.

Basic Components

A minimal RTOS has the following basic elements:

  • Scheduler — enforcing scheduling policies, such as priorities and inter-task signaling
  • Task control blocks (TCB) — data structures containing the context and run state of tasks
  • Portable layer — hardware-specific RTOS functions separated out for portability
  • Synchronization — semaphore that enables inter-task synchronization

Task Control Blocks

A TCB data structure looks something like this:

typedef struct tcb_s {
    unsigned int sem;              /* semaphore flag */
    unsigned int *stack_top;       /* top of stack */
    unsigned int *stack_ptr;       /* current stack pointer */
    void *(*task_ptr)(void *);     /* task function pointer */
} tcb_t;

The storage size of unsigned int typically defaults to the machine word size. For instance, on an ARM it is a 32-bit word. Keeping the TCB elements in machine word size allows the CPU to access the data more efficiently, and therefore, the scheduler run faster.

The semaphore flag, sem, contains the semaphore that the task associated with this TCB is waiting on. The variable, stack_top, points to the top of the stack space allocated to the task. The current stack pointer is saved in stack_ptr. Finally, task_ptr points to the function implementing the task, which generally should have the following form:

void *mytask(void *)
    . . .                         /* initialization */  
    while (1) {
       . . .                      /* do periodic stuff here */

Tasks in embedded systems general don’t ever exit, hence the while loop after the initialization. It is possible to make mytask() a one-shot task, but when the task exits, it must return program control back to the scheduler. More on this will be discussed in later sections.


The scheduler interacts with the TCBs to perform task switching, which involves the following steps:

  1. Push context onto the current task’s stack
  2. Determine the next task to run
  3. Pop context from next task’s stack

The word, context, refers to a set of CPU registers saved when a task is preempted. When these register values are restored, the same task will continue execution from the point of preemption. The context must include the interrupt state of the task when it was interrupted.

A simple scheduler may looks something like this:

unsigned int os_curr_tid;          /* OS current task ID */
unsigned int os_sem;               /* OS semaphore flag */
tcb_t os_tcb[MAX_TASKS];           /* OS task control blocks */

void os_switch(void)
   unsigned int tid = MAX_TASKS;
   unsigned int reason;

   do {
       reason = os_tcb[--tid].sem & os_sem;
       if (reason || os_tcb[tid].sem == 0) {
          os_curr_tid = tid;
          os_tcb[os_curr_tid].reason = reason;
          os_tcb[os_curr_tid].sem &= ~reason;
          os_sem &= ~reason;
   } while (tid > 0);

The scheduler loops through the TCB array from high to low using tid as index. For each TCB, it first checks if any semaphore the task is waiting on has been posted in os_sem. If posted, or if the task is not waiting on any semaphore, the scheduler will switch to this task. Task switching involves context save and restore, as well as clearing of the matched semaphore in the TCB’s reason field. The reason field allows the task to know which semaphore woke it up. The use of semaphore is discussed later in this blog.

Given that the tasks are checked based on high tid to low, and that the first qualified task will run, the index, tid, defines the task priority. Also, if no task can run, the os_switch() function will do nothing, and the current running task will continue. It is a good practice to define an “idle” task at the lowest priority to perform power management functions, because when the idle task gets to run, the system is essentially inactive, a good time to consider putting the system in low-power or sleep mode.

Portable Layer

The context save and restore functions, os_context_save() and os_context_restore(), are hardware-specific, because different CPU architectures have different set of registers. Another example of hardware-specific functions is critical section functions, os_enter_critical() and os_exit_critical(), where the global interrupt is enabled and disabled. Again, each CPU architecture may have different way of handling interrupts. To facilitate RTOS porting, hardware-specific functions are often grouped in separate files and folders for portability reason.


Semaphore is a key building block for inter-task synchronization. Earlier we saw that the scheduling policy involves semaphore checking. In this section we will examine that in more detail.

Inter-task synchronization involves blocking and preemption. Blocking is when a running task waiting on an event is switched out by the scheduler so that other tasks can run while the task is waiting. This is a more efficient use of the CPU resource. A task can wait (or block) on an event by calling the os_wait() function. An example implementation is shown below:

unsigned int os_wait(unsigned int sem)
    os_tcb[os_curr_tid].sem = sem;

The os_wait() function takes an unsigned int as input, where each bit is a semaphore. os_wait() saves the semaphore in the current task’s TCB, and then turns program control to the scheduler through the os_yield() call. The scheduler will then schedule in the next highest priority unblocked task.

The os_wait() function is used in a task this way:

void *mytask(void *)
    /* initialization */  
       . . .

    while (1) {
       /* Wait on SEM_1 or SEM_2 */
       reason = os_wait(SEM_1 | SEM_2);

       /* Handle different wake reason */
       if (reason & SEM_1) {
          . . .
       if (reason & SEM_2) {
          . . .

In this example, the os_wait() waits on two different semaphores, SEM_1 and SEM_2. If one or both of these semaphores are posted, then os_wait() will unblock, and the semaphore that woke it will return through the parameter, reason. Using reason, a task can take specific actions.

Semaphores can be posted by a task or by an interrupt handler by calling the function, os_post():

void os_post(unsigned int sem)
    os_sem |= sem;

The function simply sets the semaphore flags in os_sem, the system semaphore. The scheduler compares os_sem with each os_tcb[tid].sem to determine whether a task is unblocked. The critical section wrapper is required because is a read-modify-write operation, which is non-atomic. The critical section will make that statement atomic.

When to Reschedule?

Tasks are rescheduled by executing the os_switch() function.   However, os_switch() is usually not entered by calling it, but by branching to it.   This makes sense, because os_switch() never returns, but rather it exits by branching to the new task’s preemption point when the context is restored.

Rescheduling is done by calling os_yield() function. Depending on the hardware architecture, different mean to branch to os_switch() could be used, and is encapsulated within the os_yield() function. This makes os_yield() a function in the Portable Layer. Common methods to branch to the scheduler include:

  • Software interrupt/trap
  • Hardware interrupt
  • Direct branching

Software interrupt is useful in CPU architectures that support privileged modes.  These architectures often have several banks of context registers, one for each mode. Such architectures prohibit rogue user code from accessing system resources directly. Instead, system resource is accessed by triggering software interrupt (or TRAP), where a different context register bank is used.  This is how the OS protects system resources: by enforcing access policies through software interrupt handler.  Underneath each OS call, including os_yield(), is a software interrupt instruction, like TRAP, with a unique interrupt number.

Hardware interrupt method is like software interrupt, except the triggering mechanism is based asserting hardware interrupt signal. An example of hardware interrupt is the PENDSV interrupt request of the ARM Cortex-M architecture. By asserting PENDSV within os_yield(), the ARM interrupt subsystem registers a request for interrupt, such that when all pending hardware interrupts are serviced, the PENDSV handler is finally called. The scheduler, os_switch(), is branched within the PENDSV interrupt handler.

Lastly, Direct branching can be done in CPU architectures that offer little to no privileged mode protection. Direct branching is usually more tricky to implement, because the context of where os_yield() is called must be explicitly handled. For instance, if os_yield() is called from an interrupt service routine (ISR), os_switch() will be running inside the ISR context, and interrupt must be reenabled, and user mode operation restored when the new tasked is switched in.

(The above article is solely the expressed opinion of the author and does not necessarily reflect the position of his current and past employers)