RISC-V trap (exception) handler

ztex, Tony, Liu
5 min read · Apr 12, 2022

Introduction

Does this sound familiar? You are an embedded systems engineer, and you have just deployed a build onto your machine.

Then it crashes for no apparent reason, and you have no clue beyond guesswork. (Well, of course you can always use JTAG. I guess I’ll write another article about that later.)

It would be nice if we at least knew something about what happened.

Objective

During development, unexplained crashes are a pain in the ass: all we know is that something went wrong, but not what caused it, which task was running, what state the stack was in, and so on.

Of all the information we could exploit, this article focuses only on how to handle traps and how to find out which kind of trap occurred.

What this buys us is that we can tell, for example, that the crash happened because of a Load access fault.

In that case, we know the code probably does something like this:

int *ptr = 0;
int val = *ptr;

Combined with knowing the current task, we can narrow the problem down to a specific piece of code and look for signs of such behavior.

RISC-V privileged architecture spec

At any time, a RISC-V hardware thread (hart) is running at some privilege level encoded as a mode in one or more CSRs (control and status registers). (Sure, there is also a Debug mode, but that’s another story.)

The spec goes into a lot of detail about these modes.

Long story short, what we should focus on is Machine mode.

RISC-V machine-level CSR

The privileged spec has a table listing the RISC-V machine-level CSR addresses.

So let’s sum it up real quick:

  • mstatus: tells us the machine status
  • mie: lets us turn interrupts on and off
  • mtvec: holds the address of the handler that deals with all the traps
  • mepc: records the address of the instruction that took the trap (see ecall/ebreak for more detail)
  • mcause: tells us which kind of trap it is
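To make "machine status" a bit more concrete, here is a minimal sketch that pulls a few fields out of a raw mstatus value. The bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11) follow the machine-level mstatus layout in the privileged spec; the type and function names are hypothetical helpers of mine, not part of FreeRTOS.

```c
#include <stdint.h>

/* Field positions in the machine-level mstatus register. */
#define MSTATUS_MIE_BIT   3u   /* global machine interrupt enable */
#define MSTATUS_MPIE_BIT  7u   /* value MIE had before the trap was taken */
#define MSTATUS_MPP_SHIFT 11u  /* previous privilege mode, bits 12:11 */
#define MSTATUS_MPP_MASK  0x3u

/* Hypothetical helper type for illustration only. */
typedef struct {
    unsigned mie;   /* 1 if machine interrupts are globally enabled */
    unsigned mpie;  /* MIE as it was before the trap */
    unsigned mpp;   /* 0 = U-mode, 1 = S-mode, 3 = M-mode */
} mstatus_fields_t;

/* Split a raw mstatus value into the fields above. */
static mstatus_fields_t decode_mstatus(uintptr_t mstatus)
{
    mstatus_fields_t f;
    f.mie  = (unsigned)((mstatus >> MSTATUS_MIE_BIT) & 1u);
    f.mpie = (unsigned)((mstatus >> MSTATUS_MPIE_BIT) & 1u);
    f.mpp  = (unsigned)((mstatus >> MSTATUS_MPP_SHIFT) & MSTATUS_MPP_MASK);
    return f;
}
```

Inside a trap handler you would typically expect mie to read 0 and mpp to tell you which mode the trap came from.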

How does FreeRTOS register the handler

OK, say we are working on FreeRTOS. How do we register the trap handler, or at least find out its address?

One quick way to find the address is to read mtvec:

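#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

uintptr_t mtvec;
/* Read the machine trap-vector CSR into a C variable. */
__asm__ volatile ("csrr %0, mtvec" : "=r"(mtvec));
printf("[ZTEX] mtvec: %" PRIxPTR "\n", mtvec);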

Then you get the address of the trap handler. Combined with a map file (refer to: https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html and https://stackoverflow.com/questions/38961649/gcc-how-to-create-a-mapfile-of-the-object-file), we can find the handler’s symbol.
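One detail to watch when matching the value against the map file: in the privileged spec (v1.10 and later), the two lowest bits of mtvec hold a MODE field (0 for direct, 1 for vectored), and only the remaining bits form the handler base address. A minimal sketch of splitting the two, with helper names that are my own:

```c
#include <stdint.h>

/* The two lowest bits of mtvec select the trap-vector mode. */
#define MTVEC_MODE_MASK ((uintptr_t)0x3)

/* Handler base address: mtvec with the MODE bits cleared. */
static uintptr_t mtvec_base(uintptr_t mtvec)
{
    return mtvec & ~MTVEC_MODE_MASK;
}

/* 0 = direct (all traps jump to base), 1 = vectored. */
static unsigned mtvec_mode(uintptr_t mtvec)
{
    return (unsigned)(mtvec & MTVEC_MODE_MASK);
}
```

In direct mode the raw value and the base are the same, so the printed address can be looked up as-is. In vectored mode, look up the base instead.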

Anyway, in FreeRTOS the function is freertos_risc_v_trap_handler.

So we need to look into freertos_risc_v_trap_handler in portable/GCC/RISC-V/portASM.S.

freertos_risc_v_trap_handler

The implementation is long and tedious, so I’ll cut to the chase and show only the relevant part. Anyone interested can check out the full source: https://github.com/sifive/Amazon-FreeRTOS/blob/master/freertos_kernel/portable/GCC/RISC-V/portASM.S

csrr a0, mcause
csrr a1, mepc
test_if_asynchronous:
srli a2, a0, __riscv_xlen - 1 /* MSB of mcause is 1 if handling an asynchronous interrupt - shift to LSB to clear other bits. */
beq a2, x0, handle_synchronous /* Branch past interrupt handling if not asynchronous. */
store_x a1, 0( sp ) /* Asynch so save unmodified exception return address. */
handle_asynchronous:

What does test_if_asynchronous do?

The answer is in section 3.1.20, Machine Cause Register (mcause), of the privileged spec.

The code looks at the MSB of mcause to see whether the trap is an “interrupt” (what the code calls asynchronous, I suppose, although I don’t think the term is entirely accurate).
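The same check can be written in C. This is a minimal sketch with hypothetical helper names; it assumes uintptr_t is as wide as XLEN, which holds for the usual ilp32 and lp64 ABIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* The interrupt flag is the most significant bit of mcause.
 * Assumption: uintptr_t has the same width as XLEN. */
#define MCAUSE_INTERRUPT_BIT ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

/* True for interrupts (asynchronous), false for exceptions (synchronous). */
static bool mcause_is_interrupt(uintptr_t mcause)
{
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0;
}

/* The cause code: everything except the interrupt flag. */
static uintptr_t mcause_code(uintptr_t mcause)
{
    return mcause & ~MCAUSE_INTERRUPT_BIT;
}
```

For example, a machine timer interrupt has the flag set and a code of 7, while a Load access fault has the flag clear and a code of 5.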

Anyway, the asynchronous case is dealt with in the handle_asynchronous section, in which portasmHANDLE_INTERRUPT is called.

This is where FreeRTOS handles interrupts such as the timer interrupt, GPIO, etc. (BTW, this is how the scheduler works in FreeRTOS.)

Needless to say, what we really care about are the synchronous ones, which are taken care of in the handle_synchronous section:

handle_synchronous:
addi a1, a1, 4 /* Synchronous so update exception return address to the instruction after the instruction that generated the exception. */
store_x a1, 0( sp ) /* Save updated exception return address. */

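The addi a1, a1, 4 skips the instruction that caused the trap, so that execution resumes at the next instruction (see ecall/ebreak for more). Note that this assumes the trapping instruction is 4 bytes long.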
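When the compressed (C) extension is in use, a trapping instruction can also be 2 bytes long, in which case adding 4 overshoots. A minimal sketch of computing the resume address from the instruction encoding instead: in the base encoding scheme, an instruction whose two lowest bits are both 1 is 32 bits wide, otherwise it is 16 bits wide. The function names are hypothetical, and longer-than-32-bit encodings are ignored.

```c
#include <stdint.h>

/* Length in bytes of the instruction whose lowest 16 bits are low_half.
 * Only the standard 16-bit and 32-bit encodings are handled. */
static unsigned insn_length(uint16_t low_half)
{
    return ((low_half & 0x3u) == 0x3u) ? 4u : 2u;
}

/* Address of the instruction that follows the one at mepc. */
static uintptr_t resume_address(uintptr_t mepc, uint16_t low_half)
{
    return mepc + insn_length(low_half);
}
```

On the target, low_half could be read with something like *(const uint16_t *)mepc, assuming the memory at mepc is readable from the handler. For ecall (0x00000073) the result is mepc + 4; for c.ebreak (0x9002) it is mepc + 2.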

handle_synchronous

We need to make some modifications here.

handle_synchronous:
addi a1, a1, 4 /* Synchronous so update exception return address to the instruction after the instruction that generated the exception. */
store_x a1, 0( sp ) /* Save updated exception return address. */
+ load_x sp, xISRStackTop /* Switch to ISR stack before function call. */
+ fence.i
+ jal ztex_dump /* Call our dump routine to report the trap. */

Here I call ztex_dump, a function I wrote, to take care of it.

+void ztex_dump(void)
+{
+ uintptr_t mcause;
+ uintptr_t mtvec;
+ uintptr_t mstatus;
+ uintptr_t mepc;
+
+ /* Read the trap-related CSRs into C variables. */
+ __asm__ volatile ("csrr %0, mcause" : "=r"(mcause));
+ __asm__ volatile ("csrr %0, mtvec" : "=r"(mtvec));
+ __asm__ volatile ("csrr %0, mstatus" : "=r"(mstatus));
+ __asm__ volatile ("csrr %0, mepc" : "=r"(mepc));
+ /* ... do some stuff ... */
+}

We can then deliver more info for debugging: mcause tells us what kind of trap it is, and pxCurrentTCB gives us the task name, stack, etc.

Anyway, one can be creative with this part.
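As one example of what the dump could do, here is a minimal sketch that turns a synchronous exception code into a readable name. The names follow the mcause exception-code table in the privileged spec v1.10; the table and function are hypothetical helpers, not FreeRTOS APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Synchronous exception codes, indexed by mcause with the MSB clear. */
static const char *const exception_names[] = {
    "Instruction address misaligned", /* 0 */
    "Instruction access fault",       /* 1 */
    "Illegal instruction",            /* 2 */
    "Breakpoint",                     /* 3 */
    "Load address misaligned",        /* 4 */
    "Load access fault",              /* 5 */
    "Store/AMO address misaligned",   /* 6 */
    "Store/AMO access fault",         /* 7 */
    "Environment call from U-mode",   /* 8 */
    "Environment call from S-mode",   /* 9 */
    "Reserved",                       /* 10 */
    "Environment call from M-mode",   /* 11 */
    "Instruction page fault",         /* 12 */
    "Load page fault",                /* 13 */
    "Reserved",                       /* 14 */
    "Store/AMO page fault",           /* 15 */
};

/* Name for an exception code, or "Unknown" if it is outside the table. */
static const char *exception_name(uintptr_t code)
{
    size_t count = sizeof exception_names / sizeof exception_names[0];
    return (code < count) ? exception_names[code] : "Unknown";
}
```

Inside ztex_dump, the result of exception_name(mcause) could be printed next to mepc, assuming your platform’s printf is safe to call from the trap context.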

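Why we need fence.i here is unclear to me. Plenty of posts suggest it does the job, so I can only assume it makes sure that store_x a1, 0( sp ) and load_x sp, xISRStackTop have completed before any further action. (Strictly speaking, the spec describes fence.i as synchronizing instruction fetch with earlier stores, so take this assumption with a grain of salt.)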

How to verify it

Ok, with all of that, it is time to verify it.

Load access fault

+ __asm__ volatile ("addi t0, x0, 1"); /* t0 = 1, an address we are not supposed to read */
+ __asm__ volatile ("lw t1, 0(t0)"); /* Load from it to trigger the fault. */
+ for(;;) {}

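You should see mcause carrying the value 5 (Load access fault). On cores that check alignment first, the misaligned address 1 may be reported as 4 (Load address misaligned) instead.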

Store access fault

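volatile int *ptr = (volatile int *)0x1; /* Invalid address; volatile keeps the store from being optimized away. */
*ptr = 0xab;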

You should see mcause carrying the value 7 (Store/AMO access fault).

Conclusion

To wrap up: we want to handle traps and report some information about them.

One approach is like this:

  1. Use mtvec to reveal the address of the handler, unless you already know it
  2. Modify the handler to take care of the traps that are fatal faults; those are the ones where the MSB of mcause is 0
  3. Use the Exception Code to identify what situation we are dealing with
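The decision in these steps can be sketched as a small classifier. One assumption to flag loudly: as far as I know, the FreeRTOS RISC-V port uses ecall to yield, so environment calls are synchronous but not faults, and the sketch keeps them apart. Check your own portASM.S before relying on that. All names here are hypothetical.

```c
#include <stdint.h>

typedef enum {
    TRAP_INTERRUPT, /* asynchronous: MSB of mcause is 1 */
    TRAP_ECALL,     /* synchronous, but requested on purpose */
    TRAP_FAULT      /* synchronous and unexpected: worth dumping */
} trap_kind_t;

/* Assumption: uintptr_t has the same width as XLEN. */
#define TRAP_INTERRUPT_BIT ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

/* Decide what to do with a trap based on the raw mcause value. */
static trap_kind_t classify_trap(uintptr_t mcause)
{
    if (mcause & TRAP_INTERRUPT_BIT) {
        return TRAP_INTERRUPT;
    }
    switch (mcause) {
    case 8:  /* Environment call from U-mode */
    case 9:  /* Environment call from S-mode */
    case 11: /* Environment call from M-mode */
        return TRAP_ECALL;
    default:
        return TRAP_FAULT;
    }
}
```

With this split, only TRAP_FAULT would lead to the dump routine, while interrupts and environment calls keep following the port’s normal paths.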

Reference

  1. https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
  2. https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
  3. https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
  4. https://forums.sifive.com/t/flush-invalidate-l1-l2-on-the-u54-mc/4483

ztex, Tony, Liu

Incoming-Intern, CPU emulation software @Apple, Ex-SDE @Amazon. Working on embedded system, Free-RTOS, RISC-V etc.