Design Review — Backtrace in RISC-V

ztex, Tony, Liu
May 11, 2022


Introduction

Today, I’m going to share my experience implementing a backtrace under FreeRTOS on RISC-V, and how a compiler option interferes with it.

Once we have captured the stack and dumped it, here is what developers already know:

  1. Register values
  2. The address of the instruction that caused the problem (this tells us exactly which function, and which operation in it, e.g. a function call or an assignment)
  3. The raw stack (this shows local variables, call frames, etc.)

However, further analysis often needs more information. More precisely: how did we get here, and which functions were called along the way?

Say we have the program below:

int A(int x)
{
    int *ptr = 0x00;
    *ptr = 0xab; // invalid write to address 0, crashes the system
    return x + 20;
}
int B(int x, int y)
{
    return A(x) + y;
}
int C(int x, int y, int z)
{
    return B(x, y) + z;
}
int main()
{
    C(0xab, 0xcd, 0xef);
}

A backtrace of this program looks like this:

#0  0x0000555555555129 in A ()
#1 0x0000555555555158 in B ()
#2 0x000055555555517f in C ()
#3 0x000055555555519e in main ()

As shown above, execution went from main() into C(), then B(), then A().

Stack and Call Frame

Under the standard RISC-V calling convention, the stack grows downwards (from high addresses to low addresses).

Below is the disassembly of main from the example above:

0000000000000674 <main>:
main():
674: 1141 addi sp,sp,-16
676: e406 sd ra,8(sp)
678: e022 sd s0,0(sp)
67a: 0800 addi s0,sp,16
67c: 0ef00613 li a2,239
680: 0cd00593 li a1,205
684: 0ab00513 li a0,171
688: fb3ff0ef jal ra,63a <C>
68c: 4781 li a5,0
68e: 853e mv a0,a5
690: 60a2 ld ra,8(sp)
692: 6402 ld s0,0(sp)
694: 0141 addi sp,sp,16
696: 8082 ret

At the very beginning of a function (the prologue), sp is decremented to allocate space dedicated to that function:

addi	sp,sp,-16

This space is known as the call frame. Its layout follows rules specified in the calling convention (https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf).

ra (the return address) is saved at the top of the call frame:

sd	ra,8(sp)

followed by fp (s0, the caller’s frame pointer):

sd	s0,0(sp)
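To make the layout concrete, here is a small host-side model (not target code) that replays main()’s prologue on an array standing in for the stack. The stack array, STACK_BASE and the helper names are hypothetical. The point is that once s0 holds the frame base, ra sits at s0 - 8 and the caller’s fp at s0 - 16.

```c
#include <stdint.h>

/* Hypothetical model: the stack is a plain array of 64-bit words and
 * STACK_BASE is the (made-up) address of stack[0]. */
#define STACK_WORDS 16
#define STACK_BASE  0x1000u

static uint64_t stack[STACK_WORDS];

/* Translate a byte address inside the model into a pointer to its word. */
static uint64_t *slot(uint64_t addr)
{
    return &stack[(addr - STACK_BASE) / sizeof(uint64_t)];
}

/* Replay main()'s prologue from the disassembly:
 *   addi sp,sp,-16 ; sd ra,8(sp) ; sd s0,0(sp) ; addi s0,sp,16
 * Updates *sp and returns the new frame pointer (s0). */
static uint64_t replay_prologue(uint64_t *sp, uint64_t ra, uint64_t caller_s0)
{
    *sp -= 16;                  /* allocate the 16-byte call frame        */
    *slot(*sp + 8) = ra;        /* ra: top word of the frame              */
    *slot(*sp + 0) = caller_s0; /* caller's fp: the word right below ra   */
    return *sp + 16;            /* s0 = frame base (value of sp on entry) */
}

/* With s0 pointing at the frame base, the saved words sit just below it. */
static uint64_t saved_ra(uint64_t fp) { return *slot(fp - 8); }
static uint64_t saved_fp(uint64_t fp) { return *slot(fp - 16); }
```

For example, starting with sp at the top of the model (0x1080), replay_prologue() leaves sp at 0x1070 and returns s0 = 0x1080, and saved_ra(s0) gives back the ra that was pushed.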

Backtrace & Call Frame

What a backtrace does is break the stack down into a series of call frames.

Furthermore, the saved return address (ra) in each frame points back into the caller, so it can be used to determine which function made the call.
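The ra-to-function step boils down to a search in a sorted symbol table. Here is a minimal sketch that assumes such a table is available at run time. The table itself is hypothetical: it is seeded with the C and main addresses from the disassembly above, and 0x698 marks the end of main, right after its 2-byte ret at 0x696.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table, sorted by start address. On a real target it
 * could be generated from the ELF symbol table at build time. */
typedef struct {
    uint64_t start;   /* first address of the function */
    const char *name;
} sym_t;

static const sym_t syms[] = {
    { 0x63a, "C" },     /* from "jal ra,63a <C>" in main() */
    { 0x674, "main" },  /* from "0000000000000674 <main>"  */
    { 0x698, "end" },   /* end marker: one past main's last instruction */
};

/* Return the name of the function containing addr (the entry with the
 * greatest start <= addr), or NULL if addr is outside the table. */
static const char *lookup_function(uint64_t addr)
{
    size_t n = sizeof syms / sizeof syms[0];
    size_t lo = 0, hi = n - 1;   /* syms[n - 1] is only an end marker */

    if (addr < syms[0].start || addr >= syms[n - 1].start)
        return NULL;
    /* Invariant: syms[lo].start <= addr < syms[hi].start */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= addr)
            lo = mid;
        else
            hi = mid;
    }
    return syms[lo].name;
}
```

With this table, lookup_function(0x68c), the ra saved by main’s call to C, returns "main". A common refinement is to look up ra - 1 instead of ra: ra points just past the call instruction, so when the call is a function’s last instruction, ra lands at the start of the next function.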

How does backtrace work

We can borrow from the Linux kernel to get started (https://github.com/torvalds/linux/blob/master/arch/riscv/kernel/stacktrace.c).

The algorithm goes like this:

  1. Get fp for the latest call frame
  2. While fp has not crossed the stack limit:
  3. Use fp to load the saved fp’ and ra’, which lead us to the previous (caller’s) call frame

void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
                             bool (*fn)(void *, unsigned long), void *arg)
{
    unsigned long fp, sp, pc;
    int level = 0;

    if (regs) {
        fp = frame_pointer(regs);
        sp = user_stack_pointer(regs);
        pc = instruction_pointer(regs);
    } else if (task == NULL || task == current) {
        fp = (unsigned long)__builtin_frame_address(0);
        sp = sp_in_global;
        pc = (unsigned long)walk_stackframe;
    } else {
        /* task blocked in __switch_to */
        fp = task->thread.s[0];
        sp = task->thread.sp;
        pc = task->thread.ra;
    }

    for (;;) {
        unsigned long low, high;
        struct stackframe *frame;

        if (unlikely(!__kernel_text_address(pc) || (level++ >= 1 && !fn(arg, pc))))
            break;

        /* Validate frame pointer */
        low = sp + sizeof(struct stackframe);
        high = ALIGN(sp, THREAD_SIZE);
        if (unlikely(fp < low || fp > high || fp & 0x7))
            break;
        /* Unwind stack frame */
        frame = (struct stackframe *)fp - 1;
        sp = fp;
        if (regs && (regs->epc == pc) && (frame->fp & 0x7)) {
            fp = frame->ra;
            pc = regs->ra;
        } else {
            fp = frame->fp;
            pc = ftrace_graph_ret_addr(current, NULL, frame->ra,
                                       (unsigned long *)(fp - 8));
        }
    }
}
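For experimenting on a host, the loop above can be boiled down to a few lines. This is a minimal sketch, not the kernel’s code. The stack is modeled by a hypothetical stack_view_t over an array of words, the regs/ftrace special cases are dropped, and the upper bound is the top of the given stack instead of ALIGN(sp, THREAD_SIZE).

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the kernel's struct stackframe: saved fp, then ra.
 * A frame record sits immediately below the address fp points to. */
struct stackframe {
    uint64_t fp;
    uint64_t ra;
};

/* Hypothetical stand-in for target memory: words[i] lives at base + 8*i. */
typedef struct {
    uint64_t base;
    const uint64_t *words;
    size_t nwords;
} stack_view_t;

/* Read the 64-bit word at addr; the caller guarantees addr is in range. */
static uint64_t read_word(const stack_view_t *s, uint64_t addr)
{
    return s->words[(addr - s->base) / 8];
}

/* Walk frame records starting from (fp, sp), storing up to max saved return
 * addresses in out[]. Returns how many were stored. Like the kernel loop, it
 * stops when fp falls below sp + sizeof(struct stackframe), goes past the
 * stack top, or is not 8-byte aligned. Since sp only grows, it terminates. */
static size_t walk_frames(const stack_view_t *s, uint64_t fp, uint64_t sp,
                          uint64_t *out, size_t max)
{
    uint64_t high = s->base + 8 * s->nwords;   /* one past the top word */
    size_t n = 0;

    if (sp < s->base)
        return 0;
    while (n < max) {
        uint64_t low = sp + sizeof(struct stackframe);
        if (fp < low || fp > high || (fp & 0x7))
            break;
        /* frame = (struct stackframe *)fp - 1 */
        uint64_t rec = fp - sizeof(struct stackframe);
        uint64_t next_fp = read_word(s, rec);
        out[n++] = read_word(s, rec + 8);       /* saved ra */
        sp = fp;
        fp = next_fp;
    }
    return n;
}
```

A chain that ends with a saved fp of 0 stops naturally, because 0 is below the lower bound.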

So the key question is: how do we get the fp of the latest call frame?

(See also https://stackoverflow.com/questions/42739893/force-gdb-to-use-frame-pointer-based-unwinding)

We can use the frame pointer (which is not always reliable, as explained below), or the .eh_frame and .debug_frame sections in the ELF file.

One way to obtain the frame size of a given function is to use objdump to dump the DWARF call frame information in the file (https://sourceware.org/binutils/docs/binutils/objdump.html):

riscv64-unknown-elf-objdump -WF RdkRed.out | grep b0202282

where b0202282 is the address of an instruction in the function of interest.

00028958 000000000000003c 000287d8 FDE cie=000287d8 pc=00000000b02021e4..00000000b0202280
00000000b0202272 s0+0 u c-16 c-24 c-32 c-40
00000000b0202274 sp+64 u u c-24 c-32 c-40
00000000b0202276 sp+64 u u c-24 u c-40
00000000b0202278 sp+64 u u c-24 u u
00000000b020227c sp+64 u u u u u
00000000b020227e sp+0 u u u u u
00028998 000000000000003c 000287d8 FDE cie=000287d8 pc=00000000b0202280..00000000b0202324
00000000b0202280 sp+0 u u u u u u
00000000b0202282 sp+64 u u u u u u
00000000b020228c sp+64 u c-16 c-24 c-32 c-40 c-48
00000000b020228e s0+0 u c-16 c-24 c-32 c-40 c-48
00000000b0202290 s0+0 c-8 c-16 c-24 c-32 c-40 c-48
function(...):
{
b0202280: 7139 addi sp,sp,-64
b0202282: f822 sd s0,48(sp)
b0202284: f426 sd s1,40(sp)
b0202286: f04a sd s2,32(sp)
b0202288: ec4e sd s3,24(sp)
b020228a: e852 sd s4,16(sp)
b020228c: 0080 addi s0,sp,64
b020228e: fc06 sd ra,56(sp)
b0202290: 8a06 mv s4,ra
b0202292: 84aa mv s1,a0
b0202294: 892e mv s2,a1
b0202296: 89b2 mv s3,a2
...

As you can see, the frame is 64 bytes: the CFA is sp+64, matching addi sp,sp,-64, and once s0 is set up the CFA becomes s0+0.
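Reading a row of that table programmatically is simple. Here is a minimal sketch that assumes the rows have already been parsed. The cfi_row_t type and the helper names are hypothetical, and only the sp/s0 cases seen above are handled. In the rows above, the first column after the CFA is ra. At b0202282 the CFA is sp+64 and ra is still live in its register ("u"). From b0202290 the CFA is s0+0 and ra was saved at CFA-8 ("c-8"), which matches sd ra,56(sp) in a 64-byte frame.

```c
#include <stdint.h>

/* One row of the interpreted CFI table (objdump -WF), reduced to what a
 * backtrace needs: how to compute the CFA, and where ra was saved. */
typedef enum { CFA_SP, CFA_S0 } cfa_base_t;

typedef struct {
    cfa_base_t base;     /* register the CFA is relative to       */
    int64_t    offset;   /* "sp+64" -> 64, "s0+0" -> 0             */
    int        ra_saved; /* 1 if the ra column reads "c-N"         */
    int64_t    ra_off;   /* -N from "c-N"                          */
} cfi_row_t;

/* Compute the CFA (the frame base, i.e. the value sp had on entry). */
static uint64_t cfa_of(const cfi_row_t *row, uint64_t sp, uint64_t s0)
{
    return (row->base == CFA_SP ? sp : s0) + (uint64_t)row->offset;
}

/* Address of the saved ra, or 0 if ra is still live in the register.
 * Adding the negative offset as uint64_t wraps, giving cfa - N. */
static uint64_t ra_slot(const cfi_row_t *row, uint64_t cfa)
{
    return row->ra_saved ? cfa + (uint64_t)row->ra_off : 0;
}
```

For the row at b0202290, the parsed values would be { CFA_S0, 0, 1, -8 }. With s0 = 0x2000, the CFA is 0x2000 and ra is read from 0x1ff8.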

Problem: Getting the frame base address

GCC optimization option: -fomit-frame-pointer

Refer to: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fomit-frame-pointer

Here is what the documentation says:

-fomit-frame-pointer

Omit the frame pointer in functions that don’t need one. This avoids the instructions to save, set up and restore the frame pointer; on many targets it also makes an extra register available.

On some targets this flag has no effect because the standard calling sequence always uses a frame pointer, so it cannot be omitted.

Note that -fno-omit-frame-pointer doesn’t guarantee the frame pointer is used in all functions. Several targets always omit the frame pointer in leaf functions.

Enabled by default at -O1 and higher.

This option is a real pain: with it, the fp register (s0) may not point to the frame base address at all, because the compiler is free to use s0 as an ordinary callee-saved register.

(Figure: disassembly of main.c built with riscv64-linux-gnu-gcc-10 -fomit-frame-pointer main.c)

(Figure: disassembly of main.c built with riscv64-linux-gnu-gcc-10 main.c)

Getting the Return or Frame Address of a Function

Refer to: https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html

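#include <stdio.h>

/* Prints the return address of this function's own frame */
int func_return_address(int argc, char **argv)
{
    printf("%p\n", __builtin_return_address(0));
}
/* Prints the frame address of this function's own frame */
int func_frame_address(int argc, char **argv)
{
    printf("%p\n", __builtin_frame_address(0));
}
int main(int argc, char **argv)
{
    return 1;
}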

Disassembly with __builtin_return_address(0):

0000000000000624 <func_return_address>:
func_return_address():
624: 1101 addi sp,sp,-32
626: ec06 sd ra,24(sp)
628: e822 sd s0,16(sp)
62a: 1000 addi s0,sp,32
62c: 8706 mv a4,ra
62e: 87aa mv a5,a0
630: feb43023 sd a1,-32(s0)
634: fef42623 sw a5,-20(s0)
638: 87ba mv a5,a4
63a: 85be mv a1,a5
63c: 00000517 auipc a0,0x0
640: 0bc50513 addi a0,a0,188 # 6f8 <__libc_csu_fini+0x4>
644: f0dff0ef jal ra,550 <printf@plt>
648: 0001 nop
64a: 853e mv a0,a5
64c: 60e2 ld ra,24(sp)
64e: 6442 ld s0,16(sp)
650: 6105 addi sp,sp,32
652: 8082 ret

Disassembly with __builtin_frame_address(0):

0000000000000654 <func_frame_address>:
func_frame_address():
654: 1101 addi sp,sp,-32
656: ec06 sd ra,24(sp)
658: e822 sd s0,16(sp)
65a: 1000 addi s0,sp,32
65c: 87aa mv a5,a0
65e: feb43023 sd a1,-32(s0)
662: fef42623 sw a5,-20(s0)
666: 87a2 mv a5,s0
668: 85be mv a1,a5
66a: 00000517 auipc a0,0x0
66e: 08e50513 addi a0,a0,142 # 6f8 <__libc_csu_fini+0x4>
672: edfff0ef jal ra,550 <printf@plt>
676: 0001 nop
678: 853e mv a0,a5
67a: 60e2 ld ra,24(sp)
67c: 6442 ld s0,16(sp)
67e: 6105 addi sp,sp,32
680: 8082 ret

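From the assembly above, we can see that the compiler turns these __builtin_ functions into plain register reads: __builtin_return_address(0) becomes mv a4,ra (at 62c), and __builtin_frame_address(0) becomes mv a5,s0 (at 666).

In other words, they only report the current function’s own ra and s0. Called from an exception handler, they describe the handler’s frame rather than the frame of the code that faulted, and under -fomit-frame-pointer s0 may not be a frame pointer at all. This means that we cannot use these __builtin_ functions to get the latest call frame’s base address.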

Retrieving Process Call Frames From Exception Context

With -fomit-frame-pointer enabled and no debug sections in the ELF file, how do we conduct a backtrace?

One naive approach: starting from sp and moving toward higher addresses, look for a stack slot whose value equals ra.

If that slot’s address is addr, the frame base address is fp = addr + 1 * word_size, since ra is saved in the topmost word of the frame.

Once we have fp, the backtrace proceeds basically the same way the Linux kernel does it.

(Figure: stack before the exception. Credit: my colleague, CJ)
(Figure: stack after the exception. Credit: my colleague, CJ)

uint64_t *find_fp(uint64_t *sp, uint64_t *ra)
{
    /* Scan up to 64 words above sp for a slot holding ra. The word just
     * above that slot is the base (fp) of the latest call frame. */
    uint64_t *ptr = sp;
    for (int i = 0; i < 64; i++)
    {
        if (*ptr == (uint64_t)ra)
        {
            return ptr + 1;
        }
        ptr++;
    }
    return sp; /* not found: fall back to sp (the walk below is then unreliable) */
}

typedef struct frame
{
    uint64_t fp; /* caller's fp, saved at fp - 16 */
    uint64_t ra; /* return address, saved at fp - 8 */
} frame_t;

void trap_core_dump(void *base)
{
    uintptr_t mcause;

    uint64_t *stack_base;
    uint64_t *fp;
    uint64_t *sp;
    uint64_t *ra;
    uint64_t *x8;
    frame_t *frame;

    tcb_t *ptr = (tcb_t *)base;

    /* mcause must be read before it is tested (assumes an M-mode handler) */
    __asm__ volatile ("csrr %0, mcause" : "=r"(mcause));

    /* Synchronous exceptions 0..7; interrupts set the MSB of mcause */
    if (mcause < 8)
    {
        sp = (uint64_t *)(ptr->pxTopOfStack);
        stack_base = (uint64_t *)(ptr->pxStack);
        ra = (uint64_t *)sp[1]; // Retrieve ra (x1) from the saved context
        x8 = (uint64_t *)sp[5]; // Retrieve fp (x8/s0) from the saved context
        sp += CONTEXT_SIZE;     // Skip the saved context to get the task's sp
        fp = find_fp(sp, ra);   // Use ra to find the fp of the latest frame
        trap_printf("[+] Find sp = %p\r\n", sp);
        trap_printf("[+] Find ra = %p\r\n", ra);
        trap_printf("[+] Find fp = %p\r\n", fp);
        trap_printf("[+] Find x8 = %p\r\n", x8);
        trap_printf("[+] Find stack_base = %p\r\n", stack_base);

        /* Backtrace for 3 frames */
        for (int i = 0; i < 3; i++)
        {
            /* Stop on a null or misaligned fp */
            if (!fp || ((uintptr_t)fp & 0x7))
                break;
            frame = (frame_t *)fp - 1;
            sp = fp;
            fp = (uint64_t *)frame->fp;
            ra = (uint64_t *)frame->ra;
            trap_printf("fp: %p \r\n", fp);
            trap_printf("sp: %p \r\n", sp);
            trap_printf("ra: %p \r\n", ra);
            trap_printf("---------------------------------------------\r\n");
        }
    }
}

Test case:

int A(int x)
{
    ...
    /* This causes a Store/AMO access fault */
    int *ptr = (int *)0x1;
    *ptr = 0xab;
    return x + 1;
}
int B(int x, int y)
{
    ...
    return x + A(y);
}
int C(int x, int y, int z)
{
    ...
    return x + B(y, z);
}
size_t CLedRgb::Dispatch(RfwOpcode_t rtOpCode, size_t rdArg0, size_t rdArg1, size_t rdArg2, size_t rdArg3)
{
    ...
    RFW_NOTIFY_INFO("[ZTEX] debug");
    C(0xaa, 0xbb, 0xcc);
    ...
}

Result:

Command Base 
base: 0xc016e0d0
top: 0xc016fcf0
[+] Find sp = 0xc016fde0
[+] Find ra = 0xb0202246
[+] Find fp = 0xc016fe50
[+] Find x8 = 0xc016fe90
[+] Find stack_base = 0xc016e0d0
[0xc016fde0] cc
[0xc016fde8] c016fe90
[0xc016fdf0] a5a5a5a5a5a5a5a5
[0xc016fdf8] a5a5a5a5a5a5a5a5
[0xc016fe00] a5a5a5a5a5a5a5a5
[0xc016fe08] a5a5a5a5a5a5a5a5
[0xc016fe10] a5a5a5a5a5a5a5a5
[0xc016fe18] a5a5a5a5a5a5a5a5
[0xc016fe20] c0028d4a
[0xc016fe28] b02022e8
[0xc016fe30] cc
[0xc016fe38] bb
[0xc016fe40] c016fe90
[0xc016fe48] b0202246
[0xc016fe50] 5
[0xc016fe58] c016fe50
[0xc016fe60] c01706b2
[0xc016fe68] cc
[0xc016fe70] bb
[0xc016fe78] aa
[0xc016fe80] c016fed0
[0xc016fe88] b02022e8
[0xc016fe90] 62675264654c43
[0xc016fe98] c016fe90
[0xc016fea0] 5
[0xc016fea8] c01706b2
[0xc016feb0] c01706b2
[0xc016feb8] c01cdd90
[0xc016fec0] 0
[0xc016fec8] c0028d4a
[0xc016fed0] 5
[0xc016fed8] c01706b2
fp: 0xc016fe90
sp: 0xc016fe50
ra: 0xb0202246 [B(int, int)]
---------------------------------------------
fp: 0xc016fed0
sp: 0xc016fe90
ra: 0xb02022e8 [C(int, int, int)]
---------------------------------------------
fp: 0
sp: 0xc016fed0
ra: 0xc0028d4a [size_t CLedRgb::Dispatch(RfwOpcode_t rtOpCode, size_t rdArg0, size_t rdArg1, size_t rdArg2, size_t rdArg3)]
---------------------------------------------
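The dump above contains every word the walk touched, so the result can be re-checked off-target. This is a host-side sketch. The dump[] table is just the dumped words copied in. find_fp_in_dump() and replay_walk() are hypothetical helpers that mirror find_fp() and the loop in trap_core_dump().

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_BASE  0xc016fde0u                        /* address of dump[0] */
#define STACK_FILL 0xa5a5a5a5a5a5a5a5ull              /* FreeRTOS 0xa5 stack fill */
#define DUMP_WORDS (sizeof dump / sizeof dump[0])

/* Words from the dump above; each row starts at the address in its comment. */
static const uint64_t dump[] = {
    0xcc,       0xc016fe90, STACK_FILL, STACK_FILL,         /* 0xc016fde0 */
    STACK_FILL, STACK_FILL, STACK_FILL, STACK_FILL,         /* 0xc016fe00 */
    0xc0028d4a, 0xb02022e8, 0xcc,       0xbb,               /* 0xc016fe20 */
    0xc016fe90, 0xb0202246, 0x5,        0xc016fe50,         /* 0xc016fe40 */
    0xc01706b2, 0xcc,       0xbb,       0xaa,               /* 0xc016fe60 */
    0xc016fed0, 0xb02022e8, 0x62675264654c43, 0xc016fe90,   /* 0xc016fe80 */
    0x5,        0xc01706b2, 0xc01706b2, 0xc01cdd90,         /* 0xc016fea0 */
    0x0,        0xc0028d4a, 0x5,        0xc01706b2,         /* 0xc016fec0 */
};

static uint64_t word_at(uint64_t addr) { return dump[(addr - DUMP_BASE) / 8]; }

/* Like find_fp(): the first slot (from sp upward) equal to ra; fp is the
 * address one word above it. Returns 0 if ra does not occur in the dump. */
static uint64_t find_fp_in_dump(uint64_t ra)
{
    for (size_t i = 0; i < DUMP_WORDS; i++)
        if (dump[i] == ra)
            return DUMP_BASE + 8 * (i + 1);
    return 0;
}

/* Follow frame_t records (saved fp at fp - 16, ra at fp - 8), storing up to
 * max return addresses; stops once fp points outside the dump. */
static size_t replay_walk(uint64_t fp, uint64_t *ras, size_t max)
{
    size_t n = 0;
    while (n < max && fp >= DUMP_BASE + 16 && fp <= DUMP_BASE + 8 * DUMP_WORDS) {
        ras[n++] = word_at(fp - 8);
        fp = word_at(fp - 16);
    }
    return n;
}
```

Starting from ra = 0xb0202246, find_fp_in_dump() returns 0xc016fe50. replay_walk() then yields 0xb0202246, 0xb02022e8 and 0xc0028d4a before hitting the saved fp of 0, which matches the three frames printed above. Also note that 0xb02022e8 appears at 0xc016fe28 as well, below the slot the walk actually uses (0xc016fe88). A naive scan for that value would stop at the earlier copy, which is one reason the scan is used only once, to locate the first frame, and the rest of the walk follows the saved fp chain.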

Conclusion

With the mechanism above, we managed to recover useful debugging information without JTAG or GDB.

One thing that really excited me: with it, we found a hardware problem where a single bit of a value in RAM could be wrong. This was a HUGE WIN!!!


Written by ztex, Tony, Liu

Incoming-Intern, CPU emulation software @Apple, Ex-SDE @Amazon. Working on embedded system, Free-RTOS, RISC-V etc.
