Kprobe Notes

What is Kprobe?

Kprobes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively.
There are currently two types of probes: kprobes, and kretprobes (also called return probes). A kprobe can be inserted on virtually any instruction in the kernel. A return probe fires when a specified function returns.

How Kprobe works

When a kprobe is registered, Kprobes makes a copy of the probed instruction and replaces the first byte(s) of the probed instruction with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's registers are saved, and control passes to Kprobes via the notifier_call_chain mechanism. Kprobes executes the "pre_handler" associated with the kprobe, passing the handler the addresses of the kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction. (It would be simpler to single-step the actual instruction in place, but then Kprobes would have to temporarily remove the breakpoint instruction. This would open a small time window when another CPU could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the "post_handler," if any, that is associated with the kprobe. Execution then continues with the instruction following the probepoint.
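The register / trap / single-step / resume sequence described above can be modelled in user space. The sketch below is a toy, not kernel code: `toy_probe`, `toy_register`, `toy_unregister`, and `toy_run` are hypothetical names, and each byte of the "text" array stands in for a one-byte instruction whose only effect is to add its value to an accumulator. It assumes no real instruction in the toy text uses the value 0xCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT 0xCC  /* opcode of int3 on x86 */

/* Hypothetical stand-in for struct kprobe: only what the toy model needs. */
struct toy_probe {
    uint8_t *addr;         /* location of the probed "instruction" */
    uint8_t saved_opcode;  /* copy of the byte replaced by the breakpoint */
    int pre_hits;          /* how often the pre-handler step ran */
    int post_hits;         /* how often the post-handler step ran */
};

/* Arm the probe: keep a copy of the original byte, then plant the breakpoint. */
static void toy_register(struct toy_probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved_opcode = *addr;
    p->pre_hits = 0;
    p->post_hits = 0;
    *addr = BREAKPOINT;
}

/* Disarm the probe by restoring the original byte. */
static void toy_unregister(struct toy_probe *p)
{
    *p->addr = p->saved_opcode;
}

/* "Execute" text[0..len) and return the accumulator. */
static unsigned toy_run(uint8_t *text, size_t len, struct toy_probe *p)
{
    unsigned acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        if (p && &text[pc] == p->addr && text[pc] == BREAKPOINT) {
            p->pre_hits++;           /* pre_handler runs first */
            acc += p->saved_opcode;  /* single-step the saved copy; text stays armed */
            p->post_hits++;          /* post_handler runs afterwards */
            continue;                /* resume at the following instruction */
        }
        acc += text[pc];
    }
    return acc;
}
```

Running `toy_run` on the bytes {1, 2, 3, 4} gives the same accumulator value with and without a probe planted on the third byte, which is the point of executing the saved copy: the probed code behaves as before, while the breakpoint never has to be removed from the text.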

Jump Optimization

Safety Check:

Optimization:

A practical example:

Step 1: Look up the symbol

ffffffff8b013d20 409 t pt_buffer_setup_aux
ffffffff8b014130 11f T intel_pt_interrupt
ffffffff8b014250 2d T cpu_emergency_stop_pt
ffffffff8b014280 13a t rapl_pmu_event_init [intel_rapl_perf]
ffffffff8b0143c0 bb t rapl_event_update [intel_rapl_perf]
ffffffff8b014480 10 t rapl_pmu_event_read [intel_rapl_perf]
ffffffff8b014490 a3 t rapl_cpu_offline [intel_rapl_perf]
ffffffff8b014540 24 t __rapl_event_show [intel_rapl_perf]
ffffffff8b014570 f2 t rapl_pmu_event_stop [intel_rapl_perf]
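Each line of the listing appears to carry an address, a size in hex, a symbol type, a name, and optionally a module name in brackets. Since a kprobe is placed at symbol address plus offset, it can help to check that a candidate address still falls inside the function. The following is a minimal user-space sketch of that check; `sym_entry`, `parse_sym_line`, and `sym_contains` are hypothetical helpers written for this note, and the column interpretation is an assumption based on the lines shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the symbol listing (hypothetical helper type). */
struct sym_entry {
    unsigned long long addr;  /* start address of the symbol */
    unsigned long long size;  /* size in bytes, printed in hex */
    char type;                /* e.g. 't' (local text) or 'T' (global text) */
    char name[64];
};

/*
 * Parse "<addr> <size> <type> <name> [module]".
 * A trailing "[module]" column is ignored.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_sym_line(const char *line, struct sym_entry *out)
{
    int n = sscanf(line, "%llx %llx %c %63s",
                   &out->addr, &out->size, &out->type, out->name);
    return n == 4 ? 0 : -1;
}

/* True if addr lies within [start, start + size) of the symbol. */
static int sym_contains(const struct sym_entry *s, unsigned long long addr)
{
    return addr >= s->addr && addr < s->addr + s->size;
}
```

For intel_pt_interrupt, the start address ffffffff8b014130 plus the size 11f ends one byte before ffffffff8b014250, where cpu_emergency_stop_pt begins in the listing, so the two columns are at least consistent with this reading.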

Step 2: Relevant structures

The kprobe structure

struct kprobe {
	struct hlist_node hlist;

	/* list of kprobes for multi-handler support */
	struct list_head list;

	/* count the number of times this probe was temporarily disarmed */
	unsigned long nmissed;

	/* location of the probe point */
	kprobe_opcode_t *addr;

	/* Allow user to indicate symbol name of the probe point */
	const char *symbol_name;

	/* Offset into the symbol */
	unsigned int offset;

	/* Called before addr is executed. */
	kprobe_pre_handler_t pre_handler;

	/* Called after addr is executed, unless... */
	kprobe_post_handler_t post_handler;

	/*
	 * ... called if executing addr causes a fault (eg. page fault).
	 * Return 1 if it handled fault, otherwise kernel will see it.
	 */
	kprobe_fault_handler_t fault_handler;

	/*
	 * ... called if breakpoint trap occurs in probe handler.
	 * Return 1 if it handled break, otherwise kernel will see it.
	 */
	kprobe_break_handler_t break_handler;

	/* Saved opcode (which has been replaced with breakpoint) */
	kprobe_opcode_t opcode;

	/* copy of the original instruction */
	struct arch_specific_insn ainsn;

	/*
	 * Indicates various status flags.
	 * Protected by kprobe_mutex after this kprobe is registered.
	 */
	u32 flags;
};

The pre_handler signature

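typedef int (*kprobe_pre_handler_t) (struct kprobe *, struct pt_regs *);

The function probed in this example is elv_merge:

int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)

On x86_64 Linux, which follows the System V AMD64 calling convention, the first six integer or pointer arguments are passed left to right in RDI, RSI, RDX, RCX, R8, and R9. (RCX, RDX, R8, R9 is the Microsoft x64 order and does not apply here.) For elv_merge, q is therefore in RDI, req in RSI, and bio in RDX, so a pre_handler can read the bio pointer from regs->dx.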
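On x86_64 Linux the kernel follows the System V AMD64 convention, in which integer and pointer arguments are passed in RDI, RSI, RDX, RCX, R8, and R9, in that order. The sketch below shows the argument-index-to-register mapping in user space. `toy_regs` and `toy_arg` are hypothetical names for illustration only; `toy_regs` holds just the six argument registers and is not the real struct pt_regs. Newer x86 kernels ship a helper with a similar purpose, regs_get_kernel_argument(), which may be preferable where available.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for the x86_64 struct pt_regs. */
struct toy_regs {
    unsigned long di, si, dx, cx, r8, r9;
};

/*
 * Return the n-th (0-based) integer/pointer argument as seen at function
 * entry under the System V AMD64 calling convention.
 * Arguments beyond the sixth are passed on the stack; this toy returns 0
 * for them instead of reading the stack.
 */
static unsigned long toy_arg(const struct toy_regs *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;
    case 1: return regs->si;
    case 2: return regs->dx;
    case 3: return regs->cx;
    case 4: return regs->r8;
    case 5: return regs->r9;
    default: return 0;
    }
}
```

With this mapping, the third parameter of elv_merge (struct bio *bio) corresponds to index 2, that is, the value held in RDX when the probe fires at function entry.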

Step 3: Write a kernel module

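#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/blkdev.h>	/* struct bio, struct gendisk */

#define MAX_SYMBOL_LEN 64
static char symbol[MAX_SYMBOL_LEN] = "elv_merge";
module_param_string(symbol, symbol, sizeof(symbol), 0644);
static long long times = 0;

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
	.symbol_name = symbol,
};

/* Called just before the probed instruction is executed */
static int __kprobes handler_pre(struct kprobe *p, struct pt_regs *regs)
{
	struct bio *block_io = NULL;

	times++;
	if (times >= LLONG_MAX - 1) {
		times = 0;
	}
	pr_info("<ztex><%s> pre_handler: p->addr = 0x%p, ip = %lx, flags = 0x%lx, times=%lld, nmissed=%lu\n",
		p->symbol_name, p->addr, regs->ip, regs->flags, times, p->nmissed);
	/* A dump_stack() here will give a stack backtrace */
	dump_stack();
	/* Third argument of elv_merge() (struct bio *bio) is in RDX on x86_64 */
	block_io = (struct bio *)regs->dx;
	/* bi_disk is kernel-version dependent; newer kernels use bi_bdev instead */
	if (block_io && block_io->bi_disk) {
		pr_info("<ztex>disk name: %s\n", block_io->bi_disk->disk_name);
	}
	return 0;
}

/*
 * Assumed minimal stubs, modelled on samples/kprobes/kprobe_example.c;
 * replace them with real handlers as needed.
 */
static void __kprobes handler_post(struct kprobe *p, struct pt_regs *regs,
				   unsigned long flags)
{
}

static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	/* Return 0 because we don't handle the fault */
	return 0;
}

static int __init kprobe_init(void)
{
	int ret;

	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
	/* fault_handler is kernel-version dependent; drop it if struct kprobe lacks the field */
	kp.fault_handler = handler_fault;
	ret = register_kprobe(&kp);
	if (ret < 0) {
		pr_err("register_kprobe failed, returned %d\n", ret);
		return ret;
	}
	pr_info("Planted kprobe at %p\n", kp.addr);
	return 0;
}

static void __exit kprobe_exit(void)
{
	unregister_kprobe(&kp);
	pr_info("kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init)
module_exit(kprobe_exit)
MODULE_LICENSE("GPL");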
result

A huge fan of computer science. Amazon embedded SDE who likes to share some stories to everybody. Working on embedded system, Free-RTOS, RISC-V etc.

ztex, Tony, Liu
