Hooking Linux Kernel Functions, how to Hook Functions with Ftrace
Ftrace is a Linux kernel framework for tracing Linux kernel functions. But our team managed to find a new way to use ftrace when trying to enable system activity monitoring to be able to block suspicious processes. It turns out that ftrace allows you to install hooks from a loadable GPL module without rebuilding the kernel. This approach works for Linux kernel versions 3.19 and higher for the x86_64 architecture.
A new approach: Using ftrace for Linux kernel hooking
What is an ftrace? Basically, ftrace is a framework used for tracing the kernel on the function level. This framework has been in development since 2008 and has quite an impressive feature set. What data can you usually get when you trace your kernel functions with ftrace? Linux ftrace displays call graphs, tracks the frequency and length of function calls, filters particular functions by templates, and so on. Further down this article you’ll find references to official documents and sources you can use to learn more about the capabilities of ftrace.
The implementation of ftrace is based on the compiler options -pg and -mfentry. These kernel options insert the call of a special tracing function — mcount() or __fentry__() — at the beginning of every function. In user programs, profilers use this compiler capability for tracking calls of all functions. In the kernel, however, these functions are used for implementing the ftrace framework.
Calling ftrace from every function is, of course, pretty costly. This is why there’s an optimization available for popular architectures — dynamic ftrace. If ftrace isn’t in use, it nearly doesn’t affect the system because the kernel knows where the calls mcount() or __fentry__() are located and replaces the machine code with nop (a specific instruction that does nothing) at an early stage. And when Linux kernel trace is on, ftrace calls are added back to the necessary functions.
Description of necessary functions
The following structure can be used for describing each hooked function:
/** * struct ftrace_hook describes the hooked function * * @name: the name of the hooked function * * @function: the address of the wrapper function that will be called instead * of the hooked function * * @original: a pointer to the place where the address * of the hooked function should be stored, filled out during installation * of the hook * * @address: the address of the hooked function, filled out during installation * of the hook * * @ops: ftrace service information, initialized by zeros; * initialization is finished during installation of the hook */ struct ftrace_hook { const char *name; void *function; void *original; unsigned long address; struct ftrace_ops ops; }; |
There are only three fields that the user needs to fill in: name, function, and original. The rest of the fields are considered to be implementation details. You can put the description of all hooked functions together and use macros to make the code more compact:
#define HOOK(_name, _function, _original) \ { \ .name = (_name), \ .function = (_function), \ .original = (_original), \ } static struct ftrace_hook hooked_functions[] = { HOOK( "sys_clone" , fh_sys_clone, &real_sys_clone), HOOK( "sys_execve" , fh_sys_execve, &real_sys_execve), }; |
This is what the hooked function wrapper looks like:
/* * It’s a pointer to the original system call handler execve(). * It can be called from the wrapper. It’s extremely important to keep the function signature * without any changes: the order, types of arguments, returned value, * and ABI specifier (pay attention to “asmlinkage”). */ static asmlinkage long (*real_sys_execve)( const char __user *filename, const char __user * const __user *argv, const char __user * const __user *envp); /* * This function will be called instead of the hooked one. Its arguments are * the arguments of the original function. Its return value will be passed on to * the calling function. This function can execute arbitrary code before, after, * or instead of the original function. */ static asmlinkage long fh_sys_execve ( const char __user *filename, const char __user * const __user *argv, const char __user * const __user *envp) { long ret; pr_debug( "execve() called: filename=%p argv=%p envp=%p\n" , filename, argv, envp); ret = real_sys_execve(filename, argv, envp); pr_debug( "execve() returns: %ld\n" , ret); return ret; } |
Now, hooked functions have a minimum of extra code. The only thing requiring special attention is the function signatures. They must be completely identical; otherwise, the arguments will be passed on incorrectly and everything will go wrong. This isn’t as important for hooking system calls, though, since their handlers are pretty stable and, for performance reasons, the system call ABI and function call ABI use the same layout of arguments in registers. However, if you’re going to hook other functions, remember that the kernel has no stable interfaces.
Initializing ftrace
Our first step is finding and saving the hooked function address. As you probably know, when using ftrace, Linux kernel tracing can be performed by the function name. However, we still need to know the address of the original function in order to call it.
You can use kallsyms — a list of all kernel symbols — to get the address of the needed function. This list includes not only symbols exported for the modules but actually all symbols. This is what the process of getting the hooked function address can look like:
static int resolve_hook_address ( struct ftrace_hook *hook) hook->address = kallsyms_lookup_name(hook->name); if (!hook->address) { pr_debug( "unresolved symbol: %s\n" , hook->name); return -ENOENT; } *((unsigned long *) hook->original) = hook->address; return 0; } |
Next, we need to initialize the ftrace_ops structure. Here we have one necessary field, func, pointing to the callback. However, some critical flags are needed:
int fh_install_hook ( struct ftrace_hook *hook) int err; err = resolve_hook_address(hook); if (err) return err; hook->ops.func = fh_ftrace_thunk; hook->ops.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_IPMODIFY; /* ... */ } |
The fh_ftrace_thunk () feature is our callback that ftrace will call when tracing the function. We’ll talk about this callback later. The flags are needed for hooking — they command ftrace to save and restore the processor registers whose contents we’ll be able to change in the callback.
Now we’re ready to turn on the hook. First, we use ftrace_set_filter_ip() to turn on the ftrace utility for the needed function. Second, we use register_ftrace_function() to give ftrace permission to call our callback:
int fh_install_hook ( struct ftrace_hook *hook) { /* ... */ err = ftrace_set_filter_ip(&hook->ops, hook->address, 0, 0); if (err) { pr_debug( "ftrace_set_filter_ip() failed: %d\n" , err); return err; } err = register_ftrace_function(&hook->ops); if (err) { pr_debug( "register_ftrace_function() failed: %d\n" , err); /* Don’t forget to turn off ftrace in case of an error. */ ftrace_set_filter_ip(&hook->ops, hook->address, 1, 0); return err; } return 0; } |
To turn off the hook, we repeat the same actions in reverse:
void fh_remove_hook ( struct ftrace_hook *hook) { int err; err = unregister_ftrace_function(&hook->ops); if (err) pr_debug( "unregister_ftrace_function() failed: %d\n" , err); } err = ftrace_set_filter_ip(&hook->ops, hook->address, 1, 0); if (err) { pr_debug( "ftrace_set_filter_ip() failed: %d\n" , err); } } |
When the unregister_ftrace_function() call is over, it’s guaranteed that there won’t be any activations of the installed callback or our wrapper in the system. We can unload the hook module without worrying that our functions are still being executed somewhere in the system. Next, we provide a detailed description of the function hooking process.
Hooking functions with ftrace
So how can you configure kernel function hooking? The process is pretty simple: ftrace is able to alter the register state after exiting the callback. By changing the register %rip — a pointer to the next executed instruction — we can change the function executed by the processor. In other words, we can force the processor to make an unconditional jump from the current function to ours and take over control.
This is what the ftrace callback looks like:
static void notrace fh_ftrace_thunk(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ops, struct pt_regs *regs) { struct ftrace_hook *hook = container_of(ops, struct ftrace_hook, ops); regs->ip = (unsigned long ) hook->function; } |
We get the address of struct ftrace_hook for our function using a macro container_of() and the address of struct ftrace_ops embedded in struct ftrace_hook. Next, we substitute the value of the register %rip in the struct pt_regs structure with our handler’s address. For architectures other than x86_64, this register can have a different name (like PC or IP). The basic idea, however, still applies.
Note that the notrace specifier added for the callback requires special attention. This specifier can be used for marking functions that are prohibited for Linux kernel tracing with ftrace. For instance, you can mark ftrace functions that are used in the tracing process. By using this specifier, you can prevent the system from hanging if you accidentally call a function from your ftrace callback that’s currently being traced by ftrace.
The ftrace callback is usually called with a disabled preemption (just like kprobes), although there might be some exceptions. But in our case, this limitation wasn’t important since we only needed to replace eight bytes of %rip value in the pt_regs structure.
Since the wrapper function and the original are executed in the same context, both functions have the same restrictions. For instance, if you hook an interrupt handler, then sleeping in the wrapper is still out of the question.
Protection from recursive calls
There’s one catch in the code we gave you before: when the wrapper calls the original function, the original function will be traced by ftrace again, thus causing an endless recursion. We came up with a pretty neat way of breaking this cycle by using parent_ip — one of the ftrace callback arguments that contains the return address to the function that called the hooked one. Usually, this argument is used for building function call graphs. However, we can use this argument to distinguish the first traced function call from the repeated calls.
The difference is significant: during the first call, the argument parent_ip will point to some place in the kernel, while during the repeated call it will only point inside our wrapper. You should pass control only during the first function call. All other calls must let the original function be executed.
We can run the entry test by comparing the address to the boundaries of the current module with our functions. However, this approach works only if the module doesn’t contain anything other than the wrapper that calls the hooked function. Otherwise, you’ll need to be more picky.
So this is what a correct ftrace callback looks like:
static void notrace fh_ftrace_thunk (unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ops, struct pt_regs *regs) { struct ftrace_hook *hook = container_of(ops, struct ftrace_hook, ops); /* Skip the function calls from the current module. */ if (!within_module(parent_ip, THIS_MODULE)) regs->ip = (unsigned long ) hook->function; } |
This approach has three main advantages:
- Low overhead costs. You need to perform only several comparisons and subtractions without grabbing any spinlocks or iterating through lists.
- It doesn’t have to be global. Since there’s no synchronization, this approach is compatible with preemption and isn’t tied to the global process list. As a result, you can trace even interrupt handlers.
- There are no limitations for functions. This approach doesn’t have the main kretprobes drawback and can support any number of trace function activations (including recursive) out of the box. During recursive calls, the return address is still located outside of our module, so the callback test works correctly.
In the next section, we take a more detailed look at the hooking process and describe how ftrace works.
The scheme of the hooking process
So, how does ftrace work? Let’s take a look at a simple example: you’ve typed the command Is in the terminal to see the list of files in the current directory. The command-line interpreter (say, Bash) launches a new process using the common functions fork() plus execve() from the standard C library. Inside the system, these functions are implemented through system calls clone() and execve() respectively. Let’s suggest that we hook the execve() system call to gain control over launching new processes.
Figure 1 below gives an ftrace example and illustrates the process of hooking a handler function.
Figure 1. Linux kernel hooking with ftrace.
In this image, we can see how a user process (blue) executes a system call to the kernel (red) where the ftrace framework (violet) calls functions from our module (green).
Below, we give a more detailed description of each step of the process:
- The SYSCALL instruction is executed by the user process. This instruction allows switching to the kernel mode and puts the low-level system call handler entry_SYSCALL_64() in charge. This handler is responsible for all system calls of 64-bit programs on 64-bit kernels.
- A specific handler receives control. The kernel accomplishes all low-level tasks implemented on the assembler pretty fast and hands over control to the high-level do_syscall_64 () function, which is written in C. This function reaches the system call handler table sys_call_table and calls a particular handler by the system call number. In our case, it’s the function sys_execve ().
- Calling ftrace. There’s an __fentry__() function call at the beginning of every kernel function. This function is implemented by the ftrace framework. In the functions that don’t need to be traced, this call is replaced with the instruction nop. However, in the case of the sys_execve() function, there’s no such call.
- Ftrace calls our callback. Ftrace calls all registered trace callbacks, including ours. Other callbacks won’t interfere since, at each particular place, only one callback can be installed that changes the value of the %rip register.
- The callback performs the hooking. The callback looks at the value of parent_ip leading inside the do_syscall_64() function — since it’s the particular function that called the sys_execve() handler — and decides to hook the function, changing the values of the register %rip in the pt_regs structure.
- Ftrace restores the state of the registers. Following the FTRACE_SAVE_REGS flag, the framework saves the register state in the pt_regs structure before it calls the handlers. When the handling is over, the registers are restored from the same structure. Our handler changes the register %rip — a pointer to the next executed function — which leads to passing control to a new address.
- Wrapper function receives control. An unconditional jump makes it look like the activation of the sys_execve() function has been terminated. Instead of this function, control goes to our function, fh_sys_execve(). Meanwhile, the state of both processor and memory remains the same, so our function receives the arguments of the original handler and returns control to the do_syscall_64() function.
- The original function is called by our wrapper. Now, the system call is under our control. After analyzing the context and arguments of the system call, the fh_sys_execve() function can either permit or prohibit execution. If execution is prohibited, the function returns an error code. Otherwise, the function needs to repeat the call to the original handler and sys_execve() is called again through the real_sys_execve pointer that was saved during the hook setup.
- The callback gets control. Just like during the first call of sys_execve(), control goes through ftrace to our callback. But this time, the process ends differently.
- The callback does nothing. The sys_execve() function was called not by the kernel from do_syscall_64() but by our fh_sys_execve() function. Therefore, the registers remain unchanged and the sys_execve() function is executed as usual. The only problem is that ftrace sees the entry to sys_execve() twice.
- The wrapper gets back control. The system call handler sys_execve() gives control to our fh_sys_execve() function for the second time. Now, the launch of a new process is nearly finished. We can see if the execve() call finished with an error, study the new process, make some notes to the log file, and so on.
- The kernel receives control. Finally, the fh_sys_execve() function is finished and control returns to the do_syscall_64() function. The function sees the call as one that was completed normally, and the kernel proceeds as usual.
- Control goes to the user process. In the end, the kernel executes the IRET instruction (or SYSRET, but for execve() there can be only IRET), installing the registers for a new user process and switching the processor into user code execution mode. The system call is over and so is the launch of the new process.
As you can see, the process of hooking Linux kernel function calls with ftrace isn’t that complex.
Even though the main purpose of ftrace is to trace Linux kernel function calls rather than hook them, our innovative approach turned out to be both simple and effective. However, the approach we describe above works only for kernel versions 3.19 and higher and only for the x86_64 architecture.
In the final part of our series, we’ll tell you about the main ftrace pros and cons and some unexpected surprises that might be waiting for you if you decide to implement this approach. Meanwhile, you can read about another unusual solution for installing hooks — by using the GCC attribute constructor with LD_PRELOAD.
What Are the Main Pros and Cons of Ftrace?
Ftrace is a Linux utility that ’s usually used for tracing kernel functions. But as we looked for a useful solution that would allow us to enable system activity monitoring and block suspicious processes, we discovered that Linux ftrace can also be used for hooking function calls.
Pros and cons of using ftrace
Ftrace makes hooking Linux kernel functions much easier and has several crucial advantages.
- A mature API and simple code. Leveraging ready-to-use interfaces in the kernel significantly reduces code complexity. You can hook your kernel functions with ftrace by making only a couple of function calls, filling in two structure fields, and adding a bit of magic in the callback. The rest of the code is just business logic executed around the traced function.
- Ability to trace any function by name. Linux kernel tracing with ftrace is quite a simple process – writing the function name in a regular string is enough to point to the one you need. You don’t need to struggle with the linker, scan the memory, or investigate internal kernel data structures. As long as you know their names, you can trace your kernel functions with ftrace even if those functions aren’t exported for the modules.
But just like the other approaches that we’ve described in this series, ftrace has a couple of drawbacks.
Kernel configuration requirements. There are several kernel requirements needed to ensure successful ftrace Linux kernel tracing:
- The list of kallsyms symbols for searching functions by name
- The ftrace framework as a whole for performing tracing
- Ftrace options crucial for hooking functions
All these features can be disabled in the kernel configuration since they aren’t critical for the system’s functioning. Usually, however, the kernels used by popular distributions still contain all these kernel options as they don’t affect system performance significantly and may be useful for debugging. Still, you’d better keep these requirements in mind in case you need to support some particular kernels.
Overhead costs. Since ftrace doesn’t use breakpoints, it has lower overhead costs than kprobes. However, the overhead costs are higher than for splicing manually. In fact, dynamic ftrace is a variation of splicing which executes the unneeded ftrace code and other callbacks.
Functions are wrapped as a whole. Just as with usual splicing, ftrace wraps the functions as a whole. And while splicing technically can be executed in any part of the function, ftrace works only at the entry point. You can see this limitation as a disadvantage, but usually it doesn’t cause any complications.
Double ftrace calls. As we’ve explained before, using the parent_ip pointer for analysis leads to calling ftrace twice for the same hooked function. This adds some overhead costs and can disrupt the readings of other traces because they’ll see twice as many calls. This issue can be fixed by moving the original function address five bytes further (the length of the call instruction) so you can basically spring over ftrace.
Let’s take a closer look at some of these disadvantages.
Kernel configuration requirements
The kernel has to support both ftrace and kallsyms. This requires enabling two configuration options:
- CONFIG_FTRACE
- CONFIG_KALLSYMS
Next, ftrace has to support a dynamic register modification, which is the responsibility of the following option:
- CONFIG_DYNAMIC_FTRACE_WITH_REGS
To access the FTRACE_OPS_FL_IPMODIFY flag, the kernel you use has to be based on version 3.19 or higher. Older kernel versions can still modify the register %rip, but from version 3.19, this register can be modified only after setting the flag. In older versions of the kernel, the presence of this flag will lead to a compilation error. For newer versions, the absence of this flag means a non-operating hook.
Last but not least, we need to pay attention to the ftrace call location inside the function. The ftrace call must be located at the beginning of the function, before the function prologue (where the stack frame is formed and the space for local variables is allocated). The following option takes this feature into account:
- CONFIG_HAVE_FENTRY
While the x86_64 architecture does support this option, the i386 architecture doesn’t. The compiler can’t insert an ftrace call before the function prologue due to ABI limitations of the i386 architecture. As a result, by the time you perform an ftrace call the function stack has already been modified, and changing the value of the register isn’t enough for hooking the function. You’ll also need to undo the actions executed in the prologue, which differ from function to function.
This is why ftrace function hooking doesn’t support a 32-bit x86 architecture. In theory, you can still implement this approach by generating and executing an anti-prologue, for instance, but it’ll significantly boost the technical complexity.
Unexpected surprises when using ftrace
At the testing stage, we faced one particular peculiarity: hooking functions on some distributions led to the permanent hanging of the system. Of course, this problem occurred only on systems that were different from those used by our developers. We also couldn’t reproduce the problem with the initial hooking prototype on any distributions or kernel versions.
According to debugging, the system got stuck inside the hooked function. For some unknown reason, the parent_ip still pointed to the kernel instead of the function wrapper when calling the original function inside the ftrace callback. This launched an endless loop wherein ftrace called our wrapper again and again while doing nothing useful.
Fortunately, we had both working and broken code and eventually discovered what was causing the problem. When we unified the code and got rid of the pieces we didn’t need at the moment, we narrowed down the differences between the two versions of the wrapper function code.
This is the stable code:
static asmlinkage long fh_sys_execve( const char __user *filename, const char __user * const __user *argv, const char __user * const __user *envp) { long ret; pr_debug( "execve() called: filename=%p argv=%p envp=%p\n" , filename, argv, envp); ret = real_sys_execve(filename, argv, envp); pr_debug( "execve() returns: %ld\n" , ret); return ret; } |
And this is the code that caused the system to hang:
static asmlinkage long fh_sys_execve( const char __user *filename, const char __user * const __user *argv, const char __user * const __user *envp) { long ret; pr_devel( "execve() called: filename=%p argv=%p envp=%p\n" , filename, argv, envp); ret = real_sys_execve(filename, argv, envp); pr_devel( "execve() returns: %ld\n" , ret); return ret; } |
How can the logging level possibly affect system behavior? Surprisingly enough, when we took a closer look at the machine code of these two functions, it became obvious that the reason behind these problems was the compiler.
It turns out that the pr_devel() calls are expanded into no-op. This printk-macro version is used for logging at the development stage. And since these logs pose no interest at the operating stage, the system simply cuts them out of the code automatically unless you activate the DEBUG macro. After that, the compiler sees the function like this:
static asmlinkage long fh_sys_execve( const char __user *filename, const char __user * const __user *argv, const char __user * const __user *envp) { return real_sys_execve(filename, argv, envp); } |
And this is where optimizations take the stage. In our case, the so-called tail call optimization was activated. If a function calls another and returns its value immediately, this optimization lets the compiler replace a function call instruction with a cheaper direct jump to the function’s body. This is what this call looks like in machine code:
0000000000000000 <fh_sys_execve>: 0: e8 00 00 00 00 callq 5 <fh_sys_execve+0x5> 5: ff 15 00 00 00 00 callq *0x0(%rip) b: f3 c3 repz retq </fh_sys_execve> |
And this is an example of the broken call:
0000000000000000 <fh_sys_execve>: 0: e8 00 00 00 00 callq 5 <fh_sys_execve+0x5> 5: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax c: ff e0 jmpq *%rax </fh_sys_execve> |
The first CALL instruction is the exact same __fentry__() call that the compiler inserts at the beginning of all functions. But after that, the broken and the stable code act differently. In the stable code, we can see the real_sys_execve call (via a pointer stored in memory) performed by the CALL instruction, which is followed by fh_sys_execve() with the help of the RET instruction. In the broken code, however, there’s a direct jump to the real_sys_execve() function performed by JMP.
The tail call optimization allows you to save some time by not allocating a useless stack frame that includes the return address that the CALL instruction stores in the stack. But since we’re using parent_ip to decide whether we need to hook, the accuracy of the return address is crucial for us. After optimization, the fh_sys_execve() function doesn’t save the new address on the stack anymore, so there’s only the old one leading to the kernel. And this is why the parent_ip keeps pointing inside the kernel and that endless loop appears in the first place.
This is also the main reason why the problem appeared only on some distributions. Different distributions use different sets of compilation flags for compiling the modules. And in all the problem distributions, the tail call optimization was active by default.
We managed to solve this problem by turning off tail call optimization for the entire file with the wrapper functions:
- #pragma GCC optimize(«-fno-optimize-sibling-calls»)
For further hooking experiments, you can use the full kernel module code from GitHub.
Conclusion
While developers typically use ftrace to trace Linux kernel function calls, this utility showed itself to be rather useful for hooking Linux kernel functions as well. And even though this approach has some disadvantages, it gives you one crucial benefit: overall simplicity of both the code and the hooking process.