SystemTap can monitor multiple system-wide synchronous and asynchronous events at the same time. It can do scriptable filtering and statistics collection. It’s a dynamic method of monitoring and tracing the operations of a running Linux kernel.
To instrument the running kernel, SystemTap uses Kprobes and return probes. With kernel debug information, it gets the addresses for functions and variables referenced in the script. With utrace, SystemTap supports probing user-space executables and shared libraries as well. SystemTap is, therefore, useful to systems administrators, kernel developers, support engineers, researchers and students.
Installation
To install SystemTap on Fedora, run the following commands as root:
yum install systemtap kernel-devel
debuginfo-install kernel
Working
To understand how SystemTap works, run a script in verbose mode (with the -v switch). The stap program is the front end to SystemTap. The -e switch instructs it to execute the script given in the following argument:
$ stap -v -e 'probe syscall.read {printf("syscall %s arguments %s \n", name, argstr); exit()}'
Pass 1: parsed user script and 65 library script(s) using 83596virt/20428res/2412shr kb, in 150usr/10sys/249real ms.
Pass 2: analyzed script: 1 probe(s), 4 function(s), 0 embed(s), 0 global(s) using 216260virt/115660res/73964shr kb, in 560usr/20sys/946real ms.
Pass 3: translated to C into "/tmp/stapUGVeZi/stap_b40c8268c87acc683f75ded62a52ee66_2113.c" using 216260virt/117180res/75484shr kb, in 320usr/40sys/1014real ms.
Pass 4: compiled C into "stap_b40c8268c87acc683f75ded62a52ee66_2113.ko" in 3010usr/1210sys/12818real ms.
Pass 5: starting run.
syscall read arguments 4, 0x00007fffa773b4c0, 8196
Pass 5: run completed in 20usr/60sys/174real ms.
Let’s look at each of the passes mentioned:
- Passes 1 and 2: The script we want to run is parsed, and the code is checked for semantic and syntactic errors. Any tapset reference is imported. Debug data (provided via debuginfo packages) is read to find the addresses for functions and variables referenced in the script.
- Pass 3: The script is translated into C code.
- Pass 4: The translated C code is compiled to create a kernel module.
- Pass 5: The compiled module is inserted into the running kernel.
Probes are inserted at the proper locations as soon as the module is loaded. From then on, whenever a probe is hit, the handler for that probe is called.
The basic syntax for a probe on an event, together with the handler to run when that event occurs, is:
probe <event> { handler }
where,
- event is one of kernel.function, process.statement, timer.ms, begin, end, or (tapset) aliases. For more information, look at the man page for stapprobes.
- handler can have:
  - filtering/conditionals (if … next)
  - control structures (foreach, while)
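As an illustration of how these pieces fit together, here is a minimal sketch that combines a tapset alias, a timer probe and an end probe with a conditional and a foreach loop. The global name reads, the five-second interval and the limit of ten entries are arbitrary choices for this example, not anything prescribed by SystemTap.

```systemtap
# Count read() calls per process name for five seconds, then report.
global reads    # associative array: process name -> number of calls

probe syscall.read {
    # Filtering: skip SystemTap's own I/O helper process.
    if (execname() == "stapio") next
    reads[execname()]++
}

# After 5000 ms, end the session; this triggers the end probe.
probe timer.ms(5000) {
    exit()
}

probe end {
    # Control structure: walk the array sorted by count, largest first.
    foreach (proc in reads- limit 10)
        printf("%-20s %d\n", proc, reads[proc])
}
```

The trailing minus sign in reads- asks foreach to iterate in descending order of the stored values, so the busiest processes are printed first.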
Note:
- You don’t need to declare the type of a variable; it is inferred from the context.
- Predefined functions such as pid, execname and log are available.
- The language reference is installed with the package at /usr/share/doc/systemtap-<version>/langref.pdf.
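To make the points about type inference and predefined functions concrete, here is a small sketch. The variable names greeting and answer are invented for illustration; pid, execname and log are the functions mentioned above.

```systemtap
probe begin {
    greeting = "hello"   # assigned a string literal, so inferred as a string
    answer = 42          # assigned an integer literal, so inferred as a long
    printf("%s from %s (pid %d), answer=%d\n",
           greeting, execname(), pid(), answer)
    log("done")          # log() prints its string argument plus a newline
    exit()
}
```

Using a variable inconsistently, for example assigning a string to answer later in the same handler, should be rejected during the semantic analysis in pass 2.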
How to run stap
The stap program can be invoked in several ways:
stap -e '<script>' [-c <target program>]
stap script.stp [-c <target program>]
stap -l '<event*>'
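To show what the -c form is useful for, here is a sketch of a script file meant to be run against a target program. It assumes the target() function from the standard tapsets, which returns the process ID of the program started with -c. The file name count_syscalls.stp and the variable calls are made up for this example.

```systemtap
# count_syscalls.stp (hypothetical file name)
global calls

probe syscall.* {
    # Only count system calls made by the program started with -c.
    if (pid() == target())
        calls++
}

probe end {
    printf("target made %d system calls\n", calls)
}
```

It could be run as stap count_syscalls.stp -c 'ls /tmp'. The session ends when the target program exits, at which point the end probe prints the total. Child processes of the target are not counted in this sketch.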
Tapset libraries
In the example shown earlier, after probing on the read system call, we printed the name of the system call, and the arguments passed, via name and argstr. This was possible because in one of the tapset libraries, /usr/share/systemtap/tapset/syscalls2.stp, the following is defined:
probe syscall.read = kernel.function("SyS_read").call !,
                     kernel.function("sys_read").call
{
    name = "read"
    fd = $fd
    buf_uaddr = $buf
    count = $count
    argstr = sprintf("%d, %p, %d", $fd, $buf, $count)
}
Tapsets provide abstractions over common probe points and define functions that you can use in your scripts. Because they contain probe aliases rather than probes, tapset files are not runnable by themselves.
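The same alias mechanism is available in your own scripts. The following sketch defines a hypothetical alias, my_read, on top of syscall.read and adds one extra variable, summary. Both names are invented for illustration.

```systemtap
# Define an alias: its body runs before the handler of any probe that uses it.
probe my_read = syscall.read {
    summary = sprintf("%s(%s) called by %s", name, argstr, execname())
}

# Use the alias like any other probe point.
probe my_read {
    println(summary)
    exit()
}
```

The first definition on its own instruments nothing. The read system call is only probed once the second block refers to the alias, which is the sense in which aliases are not runnable themselves.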