Systemtap

Overview

Systemtap allows you to harness both static and dynamic instrumentation without recompiling your code. It can do simple things such as dynamically inserting a printk anywhere, or invasive things such as changing a critical kernel data structure (guru mode). All operations are performed as root (shell prompt of #). While Systemtap has many safeguards in place to sandbox dangerous, system-crashing actions, it's not infallible. Proceed at your own risk.

Basics

Systemtap Installation

$ sudo apt-get install -y systemtap gcc

Where to get debug symbols for kernel X?

GPG key import

  • 16.04 and higher

 sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622 

  • older distributions

 sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01 

Add repository config

codename=$(lsb_release -c | awk  '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename}      main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates  main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF

sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
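
Before going further it can help to confirm that the debug vmlinux actually landed on disk. The following is a minimal sketch, assuming the dbgsym package installs it as /usr/lib/debug/boot/vmlinux-<release> (the path used for objdump later on this page). The function name have_dbgsym and the DEBUG_ROOT override are hypothetical and exist only for illustration.

```shell
# Check whether the debug vmlinux for a kernel release is installed.
# DEBUG_ROOT is a hypothetical override (handy for testing); it defaults
# to /usr/lib/debug.
have_dbgsym() {
    release=${1:-$(uname -r)}
    root=${DEBUG_ROOT:-/usr/lib/debug}
    [ -e "$root/boot/vmlinux-$release" ]
}

if have_dbgsym; then
    echo "debug symbols found for $(uname -r)"
else
    echo "no debug symbols for $(uname -r); install linux-image-$(uname -r)-dbgsym"
fi
```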

How do I build a debuginfo kernel if one isn't available?

$ cd $HOME
$ sudo apt-get install dpkg-dev debhelper gawk
$ mkdir tmp
$ cd tmp
$ sudo apt-get build-dep --no-install-recommends linux-image-$(uname -r)
$ apt-get source linux-image-$(uname -r)
$ cd linux-2.6.31    # the kernel version shipped with 9.10 at the time of writing
$ fakeroot debian/rules clean
$ AUTOBUILD=1 fakeroot debian/rules binary-generic skipdbg=false
$ sudo dpkg -i ../linux-image-debug-2.6.31-19-generic_2.6.31-19.56_amd64.ddeb

Work around broken dbgsym file layout so kernel and module probe points work

Systemtap was developed at Red Hat and is predisposed to their layout for kernel debug symbols, where everything is installed under /usr/lib/debug/<kernel ver>. Debian and Ubuntu instead split the kernel proper and the modules into two separate directories. In addition, elfutils looks for module debug files with a .debug extension, e.g. psmouse.ko.debug. So even though it searches in the right place, the expected file name is wrong, and stap fails when probing modules.

Run the following script as root to set up your debug symbols each time you install a kernel ddeb. Eventually this will be integrated into the main package. Note that unlike previous workarounds, this doesn't touch your real modules in /lib/modules.

This issue is tracked [fix released: quantal, precise] by: https://bugs.launchpad.net/ubuntu/+source/systemtap/+bug/669641

# apt-get install -y elfutils

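for file in $(find /usr/lib/debug -name '*.ko' -print)
do
        # Build ID of the module, as recorded in its ELF notes
        buildid=$(eu-readelf -n "$file" | grep 'Build.ID:' | awk '{print $3}')
        # Skip files that carry no build ID
        [ -n "$buildid" ] || continue
        dir=$(echo "$buildid" | cut -c1-2)
        fn=$(echo "$buildid" | cut -c3-)
        mkdir -p "/usr/lib/debug/.build-id/$dir"
        # -f lets the script be re-run after each new ddeb install
        ln -sf "$file" "/usr/lib/debug/.build-id/$dir/$fn"
        ln -sf "$file" "/usr/lib/debug/.build-id/$dir/${fn}.debug"
done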

This will also make our debug symbols more friendly to gdb and company.
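
To make the layout concrete, the mapping the loop performs can be isolated in a small function: the first two hex digits of the build ID become a directory and the remainder becomes the file name. This is only a sketch. The name buildid_link is hypothetical and the build ID below is made up.

```shell
# Map a build ID to its link path under /usr/lib/debug/.build-id.
# buildid_link is a hypothetical helper name, not part of elfutils.
buildid_link() {
    id=$1
    dir=$(printf '%s' "$id" | cut -c1-2)   # first two hex digits
    fn=$(printf '%s' "$id" | cut -c3-)     # remaining digits
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$dir" "$fn"
}

# Made-up build ID, for illustration only
buildid_link 0123456789abcdef0123456789abcdef01234567
```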

List all functions that are accessible by systemtap

# stap -l 'kernel.function("acpi_*")' | sort

# stap -l 'module("ohci1394").function("*")' | sort

If that wasn't cool enough, using the -L switch instead shows each probe point along with the local variables accessible at that point:

# stap -L 'module("thinkpad_acpi").function("brightness*")' | sort
module("thinkpad_acpi").function("brightness_exit@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6308")
module("thinkpad_acpi").function("brightness_get@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6113") $bd:struct backlight_device* $status:int $res:int
module("thinkpad_acpi").function("brightness_read@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6319") $m:struct seq_file* $level:int
module("thinkpad_acpi").function("brightness_set@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6064") $value:unsigned int $res:int
module("thinkpad_acpi").function("brightness_shutdown@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6303")
module("thinkpad_acpi").function("brightness_suspend@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6298") $state:pm_message_t
module("thinkpad_acpi").function("brightness_update_status@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6097") $bd:struct backlight_device* $level:unsigned int $__func__:char[] const
module("thinkpad_acpi").function("brightness_write@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6337") $buf:char* $level:int $rc:int $cmd:char* $max_level:int
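
When the -L listing gets long, it can be handy to pull out just the variable names for a probe point. The sketch below assumes the "$name:type" layout shown in the output above and a grep that supports -o. The helper name probe_vars is hypothetical.

```shell
# Print the $variable names found on one line of `stap -L` output.
# probe_vars is a hypothetical helper; it relies on the "$name:type" layout.
probe_vars() {
    printf '%s\n' "$1" | grep -o '\$[A-Za-z_][A-Za-z0-9_]*:' | tr -d ':'
}

# Sample line taken from the listing above
probe_vars 'module("thinkpad_acpi").function("brightness_get@/build/buildd/linux-2.6.32/drivers/platform/x86/thinkpad_acpi.c:6113") $bd:struct backlight_device* $status:int $res:int'
```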

Determine local variables available at probe point

Dump the local variables using '$$locals', which expands to a single string of name=value pairs (an associative array flattened to a string) that is easy to print.

  i8042_controller_selftest locals [param=0xc0 i=?]

and the stap code to generate this:

  printf ("%s locals [%s]\n", probefunc(), $$locals)
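
For a complete script built around that printf, one option is to write it to a file from the shell. This is a sketch, with the file name locals.stp chosen arbitrarily and the probed function taken from the sample output above. The quoted heredoc delimiter matters: without it the shell would replace $$ with its own process ID before stap ever sees the script.

```shell
# Write a small probe script to a file. The quoted 'EOF' delimiter stops
# the shell from expanding $$ (its own PID) inside the script body.
cat > locals.stp << 'EOF'
probe kernel.function("i8042_controller_selftest") {
    printf("%s locals [%s]\n", probefunc(), $$locals)
}
EOF

# Run it as root once the debug symbols are installed:
#   stap locals.stp
```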

Easily grab a function's argument list and return value

probe kernel.function("ps2_*").call {
       printf ("%s -> %s\n", thread_indent(1), probefunc())
       printf ("%s args [%s]\n", probefunc(), $$parms)
}

probe kernel.function("ps2_*").return {
       printf ("exit %s <- %s\n", thread_indent(-1), probefunc())
       printf ("%s args [%s]\n", probefunc(), $$return)
}


and this is what it looks like.

    0 kseriod(41): -> ps2_init
ps2_init args [ps2dev=0xf006de08 serio=0xf6d36200 ]
exit 17 kseriod(41): -> ps2_init
ps2_init args []
    0 kseriod(41): -> ps2_command
ps2_command args [ps2dev=0xf006de08 param=0xf75bbeaa command=0x2f2 ]
   42 kseriod(41): -> ps2_sendbyte
ps2_sendbyte args [ps2dev=0xf006de08 byte=0xf2 timeout=0xc8 ]
exit 200048 kseriod(41): -> ps2_sendbyte
ps2_sendbyte args [return=0xffffffffffffffff ]
exit 200070 kseriod(41): -> ps2_command
ps2_command args [return=0xffffffffffffffff ]

Basic syslog integration

This makes use of the system() function to call logger. Having it print to syslog and stdout simultaneously is an exercise left to the reader.

function syslog(msg:string)
{
    sendit = "/usr/bin/logger -t stap ".msg
    system(sendit)
}

probe scsi.iodispatching
{
    if ($cmd->cmnd[0] == 0x12) {
        syslog(sprintf("%d %s: INQUIRY submitted to h:%d c:%d d:%d l:%d\n", gettimeofday_s(),
                execname(), host_no, channel, dev_id, lun))
    }
}

Aggregate probe points for easier bookkeeping

Instead of writing a new call and return body for every function you want to trace, you can chain probe points together with the comma operator.

probe kernel.function("i8042_controller_reset").call,
  kernel.function("i8042_controller_selftest").call,
  kernel.function("i8042_command").call
{
  printf ("%s -> %s\n", thread_indent(1), probefunc())
  printf ("\t %s args [%s]\n", probefunc(), $$parms)
  printf ("\t %s locals [%s]\n", probefunc(), $$locals)
}

probe kernel.function("i8042_controller_reset").return,
  kernel.function("i8042_controller_selftest").return,
  kernel.function("i8042_command").return
{
  printf ("exit %s <- %s\n", thread_indent(-1), probefunc())
  printf ("%s args [%s]\n", probefunc(), $$return)
}
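
If the list of functions grows, the comma-separated probe point list can be generated rather than typed. The following is a rough sketch in shell. The helper name probe_list is hypothetical, and the output is meant to be pasted after the probe keyword in a script.

```shell
# Emit a comma-separated probe point list for a suffix (call or return)
# and a set of kernel function names. probe_list is a hypothetical helper.
probe_list() {
    suffix=$1
    shift
    sep=""
    for fn in "$@"; do
        printf '%skernel.function("%s").%s' "$sep" "$fn" "$suffix"
        # Later entries go on their own indented line
        sep=",
  "
    done
    printf '\n'
}

probe_list call i8042_controller_reset i8042_controller_selftest i8042_command
```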

Get the absolute address of a kernel routine

NOTE: statement/absolute addressing requires GURU mode (-g) to operate

Just grep /proc/kallsyms and then use the address like so

# grep ps2_sendbyte /proc/kallsyms 
c0489820 T ps2_sendbyte

# stap -ge 'probe kernel.statement(0xc0489820).absolute { printf("HERE! \n") }'
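
A plain grep also matches any symbol that merely contains the name, so a slightly stricter lookup can save some confusion. The sketch below matches the symbol column exactly and prints the address with a 0x prefix. The function name sym_addr and its optional file argument are hypothetical and exist only for illustration.

```shell
# Print the 0x-prefixed address of an exactly named symbol.
# The optional second argument (a file in kallsyms format) is a
# hypothetical hook for testing; it defaults to /proc/kallsyms.
sym_addr() {
    awk -v sym="$1" '$3 == sym { print "0x" $1; exit }' "${2:-/proc/kallsyms}"
}

# Example (as root, guru mode):
#   addr=$(sym_addr ps2_sendbyte)
#   stap -ge 'probe kernel.statement('"$addr"').absolute { printf("HERE! \n") }'
```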

Alternatively, you can disassemble the module/kernel and calculate the offset, like so.

probe kernel.statement(0xc0439c86).absolute {
/*
/build/buildd/linux-2.6.31/drivers/input/serio/libps2.c:224
c0439c86:       bb ff ff ff ff          mov    $0xffffffff,%ebx
serio_pause_rx():

224         if (ps2dev->cmdcnt && (command != PS2_CMD_RESET_BAT || ps2dev->cmdcnt != 1))
225                 goto out;
226 
*/

        printf("(%s) libps2.c:224  \n", probefunc() )
}

I like to intersperse the related code so I can remember what I was working on.

Systemtap actually makes this easier: you can cite the source code file and line number, and it will take care of the translation for you.
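
As a sketch of the source-location form, the script below probes the same libps2.c line as the absolute-address example, using the "*@file:line" statement syntax. The file name libps2_line.stp is arbitrary. Whether line 224 is probeable, and whether the relative path matches, depends on the debug info of the running kernel, so treat both as assumptions.

```shell
# Write a statement probe that names a source file and line instead of
# an absolute address. The quoted 'EOF' keeps the shell out of the body.
cat > libps2_line.stp << 'EOF'
# Probe by source location; "*" matches whichever function holds the line.
probe kernel.statement("*@drivers/input/serio/libps2.c:224") {
    printf("(%s) libps2.c:224\n", probefunc())
}
EOF

# Run it as root once the debug symbols are installed:
#   stap libps2_line.stp
```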

If you need absolute addressing or are going through a dump, then you need to look at the ASM. Assuming you have the debug symbols installed:

# objdump -lD /usr/lib/debug/boot/vmlinux-`uname -r` > vmlinux.S

The same can be done for any non-stripped .ko object. The capital .S extension tells your editor that you're dealing with ASM plus C preprocessor statements, so it will highlight the code correctly, especially if you intersperse the source code.

If you want the source code interleaved with the relevant ASM, add the -S option to the above invocation. This needs the actual source code in place, and objdump expects it in the same location it was originally built from on the build server.

To determine this location:

# readlink /lib/modules/`uname -r`/build/source
/build/buildd/linux-2.6.32

Therefore:

# mkdir -p /build/buildd
# pushd /build/buildd
# apt-get source linux-image-$(uname -r)

Then disassemble the kernel again with the additional -S option:

# objdump -lDS /usr/lib/debug/boot/vmlinux-`uname -r` > vmlinux.S

Kernel/Systemtap (last edited 2016-05-06 19:50:08 by localhost)