CrashdumpRecipe

Linux Kernel Crash Dump (LKCD)

LKCD is a project that aims to enable enterprise-style post-mortem crash analysis on Linux operating systems. It uses a special mode of kexec that automatically boots a secondary kernel whenever a crash (Oops/panic) occurs. This secondary kernel then saves the state and memory of the primary kernel to a location in the filesystem (/var/crash on newer releases). The resulting file can then be analyzed with crash to gather detailed information about the problem.

For convenience, the kernel crash dump utility has been packaged in Ubuntu. It can be installed with the following command:

  •  #> sudo apt-get install linux-crashdump

Newer versions of the package automatically add the entry crashkernel=384M-2G:64M,2G-:128M to the kernel command line in grub. However, this may cause problems on systems with less than 2G of memory (see Troubleshooting).
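
The ranged crashkernel= value can be read as "for this much system RAM, reserve that much". The following is only a sketch of that selection rule, assuming the start of each range is inclusive and the end exclusive, and handling only M and G suffixes; the helper names to_mb and crashkernel_for are made up for this illustration and are not part of any package.

```shell
#!/bin/sh
# Convert a size such as 384M or 2G to megabytes (only M and G are handled).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# crashkernel_for SPEC RAM_MB
# Print the reservation a ranged crashkernel= value selects for a machine
# with RAM_MB megabytes of memory. Prints nothing if no range matches.
crashkernel_for() {
    spec=$1
    ram_mb=$2
    old_ifs=$IFS
    IFS=,
    set -- $spec          # split "range:size,range:size" at the commas
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mb "${range%%-*}") || return 1
        end=${range#*-}   # empty for an open-ended range such as "2G-"
        if [ -n "$end" ]; then
            end=$(to_mb "$end") || return 1
        fi
        if [ "$ram_mb" -ge "$start" ] && { [ -z "$end" ] || [ "$ram_mb" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    return 0
}

crashkernel_for "384M-2G:64M,2G-:128M" 1024   # 1G of RAM, prints 64M
crashkernel_for "384M-2G:64M,2G-:128M" 4096   # 4G of RAM, prints 128M
```

With this default, a machine with less than 2G of memory gets only 64M for the crash kernel, which is the case the Troubleshooting section deals with.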

On boot, the same kernel that was used to boot should then be loaded automatically as the secondary kernel used for crash dumps. Whether a crash kernel is loaded can be verified by checking the value of:

  •  #> cat /sys/kernel/kexec_crash_loaded

If the returned value is 1, the crash kernel has been loaded; if it is 0, something went wrong. The crash kernel can also be loaded manually by running:

  •  #> sudo /etc/init.d/kdump start
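
The check above can be wrapped in a small script that also shows whether a crashkernel= option was passed at boot. This is a sketch only; the function name describe_kexec_state is made up for illustration.

```shell
#!/bin/sh
# describe_kexec_state VALUE
# Explain the content of /sys/kernel/kexec_crash_loaded.
describe_kexec_state() {
    case "$1" in
        1) echo "crash kernel loaded" ;;
        0) echo "crash kernel NOT loaded" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

state_file=/sys/kernel/kexec_crash_loaded
if [ -r "$state_file" ]; then
    describe_kexec_state "$(cat "$state_file")"
    # Without a memory reservation on the command line nothing can be loaded.
    grep -o 'crashkernel=[^ ]*' /proc/cmdline ||
        echo "no crashkernel= option on the kernel command line"
else
    echo "$state_file not found"
fi
```

If the state is 0 and no crashkernel= option shows up, the reservation is missing from the grub configuration and a reboot with the option is needed before the crash kernel can be loaded.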

Causing a test crash

The simplest way to test a crash is the sysrq mechanism. A crash is triggered either by pressing <sysrq>+c or by running:

  •  #> echo c | sudo tee /proc/sysrq-trigger

Warning: this might be disabled in some releases. /proc/sys/kernel/sysrq needs to be set to 1 in order to let all of the sysrq keys work.
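
Before testing, the current sysrq setting can be inspected. The sketch below assumes the usual meaning of the value (0 disables sysrq, 1 enables all functions, anything else is a bitmask that enables only some of them); the function name sysrq_status is made up for illustration.

```shell
#!/bin/sh
# sysrq_status VALUE
# Explain the content of /proc/sys/kernel/sysrq.
sysrq_status() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1: only some functions enabled" ;;
    esac
}

sysrq_file=/proc/sys/kernel/sysrq
if [ -r "$sysrq_file" ]; then
    echo "sysrq: $(sysrq_status "$(cat "$sysrq_file")")"
fi

# To enable all sysrq keys until the next reboot (requires root):
#   echo 1 | sudo tee /proc/sys/kernel/sysrq
```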

If everything works, there should be some delay (depending on the memory size) and then the system reboots into normal mode. Usually apport kicks in and asks about reporting the issue. Alternatively, the report file can be found under /var/crash; it can either be moved somewhere else or be unpacked by calling:

  •  #> apport-unpack <report file> <target directory>

Inspecting the crash dump

Using crash

To use the generated crash dump with crash, one needs the vmlinux file that contains the debugging information. It is part of the kernel ddeb package, which can be found at:

http://ddebs.ubuntu.com/pool/main/l/linux/

Warning: be aware that those packages are huge!

When installed, the debug kernel can be found under /usr/lib/debug/boot/ and crash is started by:

  •  #> crash <debug kernel> <crash dump>
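
As a sketch, the crash invocation can be assembled from the kernel release, assuming the ddeb installs the debug kernel as vmlinux-<release> under /usr/lib/debug/boot/. The function names and the dump path in the example call are placeholders, not real files.

```shell
#!/bin/sh
# debug_kernel_path [RELEASE]
# Print the assumed location of the debug kernel for RELEASE
# (defaults to the running kernel).
debug_kernel_path() {
    echo "/usr/lib/debug/boot/vmlinux-${1:-$(uname -r)}"
}

# crash_command DUMP [RELEASE]
# Print (not run) the crash command line for the given dump file.
crash_command() {
    echo "crash $(debug_kernel_path "$2") $1"
}

# Placeholder dump path; substitute the file unpacked from the report.
crash_command /path/to/crash-dump
```

Because crash is picky about matching kernel and dump, the release passed in should be the one of the kernel that crashed, which is not necessarily the one currently running.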

Unfortunately, the tool cannot open a 32-bit dump on a 64-bit system or vice versa. It also tends to be quite picky about the debug kernel matching the dump.

Using apport-retrace

To get a local retrace, you need apport-retrace and then run:

  •  #> apport-retrace --stdout --rebuild-package-info /var/crash/linux-image*.crash

Warning: again, this can take a while because it needs to download the kernel debug package.

Troubleshooting

Allocated memory for the crash kernel

When testing crash dumps, the system sometimes just seems to lock up. The usual cause is the amount of memory assigned to the crash kernel. When kexec starts the crash kernel, it needs enough memory to fit the unpacked kernel, the compressed initrd and the uncompressed initrd (at least while unpacking). If not enough memory is allocated, things usually go wrong without any hint. There are the following options to solve this:

  1. Increase the allocation by changing crashkernel= on the grub command line or in /boot/grub/grub.cfg (for grub2) or /boot/grub/menu.lst (for old grub). To avoid losing the setting when running update-grub, the change can be made in /etc/grub.d/10_linux.

  2. Reduce the size of the initrd. By default the initrd includes all the modules and firmware that could ever be needed. This allows using the same initrd on any system but increases its size a lot. To limit it to the modules really required to boot on the current hardware, change the following in /etc/initramfs-tools/initramfs.conf:

     ...
     MODULES=dep
     ...
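
To get a rough idea of how much memory the crash kernel needs, the three sizes named above can be added up. The sketch below assumes the initrd is a plain gzip archive (true for older releases, not necessarily for newer ones) and does not measure the unpacked kernel, which has to be supplied by hand. The function names are made up for illustration.

```shell
#!/bin/sh
# bytes_to_mb BYTES: convert to megabytes, rounding up.
bytes_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# crash_mem_estimate_mb KERNEL_BYTES INITRD_PACKED_BYTES INITRD_UNPACKED_BYTES
# Sum of unpacked kernel, compressed initrd and uncompressed initrd in MB.
crash_mem_estimate_mb() {
    bytes_to_mb $(( $1 + $2 + $3 ))
}

initrd=/boot/initrd.img-$(uname -r)
if [ -r "$initrd" ]; then
    packed=$(wc -c < "$initrd")
    # Assumption: plain gzip; anything else yields 0 here.
    unpacked=$(zcat "$initrd" 2>/dev/null | wc -c)
    echo "initrd compressed:   $(bytes_to_mb "$packed") MB"
    echo "initrd uncompressed: $(bytes_to_mb "$unpacked") MB (0 means not plain gzip)"
fi
```

After changing MODULES=, the initrd has to be regenerated (for example with sudo update-initramfs -u) before the sizes change; running the script before and after shows how much was saved.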

Release specific notes

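Ubuntu 12.10 "Quantal Quetzal"

  • Bug #988512: Missing /boot/vmcoreinfo-{version} file is breaking kdump (https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/988512)
    Because of some kernel code changes, the vmcoreinfo file cannot be generated. However, the required information can now be obtained from the kernel when doing the dump. The scripts that load the crash kernel and create the dump still depend on the file, though (see the no-vmcoreinfo patch in the bug report).
  • Bug #785394: Hard-coded crashkernel=... memory reservation in /etc/grub.d/10_linux is insufficient (https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/785394)
    The default allocation for systems below 2G is not enough for the current initrd size. Manually adapting the size makes it possible to use the crash kernel.
  • The current (1.4.3-1) version of makedumpfile reports that it is incompatible with the 3.2 kernel. The dumps created seem to be ok.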

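Ubuntu 12.04 "Precise Pangolin"

  • Bug #988512: Missing /boot/vmcoreinfo-{version} file is breaking kdump (https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/988512)
    Because of some kernel code changes, the vmcoreinfo file cannot be generated. However, the required information can now be obtained from the kernel when doing the dump. The scripts that load the crash kernel and create the dump still depend on the file, though (see the no-vmcoreinfo patch in the bug report).
  • Bug #785394: Hard-coded crashkernel=... memory reservation in /etc/grub.d/10_linux is insufficient (https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/785394)
    The default allocation for systems below 2G is not enough for the current initrd size. Manually adapting the size makes it possible to use the crash kernel.
  • The current (1.3.7-2) version of makedumpfile reports that it is incompatible with the 3.2 kernel. The dumps created seem to be ok.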

Ubuntu 10.04 "Lucid Lynx"

Ubuntu 9.04 "Jaunty Jackalope"

This page describes a recipe for enabling crash dump vmcore analysis on your Jaunty x86/x86_64 platform. Much of the information was gleaned from the kernel source tree files in Documentation/kdump.

  • 'apt-get install linux-crashdump'
    • This is a meta package that installs all of the tools necessary to acquire and analyse a crash-dump vmcore.
  • Add 'crashkernel=64M@16M' to the kernel command line in /boot/grub/menu.lst.
    • You'll also probably want to remove 'quiet splash'.
  • Reboot the system (into the ordinary kernel). The section of RAM specified above will now be reserved for the crash kernel (and not available to the normal system).
  • Make note of your root partition, e.g., /dev/sda1.
  • kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initrd.img-`uname -r` --append="root=<ROOT_PARTITION> irqpoll maxcpus=1"
    • This loads the crash-dump kernel into the reserved memory, in preparation for a panic.

    Now your kernel is ready to acquire a post-crash vmcore.
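
The kexec step of the recipe above can be sketched as a small helper that fills in the root partition and kernel release. It only prints the command so that it can be reviewed before being run as root; the function name kdump_load_command is made up for illustration.

```shell
#!/bin/sh
# kdump_load_command ROOT_PARTITION [RELEASE]
# Print the kexec command that loads the crash-dump kernel into the
# reserved memory. RELEASE defaults to the running kernel.
kdump_load_command() {
    release=${2:-$(uname -r)}
    echo "kexec -p /boot/vmlinuz-$release --initrd=/boot/initrd.img-$release --append=\"root=$1 irqpoll maxcpus=1\""
}

# Example with the root partition used in the text above.
kdump_load_command /dev/sda1
```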

Kernel/CrashdumpRecipe (last edited 2021-11-04 14:04:59 by tomreyn)