CrashdumpRecipe
Comment: 1) RM'ed EoL releases. 2) RM'ed Lucid entry as bug reports are all fixed. 3) RM'ed 12.04 fixed bug entry.
Introduction
The Ubuntu Kernel Crash Dump is a mechanism that enables enterprise-style post-mortem crash analysis on Linux systems. It uses a special mode of kexec that automatically boots a secondary kernel whenever a crash (Oops/panic) occurs. This secondary kernel then saves the state and memory of the primary kernel to a location in the filesystem (/var/crash on newer releases). The resulting file can then be examined with the crash utility to gather detailed information about the problem.
Installation
For convenience, the kernel crash dump utility has been packaged in Ubuntu. It can be installed with the following command:
sudo apt-get install linux-crashdump
Newer versions of the package automatically add the entry crashkernel=384M-2G:64M,2G-:128M to the kernel command line in GRUB. However, this may cause problems on systems with less than 2G of memory (see Troubleshooting).
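To make the range syntax concrete, the sketch below evaluates a crashkernel= value of the form start-[end]:size for a given amount of RAM. It is a simplified illustration, not the kernel's parser: the function names to_mib and crashkernel_for are hypothetical, only M and G suffixes are handled, and each range is assumed to include its start and exclude its end.

```shell
#!/bin/bash
# Convert a size such as 384M or 2G to MiB (only M and G suffixes are handled).
to_mib() {
    case "$1" in
        *G) echo "$(( ${1%G} * 1024 ))" ;;
        *M) echo "$(( ${1%M} ))" ;;
        *)  return 1 ;;
    esac
}

# Print the reservation that a range-style crashkernel= value selects
# for a machine with the given amount of RAM (in MiB).
crashkernel_for() {
    local spec=$1 mem=$2 entry range size start end
    local IFS=,
    for entry in $spec; do
        range=${entry%%:*}          # e.g. 384M-2G
        size=${entry#*:}            # e.g. 64M
        start=$(to_mib "${range%%-*}") || return 1
        end=${range#*-}             # empty for an open-ended range
        if [ -n "$end" ]; then
            end=$(to_mib "$end") || return 1
        fi
        if [ "$mem" -ge "$start" ] && { [ -z "$end" ] || [ "$mem" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    echo "none"
}

crashkernel_for "384M-2G:64M,2G-:128M" 1024   # prints 64M
crashkernel_for "384M-2G:64M,2G-:128M" 4096   # prints 128M
crashkernel_for "384M-2G:64M,2G-:128M" 256    # prints none
```

Under these assumptions, a machine with 1G of RAM gets only 64M reserved, which is the case the Troubleshooting section addresses.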
Verifying linux-crashdump installation
For Trusty, please see https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html.
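A quick local check can complement that guide. The sketch below reads /sys/kernel/kexec_crash_loaded, which the kernel sets to 1 once a crash kernel has been loaded. The function name crash_kernel_status and its optional path argument (handy for testing against a scratch file) are assumptions made for illustration.

```shell
#!/bin/bash
# Report whether a crash kernel is loaded.
# $1 (optional): path of the flag file, defaulting to the kernel's sysfs entry.
crash_kernel_status() {
    local flag=${1:-/sys/kernel/kexec_crash_loaded}
    local value
    if [ ! -r "$flag" ]; then
        echo "unknown"          # kernel built without kexec support, or path missing
        return 2
    fi
    value=$(cat "$flag")
    if [ "$value" = "1" ]; then
        echo "loaded"
    else
        echo "not loaded"
        return 1
    fi
}
```

Calling crash_kernel_status without arguments checks the running system. Running grep -o 'crashkernel=[^ ]*' /proc/cmdline additionally shows whether the memory reservation made it onto the kernel command line.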
Inspecting the crash dump using crash
To use the generated crash dump with crash, one needs the vmlinux file that contains the debugging information. This file is part of the kernel ddeb package, which can be found at:
http://ddebs.ubuntu.com/pool/main/l/linux/
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
Be aware that those packages are huge! (~600 MB)
When installed, the debug kernel can be found under /usr/lib/debug/boot/ and crash is started by:
crash <debug kernel> <crash dump>
Unfortunately, the tool cannot examine a 32-bit dump on a 64-bit system, or vice versa. It also tends to be quite picky about the debug kernel matching the kernel that produced the dump.
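As a sketch of how the pieces fit together, the helper below builds the debug kernel path for a given kernel release and checks that both files exist before starting crash. The vmlinux-<release> file name is assumed from the layout of the dbgsym package, the function names are hypothetical, and the dump path must be supplied by the user because its location varies between releases.

```shell
#!/bin/bash
# Print the expected location of the debug kernel for a release
# (defaults to the running kernel).
debug_kernel_path() {
    echo "/usr/lib/debug/boot/vmlinux-${1:-$(uname -r)}"
}

# Start crash on a dump after checking that both input files exist.
# $1: path to the crash dump, $2 (optional): release of the kernel that crashed.
run_crash() {
    local dump=$1 release=${2:-$(uname -r)} vmlinux
    vmlinux=$(debug_kernel_path "$release")
    if [ ! -f "$vmlinux" ]; then
        echo "debug kernel not found: $vmlinux" >&2
        return 1
    fi
    if [ ! -f "$dump" ]; then
        echo "crash dump not found: $dump" >&2
        return 1
    fi
    crash "$vmlinux" "$dump"
}
```

Passing the release explicitly matters when the system has since booted a different kernel than the one that crashed, because the debug kernel has to match the dump.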
Inspecting the crash dump using apport-retrace
To get a local retrace, you need apport-retrace and then run:
apport-retrace --stdout --rebuild-package-info /var/crash/linux-image*.crash
Again, this can take a while because it needs to download the kernel debug package.
Troubleshooting
Allocated memory for the crash kernel
When testing the crash dump, the system sometimes just seems to lock up. The main issue is usually how much memory was reserved for the crash kernel. When kexec starts the crash kernel, it needs enough memory to fit the unpacked kernel, the compressed initrd, and the uncompressed initrd (at least while unpacking). If too little memory is allocated, things usually go wrong without any hint. The following options can solve this:
Increase the allocation by changing crashkernel= on the grub command line or in /boot/grub/grub.cfg (for grub2) or /boot/grub/menu.lst (for old grub). To avoid losing the setting when update-grub runs, the change can be made in /etc/grub.d/10_linux.
Reduce the size of the initrd. By default it includes all the modules and firmware that might ever be needed. This allows the same initrd to be used on any system but increases its size considerably. To limit it to the modules actually required to boot on the current hardware, change the following in /etc/initramfs-tools/initramfs.conf and then regenerate the initrd with sudo update-initramfs -u:
...
MODULES=dep
...
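The memory requirement named at the start of this section can be turned into a rough back-of-the-envelope check. The sketch below simply adds the three sizes (in MiB) and compares the sum with the reservation. The function names and the example sizes are hypothetical, and the result is only a lower bound because the crash kernel also needs working memory of its own.

```shell
#!/bin/bash
# Sum of unpacked kernel, compressed initrd and uncompressed initrd, all in MiB.
crash_memory_needed() {
    local kernel=$1 initrd_packed=$2 initrd_unpacked=$3
    echo "$(( kernel + initrd_packed + initrd_unpacked ))"
}

# Succeed if the reservation ($1, MiB) covers the three sizes ($2 $3 $4, MiB).
reservation_fits() {
    local reserved=$1 needed
    needed=$(crash_memory_needed "$2" "$3" "$4")
    [ "$needed" -le "$reserved" ]
}

# Hypothetical sizes in MiB: 30 (kernel), 25 (compressed initrd), 90 (unpacked initrd).
crash_memory_needed 30 25 90            # prints 145
if reservation_fits 128 30 25 90; then
    echo "128M looks sufficient"
else
    echo "128M is too small"            # this branch is taken: 145 > 128
fi
```

With numbers like these, either a larger crashkernel= value or a smaller initrd (MODULES=dep) would be needed.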
Crash kernel fails to load: Hang
This can be frustrating to debug, especially if you are unable to record the console messages from the new kexec kernel. A serial console attached to the system is the best way to continue debugging. An easy troubleshooting step is to systematically eliminate the additional kernel parameters passed to the crash kernel and retry. These arguments are kept in /etc/init.d/kdump:
...
# Append kdump_needed for initramfs to know what to do, and add
# maxcpus=1 to keep things sane.
APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"
# --elf32-core-headers is needed for 32-bit systems (ok
# for 64-bit ones too).
log_action_begin_msg "Loading crashkernel"
kexec -p "$KERNEL_IMAGE" --initrd="$INITRD" --append="$APPEND"
log_action_end_msg $?
...
Leave $APPEND and kdump_needed. Start by removing reset_devices and then install the new kexec crash kernel configuration:
sudo service kdump start
Then retest; if that doesn't work, remove the next argument, rinse and repeat.
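The elimination procedure can be pictured as repeatedly dropping one word from the parameter list. The sketch below only illustrates that loop: drop_param is a hypothetical helper, and the order after reset_devices (irqpoll, then maxcpus=1) is an assumption, since the text names only the first parameter to remove.

```shell
#!/bin/bash
# Print the parameter list $1 with every occurrence of the word $2 removed.
drop_param() {
    local params=$1 drop=$2 word out=""
    for word in $params; do
        [ "$word" = "$drop" ] || out="${out:+$out }$word"
    done
    echo "$out"
}

# Extra parameters from /etc/init.d/kdump, without the leading $APPEND.
params="kdump_needed maxcpus=1 irqpoll reset_devices"

# Assumed elimination order; kdump_needed is never removed.
for candidate in reset_devices irqpoll maxcpus=1; do
    params=$(drop_param "$params" "$candidate")
    echo "next attempt: $params"
done
```

Each printed line corresponds to one round: put that list after $APPEND in the script, run sudo service kdump start, and trigger a new test crash.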
Release specific notes
Ubuntu 12.04 "Precise Pangolin"
Bug 785394: Hard-coded crashkernel=... memory reservation in /etc/grub.d/10_linux is insufficient
The default allocation for systems below 2G is not enough for the current initrd size. Manually adjusting the size makes the crash kernel usable.
The current (1.3.7-2) version of makedumpfile reports that it is incompatible with the 3.2 kernel. The dumps created seem to be OK.
Ubuntu 14.10 "Utopic Unicorn"
Bug #1359980: [Hyper-V] Unable to perform a full kernel crash on Ubuntu 14.10
With the default crashkernel=128M@64M, triggering a crash generates a kernel panic, but no vmcore/crash file is written to /var/crash and the VM hangs instead of rebooting. With the kernel parameter changed to crashkernel=384M-:256M as suggested, the VM reboots after a triggered kernel panic and generates a vmcore/crash file under /var/crash.
Kernel/CrashdumpRecipe (last edited 2021-11-04 14:04:59 by tomreyn)