DebuggingKernelSuspend

Differences between revisions 10 and 56 (spanning 46 versions)
Revision 10 as of 2007-03-11 04:20:01
Size: 4200
Editor: cpe-24-59-117-5
Comment: explained that debugging suspend this way needs 2.6.20 kernel (Feisty or newer)
Revision 56 as of 2019-01-28 21:01:56
Size: 8338
Editor: penalvch
Comment: As per LP#1812561 add usb debug note for suspend debug.

Debugging Central

This page is part of the debugging series — pages with debugging details for a variety of Ubuntu packages.

Introduction

This page describes how to debug problems with suspending to RAM, and with resuming from suspend to RAM.

For issues regarding suspend to disk (i.e. hibernate), please see DebuggingKernelHibernate.

Suspend and resume use facilities within your BIOS called ACPI, or Advanced Configuration and Power Interface. Linux provides an ACPI subsystem that manages the suspend and resume process. Problems usually occur when resuming, and normally the culprit is a device driver that does not recover from a powered-down state. If your computer suspends successfully, then any resume problem is quite likely caused by a device driver rather than by the ACPI subsystem.

The debugging procedure described below assumes you have already installed the latest BIOS from your vendor, and requires a Linux kernel that provides /sys/power/pm_trace. You can check that pm_trace is available by listing the directory /sys/power/ (from a terminal, type ls /sys/power/) and confirming that a file called pm_trace exists.
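The same check can be scripted, for example as the first step of a debugging session. A minimal sketch for a POSIX shell; pm_trace_supported is a hypothetical name, and the optional directory argument exists only so the function can be tried out away from a real /sys/power:

```shell
#!/bin/sh
# pm_trace_supported (hypothetical helper): succeed (exit status 0) if the
# kernel exposes the pm_trace control file. The directory defaults to
# /sys/power; overriding it is only useful for testing.
pm_trace_supported() {
    power_dir="${1:-/sys/power}"
    [ -e "$power_dir/pm_trace" ]
}
```

For example: pm_trace_supported && echo "pm_trace is available"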

Familiarize yourself with pm_tracing a resume to find buggy drivers

Resume problems are difficult to debug. The approach used here records progress notes during resume and recovers them after a manual reboot. However, no non-volatile storage is available while resume is bringing up your computer. The only hardware on a PC motherboard that retains information across power cycles is the real time clock (RTC), so that is what is used. For the details, please read Documentation/power/s2ram.txt in your kernel sources, or see https://www.kernel.org/doc/Documentation/power/s2ram.txt. The implementation of suspend/resume debug trace is in drivers/base/power/trace.c.

Caveat Emptor: Using the following debug suggestions will radically change the value stored in your RTC chip, so much so that your file system will think it has been eons since the last fsck. You can avoid a long fsck delay by using 'tune2fs'. For example, 'tune2fs -i 0 /dev/sda1' disables the time-based fsck check on boot for /dev/sda1. Before doing so, run 'tune2fs -l <partition>' to record your current settings; look at the "Check interval" setting.

On a fresh install, the default check interval is 15552000 seconds, which is 6 months (180 days).
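Recording the current value before changing it makes it easy to put back afterwards. A minimal sketch that pulls the number out of 'tune2fs -l' output; check_interval_seconds and seconds_to_days are hypothetical names, and the parsing assumes the "Check interval:" line starts with the value in seconds:

```shell
#!/bin/sh
# check_interval_seconds (hypothetical helper): read `tune2fs -l` output on
# stdin and print the numeric value of the "Check interval" field.
check_interval_seconds() {
    awk -F: '/^Check interval:/ { split($2, f, " "); print f[1] }'
}

# seconds_to_days (hypothetical helper): convert seconds to whole days.
seconds_to_days() {
    echo $(( $1 / 86400 ))
}
```

For example, sudo tune2fs -l /dev/sda1 | check_interval_seconds prints the current interval. After debugging, the default of 15552000 seconds can be restored with sudo tune2fs -i 180d /dev/sda1, since seconds_to_days 15552000 gives 180.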

When you are all done, remember to set your RTC back to the correct time. Note that NTP by itself is not sufficient to recover the correct time of day, since by default NTP is configured to do nothing if the time is off by more than 1000 seconds. You must first use the 'ntpdate' or 'date' command to get the clock close.
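Whether NTP alone can take over depends on how far off the clock is. A minimal sketch that compares two Unix timestamps against the 1000-second figure quoted above; offset_exceeds_limit is a hypothetical name, and the limit is a parameter in case your NTP configuration differs:

```shell
#!/bin/sh
# offset_exceeds_limit (hypothetical helper): succeed if two Unix timestamps
# (in seconds) differ by more than a limit. The default limit of 1000 seconds
# is the NTP threshold quoted in the text; adjust it if your setup differs.
offset_exceeds_limit() {
    system_epoch=$1
    reference_epoch=$2
    limit=${3:-1000}
    delta=$(( system_epoch - reference_epoch ))
    # Take the absolute value: the clock may be ahead of or behind the reference.
    if [ "$delta" -lt 0 ]; then
        delta=$(( -delta ))
    fi
    [ "$delta" -gt "$limit" ]
}
```

For example, offset_exceeds_limit "$(date +%s)" "$(date -d '2019-01-28 21:00' +%s)", where the second date is a placeholder for the correct time read from a trustworthy source. If it succeeds, set the clock with sudo date -s followed by the correct time, then copy the system time to the RTC with sudo hwclock --systohc.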

To suspend with pm_trace enabled, enter the following commands:

sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

At this point your computer should enter the suspend state within a few seconds. Usually the power LED will slowly flash while the computer is suspended. When that has happened, initiate the resume process by pressing the power button. Watch closely whether the disk light comes on briefly; this indicates that resume has begun. If resume fails to complete, press and hold the power button until the computer turns off. Then power on your computer, making sure that it boots the same kernel that exhibited the resume problem. You have about 3 minutes to start this boot process before the information saved in the RTC gets corrupted, since the RTC keeps running in the meantime.

Start a console and enter:

dmesg > dmesg.txt

Open this file in an editor and look for lines similar to these:

[   11.323206]   Magic number: 0:798:264
[   11.323257]   hash matches drivers/base/power/resume.c:46

There may well be another 'hash matches' line after that. If so, you are in luck, because the last one most likely identifies the culprit. For example:

hash matches device i2c-9191

The only way to prove this is to unload the corresponding driver module before initiating suspend, and check whether resume then succeeds. Repeat as needed.
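Picking out the last 'hash matches' line by hand is easy to get wrong in a long log. A minimal sketch of a filter that does it; last_hash_match is a hypothetical name, and it assumes the marker text looks like the sample lines above:

```shell
#!/bin/sh
# last_hash_match (hypothetical helper): read dmesg output on stdin and print
# the text after the last "hash matches" marker, i.e. the likely culprit.
last_hash_match() {
    sed -n 's/.*hash matches //p' | tail -n 1
}
```

For example, last_hash_match < dmesg.txt. If the log contains the resume.c line shown above followed by the "hash matches device i2c-9191" line, it prints device i2c-9191.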

If you get a device number rather than a name, lspci and /sys/devices/pci* are your friends.
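Whether you start from a device name or a PCI address, sysfs can tell you which module to unload before retrying suspend. A minimal sketch, assuming the device is bound to a driver whose directory has a module link (typical for loadable modules, but not guaranteed for built-in drivers); device_module is a hypothetical name:

```shell
#!/bin/sh
# device_module (hypothetical helper): given a sysfs device directory, print
# the name of the kernel module that provides its driver. Fails if the device
# has no bound driver or the driver has no "module" link.
device_module() {
    module_path=$(readlink -f "$1/driver/module") || return 1
    [ -d "$module_path" ] || return 1
    basename "$module_path"
}
```

For example, device_module /sys/bus/pci/devices/0000:00:02.0, where 0000:00:02.0 is a hypothetical PCI address; lspci -k -s 00:02.0 shows the same device together with its "Kernel driver in use" line. Unload the reported module with sudo modprobe -r followed by the module name, then suspend again.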

Debugging information to provide in your bug report

For a developer to identify the root cause of the issue, please provide all of the following information, uncompressed and untarred.

  1. In which part of the process does the issue occur: suspending to RAM, or resuming from it?
  2. Please state specifically how you suspended. For example:
    • Executing pm-suspend at a terminal.
    • Shutting the lid of your laptop, which is set to suspend on close.
    • Clicking the word Suspend in the GUI.
    • The computer suspending automatically on inactivity.
  3. Please state specifically how you resumed. For example:
    • Pressing the power button.
    • Lifting the lid.
    • Pressing a key on the keyboard.
  4. While booted into the latest non-daily mainline kernel, execute at a terminal:

    cat /proc/acpi/wakeup > wakeup

    After executing this, perform a pm_trace and attach the results to your report.

  5. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo freezer > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  6. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo devices > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  7. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo platform > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  8. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo processors > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  9. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo core > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  10. While booted into the latest non-daily mainline kernel, execute at a terminal:

    sudo su
    echo none > /sys/power/pm_test
    exit

    After executing this, perform a pm_trace and attach the results to your report.

  11. Please provide the output of the following terminal command (/sys/kernel/debug is normally readable only by root, hence sudo):

    sudo cat /sys/kernel/debug/suspend_stats

  12. If you have a graphics-related issue after resume (corruption, blank display, etc.), please SSH into your machine and capture both /var/log/Xorg.0.log and /var/log/Xorg.0.log.old. Please attach each separately.

  13. If your issue may be USB related, for example if disabling XHC1 before suspending works around it, which you can do from a root shell with:

    # echo XHC1 > /proc/acpi/wakeup

    then boot with the following kernel parameter and attach the full dmesg output:

    usbcore.dyndbg=+p
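Steps 5 to 10 repeat the same write with different levels. With any level other than none, the kernel stops the suspend at that stage, waits a few seconds and then resumes, so each level narrows down where a failure happens. A minimal sketch of a helper that only accepts the levels listed in those steps; set_pm_test is a hypothetical name, and the optional file argument exists only so the function can be tried out without root:

```shell
#!/bin/sh
# set_pm_test (hypothetical helper): write one of the pm_test levels used in
# steps 5 to 10 to the pm_test control file, refusing anything else.
# The file defaults to /sys/power/pm_test, which requires root to write.
set_pm_test() {
    level=$1
    pm_test_file=${2:-/sys/power/pm_test}
    case "$level" in
        freezer|devices|platform|processors|core|none) ;;
        *) echo "unknown pm_test level: $level" >&2; return 1 ;;
    esac
    echo "$level" > "$pm_test_file"
}
```

For example, from a root shell (sudo -i) with the function defined, run set_pm_test freezer, perform the pm_trace, and continue with the next level. Reading /sys/power/pm_test lists the levels your kernel supports, with the current one in square brackets. Finish with set_pm_test none so that later suspends go all the way down.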

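For steps 4 and 13, note that writing a device name such as XHC1 to /proc/acpi/wakeup toggles its wakeup setting rather than forcing it off, so it is worth checking the status before and after. A minimal sketch that reads the status column for one device; wakeup_status is a hypothetical name, and the column layout (device, S-state, status, sysfs node) is assumed from typical /proc/acpi/wakeup output:

```shell
#!/bin/sh
# wakeup_status (hypothetical helper): read /proc/acpi/wakeup-formatted text
# on stdin and print the status column ("*enabled" or "*disabled") for the
# named ACPI device. Prints nothing if the device is not listed.
wakeup_status() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}
```

For example, wakeup_status XHC1 < /proc/acpi/wakeup prints *enabled or *disabled.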
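Further reading

  • https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power - Documentation on pm_trace and additional methods for debugging
  • Detailed analysis of ACPI kernel code for debugging a suspend problem (http://ubuntuforums.org/showthread.php?p=3066404)

Links

  • https://wiki.ubuntu.com/UnderstandingSuspend
  • https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues Best practices to debug suspend issues
  • DebuggingKernelHibernate
  • Troubleshooting hotkey issues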

CategoryBugSquad CategoryDebugging

DebuggingKernelSuspend (last edited 2019-01-28 21:01:56 by penalvch)