DebuggingKernelSuspend
Comment: Due to LP bug 1066060: Added "As well, please attach your dmesg.txt to your bug report." One is unable to confirm the reporter is correct when one cannot see the full dmesg.
This page is part of the debugging series — pages with debugging details for a variety of Ubuntu packages. |
Introduction
This page describes how to debug Suspend to RAM/Resume problems on your computer. Do not confuse this with Suspend to disk (also known as hibernate). UnderstandingSuspend may have useful background information on where problems can occur.
Suspend and resume rely on ACPI (Advanced Configuration and Power Interface), a set of facilities provided by your BIOS. Linux provides an ACPI subsystem that manages the suspend and resume process. Problems usually occur when resuming, and the usual culprit is a device driver that does not recover from a powered-down state. If your computer suspends successfully, any resume problem is quite likely due to a device driver and not the ACPI subsystem.
The debugging procedure described below requires a Linux kernel that provides "/sys/power/pm_trace". To check that pm_trace is available, look in the directory /sys/power/ (from a terminal, type ls /sys/power/) and see whether a file called pm_trace exists.
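The check above can also be scripted. This is a minimal sketch; the function name and its optional directory argument are illustrative and not part of any Ubuntu tool.

```shell
# Succeed if the kernel exposes pm_trace.
# The optional argument (default /sys/power) is an assumption of this
# sketch; it only exists so the check can be pointed at another directory.
pm_trace_available() {
    [ -e "${1:-/sys/power}/pm_trace" ]
}

if pm_trace_available; then
    echo "pm_trace is available"
else
    echo "pm_trace is missing: this kernel cannot use the procedure below"
fi
```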
"resume-trace" debugging procedure for finding buggy drivers
Resume problems are difficult to debug. The approach used here records progress markers during resume and recovers them after a manual reboot. However, no non-volatile storage is available while resume is bringing up your computer. The only hardware on a PC motherboard that retains information across power cycles is the real time clock (RTC), so that is what is used. For those who want the details, read Documentation/power/s2ram.txt in your kernel sources. The suspend/resume debug trace is implemented in drivers/base/power/trace.c.
Caveat emptor: the following debug steps will radically change the values in your RTC chip, so much so that your file system will think it has been eons since the last fsck. You can avoid a long fsck delay by using 'tune2fs'. For example, 'tune2fs -i 0 /dev/sda1' disables the time-based fsck on boot. But first use 'tune2fs -l <partition>' to find your current settings; note the "Check interval" value so that you can restore it afterwards.
From a fresh install, the default check interval is 15552000 seconds, which is 180 days (about 6 months).
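To record the current interval before changing it, the "Check interval" line can be pulled out of the 'tune2fs -l' output and converted to days. The sketch below assumes the usual output layout ("Check interval:" followed by a number of seconds); both helper names are made up for illustration.

```shell
# Print the check interval, in seconds, from `tune2fs -l` output on stdin.
check_interval_seconds() {
    awk -F: '/^Check interval/ { split($2, f, " "); print f[1] }'
}

# Convert a number of seconds to whole days.
seconds_to_days() {
    echo $(( $1 / 86400 ))
}

# Usage (needs root and a real ext2/3/4 partition; /dev/sda1 is the
# example device from the text):
#   sudo tune2fs -l /dev/sda1 | check_interval_seconds

seconds_to_days 15552000   # 15552000 / 86400 = 180
```

With the value in days, the original setting can be put back once debugging is finished, for example 'tune2fs -i 180d /dev/sda1'.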
Remember that when you are all done, you'll want to set your RTC back to the correct time. Note that NTP by itself is not sufficient to recover the correct time of day, since by default NTP does nothing if the time is off by more than 1000 seconds. You must first use the 'ntpdate' or 'date' command to get the clock close.
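As a rough sketch of that rule: if the clock is off by more than 1000 seconds, step it by hand first. The helper name is hypothetical, the offset is something you measure yourself, and the 'hwclock --systohc' step (copying the corrected system time into the RTC) is a suggestion that the original procedure does not spell out.

```shell
# The 1000-second limit described in the text.
NTP_PANIC_SECONDS=1000

# Succeed when the clock is too far off for NTP alone.
# $1 = clock offset in whole seconds (may be negative).
needs_manual_step() {
    offset=${1#-}    # drop a leading minus sign to get the absolute value
    [ "$offset" -gt "$NTP_PANIC_SECONDS" ]
}

# If a manual step is needed, set the system clock, then copy it to the RTC.
# pool.ntp.org is only an example server:
#   sudo ntpdate pool.ntp.org      # or: sudo date -s "YYYY-MM-DD HH:MM:SS"
#   sudo hwclock --systohc
```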
OK. With that understood, to exercise your suspend/resume process, enter the following command:
sudo sh -c "sync; echo 1 > /sys/power/pm_trace; pm-suspend"
At this point your computer should enter the suspend state within a few seconds. Usually the power LED will slowly flash when in the suspended state. When that has happened, initiate the resume process by pressing the power button. Observe closely whether the disk light comes on briefly; this indicates that resume has begun. If resume fails to complete, press and hold the power button until the computer turns off. Power on your computer, making sure that it loads the same kernel that exhibited the resume problem. You have about 3 minutes to start this boot process before the information saved in the RTC gets corrupted.
Start a console and enter:
dmesg > dmesg.txt
You can edit this file and find lines similar to these:
[ 11.323206] Magic number: 0:798:264
[ 11.323257] hash matches drivers/base/power/resume.c:46
There may well be another 'hash matches' line beyond that. If so, then you are in luck because the last one is the likely culprit. For example:
hash matches device i2c-9191
The only way to prove that this driver is the culprit is to remove its module before initiating suspend and see whether resume then succeeds. Repeat as needed...
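Rather than reading the whole file, the last 'hash matches' line can be extracted with grep. This is a small sketch; the function name is made up, and it assumes the dmesg.txt file saved earlier.

```shell
# Print the last "hash matches" line from a saved dmesg file
# (default: dmesg.txt in the current directory).
last_hash_match() {
    grep 'hash matches' "${1:-dmesg.txt}" | tail -n 1
}

# Usage:
#   last_hash_match              # reads ./dmesg.txt
#   last_hash_match /tmp/dmesg.txt
```

Once you know which module drives the reported device, it can be unloaded before the next suspend attempt with 'sudo modprobe -r <module>', where <module> is a placeholder for the module name on your system.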
If you get a device number rather than name, lspci and /sys/devices/pci* are your friends.
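For a PCI address, sysfs records which driver is bound to the device. The sketch below reads that link through /sys/bus/pci/devices, whose entries point into /sys/devices/pci*. The function name is hypothetical and the address 0000:00:1d.7 is a made-up example.

```shell
# Print the name of the driver bound to a PCI device.
# $1 = PCI address, e.g. 0000:00:1d.7 (made-up example)
# $2 = sysfs PCI device directory (default /sys/bus/pci/devices);
#      the override is an assumption of this sketch.
pci_driver_for() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    [ -L "$link" ] || return 1    # no such device, or no driver bound
    basename "$(readlink "$link")"
}

# Usage:
#   pci_driver_for 0000:00:1d.7
#   lspci -k -s 0000:00:1d.7      # lspci's own view of the same device
```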
As well, please attach your dmesg.txt to your bug report.
Further hints
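Detailed analysis of ACPI kernel code for debugging a suspend problem: http://ubuntuforums.org/showthread.php?p=3066404
Also see:
 * DebuggingKernelSuspendHibernateResume
 * How to get s2ram working: http://lxr.linux.no/linux/Documentation/power/s2ram.txt
 * Testing suspend to disk (hibernate): http://lxr.linux.no/linux/Documentation/power/basic-pm-debugging.txt
 * Troubleshooting hotkey issues: Hotkeys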
DebuggingKernelSuspend (last edited 2019-01-28 21:01:56 by penalvch)