KarmicBetterSuspendResume

Revision 6 as of 2009-06-08 21:26:23

Summary

We integrated suspend and resume apport reporting in Jaunty and received a large number of reports of resume failures. We would like to improve the suspend/resume experience in Karmic. While triaging the recently opened suspend/resume bugs, we found that most of the issues were related to graphics card driver modules and ethernet/wireless driver modules. In this session we intend to brainstorm ways to improve suspend/resume in Karmic.

Release Note

The changes put in place as a result of this session should improve the suspend/resume experience on Karmic.

Rationale

Users of laptops and netbooks almost never shut down and power on their devices. They prefer to close the lid and expect the machine to suspend, then open the lid to resume. This preserves the state of the applications they were running before suspend, so they can continue using those applications after resume. This remains the typical use case for laptops and netbooks in spite of the fast shutdown/reboot times in Jaunty.

Unresolved issues

We have over 600 bugs opened on suspend/resume for Jaunty. Although I was able to identify some offending drivers that are common to some of the bugs, we are not able to resolve all of them.

Summary of Discussion

There was discussion of how things went in the Jaunty cycle when we enabled reporting of suspend/resume issues. There was an overwhelming number of bug reports, and we were unable to cope with them or respond fairly in a timely fashion. This work did find several generic bugs, but we were unable to find common themes satisfactorily. Some improvements were made over the cycle, including better reporting and the potential to use the hardware DB.

It was pointed out that Rafael had written an OLS paper on the improvements coming in this area.

There were discussions of what our goals should be in the Karmic cycle. We clearly want to continue to improve the suspend/resume experience; if nothing else, our testing in Jaunty has shown that we have a problem. We also have a number of improvements in the pipeline from KMS, which we would like to evaluate against Jaunty's results. The direct goals were:

  • no drivers needing removal
  • a review and cleanup of the pm-utils scripts
  • better understanding of the Jaunty issues via data analysis
    • duping common problems together, etc.

A number of ideas were suggested:

  • investigate the debugging features and why they don't work
  • investigate leaving the console on longer
    • no_console_suspend: how would this affect the experience?
    • how does KMS affect this?
  • the /sys/power/pm_test file: tell it which level the suspend should go down to
  • look at some Karmic testing, but more focused this time
    • look at very bad machines, and request specific testing on those
    • can script updating bug reports from testers with a request to re-test
    • there are many suspend/resume bug fixes available in newer kernels; the expectation is that things are better, and being able to measure that has value
  • look at how we can help normal people find out how to communicate with upstream
    • WIP for lp/bugzilla plugin for kernel bugzilla
  • are we upstreaming bugs where we are using removal of a module on suspend/resume as a solution?
    • we should be opening generic bugs for failures, with upstream bugs linked to each
    • we should be getting upstream kernel testing if possible too
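
The /sys/power/pm_test idea above can be sketched as follows. This is only a minimal sketch, assuming a kernel built with CONFIG_PM_DEBUG so that the file exists; the helper names parse_pm_test and set_pm_test are hypothetical, and writing to the file needs root. Reading the file lists the available test levels with the active one in square brackets.

```python
from pathlib import Path

# Kernel interface discussed above; only present with CONFIG_PM_DEBUG.
PM_TEST = Path("/sys/power/pm_test")


def parse_pm_test(text):
    """Return (current_level, available_levels) from pm_test contents.

    The kernel marks the active level with square brackets, for example
    "[none] core processors platform devices freezer".
    """
    current = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            current = token
        available.append(token)
    return current, available


def set_pm_test(level, path=PM_TEST):
    """Select how far the next suspend attempt should go (hypothetical helper)."""
    _, available = parse_pm_test(path.read_text())
    if level not in available:
        raise ValueError(f"unsupported pm_test level: {level!r}")
    path.write_text(level)


if __name__ == "__main__":
    if PM_TEST.exists():
        print(parse_pm_test(PM_TEST.read_text()))
    else:
        print("pm_test is not available on this kernel")
```

Stepping through the levels one at a time, starting with the shallowest one the kernel offers, may help narrow down which phase of suspend a given machine fails in.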

We should also consider allowing users to opt in to reporting more information to make these reports better:

  • do you want to send data? "I don't mind":
    • success reports
      • success needs data on the environment that leads to success
    • having something in Launchpad that allows us to do more testing with them, report success, etc.

Design

  • Several design goals were discussed at UDS.
    • Not requiring removal of kernel modules.
      • But certain modules in the experimental/staging area of the kernel cause suspend/resume issues, and removing them before suspend fixes the resume problems. These modules need to be identified and their maintainers notified of the issues the modules have on resume.
    • Review and cleanup of pm-utils scripts
      • A good first step might be modifying the scripts so that we can isolate suspend issues from resume issues.
    • Data mining and analysis of Jaunty suspend/resume bugs for clues.
    • Due to the varied nature of hardware on which these problems are seen, a wider community effort in debugging suspend/resume issues is called for.
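
As one illustration of identifying problem modules on a tester's machine, the sketch below checks which modules from a suspect list are currently loaded. It relies on the /proc/modules format, where each line starts with the module name. The function names and the module names in the sample are hypothetical placeholders, not modules known to cause problems.

```python
def loaded_modules(proc_modules_text):
    """Return module names from /proc/modules-style text (first field per line)."""
    return [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]


def suspects_loaded(proc_modules_text, suspects):
    """Return the suspect modules that are currently loaded, sorted by name."""
    return sorted(set(loaded_modules(proc_modules_text)) & set(suspects))


# Placeholder sample in /proc/modules layout; the names are made up.
SAMPLE = """\
example_wifi 245760 1 - Live 0x0000000000000000
example_gfx 1048576 3 - Live 0x0000000000000000
"""

if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the placeholder data
    print(suspects_loaded(text, ["example_wifi", "example_other"]))
```

The suspect list would have to come from the bug analysis; this sketch only shows the lookup.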

Implementation

  • Kernel Modules
    • Debugging tips and tricks
    • Identifying problem areas in offending modules
    • Fixing modules and working problems upstream
    • Greater community involvement in debugging
    • Roll in new suspend/resume support from upstream
  • X related suspend/resume
    • Measure impact of KMS on resolving suspend/resume issues
    • Leaving the console on for a longer duration
    • Help originators of bug reports communicate X related failures better
  • Data mining and analysing suspend/resume bugs in Jaunty
    • Use simple methods to connect common points of failure in drivers and subsystems
    • Gather information on hardware that is known to cause problems with suspend/resume
    • Dupe together bugs that are known to be caused by a single point of failure
  • pm-utils refresh
    • Identify problem areas in the pm-utils scripts and propose improvements
    • Better reporting of suspend and resume points of failure
    • Community involvement in review of pm-utils scripts to better handle problem hardware
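
The "simple methods" for connecting common points of failure could start with something like the sketch below, which groups bug reports by the modules they mention and lists groups large enough to review for duplicates. The record layout (an "id" and a list of "modules" per bug), the function names, and the sample data are all assumptions made for illustration; they do not reflect the actual Launchpad or apport data format.

```python
from collections import defaultdict


def group_by_module(bugs):
    """Map each module name to the ids of the bugs that mention it."""
    groups = defaultdict(list)
    for bug in bugs:
        for module in bug["modules"]:
            groups[module].append(bug["id"])
    return groups


def duplicate_candidates(bugs, min_reports=2):
    """Return modules mentioned by at least min_reports bugs.

    Sharing a module does not prove a shared root cause; the result is
    only a list of candidates for a human to review.
    """
    return {
        module: sorted(ids)
        for module, ids in group_by_module(bugs).items()
        if len(ids) >= min_reports
    }


if __name__ == "__main__":
    # Made-up sample records, purely illustrative.
    sample = [
        {"id": 101, "modules": ["example_wifi"]},
        {"id": 102, "modules": ["example_gfx", "example_wifi"]},
        {"id": 103, "modules": ["example_eth"]},
    ]
    print(duplicate_candidates(sample))
```

A grouping like this would only be a first pass; hardware details from the reports would still be needed before marking anything as a duplicate.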


CategorySpec