Bug59695

== Useful Links ==

 * https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695 (The bug itself)
 * http://ubuntuforums.org/showpost.php?p=3675960&postcount=26 (The ugly fix and How-to for diagnosing the problem)
 * http://mjg59.livejournal.com/77672.html (Ubuntu dev position)
 * http://www.linux-hero.com/rant/explanation-ubuntu-hard-drive-wear-and-tear (Linux Hero comments about the issue)
 * http://www.linux-hero.com/rant/ubuntu-hard-drive-explosions (Linux Hero comments about the issue)
 * http://en.opensuse.org/Disk_Power_Management (Only applies to SuSE; yes, SuSE is affected too)
 * http://www.thinkwiki.org/wiki/Problem_with_hard_drive_clicking (Thinkpad users affected)
 * http://www.lesswatts.org/tips/disks.php (LessWatts tips for better power management)
 * http://usenix.org/events/fast07/tech/schroeder.html (Technical paper about hard disk failures)
 * http://usenix.org/events/fast07/tech/pinheiro.html (Technical paper about hard disk failures)
 * http://forum.notebookreview.com/showthread.php?t=168425 (For those who still believe this is Linux-only: it happens on Windows too, and, most amazingly, the solution there is something like the ugly fix)

[https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695 Bug 59695 ("default value in power.sh potentially kills laptop disks")] is about Ubuntu not preventing harmful behaviour of (laptop) hard drives, visible as high Load_Cycle_Count numbers.

Because the bug report has 150 comments already, I've tried to summarize it here.

There appear to be two issues here: HDD spin-down and "power cycling". The first has a [https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/132 relatively] short default (60 seconds), but the second, which is about Load_Cycle_Count, gets the most complaints.

The disk Load_Cycle_Count issue appears to be caused by a combination of two problems -- The first is overly-aggressive power management from what might be considered buggy hardware. The second is that Ubuntu appears to be touching the hard drive on a regular basis for one reason or another.

Note: In the sections below on how to prevent damage to your hard disk, replace $HDD everywhere with your device, e.g. "/dev/sda" or "/dev/hda". If you have several hard drives, adjust the device name accordingly and duplicate the relevant lines in the workarounds.

Check

You can check the current value of Load_Cycle_Count of your harddrive(s) using:

  • sudo smartctl -a $HDD | grep Load_Cycle_Count

(You need the smartmontools package for this. I also had to enable SMART monitoring for my drives using sudo smartctl -s on $HDD)

The values differ a lot: it is 0 on my desktop, for example, but others report more than 600,000, depending on the age of the drive. TODO: add a section with sample values (including the value of Power_On_Hours).
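Because the raw count only means something relative to the drive's age, it can help to look at Load_Cycle_Count and Power_On_Hours together. The following is a minimal sketch, not part of the bug report: the function names are made up for illustration, and it assumes that both raw values are plain integers in the last column of the smartctl attribute table (some drives report hours in another format, in which case the division is skipped).

```shell
#!/bin/sh
# Print the raw value (last column) of one SMART attribute, reading
# "smartctl -A" style output from standard input.
# (Hypothetical helper name; assumes the attribute name is in column 2.)
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $NF; exit }'
}

# Print load cycles per power-on hour as an integer (rounded down).
# Returns non-zero if a value is missing, not an integer, or hours is 0.
cycles_per_hour() {
    cycles=$1
    hours=$2
    [ -n "$cycles" ] && [ -n "$hours" ] && [ "$hours" -gt 0 ] 2>/dev/null || return 1
    echo $(( cycles / hours ))
}

# Typical use (needs root and the smartmontools package):
#   out=$(sudo smartctl -A "$HDD")
#   cycles=$(printf '%s\n' "$out" | smart_raw_value Load_Cycle_Count)
#   hours=$(printf '%s\n' "$out" | smart_raw_value Power_On_Hours)
#   cycles_per_hour "$cycles" "$hours"
```

A result in the single digits per hour is a very different situation from dozens per hour, which is why the two attributes are best reported together.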

What Ubuntu does

It [http://mjg59.livejournal.com/77672.html appears to be] the official policy of Ubuntu that by default, Ubuntu should not adjust any power management settings of the harddisk. Unfortunately, this policy has two negative effects: It leaves quite a few people with broken hard drives that would otherwise not be broken, and it quite simply makes people who love Ubuntu feel neglected. This issue has been going on a long time.

The problem appears to be that some manufacturers' defaults are too aggressive and that Ubuntu might cause too many unbuffered disk accesses -- the combination of which can cause over a thousand parks a day on some systems.

/etc/acpi/power.sh handles laptop mode. When laptop mode gets enabled, hdparm is called with "-B 1"; when it gets disabled, with "-B 254". 254 is the least aggressive setting. 255 turns APM off, but does not work for all disks.

power.sh also sets the spindown timeout, according to SPINDOWN_TIME from /etc/default/acpi-support (default: 12). This results in spinning down the hard drive after 12*5=60 seconds of inactivity. This does not influence Load_Cycle_Count, however.
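The 12*5 arithmetic matches how hdparm interprets its -S option: values from 1 to 240 are multiples of 5 seconds. The sketch below assumes that SPINDOWN_TIME is passed on to "hdparm -S" unchanged, which the calculation above suggests but which is not verified here; the function name is made up for illustration.

```shell
#!/bin/sh
# Convert an "hdparm -S" value in the range 1..240 into seconds.
# Values in that range are multiples of 5 seconds; 0 and values above 240
# follow different rules and are deliberately not handled here.
spindown_seconds() {
    value=$1
    if [ "$value" -ge 1 ] 2>/dev/null && [ "$value" -le 240 ]; then
        echo $(( value * 5 ))
    else
        echo "unsupported value: $value" >&2
        return 1
    fi
}

spindown_seconds 12   # the SPINDOWN_TIME default from the text; prints 60
```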

apm

[https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/47 "/etc/apm" is not supposed to be used], but if it were, it would set spindown to 60 seconds without affecting the general APM setting (hdparm -B). (There is a bug that also used "power_conserve" here if on_ac_power returns "don't know": [https://bugs.launchpad.net/bugs/156893 bug 156893])

Debug

[https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/78 Blue has created a script] that acts as a wrapper around hdparm and logs where it is called from (including its arguments). From his report, it appears that hdparm always gets called through apm or the init script (the logfile excerpt appears to be from booting).

TODO: provide said helper script, so that others can help debug this.
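Until the original is attached, here is a rough idea of what such a wrapper could look like. This is not Blue's script, only a sketch of the same idea. It assumes (hypothetically) that the real binary has been moved to /sbin/hdparm.real and that the wrapper is installed in its place; the log file path is likewise just an example.

```shell
#!/bin/sh
# Sketch of a logging wrapper for hdparm. This is NOT Blue's original script.
# Assumptions (hypothetical): the real binary was moved to REAL_HDPARM and
# this file was installed in its place; LOGFILE is any root-writable path.
REAL_HDPARM=${REAL_HDPARM:-/sbin/hdparm.real}
LOGFILE=${LOGFILE:-/var/log/hdparm-wrapper.log}

# Append one line with the time, the parent process and all arguments.
log_call() {
    parent=$(ps -o args= -p "$PPID" 2>/dev/null)
    printf '%s parent=[%s] args=[%s]\n' "$(date '+%F %T')" "$parent" "$*" >> "$LOGFILE"
}

# As the last lines of the wrapper, log and hand over to the real binary:
#   log_call "$@"
#   exec "$REAL_HDPARM" "$@"
```

Logging the parent process is the interesting part, since the open question is which script calls hdparm and with which -B value.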

Workaround

Various workarounds have been provided that involve adjusting or even turning off power management of the hard drive. Please keep in mind that this can do more harm than good, so only apply them if you understand exactly what you are doing.

Try hdparm -B 255 $HDD or hdparm -B 254 $HDD. (255 is supposed to disable APM, but it does not work for some; 254 sets it to the least aggressive setting)
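For orientation, the hdparm manual divides the -B range as follows: 1 to 127 permit spin-down, 128 to 254 do not, and 255 disables APM altogether. The small helper below (the name and the wording of its output are made up for illustration) just spells that out for a given number.

```shell
#!/bin/sh
# Describe what an "hdparm -B" level means, following the hdparm manual:
# 1..127 permit spin-down, 128..254 do not, 255 disables APM altogether.
describe_apm_level() {
    level=$1
    if ! [ "$level" -ge 1 ] 2>/dev/null || [ "$level" -gt 255 ]; then
        echo "invalid"
    elif [ "$level" -le 127 ]; then
        echo "aggressive: spin-down permitted"
    elif [ "$level" -le 254 ]; then
        echo "milder: spin-down not permitted"
    else
        echo "APM disabled"
    fi
}

# To see what the drive currently uses, run hdparm -B without a value:
#   sudo hdparm -B "$HDD"
```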

There are different methods to keep this setting after reboot/resume. Your mileage may vary. There may be more workarounds in the bug report, but essentially, all are using "hdparm -B" to change the apm handling of the harddrive.

Force hdparm values in acpi hooks

Gilles posted the following workaround: Create a file called 99-fix-park.sh (keep the '99-' and the '.sh', but you can name the file as you like otherwise) with the following two lines:

 #!/bin/sh
 hdparm -B 254 $HDD

and copy it to the following directories: /etc/acpi/resume.d/ and /etc/acpi/start.d/ (https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/10)
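For machines with several drives, the file needs one hdparm line per device. The following sketch generates the file and copies it into both directories in one go. The function names are made up for illustration, and setting the executable bit is a precaution rather than something the bug report asks for.

```shell
#!/bin/sh
# Print a hook script that sets APM level 254 on every device given.
make_park_hook() {
    echo '#!/bin/sh'
    for dev in "$@"; do
        echo "hdparm -B 254 $dev"
    done
}

# Write the hook into resume.d and start.d below an ACPI directory.
# $1 is that directory (normally /etc/acpi), the rest are devices.
install_park_hook() {
    acpi_dir=$1
    shift
    for sub in resume.d start.d; do
        make_park_hook "$@" > "$acpi_dir/$sub/99-fix-park.sh" || return 1
        chmod 755 "$acpi_dir/$sub/99-fix-park.sh"
    done
}

# Typical use (as root; the device names are examples):
#   install_park_hook /etc/acpi /dev/sda /dev/sdb
```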

laptop-mode-tools

Don posted another workaround: Install laptop-mode-tools and set CONTROL_HD_POWERMGMT=1 in /etc/laptop-mode/laptop-mode.conf (https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/19). Here's another, more verbose setup of laptop-mode-tools from Michael: https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/63
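If you prefer not to edit the file by hand, the setting can also be changed from a script. This is only a sketch: set_conf_value is a made-up helper, it assumes the file uses plain KEY=value lines, and it leaves commented-out entries alone. Make a backup of the file first.

```shell
#!/bin/sh
# Set KEY=VALUE in a shell-style config file: rewrite existing "KEY=" lines,
# or append a new line if the key is not set yet.
# (Hypothetical helper; keys and values must not contain "|".)
set_conf_value() {
    file=$1
    key=$2
    value=$3
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Typical use (as root):
#   cp /etc/laptop-mode/laptop-mode.conf /etc/laptop-mode/laptop-mode.conf.bak
#   set_conf_value /etc/laptop-mode/laptop-mode.conf CONTROL_HD_POWERMGMT 1
```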

An easy, step-by-step walkthrough for a situation-sensitive solution (AC/batteries/heat) can be found here: http://vale.homelinux.net/wordpress/?p=199

Proposed fixes

  • FIXED: Converted most remaining 255's to 254s, and added an explanation.

Conclusion

This bug report has attracted a lot of concerned Ubuntu users, and it seems quite clear from the user feedback that other operating systems/distributions handle this better. However, the workaround should be quite simple, and this wiki page is a first attempt to improve the situation.

Misc

ubuntu_demon has put together a list of TODOs: https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/81

Comments

Please leave any comments/additions here. You may also edit the page directly, but please try to be clear and helpful. The problem has been confirmed and we know that it's a critical thing - please do not repeat that the bug status should be critical.

Could you kindly explain how to diagnose whether the settings on a system are correct at a given point in time? I have followed the instructions but still find the load cycle count increasing. How do I diagnose the problem?
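One possible approach (a suggestion, not something confirmed in the bug report): rather than looking at the absolute count, read Load_Cycle_Count twice with a known pause in between and compare the rate before and after applying a workaround. The helper name below is made up, and it assumes the raw value is a plain integer.

```shell
#!/bin/sh
# Load cycles per hour, computed from two counter readings taken a known
# number of seconds apart.
load_cycle_rate() {
    first=$1
    second=$2
    seconds=$3
    [ "$seconds" -gt 0 ] 2>/dev/null || return 1
    echo $(( (second - first) * 3600 / seconds ))
}

# Typical use (as root): read the counter, wait ten minutes, read again.
#   read_count() { smartctl -A "$HDD" | awk '$2 == "Load_Cycle_Count" { print $NF }'; }
#   a=$(read_count); sleep 600; b=$(read_count)
#   load_cycle_rate "$a" "$b" 600
```

If the rate does not drop after a resume or a change of power source, checking the current level with hdparm -B $HDD at that moment may show whether something has reset it.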

DanielHahler/Bug59695 (last edited 2010-04-21 19:46:51 by 188-194-18-172-dynip)