HighTemperatures

Revision 1 as of 2010-05-20 10:30:07

There are a number of causes of high temperatures and excessive fan use being reported by a system's sensors. This page provides background information on how to isolate the real cause of a problem, to help prevent unrelated issues being conflated into a single bug: a bug which only says "my machine is too hot" will simply collect duplicates and me-toos and become useless. This page also records known issues in this area, arranged by release, so that the most appropriate bug can be found.

Diagnostic Techniques

Monitoring System Sensors

Often bugs are characterised by a feeling that the machine is worse now than sometime in the past. To confirm this it is sensible to get concrete information using the system sensors.

A simple way to get a visual feel for the current temperatures is to run the following command in a terminal window (menu item Applications/Accessories/Terminal):

  • cd /proc/acpi/thermal_zone && watch grep temperature */*

This will display a constantly updating listing of your current temperatures:

  • Every 2.0s: grep temperature TZ00/cooling_mode TZ00...  Thu May 20 11:06:27 2010
    
    TZ00/temperature:temperature:             52 C
    TZ01/temperature:temperature:             47 C
    TZ02/temperature:temperature:             0 C
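On kernels where /proc/acpi/thermal_zone is not present, similar readings may be available through sysfs. The following is a minimal sketch, assuming the usual /sys/class/thermal/thermal_zone*/temp files, which report millidegrees Celsius. The function name show_zones is made up for this example, and the optional directory argument exists only so the function can be tried against a copy of the files.

```shell
# Print one line per thermal zone found under the given directory
# (default: /sys/class/thermal), converting millidegrees to whole degrees.
show_zones() {
    base="${1:-/sys/class/thermal}"
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue                 # no zones, or unreadable file
        t=$(cat "$f" 2>/dev/null) || continue   # some zones refuse to be read
        [ -n "$t" ] || continue
        zone=$(basename "$(dirname "$f")")
        echo "$zone: $(( t / 1000 )) C"
    done
}

# Example: watch the readings in the same way as the command above
#   while :; do clear; show_zones; sleep 2; done
```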

To provide a permanent record of this information you can paste the command below into a terminal:

  • ( cd /proc/acpi/thermal_zone && \
    while :; do \
      line="$(date):$(grep temperature */* | awk '{ printf(" %03d", $2) }')"; \
      echo "$line"; \
      sleep 10; \
    done ) | tee LOG

This records the temperatures over time in a file called LOG, in the directory where the command was started, which can then be attached to a Launchpad bug report:

  • Thu May 20 11:13:40 BST 2010: 051 047 000
    Thu May 20 11:13:50 BST 2010: 051 047 000
    Thu May 20 11:14:00 BST 2010: 051 047 000
    Thu May 20 11:14:10 BST 2010: 051 047 000
    Thu May 20 11:14:20 BST 2010: 051 048 000
    Thu May 20 11:14:30 BST 2010: 051 048 000
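A long log can be condensed into a few figures to quote in a bug comment. The following is a minimal sketch, assuming the log format shown above (a timestamp, then a colon and a space, then one zero-padded reading per thermal zone). The function name summarise_log is made up for this example.

```shell
# Print the minimum, maximum and mean reading for each column of a
# temperature log written by the logging loop.
summarise_log() {
    awk '
    {
        # Drop the timestamp; skip any line that does not have one.
        if (!sub(/^.*: /, "")) next
        for (i = 1; i <= NF; i++) {
            v = $i + 0
            if (!(i in count) || v < min[i]) min[i] = v
            if (!(i in count) || v > max[i]) max[i] = v
            sum[i] += v
            count[i]++
            if (i > zones) zones = i
        }
    }
    END {
        for (i = 1; i <= zones; i++)
            printf("zone %d: min %d max %d mean %.1f\n",
                   i, min[i], max[i], sum[i] / count[i])
    }' "$1"
}

# Example:
#   summarise_log LOG
```

The columns are numbered in the order the thermal zones appear in the log, so zone 1 corresponds to TZ00 in the listing above.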

Known Issues

Below is a list of known temperature and fan related bugs, with information on how to tell which bug you have, whether it is fixed and, if so, in which releases and kernel versions.

Lucid

Numerous Dell systems suffering total fan failure after suspend/resume (FIXED)

526354 -- A number of Dell models suffered from total fan failure following suspend/resume. This tended to show itself in one of two ways. First, sensor readings (see above) tended to float up from the normal level of around 40 C to more like 70 C. Second, under heavy load the machine would drift up to 90 C or so and then power off without warning, with very high fan speeds for the first few minutes after reboot.

This issue was triggered by an embedded controller (EC) interface problem, in which the EC would become confused following a suspend/resume cycle and no longer control the fans on our behalf. It was fixed shortly after the release of Lucid, and the fix is contained in kernels 2.6.32-22.33 and later.

This issue is only known to affect Dell systems. The key indicator that you are seeing this bug is that fans and temperature are controlled normally until after a suspend/resume. Upgrading to the latest kernel should fix the issue.
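To check whether the running kernel is at least 2.6.32-22.33, something like the following can be used. This is a sketch under assumptions: it relies on the Ubuntu-specific file /proc/version_signature containing a line of the form "Ubuntu 2.6.32-22.33-generic ...", and on GNU sort supporting the -V option. The function names are made up for this example.

```shell
# Read a version-signature line on stdin, for example
#   Ubuntu 2.6.32-22.33-generic 2.6.32.11+drm33.2
# and print only the package version (2.6.32-22.33).
# The input format is an assumption about /proc/version_signature.
signature_version() {
    awk '{ print $2 }' | sed 's/-[a-z].*$//'
}

# Succeed if version $1 is at least version $2, using GNU sort -V ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (run on the affected machine):
#   running=$(signature_version < /proc/version_signature)
#   version_at_least "$running" 2.6.32-22.33 && echo "kernel contains the fix"
```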

ATI Radeon based systems running hot since upgrades to Lucid (Open)

563156 -- There are a number of reports of systems with ATI graphics running hot, often with the fans running constantly. There are reports that switching to the fglrx binary graphics drivers returns fan control to normal.

To confirm this is your issue, it would be good to get temperature readings from a previous release (you can use a live CD for this) and from Lucid. It would also be useful to install the fglrx binary drivers from Jockey (menu item System/Administration/Hardware Drivers) and compare temperatures before and after. Please report back on the bug should you have this issue.
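If a temperature log has been recorded in each configuration, the two can be compared zone by zone. The following is a minimal sketch, assuming both files use the log format from the Monitoring System Sensors section (timestamp, colon and space, then one reading per zone). The function name compare_logs and the file names in the example are made up for illustration.

```shell
# Print the mean reading per zone for two temperature logs, and the change
# from the first log to the second.
compare_logs() {
    awk '
    FNR == 1 { file++ }                  # count which input file we are in
    {
        # Drop the timestamp; skip any line that does not have one.
        if (!sub(/^.*: /, "")) next
        for (i = 1; i <= NF; i++) {
            sum[file, i] += $i
            n[file, i]++
            if (i > zones) zones = i
        }
    }
    END {
        for (i = 1; i <= zones; i++) {
            if (!n[1, i] || !n[2, i]) continue   # zone missing from one log
            a = sum[1, i] / n[1, i]
            b = sum[2, i] / n[2, i]
            printf("zone %d: %.1f -> %.1f (%+.1f)\n", i, a, b, b - a)
        }
    }' "$1" "$2"
}

# Example:
#   compare_logs LOG.before LOG.after
```

Readings are only comparable if both logs were taken under a similar load, for example with the machine idle for the same length of time.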

Upgrade to Lucid causes overheating, scratch re-install fixes it

There are sporadic reports that an upgrade to Lucid (all from Karmic so far) may leave you with poor fan control, but that a scratch install then resolves things. So far the reports are sketchy. It is possible that these are occurrences of the EC issue above, in which case a kernel update would have fixed them. We await more testing on this one.