HighTemperatures

High temperatures and excessive fan use, as reported by a system's sensors, have a number of causes. This page provides background information on how to isolate the real cause of an issue, to help prevent unrelated issues being conflated in a single bug; a bug which says "my machine is too hot" will simply collect duplicates and me-toos and become useless. This page also records known issues in this area, arranged by release, so that the most appropriate bug can be found.

Diagnostic Techniques

Monitoring System Sensors

Often bugs are characterised by a feeling that the machine is worse now than at some point in the past. To confirm this, it is sensible to get concrete information from the system sensors.

A simple way to get a visual feel for the current temperatures is to run the following command in a terminal window (menu item Applications/Accessories/Terminal):

  • cd /proc/acpi/thermal_zone && watch grep temperature */*

This will display a constantly updating listing of your current temperatures:

  • Every 2.0s: grep temperature TZ00/cooling_mode TZ00...  Thu May 20 11:06:27 2010
    
    TZ00/temperature:temperature:             52 C
    TZ01/temperature:temperature:             47 C
    TZ02/temperature:temperature:             0 C

To provide a permanent record of this information you can paste the command below into a terminal:

  • ( cd /proc/acpi/thermal_zone && \
    while :; do \
      line="$(date):$(grep temperature */* | awk '{ printf(" %03d", $2) }')"; \
      echo "$line"; \
      sleep 10; \
    done ) | tee LOG

This will record the temperatures over time in a file called LOG, which can be attached to a Launchpad bug report:

  • Thu May 20 11:13:40 BST 2010: 051 047 000
    Thu May 20 11:13:50 BST 2010: 051 047 000
    Thu May 20 11:14:00 BST 2010: 051 047 000
    Thu May 20 11:14:10 BST 2010: 051 047 000
    Thu May 20 11:14:20 BST 2010: 051 048 000
    Thu May 20 11:14:30 BST 2010: 051 048 000
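
When comparing runs it can help to reduce a long log to a few numbers. The following is a minimal sketch, not part of the original instructions: summarise_temps is a hypothetical helper name, and it assumes every line of the log has the format shown above (a date stamp, then ": ", then one zero-padded reading per thermal zone).

```shell
# summarise_temps LOGFILE  (hypothetical helper, not an existing tool)
# Print the minimum, maximum and mean reading for each thermal zone column
# of a log in the "date: 051 047 000" format shown above.
summarise_temps() {
    awk '
    { sub(/^.*: /, "") }            # drop the date stamp, keep the readings
    NF == 0 { next }                # ignore blank lines
    {
        n++
        for (i = 1; i <= NF; i++) {
            v = $i + 0              # "051" becomes 51
            if (n == 1 || v < min[i]) min[i] = v
            if (n == 1 || v > max[i]) max[i] = v
            sum[i] += v
        }
        if (NF > cols) cols = NF
    }
    END {
        for (i = 1; i <= cols; i++)
            printf("zone %d: min %d max %d mean %.1f\n", i, min[i], max[i], sum[i] / n)
    }' "$1"
}
```

Running summarise_temps LOG prints one line per zone. The zones are numbered in the order the columns appear in the log, which follows the order of the directories under /proc/acpi/thermal_zone.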

Known Issues

Below is a list of known temperature and fan related bugs, with information on how to tell which bug you have, whether it is fixed and, if so, in which releases and kernel versions.

Lucid

Numerous Dell systems suffering total fan failure after suspend/resume (FIXED)

526354 -- A number of Dell models suffered from total fan failure following suspend/resume. This tended to show up in one of two ways. Firstly, sensor readings (see above) tended to float up from the normal level of around 40°C to more like 70°C. Secondly, under heavy load the machine would drift up to 90°C or so and then power off without warning, exhibiting very high fan speeds for the first few minutes after reboot.

This issue was triggered by an embedded controller (EC) interface issue, wherein the EC would become confused following a suspend/resume cycle and no longer control the fans on our behalf. It was fixed shortly after the release of Lucid; the fix is contained in kernels 2.6.32-22.33 and later.

This issue is only known to affect Dell systems. The key indicator that you are seeing this bug is that fans and temperature are controlled normally until after a suspend/resume. Upgrading to the latest kernel should fix the issue.
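
To check whether a given kernel already carries the fix, its version can be compared against 2.6.32-22.33. The sketch below is illustrative only: version_ge and has_dell_fan_fix are hypothetical helper names, the comparison looks only at the numeric fields, and it assumes the running kernel exposes /proc/version_signature with a second field of the form 2.6.32-22.33-generic.

```shell
# version_ge A B  (hypothetical helper)
# Succeed if version A is the same as or newer than version B.
# Fields are split on "." and "-" and compared numerically, left to right.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[.-]/)
        nb = split(b, y, /[.-]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0                      # all fields equal
    }'
}

# has_dell_fan_fix  (hypothetical helper)
# Assumption: /proc/version_signature exists and its second field looks
# like "2.6.32-22.33-generic"; the flavour suffix is stripped before comparing.
has_dell_fan_fix() {
    sig=$(awk '{ print $2 }' /proc/version_signature) || return 2
    version_ge "${sig%-*}" 2.6.32-22.33
}
```

For example, has_dell_fan_fix && echo "kernel includes the fan fix" reports on the running kernel, while version_ge can be called directly with a version string copied from a bug report.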

ATI Radeon based systems running hot since upgrades to Lucid (Open)

563156 -- There are a number of reports of systems with ATI graphics running hot, often with fans running constantly. There are reports that switching to the fglrx binary graphics drivers returns fan control to normal.

To confirm this is your issue, it would be good to get temperature readings from a previous release (you can use a live CD for this) and from Lucid. Installing the fglrx binary drivers from Jockey (menu item System/Administration/Hardware Drivers) and comparing temperatures before and after would also be useful. Please report back on the bug should you have this issue.
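
One way to make the before and after comparison concrete is to capture a log in each configuration with the logging loop from Monitoring System Sensors, then compare the mean reading of one zone. This is a rough sketch under stated assumptions: mean_zone and compare_logs are hypothetical helper names, the file names in the usage note are examples, and both logs are assumed to be in the "date: 051 047 000" format with the same zone in the same column.

```shell
# mean_zone LOGFILE COLUMN  (hypothetical helper)
# Print the mean reading of one thermal zone column, to one decimal place.
mean_zone() {
    awk -v col="$2" '
    { sub(/^.*: /, "") }            # drop the date stamp, keep the readings
    NF >= col { sum += $col; n++ }
    END { if (n) printf("%.1f\n", sum / n) }' "$1"
}

# compare_logs BEFORE AFTER COLUMN  (hypothetical helper)
# Print the mean of the chosen column in each log and the change between them.
compare_logs() {
    before=$(mean_zone "$1" "$3")
    after=$(mean_zone "$2" "$3")
    awk -v b="$before" -v a="$after" 'BEGIN {
        printf("before %.1f C, after %.1f C, change %+.1f C\n", b, a, a - b)
    }'
}
```

For example, compare_logs LOG.radeon LOG.fglrx 1 compares the first zone across two logs saved under those names. The logs should be taken under similar load and for a similar length of time, otherwise the means are not comparable.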

Upgrade to Lucid causes overheating, scratch re-install fixes it

583099 -- There are sporadic reports that an upgrade to Lucid (all from Karmic so far) may leave you with poor fan control, but that a scratch install then resolves things. The reporter has confirmed that a clean Karmic install upgraded to Lucid shows different idle temperatures from a scratch install of Lucid on the same hardware. Investigation continues.

It should be noted that between the Karmic scratch install and the Lucid scratch install, temperature improves by 5°C, or about 20%. It is the upgrade which seems at times to be out of kilter.

Kernel/Debugging/HighTemperatures (last edited 2014-01-03 16:44:50 by penalvch)