HighTemperatures

Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2010-05-20 10:30:07
Size: 4449
Editor: apw
Comment:
Revision 14 as of 2014-01-03 16:44:50
Size: 4199
Editor: penalvch
Comment: Minor grammar fixes.
Line 1: Line 1:
There are a number of causes of high temperatures and excessive fan use being reported by a system's sensors. This page intends to provide background information on how you might better isolate the real cause of the issue, to help prevent conflation of issues onto a single bug; a bug which says '''my machine is too hot''' will simply collect duplicates and me-toos and become useless. This page also aims to record known issues in this area so that the most appropriate bug can be found; these are arranged by release. <<Include(Kernel/MenuBar)>>
||<tablestyle="float:right; font-size: 0.9em; width:40%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||
Line 3: Line 4:
= Diagnostic Techniques = This debugging article is for those who have tried the techniques noted [[https://help.ubuntu.com/community/PowerManagement/ReducedPower|here]] and still see no significant relief from high temperatures and overheating. High temperatures and excessive fan use reported by a system's sensors have numerous, different root causes. This page provides background information to help you isolate the real cause of the issue and to prevent separate issues from being conflated into a single bug; a bug which says '''my machine is too hot''' will simply collect duplicates and me-toos and become useless.

= Filing a Bug =

Please file a '''new''' bug and attach your machine information to it, following [[https://wiki.ubuntu.com/ReportingBugs|this article]]. Filing the bug using {{{ubuntu-bug linux}}} from a terminal window (menu item Applications/Accessories/Terminal) collects the information needed to review the report fully and to determine whether it duplicates another outstanding report. Having the full hardware information for each instance maximizes your chance of getting the root cause addressed. Once the bug is filed, please ensure it is tagged {{{kernel-therm}}}.

== Required Information ==

Where you believe you have a difference in thermal behaviour between two kernels or between two releases, please ensure you have your own bug and use the scripts in ''Monitoring System Sensors'' to produce logs of the temperature over time for both the ''before'' and ''after'' scenarios. Include this data with a clear description of the two test cases.

Where the issue is between releases you can use the live CDs for the previous release to attempt to recreate the ''before'' scenario.

= Diagnostic Techniques =
Line 7: Line 21:
Often bugs are characterised by a '''feeling''' that the machine is worse now than sometime in the past.  To confirm this it is sensible to get concrete information using the system sensors. Often bugs are characterised by a '''feeling''' that the machine is worse now than sometime in the past. To confirm this it is sensible to get concrete information using the system sensors.
Line 9: Line 23:
A simple way to get a visual feel for the current temperatures is to run the following command in a terminal window (menu item Applications/Accessories/Terminal): A simple way to get a visual feel for the current temperatures is to run the following command in a terminal window (menu item Applications/Accessories/Terminal): {{{
cd /proc/acpi/thermal_zone && watch grep temperature */*}}}
Line 11: Line 26:
    {{{
cd /proc/acpi/thermal_zone && watch grep temperature */*
}}}

This will display a constantly updating listing of your current temperatures:

   
{{{
This will display a constantly updating listing of your current temperatures: {{{
Line 22: Line 31:
TZ02/temperature:temperature: 0 C
}}}
TZ02/temperature:temperature: 0 C}}}
Line 25: Line 33:
To provide a permanent record of this information you can paste the command below into a terminal:

   
{{{
To provide a permanent record of this information you can paste the command below into a terminal: {{{
Line 33: Line 39:
done ) | tee LOG
}}}
done ) | tee LOG}}}
Line 36: Line 41:
This will provide a log of the temperatures over time in a file called LOG, which can be attached to a Launchpad bug report:

   
{{{
This will provide a log of the temperatures over time in a file called LOG, which can be attached to a Launchpad bug report: {{{
Line 44: Line 47:
Thu May 20 11:14:30 BST 2010: 051 048 000
}}}
Thu May 20 11:14:30 BST 2010: 051 048 000}}}
Line 50: Line 51:
Below is a list of known temperature/fan related bugs, with information on how to tell which bug you have and whether each is fixed and, if so, in which releases and kernel versions. == ATI Radeon based systems running hot since upgrades to Lucid ==
Line 52: Line 53:
Bug:563156 -- There are a number of reports of systems running hot, often with fans running constantly, on systems with ATI graphics. There are reports that switching to the fglrx binary graphics drivers returns fan control to normal.
Line 53: Line 55:
== Lucid ==

=== Numerous Dell systems suffering total fan failure after suspend/resume (FIXED) ===

Bug:526354 -- A number of Dell models suffered from total fan failure following suspend/resume. This tended to exhibit itself in one of two ways. Firstly, sensor readings (see above) tended to float up from the normal level of around 40 C to more like 70 C. Secondly, under heavy load the machine would drift up to 90 C or so and then power off without warning, exhibiting very high fan speeds on reboot for the first few minutes.

This issue was triggered by an embedded controller (EC) interface issue, wherein the EC would become confused following a suspend/resume cycle and no longer control the fans on our behalf. This issue was fixed shortly following the release of Lucid and contained in kernels 2.6.32-22.33 and later.

This issue is only known to affect Dell systems. The key indicator that you are seeing this bug is that fans and temperature are controlled normally until after a suspend/resume. Upgrading to the latest kernel should fix the issue.


=== ATI Radeon based systems running hot since upgrades to Lucid (Open) ===

Bug:563156 -- There are a number of reports of systems running hot, often with fans running constantly, on systems with ATI graphics. There are reports that switching to the fglrx binary graphics drivers returns fan control to normal.

To confirm this is your issue, it would be good to get temperature readings from a previous release (you can use a live CD for this) and from Lucid. Installing the fglrx binary drivers from Jockey (menu item System/Administration/Hardware Drivers) and comparing temperatures before and after would also be useful. Please report back on the bug should you have this issue.


== Upgrade to Lucid causes overheating, scratch re-install fixes it ==

There are sporadic reports that an upgrade to Lucid (all from Karmic so far) may leave you with poor fan control but that a scratch install then resolves things. So far the reports are sketchy, and it is possible these are occurrences of the EC issues above, in which case a kernel update would have fixed this. We await more testing on this one.
To confirm this is your issue, please file a new report. In the new report, include temperature readings from a previous release (you can use a live environment for this) and from the [[http://cdimage.ubuntu.com/daily-live/current/|latest development release]]. Installing the fglrx binary drivers from Jockey (menu item System/Administration/Hardware Drivers) and comparing temperatures before and after would also be useful.

This debugging article is for those who have tried the techniques noted in PowerManagement/ReducedPower (https://help.ubuntu.com/community/PowerManagement/ReducedPower) and still see no significant relief from high temperatures and overheating. High temperatures and excessive fan use reported by a system's sensors have numerous, different root causes. This page provides background information to help you isolate the real cause of the issue and to prevent separate issues from being conflated into a single bug; a bug which says my machine is too hot will simply collect duplicates and me-toos and become useless.

Filing a Bug

Please file a new bug and attach your machine information to it, following ReportingBugs (https://wiki.ubuntu.com/ReportingBugs). Filing the bug using ubuntu-bug linux from a terminal window (menu item Applications/Accessories/Terminal) collects the information needed to review the report fully and to determine whether it duplicates another outstanding report. Having the full hardware information for each instance maximizes your chance of getting the root cause addressed. Once the bug is filed, please ensure it is tagged kernel-therm.

Required Information

Where you believe there is a difference in thermal behaviour between two kernels or between two releases, please ensure you have your own bug and use the scripts in Monitoring System Sensors to produce logs of the temperature over time for both the before and after scenarios. Include this data with a clear description of the two test cases.

Where the difference is between releases, you can use the live CD for the previous release to attempt to recreate the before scenario.

Diagnostic Techniques

Monitoring System Sensors

Often bugs are characterised by a feeling that the machine is worse now than at some time in the past. To confirm this, it is sensible to get concrete information using the system sensors.

A simple way to get a visual feel for the current temperatures is to run the following command in a terminal window (menu item Applications/Accessories/Terminal):

cd /proc/acpi/thermal_zone && watch grep temperature */*

This will display a constantly updating listing of your current temperatures:

Every 2.0s: grep temperature TZ00/cooling_mode TZ00...  Thu May 20 11:06:27 2010

TZ00/temperature:temperature:             52 C
TZ01/temperature:temperature:             47 C
TZ02/temperature:temperature:             0 C
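
On kernels that do not provide /proc/acpi/thermal_zone, a similar listing can usually be read from sysfs. The following is a minimal sketch, not part of the original page, assuming the thermal zones appear as /sys/class/thermal/thermal_zone*/temp with values in millidegrees Celsius; the function name show_zone_temps and its optional directory argument are illustrative:

```shell
# Print one line per thermal zone: "<zone> <type> <degrees> C".
# Assumption: sysfs layout /sys/class/thermal/thermal_zone*/temp, with
# readings in millidegrees Celsius. The base directory can be overridden
# by passing it as the first argument.
show_zone_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temperature file
        # (also covers the case where the glob matched nothing).
        [ -r "$zone/temp" ] || continue
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue
        zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        printf '%s %s %d C\n' "${zone##*/}" "$zone_type" "$((millideg / 1000))"
    done
}
```

Running show_zone_temps with no argument reads the live system; zone names and types vary from machine to machine.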

To provide a permanent record of this information you can paste the command below into a terminal:

( cd /proc/acpi/thermal_zone && \
while :; do \
  line="$(date):$(grep temperature */* | awk '{ printf(" %03d", $2) }')"; \
  echo "$line"; \
  sleep 10; \
done ) | tee LOG

This will provide a log of the temperatures over time in a file called LOG, which can be attached to a Launchpad bug report:

Thu May 20 11:13:40 BST 2010: 051 047 000
Thu May 20 11:13:50 BST 2010: 051 047 000
Thu May 20 11:14:00 BST 2010: 051 047 000
Thu May 20 11:14:10 BST 2010: 051 047 000
Thu May 20 11:14:20 BST 2010: 051 048 000
Thu May 20 11:14:30 BST 2010: 051 048 000
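
Where only the sysfs thermal interface is available, a log in the same format can be produced along these lines. This is a sketch rather than a tested replacement for the command above: it assumes readings in millidegrees Celsius under /sys/class/thermal/thermal_zone*/temp, and log_zone_temps with its three optional arguments is a hypothetical name chosen for illustration:

```shell
# Print timestamped readings for all zones, one line per sample.
# Usage: log_zone_temps [samples] [interval_seconds] [base_dir]
# Assumption: each zone exposes "temp" in millidegrees Celsius.
log_zone_temps() {
    samples="${1:-6}"
    interval="${2:-10}"
    base="${3:-/sys/class/thermal}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        line="$(date):"
        for f in "$base"/thermal_zone*/temp; do
            [ -r "$f" ] || continue
            millideg=$(cat "$f" 2>/dev/null) || continue
            # Zero-padded whole degrees, matching the LOG format above.
            line="$line $(printf '%03d' "$((millideg / 1000))")"
        done
        echo "$line"
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, log_zone_temps 360 10 | tee LOG would record one sample every ten seconds for about an hour.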

Known Issues

ATI Radeon based systems running hot since upgrades to Lucid

Bug 563156 -- There are a number of reports of systems running hot, often with fans running constantly, on systems with ATI graphics. There are reports that switching to the fglrx binary graphics drivers returns fan control to normal.

To confirm this is your issue, please file a new report. In the new report, include temperature readings from a previous release (you can use a live environment for this) and from the latest development release (http://cdimage.ubuntu.com/daily-live/current/). Installing the fglrx binary drivers from Jockey (menu item System/Administration/Hardware Drivers) and comparing temperatures before and after would also be useful.
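
When comparing before and after logs, a per-zone summary can be easier to read than the raw lines. The following is a minimal sketch, assuming logs in the LOG format shown under Monitoring System Sensors (a date, a colon and a space, then one reading per zone); summarise_log and the file names in the usage note are illustrative, not part of the original page:

```shell
# Summarise a temperature log: minimum, maximum and mean per zone.
# Assumption: each line looks like "<date>: 051 047 000", i.e. the
# readings follow the first ": " and are separated by spaces.
summarise_log() {
    awk -F': ' '
    NF >= 2 {
        n = split($2, t, " ")
        for (i = 1; i <= n; i++) {
            v = t[i] + 0                      # "051" is read as decimal 51
            if (!(i in count) || v < min[i]) min[i] = v
            if (!(i in count) || v > max[i]) max[i] = v
            sum[i] += v
            count[i]++
        }
        if (n > zones) zones = n
    }
    END {
        for (i = 1; i <= zones; i++)
            printf "zone %d: min %d max %d mean %.1f\n", i, min[i], max[i], sum[i] / count[i]
    }' "$1"
}
```

Running summarise_log LOG.before and then summarise_log LOG.after gives two short summaries that can be pasted into the bug report alongside the full logs. For the six sample lines shown earlier, the second zone would be summarised as min 47 max 48 mean 47.3.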

Kernel/Debugging/HighTemperatures (last edited 2014-01-03 16:44:50 by penalvch)