ThermalIssues

Thermald

The thermald daemon prevents machines from overheating and was introduced in the Ubuntu 14.04 LTS (Trusty) release. It monitors thermal sensors and adjusts cooling controls to keep the hardware cool. Thermald uses the available CPU temperature sensors to keep the CPU from overheating. If the hardware provides a skin temperature sensor, thermald by default tries to keep the skin temperature below 45 degrees C.

Thermald can control cooling via:

* active or passive cooling devices exposed in sysfs

* the Running Average Power Limit (RAPL) driver (Sandy Bridge and later)

* the Intel P-state CPU frequency driver (Sandy Bridge and later)

* the cpufreq CPU frequency scaling driver

* the Intel PowerClamp driver
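A rough way to see which of these interfaces a machine exposes is to look for them in sysfs. The sketch below is only an illustration: `check_controls` and the `SYSFS` override are hypothetical names, not part of thermald. It reads the cpufreq scaling driver of CPU 0 (for example `intel_pstate` or `acpi-cpufreq`), checks for the powercap `intel-rapl` directory, and looks for a thermal cooling device of type `intel_powerclamp`. It reports only what the kernel exposes; it does not query thermald itself.

```sh
#!/bin/sh
# Hypothetical helper: report which cooling control interfaces appear in sysfs.
# SYSFS defaults to /sys and can be pointed elsewhere (e.g. for testing).
SYSFS="${SYSFS:-/sys}"

check_controls() {
    sys="$1"
    # CPU frequency scaling driver of CPU 0, e.g. "intel_pstate" or "acpi-cpufreq"
    drv=$(cat "$sys/devices/system/cpu/cpu0/cpufreq/scaling_driver" 2>/dev/null)
    echo "cpufreq driver: ${drv:-none}"

    # RAPL is exposed through the powercap framework
    if [ -d "$sys/class/powercap/intel-rapl" ]; then
        echo "RAPL: present"
    else
        echo "RAPL: absent"
    fi

    # intel_powerclamp registers itself as a thermal cooling device
    clamp=absent
    for t in "$sys"/class/thermal/cooling_device*/type; do
        [ -r "$t" ] || continue   # glob did not match, or unreadable
        if [ "$(cat "$t")" = "intel_powerclamp" ]; then
            clamp=present
        fi
    done
    echo "PowerClamp: $clamp"
}

check_controls "$SYSFS"
```

PowerClamp is a loadable module, so "absent" may only mean that `intel_powerclamp` is not currently loaded.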

Thermald works in two modes:

Zero configuration mode

By default, thermald runs in zero configuration mode, which is sufficient for most systems. Thermald attempts to use the CPU Digital Thermal Sensor (DTS) to sense the temperature, and uses the P-state driver, RAPL, PowerClamp and cpufreq to control cooling.

XML configuration mode

A user-defined XML configuration file can be used to fine-tune and optimise thermald, and to work around buggy ACPI configurations.

Installation

To install thermald on Ubuntu Trusty, use:

sudo apt-get install thermald

Once installed, thermald starts automatically in the default zero configuration mode.

Modifying the default configuration

The thermald configuration file /etc/thermald/thermal-conf.xml can be modified to optimise thermald for your specific hardware. The configuration is based on the ACPI thermal model, in which platform regions are divided into thermal zones that contain physical devices, cooling controls and thermal sensors. A thermal zone may contain one or more thermal sensors. Cooling controls modify the behaviour of cooling devices, such as a fan, or a driver interface that can throttle a device such as a CPU to cool it down. Cooling can be active, such as a fan (which can create noise and consume more power but does not reduce system performance), or passive, such as CPU performance throttling.
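To find the thermal zones and sensors the kernel exposes (for example, the zone directory to use in a `<Path>` element), one can read the standard `type` and `temp` attributes under /sys/class/thermal. A minimal sketch, assuming a hypothetical `list_thermal_zones` helper that takes an optional base directory; temperatures are printed as the kernel reports them, in millidegrees Celsius, and `n/a` is printed if an attribute cannot be read.

```sh
#!/bin/sh
# Hypothetical helper: list each thermal zone with its type and temperature.
# The kernel reports temperatures in millidegrees Celsius (45000 = 45 degrees C).
list_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for z in "$base"/thermal_zone*; do
        [ -d "$z" ] || continue   # glob did not match: no thermal zones
        ztype=$(cat "$z/type" 2>/dev/null || echo unknown)
        ztemp=$(cat "$z/temp" 2>/dev/null || echo n/a)
        echo "$z: type=$ztype temp=$ztemp"
    done
}

list_thermal_zones
```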

Cooling devices can be activated when a trip point, a temperature threshold, is reached. A thermal zone may have one or more trip points.
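The trip points that the platform defines for each thermal zone are visible under /sys/class/thermal/thermal_zoneN/ as `trip_point_N_type` and `trip_point_N_temp` attributes. A minimal sketch, assuming a hypothetical `list_trip_points` helper; the temperatures are in millidegrees Celsius.

```sh
#!/bin/sh
# Hypothetical helper: print the trip points exposed for each thermal zone.
list_trip_points() {
    base="${1:-/sys/class/thermal}"
    for z in "$base"/thermal_zone*; do
        [ -d "$z" ] || continue
        for t in "$z"/trip_point_*_temp; do
            [ -r "$t" ] || continue          # this zone has no trip points
            n=${t%_temp}                      # strip the "_temp" suffix
            n=${n##*trip_point_}              # keep only the trip index
            ttype=$(cat "$z/trip_point_${n}_type" 2>/dev/null || echo unknown)
            ttemp=$(cat "$t" 2>/dev/null || echo n/a)
            echo "${z##*/} trip $n: type=$ttype temp=$ttemp"
        done
    done
}

list_trip_points
```

Comparing this output with the machine's behaviour is one way to spot an ACPI configuration whose trip points are missing or poorly tuned.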

In theory, the ACPI thermal governor should be able to use the ACPI thermal zone information to keep the machine's temperature under control. In practice, the machine's ACPI configuration may be buggy or poorly tuned. The thermald configuration can be used to correct or optimise the system's thermal behaviour.

Some cooling devices may not have a thermal sysfs interface, so the thermald configuration allows one to connect such a device to a thermal zone. Sometimes a platform has thermal zones but no associated cooling actions, in which case the thermald configuration can be used to associate cooling devices with those thermal zones.
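To check which cooling devices already have a thermal sysfs interface, one can list the /sys/class/thermal/cooling_deviceN entries with their `type`, `cur_state` and `max_state` attributes. Controls that thermald drives through other interfaces, such as RAPL or the intel_pstate driver, may not appear here. A sketch with a hypothetical `list_cooling_devices` helper:

```sh
#!/bin/sh
# Hypothetical helper: list sysfs cooling devices with current/maximum state.
list_cooling_devices() {
    base="${1:-/sys/class/thermal}"
    for d in "$base"/cooling_device*; do
        [ -d "$d" ] || continue   # glob did not match: no cooling devices
        dtype=$(cat "$d/type" 2>/dev/null || echo unknown)
        cur=$(cat "$d/cur_state" 2>/dev/null || echo n/a)
        max=$(cat "$d/max_state" 2>/dev/null || echo n/a)
        echo "${d##*/}: type=$dtype state=$cur/$max"
    done
}

list_cooling_devices
```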

For more configuration information, consult the thermal-conf.xml man page.

Below is a very simple configuration that engages several passive cooling devices if the CPU package reaches 80 degrees C (temperatures are given in millidegrees Celsius, hence 80000). Each cooling device is given a different influence value so that thermald selects the devices in rank order, from highest to lowest influence.

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Example Laptop</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalSensors>
                <ThermalSensor>
                        <Type>pkg-temp-0</Type>
                        <Path>/sys/class/thermal/thermal_zone1/</Path>
                        <AsyncCapable>1</AsyncCapable>
                </ThermalSensor>
        </ThermalSensors>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu package</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>pkg-temp-0</SensorType>
                                        <Temperature>80000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                        <CoolingDevice>
                                                <index>1</index>
                                                <type>rapl_controller</type>
                                                <influence> 50 </influence>
                                                <SamplingPeriod> 10 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>2</index>
                                                <type>intel_pstate</type>
                                                <influence> 40 </influence>
                                                <SamplingPeriod> 10 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>3</index>
                                                <type>intel_powerclamp</type>
                                                <influence> 30 </influence>
                                                <SamplingPeriod> 10 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>4</index>
                                                <type>cpufreq</type>
                                                <influence> 20 </influence>
                                                <SamplingPeriod> 8 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>5</index>
                                                <type>Processor</type>
                                                <influence> 10 </influence>
                                                <SamplingPeriod> 5 </SamplingPeriod>
                                        </CoolingDevice>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

After modifying the configuration file, restart thermald so that it picks up the changes:

sudo service thermald restart

If the configuration is incorrect, or to watch what thermald is doing, stop the service and run thermald in the foreground with debug logging:

sudo service thermald stop
sudo thermald --no-daemon --loglevel=debug

Press Control-C to stop it. When the configuration looks correct, start the thermald service again:

sudo service thermald start

References

* https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon

* https://lists.01.org/mailman/listinfo/linux-thermal-daemon

Kernel/PowerManagement/ThermalIssues (last edited 2014-10-25 20:46:48 by dsmythies)