IdentifyingIssues

To minimise power consumpution applications need to be behaving correctly. Unfortunately some applications are far from perfect and cause unnecessary work for the system by causing frequent CPU wakeups or disk I/O activity.

Wakeups

Applications and daemons generally are not running 100% of the time and can be in an idle state waiting for I/O to complete. Many block (wait) for I/O to complete using poll() or select() and these allow the application to also timeout after a specified amount of time. The application may use the timeout to update some status or just re-do the poll() or select(). Badly behaving applications may wait for a very short time, causing multiple wakeups a second which impacts on the ability of the CPU to stay in a deep idle state. Some applications may even have a zero timeout, which causes a busy loop and never lets the CPU enter an idle state.

Identifying rogue processes is a two step process. Firstly, we need to identify tasks that are frequently waking up the system. Tools such as powertop help identify rogue tasks, for Ubuntu 12.04 Precise LTS we recommend using the older version of powertop, install it as follows:

sudo apt-get install powertop-1.13

and run for 60 seconds as follows:

sudo powertop-1.13 -d -t 60 > powertop.log

Looking at the powertop.log you will see a section titled "Top causes for wakeups:" and this will list running processes in order of wakeups per second. An excerpt from this is a follows:

   1.4% ( 10.0)   gwibber-service
   1.4% (  9.9)   ubuntuone-syncd
   0.8% (  5.5)   [ata_piix] <interrupt>
   0.7% (  5.0)   syndaemon

Lets examine ubuntuone-syncd. It appears to be waking up 9.9 times a second, so what is it doing? First we identify its process ID:

ps -e | grep ubuntuone-syncd
2331 ?        00:01:03 ubuntuone-syncd

..and now we then attach strace to that PID as follows:

sudo strace -p 2331
Process 2331 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
read(10, 0x7fffe58ab1e0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
read(22, 0x285a2c4, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout)
read(10, 0x7fffe58ab1e0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
read(22, 0x285a2c4, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout)

..hit control-C to stop strace.

The final argument to poll() [see man 2 poll] is the timeout (in milliseconds, and a negative value is in infinite timeout). In this case poll waits for 101 milliseconds and then returns 0, which indicates a timeout occurred and none of the file descriptors were ready for I/O. If one observes this process with strace for a while one notices that the poll is frequently timingout on the 101 millisecond wait, so this is effectively a 9.9Hz polling loop which needs fixing (see bug LP#906924).

Applications may also using select() in the same way, with the final argument being being the address of a struct timeval and a return of zero from select() indicating a timeout. See man 2 select for more details.

Process Activity

There are many useful tools to monitor process activity and allow one to find heavy CPU utilization, such as top, htop, ps and vmstat. Sometimes a seemingly idle system is busy running lots of very short lived processes that are sometimes hard to spot with these tools. Ubuntu 14.04 LTS introduces the forkstat tool to find and log process activity allowing one to identify these short lived processes. Most are legitimate, however, sometimes a rogue cron job or daemon may be creating many short lived power consuming redundant processes.

Disk Wakeups

Badly behaving applications may also be frequently waking up the disk drive and forcing data to be sync'd to disk using fsync() or sync(). For example, debug logging may be still turned on and frequently writing debug messages. Applications may have unnecessarily opened files with O_SYNC which depedening on the filesystem causes file data and metadata to be written to disk before returning from a writes, this causes disk wakeups and writes where instead data could be held in the buffer cache.

The easiest problem to identify is frequent disk I/O from seemingly idle applications. Simply run powertop-1.13 for 60 seconds as follows:

sudo powertop-1.13 -d -t 60 > powertop.log

Looking at the powertop.log you will see a section titled "Disc accesses:" and this will list processes that were writing to disk and the name of the file and device it was writing to. An excerpt from this is a follows:

Disk accesses:
The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite-journ' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'INBOX.msf' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5

Let us explore the 'indicator-weath' writes. The name is truncated because powertop fetches the name from the /proc/$pid/comm field. We need to identify the process ID and the full pathname to the file as follows:

ps -e | grep indicator-weath
 2024 ?        00:00:04 indicator-weath

lsof -p 2024 | grep indicator-weather.log
indicator 2024 ubuntu    3u   REG                8,5   154647 11796492 /home/ubuntu/.cache/indicator-weather.log

And since it is a log, we can see any recent activity:

tail -10 /home/king/.cache/indicator-weather.log
[Fetcher] 2012-01-16 16:57:52,805 - DEBUG - Weather: night, show 'weather-clear-night' icon
[Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: fill in menu with params: city='Gatwick', temp='Temperature: 3 ˚C', humid='Humidity: 65%', wind='Wind: E at 8 mph', sunrise='Sunrise: 07:59:41', sunset='Sunset: 16:22:13', puretemp=3 ˚C
[Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: menu_normal: filling in a menu for found places
[Fetcher] 2012-01-16 16:57:52,816 - DEBUG - Indicator: update_label: setting label to ' '
[Fetcher] 2012-01-16 16:57:52,821 - DEBUG - Indicator: setting refresh label to 'Refresh (just now)'
[Fetcher] 2012-01-16 16:57:52,823 - DEBUG - Indicator: scheduling refresh label update in 1 min
[Fetcher] 2012-01-16 16:57:52,833 - DEBUG - Weather: night, show 'weather-clear-night' icon
[Fetcher] 2012-01-16 16:57:52,835 - DEBUG - Settings: setting '23387567'='{'humidity': u'Humidity: 65%', 'sunrise': 'Sunrise: 07:59:41', 'sunset': 'Sunset: 16:22:13', 'wind': u'Wind: E at 8 mph', 'humidex': None, 'icon': 'weather-clear-night', 'temper': u'Temperature: 3 \u02daC', 'condition': u'Clear', 'label': '3 \xcb\x9aC'}'
[Fetcher] 2012-01-16 16:57:52,836 - DEBUG - Indicator: update_label: setting label to '3 ˚C'
[Fetcher] 2012-01-16 16:57:52,837 - DEBUG - Indicator: scheduling update in 15 mins

Looking at the log more deeply we can see that the applet is writing DEBUG messages to the log every minute which is keeping the disk drive from being in an idle state, so this needs fixing (e.g. LP#917253).

power-usage-report

power-usage-report is a script which calls fatrace for measuring file access activity from programs and powertop-1.13 to measure process and device wakeups. The script generates a report which is appropriate to attach to bug reports. It is now part of fatrace version 0.4 and is availble in Ubuntu 12.04 LTS Precise Pangolin.

To install:

sudo apt-get install fatrace powertop-1.13

and run:

sudo power-usage-report

The output includes several sections covering disk access statistics (which prevent the disk from spinning down), wakeups (causing CPU power usage) and device activity. The Disk/wakeups are sorted in descending order, ranked by process.

$ sudo power-usage-report
Measurement will begin in 5 seconds. Please make sure that the
computer is idle, i. e. do not press keys, start or operate programs, and that
programs are not busy with active tasks other than the one you want to examine.
Starting measurement for 60 seconds...
Measurement complete. Generating report...
======= unity-panel-ser: 4 file access events ======
/usr/share/zoneinfo/UTC: 1 reads
/etc/timezone:
/etc/localtime: 2 reads
/usr/share/zoneinfo/GB-Eire: 1 reads

======= deja-dup-monito: 2 file access events ======
/etc/localtime: 2 reads

======= gnome-settings-: 1 file access events ======
/etc/fstab: 1 reads

====== Wakeups ======
  36.2% ( 61.8)   [i915] <interrupt>
  32.8% ( 55.9)   compiz
  13.1% ( 22.3)   java
   3.2% (  5.5)   [ahci] <interrupt>
   2.1% (  3.6)   swapper/2
   1.1% (  2.0)   [iwlwifi] <interrupt>
[...]

====== Devices ======
An audio device is active 100.0% of the time:
hwC0D2 Realtek ALC889 

A USB device is active 100.0% of the time:
/sys/bus/usb/devices/1-1

Recent USB suspend statistics
Active  Device name
100.0%  USB device 1-1.6 : USB Multimedia Keyboard (LITEON Technology)
  0.0%  /sys/bus/usb/devices/2-1
100.0%  /sys/bus/usb/devices/1-1
  0.0%  USB device usb4 : xHCI Host Controller (Linux 3.3.0-1-generic xhci_hcd)
  0.0%  USB device usb3 : xHCI Host Controller (Linux 3.3.0-1-generic xhci_hcd)
  0.0%  USB device usb2 : EHCI Host Controller (Linux 3.3.0-1-generic ehci_hcd)
100.0%  USB device usb1 : EHCI Host Controller (Linux 3.3.0-1-generic ehci_hcd)

Runtime Device Power Management statistics
Active  Device name
  0.0%  04:00.0 FireWire (IEEE 1394): Texas Instruments XIO2200A IEEE-1394a-2000 Controller (PHY/Link) 
  0.0%  02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 
  0.0%  00:1f.2 SATA controller: Intel Corporation Panther Point 6 port SATA Controller [AHCI mode] 
  0.0%  00:1d.0 USB controller: Intel Corporation Panther Point USB Enhanced Host Controller #1 
[...]

Devices without runtime PM

03:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge 
00:1f.3 SMBus: Intel Corporation Panther Point SMBus Controller 
00:1f.0 ISA bridge: Intel Corporation Panther Point LPC Controller 
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge 

Recent audio activity statistics
Active  Device name
100.0%  hwC0D2 Realtek ALC889 
100.0%  hwC0D3 Intel PantherPoint HDMI 

Recent SATA AHCI link activity statistics
Active  Partial Slumber Device name

Deeper Inspection with health-check

health-check monitors processes in various ways to help identify areas where it is consuming too many resources. One can trace one or more processes (including all their threads and child processes too) for a full story of system activity.

to install:

sudo add-apt-repository ppa:colin-king/white
sudo apt-get update && sudo apt-get install health-check

One can attach to a running process using the -p option and follow all threads using the -f option, for example to do 120 seconds of tracing on compiz:

sudo health-check -f -p compiz -d 120

One also start a process and trace it too:

sudo health-check -u youruserid -f firefox

Consult the health-check manual page for more options.

Deeper Inspection with Cachegrind

Cachegrind simulates the cache interactation with a given program. It simulates a machine with independent first-level instruction (I1) and data caches (D1) and also a unified second-level cache (L2) as found on many modern computers.

Cachegrind is tool inside valgrind, to install use:

sudo apt-get install valgrind

and to run, use:

valgrind --tool=cachegrind program-name

For more in-depth details, refer to the cache-grind documentation

Filing Bugs

If you identify rogue applications or daemons that are frequently looping on poll()/select() or needlessly doing frequent writes to disk then file a bug against the application and also add "The Ubuntu Power Consumption Project" using "Also affects project".

Kernel/PowerManagement/IdentifyingIssues (last edited 2014-03-21 09:51:25 by colin-king)