<> == Preable == To minimise power consumpution applications need to be behaving correctly. Unfortunately some applications are far from perfect and cause unnecessary work for the system by causing frequent CPU wakeups or disk I/O activity. The following describes how to track down issues using tools available in Ubuntu and also some in [[https://launchpad.net/~colin-king/+archive/white|ppa:colin-king/white]] == Wakeups == Applications and daemons generally are not running 100% of the time and can be in an idle state waiting for I/O to complete. Many block (wait) for I/O to complete using poll() or select() and these allow the application to also timeout after a specified amount of time. The application may use the timeout to update some status or just re-do the poll() or select(). Badly behaving applications may wait for a very short time, causing multiple wakeups a second which impacts on the ability of the CPU to stay in a deep idle state. Some applications may even have a zero timeout, which causes a busy loop and never lets the CPU enter an idle state. === Using PowerTop === Identifying rogue processes is a two step process. Firstly, we need to identify tasks that are frequently waking up the system. Tools such as powertop help identify rogue tasks, for Ubuntu 12.04 Precise LTS we recommend using the older version of powertop, install it as follows: {{{sudo apt-get install powertop-1.13}}} and run for 60 seconds as follows: {{{sudo powertop-1.13 -d -t 60 > powertop.log}}} Looking at the powertop.log you will see a section titled "Top causes for wakeups:" and this will list running processes in order of wakeups per second. An excerpt from this is a follows: {{{ 1.4% ( 10.0) gwibber-service 1.4% ( 9.9) ubuntuone-syncd 0.8% ( 5.5) [ata_piix] 0.7% ( 5.0) syndaemon }}} Lets examine ubuntuone-syncd. It appears to be waking up 9.9 times a second, so what is it doing? First we identify its process ID: {{{ ps -e | grep ubuntuone-syncd 2331 ? 00:01:03 ubuntuone-syncd }}} ..and now we then attach strace to that PID as follows: {{{ sudo strace -p 2331 Process 2331 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 read(10, 0x7fffe58ab1e0, 16) = -1 EAGAIN (Resource temporarily unavailable) read(22, 0x285a2c4, 4096) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout) read(10, 0x7fffe58ab1e0, 16) = -1 EAGAIN (Resource temporarily unavailable) read(22, 0x285a2c4, 4096) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout) }}} ..hit control-C to stop strace. The final argument to poll() [see man 2 poll] is the timeout (in milliseconds, and a negative value is in infinite timeout). In this case poll waits for 101 milliseconds and then returns 0, which indicates a timeout occurred and none of the file descriptors were ready for I/O. If one observes this process with strace for a while one notices that the poll is frequently timingout on the 101 millisecond wait, so this is effectively a 9.9Hz polling loop which needs fixing (see bug LP#906924). Applications may also using select() in the same way, with the final argument being being the address of a struct timeval and a return of zero from select() indicating a timeout. See man 2 select for more details. === Using eventstat === Eventstat is specifically designed to find system wakeup events, and by default will dump out all the causes of wakeups every second. It is a lightweight tool and runs well on low-power platforms. One can also specify the period between each measurement and number of measurments to take, for example: {{{ sudo apt-get install eventstat sudo eventstat 10 6 -r results.csv }}} ..this will dump out event causes every 10 seconds 6 times in total and write a report into a comma separated file results.csv The CSV data can be then parsed or imported into tools such as LibreOffice and graphs or reports generated from the captured data. For more information, refer to [[http://kernel.ubuntu.com/~cking/eventstat/|Eventstat]] and the eventstat man page. === Using pidstat === pidstat can be used to find out which processes are context switching the most. Voluntary context switches should be minimised where possible (since the program relinquishes control by itself); involuntary context switches are generally unavoidable if a process is being interrupted. pidstat -w will show the contex switching stats, for example, install and run using: {{{ sudo apt-get install sysstat pidstat -w }}} The cswch/s column reports the number of voluntary context switches and the nvcswch/s column reports the involuntary context switches. == Process Activity == There are many useful tools to monitor process activity and allow one to find heavy CPU utilization, such as top, htop, ps and vmstat. Sometimes a seemingly idle system is busy running lots of very short lived processes that are sometimes hard to spot with these tools. Ubuntu 14.04 LTS introduces the forkstat tool to find and log process activity allowing one to identify these short lived processes. Most are legitimate, however, sometimes a rogue cron job or daemon may be creating many short lived power consuming redundant processes. forkstat example - run fortstat for 40 seconds and show fork/exec/exit stats at end of run: {{{ sudo apt-get install forkstat forkstat -D 40 -S }}} find just coredumps and processes that exit: {{{ forkstat -e core,exit }}} Refer to [[http://kernel.ubuntu.com/~cking/forkstat|forkstat]] or the forkstat man page for more details. == Memory Activity == It is also useful to see if any processes are actively leaking or consuming more memory than expected. Memory can be shared amongst processes, so it makes sense to monitor the shared memory to get a clearer view of fair memory utilisation. For example, consider 128K shared between 2 processes, this is fairly reported by the shared memory allocations as 64K per process. Introduced into Ubuntu 14.10 will be the smemstat tool that can view either the current memory allocations or the change in memory over time. To view the current memory allocations of all the user space processes, use: {{{ sudo apt-get install smemstat sudo smemstat }}} and to view the memory change over time, for example, every 10 seconds, 60 times: {{{ sudo smemstat 10 60 }}} One can even specify which processes to monitor, for example, monitor every 5 seconds firefox and thunderbird: {{{ sudo smemstat -p thunderbird,firefox 5 }}} Refer to [[http://kernel.ubuntu.com/~cking/smemstat/|smemstat]] or the smemstat man page for more details. == Disk Wakeups == Badly behaving applications may also be frequently waking up the disk drive and forcing data to be sync'd to disk using fsync() or sync(). For example, debug logging may be still turned on and frequently writing debug messages. Applications may have unnecessarily opened files with O_SYNC which depedening on the filesystem causes file data and metadata to be written to disk before returning from a writes, this causes disk wakeups and writes where instead data could be held in the buffer cache. The easiest problem to identify is frequent disk I/O from seemingly idle applications. Simply run powertop-1.13 for 60 seconds as follows: {{{sudo powertop-1.13 -d -t 60 > powertop.log}}} Looking at the powertop.log you will see a section titled "Disc accesses:" and this will list processes that were writing to disk and the name of the file and device it was writing to. An excerpt from this is a follows: {{{ Disk accesses: The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite-journ' on /dev/ecryptfs The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite' on /dev/ecryptfs The application 'thunderbird-bin' is writing to file 'INBOX.msf' on /dev/ecryptfs The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5 The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5 The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5 The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5 The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5 The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5 The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5 The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5 }}} Let us explore the 'indicator-weath' writes. The name is truncated because powertop fetches the name from the /proc/$pid/comm field. We need to identify the process ID and the full pathname to the file as follows: {{{ ps -e | grep indicator-weath 2024 ? 00:00:04 indicator-weath }}} {{{ lsof -p 2024 | grep indicator-weather.log indicator 2024 ubuntu 3u REG 8,5 154647 11796492 /home/ubuntu/.cache/indicator-weather.log }}} And since it is a log, we can see any recent activity: {{{ tail -10 /home/king/.cache/indicator-weather.log [Fetcher] 2012-01-16 16:57:52,805 - DEBUG - Weather: night, show 'weather-clear-night' icon [Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: fill in menu with params: city='Gatwick', temp='Temperature: 3 ˚C', humid='Humidity: 65%', wind='Wind: E at 8 mph', sunrise='Sunrise: 07:59:41', sunset='Sunset: 16:22:13', puretemp=3 ˚C [Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: menu_normal: filling in a menu for found places [Fetcher] 2012-01-16 16:57:52,816 - DEBUG - Indicator: update_label: setting label to ' ' [Fetcher] 2012-01-16 16:57:52,821 - DEBUG - Indicator: setting refresh label to 'Refresh (just now)' [Fetcher] 2012-01-16 16:57:52,823 - DEBUG - Indicator: scheduling refresh label update in 1 min [Fetcher] 2012-01-16 16:57:52,833 - DEBUG - Weather: night, show 'weather-clear-night' icon [Fetcher] 2012-01-16 16:57:52,835 - DEBUG - Settings: setting '23387567'='{'humidity': u'Humidity: 65%', 'sunrise': 'Sunrise: 07:59:41', 'sunset': 'Sunset: 16:22:13', 'wind': u'Wind: E at 8 mph', 'humidex': None, 'icon': 'weather-clear-night', 'temper': u'Temperature: 3 \u02daC', 'condition': u'Clear', 'label': '3 \xcb\x9aC'}' [Fetcher] 2012-01-16 16:57:52,836 - DEBUG - Indicator: update_label: setting label to '3 ˚C' [Fetcher] 2012-01-16 16:57:52,837 - DEBUG - Indicator: scheduling update in 15 mins }}} Looking at the log more deeply we can see that the applet is writing DEBUG messages to the log every minute which is keeping the disk drive from being in an idle state, so this needs fixing (e.g. LP#917253). === iotop === iotop is a useful tool for showing I/O activity across the system and helps to identify the top I/O producing processes. To install and run, use: {{{ sudo apt-get install iotop sudo iotop }}} == power-usage-report == [[http://bazaar.launchpad.net/~pitti/fatrace/trunk/view/head:/power-usage-report|power-usage-report]] is a script which calls fatrace for measuring file access activity from programs and powertop-1.13 to measure process and device wakeups. The script generates a report which is appropriate to attach to bug reports. It is now part of fatrace version 0.4 and is availble in Ubuntu 12.04 LTS Precise Pangolin. To install: {{{sudo apt-get install fatrace powertop-1.13}}} and run: {{{sudo power-usage-report}}} The output includes several sections covering disk access statistics (which prevent the disk from spinning down), wakeups (causing CPU power usage) and device activity. The Disk/wakeups are sorted in descending order, ranked by process. {{{ $ sudo power-usage-report Measurement will begin in 5 seconds. Please make sure that the computer is idle, i. e. do not press keys, start or operate programs, and that programs are not busy with active tasks other than the one you want to examine. Starting measurement for 60 seconds... Measurement complete. Generating report... ======= unity-panel-ser: 4 file access events ====== /usr/share/zoneinfo/UTC: 1 reads /etc/timezone: /etc/localtime: 2 reads /usr/share/zoneinfo/GB-Eire: 1 reads ======= deja-dup-monito: 2 file access events ====== /etc/localtime: 2 reads ======= gnome-settings-: 1 file access events ====== /etc/fstab: 1 reads ====== Wakeups ====== 36.2% ( 61.8) [i915] 32.8% ( 55.9) compiz 13.1% ( 22.3) java 3.2% ( 5.5) [ahci] 2.1% ( 3.6) swapper/2 1.1% ( 2.0) [iwlwifi] [...] ====== Devices ====== An audio device is active 100.0% of the time: hwC0D2 Realtek ALC889 A USB device is active 100.0% of the time: /sys/bus/usb/devices/1-1 Recent USB suspend statistics Active Device name 100.0% USB device 1-1.6 : USB Multimedia Keyboard (LITEON Technology) 0.0% /sys/bus/usb/devices/2-1 100.0% /sys/bus/usb/devices/1-1 0.0% USB device usb4 : xHCI Host Controller (Linux 3.3.0-1-generic xhci_hcd) 0.0% USB device usb3 : xHCI Host Controller (Linux 3.3.0-1-generic xhci_hcd) 0.0% USB device usb2 : EHCI Host Controller (Linux 3.3.0-1-generic ehci_hcd) 100.0% USB device usb1 : EHCI Host Controller (Linux 3.3.0-1-generic ehci_hcd) Runtime Device Power Management statistics Active Device name 0.0% 04:00.0 FireWire (IEEE 1394): Texas Instruments XIO2200A IEEE-1394a-2000 Controller (PHY/Link) 0.0% 02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 0.0% 00:1f.2 SATA controller: Intel Corporation Panther Point 6 port SATA Controller [AHCI mode] 0.0% 00:1d.0 USB controller: Intel Corporation Panther Point USB Enhanced Host Controller #1 [...] Devices without runtime PM 03:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge 00:1f.3 SMBus: Intel Corporation Panther Point SMBus Controller 00:1f.0 ISA bridge: Intel Corporation Panther Point LPC Controller 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge Recent audio activity statistics Active Device name 100.0% hwC0D2 Realtek ALC889 100.0% hwC0D3 Intel PantherPoint HDMI Recent SATA AHCI link activity statistics Active Partial Slumber Device name }}} == Deeper Inspection with health-check == [[http://kernel.ubuntu.com/~cking/health-check/|health-check]] monitors processes in various ways to help identify areas where it is consuming too many resources. One can trace one or more processes (including all their threads and child processes too) for a full story of system activity. to install: {{{ sudo apt-get install health-check }}} One can attach to a running process using the -p option and follow all threads using the -f option, for example to do 120 seconds of tracing on compiz: {{{ sudo health-check -f -p compiz -d 120 }}} One also start a process and trace it too: {{{ sudo health-check -u youruserid -f firefox }}} Consult [[http://kernel.ubuntu.com/~cking/health-check/|health-check]] or the health-check manual page for more options. == Deeper Inspection with Cachegrind == Cachegrind simulates the cache interactation with a given program. It simulates a machine with independent first-level instruction (I1) and data caches (D1) and also a unified second-level cache (L2) as found on many modern computers. Cachegrind is tool inside valgrind, to install use: {{{ sudo apt-get install valgrind }}} and to run, use: {{{ valgrind --tool=cachegrind program-name valgrind --tool=cachegrind ps ==4338== Cachegrind, a cache and branch-prediction profiler ==4338== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al. ==4338== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==4338== Command: ps ==4338== --4338-- warning: L3 cache found, using its data for the LL simulation. PID TTY TIME CMD 4191 pts/16 00:00:00 bash 4338 pts/16 00:00:01 cachegrind-amd6 ==4338== ==4338== I refs: 7,827,007 ==4338== I1 misses: 8,054 ==4338== LLi misses: 2,280 ==4338== I1 miss rate: 0.10% ==4338== LLi miss rate: 0.02% ==4338== ==4338== D refs: 2,318,066 (1,554,969 rd + 763,097 wr) ==4338== D1 misses: 5,480 ( 4,470 rd + 1,010 wr) ==4338== LLd misses: 3,645 ( 2,798 rd + 847 wr) ==4338== D1 miss rate: 0.2% ( 0.2% + 0.1% ) ==4338== LLd miss rate: 0.1% ( 0.1% + 0.1% ) ==4338== ==4338== LL refs: 13,534 ( 12,524 rd + 1,010 wr) ==4338== LL misses: 5,925 ( 5,078 rd + 847 wr) ==4338== LL miss rate: 0.0% ( 0.0% + 0.1% ) }}} Note: I = Instruction cache, D = Data cache Many caches misses may indicate that the code could be optimised to be more cache efficient, for example, 2D array re-ordering or try to localise memory accesses in critical highly used data structures. For more in-depth details, refer to the cache-grind [[http://valgrind.org/docs/manual/cg-manual.html|documentation]] == Finding Hot-Spots in Programs == If one finds that a program needs some optimisation then perf is an excellent tool gain some excellent low-level analysis of where all the CPU cycles are being consumed. Install perf from the linux-tools package, one has to select the appropriate package since this is tied to the kernel, for example, for a LG Nexus 4 (mako), use: {{{ sudo apt-get install linux-mako-tools }}} or more generally use: {{{ sudo apt-get install linux-tools }}} Perf is very powerful and can be directed to perform many kinds of deep profiling, so the following section is just a very brief primer to illustrate some of the uses of perf: === Example #1, "tree" report === Record Perf samples at 1000 samples/sec for 1 minute: {{{ sudo perf record -a -g -F 1000 sleep 60 }}} And report once complete: {{{ perf report > report-tree.log }}} === Example #2, No tree recording === {{{ sudo perf record -a -F 1000 sleep 60 perf report > report.log }}} === Example #3, See snapshot of CPU activity in cycles and cache stalls === {{{ sudo perf stat -v -d -p PID }}} .. and press ^C to stop === Example #4, View current activity in real-time === {{{ sudo perf top }}} See also the perf [[https://perf.wiki.kernel.org/index.php/Tutorial|Tutorial]] As said, perf is really useful as spotting where the CPU cycle hot spots are occuring in code, and hence where to try and optimise these code paths. == System Loading == Sometimes it is useful to know how much power a device consumes when fully loaded or for how long it can run on battery before it runs out of power. stress-ng is clean room implementation of the stress utility with plenty of more options and better ways to stress out a machine. Stress-ng is used in the QA lab to measure the power consumed of some of the kernel core components (such as the scheduler). Install stress-ng (currently only in PPA:colin-king/white) {{{ sudo add-apt-repository ppa:colin-king/white sudo apt-get update sudo apt-get install stress-ng }}} Stress CPU, IO, context switching and pipe I/O for 60 seconds on 4 CPUs: {{{ stress-ng -c 4 -d 4 -s 4 -p 4 -t 60 }}} See [[http://kernel.ubuntu.com/~cking/stress-ng|stress-ng]] and the stress-ng man page for more details of all the stress tests available. == Battery Power Monitoring == Sometimes it can be instructive to get a rough idea of how much power a device is consuming by monitoring the battery discharge rate. This is *NOT* recommended for exact measurements as it is based on the derived from the battery gas gauge which can be subject to internal hardware and firmware smoothing, tweaking and re-calibrations. For some analysis on this, refer to [[http://smackerelofopinion.blogspot.co.uk/2012/11/non-linear-characteristics-in-draining.html|this article]] Using powerstat, one can gather reasonable battery data and one is forced to run the test over a fairly long duration for better results. Powerstat is much like vmstat, but shows power consumption stats too. To install and use: {{{ sudo apt-get install powerstat }}} Then ensure the device is not charging and run: {{{ powerstat }}} or to see more process activity, use: {{{ sudo powerstat }}} It is worth saying again that the values are derived from the battery and so are subject to the '''''limited accuracy''''' of the gas gauge in the device. See also [[http://kernel.ubuntu.com/~cking/powerstat/|powerstat]] and the powerstat man page. == Estimating CPU power consumption == It may be useful to get some idea of how much power is consumed when exercising the CPU on some devices. The power-calibrate tool attempts to get an estimate of the power consumed for each 1% of CPU utilisation and for 1 wakeup context switch. power-calibrate exercises the CPU multiple times and makes hundreds of battery capacity based power measurements and computes a linear best fit on the range of CPU load results to calculate the power utilisation. It also calculates the R^2 coefficient of determination, values approaching 1.0 indicate a good linear fit and hence some trust worthy results. The output can be dumped into a JSON formatted file that can be combined with the JSON output from health-check allowing one to roughly estimate the CPU power consumption of an application. For example: {{{ sudo add-apt-repository ppa:colin-king/white sudo apt-get update power-calibrate -c -o calibration.json CPU load User Nice Sys Idle IO Run Ctxt/s IRQ/s Watts Volts Amps 10% x 1 13.2 0.0 2.2 84.5 0.1 1.2 328.0 200.5 0.609 3.771 0.162 20% x 1 25.6 0.0 2.0 72.4 0.0 1.3 408.8 231.0 0.709 3.761 0.189 30% x 1 25.1 0.0 2.4 72.5 0.1 1.4 476.2 255.0 0.753 3.754 0.201 40% x 1 42.5 0.0 2.9 54.7 0.0 1.4 546.9 280.6 0.864 3.744 0.231 50% x 1 36.1 0.0 2.6 61.3 0.0 1.6 612.3 297.9 0.993 3.734 0.266 60% x 1 37.0 0.0 2.2 60.7 0.1 1.5 590.2 319.1 0.962 3.730 0.258 70% x 1 35.4 0.0 0.7 63.9 0.1 1.8 375.8 334.3 1.090 3.721 0.293 80% x 1 40.9 0.0 0.6 58.3 0.2 1.8 348.4 314.9 1.115 3.714 0.300 90% x 1 45.8 0.0 0.6 53.4 0.2 1.9 347.2 344.2 1.255 3.705 0.339 100% x 1 51.0 0.0 0.8 48.1 0.1 2.2 342.9 329.4 1.320 3.699 0.357 Power (Watts) = (% CPU load * 0.018912) + 0.267133 1% CPU load is about 0.018912 Watts (about 5.066016 mA) Coefficient of determination R^2 = 0.758722 (fair) }}} '''NOTE:''' This can take around 1 hour to calibrate. Also, it is worth disabling deep sleep on phone devices using powerd-cli and setting the display to always on and suspending powerd-cli and then running power-calibrate. '''NOTE:''' Depending on the battery and device the values may be fair to good. If one requires accurate data then one has to resort to using a high precision multimeter instead. Once the device is calibrated, one can combine the data with health-check for a rough estimate of power consumption for a particular program's execution. E.g. gather data with health-check: {{{ sudo health-check -f -c -o unity.json -d 60 -p unity8 }}} And then combine these JSON files with the [[http://kernel.ubuntu.com/git?p=cking/power-calibrate.git;a=blob;f=scripts/power-calibrate-json-parse.py|script in the power-calibrate git repository]] as follows: {{{ ./power-calibrate-json-parse.py calibration.json unity.json }}} ..and this computes an '''''estimated power estimate'''''. == Ubuntu Phone Wakelock Detection == Wakelocks on phone devices stop the device going into deep suspend. Deep suspend is the lowest power state except for turning the device off. The suspend-blocker tool can be used to examine wakelock utilisation and can also examine the kernel log to detect reasons for suspend failures or early suspend aborts. To use suspend-blocker: {{{ sudo add-apt-repository ppa:colin-king/white sudo apt-get update sudo apt-get install suspend-blocker }}} === Active Wakelock Detection === suspend-blocker with the -w option is used to see what is keeping device from sleeping. For example, to monitor wakelocks for 120 seconds use: {{{ sudo suspend-blocker -w 120 Wakelock Active Count Expire Wakeup Total Sleep Prevent Name count count count time % time % time % mmc0_detect 0.00 0.00 0.00 0.00 0.00 0.00 0.00 msm_hsic_host 0.00 23.00 0.00 0.00 29.34 29.34 0.00 msm_otg 0.00 0.00 0.00 0.00 100.00 100.00 0.00 pm8921_eoc 0.00 0.00 0.00 0.00 100.00 100.00 0.00 qcom_rx_wakelock 0.00 13.00 13.00 0.00 0.59 0.57 0.00 qmuxd_port_wl_9 0.00 23.00 0.00 0.00 0.00 0.00 0.00 wlan 0.00 4.00 2.00 0.00 2.55 2.54 0.00 Requested test duration: 120.00 seconds, actual duration: 120.11 seconds }}} Note that this was run with USB + adb which blocks deep suspend, hence the msm_otg wakelock is 100% (totally blocking deep suspend). The larger the "Total time%" the greater % time the device has been blocked from deep suspend. We need to aim to reduce these as much as possible. === Kernel Log History === suspend-blocker can also analyse the output from the kernel log to see the past history of wakeup activity, showing blocking wakelocks names, histogram of times between suspend and resume and a list of causes of resume: {{{ sudo add-apt-repository ppa:colin-king/white sudo apt-get update suspend-blocker -b -r -H /var/log/kern.log /var/log/kern.log: Suspend blocking wakelocks: wlan 3 60.00% qcom_rx_wakelock 2 40.00% Resume wakeup causes: Time between successful suspends: 0.000 - 0.124 seconds 73 31.06% 0.125 - 0.249 seconds 0 0.00% 0.250 - 0.499 seconds 3 1.28% 0.500 - 0.999 seconds 0 0.00% 1.000 - 1.999 seconds 152 64.68% 2.000 - 3.999 seconds 2 0.85% 4.000 - 7.999 seconds 2 0.85% 8.000 - 15.999 seconds 0 0.00% 16.000 - 31.999 seconds 1 0.43% 32.000 - 63.999 seconds 2 0.85% Duration of successful suspends: 0.250 - 0.499 seconds 17 7.20% 0.500 - 0.999 seconds 20 8.47% 1.000 - 1.999 seconds 67 28.39% 2.000 - 3.999 seconds 36 15.25% 4.000 - 7.999 seconds 55 23.31% 8.000 - 15.999 seconds 34 14.41% 16.000 - 31.999 seconds 7 2.97% Suspends: 270 suspends aborted (53.36%). 236 suspends succeeded (46.64%). total time: 1071.260801 seconds (73.41%). minimum: 0.399514 seconds. maximum: 28.996670 seconds. mean: 4.539241 seconds. mode: 1 seconds. median: 2 seconds. Time between successful suspends: total time: 388.091768 seconds (26.59%). minimum: 0.000793 seconds. maximum: 63.177278 seconds. mean: 1.651454 seconds. mode: 1 seconds. median: 1 seconds. }}} Refer to [[http://kernel.ubuntu.com/~cking/suspend-blocker/|suspend-blocker]] or the suspend-blocker man page for more details. == CPU utilisation == There are many tools to look at CPU utilisation, such as top, vmstat, etc. cpustat is a very lightweight process monitoring tool that simply reports the per process CPU utilisation in terms of user time, system time and total time. It is less sophisticated than top, but has a smaller memory and CPU footprint, ideal for small devices such as low power phones. To use, install with: {{{ sudo add-apt-repository ppa:colin-king/white sudo apt-get update sudo apt-get install cpustat }}} and run, for example, report CPU utilisation over 60 seconds just once: {{{ sudo cpustat 60 1 %CPU %USR %SYS PID Task 0.72 0.02 0.70 1080 /system/bin/mpdecision 0.35 0.23 0.12 2280 unity8 0.18 0.10 0.08 1450 /usr/lib/upower/upowerd 0.13 0.00 0.13 3893 [kworker/0:2] 0.13 0.00 0.13 3941 [kworker/0:3] 0.13 0.12 0.02 2238 /usr/lib/arm-linux-gnueabihf/indicator-power/indicator-power-service 0.12 0.00 0.12 4040 ./cpustat 0.12 0.08 0.03 726 dbus-daemon 0.10 0.05 0.05 2342 maliit-server 0.08 0.08 0.00 4038 [kworker/u:1] 0.07 0.05 0.02 1345 NetworkManager 0.07 0.03 0.03 1 /sbin/init 0.07 0.00 0.07 133 [mmcqd/0] 0.07 0.00 0.07 914 /lib/systemd/systemd-udevd 0.07 0.00 0.07 1887 init 0.05 0.00 0.05 1066 /sbin/healthd 0.05 0.00 0.05 3140 [kworker/0:1] 0.05 0.00 0.05 2669 [kworker/u:33] 0.03 0.00 0.03 1283 [MC_Thread] 0.03 0.02 0.02 1968 upstart-event-bridge 0.03 0.02 0.02 2016 upstart-dbus-bridge 0.03 0.03 0.00 893 /usr/bin/powerd 0.03 0.00 0.03 1079 /system/bin/thermald 0.03 0.02 0.02 908 upstart-udev-bridge 0.02 0.02 0.00 2250 /usr/lib/arm-linux-gnueabihf/indicator-datetime/indicator-datetime-service 0.02 0.00 0.02 707 rsyslogd 0.02 0.00 0.02 2227 /usr/lib/arm-linux-gnueabihf/indicator-messages/indicator-messages-service 0.02 0.02 0.00 2011 upstart-dbus-bridge 0.02 0.02 0.00 1071 /system/bin/rild 0.02 0.00 0.02 2281 /usr/lib/arm-linux-gnueabihf/indicator-network/indicator-network-service 0.02 0.00 0.02 852 [flush-179:0] 0.02 0.02 0.00 1984 /usr/lib/telepathy/mission-control-5 0.02 0.02 0.00 964 /usr/lib/accountsservice/accounts-daemon 0.02 0.02 0.00 699 upstart-file-bridge }}} Also refer to [[http://kernel.ubuntu.com/~cking/cpustat/|cpustat]] and the cpustat man page. == CPU P and C states == Studying the time the processors spend at different performance states (P states) and different power states (C states) as well as interrupt activity can also help identify if an "idle" machine is actually busier than it should be. Tools such as [[https://wiki.linaro.org/WorkingGroups/PowerManagement/Resources/Tools/Idlestat|idlestat]] (Ubuntu 14.10 onwards) can be helpful to identify busy states. {{{ sudo idlestat --trace -f trace.log -t 60 Total trace buffer: 1846688 kB Log is 60.069154 secs long with 23580 events clusterA@state hits total(us) avg(us) min(us) max(us) C7-IVB 6707 54894275.00 8184.62 0.00 20103.00 core0@state hits total(us) avg(us) min(us) max(us) C1-IVB 1 1048.00 1048.00 0.00 1048.00 C7-IVB 4764 57593327.00 12089.28 0.00 193461.00 cpu0@state hits total(us) avg(us) min(us) max(us) C1-IVB 12 22896.00 1908.00 8.00 7980.00 C1E-IVB 33 16155.00 489.55 12.00 11194.00 C3-IVB 20 6007.00 300.35 69.00 443.00 C7-IVB 3239 58182733.00 17963.18 1.00 592587.00 2501 9 1256644.00 139627.11 1.00 1223175.00 2200 16 4789.00 299.31 1.00 3794.00 2100 7 393.00 56.14 17.00 93.00 2000 7 190.00 27.14 2.00 74.00 1400 38 3564.00 93.79 18.00 585.00 1200 3087 542216.00 175.64 1.00 12382.00 cpu0 wakeups name count irq043 ahci 18 irq045 i915 16 irq009 acpi 40 ... }}} == Filing Bugs == If you identify rogue applications or daemons that are frequently looping on poll()/select() or needlessly doing frequent writes to disk then file a bug against the application and also add [[https://bugs.launchpad.net/ubuntu-power-consumption|"The Ubuntu Power Consumption Project"]] using "Also affects project". == Analysis Techniques == Tracking down issues in entire systems and individual programs may require some careful analysis. Below are some recommendations: === Correlating measurements === One can learn a lot about system activity by gathering data from several sources and correlating the data. For example, gathering system activity data using vmstat, iostat as well as measuring power consumption using a multimeter. To easily compare data, a simple trick is to find the min and max of each data source and normalise the data to a range of 0..1 and then compare data sources by simply plotting the normalised data. This allows one to quickly spot trends, for example the number of interrupts is related to I/O activity and peaks in system CPU load and power consumption. === Gather data multiple times === Gathering data via multimeters or system tools should be run several times against known stable repeatable tests. It is useful to then compare the samples and calculate the standard deviation of a set of results to see how much variation there is in the sample set. === Double check results === Simply put, do the results look sane? Are they repeatable? Never trust a single set of samples. Is there a trend, for example, plotting CPU utilisation against power consumption - is it linear, polynomial or exponential, or do you see lots of variations that mean the data is generally not stable? === Ensure the test environment is "clean" === Sporadic system activity can peturb results, so it is useful to try to create an test environment on the target device that is stable and not busy from other activity (such as cron jobs running). === Assume Nothing === Don't bring any pre-conceived ideas into the results. Don't try and make the results match your pre-conceived ideas - if there are anomalies in the results don't assume they are wrong, they may be hinting at something deeper that one has overlooked.