Testing


=== Power Consumption ===

We have 7 tests; each exercises the system in a particular way, and each test produces 5 results:

 * average current drawn in mA (milliamps)
 * maximum current drawn in mA
 * minimum current drawn in mA
 * standard deviation (in mA)
 * test duration in seconds

Each test runs either for a specified number of power measurements (which I call "samples") or until a task is completed. Some tests, such as the idle-system test, run for a fixed time; for these we are just interested in the average current drawn over time. However, other tests, such as the I/O activity tests, run until a specific amount of data is copied, so we can see whether we improve performance at the cost of power consumed. For example, a job may run faster *and* use the same power per second, which means the total power consumed is less. Or a job may run faster but use more power per second, so the total power consumed is more.

Each set of results from a test is based on taking multiple samples from a high-precision Fluke multimeter and then calculating the average, min, max and standard deviation of the samples gathered. The standard deviation gives us an idea of how variable the sampled data is, and hence how confident we can be in the accuracy of the results. For example, a test may produce a bad average result, but if we can see that the standard deviation is high, we know that the data is not very accurate for that test.

So we are really interested in the average measurement, as this shows the average current drawn when running a test. However, it is also useful to track the min and max values to see what the upper and lower bounds are. The standard deviation indicates how reliable the results are for a given test. Finally, the duration of the test lets us see whether a test is getting slower or faster each time we run it.
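As a sketch of how these five summary values can be derived (this is an illustration only, not the actual harness code, and the sample values are made up), given one current sample in mA per line:

```shell
# Illustrative only: compute average, min, max and standard deviation from
# a list of current samples.  The real harness reads the samples from the
# Fluke multimeter; the values below are invented for the example.
echo "231 228 235 229 240" | tr ' ' '\n' | awk '
  { sum += $1; sumsq += $1 * $1
    if (NR == 1 || $1 < min) min = $1
    if (NR == 1 || $1 > max) max = $1 }
  END {
    avg = sum / NR
    sd  = sqrt(sumsq / NR - avg * avg)    # population standard deviation
    printf "avg=%.1f mA min=%d mA max=%d mA sd=%.1f mA n=%d\n", avg, min, max, sd, NR
  }'
```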

==== dd_copy ====

This copies 262144 MB of data from /dev/zero via cat (twice) and then to /dev/zero. It exercises the kernel's ability to do context switching and to copy data between kernel and user space. It is a simple but effective way of exercising efficient memory copying in the kernel and user space, it produces a lot of page I/O traffic, and it also pushes the kernel scheduler. It is useful to track how much current is being drawn *and* the duration of the test, since we can see whether the copying is becoming more efficient in terms of both current drawn and duration. We may also catch scheduler regressions with this.

So this could be classed as a copy + scheduler power/efficiency test.
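The data path can be sketched roughly as follows; this is an assumed reconstruction from the description above, not the harness's exact command line, and it uses a much smaller copy size:

```shell
# Assumed sketch of the dd_copy data path (the exact harness command is not
# shown on this page): /dev/zero -> dd -> cat -> cat -> /dev/zero.  Every
# byte is copied between kernel and user space across four processes,
# forcing frequent context switches.
dd if=/dev/zero bs=1M count=64 2>/dev/null | cat | cat > /dev/zero
echo "copy done"
```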

==== idle_system ====

This measures the current drawn when a system is doing nothing: a standard Ubuntu configuration sitting idle apart from the occasional sample gathering that we are measuring. It runs for 295 seconds in total (60 samples, 5 seconds between each sample). This allows us to see if new daemon and service activity is causing power regressions.
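The 295-second figure follows from the sampling parameters: 60 samples taken 5 seconds apart means 59 gaps of 5 seconds between the first and last sample. A trivial check:

```shell
# 60 samples with 5 s between consecutive samples -> 59 intervals of 5 s.
SAMPLES=60
INTERVAL=5
echo "total run time: $(( (SAMPLES - 1) * INTERVAL )) seconds"   # 295 seconds
```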

==== stress_all ====

This runs the Linux "stress" tool, loading all the CPUs with busy loops, loading all the CPUs with I/O sync() calls, and loading all the CPUs with virtual memory activity. This simulates a fully loaded machine. It runs for 295 seconds (60 samples, 5 seconds between each sample).
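The invocations below are an assumed sketch using the documented stress(1) flags; this page does not show the harness's exact command lines, and a short timeout replaces the real 295-second run so the sketch finishes quickly. The commented lines show how the narrower stress_* variants described next differ:

```shell
# Assumed invocations of the "stress" tool (flags as documented in
# stress(1)); the harness's exact command lines are not shown on this page.
# The real runs last 295 s; 2 s is used here only for illustration.
N=$(getconf _NPROCESSORS_ONLN)
if command -v stress >/dev/null 2>&1; then
    stress --cpu "$N" --io "$N" --vm "$N" --timeout 2s   # stress_all
    # stress --io  "$N" --timeout 295s   # stress_IO_Sync: sync() workers only
    # stress --vm  "$N" --timeout 295s   # stress_VM: memory workers only
    # stress --cpu "$N" --timeout 295s   # stress_CPU: sqrt() busy loops only
else
    echo "stress not installed; CPUs detected: $N"
fi
```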

==== stress_IO_Sync ====

Like the "stress_all" test, but it only loads all the CPUs with I/O sync() calls, stressing the I/O subsystem. This allows us to see if the I/O subsystem is regressing.

==== stress_VM ====

Like the "stress_all" test but just loads the virtual memory subsystem on all CPUs. This allows us to see if the kernel vm subsystem is regressing.

==== stress_CPU ====

Like the "stress_all" test, but it only loads all the CPUs, which run busy loops computing sqrt() on floating point data. Theoretically, this test should not change much over time, since it relies on just an unchanging floating point sqrt() instruction and a loop, and gcc won't change this over time - it is fully optimal. If it does change, we know something has changed in the rest of the system.
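The inner loop of one stress_CPU worker amounts to something like the following; the real tool implements it in C, so this awk one-liner is only an illustration of the workload's shape:

```shell
# Illustration only: a busy loop computing sqrt() on floating point values,
# the same shape of work one stress_CPU worker performs per CPU.
awk 'BEGIN { for (i = 1; i <= 1000000; i++) x = sqrt(i); printf "last x=%.1f\n", x }'
```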

==== chromium browser ====

Fires up the chromium browser on the system and uses a browser extension to simulate a user scrolling through pages and moving to other web sites with a variety of content.

Kernel/Testing (last edited 2012-11-13 18:29:11 by brad-figg)