ARMValidationDashboard

Differences between revisions 11 and 18 (spanning 7 versions)
Revision 11 as of 2010-05-31 11:20:02
Size: 17030
Editor: fdu90
Comment: Edited user stories after feedback from Scott
Revision 18 as of 2010-06-04 12:08:03
Size: 22778
Editor: fdu90
Comment:
Deletions are marked like this. Additions are marked like this.
Line 10: Line 10:
As a part of the automated testing efforts on ARM we need a dashboard interface for visualizing the current state of the image. This interface must allow the user to see, at a glance, the state of functional tests as well as performance tests, as well as other useful data that is described in more detail below. As a part of the automated testing efforts on ARM we need a dashboard interface for visualizing the current state of the image. This interface must allow the user to see, at a glance, the state of functional tests and performance tests, as well as other useful data that is described in more detail below. Please note that each particular test is beyond the scope of this specification. This specification is only concerned with the infrastructure that allows one to deploy a centralized test submission, processing and presentation web application.
Line 13: Line 13:
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-automated-testing-framework
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-ui-and-test-heads
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-webkit-and-browser-performance
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-image-building-tool
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-testsuites-and-profilers
 * https://blueprints.edge.launchpad.net/ubuntu/+spec/arm-m-validation-dashboard (this one)
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-automated-testing-framework
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-ui-and-test-heads
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-webkit-and-browser-performance
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-image-building-tool
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-testsuites-and-profilers
 * https://blueprints.launchpad.net/ubuntu/+spec/arm-m-validation-dashboard (this one)
Line 30: Line 30:
 1. Bob is a release manager for Ubuntu on a particular ARM device. Bob wants to check the overall status of the image produced yesterday before releasing Alpha 1. Bob visits the dashboard to check for test failures. Bob marked some tests as expected to fail on this device as not all components are yet in place and some things are still broken. As all other tests have good results, Bob can go forward with the release.

Bob is a user that will visit the dashboard as a part of his daily routine. He is focused on having most of the data he is interested in displayed on a single front page. Since he is logged in, his homepage contains a summary of the image he is working on. Since Bob visits this page daily he is mostly interested in the difference, or update, since yesterday. The website prominently highlights package information (packages that changed, that failed to build, etc.), test information (what tests were run and processed by the system over the last 24 hours, which tests failed, if any), benchmark information (emphasized samples from new measurements, regressions and other deviations from baseline) and bug information (new or modified bugs being targeted for the upcoming milestone).

 1. Jane is interested in basic performance metrics of the current Ubuntu image. Jane can check some synthetic benchmarks for CPU, GPU and IO performance. Jane can also check some end-to-end benchmarks for user applications (browser startup time, time to full desktop, time to render snapshots of key websites, etc.). Jane can set up a baseline for each metric and request to be notified of all variances that exceed a given threshold.

Jane uses the dashboard rarely, definitely not on a daily basis. Jane is looking for performance regressions after key packages are changed or added. Jane is also looking at the numbers and graphs more than at anything else. Jane marks milestones such as 'new kernel added', 'gtk sync complete' to add some context to some graphs. Baselines allow her to see how the current release performs in comparison to previous releases. [optionally, if it goes forward] Baselines also allow her to see how one distribution compares to other distributions. Jane can easily set up identical devices with Ubuntu, Fedora, and SUSE (or, for some tests, even Windows) and have the data readily available and accessible online.

 1. Alice is on the Desktop QA team and wants to integrate some of the tests her team has created into the dashboard. QA engineers quickly bootstrap a local installation of the dashboard and check the bundled documentation and examples. Within hours the dashboard displays results from some of the tests that the local engineering team has already adapted.

Alice sees the dashboard as a free tool that she can take advantage of. Alice and her team of engineers are on track to deliver measurable performance improvements of the Desktop. She is more interested in connecting the tests they have been using so far and in using the dashboard as a good user interface to all the data that they can produce. Alice is also interested in migrating historical records to the dashboard database but doing so is an investment she is not yet ready to justify. Alice hopes that additional people will enhance the dashboard, either by adding more valuable tests or by improving the user interface and processing components, thereby allowing her engineers to focus on what is truly important to them and not on the surrounding infrastructure.

 1. Yung is a product manager at Big Corp Ltd. Yung is building a product based on Ubuntu and wants to reuse our QA infrastructure. Yung instructs his engineers to deploy a local dashboard installation and run our tests on their new secret product. Yung's engineers write an adapter that takes some of the results from the dashboard and pushes them to the internal QA system.

Yung is a different type of user. He is not familiar with open source methodologies, technology and infrastructure as much as a regular open source developer or activist would be. Yung was briefed about this technology and how it can be applied to his work process during a business meeting with some third-party company representatives. Yung is not a big proponent or opponent of open source technologies and merely wants to use them if it can help him do his work. For Yung, ease of deployment, first impressions, disruptiveness, localisation (so that engineers can use languages other than English, especially Far East languages) and privacy are big factors. If the technology fails to meet his requirements it will be discarded and not revisited again. Time to market is paramount. If the technology works and is adopted Yung is interested in knowing about support options.

 1. David is an engineer at SoC Vendor Inc. David uses the QA dashboard to compare performance metrics across a whole line of SoCs that are manufactured by his company. David can quickly create custom graphs by adding data series from various measured properties (or probes, as the system calls them) and aggregating data sources across time or device type. David can also print them or reuse them in an office document he is working on. David saves some of the most often used graphs and shares them with the rest of the team.

David is another external user. David is similar to Yung in his desire to add value without requiring too much investment, but unlike Yung he is primarily an engineer. David is fine with experiencing minor issues and is more interested in looking under the hood to tweak the application towards his needs. David might be an internal user evaluating this technology before broader acceptance or just doing a local installation for the purpose of his project. David might join an IRC channel or a mailing list to chat with developers or ask questions. He is not interested in formal support.

== Assumptions ==

 * Easy to deploy, including outside of Canonical. All infrastructure components must be packaged and provided as a PPA, ready to install on a Lucid server. All device components must be packaged and uploaded to Maverick (TODO: which component? are we going straight-to-ppa or do we attempt to hit the main archive?)
 * Focused on one device.
   FIXME - is this _really_ true? - if not how to avoid multiple devices and benchmarks
   There seems to be a conflict of interests - we'd like to see this for >1 kind of device but vendors will not enjoy
   any device benchmark information being displayed, especially alongside competing devices
 * One-way connectivity. Devices participating in the test can connect to the infrastructure services but reverse connection cannot be assumed. (TODO: what about IPv6?)
 * Distributed environment. Devices and infrastructure components are placed in diverse geographical and administrative zones.
 * Launchpad integration for bug management. There is no plan to support third party bug trackers in the first release.
 1. Bob is a release manager for Ubuntu on a particular ARM device. Bob wants to check the overall status of the image produced yesterday before releasing Alpha 1. Bob visits the dashboard to check for test failures. Bob marked some tests as expected to fail on this device as not all components are yet in place and some things are still broken. As all other tests have good results, Bob can go forward with the release. Bob is a user that will visit the dashboard as a part of his daily routine. He is focused on having most of the data he is interested in displayed on a single front page. Since he is logged in, his homepage contains a summary of the image he is working on. Since Bob visits this page daily he is mostly interested in the difference, or update, since yesterday. The website prominently highlights package information (packages that changed, that failed to build, etc.), test information (what tests were run and processed by the system over the last 24 hours, which tests failed, if any), benchmark information (emphasized samples from new measurements, regressions and other deviations from baseline) and bug information (new or modified bugs being targeted for the upcoming milestone).

 1. Jane is interested in basic performance metrics of the current Ubuntu image. Jane can check some synthetic benchmarks for CPU, GPU and IO performance. Jane can also check some end-to-end benchmarks for user applications (browser startup time, time to full desktop, time to render snapshots of key websites, etc.). Jane can set up a baseline for each metric and request to be notified of all variances that exceed a given threshold. Jane uses the dashboard rarely, definitely not on a daily basis. Jane is looking for performance regressions after key packages are changed or added. Jane is also looking at the numbers and graphs more than at anything else. Jane marks milestones such as 'new kernel added', 'gtk sync complete' to add some context to some graphs. Baselines allow her to see how the current release performs in comparison to previous releases. [optionally, if it goes forward] Baselines also allow her to see how one distribution compares to other distributions. Jane can easily set up identical devices with Ubuntu, Fedora, and SUSE (or, for some tests, even Windows) and have the data readily available and accessible online.

 1. Alice is on the Desktop QA team and wants to integrate some of the tests her team has created into the dashboard. QA engineers quickly bootstrap a local installation of the dashboard and check the bundled documentation and examples. Within hours the dashboard displays results from some of the tests that the local engineering team has already adapted. Alice sees the dashboard as a free tool that she can take advantage of. Alice and her team of engineers are on track to deliver measurable performance improvements of the Desktop. She is more interested in connecting the tests they have been using so far and in using the dashboard as a good user interface to all the data that they can produce. Alice is also interested in migrating historical records to the dashboard database but doing so is an investment she is not yet ready to justify. Alice hopes that additional people will enhance the dashboard, either by adding more valuable tests or by improving the user interface and processing components, thereby allowing her engineers to focus on what is truly important to them and not on the surrounding infrastructure.

 1. Yung is a product manager at Big Corp Ltd. Yung is building a product based on Ubuntu and wants to reuse our QA infrastructure. Yung instructs his engineers to deploy a local dashboard installation and run our tests on their new secret product. Yung's engineers write an adapter that takes some of the results from the dashboard and pushes them to the internal QA system. Yung is a different type of user. He is not familiar with open source methodologies, technology and infrastructure as much as a regular open source developer or activist would be. Yung was briefed about this technology and how it can be applied to his work process during a business meeting with some third-party company representatives. Yung is not a big proponent or opponent of open source technologies and merely wants to use them if it can help him do his work. For Yung, ease of deployment, first impressions, disruptiveness, localisation (so that engineers can use languages other than English, especially Far East languages) and privacy are big factors. If the technology fails to meet his requirements it will be discarded and not revisited again. Time to market is paramount. If the technology works and is adopted Yung is interested in knowing about support options.

 1. David is an engineer at SoC Vendor Inc. David uses the QA dashboard to compare performance metrics across a whole line of SoCs that are manufactured by his company. David can quickly create custom graphs by adding data series from various measured properties (or probes, as the system calls them) and aggregating data sources across time or device type. David can also print them or reuse them in an office document he is working on. David saves some of the most often used graphs and shares them with the rest of the team. David is another external user. David is similar to Yung in his desire to add value without requiring too much investment, but unlike Yung he is primarily an engineer. David is fine with experiencing minor issues and is more interested in looking under the hood to tweak the application towards his needs. David might be an internal user evaluating this technology before broader acceptance or just doing a local installation for the purpose of his project. David might join an IRC channel or a mailing list to chat with developers or ask questions. He is not interested in formal support.
Line 63: Line 42:

TODO:
 * UI design for each core use case (TODO)
   (we want a list of tasks users have to perform to get to the goal they are after, with regard to the use case lists above)
 * UI design vs UI design of other Canonical technologies (TODO)
 * Design web pages we need to provide (DONE)

The dashboard will feature the following pages/views:

=== Project Timeline ===
Recurring component of each page, shown at the top.
Key aspects:
 * Shows milestones
 * Shows number of days till next milestone
 * Shows number of days/weeks? till final milestone
 * Allows to click on a past day to see historical records

Project timeline could also hold global image/project properties action menu:
 * Edit (add/remove/modify) test suites and benchmarks

=== Day Overview ===

The main view contains a daily summary of key aspects influencing the upcoming release.
This is also the default view for the application. The page contains the following components:
 * packages
   * summary indicators (mostly numbers and links to detail pages)
     (could be a horizontal bar like in test cases below)
     * total number and details link
     * newly added
     * modified (version change)
     * packages that failed to build
   * action links:
     * see package details
 * test cases
   * total tests suites and test cases
   * progress indicator: horizontal bar with the following components
     * skipped tests
     * successful tests
     * failed tests
     * pending tests (there is no indicator of a 'running' test)
     * (never-run tests) - this is optional and will be displayed for historic entries, not the current day
   * action links:
     * see all tests (details)
     * edit skipped tests [optional]
 * benchmarks
   * selected benchmark results (value + spark line)
     * synthetic benchmarks
       * CPU
       * FPU
       * GPU (if possible)
       * IO:
         * USB thumb drive
         * USB 2.5" HDD
         * SD card
         * Network
         * NAND [optional]
     * end-user / application benchmarks
        * time to boot
        * time to website/cached
        * time to ... (etc)
   * notable changes (value, spark line, delta)
     (things that are not included by default but have changed radically since last test)
   * action links:
     * see all benchmarks (details)
     * define selected items
 * devices
   * all devices we have at our disposal
     * percentage of time devoted to:
       * running tests
       * being 'owned' by someone
       * being idle
       * being offline
 * bugs [optional]
   * all bugs filed yesterday that affect this image
     (could use specific tag, project to detect)

=== Image Details ===

This page can be reached from the 'daily overview' pages. It should contain basic information
about all packages that were used to build the image. If possible, each package should be a link to a Launchpad page. For packages that are not tracked by Launchpad (PPAs, custom packages) and for packages that are part of a private PPA, no link will be provided.

[Optional] This view could also provide a diff against any other day. Such difference views could be easily accessible from the project timeline.

=== Test Suite Details ===

This page can be reached from the test suite list and the day overview (selected suites).

Test suite details provides access to the following components:
 * Test suite URL: bzr branch that contains this test suite
 * Description
 * List of test cases
 * Summary of historical results (pass/fail/skipped)

Actions:
 * Disable whole suite for specific hardware
 * Disable specific tests for specific hardware

=== Test Case Details ===

This page can be reached from test suite details.

Test case details provides access to the following components:
 * Historical results (pass/fail/skip)
 * Preview of the relevant part of the log file that was harvested to get this result.

=== Benchmark Details ===

TODO.

Mostly similar to test suite, except for some presentation differences.

=== Benchmark probe (single benchmark item) Details ===
The dashboard has the following core concepts:
 * software image (set of packages shipped together)
 * test collection (scripts, programs and sources that constitute a test)
 * test collection run (act of running tests on device)
 * test result (single pass/fail result within a test collection run)
 * performance measurement (single quantitative measurement within a test collection run)
 * performance metric/probe (a well-defined concept that can be used to compare performance measurements)

The dashboard is designed around the concept of test collections and test collection runs. Typical/expected tests include a group of unit tests for a major library, a stand-alone test suite designed to check compliance or correctness of some APIs, or an existing (including binary-only) application scripted to perform some scenarios. Each test collection run (abbreviated to test run from now on) is performed on a specific device (computer). The result of that run is a tree of log files. Log files are uploaded to the 'gateway' component of the dashboard for storage.

The second major concept is log file analysis. Each test collection has a log processing script that is designed to understand the format of the log files and translate them to one of two entities:
 * pass/fail test result
 * performance measurement and associated performance metric (probe)
All data that is displayed by the dashboard can be traced back to a log file. The system preserves this information for credibility and assistance in manual analysis.

Pass/fail test results are simple to understand: they are an indication of some test that succeeded or failed. The identity of such tests is not maintained. That is, it is not possible to automatically compare two test runs and see whether the same pass/fail test succeeded in both. This limitation is by design.

In contrast, performance measurements always need a performance metric to be meaningful. This allows metrics to be defined in the system and compared across time, hardware and other factors. The metric also designates the units of each measurement. The units may be time, bytes/second, pixels/second or any other unit, as required by a particular use case.

This decision is based on the assumption that typical qualitative (pass/fail) tests are far more numerous than quantitative tests (benchmarks) and that maintaining identity support in the log processors would be an additional effort with little gain.
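
To make these concepts more tangible, the following is a minimal sketch of how they could map onto Django models (the implementation targets Django 1.1+ and PostgreSQL, see the Limitations section below). All class and field names are illustrative assumptions, not the final schema.

{{{
# Illustrative Django models for the core dashboard concepts listed above.
# All class and field names are assumptions, not the final schema.
from django.db import models

class SoftwareImage(models.Model):
    """Set of packages shipped together."""
    name = models.CharField(max_length=256)
    build_date = models.DateField()

class TestCollection(models.Model):
    """Scripts, programs and sources that constitute a test."""
    name = models.CharField(max_length=256)
    origin_url = models.URLField()

class TestCollectionRun(models.Model):
    """Act of running a test collection on a specific device."""
    test_collection = models.ForeignKey(TestCollection)
    software_image = models.ForeignKey(SoftwareImage)
    device_uuid = models.CharField(max_length=36)
    started_at = models.DateTimeField()

class TestResult(models.Model):
    """Single pass/fail result within a test collection run."""
    run = models.ForeignKey(TestCollectionRun)
    passed = models.BooleanField()
    log_reference = models.CharField(max_length=512)  # trace back to the source log file

class PerformanceMetric(models.Model):
    """Well-defined concept used to compare performance measurements."""
    name = models.CharField(max_length=256)
    units = models.CharField(max_length=64)  # e.g. seconds, bytes/second

class PerformanceMeasurement(models.Model):
    """Single quantitative measurement within a test collection run."""
    run = models.ForeignKey(TestCollectionRun)
    metric = models.ForeignKey(PerformanceMetric)
    value = models.FloatField()
    log_reference = models.CharField(max_length=512)
}}}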

== Implementation ==

=== Components ===

The dashboard is a collection of components that are maintained together. Those components are:
 * Dashboard, web application for user interaction (frontend)
 * Backend, application logic server with XML-RPC APIs for interaction with others (backend)
 * Data Gateway, custom FTP service for uploading and downloading files (gateway)
 * Log Analyzer, sandboxed process for analyzing uploaded files (analyzer)
 * SQL database, database of all shared application state
 * Python APIs for talking to various parts of the service
 * Command line tools for manipulating the system, registering tests, registering test runs, etc

The picture below shows how those components fit together in a full deployment scenario.

{{drawing:QADashboardArchitecture}}

=== Limitations ===

 * Python 2.5+ required
 * Django 1.1+ required
 * PostgreSQL 8.4+ required
 * Deployment supported on Ubuntu Server 10.10+
 * One-way connectivity is sufficient for working correctly.
 * IPv6 not supported officially but may work (no additional IPv6 code required)

=== Ubuntu Package Details ===

The launch control project is separated into the following packages:
 * launch-control
   A meta-package that depends on all components of the launch control suite.
 * launch-control-dashboard
   Web front-end for the whole application (the actual dashboard).
 * launch-control-backend
   Back-end for the whole application (database and application logic)
 * launch-control-data-gateway
   Data gateway service (for dropping test results from devices)
 * launch-control-log-analyzer
   Log analysis service.
 * launch-control-tools
   Command line tools for manipulating a launch-control installation.
 * launch-control-common
   Private APIs and other common files for the whole suite.
 * python-launch-control
   Public APIs wrapped as a python library.

=== Component Details ===

==== Dashboard Web Application ====

 * Front-end of the system
 * Allows to browse projects
   * List with pagination
 * Allows to browse project releases
   * requires project context
 * Allows to browse development snapshots / software images
   * requires project release context
   * shows all software images recorded in the system
   * link to software profile manifest for each image
 * Allows to browse test collections
   * shows basic information:
     * origin/url
     * license
     * shows capabilities (test/benchmarks/others)
   * links to test collection run viewer
 * Allows to browse test collection runs (acts of running the test somewhere)
   * search/filter by:
     * project (e.g. Linaro, Ubuntu)
      * image version (e.g. "Linaro 10.11, 2010-07-12-01", Linaro 10.11 release, built on the 12th of July 2010, first image for that day)
     * specific device class (e.g. Beagle Board)
     * specific device (e.g. Bob's Beagle Board)
     * submitter (e.g. Bob, anonymous)
      * software profile property (e.g. libc version=1.2.3)
     * specific hardware class property (e.g. memory=512MB)
 * Allows to display information for specific test collection run:
   * display basic information about test collection run:
     * image version (date-time + serial for date)
     * software profile (packages and versions)
     * hardware profile (various bits)
     * test device (if registered)
     * submitter (if registered)
   * display all failed tests:
     * with references from log file
   * display all successful tests:
     * with AJAX'ed references to log file
     * hidden by default (summary view only)
   * display all benchmark measurements:
     * when in hardware context:
       * show baseline for this hardware (if any)
       * highlight when deviates
     * when in project context:
       * show baseline for this project (if any)
       * highlight when deviates
     * when in image version/history context:
       * show results (y-axis) across image version (x-axis)
       * when in package context:
         * show specific package version (x-axis)
 * Allows to show aggregate results of certain test runs:
   * select results matching:
     * test collection (e.g. Linux Test Suite)
     * software image version
     * hardware device class (e.g. Beagle Board)
     * hardware device (e.g. Bob's Beagle Board)
     * package version (e.g. libc6 v=1.2.3)
   * for non-test benchmarks:
     * show pass/fail counts
     * with options to aggregate
   * for each probe in all benchmarks:
     * show results across software image versions/time
     * show additional data series for:
       * different device class/hardware profile
 * Allows to show image 'health' summary (a query sketch follows this list):
   * Test failures
   * Package build failures
   * Benchmarks deviated from baseline
   * Unresolved bugs targeting upcoming milestone
   * Unfinished work items targeting upcoming milestone
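
The sketch below shows, roughly, how the test and benchmark parts of the 'health' summary could be computed, reusing the illustrative models from the Design section. The baseline lookup callable and the 10% deviation threshold are assumptions.

{{{
# Illustrative aggregation for the image 'health' summary (test and benchmark
# parts only); reuses the sketch models from the Design section.
def image_health_summary(software_image, baseline_for_metric):
    """Count failed tests and benchmark measurements deviating from baseline.

    baseline_for_metric is assumed to be a callable returning the baseline
    value for a given PerformanceMetric, or None when no baseline is set.
    """
    runs = TestCollectionRun.objects.filter(software_image=software_image)
    failed_tests = TestResult.objects.filter(run__in=runs, passed=False).count()
    deviating_benchmarks = 0
    for measurement in PerformanceMeasurement.objects.filter(run__in=runs):
        baseline = baseline_for_metric(measurement.metric)
        if baseline and abs(measurement.value - baseline) > 0.1 * baseline:
            deviating_benchmarks += 1
    return {
        'failed_tests': failed_tests,
        'benchmarks_deviating_from_baseline': deviating_benchmarks,
    }
}}}

Package build failures, bugs and work items would come from Launchpad rather than from the dashboard database and are not shown here.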

==== Backend Service ====

 * Back-end of the application
 * Shares database with the dashboard
 * Exposes log submission interface (a device-side client sketch follows this component's list):
   * setupSubmission(device_uuid, type, size):
     * takes arguments:
       * device_uuid - ID of the device
       * type one of:
         * LOG_SUBMISSION
         * LOG_ANALYSIS [optional]
         * SW_IMAGE_MANIFEST
         * HW_PROFILE
         * SW_PROFILE
       * size - size of the submission in bytes
     * returns:
        * submission_id
        * submission_URL - FTP URL where the files can be uploaded
     * may raise exception:
        * InvalidDevice
        * InvalidType
        * NotEnoughSpace
     * on success:
        * asks the gateway to prepare submission directory and give write access to device_uuid
   * completeSubmission(device_uuid, submission_id)
     * takes arguments:
        * device_uuid - ID of the device
         * submission_id - as obtained from setupSubmission()
     * does not return anything
     * may raise exception:
        * InvalidDevice
        * InvalidSubmission
     * on success:
        * asks the gateway to mark the submission directory read only
        * if type == LOG_SUBMISSION:
          * schedules log for processing
 * Exposes queue interface for taking log processing jobs:
   * getNextAnalysisJob(job_server_name, job_server_key):
     * takes arguments:
        * job_server_name - hostname of the job server - for informative purposes only
        * job_server_key - shared secret of the job server
     * returns:
       * submission_id - id of the submission to analyze
       * analysis_id - unique to this request
     * associates submission with job server
      * changes the submission status to busy (by storing analysis_id)
      * sets a timeout to return results based on average processing time for the same test collection [optional]
   * completeAnalysisJob(analysis_id, status):
     * takes arguments:
        * analysis_id - the job ID that was obtained from getNextAnalysisJob()
        * status - Finished | Failed
     * does not return anything
     * may raise exception:
       * ValueError - wrong id or status
     * on success:
       * job processing results are available in shared storage
       * processing results are loaded into the database
 * Exposes provisioning interface:
   * configureNewDevice():
     * does not take any arguments
     * returns:
       * device_uuid - freshly assigned to this device
     * on success:
       * sets provisioning status of that device to INCOMPLETE
   * updateHardwareProfile(device_uuid, submission_id):
     * takes arguments:
       * device_uuid - id of the device
       * submission_id - id of the submission
     * does not return anything
     * may raise exception:
       * InvalidDevice
       * InvalidSubmission
     * on success:
       * harvests basic profile information from the log file
       * recalculates device information for test scheduler
       * updates device and
   * updateSoftwareProfile(device_uuid, submission_id):
     * takes arguments:
       * device_uuid - id of the device
       * submission_id - id of the submission
     * does not return anything
     * may raise exception:
       * InvalidDevice
       * InvalidSubmission
     * on success:
       * harvests basic profile information from the log file
       * recalculates device information for test scheduler
       * sets provisioning status of that device to COMPLETE
 * Exposes scheduling interface for automatic test requests:
   * getNextTestJob(device_uuid):
     * takes arguments:
       * device_uuid - id of the device
     * returns:
       * test name - well-known name of the test collection to run
     * may raise exception:
       * NothingToDo - no activity required, sleep for one hour
       * DeviceNotProvisioned - device is not provisioned yet
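
The following is a minimal device-side sketch of the log submission flow above, using xmlrpclib and ftplib from the Python 2 standard library (matching the Python 2.5+ requirement in the Limitations above). The backend URL, device UUID, log path and the exact shape of the return values are assumptions.

{{{
# Minimal device-side sketch of the log submission flow described above.
# Backend URL, device UUID, log path and return-value shapes are assumptions.
import os
import ftplib
import urlparse
import xmlrpclib

BACKEND_URL = "http://dashboard.example.org/xmlrpc/"   # hypothetical
DEVICE_UUID = "00000000-0000-0000-0000-000000000000"   # hypothetical
LOG_PATH = "/var/log/test-collection-run.log"          # hypothetical

backend = xmlrpclib.ServerProxy(BACKEND_URL)

# 1. Ask the backend where to upload the log.
size = os.path.getsize(LOG_PATH)
submission_id, submission_url = backend.setupSubmission(
    DEVICE_UUID, "LOG_SUBMISSION", size)

# 2. Upload the log to the FTP URL returned by the backend.
parts = urlparse.urlparse(submission_url)
ftp = ftplib.FTP(parts.hostname)
ftp.login(DEVICE_UUID, str(submission_id))  # gateway authenticates by device UUID + submission ID
log_file = open(LOG_PATH, "rb")
try:
    ftp.storbinary("STOR " + os.path.basename(LOG_PATH), log_file)
finally:
    log_file.close()
    ftp.quit()

# 3. Tell the backend the upload is complete so the log gets scheduled for analysis.
backend.completeSubmission(DEVICE_UUID, submission_id)
}}}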

==== Data Gateway Service ====

 * Implemented as an FTP daemon (a server sketch follows this list)
   * files stored in designated tree (/srv/launch-control/gateway)
   * uses http://code.google.com/p/pyftpdlib/ for FTP
 * Management service for talking with the backend and reconfiguring the ftp service
 * Authenticates using:
   * device UUID, submission ID
   * analysis server account/password
 * Allows uploading submissions/files
 * Allows downloading files for analysis
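
A minimal sketch of the gateway's FTP side built on pyftpdlib (linked above). The port, accounts and permission strings are assumptions, and the current pyftpdlib module layout is used; in the real service the management component would create and revoke accounts dynamically.

{{{
# Minimal sketch of the data gateway FTP service using pyftpdlib.
# Port, accounts and permission strings are assumptions.
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

GATEWAY_ROOT = "/srv/launch-control/gateway"

authorizer = DummyAuthorizer()
# In the real service the backend would register one account per submission
# (device UUID / submission ID) and later mark it read-only; a single static
# upload account and one read-only analyzer account are used here.
authorizer.add_user("device-uuid", "submission-id", GATEWAY_ROOT, perm="elrmw")
authorizer.add_user("analyzer", "analyzer-secret", GATEWAY_ROOT, perm="elr")

handler = FTPHandler
handler.authorizer = authorizer
server = FTPServer(("0.0.0.0", 2121), handler)
server.serve_forever()
}}}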

==== Log Analyzer Service ====

 * Batch processing system for analyzing submitted log files (a worker-loop sketch follows this list)
   * talks to the back-end to get things to do
   * talks to the data gateway to access logs
 * Runs log analyzers on submitted log files
   * Updates the result database via internal database link [variant-1]
   * Produces standardized result format document and uploads it to the data gateway [variant-2]
 * Runs inside a sandbox/chroot [optional]
 * Runs on additional compute nodes [optional]
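
A minimal sketch of the analyzer's worker loop against the queue interface described for the backend. The backend URL, node credentials, back-off time and the analyze_submission() helper are assumptions.

{{{
# Minimal sketch of a log analyzer worker loop driving the backend queue
# interface (getNextAnalysisJob / completeAnalysisJob) described earlier.
# Backend URL, node credentials and the analyze_submission() helper are assumptions.
import time
import xmlrpclib

BACKEND_URL = "http://dashboard.example.org/xmlrpc/"  # hypothetical
NODE_NAME = "analyzer-1"                              # hypothetical
NODE_KEY = "shared-secret"                            # hypothetical

backend = xmlrpclib.ServerProxy(BACKEND_URL)

while True:
    try:
        submission_id, analysis_id = backend.getNextAnalysisJob(NODE_NAME, NODE_KEY)
    except xmlrpclib.Fault:
        time.sleep(60)  # no job available, back off and poll again
        continue
    try:
        # Fetch the submission from the data gateway and run the
        # collection-specific log processing script (not shown here).
        analyze_submission(submission_id)  # hypothetical helper
        backend.completeAnalysisJob(analysis_id, "Finished")
    except Exception:
        backend.completeAnalysisJob(analysis_id, "Failed")
}}}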

==== Database model ====
Line 178: Line 306:
Mostly similar to test case, except for some presentation differences.

== Implementation ==

Choose basic web technology: DONE (django)
Choose database system: DONE (PSQL)
Design data model (including data ingest requirements, data presentation requirements, data transformations and storage): TODO
Design web widgets/pieces/components that we will need and determine how each fits into the data model: TODO? (is this required in the spec?)

=== Architecture Overview ===

{{drawing:QADashboardArchitecture}}

The dashboard is a part of a larger set of blueprints/projects that together provide the QA infrastructure. The main components are:
 * Automatic Test Framework
 * Test Cases (dealing with providing actual tests)
 * WebKit testing/benchmarking
 * Dashboard:
   * frontend - web pages, interaction with launchpad, etc
   * backend - database model, processes, interactions with other components

==== Database Model ====

Make database model image: TODO

=== UI Changes ===

Should cover changes required to the UI, or specific UI that is required to implement this

=== Code Changes ===

Code changes should include an overview of what needs to change, and in some cases even the specific details.
==== Python APIs ====

TODO

==== Command line tools ====

 * Called launch-control-tool
 * usable on the device and for debugging
 * interface based on commands with options (like bzr); a dispatcher sketch follows this list
 * commands for provisioning devices:
   * configure-new-device
   * update-software-profile <device-uuid> <submission-id>
   * update-hardware-profile <device-uuid> <submission-id>
 * commands for talking with the gateway:
   * setup-submission <device-uuid> <type>
   * complete-submission <device-uuid> <submission-id>
 * commands for requesting test jobs:
   * get-next-test-job <device-uuid>
 * commands for log analysis service:
   * get-next-analysis-job <node-id> <node-secret>
   * complete-analysis-job <node-id> <node-secret> <job-id> <status> [optional for variant 2 <submission-id>]
 * commands for integration with image builder
   * ingest-software-image-manifest <project-name> <release-name> <image-id> <manifest-file>
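
A minimal sketch of the bzr-style command dispatch for launch-control-tool. The command names follow the list above, while the backend URL and the option handling are assumptions; only two commands are shown.

{{{
# Minimal sketch of a bzr-style command dispatcher for launch-control-tool.
# Command names follow the list above; backend URL and option handling are assumptions.
import sys
import xmlrpclib

BACKEND_URL = "http://dashboard.example.org/xmlrpc/"  # hypothetical

def cmd_configure_new_device(args):
    backend = xmlrpclib.ServerProxy(BACKEND_URL)
    print backend.configureNewDevice()

def cmd_get_next_test_job(args):
    backend = xmlrpclib.ServerProxy(BACKEND_URL)
    print backend.getNextTestJob(args[0])  # args[0] is <device-uuid>

COMMANDS = {
    "configure-new-device": cmd_configure_new_device,
    "get-next-test-job": cmd_get_next_test_job,
    # ... the remaining commands from the list above would be registered here
}

def main():
    if len(sys.argv) < 2 or sys.argv[1] not in COMMANDS:
        print >> sys.stderr, "usage: launch-control-tool <command> [options]"
        sys.exit(1)
    COMMANDS[sys.argv[1]](sys.argv[2:])

if __name__ == "__main__":
    main()
}}}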
Line 214: Line 333:

== Test/Demo Plan ==

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

== Unresolved issues ==

* Do we roll our own technology or do we adapt existing frameworks (like Hudson)
* How and where do we store the user database?
  We need at least two types of users: viewers and editors. Editors could alter per-image/per-device state such as which tests should not be run. Editors could add more test suites, news, milestones etc. This could be, to a certain degree, avoided by pulling all non-volatile non-test information from external sources (feeds/rss).
* How do we provision devices (bind particular instance of the dashboard with a particular device and ensure it can be identified across upgrades/reflashes/etc)?
* Where is the test scheduler (is it in this spec or in the automated test framework spec)

Summary

As a part of the automated testing efforts on ARM we need a dashboard interface for visualizing the current state of the image. This interface must allow the user to see, at a glance, the state of functional tests and performance tests, as well as other useful data that is described in more detail below. Please note that each particular test is beyond the scope of this specification. This specification is only concerned with the infrastructure that allows one to deploy a centralized test submission, processing and presentation web application.

This specification is a part of a larger project, see other blueprints for reference:

Release Note

No user visible changes

Rationale

We need to easily see how various development efforts are affecting the image over time. A dashboard interface helps us to visualize, in one place, the results of running tests on multiple machines. The dashboard can also display results of performance measurements across different image build dates to allow developers to quickly see how their efforts are affecting performance. Targets and baselines can be set for any performance metric so that it is possible to detect deviations and track goals.

User stories

  1. Bob is a release manager for Ubuntu on a particular ARM device. Bob wants to check the overall status of the image produced yesterday before releasing Alpha 1. Bob visits the dashboard to check for test failures. Bob marked some tests as expected to fail on this device as not all components are yet in place and some things are still broken. As all other tests have good results, Bob can go forward with the release. Bob is a user that will visit the dashboard as a part of his daily routine. He is focused on having most of the data he is interested in displayed on a single front page. Since he is logged in, his homepage contains a summary of the image he is working on. Since Bob visits this page daily he is mostly interested in the difference, or update, since yesterday. The website prominently highlights package information (packages that changed, that failed to build, etc.), test information (what tests were run and processed by the system over the last 24 hours, which tests failed, if any), benchmark information (emphasized samples from new measurements, regressions and other deviations from baseline) and bug information (new or modified bugs being targeted for the upcoming milestone).
  2. Jane is interested in basic performance metrics of the current Ubuntu image. Jane can check some synthetic benchmarks for CPU, GPU and IO performance. Jane can also check some end-to-end benchmarks for user applications (browser startup time, time to full desktop, time to render snapshots of key websites, etc.). Jane can set up a baseline for each metric and request to be notified of all variances that exceed a given threshold. Jane uses the dashboard rarely, definitely not on a daily basis. Jane is looking for performance regressions after key packages are changed or added. Jane is also looking at the numbers and graphs more than at anything else. Jane marks milestones such as 'new kernel added', 'gtk sync complete' to add some context to some graphs. Baselines allow her to see how the current release performs in comparison to previous releases. [optionally, if it goes forward] Baselines also allow her to see how one distribution compares to other distributions. Jane can easily set up identical devices with Ubuntu, Fedora, and SUSE (or, for some tests, even Windows) and have the data readily available and accessible online.
  3. Alice is on the Desktop QA team and wants to integrate some of the tests her team has created into the dashboard. QA engineers quickly bootstrap a local installation of the dashboard and check the bundled documentation and examples. Within hours the dashboard displays results from some of the tests that the local engineering team has already adapted. Alice sees the dashboard as a free tool that she can take advantage of. Alice and her team of engineers are on track to deliver measurable performance improvements of the Desktop. She is more interested in connecting the tests they have been using so far and in using the dashboard as a good user interface to all the data that they can produce. Alice is also interested in migrating historical records to the dashboard database but doing so is an investment she is not yet ready to justify. Alice hopes that additional people will enhance the dashboard, either by adding more valuable tests or by improving the user interface and processing components, thereby allowing her engineers to focus on what is truly important to them and not on the surrounding infrastructure.
  4. Yung is a product manager at Big Corp Ltd. Yung is building a product based on Ubuntu and wants to reuse our QA infrastructure. Yung instructs his engineers to deploy a local dashboard installation and run our tests on their new secret product. Yung's engineers write an adapter that takes some of the results from the dashboard and pushes them to the internal QA system. Yung is a different type of user. He is not familiar with open source methodologies, technology and infrastructure as much as a regular open source developer or activist would be. Yung was briefed about this technology and how it can be applied to his work process during a business meeting with some third-party company representatives. Yung is not a big proponent or opponent of open source technologies and merely wants to use them if it can help him do his work. For Yung, ease of deployment, first impressions, disruptiveness, localisation (so that engineers can use languages other than English, especially Far East languages) and privacy are big factors. If the technology fails to meet his requirements it will be discarded and not revisited again. Time to market is paramount. If the technology works and is adopted Yung is interested in knowing about support options.
  5. David is an engineer at SoC Vendor Inc. David uses the QA dashboard to compare performance metrics across a whole line of SoCs that are manufactured by his company. David can quickly create custom graphs by adding data series from various measured properties (or probes, as the system calls them) and aggregating data sources across time or device type. David can also print them or reuse them in an office document he is working on. David saves some of the most often used graphs and shares them with the rest of the team. David is another external user. David is similar to Yung in his desire to add value without requiring too much investment, but unlike Yung he is primarily an engineer. David is fine with experiencing minor issues and is more interested in looking under the hood to tweak the application towards his needs. David might be an internal user evaluating this technology before broader acceptance or just doing a local installation for the purpose of his project. David might join an IRC channel or a mailing list to chat with developers or ask questions. He is not interested in formal support.

Design

The dashboard has the following core concepts:

  • software image (set of packages shipped together)
  • test collection (scripts, programs and sources that constitute a test)
  • test collection run (act of running tests on device)
  • test result (single pass/fail result within a test collection run)
  • performance measurement (single quantitative measurement within a test collection run)
  • performance metric/probe (a well-defined concept that can be used to compare performance measurements)

The dashboard is designed around the concept of test collections and test collection runs. Typical/expected tests include a group of unit tests for a major library, a stand-alone test suite designed to check compliance or correctness of some APIs, or an existing (including binary-only) application scripted to perform some scenarios. Each test collection run (abbreviated to test run from now on) is performed on a specific device (computer). The result of that run is a tree of log files. Log files are uploaded to the 'gateway' component of the dashboard for storage.

The second major concept is log file analysis. Each test collection has a log processing script that is designed to understand the format of the log files and translate them to one of two entities:

  • pass/fail test result
  • performance measurement and associated performance metric (probe)

All data that is displayed by the dashboard can be traced back to a log file. The system preserves this information for credibility and assistance in manual analysis.

Pass/fail test results are simple to understand: they are an indication of some test that succeeded or failed. The identity of such tests is not maintained. That is, it is not possible to automatically compare two test runs and see whether the same pass/fail test succeeded in both. This limitation is by design.

In contrast, performance measurements always need a performance metric to be meaningful. This allows metrics to be defined in the system and compared across time, hardware and other factors. The metric also designates the units of each measurement. The units may be time, bytes/second, pixels/second or any other unit, as required by a particular use case.

This decision is based on the assumption that typical qualitative (pass/fail) tests are far more numerous than quantitative tests (benchmarks) and that maintaining identity support in the log processors would be an additional effort with little gain.

Implementation

Components

The dashboard is a collection of components that are maintained together. Those components are:

  • Dashboard, web application for user interaction (frontend)
  • Backend, application logic server with XML-RPC APIs for interaction with others (backend)
  • Data Gateway, custom FTP service for uploading and downloading files (gateway)
  • Log Analyzer, sandboxed process for analyzing uploaded files (analyzer)
  • SQL database, database of all shared application state
  • Python APIs for talking to various parts of the service
  • Command line tools for manipulating the system, registering tests, registering test runs, etc

The picture below shows how those components fit together in a full deployment scenario.

Limitations

  • Python 2.5+ required
  • Django 1.1+ required
  • PostgreSQL 8.4+ required
  • Deployment supported on Ubuntu Server 10.10+
  • One-way connectivity is sufficient for working correctly.
  • IPv6 not supported officially but may work (no additional IPv6 code required)

Ubuntu Package Details

The launch control project is separated into the following packages:

  • launch-control
    • A meta-package that depends on all components of the launch control suite.
  • launch-control-dashboard
    • Web front-end for the whole application (the actual dashboard).
  • launch-control-backend
    • Back-end for the whole application (database and application logic)
  • launch-control-data-gateway
    • Data gateway service (for dropping test results from devices)
  • launch-control-log-analyzer
    • Log analysis service.
  • launch-control-tools
    • Command line tools for manipulating a launch-control installation.
  • launch-control-common
    • Private APIs and other common files for the whole suite.
  • python-launch-control
    • Public APIs wrapped as a python library.

Component Details

Dashboard Web Application

  • Front-end of the system
  • Allows to browse projects
    • List with pagination
  • Allows to browse project releases
    • requires project context
  • Allows to browse development snapshots / software images
    • requires project release context
    • shows all software images recorded in the system
    • link to software profile manifest for each image
  • Allows to browse test collections
    • shows basic information:
      • origin/url
      • license
      • shows capabilities (test/benchmarks/others)
    • links to test collection run viewer
  • Allows to browse test collection runs (acts of running the test somewhere)
    • search/filter by:
      • project (e.g. Linaro, Ubuntu)
      • image version (e.g. "Linaro 10.11, 2010-07-12-01", Linaro 10.11 release, built on the 12th of July 2010, first image for that day)
      • specific device class (e.g. Beagle Board)
      • specific device (e.g. Bob's Beagle Board)
      • submitter (e.g. Bob, anonymous)
      • software profile property (e.g. libc version=1.2.3)
      • specific hardware class property (e.g. memory=512MB)
  • Allows to display information for specific test collection run:
    • display basic information about test collection run:
      • image version (date-time + serial for date)
      • software profile (packages and versions)
      • hardware profile (various bits)
      • test device (if registered)
      • submitter (if registered)
    • display all failed tests:
      • with references from log file
    • display all successful tests:
      • with AJAX'ed references to log file
      • hidden by default (summary view only)
    • display all benchmark measurements:
      • when in hardware context:
        • show baseline for this hardware (if any)
        • highlight when deviates
      • when in project context:
        • show baseline for this project (if any)
        • highlight when deviates
      • when in image version/history context:
        • show results (y-axis) across image version (x-axis)
        • when in package context:
          • show specific package version (x-axis)
  • Allows to show aggregate results of certain test runs:
    • select results matching:
      • test collection (e.g. Linux Test Suite)
      • software image version
      • hardware device class (e.g. Beagle Board)
      • hardware device (e.g. Bob's Beagle Board)
      • package version (e.g. libc6 v=1.2.3)
    • for non-test benchmarks:
      • show pass/fail counts
      • with options to aggregate
    • for each probe in all benchmarks:
      • show results across software image versions/time
      • show additional data series for:
        • different device class/hardware profile
  • Allows to show image 'health' summary:
    • Test failures
    • Package build failures
    • Benchmarks deviated from baseline
    • Unresolved bugs targeting upcoming milestone
    • Unfinished work items targeting upcoming milestone

Backend Service

  • Back-end of the application
  • Shares database with the dashboard
  • Exposes log submission interface:
    • setupSubmission(device_uuid, type, size):
      • takes arguments:
        • device_uuid - ID of the device
        • type one of:
          • LOG_SUBMISSION
          • LOG_ANALYSIS [optional]
          • SW_IMAGE_MANIFEST
          • HW_PROFILE
          • SW_PROFILE
        • size - size of the submission in bytes
      • returns:
        • submission_id
        • submission_URL - FTP URL where the files can be uploaded
      • may raise exception:
      • on success:
        • asks the gateway to prepare submission directory and give write access to device_uuid
    • completeSubmission(device_uuid, submission_id)
      • takes arguments:
        • device_uuid - ID of the device
        • submission_id - as obtained from setupSubmission()
      • does not return anything
      • may raise exception:
      • on success:
        • asks the gateway to mark the submission directory read only
        • if type == LOG_SUBMISSION:
          • schedules log for processing
  • Exposes queue interface for taking log processing jobs:
    • getNextAnalysisJob(job_server_name, job_server_key):
      • takes arguments:
        • job_server_name - hostname of the job server - for informative purposes only
        • job_server_key - shared secret of the job server
      • returns:
        • submission_id - id of the submission to analyze
        • analysis_id - unique to this request
      • associates submission with job server
      • changes the submission status to busy (by storing analysis_id)
      • sets a timeout to return results based on average processing time for the same test collection [optional]
    • completeAnalysisJob(analysis_id, status):
      • takes arguments:
        • analysis_id - the job ID that was obtained from getNextAnalysisJob()
        • status - Finished | Failed
      • does not return anything
      • may raise exception:
      • on success:
        • job processing results are available in shared storage
        • processing results are loaded into the database
  • Exposes provisioning interface:
    • configureNewDevice():
      • does not take any arguments
      • returns:
        • device_uuid - freshly assigned to this device
      • on success:
        • sets provisioning status of that device to INCOMPLETE
    • updateHardwareProfile(device_uuid, submission_id):
      • takes arguments:
        • device_uuid - id of the device
        • submission_id - id of the submission
      • does not return anything
      • may raise exception:
      • on success:
        • harvests basic profile information from the log file
        • recalculates device information for test scheduler
        • updates device and
    • updateSoftwareProfile(device_uuid, submission_id):
      • takes arguments:
        • device_uuid - id of the device
        • submission_id - id of the submission
      • does not return anything
      • may raise exception:
      • on success:
        • harvests basic profile information from the log file
        • recalculates device information for test scheduler
        • sets provisioning status of that device to COMPLETE
  • Exposes scheduling interface for automatic test requests:
    • getNextTestJob(device_uuid):
      • takes arguments:
        • device_uuid - id of the device
      • returns:
        • test name - well-known name of the test collection to run
      • may raise exception:

Data Gateway Service

  • Implemented as an FTP daemon
  • Management service for talking with the backend and reconfiguring the ftp service
  • Authenticates using:
    • device UUID, submission ID
    • analysis server account/password
  • Allows uploading submissions/files
  • Allows downloading files for analysis

Log Analyzer Service

  • Batch processing system for analyzing submitted log files
    • talks to the back-end to get things to do
    • talks to the data gateway to access logs
  • Runs log analyzers on submitted log files
    • Updates the result database via internal database link [variant-1]
    • Produces standardized result format document and uploads it to the data gateway [variant-2]
  • Runs inside a sandbox/chroot [optional]
  • Runs on additional compute nodes [optional]

Database model

TODO

Python APIs

TODO

Command line tools

  • Called launch-control-tool
  • usable on the device and for debugging
  • interface based on commands with options (like bzr)
  • commands for provisioning devices:
    • configure-new-device
    • update-software-profile <device-uuid> <submission-id>
    • update-hardware-profile <device-uuid> <submission-id>
  • commands for talking with the gateway:
    • setup-submission <device-uuid> <type>
    • complete-submission <device-uuid> <submission-id>
  • commands for requesting test jobs:
    • get-next-test-job <device-uuid>
  • commands for log analysis service:
    • get-next-analysis-job <node-id> <node-secret>
    • complete-analysis-job <node-id> <node-secret> <job-id> <status> [optional for variant 2 <submission-id>]
  • commands for integration with image builder:
    • ingest-software-image-manifest <project-name> <release-name> <image-id> <manifest-file>

Migration

Currently there is no direct migration plan. One thing we could consider is migrating bits and pieces of open source technology that are already up and running, either at Canonical or somewhere else in the community or with other parties, and integrating their tests into our framework. If that happens we might want to look at migration from qa-tool-foo to our new technology.

BoF agenda and discussion

Goal: define visualization interface for QA control that shows daily/snapshot/other summary of the 'health' of the image for a given platform

Different images based on a common base:

  • server
  • desktop
  • netbook

Stuff we want to show:

  • difference from yesterday
  • performance history
  • performance regressions
  • performance targets (as infrastructure to see if it works during the cycle)

Dashboard mockup:

  • Two columns:
    • Column 1:
      • Current build status
        • FTBFS count
        • New package count, number of packages
        • Latest build date/time
      • Test results
      • Build history
    • Column 2:
      • News
      • Performance Targets

Q: What about some UI for scheduling test runs? A: We're not targeting this for the first release but we want to have a UI for doing that in the future.

Q: How does our project relate to other Ubuntu QA projects? A:

Stuff to check:

  • buildbot (Python)
  • Hudson (Java)

Action item: check Hudson out (Zygmunt Krynicki). Hudson instance for Bzr at Canonical: http://babune.ladeuil.net:24842/view/Ubuntu/job/selftest-jaunty/buildTimeTrend

We want to store the log file of each test run just in case (for unexpected successes)


CategorySpec

Specs/M/ARMValidationDashboard (last edited 2010-06-04 12:08:03 by fdu90)