UnifiedSystemMonitoring

Differences between revisions 2 and 3
Revision 2 as of 2007-04-24 14:45:27
Size: 6694
Editor: d213-103-71-150
Comment:
Revision 3 as of 2007-04-24 15:10:48
Size: 4214
Editor: d213-103-71-150
Comment: added some use cases
     * Bogdan is intermittently annoyed by sound stuttering, lack of responsiveness and other performance issues. By the time he opens a console and runs ''top'', or opens the System Monitor, the problem has disappeared. (These operations naturally take more time when the system is loaded.) Instead, he can just rewind the history a bit, because the System Monitor has access to detailed logs of '''what happened during the last few hours'''.
  * Bob is the maintainer for the boot process for Ubuntu. In the Dapper cycle, he would like to work on getting the boot time down to two seconds from boot manager to GDM screen. He creates an entry for the specification in Launchpad, proposes it for the UBZ sprint, and starts writing out a braindump of it in the Ubuntu wiki. Magnus, who is in charge of UBZ scheduling, thinks it sounds fishy but approves it to make sure that the change is discussed and documented properly. He marks it as priority Medium because he isn't sure Bob will have time free for implementing it during Dapper.
  * George recently noticed the boot process takes longer, but he doesn't know exactly why. He installs ''bootchart'', which has access to '''already recorded, medium-detail logs''' of the last few weeks, and uses bootchart's tools (customized versions of generic widgets) to investigate. He notices that his boot got slower by 20 seconds on a certain date. He then uses Synaptic's new history panel to check what updates happened before that date, and quickly has a likely suspect.
  * Pedro works on Malone, in Launchpad. Before UBZ, he remembers that the dependency handling in the bug tracker is really not optimal. He writes out a Summary and Rationale in a Launchpad wiki page, registers it as a specification in Launchpad, and suggests it for UBZ. Monica, Launchpad manageress, thinks that this is really not the time to be talking about it and rejects the application for UBZ. He then indicates it for the next conference, UBB, and marks its priority as Low.
  * Linus wants to know if the new IO scheduler added to the kernel is really better. He can easily '''use the performance logs from a large number of users''' (properly anonymized, gathered by a [http://popcon.ubuntu.com/ Popularity Contest]-like tool) from a couple of months before and after the update, and can run more meaningful statistics than on a limited test system.
  * Jason is an Ubuntu and Rosetta user. He has noticed that changes made to translations are making their way into language packs but not to the upstream versions, and adds a specification that describes a way for getting upstream to use language packs. Monica also has a plan for this but hadn't described it in a spec, so she adds it to the UBZ spec list, and adds Carlos, Rosetta maintainer, as drafter for it.

== Scope ==

This specification covers feature specifications for Ubuntu and Launchpad. It is not meant as a more general specification format.

== Design ==

A specification should be built with the following considerations:

  * The person implementing it may not be the person writing it. It should be clear enough for someone to be able to read it and have a clear path towards implementing it. If it isn't, it needs more detail.

  * The use cases covered in the specification should be practical situations, not contrived issues.

  * Limitations and issues discovered during the creation of a specification should be clearly pointed out so that they can be dealt with explicitly.

  * If you don't know enough to be able to competently write a spec, you should either get help or research the problem further. Avoid spending time making up a solution: base your work on your peers' opinions and prior work.

Specific issues related to particular sections are described further below.

=== Summary ===

The summary should not attempt to say '''why''' the spec is being defined, just '''what''' is being specified.

=== Rationale ===

This should be the description of '''why''' this spec is being defined.

=== Scope and Use Cases ===

Use cases are not always required, but in many cases they bring much better clarity to the scope and scale of the specification than could be obtained by talking in abstract terms.

=== Implementation Plan ===

This section is usually broken down into subsections, such as the packages being affected, data and system migration where necessary, user interface requirements and pictures (photographs of drawings on paper work well).

== Implementation ==

To implement a specification, the assignee should observe the use cases carefully, and follow the design specified. He should make note of places in which he has strayed from the design section, adding rationale describing why this happened. This is important so that next iterations of this specification (and new specifications that touch upon this subject) can use the specification as a reference.

The implementation is very dependent on the type of feature to be implemented. Refer to the team leader for further suggestions and guidance on this topic.

== Outstanding Issues ==

The specification process requires experienced people to drive it. More documentation on the process should be produced.

The drafting of a specification requires English skills and a very good understanding of the problem. It must also describe things to an extent that someone else could implement. This is a difficult set of conditions to ensure throughout all the specifications added.

There is a lot of difficulty in gardening obsolete, unwanted and abandoned specifications in the Wiki.

== BoF agenda and discussion ==

We'll have a first public session on this on the first Monday in UBZ.
----
CategorySpec

performance
privacy
  * On a whim, Dexter takes a look at his computer usage logs, and notices interesting patterns in the delays when he leaves his computer unattended. He gathers '''already-accumulated data from other willing users''', writes his doctoral thesis on the subject, and develops a smarter algorithm for turning off the display and for locking the screen. World energy usage drops by a few hundred gigawatts. The global cost of security breaches is lowered by many millions of dollars. ~-(Note: By this time, Ubuntu is the dominant OS, in no small part due to its excellent monitoring facilities. [https://bugs.beta.launchpad.net/ubuntu/+bug/1 Bug #1] has been closed for some time.)-~

Work in progress!

Summary

This specification describes the way Ubuntu should gather and record run-time measurements of itself and the machine it's running on. It also describes a set of tools and libraries to facilitate displaying this information to the user. Think of it as extending top and System Monitor with a long-term history feature.

Rationale

Modern computers have many ways of measuring their run-time properties (e.g. temperature, battery charge, processor voltage and frequency). Most operating systems also allow measuring many of their own parameters while running (e.g. resource usage and availability).

Most operating systems have tools that allow viewing some of these properties. The venerable program top is an example, as is the Gnome System Monitor or the Gnome Power Manager. Most of these tools have been written with a single purpose in mind; this is a noble thing in principle, but in this case it causes several problems.

The first problem is data availability. Most monitoring programs have been written to show the instantaneous value of some parameters. This means most have few or no features for recording the data over long periods of time. In particular, a common problem is that monitoring is done only when requested, not continuously. (The Gnome Power Manager is a notable exception. However, it only remembers information until shut-down.)

The other problem is data accessibility: each program generates data separately, usually in its own format, so it is difficult for users to view the data, let alone analyze it. In particular, tools like the Gnome System Monitor's resource usage graphs are only useful for rough estimates.

Bottom line: a single (modular) "monitoring daemon" could run continuously and accumulate information in a consistent way. The process can be managed with a unified interface, and a library with a few well-chosen utilities and widgets would make tools like the Gnome System Monitor much more useful without much effort.
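
To make the idea of a modular daemon more concrete, here is a minimal sketch in Python. It is only an illustration, not part of the specification: the Monitor class, its methods and the max_samples and clock parameters are all hypothetical names. Each "probe" is a plain callable that measures one property, and every reading is stored with a timestamp in a bounded history so that tools can query the recent past.

```python
import time
from collections import deque


class Monitor:
    """Hypothetical core of a monitoring daemon: probes in, history out."""

    def __init__(self, max_samples=3600, clock=time.time):
        self._probes = {}    # probe name -> callable returning a number
        self._history = {}   # probe name -> deque of (timestamp, value)
        self._max_samples = max_samples
        self._clock = clock  # injectable so the sketch can be tested

    def register(self, name, probe):
        """Add a probe module; each one measures a single property."""
        self._probes[name] = probe
        self._history[name] = deque(maxlen=self._max_samples)

    def sample(self):
        """Take one reading from every probe, all with the same timestamp."""
        now = self._clock()
        for name, probe in self._probes.items():
            self._history[name].append((now, probe()))

    def history(self, name, since=0.0):
        """Return the recorded (timestamp, value) pairs at or after `since`."""
        return [(t, v) for t, v in self._history[name] if t >= since]
```

A real daemon would call sample() on a timer and write the history to disk in a shared format. The bounded deque here only stands in for whatever retention policy the daemon would actually use.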

Use Cases

  • Bogdan is intermittently annoyed by sound stuttering, lack of responsiveness and other performance issues. By the time he opens a console and runs top, or opens the System Monitor, the problem has disappeared. (These operations naturally take more time when the system is loaded.) Instead, he can just rewind the history a bit, because the System Monitor has access to detailed logs of what happened during the last few hours.

  • George recently noticed the boot process takes longer, but he doesn't know exactly why. He installs bootchart, which has access to already recorded, medium-detail logs of the last few weeks, and uses bootchart's tools (customized versions of generic widgets) to investigate. He notices that his boot got slower by 20 seconds on a certain date. He then uses Synaptic's new history panel to check what updates happened before that date, and quickly has a likely suspect.

  • Linus wants to know if the new IO scheduler added to the kernel is really better. He can easily use the performance logs from a large number of users (properly anonymized, gathered by a [http://popcon.ubuntu.com/ Popularity Contest]-like tool) from a couple of months before and after the update, and can run more meaningful statistics than on a limited test system.

  • On a whim, Dexter takes a look at his computer usage logs, and notices interesting patterns in the delays when he leaves his computer unattended. He gathers already-accumulated data from other willing users, writes his doctoral thesis on the subject, and develops a smarter algorithm for turning off the display and for locking the screen. World energy usage drops by a few hundred gigawatts. The global cost of security breaches is lowered by many millions of dollars. (Note: By this time, Ubuntu is the dominant OS, in no small part due to its excellent monitoring facilities. [https://bugs.beta.launchpad.net/ubuntu/+bug/1 Bug #1] has been closed for some time.)
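
As an illustration of the kind of analysis that long-term logs would allow, George's use case can be sketched as a simple search for the date on which boot times jumped. This is an assumption-laden example: the find_slowdown function and the (date, seconds) record format are hypothetical, and a real tool might use a more robust change-detection method than comparing averages.

```python
from statistics import mean


def find_slowdown(boots):
    """Return (date, increase) for the split that best separates fast from slow boots.

    `boots` is a chronological list of (date, boot_seconds) pairs.
    The returned date is the first boot after the split.
    """
    if len(boots) < 2:
        raise ValueError("need at least two boot records")
    times = [seconds for _, seconds in boots]
    best_index, best_increase = 1, float("-inf")
    for i in range(1, len(times)):
        # Compare the average boot time after the split with the one before it.
        increase = mean(times[i:]) - mean(times[:i])
        if increase > best_increase:
            best_index, best_increase = i, increase
    return boots[best_index][0], best_increase
```

Given the date returned by such a function, a package history view like the one described for Synaptic would show which updates were installed just before it.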

UnifiedSystemMonitoring (last edited 2008-08-06 16:22:21 by localhost)