KernelMaverickTracingSupport

Packages affected: linux, perf, trace-cmd (New), kernelshark (New)

Summary

The lightweight ftrace tracer in Linux allows for precise and controlled tracing throughout the kernel. In the Maverick kernel we will have support for function profiling, dynamic tracepoints, and an interface for retrieving trace data in a compact binary format. The new trace-cmd and kernelshark tools, along with improvements to perf, will allow for more efficient debugging and better analysis.

In Ubuntu, we need to ensure the correct kernel configuration options are set, package trace-cmd and kernelshark, ensure that our version of perf supports setting dynamic tracepoints, and provide tools that wrap this functionality in an easy-to-use utility that integrates with Launchpad.
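As a rough illustration of the configuration check, the Python sketch below reads a kernel config file and lists the tracing options that are not built in. The option list is an assumption for illustration only. It names the function tracer, function graph tracer, dynamic ftrace, function profiler and kprobe-based events as they are spelled in 2.6.33-era kernels, and the final set of options is still to be decided.

```python
import re

# Hypothetical selection of tracing options (2.6.33-era Kconfig names).
TRACING_OPTIONS = (
    "CONFIG_FTRACE",
    "CONFIG_FUNCTION_TRACER",
    "CONFIG_FUNCTION_GRAPH_TRACER",
    "CONFIG_DYNAMIC_FTRACE",
    "CONFIG_FUNCTION_PROFILER",
    "CONFIG_KPROBE_EVENT",
)

_SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_* symbol in a kernel config to its value ('n' if unset)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET_RE.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET_RE.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def missing_tracing_options(text, required=TRACING_OPTIONS):
    """Return the required options that are not built in (=y), in order."""
    values = parse_config(text)
    return [opt for opt in required if values.get(opt) != "y"]
```

For the running kernel, the text would typically come from /boot/config-$(uname -r). An empty result means every listed option is set to y.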

Release Note

New kernel tracing functionality will enable more efficient debugging of kernel issues. Analysis tools are provided to examine the data more rapidly and accurately. Scripts are provided so that end users can easily collect trace data and upload the results to a Launchpad bug.

Rationale

Kernel issues are often difficult and time-consuming to examine and resolve. The kernel's ftrace functionality provides a new mechanism for debugging them that is lightweight yet feature-rich. One obstacle in kernel debugging is the time it can take to get useful debug information: often this requires building new kernels or modules just to add printk debug statements. Kprobe events, available in the Maverick kernel, work with ftrace to remove this obstacle by providing a way to insert tracepoints into a running kernel.
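To make the kprobe mechanism concrete, the sketch below registers a kprobe event by appending a definition to the kprobe_events file and then enabling the new event. It assumes debugfs is mounted at /sys/kernel/debug, root privileges and CONFIG_KPROBE_EVENT. The function names are hypothetical. Definitions follow the 'p[:[GROUP/]EVENT] SYMBOL[+OFFSET] [FETCHARGS]' syntax from the kernel's kprobe tracing documentation.

```python
import os

TRACING_DIR = "/sys/kernel/debug/tracing"  # assumed debugfs mount point


def kprobe_definition(event, symbol, offset=0, fetchargs=(), group="kprobes"):
    """Build a kprobe_events line: p:GROUP/EVENT SYMBOL[+OFFSET] [FETCHARGS]."""
    location = symbol if offset == 0 else "%s+%d" % (symbol, offset)
    parts = ["p:%s/%s" % (group, event), location]
    parts.extend(fetchargs)
    return " ".join(parts)


def add_kprobe_event(event, symbol, offset=0, fetchargs=(),
                     group="kprobes", tracing_dir=TRACING_DIR):
    """Register a kprobe event on the running kernel and enable it."""
    line = kprobe_definition(event, symbol, offset, fetchargs, group)
    # Append mode on purpose: opening kprobe_events with truncation
    # removes every existing probe definition.
    with open(os.path.join(tracing_dir, "kprobe_events"), "a") as f:
        f.write(line + "\n")
    # The kernel creates events/GROUP/EVENT/ for each new probe.
    enable = os.path.join(tracing_dir, "events", group, event, "enable")
    with open(enable, "w") as f:
        f.write("1")
    return line
```

For example, add_kprobe_event("myopen", "do_sys_open") would make calls to do_sys_open appear in the trace buffer as myopen events, with no rebuild and no reboot.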

At the analysis stage, kernelshark can be used to examine raw data without being confined to an output format fixed at compile time. As a GUI application, it also provides an efficient way to take large amounts of data and narrow them down to the information relevant to the problem at hand.

Currently, when we need debugging information we ask users to run a series of copy-pasted commands in a terminal and then manually upload the output to Launchpad. A script that runs these tedious commands and uploads the data to Launchpad on the user's behalf will reduce errors and make the process more efficient.
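The upload step could look roughly like the sketch below: the trace file is gzip-compressed and attached to the bug through launchpadlib. Launchpad.login_with and the bug's addAttachment method come from launchpadlib and the Launchpad web service. The application name "kernel-trace-report", the attachment description and the file naming are assumptions for illustration. launchpadlib is imported inside the upload function so the compression helper works without it.

```python
import gzip
import os
import shutil


def compress_trace(path):
    """Gzip-compress a trace file (e.g. trace.dat), writing PATH.gz beside it."""
    out_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def upload_to_bug(bug_number, trace_path, comment="Kernel trace data"):
    """Attach compressed trace data to a Launchpad bug (needs network access)."""
    from launchpadlib.launchpad import Launchpad  # imported lazily

    gz_path = compress_trace(trace_path)
    # May ask the user to authorize the application on first use.
    launchpad = Launchpad.login_with("kernel-trace-report", "production")
    bug = launchpad.bugs[bug_number]
    with open(gz_path, "rb") as f:
        bug.addAttachment(
            comment=comment,
            data=f.read(),
            filename=os.path.basename(gz_path),
            content_type="application/x-gzip",
            description="Compressed ftrace data",
            is_patch=False,
        )
    return gz_path
```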

User stories

  • Mario has a laptop with an rf switch that causes the kernel to emit a "scheduling while atomic" bug when it is flipped. An arsenal script determines this is a "scheduling while atomic" bug and asks Mario to run:

    kernel-trace-report -c '__schedule_bug:traceoff:' <bug #>

    The script enables tracing, Mario reproduces the bug, and the tracer then uploads compressed ftrace binary data to the Launchpad bug. The kernel developer uses kernelshark to analyze the data and quickly realizes that the rfkill processing must be deferred to a worker thread in the kernel. The bug is resolved within a few hours.
  • Luigi is seeing warning messages in his system log and creates a Launchpad bug about them. The warnings suggest an out-of-range value for pkt->seq in the e100 driver. A kernel developer asks him to run:

    kernel-trace-report -p 'e100.c:157 pkt->seq' <bug #>

    The script enables tracing, Luigi reproduces the bug, and compressed ftrace data is uploaded to the Launchpad bug report. The kernel developer uses kernelshark to analyze the data and can see the values of pkt->seq in the e100 driver at the moment the bug occurs. All of this is done without the user having to reboot the running kernel.
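Both stories rely on the same small command-line surface. The sketch below shows one hypothetical way kernel-trace-report could parse it with argparse and translate each option. A -c value such as '__schedule_bug:traceoff:' becomes an entry for ftrace's set_ftrace_filter file, which accepts FUNCTION:COMMAND[:COUNT] commands. A -p value is passed unchanged to "perf probe --add", whose probe syntax accepts FILE:LINE VARIABLE definitions like 'e100.c:157 pkt->seq' (given kernel debug information). The option names come from the stories above. Everything else, including dropping an empty trailing count, is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical kernel-trace-report command line."""
    parser = argparse.ArgumentParser(prog="kernel-trace-report")
    parser.add_argument("-c", dest="filter_cmds", action="append", default=[],
                        help="ftrace function command, FUNCTION:COMMAND[:COUNT]")
    parser.add_argument("-p", dest="probes", action="append", default=[],
                        help="perf probe definition, e.g. 'FILE:LINE VAR'")
    parser.add_argument("bug", nargs="?", type=int, default=None,
                        help="Launchpad bug number; omit for local use")
    return parser.parse_args(argv)


def filter_command_line(spec):
    """Turn FUNCTION:COMMAND[:COUNT] into a set_ftrace_filter entry."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3 or not parts[0] or not parts[1]:
        raise ValueError("expected FUNCTION:COMMAND[:COUNT], got %r" % spec)
    func, command = parts[0], parts[1]
    count = parts[2] if len(parts) == 3 else ""
    if count and not count.isdigit():
        raise ValueError("count must be a number, got %r" % count)
    if count:
        return "%s:%s:%s" % (func, command, count)
    return "%s:%s" % (func, command)  # empty count: no limit


def perf_probe_argv(spec):
    """Argument vector that would register one probe with perf."""
    return ["perf", "probe", "--add", spec]
```

Leaving the bug number optional is one way to cover local use: with no bug given, the script would keep the trace data instead of uploading it.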

Assumptions

  • The Maverick kernel will be based on at least 2.6.33.
  • The trace-cmd, perf-probe, and kernelshark utilities will be mature enough that bugs or usability issues do not hinder their use.
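Since the tooling depends on the 2.6.33 baseline, kernel-trace-report might refuse to run on older kernels. A minimal sketch of such a check follows; the function names are hypothetical, and the release string is assumed to have the usual "MAJOR.MINOR.PATCH-..." shape.

```python
import re


def kernel_version(release):
    """Extract (major, minor, patch) from a release like '2.6.35-22-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release %r" % release)
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def meets_minimum(release, minimum=(2, 6, 33)):
    """True if the kernel release is at least the assumed baseline."""
    return kernel_version(release) >= minimum
```

On the target system the release string would come from os.uname()[2], the same value uname -r prints.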

Design

  • Package trace-cmd and kernelshark.
  • Ensure the kernel configuration enables all reasonable tracing options.
  • Create a kernel-trace-report script as a wrapper around trace-cmd and perf-probe. The script should be able to:
    • Continue running even if X or the terminal dies.
    • Use launchpadlib to upload compressed trace data to a Launchpad bug report.
    • Upload automatically when tracing stops.
    • Allow tracing to be stopped manually.
    • Allow local use (i.e. don't upload at the end).
    • Expose the full functionality of trace-cmd and perf-probe.
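For the requirements on surviving a dying terminal, manual stop and automatic upload, one possible approach (an assumption, not a settled design) is to ignore SIGHUP, which is delivered when the controlling terminal goes away, and to treat SIGINT or SIGTERM as a request to stop tracing cleanly so the upload still runs. The sketch below only models that control flow; record_step and upload_step are hypothetical callables standing in for the trace-cmd and launchpadlib work.

```python
import signal


class StopRequest:
    """Records whether tracing should stop (manual stop or termination)."""

    def __init__(self):
        self.stopped = False

    def handler(self, signum, frame):
        # Keep the handler trivial: flag the request and let the main
        # loop move on to the upload step.
        self.stopped = True


def install_handlers(stop):
    """Survive a dying terminal; turn Ctrl-C or SIGTERM into a clean stop."""
    signal.signal(signal.SIGHUP, signal.SIG_IGN)  # same effect as nohup
    signal.signal(signal.SIGINT, stop.handler)
    signal.signal(signal.SIGTERM, stop.handler)


def run_session(record_step, upload_step, stop):
    """Record until stopped or tracing ends by itself, then upload once.

    record_step returns False once tracing has stopped on its own (for
    example after a traceoff trigger fired); upload_step may be a no-op
    for local use.
    """
    while not stop.stopped:
        if not record_step():
            break
    upload_step()
```

Ignoring SIGHUP only keeps the process alive; output would also need to go to a log file rather than the vanished terminal.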

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since a specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.


CategorySpec

KernelTeam/Specs/KernelMaverickTracingSupport (last edited 2010-04-21 16:56:18 by cpe-75-180-27-10)