Ubuntu Open Week - Kernel QA - Leann Ogasawara - Fri, Nov 6, 2009
(01:01:26 PM) ogasawara: Hi Everyone!
(01:01:26 PM) ogasawara: Welcome to the Kernel QA session - A Kernel Bug's Life Cycle.
(01:01:41 PM) ogasawara: My name is Leann Ogasawara and I help manage the Ubuntu Kernel Team's incoming and existing bugs against the kernel.
(01:01:51 PM) ogasawara: Having to deal with such a large volume of bugs is an ongoing challenge for us.
(01:02:02 PM) ogasawara: I thought this session would be a good opportunity to discuss the life cycle of a kernel bug and what bug reporters and triagers should expect.
(01:02:08 PM) sabdfl left the room ("So long, farewell, auf wiedersehen, goodbye ....").
(01:02:17 PM) ogasawara: The life cycle of a kernel bug is pretty straightforward.
(01:02:23 PM) ogasawara: 1: Report the Bug
(01:02:23 PM) ogasawara: 2: Triage the Bug
(01:02:23 PM) ogasawara: 3: Fix the Bug
(01:02:33 PM) ogasawara: Let's get started and begin with Part 1: Report the Bug
(01:02:43 PM) ogasawara: Ubuntu kernel bugs should be filed against the "linux" kernel package.
(01:02:54 PM) ogasawara: It's important to make sure the bug is filed against the linux package to help ensure it gets looked at by the Ubuntu kernel team.
(01:03:09 PM) ogasawara: Reporting a kernel bug can be done by running the following command from a Terminal (Applications->Accessories->Terminal):
(01:03:18 PM) ogasawara: ubuntu-bug linux
(01:03:25 PM) ogasawara: Running the command will automatically gather general kernel debug information and attach it to the bug being filed.
(01:03:38 PM) ogasawara: This includes information such as dmesg output, lspci, kernel version, etc.
(01:03:49 PM) ogasawara: dmesg output provides a log of kernel messages that often contains helpful debug information.
(01:04:00 PM) ogasawara: lspci output lets us know what hardware a system has.
(01:04:11 PM) ogasawara: And the kernel version lets us know the exact kernel version the bug is being reported against.
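For readers curious what those three pieces of information look like, the sketch below saves them to a directory by hand. It only approximates what ubuntu-bug gathers automatically and is not how apport itself works; the function name collect_kernel_debug_info and the output file names are made up for illustration. Filing with ubuntu-bug linux remains the recommended route.

```shell
#!/bin/sh
# Hypothetical helper: save basic kernel debug info into a directory.
# This only approximates what "ubuntu-bug linux" attaches automatically.
collect_kernel_debug_info() {
    out="$1"
    mkdir -p "$out" || return 1

    # Exact version of the running kernel.
    uname -r > "$out/kernel-version.txt"

    # Kernel message log; some systems restrict dmesg to root.
    dmesg > "$out/dmesg.txt" 2>/dev/null || echo "dmesg not readable" > "$out/dmesg.txt"

    # PCI hardware list; -vnn gives verbose output with numeric vendor/device IDs.
    if command -v lspci >/dev/null 2>&1; then
        lspci -vnn > "$out/lspci.txt" 2>/dev/null
    else
        echo "lspci not installed" > "$out/lspci.txt"
    fi
    echo "Saved debug info to $out"
}
```

Calling collect_kernel_debug_info ./kernel-debug would leave three text files in ./kernel-debug that can be read before filing.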
(01:04:22 PM) ogasawara: Part of the bug reporting process also involves writing a title for the bug.
(01:04:34 PM) ogasawara: Please make sure the title of the bug report is descriptive.
(01:04:44 PM) ogasawara: Don't use something like "Suspend Fails" or "Wireless is broken".
(01:04:56 PM) ogasawara: It's better to use, for example, "Suspend fails to resume on my Dell Inspiron 1420".
(01:05:09 PM) ogasawara: Always include hardware or driver information in a kernel bug's title when applicable.
(01:05:23 PM) ogasawara: The reason I say this is because kernel bugs are often hardware specific.
(01:05:34 PM) ogasawara: Even though someone may be experiencing the same symptom of a bug, they should really open a new report if they have different hardware than the original bug reporter.
(01:05:48 PM) ogasawara: Different hardware uses different drivers which likely require different fixes, hence the reason for opening separate bug reports.
(01:06:00 PM) ogasawara: Remember, hijacking someone else's bug report is bad.
(01:06:10 PM) ogasawara: We can always mark a bug as a duplicate of another bug later on if necessary.
(01:06:23 PM) ogasawara: When something like "Suspend Fails" is used as the title, everyone with suspend/resume issues ends up subscribing to and commenting on the bug.
(01:06:38 PM) ogasawara: This invites others to post completely unrelated information to the bug.
(01:06:49 PM) ogasawara: Even worse, the bug will often get a flurry of "me too" comments posted.
(01:07:00 PM) ogasawara: "Me too" comments serve no useful purpose in helping fix a bug and only bloat a bug report.
(01:07:13 PM) ogasawara: This results in impossible-to-follow bug reports which are not likely to get much attention from the kernel team.
(01:07:27 PM) ogasawara: If you are affected by the same issue, Launchpad now has a +affectsmetoo function. Just click on the "Does this bug affect you?" link in the bug report.
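On the earlier advice to put hardware in the title: a minimal sketch of how a reporter might look up the machine model and append it to a symptom. The function names hardware_model and bug_title are hypothetical, and the sketch assumes the kernel exposes DMI data under /sys/class/dmi/id, which is common on PCs but not guaranteed everywhere.

```shell
#!/bin/sh
# Hypothetical helpers for composing a descriptive bug title.
hardware_model() {
    # DMI vendor/model strings exposed through sysfs; fall back if absent.
    dmi=/sys/class/dmi/id
    if [ -r "$dmi/sys_vendor" ] && [ -r "$dmi/product_name" ]; then
        echo "$(cat "$dmi/sys_vendor") $(cat "$dmi/product_name")"
    else
        echo "unknown hardware"
    fi
}

bug_title() {
    # $1 = symptom, $2 = hardware (defaults to the detected model)
    symptom="$1"
    hw="${2:-$(hardware_model)}"
    echo "$symptom on $hw"
}

bug_title "Suspend fails to resume" "Dell Inspiron 1420"
```

When a driver rather than a whole machine is at fault, naming the driver in the title serves the same purpose.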
(01:07:44 PM) ogasawara: If you are the original bug reporter and feel someone commenting on your bug has a separate issue, don't be afraid to kindly tell them they have a separate issue and to open a new report.
(01:08:05 PM) ogasawara: These are the many reasons why I stress the importance of a descriptive bug title.
(01:08:17 PM) ogasawara: Along with providing a descriptive title for the bug report, it's also just as important to provide a well written bug description.
(01:08:31 PM) ogasawara: For the bug's description, it's always great to include steps to reproduce the issue if possible.
(01:08:46 PM) ogasawara: This helps others to confirm they do indeed have the same bug.
(01:08:59 PM) ogasawara: Additionally, it will help the developers debug the situation by either being able to reproduce the issue or get an idea what might be the root cause of the issue.
(01:09:19 PM) ogasawara: Also, the bug description is a great place to mention if this is a regression or not.
(01:09:36 PM) ogasawara: If the bug is a regression, it's important to also tag the bug as a regression.
(01:09:56 PM) ogasawara: At the bottom of each bug report's description there should be a "Tags" line and a yellow pencil edit icon to add, remove, or update a bug's tag(s).
(01:10:12 PM) ogasawara: There are usually 4 different regression tags that kernel bugs will use:
(01:10:22 PM) ogasawara: 1) regression-potential - A bug discovered in the development release that was not present in the stable release.
(01:10:37 PM) ogasawara: For example, right now Lucid is known as the development release and Karmic is the previous stable release.
(01:10:52 PM) ogasawara: If someone finds a regression while testing Lucid while we are still in the development phase, this would be tagged "regression-potential".
(01:11:04 PM) ogasawara: 2) regression-release - A regression in a new stable release.
(01:11:17 PM) ogasawara: For example, Karmic just had its official release. If a regression is found in Karmic, this would be tagged "regression-release".
(01:11:30 PM) ogasawara: regression-potential bugs could very well become regression-release bugs.
(01:11:39 PM) ogasawara: 3) regression-update - A regression introduced by an updated package in the stable release.
(01:11:50 PM) ogasawara: For example, if Jaunty released a new kernel update and if a regression were discovered due to the update, this would be tagged "regression-update".
(01:12:04 PM) ogasawara: 4) regression-proposed - A regression introduced by a package in -proposed.
(01:12:15 PM) ogasawara: Prior to any updates being released, packages sit in what's called -proposed. See https://wiki.ubuntu.com/Testing/EnableProposed . If a regression is found in -proposed, this would be tagged "regression-proposed".
(01:12:29 PM) ogasawara: For more information, refer to https://wiki.ubuntu.com/QATeam/RegressionTracking
(01:12:40 PM) ogasawara: If the bug is a regression, it is most helpful to also specifically note the most recent version of the kernel where the bug did not occur and the version where the bug was first introduced.
(01:12:58 PM) ogasawara: This can help isolate the set of kernel patches which should be examined.
(01:13:08 PM) ogasawara: With this version information a git bisect could also be used to determine the specific patch which introduced the regression.
(01:13:22 PM) ogasawara: For those of you who don't know, git is the revision control system which is used by the upstream kernel as well as the Ubuntu kernel. For more information on git refer to:
(01:13:32 PM) ogasawara: http://www.kernel.org/pub/software/scm/git/docs/
(01:13:40 PM) ogasawara: Regarding the git bisect, it's basically a multi-step process to systematically narrow down a specific commit which introduced a regression.
(01:13:55 PM) ogasawara: It involves a series of steps of marking a known "good" and "bad" kernel version and proceeding to build and test kernels.
(01:14:08 PM) ogasawara: It usually only takes a few iterations to narrow down a specific patch which is causing issues.
(01:14:21 PM) ogasawara: For more information please refer to:
(01:14:22 PM) ogasawara: http://www.kernel.org/doc/local/git-quick.html#bisect
(01:14:29 PM) ogasawara: Lastly, it's also good to comment on the frequency the bug is occurring.
(01:14:39 PM) ogasawara: Can the bug be triggered at will or does it happen randomly? If it happens randomly, how often does it happen?
(01:14:57 PM) ogasawara: Once a bug has been reported, I have some additional tips to keep in mind that will help the kernel team work with the bug.
(01:15:10 PM) ogasawara: First, make sure bug reports are kept up to date. Even a small comment that the issue still exists against the latest 2.6.xx-yy.zz kernel is useful.
(01:15:22 PM) ogasawara: Also, when asked to test the latest development kernel, please don't be difficult and reply with "I can't believe you want me to test a newer kernel! This bug is against Hardy, which is an LTS release so it should be fixed there!"
(01:15:39 PM) ogasawara: We understand where the frustration is coming from, but the hostile remark does not help solve the bug.
(01:15:50 PM) ogasawara: Rants in general do not help resolve a bug; rather, they have the opposite effect of annoying the developers trying to fix the issue.
(01:16:06 PM) ogasawara: The fact of the matter is that before any kernel bug can qualify for a Stable Release Update, the bug should be confirmed as fixed in the actively developed kernel.
(01:16:20 PM) ogasawara: Refer to http://wiki.ubuntu.com/StableReleaseUpdates for the Stable Release Update bug criteria and procedures.
(01:16:31 PM) ogasawara: Also, if a bug has been resolved, don't be afraid to close the bug report.
(01:16:44 PM) ogasawara: Marking the bug "Fix Released" helps make the kernel team's (and bug control team's) triaging efforts one step easier.
(01:17:00 PM) ogasawara: We can always use additional help triaging kernel bugs which brings us to Part 2 of a kernel bug's life cycle, triaging the bug.
(01:17:20 PM) ogasawara: But before we move on, I'm going to take a moment to field questions (if there are any). Remember to post them in #ubuntu-classroom-chat , I'll copy and reply to them here.
(01:18:32 PM) ogasawara: aight, let's move on
(01:18:52 PM) ogasawara: Part 2: Triage the Bug
(01:19:01 PM) ogasawara: Remember, as a triager we are often the first point of contact for a bug reporter.
(01:19:13 PM) ogasawara: It's important that we help move a bug into a good working state as well as help educate the bug reporter to submit better bug reports in the future.
(01:19:25 PM) ogasawara: So how does that happen?
(01:19:30 PM) ogasawara: First, make sure Ubuntu kernel bugs are assigned to the Ubuntu linux kernel package.
(01:19:40 PM) ogasawara: http://bugs.launchpad.net/ubuntu/+source/linux
(01:19:46 PM) ogasawara: If a bug reporter did not correctly file the bug against the linux kernel package, help reassign the bug and kindly remind them to report future kernel bugs against the linux kernel package.
(01:20:01 PM) ogasawara: Failing to do so may result in the bug getting overlooked.
(01:20:11 PM) ogasawara: It may be helpful to also point them at https://wiki.ubuntu.com/Bugs/FindRightPackage .
(01:20:25 PM) ogasawara: Next, we want to make sure a bug is really not a duplicate of another bug.
(01:20:54 PM) ogasawara: openweek7_> QUESTION: Does 'ubuntu-bug linux' allow to significantly automate identifying duplicate errors?
(01:21:23 PM) ogasawara: openweek7_: yes, during the bug filing process launchpad will present you a list of bugs this might be a duplicate of
(01:21:56 PM) ogasawara: When marking bugs as duplicates, this is where we as kernel triagers need to be careful.
(01:22:03 PM) ogasawara: Kernel bugs are usually hardware specific.
(01:22:15 PM) ogasawara: Just because someone may be experiencing the same symptom of another bug, it doesn't necessarily mean they have the same bug.
(01:22:28 PM) ogasawara: When in doubt, don't mark it as a duplicate and ask for a second opinion.
(01:22:39 PM) ogasawara: Additionally, if you see someone comment on a bug and they don't have the same hardware, ask them to open a new bug report and explain why.
(01:22:52 PM) ogasawara: This really helps prevent bugs from becoming wildly out of control with unrelated comments and impossible for a developer to follow, let alone fix.
(01:23:08 PM) ogasawara: Next, help make sure the title of the bug as well as the bug description is informative.
(01:23:19 PM) ogasawara: Like I mentioned earlier when reporting a bug, a title of "Sound is broken" or "Suspend fails" is not informative.
(01:23:32 PM) ogasawara: As a triager, if a bug doesn't have an informative title, help fix the title by making it more descriptive.
(01:23:47 PM) ogasawara: One way to help improve the title is to mention the affected hardware or driver in the title.
(01:24:00 PM) ogasawara: Another part of your job as a triager is to help improve the bug's description.
(01:24:12 PM) ogasawara: One common improvement to a bug description would be to copy any relevant bits of debug information from the attached log files and paste them into the description of the bug.
(01:24:25 PM) ogasawara: This could include items such as error messages found in a reporter's dmesg output or hardware information found in the lspci output.
(01:24:41 PM) ogasawara: You may also want to post a comment as to why you're fixing the title or description of a bug to help remind the bug reporter to choose a better title/description in the next bug they report.
(01:24:56 PM) ogasawara: When triaging, it's also helpful to tag the bug when applicable.
(01:25:05 PM) ogasawara: Because the volume of kernel bugs is so high, tags are a useful way for triagers and developers to group and search for a category of bugs.
(01:25:20 PM) ogasawara: One example of tags we talked about earlier were for regressions - https://wiki.ubuntu.com/QATeam/RegressionTracking
(01:25:29 PM) ogasawara: If you see a bug is a regression and it has not been tagged, please feel free to add the appropriate tag.
(01:25:42 PM) ogasawara: A list of common bug tags used by the kernel can be found at https://wiki.ubuntu.com/Bugs/Tags under the "Kernel Specific" section.
(01:25:56 PM) ogasawara: Next, if you find you have the same hardware as a bug being reported, try to reproduce the bug yourself.
(01:26:07 PM) ogasawara: It's not unheard of for hardware to become faulty. Being able to help confirm this is or is not the result of hardware going bad is important.
(01:26:21 PM) ogasawara: Now I know the next part is sometimes a bit controversial, but it's also best if the issue has been confirmed against the latest kernel available.
(01:26:52 PM) ogasawara: I realize this is a touchy subject for some individuals and some reporters often object to always being asked to "test the latest".
(01:27:07 PM) ogasawara: However, when you are dealing with the kernel, keep in mind there are literally thousands of commits between each release.
(01:27:19 PM) ogasawara: Then consider that each commit touches more than just one line of code and you've now hit insanity trying to isolate one fix (if it even exists) for a single bug.
(01:27:41 PM) ogasawara: It's much easier if someone triaging the bug could kindly ask if the issue remains with the actively developed kernel.
(01:28:08 PM) ogasawara: Finally, one of the most important aspects of triaging kernel bugs is making sure the appropriate debug information is attached.
(01:28:25 PM) ogasawara: For the kernel this means dmesg output, lspci, kernel version info, etc.
(01:28:37 PM) ogasawara: If a bug has been reported without this information, I recommend that instead of asking bug reporters to attach these files individually, have them run apport-collect.
(01:28:53 PM) ogasawara: apport-collect will automatically gather and attach package specific debug information for a bug.
(01:29:05 PM) ogasawara: For example, if we wanted kernel debug info attached to pretend bug 987654, the apport-collect command would look like:
(01:29:07 PM) ubottu: Error: Launchpad bug 987654 could not be found
(01:29:12 PM) ogasawara: heh
(01:29:23 PM) ogasawara: apport-collect -p linux 987654
(01:29:34 PM) ogasawara: There's less room for error having a bug reporter run one command versus having a bug reporter run multiple commands to capture multiple log files.
(01:29:48 PM) ogasawara: But as I mentioned earlier, it's best to just use ubuntu-bug to report the bug in the first place.
(01:29:58 PM) ogasawara: fagan> Question: when dealing with a hardware specific bug how are you able to confirm it?
(01:30:20 PM) ogasawara: fagan: if you happen to have the same hardware, hopefully the reporter has steps to reproduce for you to test and confirm the issue
(01:30:55 PM) ogasawara: fagan: unfortunately if you do not have the same hardware, the best you can do is make sure appropriate debug info is attached for the developers to dig in to
(01:31:11 PM) ogasawara: In the process of attempting to triage a bug, if you've asked a bug reporter to provide more information, be sure to set the bug's status to Incomplete.
(01:31:29 PM) ogasawara: Also be sure to subscribe yourself to a bug so that you are automatically notified when they have responded with the requested information.
(01:31:44 PM) ogasawara: Once the bug looks ready for a developer to begin working on it, set the status of the bug to Triaged and make sure the Importance is set.
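For triagers who send the apport-collect request often, here is a small sketch that checks the bug number and prints a ready-to-paste comment. The command itself is the one given in the session; the function names apport_collect_cmd and triage_request and the wording of the comment are hypothetical, and 987654 is the same pretend bug number used above.

```shell
#!/bin/sh
# Hypothetical triage helper: build the apport-collect request for a bug.
apport_collect_cmd() {
    # Accept only a plain Launchpad bug number.
    case "$1" in
        ''|*[!0-9]*) echo "usage: apport_collect_cmd <bug-number>" >&2; return 1 ;;
    esac
    echo "apport-collect -p linux $1"
}

triage_request() {
    cmd=$(apport_collect_cmd "$1") || return 1
    echo "Please run the following command from a Terminal. It will gather"
    echo "kernel debug information and attach it to this bug report:"
    echo
    echo "    $cmd"
}

triage_request 987654
```

After posting such a request, the bug's status would be set to Incomplete as described above.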
(01:31:57 PM) ogasawara: Note that being able to set a bug to Triaged and also to set the Importance requires that you be a member of the Ubuntu Bug Control team in Launchpad.
(01:32:12 PM) ogasawara: To learn how to join the ubuntu-bugcontrol team, refer to https://launchpad.net/~ubuntu-bugcontrol
(01:32:22 PM) ogasawara: I'd also like to bring up one last thing to keep in mind when triaging kernel bugs . . . and that's forwarding the bug upstream.
(01:32:33 PM) ogasawara: Before a bug can be forwarded upstream, it should be confirmed to exist when running the latest upstream kernel.
(01:32:50 PM) ogasawara: The Ubuntu kernel team has started building vanilla mainline kernel builds for users to test with.
(01:33:01 PM) ogasawara: See https://wiki.ubuntu.com/KernelTeam/MainlineBuilds
(01:33:06 PM) ogasawara: If a bug exists with the upstream kernel, the bug should be forwarded upstream so that the upstream kernel developers are also aware of the issue.
(01:33:16 PM) ogasawara: Additionally, it may be discovered that the bug is fixed upstream and we should pull the fix back into the Ubuntu kernel.
(01:33:29 PM) ogasawara: If a bug has already been reported to the upstream kernel bugzilla, http://bugzilla.kernel.org/ , we should make sure we set up an upstream bug watch from the Launchpad bug report to the upstream bug report.
(01:33:50 PM) ogasawara: See https://wiki.ubuntu.com/Bugs/Watches for more information on how to set an upstream bug watch.
(01:34:07 PM) ogasawara: And since I think we have time, I'd also like to take a moment here to point out some extra specific triaging/debugging tips and tricks that people may find useful.
(01:34:17 PM) ogasawara: but I'll field two questions really quick
(01:34:28 PM) ogasawara: nameiner> QUESTION: When is the time to set a bug's status to confirmed or is this for the devs?
(01:34:49 PM) ogasawara: nameiner: anyone can set a bug's status to Confirmed . . .
(01:35:17 PM) ogasawara: nameiner: the time to do so is when you have been able to confirm the bug yourself
(01:35:42 PM) ogasawara: nameiner: otherwise if you think enough debug information is attached to confirm the issue, feel free to set it to confirmed then as well
(01:36:07 PM) ogasawara: nameiner: if you can then get a hold of someone in ubuntu-bugcontrol (usually in #ubuntu-bugs), they can review and set it to Triaged
(01:36:23 PM) ogasawara: openweek1____> Question: Why the Edubuntu 9.10 does not have LTSP module on it?
(01:36:59 PM) ogasawara: openweek1____: unfortunately I don't know the answer to that one, best to file a bug so it can be investigated :)
(01:37:24 PM) ogasawara: ok, moving on to helpful triaging/debugging tips and tricks
(01:37:34 PM) ogasawara: First, triaging update/install bugs . . .
(01:37:44 PM) ogasawara: Having dealt with a good number of these types of bugs myself, I took the liberty to document some of the more common types of update/install issues I saw while triaging.
(01:37:57 PM) ogasawara: See https://wiki.ubuntu.com/KernelTeam/DebuggingUpdateErrors
(01:38:07 PM) ogasawara: That wiki outlines what some of the common error messages look like and the master bug the issue is likely a duplicate of.
(01:38:20 PM) ogasawara: There are also examples of bugs being reported that really are not valid bugs.
(01:38:31 PM) ogasawara: For example, an update/install failing due to the fact the user ran out of disk space is not a bug. We have no control over how much disk space someone has.
(01:38:48 PM) ogasawara: The next tip is for triaging wifi issues. . .
(01:39:00 PM) ogasawara: Some may or may not know that for the past few releases we've been packaging an updated compat-wireless stack from upstream via the linux-backports-modules package.
(01:39:17 PM) ogasawara: This allows users to run a newer compat-wireless stack which may in fact contain a fix for an issue they are seeing.
(01:39:30 PM) ogasawara: Most recently for karmic we've actually been packaging the upstream stable compat-wireless release.
(01:39:41 PM) ogasawara: If someone is experiencing wifi issues and uses a driver supported via the compat-wireless stack, they should try installing and testing the linux-backports-modules-wireless-karmic-generic package.
(01:40:05 PM) ogasawara: A list of supported drivers which would be eligible for testing using the linux-backports-modules package can be seen at http://wireless.kernel.org/en/users/Drivers .
(01:40:20 PM) ogasawara: Similarly, the next tip for triaging sound issues follows the same philosophy. . .
(01:40:33 PM) ogasawara: For the Karmic release, we also packaged an updated alsa-driver snapshot for testing.
(01:40:43 PM) ogasawara: If someone is experiencing sound issues, it might not hurt to try installing and testing the linux-backports-modules-alsa-karmic-generic package.
(01:41:03 PM) ogasawara: And lastly, a tip for triaging kernel panics. . .
(01:41:14 PM) ogasawara: One of the recent additions to Karmic is the linux-crashdump utility.
(01:41:26 PM) ogasawara: In the event you have a kernel panic and the system is unrecoverable, linux-crashdump can help at least capture the contents of the panic and the events leading up to the panic to help provide some post event diagnosis and analysis.
(01:41:50 PM) ogasawara: For more information, refer to https://wiki.ubuntu.com/KernelTeam/CrashdumpRecipe
(01:42:04 PM) ogasawara: In general, another good source for common debugging/triaging tips can be found at https://wiki.ubuntu.com/KernelTeam/KnowledgeBase#Debugging
(01:42:19 PM) ogasawara: As always, feel free to also take a look at https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies for more triaging information.
(01:42:33 PM) ogasawara: The volume of kernel bugs can be daunting, especially considering the limited number of resources the kernel team has.
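Going back to the wifi and sound tips: the two package names mentioned share a pattern, which the sketch below spells out. The function name lbm_package is hypothetical, and the pattern is inferred only from the two Karmic package names given in the session, so it is an assumption that other releases or kernel flavors follow it.

```shell
#!/bin/sh
# Hypothetical helper: name of the backports package for a driver stack.
# Pattern inferred from the two Karmic package names in the session.
lbm_package() {
    # $1 = stack (wireless or alsa), $2 = release codename, $3 = kernel flavor
    stack="$1"
    release="${2:-karmic}"
    flavor="${3:-generic}"
    case "$stack" in
        wireless|alsa) echo "linux-backports-modules-$stack-$release-$flavor" ;;
        *) echo "unknown stack: $stack" >&2; return 1 ;;
    esac
}

lbm_package wireless
lbm_package alsa
```

A tester on Karmic could then install one with, for example, sudo apt-get install "$(lbm_package wireless)" and retest after rebooting.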
(01:42:48 PM) ogasawara: If anyone would like to start helping out with Triaging kernel bugs, please let me know! I'd be more than happy to help you get started.
(01:43:19 PM) ogasawara: So before we move on to Part 3, Fixing the Bug, I'd like to field any additional questions if there are any . . .
(01:44:10 PM) ogasawara: Ok, let's move on . . .
(01:44:25 PM) ogasawara: Part 3: Fix the bug
(01:44:35 PM) ogasawara: Once a bug has been reported and moved into a Triaged state, this implies the bug has enough information and is ready for a developer to hopefully begin debugging the issue and working on a fix.
(01:44:57 PM) ogasawara: When a developer begins working on a bug, they should assign the bug to themselves and also set the Status of the bug to In Progress.
(01:45:12 PM) ogasawara: Please do not assign someone else to a bug unless you have their permission.
(01:45:24 PM) ogasawara: Likewise, do not mark a bug as In Progress unless you are actively working on fixing the bug.
(01:45:38 PM) ogasawara: Ignoring these requests will result in the bug likely getting overlooked as others will think someone else is already addressing the issue.
(01:46:02 PM) ogasawara: When a developer has a possible patch to test, they will typically build a test kernel for the bug reporter to try.
(01:46:14 PM) ogasawara: As a bug reporter, please be responsive with testing and feedback.
(01:46:25 PM) ogasawara: Ideally, the patch being tested will have originated from upstream or will have already been submitted upstream.
(01:46:39 PM) ogasawara: If a patch has not been pushed upstream, it needs to go upstream first.
(01:46:50 PM) ogasawara: It's preferable for a patch to first be accepted upstream and then be pulled back down into the Ubuntu kernel.
(01:47:04 PM) ogasawara: Once a patch has been committed to the Ubuntu kernel git repository, the status of the bug should be changed to Fix Committed.
(01:47:18 PM) ogasawara: This is usually done by the developer.
(01:47:26 PM) ogasawara: Only when a package containing the fix has been released to the archive should the status of the bug change to Fix Released.
(01:47:43 PM) ogasawara: Assuming the developer made a note of the bug # in the changelog, the launchpad janitor should automatically update the status of the bug to Fix Released when a package containing the fix has been released.
(01:47:59 PM) ogasawara: Once a bug has been marked Fix Released please refrain from re-opening the bug unless you are the original bug reporter and the issue remains unresolved.
(01:48:17 PM) ogasawara: If someone is still experiencing issues after a fix has been released, it's likely they are experiencing a different issue which warrants opening a new and separate bug report.
(01:48:35 PM) ogasawara: Likewise, if a bug has been marked Won't Fix or Invalid, please do not re-open the bug.
(01:48:47 PM) ogasawara: There should have been a reason explaining why the bug is being marked Won't Fix or Invalid.
(01:49:00 PM) ogasawara: Just because the status changes back to New, it doesn't change the reason the bug was originally closed.
(01:49:24 PM) ogasawara: So that should cover the general life cycle of an Ubuntu Kernel Bug.
(01:49:32 PM) ogasawara: Are there any other questions?
(01:49:50 PM) ogasawara: Otherwise we can end 10min early for people to stretch before the amazing bethlynn takes over.
(01:50:42 PM) ogasawara: Ok, I think we'll end a few mins early. Thanks everyone!
(01:50:56 PM) ogasawara: And feel free to ping me if you have any other questions or want to get involved.
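A footnote on the changelog point from Part 3: Ubuntu changelog entries conventionally reference Launchpad bugs in the form "LP: #NNNNNN". The sketch below pulls those numbers out of changelog text, which can help a triager check whether a given upload mentions a bug. The function name lp_bugs_in_changelog and the bug numbers in the sample input are made up, and the pattern only matches the first number when several bugs share one "LP:" marker.

```shell
#!/bin/sh
# Hypothetical helper: list Launchpad bug numbers referenced in changelog text.
lp_bugs_in_changelog() {
    # Reads changelog text on stdin; prints each referenced bug number once,
    # in ascending numeric order.
    grep -o 'LP: #[0-9][0-9]*' | sed 's/LP: #//' | sort -un
}

# Sample input with invented bug numbers.
printf '%s\n' \
    '  * Fix suspend on example hardware (LP: #123456)' \
    '  * Fix wifi on example hardware (LP: #234567)' \
    '  * Follow-up for suspend fix (LP: #123456)' | lp_bugs_in_changelog
```

On a real package the input would typically come from its debian/changelog file.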