Milestone Reports

This page describes every Ubuntu Touch milestone, along with its successes and shortcomings.

Milestone OTA-4

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww22-2015

OTA-4 - the big switch of the stable channel to the rc images based on the vivid stable phone overlay.

Timeline

  • 2015.05.21:

Successes

Problems

Recommendations

Milestone ww19-ota

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww19-ota

OTA-3.5 - the hotfix OTA, last of the ubuntu-rtm/14.09 updates.

Timeline

  • 2015.05.05: All approved fixes land by the evening of this day; QA waits for custom and device tarball creation for the final image. Image #273 is built, after which the last device tarball is signed off and released. #274 is the promotion candidate.

  • 2015.05.06 - 2015.05.07: The candidate image passes the sanity and regression test suites. One worrying issue is a new crash when viewing the HERE T&C, but it appears to be reproducible on the last promoted image as well.

  • 2015.05.07: QA finds an issue with long reboot times on the promotion candidate. After bisecting and testing, the cause is identified and a fix is prepared swiftly.

  • 2015.05.08: New candidate image passes testing and is copied to the RC channel for final assessment by BQ.

  • 2015.05.12: Image released, phased updates started over a period of 24h using an automated script.

  • 2015.05.13: Phasing finished, OTA-3.5 fully released.

Successes

Since only selected fixes were approved for landing, the number of silos set as ready was relatively sane and there was no last-minute chaos as in previous milestones. Many people helped out by bisecting components to identify the reboot-time regression. Once the source was identified, the fix was swiftly prepared and landed.
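The bisection mentioned above can be sketched as a binary search over an ordered list of candidate changes. This is only an illustration: the build numbers and the `is_bad` check are hypothetical, and the sketch assumes that once the regression is introduced it is present in every later change.

```python
def first_bad(changes, is_bad):
    """Return the first change for which is_bad() is true.

    Assumes changes are ordered oldest to newest, the last one is known
    to be bad, and every change after the culprit is bad as well.
    """
    low, high = 0, len(changes) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(changes[mid]):
            high = mid      # culprit is at mid or earlier
        else:
            low = mid + 1   # culprit is after mid
    return changes[low]


# Hypothetical example: builds 1..8, the slow reboot appears from build 6 on.
builds = list(range(1, 9))
culprit = first_bad(builds, lambda build: build >= 6)
print(culprit)
```

Each check halves the remaining range, so even a long list of candidate changes needs only a handful of test boots.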

Problems

The time BQ needs for the final sign-off of our OTA images is sometimes unpredictable. This delays the release for users everywhere.

Recommendations

The general recommendation is to stop waiting for the manufacturer's final sign-off before releasing the image to end users.

Milestone ww13-ota

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww13-ota

OTA-3

Timeline

  • 2015.04.08: Some spreadsheet issues started appearing this week. Landing gates closed. Device tarball sign-off in progress. First promotion candidate is built, testing starts.

  • 2015.04.09: Regression testing is ongoing. Blocking bugs identified, a new version of oxide (1.6.2) is building. New image respun, delta testing in progress.

  • 2015.04.10: The new oxide regressed. Another oxide version (1.6.3) is built and published to the rtm archive. New image building.

  • 2015.04.13: Latest image pushed to the RC channel, handed over to PES.

  • 2015.04.14: RC image handed over to BQ.

  • 2015.04.15: Image copied to stable. Phased update period set to 24 hours with a 6%-per-hour step. Updates of the phased-percentage made semi-manually by sil2100 every hour.

  • 2015.04.15: Image released to all users.
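The phasing parameters in the timeline above (a 6% step every hour) can be turned into a schedule with a few lines of Python. This is a sketch only: the function name is hypothetical, and capping the value at 100% once the steps would exceed it is an assumption, not something the report states.

```python
def phase_schedule(step_percent=6, period_hours=24):
    """Return (hour, percentage) pairs for an hourly phased rollout.

    The percentage grows by step_percent every hour and is capped at 100;
    the schedule stops as soon as 100% is reached.
    """
    schedule = []
    for hour in range(1, period_hours + 1):
        percentage = min(100, hour * step_percent)
        schedule.append((hour, percentage))
        if percentage == 100:
            break
    return schedule


schedule = phase_schedule()
for hour, percentage in schedule:
    print(f"hour {hour:2d}: {percentage:3d}% of users")
```

Printing the schedule before a release makes it easy to see in advance at which hours somebody, or a script, has to move the percentage forward.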

Successes

Fixes were well selected so there was no QA-sign-off madness of everyone wanting to land everything on one day. Also, this was the first phased-update OTA and it seems that the phasing worked perfectly.

Problems

In our haste with the release, we released oxide 1.6.2 prematurely, without testing thoroughly whether it introduced any regressions. Another problem was the manual update phasing after release: for a few hours there was no one who could move the phases forward, and users noticed that period.

Recommendations

A script for fully controlled auto-phasing is now ready, so the next OTA should go much more smoothly.
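The actual auto-phasing script is not reproduced on this page. As a rough sketch of the idea, a loop can raise the phased percentage at fixed intervals without anyone having to be present. `set_percentage` and `wait` are hypothetical callbacks standing in for whatever the real tooling uses to update the channel and to pause between steps.

```python
import time


def auto_phase(set_percentage, step=6, interval_seconds=3600, wait=time.sleep):
    """Raise the phased percentage by `step` every interval until it reaches 100."""
    percentage = 0
    while percentage < 100:
        percentage = min(100, percentage + step)
        set_percentage(percentage)
        if percentage < 100:
            wait(interval_seconds)  # no need to wait after the final step


# Dry run: record the steps instead of touching a real channel, and skip waiting.
applied = []
auto_phase(applied.append, wait=lambda seconds: None)
print(applied[:3], applied[-1])
```

Passing the waiting function in as a parameter keeps the loop testable without actually sleeping for a day.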

Milestone ww11-ota

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww11-ota

This milestone was originally supposed to be ww13-ota, but we decided to do OTA-2 two weeks earlier due to shipment dates.

Timeline

  • 2015.03.09: Initial plan was to wait for final battery changes and U1 fixes and start testing as soon as possible, since QA had to run an additional OTA-test plan to make sure updates are working. Additional fixes get approved for landing for this milestone and get signed off.

  • 2015.03.10: More bugfixes are signed-off for the milestone. Waiting for the final device tarball. Tarball lands (and final click-package release request fails sign-off) and the promotion candidate is built (krillin #255).

  • 2015.03.11: Promotion candidate passes sanity and OTA testing. Regression testing starts. Around UTC evening a regression in the wizard is found (power dialog not appearing). Upstream creates a quick fix which is tested and landed. New promotion candidate is built (#256). QA only re-tests the system-settings bits and continues regression testing.

  • 2015.03.12: Testing continues. No new regressions found so far.

  • 2015.03.13: QA finds a possible regression in the webapps notification system. The issue is confirmed not to be a regression in the end. Image gets a green light from QA. Autopilot tests pass-rate +1. Image gets copied to the RC channel as image #21 and gets handed off to BQ.

Successes

Problems

Recommendations

Milestone ww09-2015

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww09-2015

Even though this milestone was originally planned for week 9 (i.e. release on Friday the 27th), a client request forced us to move the whole milestone one week earlier. The planned promotion date is therefore Friday the 20th instead.

Timeline

  • 2015.02.16: Week of the early milestone begins, critical and factory fixes landing. As agreed between all the parties, QA requests autopilot test results from SDK, system-apps and web-apps teams for non-critical and non-factory fixes.

  • 2015.02.17: Calendar-app is requested to land, QA starts testing the application for addition to the image. Fixes landing.

  • 2015.02.18: Landing gates closed. Calendar-app turns out to have multiple issues and is not ready for landing; the decision is made to skip it for this release. Landing of an ubuntu-system-settings related silo causes huge regressions in autopilot testing for this component; a quick workaround is deployed (root cause investigated in vivid).

  • 2015.02.19: First promotion candidate is tested, but it is missing an important telephony factory fix. QA tests without telephony. Wizard regression found, fix in the works. Both fixes land and a new candidate is built - #241.

  • 2015.02.20: There were some issues with a corrupt channels.json file, making the latest built image non-OTA-able - later fixed by Oliver. Image #241 passes regression testing and is promoted to the RC channel for BQ testing on Monday.

Later:

  • 2015.02.23: RC image is passed to BQ.

  • 2015.02.25: BQ gives a +1 on promoting image to the stable channel. Image is promoted.

  • 2015.02.26: CKT reports a factory blocker bug, device tarball fix is deployed. A hack-around is made to include the new device tarball in the old rootfs of the promoted image. This gets sent out to CKT.

Successes

Even in such a short time frame, all critical and factory issues reported by BQ (besides one) have been fixed.

Problems

Generally the biggest problem here was the short time period (and US holidays), since this meant that we did not have enough time to deal with all factory fixes as soon as we would have liked. The lack of focus on automated testing also made some things troublesome, as many landings were bumped back because of broken tests.

Recommendations

Milestone ww07-2015

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww07-2015

Timeline

  • 2015.02.10: A moderate number of fixes has been submitted for the milestone, with a steady low velocity of landings throughout the previous week. Most CKT factory fixes have landed in ubuntu-rtm as well before this date.

  • 2015.02.11: QA finishes signing off the remaining critical silos. A final device tarball with factory changes is prepared and signed off. The image build is slow due to builders being busy; the promotion candidate (krillin #234) is finally created during the UTC evening, and US-tz QA picks it up and starts running sanity testing.

  • 2015.02.12: Sanity tests for promotion candidate pass, QA proceeds with regression testing.

  • 2015.02.13: No new blocking issues have been spotted and krillin image #234 is promoted to the RC channel (currently ubuntu-touch/rc/bq-aquaris.en) as image #18. Mako and x86 images are left unpromoted for the time being.

Successes

This time there were no problems spotted during regression testing, so developers were able to keep up the quality.

Problems

Not too many fixes have landed for this milestone, but this might be because of all the events related to the phone premiere and ongoing vivid work. Another small bikeshed-like problem surfaced during this milestone: what to do with mako and x86 images? The RC channel's name sounds krillin-specific, so there is basically no place where those images could be promoted to that would enable normal user usage (Steve is on this though).

Another problem is the autopilot tests. It is now visible that upstreams do not run their autopilot tests before landing and do not look at the testing dashboard afterwards, as there are many test failures that have gone unfixed for a long time.

Recommendations

One recommendation is to re-introduce autopilot tests as a promotion-blocking mechanism. The best way would be to gate ubuntu-rtm landings on autopilot tests, but this would require new CI infrastructure. Another possibility is to simply start requiring landers to provide AP test results as part of the sign-off process, but this would require further discussions. For now, starting with milestone ww11-2015, we will at least re-introduce the old rule of blocking promotion if the failure count has increased since the last promoted image.
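The promotion rule described above (no promotion if the failure count has increased since the last promoted image) can be expressed as a small check. This is a sketch under assumptions: the per-suite dictionaries, the suite names and the numbers are hypothetical and do not reflect the real dashboard format.

```python
def may_promote(candidate, last_promoted):
    """Allow promotion only if the total failure count did not increase.

    Both arguments map a test suite name to its number of failures.
    """
    return sum(candidate.values()) <= sum(last_promoted.values())


def worse_suites(candidate, last_promoted):
    """List suites whose failure count grew, to show where to look first."""
    return sorted(
        suite
        for suite, failures in candidate.items()
        if failures > last_promoted.get(suite, 0)
    )


# Hypothetical numbers for illustration only.
last_promoted = {"unity8": 2, "dialer_app": 0}
candidate = {"unity8": 2, "dialer_app": 3}

print(may_promote(candidate, last_promoted))
print(worse_suites(candidate, last_promoted))
```

Comparing only the totals mirrors the rule as written. The per-suite list is an extra that points at the component responsible for the increase.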

Milestone ww05-2015

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww05-2015

Timeline

  • 2015.01.24 - 2015.01.25: Weekend shift for selected upstream developers, QA engineers and CI Train operator. QA signing off many last minute Friday silos along with new important critical fixes.

  • 2015.01.26: Landing gates closed early, only approved important fixes signed off by the product team allowed to land. Many such silos, clicks and tarballs still remaining. Regression in location-service spotted; a revert is not possible because it would re-introduce a ship-blocker bug.

  • 2015.01.27: All final clicks landing (gallery, camera, music), fix for location in the works. A regression in youtube handling (introduced from the youtube side) spotted. Fixes for both location and webbrowser land. Waiting for a final custom tarball including translation fixes. Experimental image built, sanity tests run on it - passing, but with a worrying possibility of an indicator-datetime regression.

  • 2015.01.28: Custom tarball sign-off completed, promotion candidate image built. QA and PES start their regression testing on image #220. QA finds a regression in the image, unity-scopes-shell is reverted and a new image built (#221). This image becomes the new promotion candidate.

  • 2015.01.29: The nightly build produces #222, which is the same as #221. Regression testing continues with good results. CKT informs us of invalid factory IDs for some MMI plugins, which requires re-rolling both device and custom tarballs. Testing on #221 continues; the decision is made to first promote #221 and then follow up with a subsequent promotion of the fixed-ID images (#224 most probably). Image #221/#222 passes criteria and is promoted as #15. Both custom and device tarballs land, but the decision is made to wait with promotion until the next day.

  • 2015.01.30: CKT is happy with the fixed image, QA performs sanity testing on #224.

Successes

For this milestone all reported critical bugs have been fixed. QA's high availability and quick reactions helped multiple times in triaging incoming problems and in applying the necessary countermeasures.

Problems

Once again upstream developers submitted a large number of fixes at the same time. QA reported that around 6-8 silos were submitted within a very short time frame on a single Friday evening, requiring all QA engineers to work full shifts during the weekend. Last-minute tarball requests introduced noise in our schedules. Even with the landing gates closing early, the various waits caused by new important landing requests meant that image testing only started on Wednesday. Another problem came to our attention when CKT informed us very late in the process about the invalid IDs in MMI plugins. As mentioned by John, there was never clear communication of how those are supposed to be set correctly. During this release it also came to our attention that there are still a lot of processes that are exclusive to certain individuals (e.g. custom tarball uploads).

Recommendations

Once again, the first recommendation is to avoid submitting so many fixes at the last minute: bugfixes should be submitted evenly throughout the period before the landing gate freeze. Another recommendation is to alter some of our processes so that more than a few people are capable of performing a specific task in the others' absence. Tarball creation and uploading should be streamlined. The landing team lead should also be able to perform image builds and promotions. We also need all upstreams to once again make sure they are looking at per-image smoketesting results. If the tests are kept in shape with a solid baseline, every failure there should indicate a possible problem. Ideally every engineer would check the test result page himself/herself, but for starters it might be enough for the engineering managers/tech leads to do it instead.
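The point about a solid baseline can be illustrated with a short comparison: if the set of tests known to fail on the baseline image is recorded, anything failing outside that set is a new signal. The test names below are hypothetical and only serve the example.

```python
def compare_to_baseline(baseline_failures, current_failures):
    """Split current results into new failures and tests fixed since the baseline."""
    baseline = set(baseline_failures)
    current = set(current_failures)
    return {
        "new": sorted(current - baseline),    # needs investigation
        "fixed": sorted(baseline - current),  # baseline can be tightened
    }


# Hypothetical test names for illustration only.
baseline = ["test_alarm_rings", "test_scope_search"]
current = ["test_scope_search", "test_wizard_power_dialog"]

report = compare_to_baseline(baseline, current)
print(report["new"])
print(report["fixed"])
```

With an empty or well-maintained baseline, the "new" list stays short enough that a single manager or tech lead can review it for every image.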

Milestone ww03-2015

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww03-2015

Timeline

  • 2015.01.13: Deadline for submitting silos for the milestone has been reached. At the end of the day 3 silos have been left for QA sign-off for the next day.

  • 2015.01.14: The 3 remaining silos have been signed off. Landing team notices issues with many new scoperunner crashers in autopilot smoketesting starting with image #192, but QA and dogfooders confirm that it's not reproducible in a normal user experience environment. An additional silo with location-service is pushed and requested by management at the last minute, past the deadline. The promotion candidate image has been kicked at 17:00 UTC. The newly created image fails sanity testing: OTA updates are not possible due to the removal of qml-module-ubuntu-connectivity from the image (unity8 upload, package not seeded). The package is re-added as a dependency of ubuntu-system-settings. Sanity testing passes.

  • 2015.01.15: Regression testing commencing. One possible blocker identified (bug #1411171), revert option analyzed (with the risk of re-introducing bug LP: #1362341). Testing continues while waiting on decisions on how to proceed. Upstream developer provides fix which passes QA, new image is kicked and goes into sanity testing.

  • 2015.01.16: Sanity tests pass on the new image, delta testing passing as well. One regression identified but with a very low user experience impact (LP: #1411273). QA gives a green light for promotion. Image #200 (krillin) is promoted.

Successes

Upstreams generally didn't push silos at the very last minute, which gave QA some more freedom. The upstream developer of indicator-datetime reacted very quickly to the blocker regression and provided a fix in a short time period.

Problems

Management pushed a last-minute silo requirement, which delayed the candidate image creation by a few hours that could otherwise have been used for sanity testing. The removed dependency (not mentioned in the changelog) introduced a problem, but this particular case is very hard to catch beforehand - in the ideal case every component that's required by some other component should be listed in the dependencies. For clicks, the selected package should be seeded. We also need to remember to disable the cronjob after closing the gates, so that no-change images are not generated for no reason.

Recommendations

Ask landers and QA engineers to run apt-get autoremove after installing their silos, to make sure that dependency removals do not cause unwanted package removals from the image. This won't protect us in all cases, but it covers at least some of them. It would also be great if urgent silo requests were communicated earlier.
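One way to spot unexpected removals is to compare the list of installed packages before and after installing a silo and running apt-get autoremove. The sketch below assumes each list is plain text with one name and version per line (the shape `dpkg-query -W` prints by default); the function names are hypothetical and the version numbers in the example are made up for illustration.

```python
def parse_manifest(text):
    """Map package name -> version from 'name version' lines."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank lines and entries without a version
            packages[fields[0]] = fields[1]
    return packages


def removed_packages(before_text, after_text):
    """Return names present before the change but missing afterwards."""
    before = parse_manifest(before_text)
    after = parse_manifest(after_text)
    return sorted(set(before) - set(after))


# Made-up manifests for illustration only.
before = "unity8 8.02\nqml-module-ubuntu-connectivity 0.1\n"
after = "unity8 8.03\n"

print(removed_packages(before, after))
```

Any name in the result that the silo was not meant to remove is worth checking against the seeds and the dependencies before the silo lands.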

Milestone ww51-2014

Overview

https://launchpad.net/canonical-devices-system-image/+milestone/ww51-2014

Timeline

Successes

Problems

Recommendations

LandingTeam/MilestoneReports (last edited 2015-06-10 07:24:27 by sil2100)