Mark Shuttleworth: Precise Quality, not just for Precise

I upgraded my primary laptop to Precise yesterday. Very smoooooth! Kudos to the Ubuntu team for the way they are running this cycle; their commitment to keeping the Precise Pangolin usable from opening to release as 12.04 LTS is very evident.

The three legs of our engineering practice are cadence, quality and design. For those teams which maintain their own codebases (unity, juju, bzr, lp and many more), the quality position is easier to define, because we can make test coverage and continuous tested integration standard practices. It’s more challenging for the platform team and Ubuntu community, who integrate thousands of packages from all sorts of places into one product: Ubuntu. We’ve traditionally focused on items like security, where participation in a global security process helps us ensure Ubuntu gets world-class security support and has established a world-leading track record of security patches and proactive security.

Nevertheless, the last year has seen some amazing leaps forward in our ability to manage quality across the entire platform. In large part, that’s thanks to the leadership of Rick Spencer and Pete Graner, who made smoke-testing and benchmarking a rigorous part of the process for every change to the platform, and led the work to make that commitment sane in practice across all the hundreds of people, inside and outside Canonical, who needed to be on board with it. And it’s thanks to tools like Jenkins and LAVA, which automate the testing and reporting across a vast array of problem spaces, architectures and packages.

So we have a daily weather report for Precise, which gives you a feeling for where things stand right now, as well as tighter integration of the test suites being run by Canonical upstreams on code destined for Precise with the test harness used by the platform team integrating that work into the distribution. I’ll take the liberty of repeating some of Rick’s core points here:

For upstreams, it boils down to “treat your trunk as sacred”. Practically, it requires:

There is a trunk of code bound for Ubuntu.

This trunk always builds automatically.

This trunk has tests that are always passing automatically.

All branches are properly reviewed for having both good tests and good implementation before being merged into trunk.

Any branch that breaks trunk, by causing automated tests to fail or causing trunk to stop building, is immediately reverted.
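To make the "sacred trunk" rules concrete, here is a minimal sketch of a landing gate in Python. Everything in it is hypothetical: the `Trunk` class, the `land_branch` function and the `builds`/`tests_pass` callbacks stand in for whatever real build and test harness (Jenkins, for example) a project uses. It simply models the rules above: unreviewed branches are refused, and a merged branch that breaks the build or the tests is reverted straight away.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trunk:
    """Hypothetical model of a trunk: an ordered list of merged revisions."""
    revisions: List[str] = field(default_factory=list)


def land_branch(trunk: Trunk, branch: str, reviewed: bool,
                builds: Callable[[List[str]], bool],
                tests_pass: Callable[[List[str]], bool]) -> bool:
    """Merge `branch` only if trunk stays buildable and green.

    Returns True if the branch remains in trunk, False if it was
    refused (not reviewed) or reverted (broke the build or the tests).
    """
    # Rule: every branch is reviewed before it is merged.
    if not reviewed:
        return False
    trunk.revisions.append(branch)
    # Rule: trunk always builds and its tests always pass.
    if not builds(trunk.revisions) or not tests_pass(trunk.revisions):
        trunk.revisions.pop()  # revert the offending branch immediately
        return False
    return True


# Usage with made-up branches and stub checks.
trunk = Trunk(["r1"])
good = land_branch(trunk, "good-fix", True,
                   lambda revs: True, lambda revs: True)
bad = land_branch(trunk, "breaks-tests", True,
                  lambda revs: True, lambda revs: "breaks-tests" not in revs)
unreviewed = land_branch(trunk, "drive-by", False,
                         lambda revs: True, lambda revs: True)
print(trunk.revisions)
```

Real setups usually run the checks on a candidate merge before it touches trunk at all; merging and then reverting, as here, just mirrors the wording of the rule.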

For Ubuntu Engineering, the responsibilities include:

Every maintainer in Ubuntu must have a test plan for upstream trunks that is run before uploading to the development release.

Tests in the test plan that are automated can be run with the help of the QA team.

Tests in the test plan that are manual can be run with the help of Nicholas, the new community QA Lead.

Refrain from uploading a trunk into Ubuntu if there are serious bugs found in testing that will slow down people using the development release for testing and development.

Revert uploads that break Ubuntu, as there is no point in having the latest of a trunk in Ubuntu if it’s broken and just slowing everyone down.

Add tests to upstream projects for the Ubuntu test plan if serious bugs do get through that cause a revert.

Now that the harnesses are in place, we’re going to crank up the sensitivity of the test suite, by adding more tests and flagging more of them as critical issues for immediate resolution when they break. Key items to add next are daily tests on software center changes, and tests of the multi-monitor work that is under way for 12.04 in Unity (using some pretty magical hardware setups).

There are a variety of additional practices and processes in place too, such as testing of the daily ISOs, reversion of changes that cross specific thresholds of stability for specific types of users, proactive smoke testing of archive sanity throughout the cycle, and a dedicated vanguard quality team that aims to keep velocity high for everyone despite these additional gates and checks.

This isn’t limited to Canonical team members; didrocks and the French Musketeers have built a Unity SRU testing process which should let us crowdsource perspectives on the quality improvements or regressions of changes in Unity. Ara’s ongoing work around component and system testing is giving us a very useful database of known issues at the hardware level. Work on Checkbox and related tools continues to ensure that people can contribute data and help prioritise the issues which will have the widest benefit for millions of community adopters.

Upstream quality

Where upstreams have test suites, we’re integrating those into the automated QA framework. In an ideal world, whenever a package is changed, we’d have an upstream test suite to run for that package AND for every package which depends on it. That way, we’d catch breakage in the package itself, but more importantly, we’d catch consequential damage elsewhere, which is much harder for upstreams to catch themselves.
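The "test the package and everything that depends on it" idea boils down to walking the dependency graph backwards. Here is a minimal sketch in Python; the `packages_to_test` function and the package names are hypothetical, and it assumes we want transitive dependents (not just direct ones) and that dependency data is available as a simple mapping from each package to the packages it depends on.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def packages_to_test(changed: str, depends: Dict[str, List[str]]) -> Set[str]:
    """Return the changed package plus every package that transitively depends on it."""
    # Invert the graph: for each package, record who depends on it.
    rdepends: Dict[str, Set[str]] = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            rdepends[dep].add(pkg)

    # Breadth-first walk over reverse dependencies.
    to_test = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in rdepends[current]:
            if dependent not in to_test:
                to_test.add(dependent)
                queue.append(dependent)
    return to_test


# Hypothetical dependency data: package -> packages it depends on.
depends = {
    "app": ["libgui"],
    "libgui": ["libc"],
    "tool": ["libc"],
    "editor": ["app"],
    "standalone": [],
}
print(sorted(packages_to_test("libgui", depends)))
```

A change to `libgui` here pulls in the test suites of `app` and `editor` as well, which is exactly the consequential damage an upstream would struggle to catch on its own.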

We’re already running that program, and as upstreams start to take testing more seriously, coverage across the whole platform will improve significantly. It’s been Canonical practice to have test suites for several years, and it’s very encouraging to see other upstreams adopting TDD, or at the very least rigorous unit and functional testing, one at a time. Open source projects love to talk about quality, but it’s important to back that with measurable practices and data. As an example in a complex case, we run the Linux Test Project (LTP) suite against every kernel SRU, in addition to our own kernel and hardware cert tests.

In future, it should be possible to link this to the existing daily builds of tip (we have over 500 upstreams running daily builds on Launchpad, which is fantastic). THAT would give upstreams the ability to know when commits to their tip break tests in dependent packages. It would consume a large amount of compute, but it would provide a fantastic early warning system for collisions between independent changes in diverse but related projects.

There’s a lot more we will do: by integrating Apport for crash data collection and routing those reports through a big data sieve, we should be able to identify the issues which are having the biggest impact on the most users. But that’s a blog for another day. For now, well done, team Ubuntu!