Thursday, March 20, 2014

Re-thinking Mozilla's Firefox for Android automated testing

NOTE: This is a blog post from December 2013 which I did not publish at the time, since our main proposal could not be accomplished. However, some of the analysis I did in here was interesting and could be useful in the future for reference. The proposal did not go through because we were proposing to reduce armv6 testing to a periodic schedule instead of per-check-in, without the ability to backfill (we now have that ability).

Callek recently blogged about the distribution of Android versions in general, as well as the distribution of Android versions among Mozilla's Firefox for Android users.

I wanted to put his data into a table and graphs to make comparisons easier.

DISCLAIMER: (I’m quoting Callek, however, I'm striking what I won't be doing)
  • I am not a metrics guy, nor do I pretend to be.
  • This post will not give you absolute numbers of users.
  • This post is not meant to show any penetration into the market
  • Ignores all things Firefox OS for the purposes herein.
  • I present this as an attempt at data only and no pre-judgment of where we should go, nor what we should do.
  • I am explicitly trying to avoid color commentary here, and allowing the reader to draw their own conclusions based on the static data.¹

Android versions

The table shows the top four Android versions (rather than the seven versions that Google reports) against the Firefox Beta and Firefox Release distributions of Android versions. Please read Callek's post to learn how he gathered the data.
Data from December 2013

If we look at the table in the image, we will notice that it lists the four versions with the most users for Android, Firefox Beta and Firefox Release. The last row shows what percentage those four versions represent of the total number of users in each column.
The three pie charts represent the same data visually.
The stacked bar chart shows only two specific versions: 2.3 and 4.0.

If you look at the stacked bar chart, we have two clear anomalies compared to the overall distribution of Android users:
  • We have many more users on 4.0 than the norm and/or
  • We have abnormally fewer users on 2.3 than the norm
One theory that Callek shared with me is that there are likely devices running Android 2.3 that don't have the Play Store, as it was not a product requirement (citation needed). This would explain the pattern for Android 2.3.
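The size of the 2.3 anomaly can be checked numerically by comparing Firefox Beta's share of each Android version against the global Android distribution. This is a minimal sketch using the figures from the Android 2.2 vs Android 2.3 table later in this post:

```python
# Shares (in %) taken from the Android 2.2 vs 2.3 table (December 2013 data).
android = {"2.2": 1.6, "2.3": 24.1}    # global Android distribution
fx_beta = {"2.2": 1.62, "2.3": 10.82}  # Firefox Beta user distribution

for version in sorted(android):
    ratio = fx_beta[version] / android[version]
    print(f"Android {version}: FxBeta share is {ratio:.2f}x the global share")
# Android 2.2: FxBeta share is 1.01x the global share
# Android 2.3: FxBeta share is 0.45x the global share
```

Android 2.2 tracks the global distribution almost exactly, while our 2.3 share is less than half of what the global numbers would predict, which is consistent with the missing-Play-Store theory above.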

Armv6 vs Armv7


            ArmV7     ArmV6    x86
FxBeta      96.19%    0.9%     2.91%
FxRelease   98.61%    1.39%    N/A

It seems that 96-99% of our Firefox users are using an Armv7 device. I don't know if that is growing or shrinking, or if this distribution is the same for every country.

I do know that if we stopped automated testing of the Firefox Armv6 builds on the Tegra board, we would have much better wait times (read below to know what wait times are).

Android 2.2 vs Android 2.3


         Android    FxBeta    FxRelease
2.2      1.6%       1.62%     1.66%
2.3      24.1%      10.82%    14.22%

Another aspect that I want to point out is that less than 2% of our users are on Android 2.2, yet our Tegra mobile testing devices run Android 2.2.
I believe we would gain value by moving our testing infrastructure from Android 2.2 to 2.3.
We recently started running Android 2.3 test jobs inside Android emulators on Amazon's EC2 instances. It is still experimental and in its early stages, but it might be a viable option. It remains to be seen whether we could run performance jobs (aka talos) on them.
We could also have Android 2.3 running on Panda ES boards, but no work has begun on that project that I’m aware of.

Closing remarks

I can't say what we should be testing on; however, I can highlight some of the information that I believe is relevant to push the decision one way or another.

Our current infrastructure load and current distribution of Android versions are not the only factors that need to be considered when determining what our testing pool should be. For example, if we had a country where Firefox for Android was extremely popular and all users were running on Armv6 (or Android 2.2), we would need to take into account whether or not we want to keep running this architecture/version in our test infrastructure, even though the number of users is small on a global scale.

Another example would be if we had partner deals, similar to the recent news about Gigabyte and Kobo bringing Firefox for Android pre-installed. In such a situation, we could have reached a testing coverage agreement, and therefore would have to support our partner’s needs even if their architecture/version choice had a small number of users globally. However, they did not choose Android 2.2 or Armv6.

Recommendations

My recommendations are:
  • Immediately drop automated testing of Armv6 builds on the Tegras
    • This would decrease our current wait times for Tegras
    • This would improve our turnaround for Armv7 testing on Tegras
  • Push in Q1 to have Android 2.3 jobs running side by side with Android 2.2 jobs
    • This would allow us to reduce testing on Tegras if we still have wait times after shutting down Armv6 testing

At the present time, I would not recommend the following:
  • Move Panda testing to 4.1 JB (instead of 4.0 ICS)
    • JB is where most of our users are
    • However, working on the first two goals will involve the same people from the Mobile, A-team and Release Engineering teams
    • I would wait until the end of Q1 to see if the first two goals made it through and whether this idea is even the right one

Definitions

1 - Wait times

Every time a developer commits new code to our repositories, we create various builds for Firefox for Android. These builds are tested on either Tegras or Panda boards (read below).
The time that a job waits before it starts is called its "wait time".
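As a concrete illustration of how this metric can be computed, here is a minimal Python sketch; the job timestamps below are entirely hypothetical, and real data would come from our scheduling database:

```python
from datetime import datetime, timedelta

# Hypothetical (submitted, started) timestamp pairs for three test jobs.
jobs = [
    (datetime(2013, 12, 2, 10, 0), datetime(2013, 12, 2, 10, 5)),   # waited 5 min
    (datetime(2013, 12, 2, 10, 0), datetime(2013, 12, 2, 10, 40)),  # waited 40 min
    (datetime(2013, 12, 2, 11, 0), datetime(2013, 12, 2, 11, 10)),  # waited 10 min
]

THRESHOLD = timedelta(minutes=15)  # the window we measure against

# A job's wait time is simply its start time minus its submission time.
within = sum(1 for submitted, started in jobs if started - submitted <= THRESHOLD)
print(f"{100 * within / len(jobs):.0f}% of jobs started within 15 minutes")
# -> 67% of jobs started within 15 minutes
```

In this made-up sample, two of the three jobs started within the 15-minute window, so we would report 67% against the 95% target mentioned below.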

2 - Automated testing on Tegras

Currently, we can't buy any more Tegras, and we have around 200 left, which are in various states of "good". We have not been able to keep up with our load for a long while; we only start around 50-70% of our test jobs within 15 minutes of them being ready to run (our aim is to start 95% of test jobs within 15 minutes).
We run between 5k and 7k test and performance jobs.

3 - Automated testing on Pandas

We have around 900 of them running Android 4.0.
They meet our wait time demands consistently.
They are properly supported through mozpool, which allows our team to do lots of operations on the boards through various APIs. These APIs also allow us to re-image boards on the fly to have different Android versions.
We run between 3k-5k test and performance jobs.



Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
