Friday, December 03, 2010

The xpcshell case on Windows

If you care about end-to-end times you might want to read this post.

I have enabled debug unit tests on the Windows 7 testing minis and I have started filing the permanent oranges for them.
After I filed one of them, philor pointed out something that caught my attention: debug xpcshell runs on Win7 take more than 100 minutes ( :S ) compared to 30-40 minutes on the IX machines running Win2003. That sounds like a lot!

I decided that if we are going to disable debug unit tests on the builders (Win2003), as we did before for other platforms, we should look closely and see what is going on.
NOTE that at the time we did the switch we did not have easy ways of tracking variations, and the gain was tremendous (more CPU power and end-user OSes) compared to the increase in end-to-end time. We greatly improved the wait times (more CPUs available), but the end-to-end times suffered on some platforms since the minis have lower hardware specifications (on Leopard, for instance, we didn't lose that much).

Now that I am back to moving Windows debug unit tests to the minis (I have been away - kind of - for more than two months) there is something that wasn't available at the time: ssalbiz prepared a report two months ago, built from the information in our schedulerdb, that has averages for our test jobs. This report came out of some good discussions I had with shaver about our setup/teardown times.

I will break the rest of this post into data and conclusions.

NOTE:   I am using data from mozilla-central and for Dec. 2nd, 2010.
NOTE2: I am using averages. I know, it is what I have.
NOTE3: Some of the statements in this post do not apply to mozilla-1.9.1 and mozilla-1.9.2
DATA:   The spreadsheet containing the data and charts used. Please be gentle when drawing conclusions without knowing all the context, which I would love to help you understand.

How many test jobs (perf and unit tests jobs - opt and debug) do we run for mozilla-central? (I am ignoring JP, mozmill-all and mobile)
  • 177
NOTE: We currently run concurrently debug unit tests on Win2003 and Win7. This will change.
In the next couple of months we will also add Windows XP.
  • What is the average for each job? 
 Well, our reports can tell us now.
  • What is the job that takes the longest for each platform? MAX() to the rescue!
If we take the worst average for any test job on each platform and put them in a table, we can see the following.
Table 1 - This shows, for each platform, the worst average for any test job. xpcshell is in orange; Windows platforms are in purple.
We can see that the worst of all test jobs is "debug xpcshell for Win7" with 106.18 minutes on average. Up until now it was "optimized xpcshell Win7x64" with 79.95 minutes (if we had debug on Win7x64 it would probably be even worse).
It is also noticeable in the list of the worst three offenders for each platform that xpcshell only appears for Windows. That leads me to other questions.

Xpcshell on Windows
  • What would the world look like without xpcshell? (or with a shorter run of it)
Table 2 - Let's not count xpcshell at all for Windows.
If we remove xpcshell for Windows from our calculations we can see that there is a new worst offender for each platform combination. For instance, for Windows 7 debug jobs, "mochitest-other" becomes the test job that takes the longest. Instead of waiting 106.18 minutes for complete Windows debug coverage we would only have to wait less than an hour; this is a decrease of 45%, which is not bad!!
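The rough arithmetic behind that 45% figure can be sketched as follows. The 106.18 value comes from the report; the ~58.4-minute mochitest-other average is my assumption read off Table 2, not an exact number quoted in this post:

```python
# Hedged sketch: how much Win7 debug coverage time improves if xpcshell
# is excluded. 106.18 min is the reported debug xpcshell average; 58.4 min
# is an ASSUMED average for mochitest-other (the new worst offender).
with_xpcshell = 106.18     # minutes, worst Win7 debug job: xpcshell
without_xpcshell = 58.4    # minutes, assumed worst remaining job

decrease = (with_xpcshell - without_xpcshell) / with_xpcshell * 100
print(f"decrease: {decrease:.0f}%")  # roughly 45%
```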

Let's look now at what the worst time across all test jobs for any platform would look like with and without xpcshell being considered.
Table 3 - Test coverages completion for all platforms
Currently, we wait close to 80 minutes to have complete coverage for Windows (well, kind of, as we don't really pay much attention to Win7x64 - yet).
If Win7 debug unit tests replace the Win2003 debug unit tests we would have to wait 32.81% longer to have complete Windows coverage. That is not good!
In the last two rows of the previous table you can see that if xpcshell were ignored, the new worst offender would be Fedora mochitest-4 debug, and developers would wait close to 10% less (not really, as build times are in Linux's favor) regardless of where we run the Windows unit tests (IX/VMs vs. minis). This means that Linux, not Windows, would be in the way of full platform coverage (not really, as the worst build times are on Windows).
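That 32.81% figure falls out of the two worst averages quoted earlier in this post; a quick check:

```python
# Quick check of the 32.81% regression: moving Windows debug unit tests
# from Win2003 to the Win7 minis changes the overall worst test job from
# optimized xpcshell on Win7x64 (79.95 min) to debug xpcshell on Win7
# (106.18 min). Both numbers are from the report discussed above.
old_worst = 79.95   # minutes, optimized xpcshell Win7x64
new_worst = 106.18  # minutes, debug xpcshell Win7

increase = (new_worst / old_worst - 1) * 100
print(f"increase: {increase:.2f}%")  # matches the 32.81% quoted above
```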

Xpcshell on all platforms
  • How horribly does Windows compare to other platforms when running xpcshell?
Quite bad.
Let's look at the following chart:
You can easily see that every platform besides Windows ( BLUE ) takes less than 30 minutes. Debug unit tests on the IX machines take around 40 minutes, while on the minis they can take up to 100 minutes.
Something makes running xpcshell very, very slow on Windows.
No other suite on Windows is as dramatically bad, as we can see from the gains when xpcshell is left out (a 45% gain in debug unit test completion for Win7).
Ehsan suggested that I determine whether xpcshell is this slow because of I/O by looking at the CPU usage information that Windows provides.
Whether that is the reason or not, someone needs to look at how to improve it, either by fixing some underlying code or by breaking xpcshell on Windows into two or three pieces so that it finishes in the same range as the other test suites.
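One minimal way to follow Ehsan's suggestion would be to sample CPU and disk counters while the suite runs, using the stock `typeperf` tool that ships with Windows. This is only an illustration; the counter names, interval, and sample count here are my choices, not something we actually ran:

```shell
:: Hypothetical sketch: sample overall CPU and disk activity once per
:: second for the duration of an xpcshell run, writing samples to a CSV.
:: If CPU stays low while disk time is high, the suite is likely I/O-bound.
typeperf "\Processor(_Total)\% Processor Time" ^
         "\PhysicalDisk(_Total)\% Disk Time" ^
         -si 1 -sc 6000 -o xpcshell-cpu.csv
```

Plotting the resulting CSV next to the test log timestamps would show whether the slow stretches line up with disk activity rather than computation.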

As you might have noticed, this post only considers the times for complete coverage as if all platforms finished their builds at the same time. This is not our reality, as each platform takes a different time to finish a build (Windows debug is dramatically slow). That is worth another blog post and will be the basis for improving the left side of the equation (the build times) rather than the right side (the test times).

This post is meant to be informative and to help us see that we can improve the infrastructure even more. After we finish off-loading unit test jobs from the builders onto the minis, we have to think about improving the suites, where else we could run the test jobs, how we can make our builds even faster, and more.

We face these new problems because we have stretched our infrastructure, and to solve them we will have to reconsider many assumptions and keep adding tools that allow us to make better decisions.

Please don't expect detailed blog posts like this one very often, as they are quite time consuming. I should finish my goals first!

Questions welcome.

[1] Spreadsheet and charts


1 comment:

  1. Note that the reporting system has been built on top of the schedulerDB information that catlee, nthomas and anamarias had been working on before ssalbiz. In later quarters some of this information will be made available to interested parties to help them understand our systems.