Friday, December 03, 2010

The xpcshell case on Windows

If you care about end to end times you might want to read this post.

I have enabled debug unit tests on the Windows 7 testing minis and I started filing the permanent oranges for them.
After I filed one of them, philor pointed out something that caught my attention. Debug xpcshell runs on Win7 take more than 100 minutes ( :S ) compared to 30-40 minutes on the IX machines running Win2003. That sounds like a lot!

I decided that if we are going to disable debug unit tests on the builders (Win2003), as we did before for other platforms, we should look closely and see what is going on.
NOTE that at the time we did the switch we did not have easy ways of tracking variations, and the gain was tremendous (more CPU power and end-user OSes) compared to the increase in end to end time. We greatly improved the wait times (more CPU capacity available), but the end to end times were affected on some platforms since the minis have lower hardware specifications (for instance, on leopard we didn't lose that much).

Now that I am back to moving Windows debug unit tests to the minis (I have been away - kind of - for more than 2 months), there is something we did not have back then: two months ago ssalbiz prepared a report from the information in our schedulerdb that has averages for our test jobs. This report was a reaction to some good discussions I had with shaver about our set up/tear down times.

I will break the rest of this post into data and conclusions.

NOTE:   I am using data from mozilla-central and for Dec. 2nd, 2010.
NOTE2: I am using averages. I know, it is what I have.
NOTE3: Some of the statements on this post do not apply to mozilla-1.9.1 and mozilla-1.9.2
DATA:   The spreadsheet [1] contains the data and charts used. Please go easy on drawing conclusions without knowing the full context, which I would love to help you understand.

Everything
How many test jobs (perf and unit tests jobs - opt and debug) do we run for mozilla-central? (I am ignoring JP, mozmill-all and mobile)
  • 177
NOTE: We currently run debug unit tests concurrently on Win2003 and Win7. This will change.
In the next couple of months we will also add Windows XP.
  • What is the average for each job? 
 Well, our reports can tell us now.
  • What is the job that takes the longest for each platform? MAX() to the rescue!
If we take the worst average of any test job for each platform and put them in a table, we can see the following.
Table 1 - This shows for each platform the worst average for any test job. In orange xpcshell. In purple Windows platforms.
We can see that the worst of all test jobs is "debug xpcshell for Win7" with 106.18 minutes on average. Up until now it was "optimized xpcshell Win7x64" with 79.95 minutes (if we had debug on Win7x64 it would probably be even worse).
It is also noticeable in the list of the three worst offenders for each platform that xpcshell only appears for Windows. That leads me to other questions.
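The MAX() step described above can be sketched in Python. Only the two xpcshell averages (106.18 and 79.95 minutes) come from this post; the other rows, the platform labels and the function name are placeholders made up for illustration and are not taken from the spreadsheet.

```python
# Sketch: pick the slowest test job (by average duration) for each platform.
# Only the two xpcshell numbers are from the post; every other row is a
# made-up placeholder so the example has something to compare against.
job_averages = [
    # (platform, job, average minutes)
    ("win7-debug", "xpcshell", 106.18),        # from the post
    ("win7-debug", "mochitest-other", 50.0),   # placeholder value
    ("win7x64-opt", "xpcshell", 79.95),        # from the post
    ("win7x64-opt", "mochitest-1", 30.0),      # placeholder value
]


def worst_job_per_platform(rows):
    """Return {platform: (job, minutes)} for the slowest job per platform."""
    worst = {}
    for platform, job, minutes in rows:
        if platform not in worst or minutes > worst[platform][1]:
            worst[platform] = (job, minutes)
    return worst


worst = worst_job_per_platform(job_averages)
for platform, (job, minutes) in sorted(worst.items()):
    print(f"{platform}: {job} ({minutes:.2f} min)")
```

Dropping the xpcshell rows from the input and running the same function again is one way to get the "without xpcshell" view.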

Xpcshell on Windows
  • What would the world look like without xpcshell? (or with a shorter run of it)
Table 2 - Let's not count xpcshell at all for Windows.
If we remove xpcshell for Windows from our calculations we can see that there is a new worst offender for each platform combination. For instance, for Windows 7 debug jobs, "mochitest-other" would be the test job that takes the longest. Instead of waiting 106.18 mins for complete Windows debug coverage we would wait less than an hour; this is a decrease of 45%, which is not bad!!

Let's now look at what the worst time for all test jobs on any platform would look like with and without xpcshell being considered.
Table 3 - Test coverage completion for all platforms
Currently, we wait close to 80 minutes to have complete coverage for Windows (well, kind of, as we don't really pay too much attention to Win7x64 - yet).
If Win7 debug unit tests replace the Win2003 debug unit tests we would have to wait 32.81% longer to have complete Windows coverage. That is not good!
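The 32.81% figure appears to follow from the two averages quoted earlier in this post, assuming that "close to 80 minutes" refers to the 79.95 minutes of optimized xpcshell on Win7x64. A small sketch of the arithmetic (the function name is just for illustration):

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


current_worst = 79.95   # optimized xpcshell on Win7x64, minutes
win7_debug = 106.18     # debug xpcshell on Win7, minutes

extra = percent_increase(current_worst, win7_debug)
print(f"{extra:.2f}% longer")  # prints "32.81% longer"
```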
In the last two rows of the previous table you can see that if xpcshell were ignored, the new worst offender would be Fedora mochitest-4 debug, and developers would wait close to 10% less (not really, as build times are in favor of Linux) regardless of where we run Windows unit tests (IX/VMs vs minis). This means that Linux, not Windows, would then be what holds up full platform coverage (again not really, as the worst build times are for Windows).

Xpcshell on all platforms
  • How horribly does Windows compare to other platforms when running xpcshell?
Quite bad.
Let's look at the following chart:
You can easily see that every platform other than Windows ( BLUE ) takes less than 30 minutes. Debug unit tests take around 40 minutes on the IX machines, while on the minis they can take up to 100 minutes.
Something makes running xpcshell very, very slow on Windows.
The other suites on Windows are not nearly as bad, as we can see from the gains when xpcshell is not considered (45% gain on debug unit tests completion for Win7).
Ehsan suggested that I determine whether xpcshell is this slow because of I/O by looking at the CPU usage that Windows reports.
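The idea behind that suggestion can be illustrated with a small sketch: if a job spends far less CPU time than wall-clock time, it is mostly waiting on something (disk I/O is one candidate). The code below only measures the current Python process with two stand-in workloads; it does not measure xpcshell, whose child processes would have to be observed with the OS's own counters. All names here are made up for the example.

```python
import time


def cpu_to_wall_ratio(work):
    """Run work() and return CPU seconds divided by wall-clock seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def waiting_workload():
    # Stand-in for a job that mostly waits (on disk, for example).
    time.sleep(0.2)


def busy_workload():
    # Stand-in for a CPU-bound job.
    total = 0
    for i in range(2_000_000):
        total += i * i


print("waiting:", round(cpu_to_wall_ratio(waiting_workload), 2))
print("busy:   ", round(cpu_to_wall_ratio(busy_workload), 2))
```

A ratio near zero points at waiting; a ratio near one points at the CPU being the limit.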
Whether or not that is the reason, someone needs to look at how to improve it, either by fixing some underlying code or by breaking xpcshell for Windows into two or three pieces so that it finishes in the same range as other test suites.
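As a rough illustration of the "two or three pieces" idea, here is a sketch that splits a list of test directories into consecutive chunks of near-equal size. The directory names and job labels are hypothetical, and this is not how the harness is known to work; a real split would more likely balance by run time than by count.

```python
def split_into_chunks(items, chunk_count):
    """Split items into chunk_count consecutive chunks of near-equal size."""
    base, extra = divmod(len(items), chunk_count)
    chunks = []
    start = 0
    for index in range(chunk_count):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical test directories, only for the example.
test_dirs = [f"tests/dir{n}" for n in range(1, 8)]
chunks = split_into_chunks(test_dirs, 3)
for number, chunk in enumerate(chunks, start=1):
    print(f"xpcshell-{number}: {chunk}")
```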

As you might have noticed, this post only considers the times for complete coverage assuming all platforms finish their builds at the same time. This is not our reality, as each platform takes a different time to finish a build (Windows debug is dramatically slow). This is worth another blog post and will be the basis for improving the left side of the equation (the build times) rather than the right side (the test times).

This post is meant to be informative and to help us discover that we can improve the infrastructure even more. After we finish off-loading unit test jobs from the builders onto the minis, we have to think about improving the suites, where else we could run the test jobs, how we can make our builds even faster, and more.

We face these new problems because we have stretched our infrastructure; to solve them we will have to reconsider many assumptions and keep adding tools that allow us to make better decisions.

Please don't expect me to do such detailed blog posts as they are quite time consuming. I should finish my goals first!

Questions welcome.

[1] Spreadsheet and charts


Creative Commons License

Wednesday, December 01, 2010

Upcoming switch of DEBUG unit tests from Win2003 build machines to Win7 test machines

We have enabled debug unit tests for Windows 7 Rev3 machines.
We are now running debug unit tests on both Win2003 builders and Win7 test machines (except 1.9.1 and 1.9.2).
You can see that we have double coverage for Windows debug unit tests.
You should start seeing them on tbpl (except crashtest, reftests and xpcshell). If not, check the tinderbox page and make sure that the builder is "active" and that "scrape" is checked as well.

We have enabled this for all branches (except 1.9.1 and 1.9.2) and we will soon adjust the try_chooser parsing to run them on the tryserver as well (it's a bug).

When will we do the switch?
  • when we have no perma-orange
  • when all test suites are shown on tbpl
  • when all working branches are showing them on tbpl
  • when we get approval to do so
If you have any questions, ask them on this blog post.
If you notice any fallout, comment on bug 614956.

NOTE: This will slightly impact our wait times on the Win7 testing pool but reduce the wait times for build jobs.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

How to set up buildAPI locally

I wanted to give a hand to one of our Seneca students (Andrew Singh) who is contributing to release engineering. He is working on creating a report for buildapi.
To do so I had to set up buildapi locally and I was lucky enough to be the guinea pig for Mac OS X ;)

Big thanks to ssalbiz and catlee for helping me debug this.

I followed the instructions that ssalbiz wrote at https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Getting_Started:
  • Download MySQL Community Server (Current Generally Available Release: 5.1.53)
  • Add mysql to your path:
export PATH=/usr/local/mysql/bin:$PATH
echo 'export PATH=/usr/local/mysql/bin:$PATH' >> ~/.bash_profile
  • Create a new user to use with buildapi. I read this documentation.
mysql --user=root mysql

mysql> CREATE USER 'monty'@'localhost' IDENTIFIED BY 'some_pass';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'monty'@'localhost' WITH GRANT OPTION;
  • Create the databases and import them.
mysql -u monty -p
mysql> create database schedulerdb; create database statusdb;
mysql> exit
mysql -u monty -p schedulerdb < schedulerdb.sql
mysql -u monty -p statusdb < statusdb.sql
  • Setup buildapi:
hg clone http://hg.mozilla.org/build/buildapi
cd buildapi; sudo python setup.py install; cd ..

mkdir dist
cd dist; wget http://google-visualization-python.googlecode.com/files/gviz_api_py-1.7.0.tar.gz
sudo easy_install gviz_api_py-1.7.0.tar.gz
cd ..

paster make-config buildapi config.ini

# edit your config.ini with the right information
# sqlalchemy.scheduler_db.url = mysql://monty:some_pass@localhost/schedulerdb
# sqlalchemy.status_db.url = mysql://monty:some_pass@localhost/statusdb
  • Start buildapi
paster serve --reload --daemon config.ini
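Before starting the server it can help to confirm that both database URLs actually made it into config.ini. The sketch below does that with the standard library only. It assumes the settings live in an `[app:main]` section, which is what Paste-generated configs usually use but is not confirmed by this post; the function name is made up for the example, and the code targets a current Python 3 rather than the 2.6 used here.

```python
import configparser

EXPECTED_KEYS = (
    "sqlalchemy.scheduler_db.url",
    "sqlalchemy.status_db.url",
)


def missing_db_urls(config_text, section="app:main"):
    """Return the expected keys that are absent or empty in the section."""
    # interpolation=None so "%" characters in the ini file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(EXPECTED_KEYS)
    return [key for key in EXPECTED_KEYS
            if not parser.get(section, key, fallback="").strip()]


# Sample text mirroring the two lines from the instructions above.
sample = """
[app:main]
sqlalchemy.scheduler_db.url = mysql://monty:some_pass@localhost/schedulerdb
sqlalchemy.status_db.url = mysql://monty:some_pass@localhost/statusdb
"""
print(missing_db_urls(sample))  # prints []
```

To check a real file, read its contents with `open("config.ini").read()` and pass the text to the same function.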

NOTE: I believe that if the following is respected you should not hit all the problems that I hit in the next section:
  • Install the 64-bit version of MySQL
  • Install buildapi with "python setup.py install" instead of "easy_install buildapi"

PROBLEMS

I tried to load http://localhost:5000 but it didn't work.

I then tried it without --daemon and got this:
armenzg-laptop $ paster serve --reload config.ini
Starting subprocess with file monitor
Traceback (most recent call last):
File "/usr/local/bin/paster", line 8, in <module>
load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')()
File "/Library/Python/2.6/site-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 84, in run
invoke(command, command_name, options, args[1:])
File "/Library/Python/2.6/site-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 123, in invoke
exit_code = runner.run(args)
File "/Library/Python/2.6/site-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 218, in run
result = self.command()
File "/Library/Python/2.6/site-packages/PasteScript-1.7.3-py2.6.egg/paste/script/serve.py", line 276, in command
relative_to=base, global_conf=vars)
File "/Library/Python/2.6/site-packages/PasteScript-1.7.3-py2.6.egg/paste/script/serve.py", line 313, in loadapp
**kw)
File "/Library/Python/2.6/site-packages/PasteDeploy-1.3.3-py2.6.egg/paste/deploy/loadwsgi.py", line 204, in loadapp
return loadobj(APP, uri, name=name, **kw)
File "/Library/Python/2.6/site-packages/PasteDeploy-1.3.3-py2.6.egg/paste/deploy/loadwsgi.py", line 225, in loadobj
return context.create()
File "/Library/Python/2.6/site-packages/PasteDeploy-1.3.3-py2.6.egg/paste/deploy/loadwsgi.py", line 625, in create
return self.object_type.invoke(self)
File "/Library/Python/2.6/site-packages/PasteDeploy-1.3.3-py2.6.egg/paste/deploy/loadwsgi.py", line 110, in invoke
return fix_call(context.object, context.global_conf, **context.local_conf)
File "/Library/Python/2.6/site-packages/PasteDeploy-1.3.3-py2.6.egg/paste/deploy/util/fixtypeerror.py", line 57, in fix_call
val = callable(*args, **kw)
File "/Library/Python/2.6/site-packages/buildapi-0.1dev-py2.6.egg/buildapi/config/middleware.py", line 37, in make_app
config = load_environment(global_conf, app_conf)
File "/Library/Python/2.6/site-packages/buildapi-0.1dev-py2.6.egg/buildapi/config/environment.py", line 48, in load_environment
scheduler_engine = engine_from_config(config, 'sqlalchemy.scheduler_db.')
File "/Library/Python/2.6/site-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/engine/__init__.py", line 272, in engine_from_config
return create_engine(url, **opts)
File "/Library/Python/2.6/site-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/engine/__init__.py", line 254, in create_engine
return strategy.create(*args, **kwargs)
File "/Library/Python/2.6/site-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/engine/strategies.py", line 60, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/Library/Python/2.6/site-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/dialects/mysql/mysqldb.py", line 101, in dbapi
return __import__('MySQLdb')
ImportError: No module named MySQLdb
I will now list the SOLUTION and after that the other route I took first, which led me nowhere.

For reference, this problem is hit in many different ways by many different people.

SOLUTION

rm -rf /Library/Python/2.6/site-packages/MySQL_python-1.2.3-py2.6-macosx-10.6-universal.egg
easy_install mysql-python
  • This should work now:
python -c 'import MySQLdb'
  • I think it should work now
paster serve config.ini
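The `python -c 'import MySQLdb'` check can also be done from a script that reports the reason for a failure instead of a bare traceback. This is a minimal sketch with a made-up helper name, written for a current Python 3 rather than the 2.6 used in this post:

```python
import importlib


def import_problem(module_name):
    """Return None if the module imports cleanly, else the error message."""
    try:
        importlib.import_module(module_name)
    except ImportError as error:
        return str(error)
    return None


problem = import_problem("MySQLdb")
if problem is None:
    print("MySQLdb is importable")
else:
    print(f"MySQLdb failed to import: {problem}")
```

Actually importing the module matters here, since a module can be present on disk and still fail to load, as the "wrong architecture" error in the failed attempt shows.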

FAILED ATTEMPT

NOTE: I am just typing this failed attempt for the record and to maybe bring some frustrated people with the same problem to "a solution".

Resuming from ImportError: No module named MySQLdb.

It seems that I am missing the MySQLdb for python:
sudo easy_install mysql-python
which errors on me:
EnvironmentError: mysql_config not found
The problem is that mysql_config in the package's site.cfg file (download the source package if you want to see that file) points to mysql_config = /usr/local/bin/mysql_config
You can fix this in several ways:
  1. Add a symlink "ln -s /usr/local/mysql/bin/mysql_config /usr/local/bin/mysql_config"
  2. Add /usr/local/mysql/bin to your PATH
  3. Download the mysql-python package and modify site.cfg
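A quick way to see whether any of these fixes took effect is to look for mysql_config the same way the shell would. The sketch below first checks PATH and then falls back to the /usr/local/mysql/bin directory mentioned above; the function name and the fallback list are choices made for this example.

```python
import os
import shutil


def find_mysql_config(extra_dirs=("/usr/local/mysql/bin",)):
    """Return the path to mysql_config, or None if it cannot be found."""
    # First look on PATH, like the shell does.
    found = shutil.which("mysql_config")
    if found:
        return found
    # Then try known install locations that may not be on PATH yet.
    for directory in extra_dirs:
        candidate = os.path.join(directory, "mysql_config")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


location = find_mysql_config()
print(location or "mysql_config not found; try one of the fixes above")
```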

I thought that installing mysql-python would fix things but, as you can see, I can't even do this in python:
armenzg-laptop $ python -c 'import MySQLdb'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "build/bdist.macosx-10.6-universal/egg/MySQLdb/__init__.py", line 19, in <module>
  File "build/bdist.macosx-10.6-universal/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.6-universal/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/armenzg/.python-eggs/MySQL_python-1.2.3-py2.6-macosx-10.6-universal.egg-tmp/_mysql.so, 2): no suitable image found.  Did find:
        /Users/armenzg/.python-eggs/MySQL_python-1.2.3-py2.6-macosx-10.6-universal.egg-tmp/_mysql.so: mach-o, but wrong architecture
For the record I also tried this:
pip install -I mysql-python

OTHER

Something that I found to be interesting/nit:
armenzg-laptop $ file $(which python)
/usr/bin/python: Mach-O universal binary with 3 architectures
/usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/python (for architecture i386): Mach-O executable i386
/usr/bin/python (for architecture ppc7400): Mach-O executable ppc
armenzg-laptop $ file $(which mysql)
/usr/local/mysql/bin/mysql: Mach-O executable i386
After I installed MySQL 64-bit:
armenzg-laptop $ file $(which mysql)
/usr/local/mysql/bin/mysql: Mach-O 64-bit executable x86_64
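Since the system python is a universal binary, `file` does not say which architecture is actually in use at run time. A small sketch that asks the running interpreter itself, which is the architecture a compiled extension such as _mysql.so has to match:

```python
import platform
import struct

# Size of a C pointer in the running interpreter, in bits (32 or 64).
pointer_bits = struct.calcsize("P") * 8

print(f"Python is running as a {pointer_bits}-bit process")
print("machine:", platform.machine())
```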


