Armen Zambrano's battlefield: April 2016

Friday, April 22, 2016

The Joy of Automation

This post announces The Joy of Automation YouTube channel. On this channel you can watch presentations about automation work by Mozilla's Platform Operations. I hope others besides me will want to share their videos there.

This follows the idea that mconley started with The Joy of Coding and his livehacks.
At the moment there are only "Unscripted" videos of me hacking away. I hope to do live hacks one day, but for now they're offline videos.

Mistakes I made, in case any Platform Ops members who want to contribute would like to avoid them:

Lower the volume of the background music
Find a source of music without ads and whose music does not get the video blocked in certain countries (e.g. Germany)
Do not record in .flv format since most video editing software does not handle it
Add an intro screen so you don't see me hiding OBS
Have multiple bugs to work on in case you get stuck on the first one

This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Sunday, April 17, 2016

Project definition: Give Treeherder the ability to schedule TaskCluster jobs

This is a project definition that I put up for GSoC 2016. This helps students to get started researching the project.

The main things I give in here are:

Background

Where we came from, where we are and where we are heading

Goal

Use case for developers

Breakdown of components

Rather than all aspects being mixed and not logically separate

NOTE: This project has a few parts that carry risk and could change the implementation. It depends on close collaboration with dustin.

-----------------------------------

Mentor: armenzg

IRC: #ateam channel

Give Treeherder the ability to schedule TaskCluster jobs

This work will enable "adding new jobs" on Treeherder to work for pushes that are missing jobs from TaskCluster (our new continuous integration system).

Read this blog post to learn how the project was built for Buildbot jobs (our old continuous integration system).

The main work for this project is tracked in bug 1254325.

In order for this to work we need the following pieces:

A - Generate data source with all possible tasks

Bug 1232005 - Let the gecko decision task generate a file with all the possible TaskCluster jobs that could have been scheduled for a given push

We will need to post this artifact into the TC index so we can fetch it
We will probably need to create a "latest" alias

RISK: The structure of graphs is going to change; the artifact will change in format.

Alternative mentor for this section: garndt

B - Teach Treeherder to use the artifact

Fetch that artifact for every tree and update the "runnable api"

RISK: The structure of graphs is going to change; the artifact will change in format.

This will require close collaboration with Treeherder engineers
This work can be done locally with a Treeherder instance
It can also be deployed to the “staging” version of Treeherder to do tests
Alternative mentor for this section: camd

C - Teach pulse_actions to listen for requests from Treeherder

pulse_actions is a pulse listener of Treeherder actions
You can see pulse_actions’ workflow in here
Once part B is completed, we will be able to listen for messages requesting certain TaskCluster tasks to be scheduled and we will schedule those tasks on behalf of the user
RISK: Depending on whether the TaskCluster actions project is completed on time, we might instead make POST requests to an API
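
As a rough illustration of part C, the sketch below shows how a listener might handle a request once the artifact from part A is available. It is not pulse_actions code: the message shape (`requested_tasks`), the `all_tasks` mapping standing in for the artifact, the task labels and the `schedule` callback are all assumptions made for the example.

```python
def handle_request(message, all_tasks, schedule):
    """Schedule the tasks named in a request on behalf of the user.

    message   -- hypothetical request payload: {"requested_tasks": [labels]}
    all_tasks -- hypothetical stand-in for the "all possible tasks" artifact,
                 mapping a task label to its definition
    schedule  -- callable that would submit one task (e.g. to TaskCluster)
    """
    scheduled, unknown = [], []
    for label in message["requested_tasks"]:
        if label in all_tasks:
            schedule(label, all_tasks[label])
            scheduled.append(label)
        else:
            # This task was never an option for the push; report it
            # instead of guessing what the user meant.
            unknown.append(label)
    return scheduled, unknown


# Example run with made-up task labels and a callback that only records calls.
submitted = []
all_tasks = {
    "linux64-build": {"kind": "build"},
    "linux64-xpcshell": {"kind": "test"},
}
message = {"requested_tasks": ["linux64-xpcshell", "does-not-exist"]}
result = handle_request(
    message, all_tasks, lambda label, task: submitted.append(label)
)
print(result)
```

Passing `schedule` in as a callback keeps the sketch independent of the RISK above: the same handler would work whether scheduling ends up going through pulse or through POST requests to an API.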

Project definition: SETA re-write

As an attempt to attract candidates to GSoC, I wanted to make sure that the possible projects were achievable rather than leading students down a path of pain and struggle. It also helps me picture the order in which it makes the most sense to accomplish the work.

It was also a good exercise for students: it gave them plenty to read about the project and made them ask questions about whatever was not clear.

I want to share this and another project definition in case it is useful for others.

----------------------------------

We want to rewrite SETA to be easy to deploy through Heroku and to support TaskCluster (our new continuous integration system) [0].

Please read this document carefully before starting to ask questions. There is high interest in this project and it is burdensome to have to re-explain it to every new prospective student.

Main mentor: armenzg (#ateam)

Co-mentor: jmaher (#ateam)

Please read jmaher’s blog post carefully [1] before reading any further.

Now that you have read jmaher’s blog post, I will briefly go into some specifics.

SETA reduces the number of jobs that get scheduled on a developer’s push.

A job is every single letter you see on Treeherder. For every developer’s push a number of these jobs are scheduled.

On every push, Buildbot [6] decides what to schedule depending on the data that it fetched from SETA [7].

The purpose of this project is two-fold:

Write SETA as an independent project that is:

maintainable
more reliable
automatically deployed through Heroku app

Support TaskCluster, our new CI (continuous integration) system

NOTE: The current code of SETA [2] lives within a repository called ouija.

Ouija does the following for SETA:

It has a cronjob which kicks in every 12 hours to scrape information about jobs from every push
It stores the information about jobs (which it grabs from Treeherder) in a database

SETA then queries the database to determine which jobs should be scheduled. SETA chooses jobs that are good at reporting issues introduced by developers. SETA has its own set of tables and adds the data there for quick reference.

Involved pieces for this project:

Get familiar with deploying apps and using databases in Heroku
Host SETA in Heroku instead of http://alertmanager.allizom.org/seta.html

https://bugzilla.mozilla.org/show_bug.cgi?id=1253020

Teach SETA about TaskCluster

https://bugzilla.mozilla.org/show_bug.cgi?id=1243123

Change the gecko decision task to reliably use SETA [5][6]

If the SETA service is not available we should fall back to running all tasks/jobs

Document how SETA works and auto-deployments of docs and Heroku

Write automatically generated documentation
Add auto-deployments to Heroku and readthedocs

Add tests for SETA

Add tox/travis support for tests and flake8

Re-write SETA using ActiveData [3] instead of using data collected by Ouija

https://bugzilla.mozilla.org/show_bug.cgi?id=1253028

Make the current CI (Buildbot) use the new SETA Heroku service

https://bugzilla.mozilla.org/show_bug.cgi?id=1252568

Create SETA data with per-test information instead of per-job information (stretch goal)

On Treeherder we have jobs that contain tests
Tests re-order between those different chunks
We want to run jobs at a per-directory level or per-manifest

Add priorities into SETA data (stretch goal)

Priority 1 gets triggered every time
Priority 2 gets triggered on every Yth push
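
Two of the pieces above, the fallback when SETA is unavailable and the priorities stretch goal, can be sketched together. This is only an illustration under stated assumptions: `fetch_priorities` is a hypothetical stand-in for a call to the SETA service, the `{job: priority}` data shape is invented for the example, and "on Y push" is read here as "every Yth push" (`every_n`).

```python
def select_jobs(all_jobs, fetch_priorities, push_count, every_n=5):
    """Return the jobs to schedule for one push.

    all_jobs         -- every job that could run on the push
    fetch_priorities -- hypothetical callable returning {job: priority};
                        it raises if the SETA service cannot be reached
    push_count       -- running count of pushes, used for priority 2 jobs
    every_n          -- assumed meaning of "Y": run priority 2 every Nth push
    """
    try:
        priorities = fetch_priorities()
    except Exception:
        # SETA is not available: fall back to running all jobs.
        return list(all_jobs)

    selected = []
    for job in all_jobs:
        # Jobs that SETA knows nothing about are treated as priority 1.
        priority = priorities.get(job, 1)
        if priority == 1 or push_count % every_n == 0:
            selected.append(job)
    return selected


def seta_is_down():
    raise IOError("SETA service unreachable")


jobs = ["build", "mochitest-1", "reftest-3"]
data = {"build": 1, "mochitest-1": 1, "reftest-3": 2}

print(select_jobs(jobs, lambda: data, push_count=3))
print(select_jobs(jobs, lambda: data, push_count=10))
print(select_jobs(jobs, seta_is_down, push_count=3))
```

Catching every exception is deliberately broad in this sketch, since scheduling too many jobs is the safe failure mode.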

[0] http://docs.taskcluster.net/

[1] https://elvis314.wordpress.com/tag/seta/

[2] https://github.com/dminor/ouija/blob/master/tools/seta.py

[3] http://activedata.allizom.org/tools/query.html

[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1243123

[5] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=gecko

[6] testing/taskcluster/mach_commands.py#l280

[7] http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla-tests/config_seta.py

[8] http://alertmanager.allizom.org/seta.html

Wednesday, April 13, 2016

Improving how I write Python tests

The main focus of this post is what I've learned about writing Python tests, using mocks and patching functions properly. This is not an exhaustive post.

What I'm writing now is something I should have learned many years ago as a Python developer. It can be embarrassing to admit; however, I decided to share this with you since I know it would have helped me earlier in my career and I hope it might help you as well.

Somebody has probably written about this topic already; if you're aware of a good blog post covering it, please let me know. I would like to see what else I've missed.

Also, if you want to start a Python project from scratch or to improve your current one, I suggest you read "Open Sourcing a Python Project the Right Way". Many of the things he mentions are what I follow for mozci.

This post might also be useful for new contributors trying to write tests for your project.

My takeaway

These are some of the things I've learned

Make running tests easy

We use tox to help us create a Python virtual environment, install the dependencies for the project and execute the tests
Here's the tox.ini I use for mozci

If you use py.test learn how to not capture the output

Use the -s flag to not capture the output
If your project does not print but uses logging instead, add the pytest-capturelog plugin to py.test and it will immediately log for you

If you use py.test learn how to jump into the debugger upon failures

Use --pdb to drop into the Python debugger upon failure
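
To make the two flags above concrete, here is a minimal sketch of a test file. The function and the file name `test_example.py` are made up for the illustration; only the `-s` and `--pdb` flags come from py.test itself.

```python
import logging

LOG = logging.getLogger(__name__)


def normalize(revision):
    """Toy function under test: shorten a revision to 12 characters."""
    print("normalizing %s" % revision)  # captured by default, shown with -s
    LOG.info("normalizing %s", revision)
    return revision[:12]


def test_normalize():
    assert normalize("0123456789abcdef") == "0123456789ab"


# Run it from a shell:
#   py.test -s test_example.py      # do not capture the output
#   py.test --pdb test_example.py   # open the debugger where a test fails
```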

Learn how to use @patch and Mock properly

The theory of how to mock is explained very well in "Using Mocks in Python"
Learning where to @patch is golden. Read "Where to patch"
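
The "where to patch" rule is easier to see in code: patch the name in the module that looks it up, not in the module that defines it. The sketch below builds two tiny in-memory modules (`demo_network` and `demo_client`, both hypothetical names) so that it fits in one file. It uses `unittest.mock` from the Python 3 standard library; on Python 2 the same API comes from the separate `mock` package.

```python
import sys
import types
from unittest.mock import patch

# demo_network defines fetch(); the real thing would hit the network.
demo_network = types.ModuleType("demo_network")
exec(
    "def fetch(url):\n"
    "    raise RuntimeError('tests must not hit the network')\n",
    demo_network.__dict__,
)
sys.modules["demo_network"] = demo_network

# demo_client does "from demo_network import fetch", so it holds its own
# reference to fetch under the name demo_client.fetch.
demo_client = types.ModuleType("demo_client")
exec(
    "from demo_network import fetch\n"
    "\n"
    "def get_status(url):\n"
    "    return fetch(url)['status']\n",
    demo_client.__dict__,
)
sys.modules["demo_client"] = demo_client


def patch_where_defined():
    # Wrong target: demo_client still points at the original fetch.
    with patch("demo_network.fetch", return_value={"status": "ok"}):
        try:
            return demo_client.get_status("http://example.invalid/")
        except RuntimeError:
            return "real fetch was called"


def patch_where_used():
    # Right target: replace the name that get_status() actually looks up.
    with patch("demo_client.fetch", return_value={"status": "ok"}):
        return demo_client.get_status("http://example.invalid/")


print(patch_where_defined())
print(patch_where_used())
```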

How I write tests

This is what I do:

If no tests exist for a module, create the file for it

If you're testing module.py create a test file called test_module.py

If you already have tests but want to add coverage to a function, determine the minimal py.test call that runs only that test or set of tests

You don't want to run all tests when you're developing
You can read about it in "Specifying tests/selecting tests".
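
As a small illustration of selecting tests, the sketch below puts two toy functions and their tests in one file. In a real project the functions would live in module.py and the tests in test_module.py; the function names are made up for the example.

```python
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def test_add():
    assert add(2, 3) == 5


def test_subtract():
    assert subtract(5, 3) == 2


# Run only what you are working on (from a shell):
#   py.test test_module.py             # every test in one file
#   py.test test_module.py::test_add   # a single test by its node ID
#   py.test -k subtract                # tests whose name matches "subtract"
```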

@patch properly and use Mocks

What I'm doing now to patch modules is the following:

What function are you testing? (aka test subject)

Have a look at the function you're adding tests for and list which functions it calls (aka test resources)

Which of those test resources do you need to patch?

To patch the test resources I use @patch and change the return_value. You can see an example in test_buildbot_bridge.py. I use two different styles of patching, if you're interested
I normally patch test resources that hit the network (to get a controlled environment) or that slow down test execution
You can have pieces of code that are shared between tests to avoid duplicating mocking code

Determine if you need to Mock objects and function calls

See here for an example where I needed to mock a Push object
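
The steps above can be sketched in a few lines. This is not the mozci test suite: `JobsApi` and `count_failed_jobs` are hypothetical stand-ins for a test resource and a test subject, and the example uses `unittest.mock` from the Python 3 standard library (the separate `mock` package offers the same API on Python 2).

```python
from unittest.mock import Mock, patch


class JobsApi:
    """Hypothetical test resource: the real thing would hit the network."""

    def query_jobs(self, revision):
        raise RuntimeError("tests must not hit the network")


def count_failed_jobs(api, push):
    """Test subject: count the failed jobs for a push."""
    jobs = api.query_jobs(push.revision)
    return sum(1 for job in jobs if job["result"] == "failed")


# Shared between tests to avoid duplicating mocking code.
FAKE_JOBS = [{"result": "success"}, {"result": "failed"}]


@patch.object(JobsApi, "query_jobs", return_value=FAKE_JOBS)
def test_count_failed_jobs(query_jobs):
    push = Mock(revision="abc123")  # stands in for a Push object
    assert count_failed_jobs(JobsApi(), push) == 1
    query_jobs.assert_called_once_with("abc123")
```

Here @patch replaces the test resource and sets its return_value, while Mock supplies an object with just the attribute the test subject reads.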

The way Mozilla CI tools is designed begs for integration tests; however, I don't think it is worth going beyond unit testing + mocking. The reason is that mozci might not stick around once we have fully migrated from Buildbot, which was the hard part to solve.

Tuesday, April 12, 2016

mozci-trigger now installs with pip install mozci-scripts

If you use mozci from the command line this applies to you; otherwise, carry on! :)

In order to use mozci from the command line you now have to install with this:

pip install mozci-scripts

instead of:

pip install mozci

This helps to maintain the scripts separately from the core library since we can control which version of mozci the scripts use.

All scripts now live under the scripts/ directory instead of the library:
https://github.com/mozilla/mozilla_ci_tools/tree/master/scripts
