This quarter has been a tough one for me: a mix of organizing people and projects, and implementing prototypes. It’s easy to forget what you have worked on, especially when plans and ideas change along the way.
Beware! This blog post may be a little unstructured, especially towards the end.
The quarter as a whole
Before the quarter commenced, my main objective was going to be adding TaskCluster jobs to Treeherder. However, that quickly changed as GSoC submissions came around: we realized we had two or three candidates interested in helping us, which turned into three potential projects. Two projects that came out of it are "refactor SETA and enable TaskCluster support" and "add unscheduled TaskCluster jobs to Treeherder." Both of these bring us closer to feature parity with Buildbot. Once this was settled, we had many conversations with the TaskCluster team to make sure that Dustin's work and ours lined up well (he has been refactoring how the ‘gecko decision task’ schedules tasks).
Around this time, a project was completed that made creating TaskCluster clients an excellent self-serve experience. This was key for me: it reduced the number of times I had to interrupt the TaskCluster team to ask them to adjust my clients.
Also around this time, another change was deployed that allowed developers’ credentials to assume almost all of the scopes required to schedule TaskCluster tasks, without needing an intermediary tool with powerful scopes. This is very useful for building tools that let developers schedule tasks directly from the command line with their personal credentials. I created some prototypes to prove this concept. Here’s a script to schedule a Linux 64 task, and here’s the blog post explaining it.
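For illustration, here’s a minimal sketch of what such a script can look like with the `taskcluster` Python client; the worker type, docker image, and metadata values below are placeholders rather than the exact ones my prototype uses.

```python
# Minimal sketch: schedule a task with your personal TaskCluster credentials.
# The worker type, image, and metadata are placeholders.
import datetime
import os

import taskcluster

queue = taskcluster.Queue({
    'credentials': {
        'clientId': os.environ['TASKCLUSTER_CLIENT_ID'],
        'accessToken': os.environ['TASKCLUSTER_ACCESS_TOKEN'],
    }
})

task_id = taskcluster.slugId()
queue.createTask(task_id, {
    'provisionerId': 'aws-provisioner-v1',
    'workerType': 'b2gtest',  # placeholder worker type
    'created': taskcluster.stringDate(datetime.datetime.utcnow()),
    'deadline': taskcluster.fromNowJSON('2 hours'),
    'payload': {
        'image': 'ubuntu:14.04',  # placeholder docker image
        'command': ['/bin/bash', '-c', 'echo "Scheduled with my own credentials!"'],
        'maxRunTime': 600,
    },
    'metadata': {
        'name': 'Command line scheduling test',
        'description': 'A task scheduled straight from my terminal',
        'owner': 'user@example.com',
        'source': 'https://example.com/schedule_task.py',
    },
})
print('Scheduled task %s' % task_id)
```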
During this quarter, Dustin refactored how tasks get scheduled in the gecko decision task. For the project "adding new TaskCluster jobs," this was a risk, as it could have made scheduling tasks more complicated or even impossible without significant changes. After many discussions, it seemed that we were fine to proceed as planned.
Out of these conversations a new idea was born: "action tasks." The beauty of action tasks is that they're atomic units of processing that can make complicated scheduling requests very easy. You can read martianwars’ blog post (under “What are action tasks?”) to learn more about them. In short, action tasks are defined in-tree and schedule task labels for a push.

The project as originally defined had a very big scope (goal: make Treeherder find action task definitions and integrate them in the UI), and some technical issues came up (e.g., the limited scopes granted to developers; this is not an issue anymore) that made me worried more would follow. My focus switched to making pulse_actions requests visible on Treeherder. When switching deliverables, I did not realize that we could have taken just the first part of the project and implemented that. In any case, a reduced scope is being implemented by martianwars: after Dustin's refactoring, we need to put our graph through "optimizations" that determine which nodes should be removed from it. That optimization code lives in-tree, which made action tasks the right solution, since they run the in-tree logic where the optimizations live.
While working on my deliverables, I also made various discoveries and created some utility projects.
New projects
TaskCluster developer experiments (prototypes):
In this repository I created various prototypes that make scheduling tasks from the command line extremely easy.
Replay pulse messages (new package):
This project allows you to dump a Pulse queue into a file. It also allows you to "replay" the messages and process them as if they were coming from a real queue. This was crucial for testing code changes to pulse_actions.
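The core idea is simple; here’s an illustrative sketch of it (this is the concept, not the package’s actual API): persist messages as JSON lines, then feed them back through the same handler offline.

```python
# Illustrative sketch of the dump/replay idea (not the package's actual API).
import json

def dump_messages(consume, path):
    """consume() yields (routing_key, body) tuples from a live Pulse queue."""
    with open(path, 'w') as f:
        for routing_key, body in consume():
            f.write(json.dumps({'routing_key': routing_key, 'body': body}) + '\n')

def replay_messages(path, handler):
    """Re-process dumped messages as if they came from a real queue."""
    with open(path) as f:
        for line in f:
            message = json.loads(line)
            handler(message['routing_key'], message['body'])
```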
Treeherder submitter (new package):
Treeherder submitter is a Python package which makes it very easy to submit jobs to Treeherder; I used it to make pulse_actions submit jobs to Treeherder. I wrote this package because the official Treeherder client allowed me to shoot myself in the foot. Various co-workers have written similar code, however none of it was packaged for reuse by others (understandably). This package helps you submit jobs with the minimum amount of data necessary and helps you transition between job states (pending vs. running vs. completed).
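To give a feel for it, here’s a hypothetical usage sketch; the class and method names below are illustrative, not the package’s exact API:

```python
# Hypothetical usage sketch; names are illustrative, not the exact API.
from treeherder_submitter import TreeherderSubmitter  # assumed import path

th = TreeherderSubmitter(
    host='treeherder.allizom.org',
    client_id='...',
    secret='...',
)
# Only the minimum data needed to show a job on Treeherder:
job = th.submit_pending_job(
    repository='try',
    revision='abcdef123456',
    name='pulse_actions',
)
job.start()                     # pending -> running
job.complete(result='success')  # running -> completed
```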
Unfortunately, with the end of the quarter upon me, I have not had time to upstream this code; I would like to do so if the team is happy with it. On the other hand, Treeherder will soon be switching to a Pulse-based submission model for ingestion, and the Python client might not be used anymore.
TaskCluster S3 uploader (new package):
This package allows you to upload files to an S3 bucket with a 31-day expiration. It takes advantage of a cool feature the TaskCluster team provides: an API where you can request temporary S3 credentials for the bucket associated with your TaskCluster credentials. You then upload to your assigned prefix within that bucket. It is extremely easy!
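Here’s a sketch of the flow using the TaskCluster auth service’s `awsS3Credentials` endpoint and boto3; the bucket name and prefix are placeholders:

```python
# Sketch of the flow; the bucket and prefix are placeholders.
import os

import boto3
import taskcluster

auth = taskcluster.Auth({
    'credentials': {
        'clientId': os.environ['TASKCLUSTER_CLIENT_ID'],
        'accessToken': os.environ['TASKCLUSTER_ACCESS_TOKEN'],
    }
})

# Ask the auth service for temporary credentials scoped to your prefix.
bucket, prefix = 'mozilla-taskcluster-temp', 'my-client-id/'  # placeholders
response = auth.awsS3Credentials('read-write', bucket, prefix)
creds = response['credentials']

s3 = boto3.client(
    's3',
    aws_access_key_id=creds['accessKeyId'],
    aws_secret_access_key=creds['secretAccessKey'],
    aws_session_token=creds['sessionToken'],
)
# The bucket's lifecycle policy handles expiration (31 days in this case).
s3.upload_file('results.log', bucket, prefix + 'results.log')
```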
Discoveries