Thursday, December 22, 2011

talos.zip, talos.json and you

Today I deployed a small change that modifies how we deliver talos.zip to the performance/talos jobs.

In short what the change does is this:
That's it. Nothing else. Nothing more.

How does this help?
  • Every new talos.zip we place under build.m.o will require a commit on the tree
  • The new talos.zip will only be used once that change lands on the repo
  • A newer talos.zip will only be used from that changeset onwards
  • Any regression caused by the new talos.zip will be attributed to a changeset on the tree
  • Such a changeset can be backed out by anyone without needing releng
What are other side-effects?
  • We can run a talos.zip through the try server and use compare-talos
  • We don't need a downtime anymore to land a talos.zip
  • The new talos.zip cannot affect any other branches
  • We can run an old changeset with the talos.zip that was used for it
  • We can extend the talos.json file to control other moving parts like plugins
If we could summarize it in one sentence it would be:
"One changeset, one talos.zip"

This model is not new, as Jetpack already had it (jetpack-location.txt).
This model locks every changeset on a tree to a specific state of an external force.
In other words, we can configure parameters from inside the tree.

Best regards,
Armen

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=673131


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Tuesday, December 20, 2011

How to exclude a file from being exported in Narro


  1. Browse to the file
  2. Uncheck "Export with project"

This is useful when there are files that require explicit approval from L10n release drivers.
We recently changed our search engines for Armenian and a Narro export reverted the changes. This caused Axel to file a regression bug for me to fix it.



Tuesday, November 22, 2011

Changing how talos.zip gets deployed


For the longest time I have been looking to have time to work on this project to make everyone's life easier. 
Bug 673131 - when minor talos changes land, the a-team should be able to deploy with minimal releng time required

What if we could treat talos changes like any other change that lands on the tree?




Currently we download a talos.zip that replaces the old one on one of our build machines. This means that as soon as any talos job starts it will grab the newest talos.zip.
To read about some of the problems that this causes, go to the bottom of this post.

I will tell you what I want to change even though I don't yet know exactly how to do it.

INITIAL DESIGN
  • the talos job downloads a text file:
    • e.g. hg.mozilla.org/mozilla-central/raw-file/abcd1234567/path/to/talos/config/file.json
  • that file will contain the URL of the talos bundle
    • e.g. people.mozilla.com/~armenzg/talos/talos.zip
  • the talos bundle will be downloaded
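The flow above could be sketched roughly like this. It is only an illustration of the idea, not the actual implementation: the "talos_url" key, the helper names and the content of the example config are all assumptions, since the layout of the config file has not been decided yet.

```python
import json


def config_url(repo, revision, path):
    """Build the raw-file URL of the config file for one changeset."""
    return "http://hg.mozilla.org/%s/raw-file/%s/%s" % (repo, revision, path)


def bundle_url(config_text):
    """Return the talos bundle URL stored in the config file.

    The "talos_url" key is a made-up name for this sketch.
    """
    config = json.loads(config_text)
    return config["talos_url"]


# A config file as it might look in the tree (hypothetical content).
example_config = '{"talos_url": "http://people.mozilla.com/~armenzg/talos/talos.zip"}'

print(config_url("mozilla-central", "abcd1234567", "path/to/talos/config/file.json"))
print(bundle_url(example_config))
```

Because the config file is fetched at a specific revision, every changeset pins the exact bundle its talos jobs will download.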
INITIAL CONCERNS
  • how do we prevent a talos.zip from containing malicious code and causing us harm?
    • anyone with try commit level could tinker with a machine inside of the build network (even though we don't ship anything from such a machine)
    • we should find a way to limit this
    • perhaps have this feature only available to a given project branch? An A-team branch?
      • we could add a cgi script to upload a talos.zip
      • maybe we should redesign this to just indicate a "revision" and update to it for http://hg.mozilla.org/build/talos
  • it forces a build to be matched to a given talos.zip
    • this means that if you want to try another talos.zip you will have to push a different changeset that specifies it, even though the build is exactly the same
I am also afraid that the scope of this project could easily creep, knowing how many artifacts we download for talos (e.g. pageloader.xpi, plugins, et al.).

CURRENT SETUP PROBLEMS
I wanted to have this section in case there was somebody curious about the problems we face with the current setup:
  • deploying a talos.zip needs a downtime, as changes are not isolated to a changeset landing
  • a single changeset can have some talos jobs run with the old talos.zip and some with the new one
    • this can make some platforms show a new regression on that changeset and others on the next one, which makes it hard to figure things out
  • a build that started before the talos.zip was deployed can be blamed for causing a regression even though the talos.zip was deployed *after* the build started
  • a talos.zip change does not show up on the pushlog
    • this means that it can only be noticed if a note is sent to dev.tree-management or if the Maintenance page was updated correctly
    • this means that it can not be backed out by a developer



Friday, November 18, 2011

My first mobile testing hours

I decided to spend a few hours learning the process of submitting Fennec issues and documenting it. I would like to do this for 30-45 minutes every day to help the project.

I documented my learning process in:
https://wiki.mozilla.org/User:Armenzg/Fennec_Native_UI_testing

For the record, I had to remove the multi fennec build from latest-birch as it was broken.

At the end of this post there are some easy-to-hit bugs, which should save you the time of figuring out whether they are already filed.

I hope this can help you get started and see that it is not hard to do.

First steps

NOTE: Here's the list of bugs that I've filed or have CCed myself to.

Testing

  • If you hit a bug, check the list of known bugs (the Fennec Native UI list).
    • I normally just use the integrated search (Ctrl + F) that comes with Firefox to search through the list
    • I keep the list open all the time
  • If you don't find it, file it by typing the steps you followed plus your expected behaviour.
Not tried yet:
  • blassey told me to use "adb am start " from my laptop to tell my phone to open a URL
  • Note on taking screenshots:
    • You can take screenshots using the Android SDK; you don't have to root the device. The other way is to upgrade to ICS if that is an available option for you.
    • The Android SDK will install ddms, in which you can select a device and then take a screenshot.
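I have not tried the adb tip yet, but as far as I understand it would look something like the sketch below. It assumes the usual "adb shell am start" form with a VIEW intent (the quote above is abbreviated), and open_url_command is just a name I made up for this example.

```python
def open_url_command(url):
    """Build the adb command line that asks the phone to open a URL."""
    return [
        "adb", "shell", "am", "start",
        "-a", "android.intent.action.VIEW",  # standard "view this" intent
        "-d", url,                           # the data URI to view
    ]


command = open_url_command("http://www.mozilla.org")
print(" ".join(command))
# To actually run it with a device attached:
#   import subprocess; subprocess.check_call(command)
```

The phone may still ask which browser should handle the URL if more than one is installed.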

Issues found

These are issues that I have not yet filed or have not had time to check if they are valid.
  • On first start up I go to the Firefox Nightly start page and it is crazy wide
  • "Reformat text on zoom" is not "on" by default

Easy to hit bugs

Last updated: Nov. 18th, 2011
  • bug 702633 - painting is broken for google.ca
  • bug 701380 - have a different start up page
  • bug 701594 - Should not be able to stay zoomed out further than the page width
  • bug 700940 - Favicon transparency results in purple background sometimes
  • bug 701797 - "save as PDF" is broken




Tuesday, November 01, 2011

How FSOSS was for me

Last week at FSOSS a lot of good things happened for me and writing about it in an organized manner will be a bit challenging :)

As I mentioned last week, I gave a new talk at FSOSS on Saturday morning. I spoke about what is involved in shipping open-source software to millions of users (I will add the presentation in another blog post). I had the opportunity to bring my sweet beloved Veronica with me and according to her "[my] presentation was wonderful! best looking guy there ... and [my] talk wasn't too bad either!... I learned so much (though your handsome face was quite distracting! :) )". A little biased, but I have to say that it felt like it came out about right and I had a lot of fun.
During Q&A for my presentation.

I had started working on this presentation on Tuesday and had the opportunity to run it through Mike Hoye who had encouraged me to give the talk a month ago. His input was valuable and helped me with some problems I was having.

Besides my presentation there were many things that happened on Thursday, Friday and Saturday.

As soon as I got my swag bag I realized that most sponsors (and non-sponsors) were represented in it (see pic below). I was surprised that I did not see anything from us even though Mozilla was the biggest sponsor of the event.
Mike Hoye giving his presentation.
No Mozilla swag on the welcome bag.


Me, Jeff Griffiths and two Microsoft evangelists

I spent most of my day with Jeff Griffiths at the Mozilla booth. Over there, I realized how big a brand Mozilla is and the affinity that Seneca students have with us. I was overwhelmed by how many came to us to find out how to contribute or how to take part in an internship.

I also noticed a couple of things while I was at the booth. There were a couple of non-open-source booths like Microsoft. I guess events like this need sponsors to give money regardless of how much open-source they do. Even so, I went and spent some time with them as I was curious about the Microsoft phone (which felt much better than I had expected). I was really impressed that they had three people and all sorts of cool swag and phones all over. They also had a very cool shirt that says "I love Windows phone" with app icons rather than words (see pic). I also like their slogan "Make web not war", which is catchy.

I wish this event were more widely known and had more people coming to it, as there are some good talks. Seneca staff and students did an excellent job at running this event. It felt like they do this every day.

I also had an opportunity to meet again many of my former professors and Shaz (I used to work for her as a student ambassador at student services). Tim McKeena (Seneca prof.) was extremely excited to hear my talk and gave me a small gift from Seneca.

I wish I had attended more sessions but working at the booth and preparing for the presentation took most of my time.

Keep it up Seneca!



Monday, October 24, 2011

Mozilla's release process at FSOSS

FSOSS is back and this year I will be there giving a new talk. This time I will be talking about the release process at Mozilla.
Last year I presented on Mozilla's Release Engineering infrastructure, which is built with many open-source tools but mainly with Buildbot.

I am looking forward to seeing you there!


On another note, there will be four Mozilla co-workers in total presenting on different topics:
How to ship Open Source software to half a billion users - Armen Zambrano Gasparnian
http://fsoss.senecac.on.ca/2011/node/108

How Web Browsers Work - Ehsan Akhgari
http://fsoss.senecac.on.ca/2011/node/113

Introduction to Mozilla's Add-on SDK - Jeff Griffiths
http://fsoss.senecac.on.ca/2011/node/77

Take control of your TV with XBMC - Lawrence Mandel
http://fsoss.senecac.on.ca/2011/node/58





Monday, September 26, 2011

How I got involved with Mozilla's Armenian localization and we shipped it :)

I have been asked how I got involved with the localization process and I would like to have a post so I can always make reference back to it.

It was back at the beginning of 2007 when Dave Humphrey taught during Seneca's study week how to Dive into Mozilla. After that, he encouraged us to take on a project and push it forward. Most of the projects sounded scary, as I lacked technical confidence, and were not important to me. The only project that caught my attention was to translate Firefox into Armenian. My grandfather had recently started using the Internet and I was hoping to make him happy by making the browser available in Armenian (his mother tongue). My major problem was that I did not read or write Armenian, so I had to teach myself. The other problem was that the localization process was convoluted, spread across wiki pages and lacking a Mozilla-supported web tool. I left the project for a while as I undertook the development of a localization tool.

It was during the summer of 2008 that Robert Sargsyan contacted me and soon after we managed to get him an approver's account on Narro (a web tool for localization). 


During 2009 there was a long silence but Robert kept on working hard for all those months. In September, we set up a Mercurial repository on bitbucket, figured out all the issues and imported the translations from Narro.


Early in 2010 we opened a registration for the Armenian team and created a language package for Firefox 3.6 while we waited to receive the approval. In August 2010 the first import landed and in November we managed to make it into Firefox 4's beta 7.


Since then you have been able to download Firefox in Armenian here.


My grandpa never got to use Firefox in Armenian as he passed away before that but I bet many other grandpas will be able to.



Friday, September 09, 2011

Mozilla's Automation Infrastructure explained (DRAFT)

Hi all,
I have previously done a releng brownbag (Dec. post & Apr.'s slide) to help new employees understand how our infrastructure works. The problem with such a presentation is that it is hard for people to choose my brownbag during the All Hands when there are such awesome sessions to attend. Therefore, I created a couple of screencasts in which I give a tour of our infrastructure.

This was a very quick and dirty screencast: I didn't have the right tools (trial version) and many of the diagrams have been reused from April's brownbag, some of which have become out of date.

Please, please, please, give me all the feedback that you think will make this tutorial much better and clearer.

Without further delay, here are the 2 videos:

Direct URL: http://www.youtube.com/watch?v=ahfb94_aaBE
Direct URL: http://www.youtube.com/watch?v=DY6P-uG_ylk

Here is also a list of all URLs used during the screencast in order of appearance:
https://github.com/armenzg/playground/raw/master/mozilla/slides/images/simple%20releng.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/omnigraffle/releng%20simple%20setup.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/omnigraffle/pods.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/omnigraffle/diagrams%20of%20builds%20%28mobile%20included%29.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/omnigraffle/branches.png
https://wiki.mozilla.org/Inbound_Sheriff_Duty
http://hg.mozilla.org/
http://hg.mozilla.org/mozilla-central/
https://tbpl.mozilla.org/
http://perf.snarkfest.net/compare-talos
https://github.com/armenzg/playground/raw/master/mozilla/slides/images/tbpl%20status.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/images/tbpl.png
https://github.com/armenzg/playground/raw/master/mozilla/slides/images/star%20oranges.png
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&tree=Try
https://github.com/armenzg/playground/raw/master/mozilla/slides/images/tree%20status.png
http://graphs.mozilla.org/
http://graphs-new.mozilla.org/index.html
http://graphs-new.mozilla.org/graph.html
http://graphs-new.mozilla.org/graph.html#tests=[[89,1,1],[89,1,14]]&sel=none&displayrange=7&datatype=running
https://wiki.mozilla.org/ReleaseEngineering/TryServer#How_to_push_to_try
http://people.mozilla.org/~lsblakk/trychooser/
http://hg.mozilla.org/try/
https://build.mozilla.org/
http://build.mozilla.org/builds/
https://build.mozilla.org/clobberer/
https://build.mozilla.org/buildapi/self-serve
https://build.mozilla.org/buildapi/self-serve/mozilla-central
http://build.mozilla.org/builds/running.html
http://build.mozilla.org/builds/pending.html
http://build.mozilla.org/builds/pending/
https://build.mozilla.org/buildapi/reports/waittimes
http://brasstacks.mozilla.com/gofaster/#/
http://brasstacks.mozilla.com/gofaster/#/buildcharts
http://brasstacks.mozilla.com/gofaste/buildchart.html?buildid=78856c1ce34b4e85bf23bdc6a887f28c


Win64 status update

Hi all,
Since my last status update a lot of things have happened:
  • The opt builds are going green
  • The builds are now showing on the developer's dashboard: tbpl
  • All branches have win64 builds (I have added the last few this morning)
    • We have try support for win64 (implied in previous point)
    • We don't have win64 support for aurora, beta, release and 1.9.2
What is missing to be on par with other operating systems?
  • Testing infrastructure
    • We currently have 5 Win7 64-bit machines and they are only testing mozilla-central (as of today) to keep up
    • The other 50 that we had were repurposed 3 months ago for the other operating systems so we could increase their capacity by 8-10%. It was a tough call but it was necessary to keep up.
  • Debug builds
    • Symbols. It seems we are hitting a Microsoft bug. We will disable them for now
    • Packaging.
  • Symbols for the try server.
Perhaps some people won't agree that we should make the tests visible for mozilla-central since we don't have testing coverage for other branches. Nevertheless, I believe it makes sense to have a way of seeing tests fail rather than not seeing them at all. If it takes us 3-4 weeks to clone more machines, we would be adding test failures without seeing them. I don't think it is asking too much to file a bug for a test failure and carry on (even hide it) if we are not willing to back the change out or fix the failure. At least we would have a merge to blame or a range of pushes to help debug the issues. If you disagree, feel free to do what you think is best for everybody.

EDIT: Fixed typo: 64-bit instead of 65-bit.
EDIT: For further info please follow the tracking bug.
EDIT: To try the build go to http://nightly.mozilla.org



Monday, August 22, 2011

Win64 builds status update

Two things:
  • we now have all Win64 build machines cloned
  • we now have all branches with Win64 builds (Try coming soon)
The builds are currently not compiling but khuey and Makoto Kato are tackling each issue until we get back to compiling builds.

Meanwhile I will be setting up a subset of the pool to take care of doing Try builds.

I will give another status update when something new happens.
If you want to be up-to-date on all details please follow bug 558448 [1].
You can read my previous post for background information.

[1] Bug 558448 - (support-win64) [Tracking bug] officially support Windows 64-bit builds



Wednesday, August 17, 2011

Go Faster: improved download times for test jobs and merged few talos suites together

As part of the Go Faster initiative I have been involved with a couple of bugs that have recently gone live.

The first one is establishing a p2p link between two of our colos which improved the download times of builds, symbols and test packages to the test slaves. [1]
We can see in the diagram how the average setup time (which download times are part of) has stabilized after IT enabled the p2p link between our colocations last Tuesday, Aug. 9th. This means that we have faster download times per run and proportionally more time is spent running tests. This makes our jobs take less time and increases our pool's capacity. Unfortunately, this is also causing some builds to fail to upload artifacts to ftp (bug 677348). IT has a plan to fix this.

On the other hand, I have merged three talos jobs into only one (this went live yesterday Aug. 16th) [2]. This means that we have removed 2 setup times plus 2 reboot times per push. This is minimal but it gets us started to do more of the same.
Our goal is that every talos suite that takes less than 10 minutes should be joined with other jobs as long as the combined job doesn't take over 30 minutes. In other words, keep every talos job between 10 and 30 minutes (initial goal).
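To give an idea of what that goal means in practice, here is a small sketch of one way the grouping could be computed. It is not how the merge was actually decided: the greedy strategy, the group_suites name and the run times are all made up for illustration.

```python
def group_suites(durations, short=10, limit=30):
    """Greedily pack short suites into jobs that stay within the limit.

    durations: dict of suite name -> run time in minutes.
    Suites that take `short` minutes or more keep their own job.
    """
    jobs = []
    current, total = [], 0
    for name, minutes in sorted(durations.items(), key=lambda item: item[1]):
        if minutes >= short:
            jobs.append([name])          # long enough to stand alone
            continue
        if current and total + minutes > limit:
            jobs.append(current)         # current job is full, start a new one
            current, total = [], 0
        current.append(name)
        total += minutes
    if current:
        jobs.append(current)
    return jobs


# Made-up run times in minutes, only for illustration.
example = {"a11y": 4, "svg": 5, "chrome": 8, "tp5": 25, "dromaeo": 15}
print(group_suites(example))
```

With these numbers the three short suites end up in one 17 minute job while the two long suites stay on their own, which removes two setup and reboot cycles per push.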
You probably won't be able to tell that this changed, as you will still see the regression emails showing up on the dev.tree-management list. The only difference is that there will be two fewer "T" jobs per push on tbpl and, when you click on a "chrome" talos job, you will see the a11y and tscroll suites show up in the summary like this:


Now I will gather new data and determine which other suites are to be merged together.
I previously gathered and analyzed some data, but I guess I did not write a blog post and just made a comment in a bug.

Stay tuned for more information!

[1] Bug 661656 - Determine if we can improve the download times between sjc1 and scl1
[2] Bug 659328 - Merge talos suites that finish in less than 10 minutes to improve wait times



Tuesday, July 12, 2011

Firefox and Windows 64-bit builds (testing version; not a release version)

We now have a small set of Windows 2008 64-bit slaves ready to be put in our production systems that can generate the 64-bit version of Mozilla Firefox.

NOTE: This is not a released version but a testing version.




I will leave out any talking about when this could be released to our users and just focus on explaining where the project has been and where we are now from a Release Engineering point of view. Releasing depends on evaluating what the problems are on the product side before we would release it.
If you have done any comparison of the pros/cons of the 32-bit and 64-bit version of Firefox running on a 64-bit machine please let me know as I am interested to know.

Try it out
We have been producing Firefox 64-bit nightly builds since last week but we now have a small pool of machines and we are upstreaming the process to production levels.

You can give it a try by downloading the installer.

Help needed
Right now I have several bugs that I need developers' help to get fixed.

  • bug 671000 - make buildsymbols takes 45 mins rather than 5 mins
  • bug 669384 - make buildsymbols fails for leak test builds
  • bug 670915 - make package fails for leak test builds
  • bug 670697 - sporadic make check failures 
There are many more bugs but the ones I mentioned are the ones that affect releng infrastructure.
We are using a tracking bug for product problems and another one for releng problems.

Background
I started working on this project in Q2 of last year and by May 2010 I had a proof of concept going. To my surprise the media picked up on this and brought a lot of attention to it.
By Q3 we started having problems with OPSI (a system to deploy changes to our machines) and all efforts started shifting little by little towards supporting developers to ship Firefox 4. In Q2 of this year all focus was on adapting to the new fast release cadence.

Nevertheless, by the end of March I had set up a machine to be cloned onto other machines.
Unfortunately the tools that we had in IT were not able to clone the machine.
IT at that point started to look for a solution and in June we hired digipengi who has Windows experience (for real!).
We worked together and we had to recreate the Windows 64-bit machine from scratch with only one partition rather than three.
We are now at the point that we have 5 production slaves, another 4 to be added soon and we will be cloning the remaining ones in the near future.

If you want to know more CC yourself to the tracking bug.



Monday, June 27, 2011

disabling tp4

As announced on last week's platform meeting, this week we would like to disable tp4 as we have been running tp5 side by side and without issues*.


We have not heard any concerns on disabling tp4 so we will do so on Thursday June 30th.

If you have anything to add please do so at bug 664831.

* The only issues were to adjust tbpl and compare-locales to show tp5 which are fixed



Tuesday, April 19, 2011

Load from March 24th to April 19th

Last week I did a post about how high our load was for that day and to let other people know that we are looking into mitigating the bad wait times that have been happening.

We know that we need more slaves but we also know that our masters are hitting edge cases and are not performing optimally. We now believe that bug 592244 is behind a chunk of the wasted CPU, as it causes some jobs to run twice. The problem is that we have several masters that query a scheduling master and sometimes the same job is run on two different masters. catlee has done a great job chasing this and we hope that fixing this issue will significantly improve the wait times (it would have been hard for us to narrow down this issue without his help). If it does not help us enough to get by we will have to go back and chase other edge cases in our masters. Meanwhile IT and releng are still working on getting the next pool of test slaves.

And now back to the load (link to page with raw data):
  • on the 11th we handled 138 pushes across all branches (the day before the aurora merge)
  • try server had 47.5%, mozilla-central 16.9% and cedar 11.2% (/me looks at ehsan) of the whole load
Conclusions:
  • even though we had the trip to Las Vegas, the all-hands and platform's work week we have had a very high load since we shipped Firefox 4
I wonder what the distribution from April 18th to the end of the month will look like as it would be more representative of what the normal development would be.

For the next post I should grab only weekdays and overlay them to see how things look from week to week.



Tuesday, April 12, 2011

Yesterday's load

I will do a longer analysis at some point but I would like to share with you a link and a screenshot of it.
These two diagrams show commits over 24 hours (from Mon, 11 Apr 2011 00:00 PDT to Tue, 12 Apr 2011 00:00 PDT) from all of our currently supported project branches. On the first diagram we can see pushes per hour and on the second diagram we can see a distribution of these pushes among the different project branches.

Each one of these commits produces different types of builds and tests. For a given build we can end up queuing up to 14 test suites plus 8 different talos jobs for a given OS.
How easily can the test pool be out of capacity? Three builds of a certain OS finishing around the same time can generate up to 66 testing jobs and take up more than the whole testing pool for that OS (we have 48 to 54 machines per OS) for a variable amount of time. Test jobs can take from 5 minutes to more than 60 minutes depending on the OS and the test suites.
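The arithmetic behind those numbers works out as follows. The sketch assumes, for simplicity, that the whole pool is idle when the three builds finish, which in practice is rarely the case.

```python
test_suites_per_build = 14   # unit test suites queued per build
talos_jobs_per_build = 8     # talos jobs queued per build
builds_finishing_together = 3

jobs_per_build = test_suites_per_build + talos_jobs_per_build
queued_jobs = jobs_per_build * builds_finishing_together

for pool_size in (48, 54):   # machines available per OS
    waiting = queued_jobs - pool_size
    print("pool of %d: %d jobs queued, %d left waiting" % (pool_size, queued_jobs, waiting))
```

Even in that best case, between 12 and 18 jobs have to wait for a machine to free up.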

For further information on test times I have some raw data from back in December (out-of-date warning) and three blog posts where I drew conclusions out of it.

This high load of pushes and their clustering (how close they are to each other) cause test jobs to be queued and wait to be processed (this can be seen in the daily Wait Time emails on dev.tree-management). We need more machines (and we are working on it) but here are a few things that you can do to improve things until then:
  • Use the TryChooser syntax. Spending a moment to choose a subset of build and test jobs for your change helps to use the right amount of resources. If you need all builds and tests do not hesitate to use it all. Note that at some point this syntax will be mandatory.
  • Cancel unneeded jobs. Use self-serve (which shows up on tbpl) to stop running or pending jobs once you know that they are not needed because you pushed something incorrectly or it is going to fail. Once a build or test is not needed please cancel it to free up resources. Everyone will thank you.
There are also things that could be fixed, like improving reftests and xpcshell for Win7, but that is not something that everyone can help with in a reasonable amount of time.

[EDIT] 4:15pm PDT - I want to highlight that there is going to be a series of blog posts explaining the work and the purchase of new testing machines that we will be undertaking to address such bad wait times.



Friday, March 18, 2011

How to disable Windows Error Reporting on Windows 2008

Sometimes programs crash on Windows and we all know that Windows might ask us to report back to Microsoft or just notify us that something did not work properly.
This is quite good for users but not for automating jobs on machines.

I was setting up a Windows 64-bit machine to run our Firefox builds and I noticed that every time we reached the "make -k check" step the job would hang until it timed out.
I decided to run the step manually and discovered that we would get a prompt for the user to intervene.
jsapi-tests.exe crashed when running "make -k check" and Windows notifies the user

At first ted let me know that it might be related to disabling the JIT debugger (Visual Studio allows you to attach a debugger on programs outside of itself just-in-time!) but I figured out that it was disabled and this was the "post attaching the debugger" message.

I filed a bug to disable the jsapi-tests.exe crash until it gets fixed but soon after I found a post that gave me an idea.
I searched for "prevent stopped working" and I noticed this low-rated comment on stackoverflow that mentions how to disable the "Windows Customer Experience Improvement Program".
This was not what I wanted but it inspired me to look for something that would stop Windows from notifying the users of an error.
I filtered for the word "Error" (because that is what you do on Administrative tools on Windows instead of searching). After looking for a while I found "Prevent display of the user interface for critical errors" and voila! It did the trick.

Here are the steps I followed which I documented on the Win64 reference platform documentation:
  • Run "gpedit.msc"
  • Computer configuration -> Administrative Templates
  • Windows Components -> Windows Error Reporting
  • Set "Prevent display of the user interface for critical errors" to Enabled
From now on this machine won't let the user know that a program crashed and will just carry on.
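If you would rather script this than click through gpedit.msc, something like the sketch below might work. I have not verified it on these machines: it assumes that this policy is backed by the DontShowUI value under the Windows Error Reporting policies key, and dont_show_ui_command is just a helper name for this example.

```python
# Assumed location of the Windows Error Reporting policy settings.
WER_POLICY_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\Windows Error Reporting"


def dont_show_ui_command(enabled=True):
    """Build a `reg add` command that sets the DontShowUI policy value."""
    return [
        "reg", "add", WER_POLICY_KEY,
        "/v", "DontShowUI",        # assumed registry value behind the policy
        "/t", "REG_DWORD",
        "/d", "1" if enabled else "0",
        "/f",                      # overwrite without prompting
    ]


# Pass the list to subprocess.check_call() from an elevated prompt to apply it.
print(dont_show_ui_command())
```

Check the value against the gpedit.msc setting on a test machine before rolling it out to a pool.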

Happy Windows automation!



How to install NSClient++ 0.3.8 on Windows 2008 x64 RC2

I am finally back to work on setting up the Windows 2008 64-bit machine to generate the 64-bit version of Firefox.

At Mozilla's Release Engineering we use Nagios to monitor our build and test machines (among other systems).
The NRPE addon (Nagios Remote Plugin Executor) is designed to allow you to execute Nagios plugins on remote Linux/Unix machines.
On Windows we use NSClient++, which plays the same role for Nagios that NRPE plays on Linux/Unix machines.

Therefore, for the Windows 64-bit machine I installed NSClient++ as we do for the Windows 32-bit machines (bhearsum did this a long time ago and gave me a heads-up from what he remembered; also thanks to ravi for checking things with me).

Enough background; let me show you what I did.

  • Download NSClient++ 0.3.8 for 64-bit machines
    • I am going to use the installer as it adds the firewall exceptions for me.
  • Start the installer and choose these settings:
    • "Enable common check plugins", "Enable nsclient server (check_nt)", "Enable NRPE Server (check_nrpe)", "Enable WMI checks"
  • Do not start the service and finish the installation
  • Rename C:\Program Files\NSClient++\NSC.ini to NSC.original.ini
  • Check out mozilla/tools/nagios and copy NSC.ini to C:\Program Files\NSClient++
    • I am reusing the NSC.ini from our Win2k3 machines
    • In fact, the settings selected during the installation have no effect since we replace the NSC.ini, but I thought you might be interested in having a rough idea of what to do yourself.
To check that everything went well do the following (from this documentation):
  • Run "C:\Program Files\NSClient++\nsclient++.exe" /test
  • We are going to run the following two checks (see at bottom of this post for output):
    • CheckDriveSize ShowAll MinWarnFree=10% MinCritFree=5% Drive=c:\
    • CheckCPU warn=80 crit=90 time=20m time=10s time=4
To check that everything works well from another machine do the following:
  • In C:\Program Files\NSClient++\NSC.ini, add to allowed_hosts the IP of a Linux machine that has the Nagios plugins installed
  • Reboot the Windows machine (we want to make sure that everything is in a clean state)
  • From the Linux machine do the following:
cd /usr/lib/nagios/plugins
./check_nrpe -H mw64-ix-slave01 -c check_load
CRITICAL: 1m: average load 0% > critical, 5m: average load 0% > critical, 15m: average load 0% > critical|'1m'=0%;0;0; '5m'=0%;0;0; '15m'=0%;0;0;
./check_nrpe -H mw64-ix-slave01 -c check_buildbot
OK: python.exe: 1
  • Now go back to the Windows machine and restore the allowed_hosts in NSC.ini to its original state
And that's it!
You can now use nagios with your Windows machine!
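
If you want to repeat the remote checks without typing each one, here is a small sketch of a wrapper around check_nrpe. It relies on the standard Nagios plugin convention for exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names are hypothetical, and the plugin path is simply the one used above; override CHECK_NRPE if yours lives elsewhere.

```shell
#!/bin/bash
# Hypothetical helper: run several NRPE checks against one host and summarize.
# The default path matches the Linux machine used in this post.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}

# Map a Nagios plugin exit code to its state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage: run_checks HOST COMMAND...
# Prints one line per check and returns the highest exit code seen.
run_checks() {
    local host=$1 worst=0 cmd output rc
    shift
    for cmd in "$@"; do
        # Capture the exit code without tripping "set -e" in callers.
        output=$("$CHECK_NRPE" -H "$host" -c "$cmd" 2>&1) && rc=0 || rc=$?
        echo "$cmd [$(nagios_state_name "$rc")]: $output"
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}
```

For the machine above this would be called as `run_checks mw64-ix-slave01 check_load check_buildbot`.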

Here is the output of running |"C:\Program Files\NSClient++\nsclient++.exe" /test| and the two checks:
Launching test mode - client mode
d NSClient++.cpp(1178) Enabling debug mode...
d NSClient++.cpp(551) Attempting to start NSCLient++ - 0.3.8.76 2010-05-27
d NSClient++.cpp(969) Loading plugin: CheckDisk...
d NSClient++.cpp(969) Loading plugin: Event log Checker....
d NSClient++.cpp(969) Loading plugin: Helper function...
d NSClient++.cpp(969) Loading plugin: CheckSystem...
d NSClient++.cpp(969) Loading plugin: CheckWMI...
d \PDHCollector.cpp(73) Autodetected w2k or later, using w2k PDH counters.
d NSClient++.cpp(969) Loading plugin: File logger...
d \PDHCollector.cpp(110) Using index to retrive counternames
l \FileLogger.cpp(93) Log path is: C:\Program Files\NSClient++\\NSC.log
d NSClient++.cpp(969) Loading plugin: NRPE server (w/ SSL)...
d \NRPEListener.cpp(91) Loading all commands (from NRPE)
d \NRPEListener.cpp(121) Starting NRPE socket...
d \PDHCollector.cpp(130) Found countername: CPU:    \Processor(_total)\% Processor Time
d \PDHCollector.cpp(131) Found countername: UPTIME: \System\System Up Time
d \PDHCollector.cpp(132) Found countername: MCL:    \Memory\Commit Limit
d \PDHCollector.cpp(133) Found countername: MCB:    \Memory\Committed Bytes
d NSClient++.cpp(969) Loading plugin: SystemTray...
d \Socket.h(669) Bound to: 0.0.0.0:5666
e \SysTray.cpp(51) SysTray is not installed (or it cannot interact with the desktop) SysTray won't be loaded. Run NSClient++ SysTray install to change this.
d NSClient++.cpp(671) NSCLient++ - 0.3.8.76 2010-05-27 Started!
l NSClient++.cpp(455) Using settings from: INI-file
l NSClient++.cpp(456) Enter command to inject or exit to terminate...
CheckDriveSize ShowAll MinWarnFree=10% MinCritFree=5% Drive=c:\
d NSClient++.cpp(1106) Injecting: CheckDriveSize: ShowAll, MinWarnFree=10%, MinCritFree=5%, Drive=c:\
d NSClient++.cpp(1142) Injected Result: OK 'OK: c:\: 19.3G'
d NSClient++.cpp(1143) Injected Performance Result: ''c:\ %'=49%;10;5; 'c:\'=19.32G;3.74;1.87;0;37.47; '
OK:OK: c:\: 19.3G|'c:\ %'=49%;10;5; 'c:\'=19.32G;3.74;1.87;0;37.47;
CheckCPU warn=80 crit=90 time=20m time=10s time=4
d NSClient++.cpp(1106) Injecting: CheckCPU: warn=80, crit=90, time=20m, time=10s, time=4
d NSClient++.cpp(1142) Injected Result: OK 'OK CPU Load ok.'
d NSClient++.cpp(1143) Injected Performance Result: ''20m'=4%;80;90; '10s'=23%;80;90; '4'=22%;80;90; '
OK:OK CPU Load ok.|'20m'=4%;80;90; '10s'=23%;80;90; '4'=22%;80;90;
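
The final line of each check follows the usual Nagios plugin layout: status text, a "|", then performance data entries of the form 'label'=value;warn;crit;. As a rough sketch (the function names are made up, and it assumes every label is single-quoted as in the output above), the pieces could be pulled apart in bash like this:

```shell
#!/bin/bash
# Hypothetical helpers for splitting a Nagios plugin output line.

# Text before the first "|" is the human-readable status.
plugin_status() {
    printf '%s\n' "${1%%"|"*}"
}

# Text after the first "|" is the performance data (empty if there is none).
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*"|"}" ;;
        *)     printf '\n' ;;
    esac
}

# Print one tab-separated "label value warn crit" line per quoted entry.
# ASSUMPTION: labels are always single-quoted, as in the output above.
perfdata_entries() {
    local rest=$1
    local re="'([^']*)'=([^;]*);([^;]*);([^;]*);"
    while [[ $rest =~ $re ]]; do
        printf '%s\t%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
        # Drop everything up to and including the entry just matched.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}
```

Any trailing min/max fields, like the ones in the drive check, are simply ignored by this sketch.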



Thursday, March 10, 2011

Automated Firefox XP debug unit tests are now live

As of this morning, we now have automated debug unit tests for Windows XP :)
TBPL showing the row for XP debug unit tests




We now have all desktop platforms at parity.



This change was enabled in bug 614955, and again many thanks go to ted for helping when I was almost losing it.

Kudos go to philor for checking that the results of the test runs were good.



Tuesday, January 18, 2011

L10n script

I tried to help a friend get his locale in shape and dusted off a script I wrote for Armenian a year ago.
Here it is for you to use or to learn the process from!
#!/bin/bash
# Script:  generate-locale.sh
# Author:  Armen Zambrano Gasparnian
# Contact: armenzg@mozilla.com
# Purpose: Repackage a locale in hg
# Date:    Jan 13th, 2010

# NOTE:
#    If you have already run this script and reached the step "make installers-$LOCALE",
#    you can skip re-running the whole script and just run this subset of steps:
#      cd $BASE_DIR/$BRANCH/browser/locales
#      PYTHONPATH=../../../compare-locales/lib python ../../../compare-locales/scripts/compare-locales -m merged l10n.ini ../../../l10n $LOCALE | tee ../../../compare-locales.log
#      make installers-$LOCALE LOCALE_MERGEDIR=$PWD/merged; cd -
#    You want to add new files and do modifications of your locale in:
#      $BASE_DIR/l10n/$LOCALE  

# Change it to your locale
export LOCALE='x-testing'

set -ex
export BASE_DIR=`pwd`
export L10N_HG_SERVER='http://hg.mozilla.org/l10n-central'
export BRANCH='mozilla-central'
# Double quotes so that $BRANCH is expanded
export EN_US_REPO="http://hg.mozilla.org/$BRANCH"
export EN_US_BINARY_URL="http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/"
export REVISION='default'
# We don't really disable webm but we just bypass a check in the configure step that we don't need
export CONFIGURE_ARGS='--enable-application=browser --with-l10n-base='$BASE_DIR'/l10n --disable-webm'


### 1) Clobber previous run
rm -rf $BRANCH/dist/install
rm -rf $BRANCH/dist/*$LOCALE*

### 2) Checkout the browser repo
# clone only if the checkout does not exist yet
[ -d $BRANCH ] || hg clone $EN_US_REPO $BRANCH
# -u also updates the working copy to the pulled revision
hg -R $BRANCH pull -u -r $REVISION

### 3) Checkout the locale repo
mkdir -p $BASE_DIR/l10n
cd $BASE_DIR/l10n
# if we don't have the locale yet, clone it
[ -d $LOCALE ] || hg clone $L10N_HG_SERVER/$LOCALE
# -u also updates the working copy to the pulled revision
hg -R $LOCALE pull -u -r default

### 4) Let's generate a "merged" directory with compare-locales
cd $BASE_DIR 
rm -rf compare-locales
hg clone http://hg.mozilla.org/build/compare-locales compare-locales
cd compare-locales; hg up -C -r RELEASE_AUTOMATION; cd ..
cd $BASE_DIR/$BRANCH/browser/locales
# a directory called "merged" will be generated under browser/locales
PYTHONPATH=../../../compare-locales/lib python ../../../compare-locales/scripts/compare-locales -m merged l10n.ini ../../../l10n $LOCALE | tee ../../../compare-locales.log

### 5) Setup
cd $BASE_DIR/$BRANCH
autoconf-2.13
cd js/src && autoconf-2.13 && cd ../..
./configure $CONFIGURE_ARGS 
make -C config
# get the latest en-US and unpack it
make -C browser/locales wget-en-US
make -C browser/locales unpack;

make -C nsprpub
make -C modules/libmar
### 6) Generate the xpi and the installers
cd browser/locales; make installers-$LOCALE LOCALE_MERGEDIR=$PWD/merged; cd -
### 7) Move the packages for the locale into the base directory
cd $BASE_DIR
# use $LOCALE rather than a hardcoded hy-AM
mv $BRANCH/dist/*$LOCALE* $BRANCH/dist/install/*xpi .



Monday, January 10, 2011

Reftests and xpcshell test suites run slow on Windows 7 machines

I previously blogged about xpcshell being extremely slow on Windows 7, and jimm was able to fix something which significantly reduced the time it took. In that post I compared two different operating systems, one on Mac minis and the other on a mix of VMs and fast IX hardware machines.

This time I am comparing Windows XP against Windows 7 running on the same base hardware (Mac minis - dual core 2.26GHz CPU). The comparison shows that the xpcshell and reftest suites run significantly slower on Windows 7 than on Windows XP.
Comparison of performance and test suites between Windows XP and Windows 7

If you believe you can give a hand, write a comment on bug 617503.

Link to raw data.



XP optimized unit tests enabled for Minefield

Last week we started to run unit tests on Windows XP for every Minefield build we generate.
This change excludes the branches mozilla-1.9.1 and mozilla-1.9.2 where we will still be running them on Win 2003 machines.

There are several permanent oranges that philor and dholbert have filed.
All of them affect only the reftest suite, so it is hidden until they have all been tackled.
Once everything is perma-orange-free we will discontinue running unit tests on Windows 2003 machines as it will improve wait times on the builder machines.

We can now see XP unit tests on tbpl.mozilla.org




If you find any new permanent oranges, please feel free to file them and add them under:
 Big thanks (again) to philor and dholbert for filing the known bugs.

 

