Wednesday, June 25, 2008

SourceStamp in application.ini (it helps the L10n process)

As I mentioned in my previous blog post, L10n repackages happen by unpacking an en-US build and overwriting the en-US dtd and property files with the ones for that locale.

Part of this process requires checking out part of the en-US code, and the current problem is that we do NOT check out the same Source Stamp as the one used to generate the en-US build, since we do not really have a way to know which Source Stamp was used.

There is a bug which I started to work on last week; the patches would allow us to have the Source Stamp in application.ini when a MOZ_CO_DATE variable is set while checking out the source code. Once this works, it will be easy to know which Source Stamp was used for an en-US build, we will be able to check out exactly the same Source Stamp when doing L10n repackages, and we should not be out of sync at all.

NOTE: The patches mentioned above change nothing at all if no MOZ_CO_DATE is set.
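To make the idea concrete, here is a minimal sketch (in present-day Python, not the code of the actual patches) of how the repackaging automation could read the Source Stamp back out of application.ini; the "SourceStamp" key name and the timestamp format are my assumptions for illustration, not the final patch:

```python
from configparser import ConfigParser

# Hypothetical application.ini content once the patch lands; the
# SourceStamp key and its date format are assumptions of this sketch.
SAMPLE_INI = """\
[App]
Vendor=Mozilla
Name=Firefox
Version=3.0
BuildID=2008061004
SourceStamp=06/10/2008 04:23
"""

def read_sourcestamp(ini_text):
    """Return the Source Stamp recorded in application.ini, or None."""
    parser = ConfigParser()
    parser.read_string(ini_text)
    if parser.has_option('App', 'SourceStamp'):
        return parser.get('App', 'SourceStamp')
    return None

# The L10n repackaging step could then check out exactly this timestamp.
print(read_sourcestamp(SAMPLE_INI))  # 06/10/2008 04:23
```

With this in place the repackage no longer has to guess which code to check out; it reads the exact timestamp from the binary it just downloaded.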

The BuildID is not precise!! - part 1 (in relation to l10n/build)

What is the BuildID? The BuildID represents the date and hour a build was generated. It allows us to identify a build; think of it as a product's serial number, which lets people know in which factory and/or on what date it was produced.

When you select "About Firefox" you can see the BuildID in there:
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9) Gecko/2008061004 Firefox/3.0"

If you look at the number in bold (2008061004), you can tell that it represents a date (YYYYmmDD) plus two extra digits for the hour (HH) in which the build was generated.

This BuildID can also be found in the application.ini file that comes with your Firefox installation:
[App]
Vendor=Mozilla
Name=Firefox
Version=3.0
BuildID=2008061004


One of the problems here is that two different builds, with or without different source code, can end up with the same BuildID if they happen within the same hour. In the past we were not able to generate two builds in less than an hour, but nowadays we can, so we can have two different builds with the same BuildID.
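Both points — how the BuildID decodes into a date, and how two builds can collide — can be sketched in a few lines of Python (illustrative only; the build system computes the BuildID differently):

```python
from datetime import datetime

def parse_buildid(buildid):
    """Interpret a ten-digit YYYYmmDDHH BuildID as a date and hour."""
    return datetime.strptime(buildid, '%Y%m%d%H')

print(parse_buildid('2008061004'))  # 2008-06-10 04:00:00

# The precision problem: two builds started at 04:05 and 04:50 collapse
# into the very same BuildID.
a = datetime(2008, 6, 10, 4, 5).strftime('%Y%m%d%H')
b = datetime(2008, 6, 10, 4, 50).strftime('%Y%m%d%H')
print(a == b)  # True
```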

NOTE: There are two bugs in which the meaning and the precision of the BuildID are being discussed: bugs 431270 and 431905

L10n and the BuildID
We have lived many years with this and only this value. We have downloaded the binaries for years knowing only when a binary was created, but nothing at all about which Source Stamp, i.e. which files, were compiled into that build. We could only tell by digging through tinderbox logs and the like.

There are people who need to know which Source Stamp was used to generate a build, and one of them is the L10n community, whether you knew it or not.

When we generate the L10n repackages, we download an en-US build, check out the en-US code and do a configuration before running the repackage of the locale. Currently, we check out the latest en-US code BUT not the code that was used to generate that en-US build. This can mean several hours of difference in the Source Stamp, and therefore we generate locales that are NOT exactly the same as the en-US build.

NOTE: This does not happen on releases, since the code for en-US is frozen, but it does happen for nightly L10n repackages

Axel showed me a solution which reduces the gap between "the en-US Source Stamp used to generate the en-US build" and "the en-US Source Stamp used to generate an L10n repackage". The solution estimates, from the BuildID, what the Source Stamp for that build could have been by saying "let's assume it was checked out at the beginning of the hour (minute zero)". This is better than what we have now, since for a BuildID whose hour is HH the actual checkout could be anywhere from HH:00 to HH:59.
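Axel's minute-zero approximation can be sketched like this (a hypothetical helper, not his actual script; the date format I feed to CVS is also an assumption):

```python
from datetime import datetime

def estimate_checkout_date(buildid):
    """Estimate the en-US checkout time from its BuildID by assuming the
    checkout happened at minute zero of the hour (Axel's approximation).
    The real checkout could have been anywhere from HH:00 to HH:59, so
    this bounds the error at one hour instead of several."""
    stamp = datetime.strptime(buildid, '%Y%m%d%H')
    # A date string we could feed to something like "cvs co -D"; the
    # exact format CVS accepts is an assumption of this sketch.
    return stamp.strftime('%Y-%m-%d %H:00')

print(estimate_checkout_date('2008061004'))  # 2008-06-10 04:00
```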

We need to fix this; in the follow-up blog post I will mention what I have been working on, and as you might already guess it has to do with adding the Source Stamp to a binary.

Monday, June 23, 2008

l10n repackaging - part 5 - introduction

If this is the first time you are reading one of my blog posts, you will be able to hear me at this Monday's meeting talking briefly about the project I am working on, which changes the way we do l10n repackages.

Introduction
Localizations are generated by: 1) downloading an en-US binary, 2) unpacking it and 3) repacking it, replacing certain files with the ones from the specific locale.
We currently do repackages on dedicated machines; we are trying to move them over to the buildbot infrastructure to make the process scalable and run it on our generic buildslaves.

Key concepts for the next weeks
I have done several blog posts over the past weeks (1, 2, 3, 4), so here I will just cover where my attention is right now:

Nightly repackages
In this scenario we have an en-US build being generated at a certain hour of the night, checking out a specific source stamp; after it finishes, we can continue by checking out that same source stamp to do the l10n repackages. For now, I want this code in our staging configuration, since it involves 2 schedulers (Nightly and DependentL10n) and it can be done with one machine until we find the proper way to pass the source stamp from one scheduler to the next.

Dependent repackages
We generate dependent en-US builds continually, and it can take as little as 6-8 minutes to generate one; therefore, we do not really have enough time to keep up with that rhythm with l10n repackages (on my local machine it takes around 1-2 minutes to get ready and 30 seconds to repackage each locale; with 50 locales that is about 25 minutes in total) unless we had 4-6 slaves for l10n repackages, and that is just thinking of Linux or Mac machines; the Windows slaves take much longer.
I am thinking that we would want to do l10n repackages every hour; the source stamp would most likely have to be different from that of the dependent en-US build we download (unless we have the en-US source stamp somewhere in application.ini), but for now it should do.
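The back-of-the-envelope numbers above work out roughly like this (a sketch using the timings I measured on my own machine, so treat them as estimates, not guarantees):

```python
# Rough throughput estimate for l10n repackages, using the timings
# mentioned above: ~2 minutes of setup plus ~30 seconds per locale.
setup_minutes = 2.0
per_locale_minutes = 0.5
locales = 50

one_slave = setup_minutes + per_locale_minutes * locales
print('1 slave: %.0f minutes' % one_slave)  # 27 minutes

# Splitting the locales across N slaves (each still pays the setup cost)
# shows why 4-6 slaves would be needed to approach the 6-8 minute rhythm
# of the dependent en-US builds:
for slaves in (2, 4, 6):
    wall = setup_minutes + per_locale_minutes * locales / slaves
    print('%d slaves: ~%.1f minutes' % (slaves, wall))
```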

Release repackages
In our release automation system we already do l10n repackages on the generic slaves, but it still uses the code from the tinderbox client, which is not scalable.
Changes here will require checking out the same tag as the en-US build does.

Bootstrap configuration
There is a lot of configuration in our release automation code that is read from "bootstrap configuration" files, which contain information such as which tag to check out, where to upload the builds, which tree to report to on tinderbox, etc. This information will be needed when doing l10n repackages on buildbot. The best solution is to have a factory (or a step) parse this information and add it as properties to the build object that gets passed from step to step on buildbot, so they can be read at the proper time.
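The parsing half of that idea could look something like this sketch (the keys shown are invented examples, not the real bootstrap files, and the real step would attach the dict to the build object as properties):

```python
# Hypothetical bootstrap-style configuration; illustrative keys only.
SAMPLE_BOOTSTRAP = """\
# bootstrap configuration
cvsTag = FIREFOX_3_0_RELEASE
stagingServer = stage.mozilla.org
tinderboxTree = Mozilla-l10n
"""

def parse_bootstrap(text):
    """Parse simple 'key = value' lines into a dict, skipping comments
    and blank lines; a custom buildbot step could then expose each pair
    as a build property."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        props[key.strip()] = value.strip()
    return props

props = parse_bootstrap(SAMPLE_BOOTSTRAP)
print(props['cvsTag'])  # FIREFOX_3_0_RELEASE
```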

Major problems

  1. Make sure that multiple slaves run the preparatory steps. Currently, the only way I have found is to have one builder per slave under a Dependent scheduler to make sure that all slaves run these steps, but this has a problem: if a slave disconnects (rather than fails) it will not trigger the next DependentL10n scheduler
  2. Pass build properties from scheduler to scheduler, which I thought we would not need, but it seems we do. I might be able to "cheat" by passing the checkout time in the application.ini file or by passing text files with the desired information to the slaves

Tuesday, June 10, 2008

l10n repackaging - part 4 - packager.mk and rewriting goals

Accomplished

  • I have finished everything required to generate a locale's dmg file and its associated xpi (an add-on to change the language of your browser) under a buildbot configuration
  • I have filed a bug: "Bug 438240 - packager.mk does not mount image automatically"
    • This line used to be "good enough" to answer in an automated way to the question: "do you want to mount this image?"
      echo Y | hdiutil attach -readonly -mountroot /tmp -private -noautoopen $(UNPACKAGE) > hdi.output;
      but it is not good enough, at least for Leopard and, according to Axel, for Tiger (we have decided to open a zoo instead of continuing with our l10n work at Mozilla ;)
    • "My fix" (which is based in the scripts from Axel and Alice) is an expect script which feeds answers to an spawned process (hdiutil) which in reality would have expected a human to reply to the questions
    • I have added a 2-line patch to fix this, but I am in no rush to get it into the tree
  • I have tested something John wanted me to try and found that it did not work as he wanted.
    • I started my buildbot configuration with 3 slaves at the same time
    • The 1st slave picks "af" as the locale to work on
    • The 2nd slave picks "ar" as the locale to work on
    • The 3rd slave picks "be" as the locale to work on
    • I killed the 2nd slave -- therefore "ar" was not completed
    • When the 1st slave was done it picked up "bg" as the next locale to work on instead of the unfinished "ar"
    • As you can see the locale "ar" would have never been processed in this run
  • John says that he worked with buildbot at some point and saw that unfinished BUILDS were returned to the queue and taken by the next available slave. Since my Build and PeriodicL10n classes are "custom-made", there might be some features missing
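The behaviour from the experiment above can be reduced to a toy sketch (hypothetical code, not my actual PeriodicL10n classes): a shared locale queue where a slave that dies mid-job never returns its locale to the queue.

```python
from collections import deque

# The locales waiting to be repackaged, in pick-up order.
queue = deque(['af', 'ar', 'be', 'bg'])
completed = []

def take(slave_alive=True):
    """A slave takes the next locale; if the slave dies mid-job, the
    locale is neither completed nor put back in the queue."""
    locale = queue.popleft()
    if slave_alive:
        completed.append(locale)
    return locale

take()                    # slave 1 finishes 'af'
take(slave_alive=False)   # slave 2 dies while working on 'ar'
take()                    # slave 3 finishes 'be'
take()                    # slave 1 picks 'bg', not the unfinished 'ar'
print(completed)  # ['af', 'be', 'bg'] -- 'ar' was silently dropped
```

The fix buildbot's stock classes apparently have, and mine is missing, is to re-queue a build request when its slave disconnects.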

Rewriting tasks/goals

Here is the list of problems/tasks/goals that I have, the type of problem each one is and the priority it should be given:
  • P1-DEFECT - Use buildbot sendchange to FORCE the processing of a single locale
  • P2-DEFECT - Use same timestamp as en-US
  • P3-DEFECT - If a locale is in process and the process does not complete, it should be put back into the queue so it can be reassigned
  • P3-CONFIG - Trigger l10n repackages after a successful en-US build
  • P3-PERF - How to deal with common steps? They could be executed only once before processing the first locale
  • P3-PERF - When to clobber? There are various scenarios to consider
  • P4-CONFIG - A lot of small steps should be unified under a module in buildbot custom
  • P4-CONFIG - Add configuration to staging 1.8
  • P4-CONFIG - WGET - what is the exact URL of the latest en-US to get?
  • P4-CONFIG - compare-locales.py
  • P4-CONFIG - l10n verify - I have seen that being run in production, more research to be done
  • P4-CONFIG - Push && Announce - After each locale is processed, it has to be pushed to an FTP server and announced via email or other means

What do these priorities mean? I see 2 types of problems here.
  1. Problems that require research, time and a lot of thinking, since they are not implemented anywhere else OR I do not know how to do them
  2. Problems that require less brain power; it is just a matter of putting together pieces of concepts that are already used somewhere else

The first type of problems are the ones with priority 1 or 2, and they get more of my attention and dedication. The others could easily be solved by anyone at Mozilla without too much effort.

There are more problems to be solved in l10n-build, but this list narrows it down to what is required to move from the tinderbox infrastructure to buildbot.

Any feedback and questions are welcome.

Monday, June 09, 2008

l10n repackaging - part 3 (it feels so goooood to have a solution)

This past week I started to work on the code (highly inspired by Axel's work on the l10n build processes) that will allow us to distribute l10n repackages between slaves so we can repackage all the locales we have.

In my previous blog post I was worried about how to get the master to have the latest information on all the locales we have without a) having to restart the buildbot master to grab the latest list of locales and b) doing extremely hacky things with buildbot.

I wanted to try a couple of things by making the slaves do some work, as you can see in this quote from the last post:
How can we change this?
  • An initial slave could do some "thinking" and notify "someone" (an object) in the master which locales are to be repackaged
  • An initial slave checks out a file, uploads it to the master and somehow notifies the master to reconfigure itself
BUT then I realized that I should go to the moment when all the build requests are generated and, just before that, get the latest all-locales file from the repository.

The "good-enough" solution

    def getLocales(self):
        """It checks out the all-locales file and returns the list of
        locales in it. Thanks to bsmedberg for this one."""
        output = subprocess.Popen(
            ['cvs', '-d:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot',
             'co', '-p', 'mozilla/browser/locales/all-locales'],
            stdout=subprocess.PIPE).communicate()
        locales = output[0].split('\n')
        if locales[-1] == '':
            locales = locales[0:-1]  # get rid of the trailing empty entry
        return locales

    def doPeriodicBuild(self):
        if self.locales:
            locales = self.locales
        else:
            # We get the latest list of locales
            locales = self.getLocales()

        # We create a non-merge-able build request per locale in the list
        for locale in locales:
            obj = PeriodicL10n.BuildDesc(locale)
            self.queue.insert(0, obj)

        bs = buildset.BuildSet(self.builderNames,
                               # SourceStamp(branch=self.branch),
                               PeriodicL10n.NoMergeStamp(branch=self.branch),
                               self.reason)
        self.submit(bs)

What do we have solved so far

  • By using NoMergeStamp, we have build requests that do not get merged by buildbot
  • The function getLocales() will always fetch the full, up-to-date all-locales list without doing anything hacky with buildbot, so we always generate the right number of build requests
  • In PeriodicL10n.BuildDesc(locale) we generate objects that contain a "locale" property which later gets passed to a Build object; therefore we can use WithProperties("l10n/%(locale)s"), a class that generates a string with values extracted from the current build object of a step. For example:
    l10n_depBuildFactory.addStep(ShellCommand(
        command=["cvs", "-q", "-z3", "-d", cvsl10nroot, "co",
                 WithProperties("l10n/%(locale)s")]))
  • We have a queue of Builds that are taken whenever a slave is available; therefore, the more slaves we have ---> the less time the whole process takes

Great relief

I was really frustrated just before I reached this solution, because I did not want to spend what was going to be a lot of trial and error on the different options, which could have led to very complicated solutions or dead ends.
I am glad that I got rid of what was for me the biggest bug of my project; now I can dedicate myself to putting all the pieces of my research into a bunch of buildbot steps.

What is to come ...


I still haven't received any feedback, but that is fine because I still have to continue working on what the l10n repackage of a single locale involves, which I will describe in a later post. For now, let me list the things left to be done that are on my mind:
  1. Write and test the set of steps that generate a single locale (I am halfway through)
  2. Research what the push and announce steps do (this needs further explanation)
  3. There are steps common to all locales that a slave has to do. This means that a slave might be repeating the same task from one locale to the next. It might be interesting to keep some of the work done for a previous locale for the current one: 1) checkout of Firefox's code, 2) the configure step, 3) download of the latest en-US and 4) partial compilation of some objects
  4. It is really important to check out the same timestamp of Firefox's code for all locales. I have an eye on "Triggering Schedulers" and the last comment on it: "This is useful to ensure that all of the builds use exactly the same SourceStamp, even if other Changes have occurred while the build was running."

NOTE: You can see an image on the right side that shows which step I am at right now: "make installers" of a single locale. Soon we should see it green as well.

Tuesday, June 03, 2008

l10n repackaging - part 2

After 5 weeks into my internship and 2 weeks after my meeting with Axel Hecht, I have started to get some results that help us understand the problem I am facing.

This is the bug I am working on: "Bug 434878 - change how automation does l10n repacks", and here is the tarball (you must have buildbot and other tools installed - here is a link with a script to help you out) containing a script that starts everything for you so you can see it running.

NOTE: I am reducing each section to focus on one part of the larger problem; the rest will be covered in other blog posts

Main goal
  • Change how l10n (localization) repackages happen
Reasons
  • Not scalable. Currently we have to do ALL repackages even if we just want one
  • Code difficult to follow
  • Not easy to do big changes on how things get done
Ideas
  • Use buildbot to help us
  • Parallelize the repackages. This will especially help improve the process on Windows machines, since it takes forever and a little more to finish the whole chunk of repackages
Obstacles

"Hello, I am a slave. What do you want me to do?"
Buildbot has a master who tells the slaves what to do, but the issue is that the master gets configured at startup. Therefore, for now:
  • I have the list of locales hard-coded in the master's configuration file (this is not 100% true, but let's pretend it is)
Having the list of locales hard-coded means that I have to modify the configuration file and reconfigure the master, which is not a good idea, since other builds can be happening and, more importantly, it would not be automated: it would depend on a human logging in to the master and reconfiguring it.

How can we change this?
  • An initial slave could do some "thinking" and notify "someone" (an object) in the master which locales to be repackaged
  • An initial slave checks out a file, uploads it to the master and somehow notifies the master to reconfigure itself
We will try to discard some options, find new ones and even try the "hacky" ones.
As Axel comments in the bug, it is not appropriate for a slave to be saying what has to be built and what not.
This is the point where Axel and others have tried to bring a solution, but the technology (buildbot) reaches its boundaries. Unless we have completely overlooked a feature, we will have to come up with a workaround.

armenzg: Mozilla internship (50+ kms on a broken bicycle)

You know how lazy I am so I will be reusing my friend's blog post:
- http://backinblakk.blogspot.com/2008/06/32-mile-ride.html

Yes, I did it, and I have to thank Lukas for joining me on this great adventure

Aside from that, I spent last weekend in L.A., which was an amazing time at my uncle's home, meeting all the people from the church in L.A.

I did various things, from seeing snow and driving go-karts to watching a play and a movie, and I had an amazing "futbol" game (in which I scored a goal, like my friend Hacheek!).

"oh yeah baby, I can't wait to see SATC; I have been waiting all year to watch it"