We know that we need more slaves, but we also know that our masters are hitting edge cases and not behaving optimally. We now believe that bug 592244 is behind some chunk of the wasted CPU, since it causes some jobs to run twice. The problem is that we have several masters querying a scheduling master, and sometimes the same job ends up running on two different masters. catlee has done a great job chasing this down (it would have been hard for us to narrow down the issue without his help), and we hope that fixing it will significantly improve the wait times. If it does not help us enough to get by, we will have to go back and chase other edge cases in our masters. Meanwhile, IT and releng are still working on getting the next pool of test slaves.
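To make the double-run problem more concrete, here is a minimal sketch of the general kind of race described above. It is not the actual master code and not the fix from bug 592244; the `requests` table, the `claimed_by` column and the master names are all hypothetical, and an in-memory SQLite database stands in for the scheduling master. It contrasts a "check first, run later" approach, where two masters can both see the same job as unclaimed, with a single conditional update that only one master can win.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory DB holding one pending job (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, claimed_by TEXT)")
    conn.execute("INSERT INTO requests (id, claimed_by) VALUES (1, NULL)")
    conn.commit()
    return conn


def is_unclaimed(conn, request_id):
    """Racy check: only reads the state, it does not reserve the job."""
    row = conn.execute(
        "SELECT claimed_by FROM requests WHERE id = ?", (request_id,)
    ).fetchone()
    return row[0] is None


def claim_atomically(conn, request_id, master):
    """Check and claim in one statement; returns True only for the winner."""
    cur = conn.execute(
        "UPDATE requests SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (master, request_id),
    )
    conn.commit()
    return cur.rowcount == 1


# Racy version: both masters look before either one records a claim,
# so both decide to run the same job.
conn = make_db()
seen_a = is_unclaimed(conn, 1)
seen_b = is_unclaimed(conn, 1)
racy_runs = [m for m, seen in (("master-a", seen_a), ("master-b", seen_b)) if seen]

# Atomic version: the second master's UPDATE matches zero rows.
conn = make_db()
atomic_runs = [m for m in ("master-a", "master-b") if claim_atomically(conn, 1, m)]

print("racy:", racy_runs)
print("atomic:", atomic_runs)
```

In the racy version the job is picked up by both hypothetical masters, which is the wasted CPU we are talking about; in the atomic version only the first master gets it.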
And now back to the load (link to page with raw data):
- on the 11th we handled 138 pushes across all branches (the day before the aurora merge)
- the try server accounted for 47.5%, mozilla-central for 16.9% and cedar for 11.2% (/me looks at ehsan) of the whole load
- even though we had the trip to Las Vegas, the all-hands and platform's work week, we have had a very high load since we shipped Firefox 4
For the next post I should grab only the weekdays and overlay them to see how things look from week to week.
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.