What do you do when you have two very different projects that both want to maintain their independence, but developers need to land coordinated changes across them?
a quick introduction
I’m currently working on a project that’s outside of my normal realm: “bi-directional” synchronisation between a Mercurial repository and one hosted on GitHub.
But before I dive too deep, there are some terms and processes that I should define.
mozilla-central: the large Mercurial repository that stores the majority of the Firefox source.
integration branches: called “branches” internally, these are actually separate Mercurial repositories. Reviewed code is landed onto an integration branch (historically mozilla-inbound), which then triggers Firefox’s test suite.
sheriffs: they look after the “tree” (the repository). One task they perform is monitoring the test results on the integration branches, and either merging into mozilla-central if the tests pass or backing out the commit if it causes failures.
servo, gecko, stylo, and quantum
Servo is a browser written in Rust under Mozilla’s research umbrella. It was designed from the ground up to take advantage of modern CPU and GPU architectures. Gecko is the core engine that drives Firefox, from parsing HTML to rendering the page. Stylo is a project to bring Servo’s modern CSS style engine into Firefox. Quantum is a large collection of work that aims to deliver significant changes to Firefox’s core engine (aka “the Firefox platform” or just “the platform”).
autolanders: these take code that has passed review and land it on the target repository. Servo’s autolander, “homu”, gates landings on a successful run of Servo’s tests (continuous integration, or CI), while Gecko’s autolander, “autoland”, just transplants code from the review to the integration repository.
Servo is developed on GitHub, while Gecko lives in a Mercurial repository. For a multitude of good reasons it was decided that neither moving Servo to Mercurial nor moving Gecko to GitHub would be viable. What we needed was a way for changes made on GitHub to be synchronised to Mercurial, and vice versa.
The core problem at hand is that it’s fairly common for Stylo patches to touch both Servo and Gecko at the same time. These changes need to land simultaneously on the integration branch in order for the Firefox tests to pass, and in the current environment this is not possible. Changes first need to land on Servo via a GitHub pull request; then the developer has to wait for Servo’s tests to run and for homu to land the code (~45 minutes), watch the integration branch for the Servo changes to be overlaid, ignore the resulting build/test failures, and finally push the Gecko changes to fix the code and (hopefully) return tests to a passing state.
Needless to say this is a sub-optimal experience for both developers and sheriffs.
in the beginning - unification
Long before I was involved the plan called for unification of the Servo and Firefox autolanders.
Large parts of homu would be rewritten into Gecko’s autoland, and changes to both GitHub and Mercurial would funnel through this central autolander. It would kick off tests for both Servo and Gecko, and would land on both repositories only if the tests on both passed.
This plan was ambitious and relied on Gecko’s test suite returning a boolean pass/fail result.
As it happens, both projects’ autolanders were developed rapidly and grew organically as demands grew. As a result, working on them can present issues, especially when implementing non-trivial changes. homu doesn’t have any real tests, which adds risk to any large-scale change. Meanwhile, autoland’s development environment can be a nightmare at times; look at it the wrong way and you might lose the next two hours of your day getting your environment up and running again.
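Reduced to its core, the unified plan is an all-or-nothing gate. The sketch below is a minimal illustration, assuming each test suite can report a single boolean verdict (the very assumption that turned out to be hard for Gecko); `SuiteResult` and `repos_to_land` are hypothetical names, not code from homu or autoland.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuiteResult:
    # Hypothetical summary of one CI run; None means "no definitive verdict".
    passed: Optional[bool]


def repos_to_land(servo: SuiteResult, gecko: SuiteResult) -> List[str]:
    """All-or-nothing gate: land on both repositories or on neither."""
    if servo.passed is None or gecko.passed is None:
        # Without a boolean verdict from both suites the gate cannot decide.
        raise ValueError("both test suites must report pass or fail")
    if servo.passed and gecko.passed:
        return ["servo", "gecko"]
    # A failure on either side blocks the landing everywhere.
    return []
```

The `ValueError` branch is the crux: if one suite can't give a clear pass or fail, the gate has nothing to act on.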
It became clear that somebody from outside of the autoland team was not going to be able to extend it to merge the two landers without heroic efforts.
gps and glob join the party
At the end of last year, gps and I were called in to provide whatever help we could to this project. gps is an expert in all things source-control related, and I was familiar with Gecko’s autoland. While gps dove straight into developing servo-vcs-sync (see below), I acted in a supporting role for the existing engineer; however, I wasn’t called upon much in that capacity.
My involvement really ramped up when gps took one and a half months of leave at the start of this year, and I was called in to cover for him as best as I could.
servo-vcs-sync is born
gps realised that implementing uni-directional synchronisation from GitHub to Mercurial would be an important first step, and would ease a lot of the issues the Stylo team were having with manually vendoring in the Servo code. He worked diligently on this, and servo-vcs-sync was deployed on the Friday before his long leave.
This system works well, but it wasn’t without some teething problems; in fact, it failed on its first weekend, generating ~10k error emails before the server was shut down (it turned out the failure retry interval had accidentally been set to 100ms instead of 5 minutes - ouch).
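Whichever direction it runs, the sync has to translate paths between the two repository layouts. The sketch below is a minimal illustration, assuming the whole servo/servo tree is vendored under Gecko’s servo/ directory (the directory autoland checks in the plan further down). `SERVO_PREFIX`, `servo_to_gecko` and `gecko_to_servo` are hypothetical helpers, not servo-vcs-sync’s actual code.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: the entire servo/servo repository lives under servo/ in Gecko.
SERVO_PREFIX = PurePosixPath("servo")


def servo_to_gecko(path: str) -> str:
    """Map a path in the servo/servo repository to its location in Gecko."""
    return str(SERVO_PREFIX / path)


def gecko_to_servo(path: str) -> Optional[str]:
    """Map a Gecko path back to servo/servo, or None if it isn't Servo code."""
    p = PurePosixPath(path)
    # Compare whole path components, not raw string prefixes.
    if p.parts[:1] != SERVO_PREFIX.parts:
        return None
    return str(p.relative_to(SERVO_PREFIX))
```

Comparing path components rather than string prefixes keeps a hypothetical top-level `servo-tests/` directory from being mistaken for Servo code.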
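That one configuration value makes an enormous difference. Here is a quick sketch of the arithmetic, assuming a fixed retry interval; `attempts_in` is a hypothetical helper, and the service’s real retry logic isn’t shown here.

```python
from datetime import timedelta


def attempts_in(window: timedelta, retry_interval: timedelta) -> int:
    """How many fixed-interval retries fit into `window`."""
    # timedelta // timedelta performs integer division and returns an int.
    return window // retry_interval


intended = timedelta(minutes=5)
actual = timedelta(milliseconds=100)

print(attempts_in(timedelta(hours=1), intended))  # 12
print(attempts_in(timedelta(hours=1), actual))    # 36000
```

At 100ms, 10,000 retries fit into under 17 minutes; at the intended 5 minutes, the same number would take roughly 35 days.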
In the time that gps was away I was able to resolve issues and add some missing functionality and it’s been stable for some time.
a change of ownership
At the same time it was realised that the existing engineer was unable to work on the project and ownership was transferred to me. Development had not progressed beyond the planning stage, which gave me the opportunity to really look at the plan with a fresh set of eyes.
It also became evident that a critical part of this project - the ability for Firefox’s test suite to return a boolean pass or fail - was much harder than expected, and would not be ready within a reasonable time-frame.
What was needed was a new plan.
a new plan, or two
Being brought into the project late was both good and bad. I had a lot to learn, but I also didn’t have any preconceived notions of how this should be done; I was able to focus on the end goal.
I set up meetings and spoke with anyone who would listen to gather advice. I drew diagrams, we discussed, I discarded and drew more diagrams. With every iteration, complexity was shaved off.
For example, early designs called for a new service to coordinate the two autolanders: each autolander would register its patches with the coordinator, which would gather the results and ensure patches landed in the correct order across both repositories.
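To make that discarded design concrete, the coordinator can be pictured as an ordering buffer: each autolander registers its half of a change, and nothing lands until the oldest change has a result from every repository. This is a minimal sketch of the idea only; the `Coordinator` class and its methods are hypothetical and were never part of either autolander.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class Coordinator:
    """Hypothetical coordinator that lands cross-repository changes in order."""

    def __init__(self) -> None:
        # change id -> {repository: result}; None means "still testing".
        # Insertion order doubles as the required landing order.
        self._pending: "OrderedDict[str, Dict[str, Optional[bool]]]" = OrderedDict()

    def register(self, change_id: str, repos: List[str]) -> None:
        self._pending[change_id] = {repo: None for repo in repos}

    def report(self, change_id: str, repo: str, passed: bool) -> None:
        self._pending[change_id][repo] = passed

    def ready_to_land(self) -> List[Tuple[str, str]]:
        """Pop (change, repo) pairs that may land now, preserving order."""
        landed: List[Tuple[str, str]] = []
        while self._pending:
            change_id, results = next(iter(self._pending.items()))
            if any(r is None for r in results.values()):
                break  # the oldest unfinished change blocks everything behind it
            del self._pending[change_id]
            if all(results.values()):
                landed.extend((change_id, repo) for repo in results)
            # A failure in any repository drops the change everywhere.
        return landed
```

Even this toy version has to track per-repository state and ordering, which hints at why removing the extra service was an attractive simplification.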
and here we are
After much discussion, especially around how best to manage backouts, we’ve settled on the following plan.
When a change is submitted for landing through autoland, the following will happen:
- The list of modified files is checked to determine whether any files in the servo/ directory are touched
- If so, autoland does not land the request itself; instead, it records the Gecko changes and sends a message to the servo-vcs-sync service
- servo-vcs-sync creates a pull request on the servo/servo GitHub repository
- Homu tests and lands the pull request as per the existing process
- servo-vcs-sync is triggered by the changes to the GitHub repository
- Any Gecko changes are mixed into the commit before it is pushed to the integration/autoland repository
If the change later needs to be backed out, the following will happen:
- A sheriff performs a normal backout on the integration/autoland repository
- The Mercurial server notifies servo-vcs-sync via Pulse
- servo-vcs-sync creates a pull request on the servo/servo GitHub repository with the highest priority
- Homu lands the PR before any other, including those in flight
- The change flows through to Gecko via the normal GitHub → Mercurial flow
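The landing and backout steps above can be sketched as a small routing layer in front of autoland. This is a minimal illustration under stated assumptions: `SyncMessage`, `AutolandRouter`, the `send` callback and the message fields are hypothetical stand-ins for autoland’s real request handling, the Pulse notifications and the GitHub API, none of which are modelled here. Changes are represented as plain `{gecko_path: patch}` dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SyncMessage:
    """Hypothetical message from autoland to servo-vcs-sync."""
    kind: str                 # "land" or "backout"
    request_id: str
    servo_files: Dict[str, str] = field(default_factory=dict)
    urgent: bool = False      # backouts jump ahead of queued pull requests


def touches_servo(path: str) -> bool:
    """True for paths inside Gecko's servo/ directory."""
    return path.startswith("servo/")


class AutolandRouter:
    """Hypothetical routing layer in front of autoland."""

    def __init__(self, send: Callable[[SyncMessage], None]) -> None:
        self.send = send
        # request id -> Gecko-side changes waiting for the Servo half.
        self.held: Dict[str, Dict[str, str]] = {}

    def land(self, request_id: str, files: Dict[str, str]) -> bool:
        """Return True if autoland should land the request directly."""
        servo = {p: d for p, d in files.items() if touches_servo(p)}
        if not servo:
            return True  # Gecko-only change: the existing autoland path
        self.held[request_id] = {p: d for p, d in files.items() if p not in servo}
        self.send(SyncMessage("land", request_id, servo))
        return False

    def on_servo_synced(self, request_id: str, synced: Dict[str, str]) -> Dict[str, str]:
        """Mix the held Gecko changes into the commit synced from GitHub."""
        combined = dict(synced)
        combined.update(self.held.pop(request_id, {}))
        return combined

    def on_backout(self, request_id: str) -> None:
        """A sheriff backed out the change; revert it on GitHub first."""
        self.send(SyncMessage("backout", request_id, urgent=True))
```

Holding the Gecko half until the Servo commit comes back through the GitHub → Mercurial sync is what allows both halves to reach the integration repository in a single push.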
The following diagrams illustrate the flow of data across the systems.
I’m currently working on implementing this, which thus far mostly involves shaving yaks.