Building packages at scale
tl;dr: We are able to build 14,000 packages across 6 zones in 4.5 hours.
At Joyent we have long had a focus on high performance, whether it’s through innovations in SmartOS, carefully selecting our hardware, or providing customers with tools such as DTrace to identify bottlenecks in their application stacks.
When it comes to building packages for SmartOS it is no different. We want to build them as quickly as possible, using the fewest resources, but without sacrificing quality or consistency.
To give you an idea of how many packages we build, here are the numbers:
| Branch | Arch | Success | Fail | Total |
|--------|------|---------|------|-------|
| 2012Q4 | i386 | 2,245 | 20 | 2,265 |
| 2012Q4 | x86_64 | 2,244 | 18 | 2,262 |
| 2013Q1 | i386 | 2,303 | 40 | 2,343 |
| 2013Q1 | x86_64 | 2,302 | 39 | 2,341 |
| 2013Q2 | i386 | 10,479 | 1,277 | 11,756 |
| 2013Q2 | x86_64 | 10,290 | 1,272 | 11,562 |
| 2013Q3 | i386 | 11,286 | 1,317 | 12,603 |
| 2013Q3 | x86_64 | 11,203 | 1,308 | 12,511 |
| 2013Q4 | i386 | 11,572 | 1,277 | 12,849 |
| 2013Q4 | x86_64 | 11,498 | 1,270 | 12,786 |
| 2014Q1 | i386 | 12,450 | 1,171 | 13,621 |
| 2014Q1 | x86_64 | 12,356 | 1,150 | 13,506 |
| 2014Q2 | i386 | 13,132 | 1,252 | 14,384 |
| 2014Q2 | x86_64 | 13,102 | 1,231 | 14,333 |
| Total | | | | 139,122 |
Now of course we don’t continuously attempt to build 139,122 packages. However, when something like Heartbleed happens, we backport the fix to all of these branches, and a rebuild of something as heavily depended upon as OpenSSL can cause around 100,000 packages to be rebuilt.
Each quarter we add another release branch to our builds, and as you can see from the numbers above (2013Q1 and earlier were limited builds) the total number of packages in pkgsrc grows with each release.
Recently I’ve been focussing on improving the bulk build performance, both to ensure that fixes such as Heartbleed are delivered as quickly as possible, and also to ensure we aren’t wasteful in our resource usage as our package count grows. All of our builds happen in the Joyent public cloud, so any resources we are using are taking away from the available pool to sell to customers.
Let’s first take a walk through pkgsrc bulk build history, and then look at some of the performance wins I’ve been working on.
pkgsrc bulk builds, 2004
The oldest bulk build of mine that I can find is this one. My memory is a little fuzzy on what hardware I was using at the time, but I believe it was a SunFire V120 (1 x UltraSPARC IIi CPU @ 650MHz) with 2GB RAM. This particular build was on Solaris 8.
As you can see from the results page, it took 13.5 days to build 1,810 (and attempt but fail to build 1,128) packages!
Back then the build would have been single threaded, with only one package being built at a time. There was no support for concurrent builds, `make -j` wouldn’t have helped much, and essentially you just needed to be very patient.
May 2004: 2,938 packages in 13.5 days
pkgsrc bulk builds, 2010
Fast forward 6 years. At this point I’m building on much faster x86-based hardware (a Q9550 Core2Quad @ 2.83GHz and 16G RAM) running Solaris 10, however the builds are still single threaded and take 4 days to build 5,524 (and attempt but fail to build 1,325) packages.
All of the speed increase is coming directly from faster hardware.
May 2010: 6,849 packages in 4 days
pkgsrc @ Joyent, 2012 onwards
Shortly after joining Joyent, I started setting up our bulk build infrastructure. The first official build from this was for general illumos use. We were able to provide over 9,000 binary packages which took around 7 days to build.
At this point we’re starting to see the introduction of very large packages such as qt4, kde4, webkit, etc. These packages take a significant amount of time to build, so even though we are building on faster hardware than before, the combination of an increased package count and longer individual package build times means we’re not seeing a reduction in total build time.
July 2012: 10,554 packages in 7 days
Performance improvements
At this point we start to look at ways of speeding up the builds themselves. As we have the ability to create build zones as required, the first step was to introduce distributed builds.
pbulk distributed builds
During the 2007 Google Summer of Code, Jörg Sonnenberger wrote pbulk for pkgsrc, a replacement for the older bulk build infrastructure that had served us well since 2004 but had started to show its age. One of the primary benefits of pbulk was that it supported a client/server setup to distribute builds, and so I worked on building across 6 separate zones. From my work log:
September 2012: 10,634 packages in 2 days
Distributed chrooted builds
By far the biggest win so far came in June 2013; however, I’m somewhat ashamed that it took me so long to think of it. By this time we were already using chroots for builds, as it ensures a clean and consistent build environment, keeps the host zone clean, and also allowed us to perform concurrent branch builds (e.g. building i386 and x86_64 packages simultaneously on the same host but in separate chroots).
What it took me 9 months to realise, however, was that we could simply use multiple chroots for each branch build! This snippet from my log is enlightening:
Not bad indeed. The comment is somewhat misleading, though, as this comment I made on IRC on June 15th alludes to:
> Multiple distributed chroots get us an awfully long way, but now we’re stuck with big packages which ruin our total build times, and no amount of additional zones or chroots will help.
However, we are now under 24 hours for a full build for the first time. This is of massive benefit, as we can now do regular daily builds.
June 2013: 11,372 packages in 18 hours
make -j vs number of chroots
An ongoing effort has been to optimise the `MAKE_JOBS` setting used for each package build, balanced against the number of concurrent chroots. There are a number of factors to consider:
- The vast majority of `./configure` scripts are single threaded, so generally you should trade extra chroots for a lower `MAKE_JOBS`.
- The same goes for other phases of the package build (fetch, checksum, extract, patch, install, package).
- Packages which are highly depended upon (e.g. GCC, Perl, OpenSSL) should have a high `MAKE_JOBS`, as even with a large number of chroots enabled, most of them will be idle waiting for those builds to complete.
- Packages built towards the end of a bulk build run (e.g. KDE, Firefox) tend to be large builds. Similar to above, as they are later in the build there will be fewer chroots active, so a higher `MAKE_JOBS` can be afforded.
Large packages like webkit will happily burn as many cores as you give them and reward you with faster build times; however, giving them 24 dedicated cores isn’t cost-effective. Our 6 build zones are sized at 16 cores / 16GB DRAM, and so far the sweet spot seems to be:
- 8 chroots per build (bumped to 16 if the build is performed whilst no other builds are happening).
- Default `MAKE_JOBS=2`.
- `MAKE_JOBS=4` for packages which don’t have many dependents but are generally large builds which benefit from additional parallelism.
- `MAKE_JOBS=6` for webkit.
- `MAKE_JOBS=8` for highly depended-upon packages which stall the build, and/or are built right at the end.
The `MAKE_JOBS` value is determined based on the current `PKGPATH` and is dynamically generated so we can easily test new hypotheses.
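As a rough sketch of the approach (the package lists and values here are illustrative, not our actual tuning), a small script can map `PKGPATH` to a `MAKE_JOBS` value for the chroot’s configuration:

```sh
#!/bin/sh
# Hypothetical sketch: choose a MAKE_JOBS value based on PKGPATH.
# The categories and packages listed are examples only.
PKGPATH=$1

case "${PKGPATH}" in
lang/gcc*|lang/perl5|security/openssl)  jobs=8 ;;   # heavily depended upon
www/webkit*)                            jobs=6 ;;
x11/qt4-libs|x11/kde-workspace4)        jobs=4 ;;   # large, few dependents
*)                                      jobs=2 ;;   # default
esac

echo "MAKE_JOBS=${jobs}"
```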
With various tweaks in place, fixes to packages etc., we were running steady at around 12 hrs for a full build.
August 2014: 14,017 packages in 12 hours
cwrappers
There are a number of unique technologies in pkgsrc that have been incredibly useful over the years. Probably the most useful has been the wrappers in our buildlink framework, which allows compiler and linker commands to be analysed and modified before being passed to the real tool. For example:
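As a simplified illustration (the paths are hypothetical), the `cc` wrapper can rewrite references to the installed prefix so they point at the package’s private buildlink tree before the real compiler ever sees them:

```sh
# What the package's build system runs (hypothetical flags and paths):
cc -O2 -I/opt/local/include -L/opt/local/lib -o foo foo.c -lpng

# Roughly what the wrapper hands to the real compiler after analysing and
# rewriting the arguments:
cc -O2 -I${WRKDIR}/.buildlink/include -L${WRKDIR}/.buildlink/lib \
    -Wl,-R/opt/local/lib -o foo foo.c -lpng
```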
There are a number of other features of the wrapper framework; however, it doesn’t come without cost. The wrappers are written in shell, and fork a large number of `sed` and other commands to perform replacements. On platforms with an expensive `fork()` implementation this can have quite a detrimental effect on performance.
Jörg again was heavily involved in a fix for this, with his work on cwrappers, which replaced the shell scripts with C implementations. Despite the work being 99% complete, the final effort to get it over the line and integrated into pkgsrc hadn’t been finished, so in September 2014 I took on the task, and the sample package results speak for themselves:
| Package | Legacy wrappers | C wrappers | Speedup |
|---------|-----------------|------------|---------|
| wireshark | 3,376 seconds | 1,098 seconds | 3.07x |
| webkit1-gtk | 11,684 seconds | 4,622 seconds | 2.52x |
| qt4-libs | 11,866 seconds | 5,134 seconds | 2.31x |
| xulrunner24 | 10,574 seconds | 5,058 seconds | 2.09x |
| ghc6 | 2,026 seconds | 1,328 seconds | 1.52x |
As well as reducing the overall build time, the significant reduction in number of forks meant the system time was a lot lower, allowing us to increase the number of build chroots. The end result was a reduction of over 50% in overall build time!
The work is still ongoing to integrate this into pkgsrc, and we hope to have it done for pkgsrc-2014Q4.
September 2014: 14,011 packages in 5 hours 20 minutes
Miscellaneous fork improvements
Prior to working on cwrappers I was looking at other ways to reduce the number of forks, using DTrace to monitor each internal pkgsrc phase. For example, the bmake `wrapper` phase generates a shadow tree of symlinks, and in packages with a large number of dependencies this was taking a long time.
Running DTrace to count execs by execname showed exactly where the forks were going.
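A generic way to gather those per-command counts (not necessarily the exact invocation used here) is a one-liner such as:

```sh
# Count successful execs by command name, system-wide, until Ctrl-C.
dtrace -n 'proc:::exec-success { @[execname] = count(); }'
```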
Looking through the code showed a number of ways to reduce the large number of forks happening here.
cat -> echo
`cat` was being used to generate a `sed` script, with sections such as:
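A hedged sketch of the pattern (the file name and sed expression are illustrative):

```sh
# Each iteration forks a cat process just to append one line to the sed script.
cat >>${wrapper_sedfile} <<EOF
s|-L${prefix}/lib|-L${buildlink_dir}/lib|g
EOF
```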
There’s no need to fork here, we can just use the builtin `echo` command instead:
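The equivalent using the shell builtin, which costs no fork (again a sketch, reusing the hypothetical names above):

```sh
# echo is a shell builtin, so appending the same line happens entirely within
# the existing shell process.
echo "s|-L${prefix}/lib|-L${buildlink_dir}/lib|g" >>${wrapper_sedfile}
```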
Use shell substitution where possible
The `dirname` commands were being run on full paths to files, and in this case we can simply use POSIX shell substitution instead:
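A minimal sketch of the change, assuming `$file` holds a full path:

```sh
# Before: every call forks an external dirname process.
dir=`dirname "${file}"`

# After: POSIX parameter expansion strips the final path component in-shell.
dir=${file%/*}
```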
Again, this saves a fork each time. This substitution isn’t always possible, for example if you have trailing slashes, but in our case we were sure that `$file` was correctly formed.
Test before exec
The `rm` commands were being unconditionally executed in a loop:
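A hedged sketch of the sort of loop involved (variable names are mine):

```sh
# One rm fork per file, whether or not the file actually exists.
for f in ${stale_files}; do
        rm -f "${f}"
done
```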
This is an expensive operation when you are running it on thousands of files each time, so simply test for the file first using a cheap (and builtin) `stat(2)`-based test, instead of forking an `rm` to perform an `unlink(2)` in the majority of cases where there is nothing to remove.
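The same loop with the test added (still a sketch):

```sh
# [ -f ] is a shell builtin (a single stat(2) call); rm is only forked when
# there is actually a file to remove.
for f in ${stale_files}; do
        [ -f "${f}" ] && rm -f "${f}"
done
```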
At this point we had removed most of the forks, which DTrace confirmed. The result was a big improvement: not only are we reducing the overall runtime, but the system time is significantly lower, improving overall throughput and reducing contention on the build zones.
Batch up commands
The most recent changes I’ve been working on have been to further reduce forks, both by caching results and batching up commands where possible. Taking the previous example again, the `ln` commands are a result of a loop similar to:
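Something along these lines (a sketch; the real code takes its source/target pairs from the wrapper framework):

```sh
# One mkdir and one ln fork for every single symlink to be created.
while read src dst; do
        mkdir -p "${dst%/*}"
        ln -fs "${src}" "${dst}"
done <${link_list}
```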
Initially I didn’t see any way to optimise this, but upon reading the `ln` manpage I observed the second form of the command, which allows you to symlink multiple files into a directory at once, for example:
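For example (file names are illustrative):

```sh
# Second form of ln(1): create links to several source files inside a single
# target directory with one invocation.
ln -fs ../lib/libfoo.so ../lib/libbar.so ../lib/libbaz.so ${linkdir}/
```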
As it happens, this is ideally suited to our task, as `$src` and `$dst` will for the most part have the same basename.
Writing some awk allows us to batch up the commands and do something like this:
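A simplified sketch of the idea (not the exact pkgsrc code): read the source/target pairs, group them by target directory, then emit one `mkdir -p` for all directories and one `ln -fs` per directory:

```sh
awk '
        {
                # $1 = link source, $2 = link target; group sources by the
                # directory part of the target.
                dir = $2
                sub(/\/[^\/]*$/, "", dir)
                dirs[dir] = 1
                links[dir] = links[dir] " " $1
        }
        END {
                printf "mkdir -p"
                for (d in dirs)
                        printf " %s", d
                printf "\n"
                for (d in dirs)
                        printf "ln -fs%s %s\n", links[d], d
        }
' <${link_list} | sh
```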
There’s an additional optimisation here too - we keep track of all the directories we need to create, and then batch them up into a single `mkdir -p` command.
Whilst this adds a considerable amount of code to what was originally a simple loop, the results are certainly worth it. The time for `bmake wrapper` in kde-workspace4, which has a large number of dependencies (and therefore symlinks required), reduces from 2m11s to just 19 seconds.
Batching wrapper creation: 7x speedup
Cache results
One of the biggest recent wins was in a piece of code which checks each ELF binary’s `DT_NEEDED` and `DT_RPATH` to ensure they are correct and that we have recorded the correct dependencies. Written in awk, there were a couple of locations where it forked a shell to run commands:
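A hedged illustration of the pattern (the function and command are mine, not the actual check): every call runs an external command via awk’s `cmd | getline`, which forks a shell each time.

```sh
awk '
        # Look up which package owns a file by forking pkg_info on every call.
        function owning_pkg(file,    cmd, pkg) {
                cmd = "pkg_info -Fe " file
                cmd | getline pkg
                close(cmd)
                return pkg
        }
        { print $1, owning_pkg($1) }
' filelist.txt
```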
These were in functions that were called repeatedly for each file we were checking, and in a large package there may be lots of binaries and libraries which need checking. By caching the results like this:
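The cached version of the same sketch: each distinct query forks once, and repeat lookups are served from an awk array.

```sh
awk '
        # Remember the result of each forked command; repeat lookups for the
        # same file cost nothing.
        function owning_pkg(file,    cmd) {
                if (!(file in pkg_cache)) {
                        cmd = "pkg_info -Fe " file
                        cmd | getline pkg_cache[file]
                        close(cmd)
                }
                return pkg_cache[file]
        }
        { print $1, owning_pkg($1) }
' filelist.txt
```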
This simple change made a massive difference! The kde-workspace4 package includes a large number of files to be checked, and the check ran roughly 25x faster as a result.
Cache awk system() results: 25x speedup
Avoid unnecessary tests
The biggest win so far though was also the simplest. One of the pkgsrc tests checks all files in a newly-created package for any `#!` paths which point to non-existent interpreters. However, do we really need to test all files? Some packages have thousands of files, and in my opinion, there’s no need to check files which are not executable.
We went from checking every file in the package to checking only those with an execute bit set.
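A minimal sketch of the idea (variable names are mine, not the actual pkgsrc check):

```sh
# Only files with an execute bit need their "#!" interpreter verified.
find "${destdir}" -type f | while read -r f; do
        [ -x "${f}" ] || continue
        interp=$(sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "${f}")
        if [ -n "${interp}" ] && [ ! -x "${interp}" ]; then
                echo "ERROR: ${f}: interpreter ${interp} does not exist"
        fi
done
```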
Again DTrace helped in identifying the hot path (30,000+ `sed` calls in this case) and narrowing down where to concentrate efforts.
Only test shebang in executable files: ~50x speedup
Miscellaneous system improvements
Finally, there have been some other general improvements I’ve implemented over the past few months.
bash -> dash
`dash` is renowned as being a leaner, faster shell than `bash`, and I’ve certainly observed this when switching to it as the default `$SHELL` in builds. The normal concern is that there may be non-POSIX shell constructs in use, e.g. brace expansion, but I’ve observed relatively few of these, with the results being (prior to some of the other performance changes going in):
| Shell | Successful packages | Average total build time |
|-------|---------------------|--------------------------|
| bash | 13,050 | 5hr 25m |
| dash | 13,020 | 5hr 10m |
It’s likely that with a small amount of work fixing non-portable constructs we can bring the package count for `dash` up to the same level. Note that the slightly reduced package count does not explain the reduced build time, as those failed packages would have had enough time to complete before the other, larger builds we’re waiting on finished anyway.
Fix libtool to use printf builtin
libtool has a build-time test to see which command it should call for advanced printing:
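The check is roughly of this shape (paraphrased from memory, not the exact libtool code): prefer `print -r --` if the shell or `PATH` provides a working `print`, otherwise fall back to `printf`:

```sh
# Paraphrased form of libtool's ECHO selection.
if test "X`{ print -r -- -n; } 2>/dev/null`" = X-n; then
        ECHO='print -r --'
elif test "X`printf %s -n`" = X-n; then
        ECHO='printf %s\n'
else
        ECHO='func_fallback_echo'
fi
```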
Unfortunately on SunOS, there is an actual `/usr/bin/print` command, thanks to ksh93 polluting the namespace. libtool finds it and so prefers it over printf, which is a problem as there is no `print` in the POSIX spec, so neither `dash` nor `bash` implement it as a builtin.
Again, this is unnecessary forking that we want to fix (libtool is called a lot during a full bulk build!). Thankfully pkgsrc makes this easy - we can just create a broken `print` command which will be found before `/usr/bin/print`:
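A sketch of the idea (the directory is illustrative; pkgsrc’s tools framework takes care of this for us): install a `print` that always fails into a directory early in `PATH`, so the libtool test rejects it and settles on the `printf` builtin,

```sh
# Create a "print" that always fails, ahead of /usr/bin in the PATH used for
# builds (directory name is illustrative).
cat >"${TOOLS_DIR}/bin/print" <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x "${TOOLS_DIR}/bin/print"
```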
saving us millions of needless execs.
Parallelise where possible
There are a couple of areas where the pkgsrc bulk build was single threaded:
Initial package tools bootstrap
It was possible to speed up the bootstrap phase by adding custom `make -j` support, reducing the time by a few minutes.
Package checksum generation
Checksum generation was initially performed at the end of the build, running across all of the generated packages. An obvious fix was to perform individual package checksum generation in each build chroot after the package build had finished, and then simply gather up the results at the end.
pkg_summary.gz generation
Similarly, for `pkg_summary.gz` we can generate individual per-package `pkg_info -X` output and then collate it at the end.
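A sketch of how that collation can work (paths and variable names are illustrative):

```sh
# In each chroot, immediately after a package is built:
pkg_info -X "${packages}/All/${pkgname}.tgz" >"${summaries}/${pkgname}.txt"

# Once at the end of the bulk build, concatenate and compress the fragments:
cat "${summaries}"/*.txt | gzip -9 >"${packages}/All/pkg_summary.gz"
```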
Optimising these single-threaded sections of the build resulted in around 20 minutes being taken off the total runtime.
Summary
The most recent build with all these improvements integrated together is here, showing a full from-scratch bulk build taking under 5 hours to build over 14,000 packages. We’ve come a long way since 2004:
| Date | Package Builds | Total Build Time (hours) |
|------|----------------|--------------------------|
| 2004/05 | 2,938 | 322 |
| 2010/05 | 6,849 | 100.5 |
| 2012/07 | 10,554 | 166.5 |
| 2012/10 | 10,634 | 48 |
| 2013/06 | 11,372 | 18 |
| 2014/08 | 14,017 | 12 |
| 2014/10 | 14,162 | 4.5 |
We’ve achieved this through a number of efforts:
- Distributed builds to scale across multiple hosts.
- Chrooted builds to scale on individual hosts.
- Tweaking `make -j` according to per-package effectiveness.
- Replacing scripts with C implementations in critical paths.
- Reducing forks by caching, batching commands, and using shell builtins where possible.
- Using faster shells.
- Parallelising single-threaded sections where possible.
What’s next? There are plenty of areas for further improvements:
- Improved scheduling to prevent builds with high `MAKE_JOBS` from sharing the same build zone.
- `make(1)` variable caching between sub-makes.
- Replace `/bin/sh` on illumos (ksh93) with dash (even if there is no appetite for this upstream, thanks to chroots we can just mount it as `/bin/sh` inside each chroot!)
- Dependency graph analysis to focus on packages with the most dependencies.
- Avoid the “long tail” by getting the final few large packages building as early as possible.
- Building in memory file systems if build size permits.
- Avoid building multiple copies of libnbcompat during bootstrap.
Many thanks to Jörg for writing pbulk and cwrappers, Google for sponsoring the pbulk GSoC, the pkgsrc developers for all their hard work in adding and updating packages, and of course Joyent for employing me to work on this stuff.