gbuild: To boldly go where no build system has gone before ...
This is the first post in a short series of blog posts about the new GNU make based build system that will soon be integrated into the DEV300 codeline. This post covers the memory usage of the new build system when using full dependencies.
Since the first announcement of plans to replace the old build.pl/dmake build system with a new solution, one of the focus points has been the correct handling of dependencies. Multiple approaches have been tried to get this right, and one of them was to have all dependencies in one process, because recursive make is considered harmful. As the gnumake2 cws is finally approaching the last stage in the lifecycle of a feature branch (from "ready for QA" to "integrated"), it is time to address one of the main concerns reported by community members about it: memory usage.
The very idea of having all information about an OpenOffice.org build in one process -- including all dependencies -- might sound obnoxious and megalomaniacal at first. But memory has become cheap in recent years, and it was time to at least consider this option.
By now, we can say it was well worth it: gnumake2 is now capable of building eight modules (framework, sfx2, svl, svtools, sw, toolkit, tools, xmloff) in one process, and we have a solid base to approximate the memory usage of a build that contains all dependencies in one process.
First we have to find a reasonable metric of "dependency-intensity" of a part of the build. Then we can try to relate that metric to the memory usage measured on migrated modules to extrapolate to a full build.
A simple metric is the number of #include-statements in hxx- and cxx-files. (That might be a very naïve assumption, but I have spent too much lifetime in the physics department of a university to be scared of a spherical cow.)
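A minimal sketch of how such a count could be taken, assuming Python is at hand. The name count_includes and the regular expression are illustrative and are not the tooling used for the numbers in this post; in particular, the sketch does not try to skip includes inside block comments or disabled #if branches.

```python
import re
from pathlib import Path

# Matches a preprocessor include directive, allowing whitespace around '#'.
INCLUDE_RE = re.compile(r"^\s*#\s*include\b")


def count_includes(root, suffixes=(".hxx", ".cxx")):
    """Count #include lines in all files below root that have one of the suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps odd encodings from aborting the walk.
            with path.open(errors="replace") as handle:
                total += sum(1 for line in handle if INCLUDE_RE.match(line))
    return total
```

Calling count_includes on a module directory (for example a checkout of tools) gives one number per module, which is the x-value of the metric.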
To measure the memory usage two methods have been used:
- Adding $(info finished parsing.) $(shell sleep 60) to the end of the main makefile allowed me to measure the heap size of a real make process with pmap -d $(ps -a|grep make|cut -f1 -d\ )|egrep -o writeable/private:.[0-9]+K|cut -f 2 -d\ on Linux 64-Bit.
- valgrind's massif provided a simple, but synthetic way to do the same (using useful-heap as a measure).
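The extraction step of the pmap pipeline can also be sketched in Python. This is only an illustration modelled on the egrep pattern above: the function name is made up, and the sample line assumes the usual layout of the summary line printed by pmap -d. The writeable/private value reuses the "all migrated" figure from the table below; the mapped and shared figures are invented.

```python
import re

# Modelled on the egrep pattern from the post, with the number captured.
WRITEABLE_RE = re.compile(r"writeable/private:\s*([0-9]+)K")


def writeable_private_kib(pmap_output):
    """Return the writeable/private size in KiB from `pmap -d` output, or None."""
    match = WRITEABLE_RE.search(pmap_output)
    return int(match.group(1)) if match else None


# Illustrative summary line; only the layout matters here.
sample = "mapped: 105424K    writeable/private: 50812K    shared: 0K"
print(writeable_private_kib(sample))
```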
|modules||include statements||pmap (KiB)||massif (KiB)|
|no module||0||632|| 407 |
|tools||964||1184|| 1115 |
|svl||1652||1660|| 1645 |
|toolkit||2276||3524|| 3210 |
|svtools||3768||5548|| 5179 |
|framework||6049||5188|| 4812 |
|sfx2||6065||6196|| 5961 |
|xmloff||6496||6860|| 6582 |
|framework, sfx2||12514||10276|| 9276 |
|sw||24087||29340|| 25943 |
|all migrated but sw||27670||23124|| 21145 |
|all migrated but xmloff, sfx2||38796||40088|| 35550 |
|all migrated||51757||50812|| 45129 |
Using a simple linear regression over the data, OpenOffice.org Calc's LINEST function tells me that one can assume a heap usage of 1010±40 bytes per include for the pmap data and 890±35 bytes per include for the massif data. The plot shows no obvious systematic error in the assumption of the model:
The last two data points are the predictions of the model for a full build with and without binfilter:
- without binfilter: 170-190 MiB (pmap), 150-170 MiB (massif)
- with binfilter: 190-210 MiB (pmap), 170-180 MiB (massif)
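For readers who want to redo the regression without Calc, here is a small sketch in plain Python using the data from the table above. It assumes an ordinary least-squares fit with an intercept; whether LINEST was run with exactly these settings is an assumption, and fit_line is an illustrative name. The slopes should land close to the bytes-per-include figures quoted above.

```python
from math import sqrt

# (include statements, pmap KiB, massif KiB), copied from the table in this post.
DATA = [
    (0, 632, 407),
    (964, 1184, 1115),
    (1652, 1660, 1645),
    (2276, 3524, 3210),
    (3768, 5548, 5179),
    (6049, 5188, 4812),
    (6065, 6196, 5961),
    (6496, 6860, 6582),
    (12514, 10276, 9276),
    (24087, 29340, 25943),
    (27670, 23124, 21145),
    (38796, 40088, 35550),
    (51757, 50812, 45129),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    slope_err = sqrt(residual / (n - 2) / sxx)
    return slope, intercept, slope_err


includes = [row[0] for row in DATA]
for name, column in (("pmap", 1), ("massif", 2)):
    slope, intercept, err = fit_line(includes, [row[column] for row in DATA])
    # Table values are in KiB, so multiply by 1024 to get bytes per include.
    print(f"{name}: {slope * 1024:.0f} +/- {err * 1024:.0f} bytes per include")
```

Multiplying the slope by the include count of a full build (and adding the intercept) gives the kind of extrapolation listed above; the include counts for the full build are not given in this post, so they are left out of the sketch.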
Finishing note: gnumake2 will be integrated soon, so if you are actively developing on the DEV300 codeline, it is advisable to check out the basics of it. A good starting point is the talk "Rebooting the OpenOffice.org Build System" [ ODP PDF video ] I gave at OOoCon 2010. For more detailed information see the Build Environment Effort section at the OOo wiki.
I hope to keep the posts about the build system coming in the next few days. Next up: How to migrate a module to gbuild.
P.S.: In case an interested GNU make hacker comes across this post, here is some output from make -p for the 8 migrated modules:
2385 variable set hash-tables
# files hash-table stats:
# Load=21210/32768=65%, Rehash=5, Collisions=605939/1022290=59%
# # of strings in strcache: 32753 / lookups = 991181 / hits = 958428
# # of strcache buffers: 284 (* 8176 B/buffer = 2321984 B)
# strcache used: total = 2313570 (5196) / max = 8176 / min = 8170 / avg = 8175
# strcache free: total = 238 (2980) / max = 6 / min = 0 / avg = 0
# strcache hash-table stats:
# Load=32869/65536=50%, Rehash=3, Collisions=545530/1044947=52%
(This is a very raw mirror of the original blog post made to blogs.sun.com 16 Nov 2010. As per http://web.archive.org/web/20090627144253/http://www.sun.com/termsofuse.jsp "... You grant Sun and all other users of the Website an irrevocable, worldwide, royalty-free, nonexclusive license to use, reproduce, modify, distribute, transmit, display, perform, adapt, resell and publish such Content (including in digital form) ..." )
This was originally published at 2011-07-26 10:03:00/2011-07-26 08:03:08 on livejournal.