Blast from the Past: the Pimpl?

Posted by Bjoern Michaelsen on 18 July 2015

Blast from the Past: the Pimpl?

They sentenced me to twenty years of boredom

For trying to change the system from within

-- Leonard Cohen, I'm your man, First we take Manhattan

Advance warning: This blog post talks about C++ coding style, and given the "expressiveness" (aka a severe infection with TimTowTdi) this is bound to contain significant amounts of bikeshedding, personal opinion/preference. As such, be invited to ignore all this as the ramblings of a raging lunatic.

Anyone who observed me spotting a Pimpl in code will know that I am not a fan of this idom. Its intend is to reduce build times by using a design pattern to move implementation details out of headers -- a workaround for C++s misfeature of by default needing a recompile even for changing implementation details only without changing the public interface. Now I personally always thought a pure abstract base class to be a more "native" and less ugly way to tell this to the compiler. However, without real testing, such gut feelings are rarely good advisors in a complex language like C++.

So I did some testing on the real life performance of a pure abstract base class vs. a pimpl (each of course in a different compilation unit to prevent the compiler to optimize away what we want to measure) -- and for reference, a class with functions that can be completely inlined. These are the three test implementations, inline:

//-- header (hxx) --
class InlineClass final
{
	public:
		InlineClass(int nFirst, int nSecond)
			: m_nFirst(nFirst), m_nSecond(nSecond), m_nResult(0)
		{};
		void Add()
			{ m_nResult = m_nFirst + m_nSecond; };
		int GetResult() const
			{ return m_nResult; };
	private:
		const int m_nFirst;
		const int m_nSecond;
		int m_nResult;
};

Pimpl, as suggested by Effective Modern C++ when using C++11, but not C++14:

//-- header (hxx) --
#include <memory>
class PimplClass final
{
	public:
		PimplClass(int nFirst, int nSecond);
		~PimplClass();
		void Add();
		int GetResult() const;
	private:
		struct Impl;
		std::unique_ptr<Impl> m_pImpl;
};
//-- implementation (cxx) --
#include "pimpl.hxx"
struct PimplClass::Impl
{
	Impl(int nFirst, int nSecond)
		: m_nFirst(nFirst), m_nSecond(nSecond), m_nResult(0)
	{};
	const int m_nFirst;
	const int m_nSecond;
	int m_nResult;
};
PimplClass::PimplClass(int nFirst, int nSecond)
	: m_pImpl(std::unique_ptr<Impl>(new Impl(nFirst, nSecond)))
{}
PimplClass::~PimplClass()
	{}
void PimplClass::Add()
	{ m_pImpl->m_nResult = m_pImpl-&gt;m_nFirst + m_pImpl-&gt;m_nSecond; }
int PimplClass::GetResult() const
	{ return m_pImpl->m_nResult; }

Pure abstract base class:

//-- header (hxx) --
#include <memory>
struct AbcClass
{
	static std::shared_ptr<AbcClass> Create(int nFirst, int nSecond);
	virtual ~AbcClass() {};
	virtual void Add() =0;
	virtual int GetResult() const =0;
};
//-- implementation (cxx) --
#include "abc.hxx"
#include <memory>
struct AbcClassImpl final : public AbcClass
{
	AbcClassImpl(int nFirst, int nSecond)
		: m_nFirst(nFirst), m_nSecond(nSecond)
	{};
	virtual void Add() override
		{ m_nResult = m_nFirst + m_nSecond; };
	virtual int GetResult() const override
		{ return m_nResult; };
	const int m_nFirst;
	const int m_nSecond;
	int m_nResult;
};
std::shared_ptr<AbcClass> AbcClass::Create(int nFirst, int nSecond)
	{ return std::shared_ptr<AbcClass>(new AbcClassImpl(nFirst, nSecond)); }

Comparing these we find:

implementation lines added for GetResult() source entropy added source entropy for GetResult() runtime
inline 2 187 17 100%
Pimpl 3 316 26 168% (174%)
pure ABC 3 295 (273) 19 (16) 158%
So the abstract base class has less complex source code (entropy)1, needs less additional entropy to expand and is still faster in the end on common hardware (Intel i5-4200U) with common compiler optimization switches (-O2)2.

Additionally, in a non-trivial code base you might actually need to use virtual functions for your implementation anyway as you are deriving from or implementing an existing interface. In the Pimpl case, this means using two indirections (resolving the virtual function and then resolving the m_pImpl pointer in that function on top of that). In the abstract base class case thats not happening and in addition, it means that you can spare yourself the pure virtual declarations in the *.hxx (the virtual ... =0 ones), as those are already declared in the class derived from. In LibreOffice, this is true for any class implementing UNO interfaces. So the first numbers are actually biased against an abstract base class for real world code bases -- the numbers in parathesis show the results when an interface is already defined elsewhere.

So unless the synthetic example used here is some kind of weird cornercase, this suggests abstract base classes being the better alternative over a Pimpl once the class goes beyond being a plain value type with completely inlineable accessor member functions.

Thanks for bearing with me on this rant about one of my personal pet peeves here!

1 entropy is measured as cat abc.[hc]xx|gzip|wc -c or cat pimpl.[hc]xx|sed -e 's/Pimpl/Abc/g'|gzip|wc -c. 2 Here is the code run for that comparision:

constexpr int repeats = 100000;

int pimplrun(long count)
//int abcrun(long count)
{
        std::vector< std::shared_ptr&lt;PimplClass /* AbcClass */ > > vInstances;
        vInstances.reserve(count);
        while(--count)
                vInstances.emplace_back(std::make_shared<PimplClass>(4711, 4711));
                //vInstances.emplace_back(AbcClass::Create(4711, 4711));
        int result(0);
        count = vInstances.size();
        while(--count)
                for(auto pInstance : vInstances)
                {
                        pInstance->Add();
                        result += pInstance->GetResult();
                }
        return result;
}
Instances are stored in shared pointers as anything that a Pimpl is considered for would be "heavy" enough to be handled by reference instead of by value.

Update 1: Out of curiosity, I looked a bit deeper at this with callgrind. This is what I found for running the above (with 1000 repeats) and --cache-sim=yes: I1 cache: 32768 B, 64 B, 8-way D1 cache: 32768 B, 64 B, 8-way LL cache: 3145728 B, 64 B, 12-way

event inline ABC Pimpl
Ir 23,356,163 38,652,092 38,620,878
Dr 5,066,041 14,109,098 12,107,992
Dw 3,060,033 5,094,790 5,099,991
I1ir 34 127 29
D1mr 499,952 253,006 999,013
D1mw 501,636 998,312 500,097
ILmr 28 126 24
DLmr 2 845 0
DLmw 0 1,285 250
I dont know exactly what to derive from that, but what is clear is that purely by instruction counts Ir this can not be explained. So you need --cache-sim=yes which gives the additional event counts. Actually Pimpl looks slightly better on most stats, so as it is slower in real life, the cache misses on the first level data cache D1mr might have quite an impact?

Update 2: This post made it to reddit, so I looked into some of the feedback from there. A common suggestion was to use for(auto& pInstance : vInstances) instead of for(auto pInstance : vInstances) in the benchmarking function. This had no significant impact on walltime measurements nor made it callgrind event counts show some clearer picture. I also played around with the order of linked objects to see if it has any impact (via cache locality etc.). While runtime measurements fluctuated quite a bit (even when using the same binary), the order was always the same: inlining quickest, then abstract base class and pimpl slowest.

Originally published on 2015-07-18 07:16:23 on wordpress.