An Old Dog Learns Catch Tricks: Part 1

Eldan Ben Haim
Dec 11, 2021
7 min read

I’ve recently quit my job and decided to take some time off before embarking on a new journey. With this much time in my hands, I’m able to progress some of my pet projects. More specifically, I’m able to spend time on stuff that I’d normally opt not to focus on otherwise; for example, proper build systems, installation, “productization” and, oh, yes, automated testing.

I’ve mentioned Bracez, my pet-JSON editor, in previous posts. Bracez needed some long over-due work around the JSON parsing bits (not a foundational component for a JSON editor, obviously). A short while ago, I ran into a huge JSON file (Chromium’s HSTS preloaded data) that made the editor run very slow. I moved long-running JSON parsing to the background, but there was still much improvement desired around performance of that background scanning. Moreover, Bracez attempts to avoid a full parse after file edits and I wanted to improve this mechanism so that it falls back to full-parse less frequently. And finally, while I indeed moved full-parse to the background, the editor enqueues full parse requests to a background queue without ever cancelling these requests if the file is modified in the mid of a full parse cycle; this was wasteful and impacted performance (this is a very common scenario — e.g when you’re typing to edit a file, it’s very possible that you type while background parsing is happening).

All of these changes were pretty neat projects. Biting one’s teeth in a good code optimization problem is always good for one’s teeth (and overall health). The partial reparse code contains some nice algorithmic challenges. And getting cancellation right for background process is.. well, typically annoying and hard to get right. Nevertheless, none of these efforts is what we’re going to discuss in this post.

The parser, along with background processing and fast-reparse-slow-parse code has become a rather non-trivial maze of logic. And mazes of logic tend to be brittle. Sure, there are probably ways to write it better — there always are — but it will probably never become a straightforward piece of code. If I were going to further complicate things (the way I like), I should at least make sure I have proper automated testing for these bits before venturing into this dragon’s lair.

I’ve decided to start by looking for a decent unit-testing framework for C++ with which I’ll write the missing tests. Soon I came across Catch2. Catch is really straightforward: you use a bunch of macros to define test cases and sections (internal partition within a test case). Test cases are basically written as C++ functions that contain invocations of assertion macros — that encode the actual conditions to verify and potentially fail tests on.

As I was learning how to use the framework, I figured that some of the neat things it could do weren’t very trivial to implement. So I dug a little deeper and am sharing the results over this series of posts.

Expression Decomposition

A simple example from the Catch2 tutorial is brought below. This is really simple to use:

TEST_CASE( "Factorials are computed", "[factorial]" ) {
    REQUIRE( Factorial(0) == 1 );
    REQUIRE( Factorial(1) == 1 );
    REQUIRE( Factorial(2) == 2 );
    REQUIRE( Factorial(3) == 6 );
    REQUIRE( Factorial(10) == 3628800 );
}

I guess the only thing that deserves an explanation is the “[factorial]” second argument passed to the TEST_CASE macro. This allows you to associate tags with tests. But don’t worry about this right now. The tutorial goes on to show the output for a failed test:

Example.cpp:9: FAILED:
  REQUIRE( Factorial(0) == 1 )
with expansion:
  0 == 1

So we see the file and line where a REQUIRE assertion failed and we of course see the quoted assertion condition, and then we see…. Wait, what!? We see the condition expression with some of its sub expressions (the left-hand side of the equality operator in this case) substituted by their respective values ('0 == 1')... How do they do that!?

Start With The Basics

It’s clear how with macro magic we can have the file + line output. And it’s clear how we can emit a the expression as a string. So let’s start with this code, defining an EVAL_AND_PRINT macro that will grow to be something similar to REQUIRE:

#define EVAL_AND_PRINT(x) \
    std::cout << "In " << __FILE__ << ":" << __LINE__ << ": " \
              << #x << " is equal to " << (x) << std::endl;

That’s all fine, but how are we going about outputting the last part of that failure message, with the expanded expression? It turns out that there’s a neat trick at play here.

Let’s start with defining a relaxed set of requirements: for an expression of the form

sub-expression-1 operator sub-expression-2

where sub-expression-1 and sub-expression-2 are expressions with a higher precedence than operator, and operator is a comparison operator, we want to be able to emit:

The result of sub-expression-1,
The result of sub-expression-2, and
The result of the original expression.

For our toy example we’ll actually only look at equality comparison expressions; the principles shown here are easily extended to cover other operators.

Operator Overloading is Your Friend

The first approach that comes to mind is wrapping the sub-expressions in some class that implements operator overloading for logic operators, and captures both subexpression results alongside with the result. Let’s give it a shot:

template<class T>
class ValueWrapper {
public:
    ValueWrapper(T t) : val(t) {}

private:
    T val;

public:
    template <class O>
    auto operator==(const O &other) const -> BinaryExpression<decltype(val == other), T, O> {
        return BinaryExpression("==", val == other, val, other);
    }
};

Our class ValueWrapper, templated by the type of wrapped value, basically only exists to construct a BinaryExpression from a C++ expression. Consider an expression of the form

ValueWrapper(9+15) == 27 + 2)

This will now construct a BinaryExpression class, with the values of 9+15 (=24) and 27+2 (=29) passed to the constructor along with the comparison result and the operator name. The BinaryExpression class can be declared like so:

template<class RES, class LHS, class RHS>
class BinaryExpression {

private:
    RES result;
    const char *op;
    LHS lhs;
    RHS rhs;

public:
    BinaryExpression(const char *_op, RES _result, LHS _lhs, RHS _rhs)
    : op(_op), result(_result), lhs(_lhs), rhs(_rhs) {}
    
    virtual std::string toString() const {
        std::stringstream s;
        s << lhs << op << rhs;
        return s.str();
    }    
    RES getResult() const { return result; }
};

So now, we can do something like this:

#define EVAL_AND_PRINT(x) \
    std::cout << "In " << __FILE__ << ":" << __LINE__ << ": " \
              << #x << " is equal to " << (x).getResult() << std::endl \
              << "   " << (x).toString() << std::endl;

And then

EVAL_AND_PRINT(ValueWrapper(9+15) == 27 + 2))

Results in:

In main.cpp:123: ValueWrapper(9 + 15) == 27+2 is equal to 0
   24==29

We’re getting there! The problem here is that, well,

EVAL_AND_PRINT(ValueWrapper(9+15) == 27 + 2))

Is very much different than

EVAL_AND_PRINT(9+15 == 27 + 2))

Which is where we want to go. So we need to somehow wrap the left-hand sub-expression with a ValueWrapper for the client, instead of making them do it on their own.

A Wrapping Operator

Macros are probably of no use here, since they’re not really designed to manipulate their arguments; rather embed them as they are (modulo stringifying and concatenation, obviously). So we’re trying something else: operator overloading. Again.

Recall that we are only interested in expressions decomposed to

sub-expression-1 operator sub-expression-2

where operator is a comparison operator. Now let’s look at an expression of the form

alpha alpha-operator sub-expression-1 operator sub-expression-2

where (a) alpha (greek letters always make things seem much smarter than they are, don’t they? They should have been called geek letters) is an instance of some class A we control (b) alpha-operator is an operator that has a higher or equal precedence to all comparison operators that interest us, is left-associative.

From requirement (b), we can rewrite the updated expression in infix notation as follows:

operator(alpha-operator(alpha, sub-expression-1), sub-expression-2)

If we overload alpha-operator for class A, then from requirement (a) we can rewrite this expression as

operator(alpha.alpha-operator(sub-expression-1), sub-expression-2)

Now if we define alpha.alpha-operator to return a ValueWrapper wrapper around its right-hand operand, by transforming ‘sub-expression-1 operator sub-expression-2’ to ‘alpha alpha-operator sub-expression-1 operator sub-expression-2’ (which is definitely within the realm of capabilities for macros!) we in fact transformed the expression to an equivalent of ‘ValueWrapper(sub-expression-1) operator sub-expression-2’ which is exactly what we wanted (needed, really)!

Let’s see this in action. First, let’s define class A; only since we’re no longer pretending we’re doing something particularly smart here, we’ll call it ValueWrapperHelper (which is a silly, silly name to compensate for the greek above). And the alpha-operator will be the ‘<=‘ operator:

class ValueWrapperHelper {
public:
    ValueWrapperHelper(){}
    
    template<class T>
    ValueWrapper<T> operator<=(const T &other) {
        return ValueWrapper(other);
    }
};

And we can update EVAL_AND_PRINT as follows:

#define EVAL_AND_PRINT(x) {\
    auto er = ValueWrapperHelper() <= x;\
    std::cout << "In " << __FILE__ << ":" << __LINE__ << ": " \
              << #x << " is equal to " << er.getResult() << std::endl \
              << "   " << er.toString() << std::endl; }

Notice the er variable declaration which performs the expression transformation we described above. Having that in place,

EVAL_AND_PRINT(9 + 15 == 27+2);

Translates to

…
auto er = ValueWrapperHelper() <= 9 + 15 == 27+2
…

Which based on operator precedence and the discussion above translates to:

…
auto er = ValueWrapperHelper().operator<=(9 + 15) == 27+2
…

Which at some point is evaluated as:

…
auto er = ValueWrapper(9 + 15) == 27+2
…

And voila, that's exactly what we wanted. Indeed, the output is now:

In main.cpp:129: 9 + 15 == 27+2 is equal to 0
   24==29

In conclusion…

This neat trick by the Catch2 team makes unit test code “fluent” while still producing cool diagnostics messages. And there are no bad hacks involved, just operator overloading and very little macro witchcraft. It’s not perfect; for example, it has limitations around complex expressions and compound booleans. And the toy implementation presented here is just that: a toy implementation, missing a lot of the “grown up stuff”. The Catch implementation is obviously much more elaborate than this; if you found this interesting, you should go ahead and take a deep look into the Catch2 repository.

This is not the only novelty I found in the library. There’s one more neat trick I saw there, around its data-driven testing functionality. Some would say it’s more hack-is; it’s definitely more complex. But we’ll save this for another post…

I'm only doing this so you won't have to

Tales about a hobbyist's Programming and Making Endeavours