Pursuit of Proper TypeScript Typing

Eldan Ben Haim
Oct 22, 2021
10 min read

I’m starting this post with a service for the (few, I presume) non-Hebrew-speaking readers in the crowd. I’d like to introduce the Hebrew word מקולקל, pronounced as “Mekulkal”. It roughly means broken, or out-of-order. There. Now you’re a Hebrew speaker, if you weren’t before. Something new every day.

With my long time affection for calculators (if you own cool calculators you can almost convince yourself you’re good at math), Hebrew as a mother tongue, and a weakness for bad puns there was literally no choice for me but to call my little (software) calculator project Meculcalator (it’s a broken calculator, got it!?). It’s an RPN calculator, profoundly inspired by the HP-48 and co. It support matrices, vectors, imaginary numbers, symbolic algebra (kinda), and - of course - programming.

Meculcalator started its life a few years ago as a web application, coded in JavaScript. Like so many other of my personal projects, over the years I’ve worked on it on and off. The most recent significant changes I made were to tidy up the separation between the calculator “engine” and the UI, convert the engine code to proper TypeScript, and create a mobile Meculcalator based on Ionic and the TypeScript calculator engine.

There are two-and-a-half hand-crafted parsers included within the calculator engine. The primary ones are a parser for symbolic algebraic expressions, and a parser for programs (Meculcalator is programmable :)). I wanted to tear these down and replace them with a cleaner parsing solution and so I set out to look for options. There’s a few libraries out there for TypeScript but during the search I realized I’m really in the mood of writing my own thing. This obviously quickly led me to identify “flaws” in all of the libraries I ran into, and these flaws became a wishlist from my parser combinator library. What I’d really want is:

Elegant, API with strict TypeScript typing for parsers
Rich set of AST generation facilities
Support a wide area of parsing use-cases on top of “here’s an input, please parse it”, such as:
- Suggest completions
- Re-parse parts of the input given changes
- Fast validation using regular expressions

And so I set out to create “parzing”: a TypeScript parser-combinator library. This would be my first venture into publishing an npm package. The first version is meant to cover the first two bullets above, as these are the immediate needs for Meculcalator. But as Meculcalator will grow, so will the list of supported features.

As I was building the first version of parzing, the goal of a clean and strictly typed API revealed a couple of interesting challenges that called for use of some non trivial TypeScript typing facilities. Seeing how these were useful to me, I figured I’d showcase them here.

But prior to all that, here’s a toy parser example. In fact the grammar described here is the Meculcalator programming language — but the parser definition is stripped down to only cover the grammar without any real AST building. This should give you a fair idea of the API I was aiming at.

const tok_start_program = P.token("<<");
const tok_end_program = P.token(">>");
const tok_frame_start = P.token("->");

const object_literal = P.anyOf("0123456789")._(O.map((s) => new ObjectLiteral(s)));
const var_name = P.anyOf("abcdefghijklmnopqrstuvwxyz");

let block: Parser<Block>;

const frame_invoke = P.sequence(
    tok_frame_start._(O.omit()), 
    P.cut(),
    P.many(var_name),
    tok_start_program._(O.omit()),
    P.ref(() => block),
    tok_end_program._(O.omit())
)._(O.build(FrameInvokeNode));

const ite_statement = P.sequence(
    P.token('IF')._(O.omit()),
    P.cut(),
    P.ref(() => block),
    P.token('THEN')._(O.omit()),
    P.ref(() => block),    
    P.sequence(
        P.token('ELSE')._(O.omit()),
        P.ref(() => block)
    )._(O.map((s) => s[0]))._(O.optional()),    
    P.token('END')._(O.omit())
)._(O.build(IfThenElseStatement))

const while_statement = P.sequence(
    P.token('WHILE')._(O.omit()),
    P.cut(),
    P.ref(() => block),
    P.token('DO')._(O.omit()),
    P.ref(() => block),
    P.token('END')._(O.omit())
)._(O.build(WhileStatement))
    
const repeat_statement = P.sequence(
    P.token('REPEAT')._(O.omit()),
    P.cut(),
    P.ref(() => block),
    P.token('UNTIL')._(O.omit()),
    P.ref(() => block),
    P.token('END')._(O.omit())
)._(O.build(RepeatStatement));

const local_store = 
    P.sequence(var_name, P.token('=')._(O.omit()))._(O.whitespace(P.pass()))._(O.build(LocalStoreNode));
      
block = 
    P.sequence(
        P.many(
            P.choice(
                object_literal,
                frame_invoke, 
                ite_statement,
                while_statement,
                repeat_statement,
                local_store
            )
        )
    )._(O.build(Block))

const program = P.sequence(tok_start_program, block, tok_end_program)._(O.map((v) => v[1]));

Fundamentals: The Parser API and Simple Typing Definitions

For a strictly typed parsing library, we want a parser API that’s explicit about the type of value it uses to represent the parsed result. So we define a Parser API with a generic argument, like so:

export interface Parser<T> {
    parse(parserContext: ParserContext): T;
}

parse()’s contract is simple: read input from the parsing context, and return a representation of what was parsed. The ParserContext object is used to pass the parser input as well as pass arbitrary context between high- and low-level contexts. We’ll ignore ParserContext for now; it’s not very interesting for our current discussion (which as you’ll see will include very little discussion of implementation).

As a side note, you’ll notice that I chose not to remain flexible regarding the representation of the parsed input. I’m assuming inputs are streams of characters, not arbitrary objects. I felt that if I wanted to offer the reach set of functionality described above, and still maintain reasonable simplicity, I need to accept that this library will be usable for text parsing rather than parsing general purpose streams of objects.

Much of what the parzing library deals with is the ability to take Parser objects and transform or combine them. One simple example is the Mapping transform. The mapping transform is crucial to allowing a parser to return anything other than a string, e.g an AST. The way a mapping transform works is that it takes a Parser (we’ll call it ‘the original parser’) and a function (we’ll call that ‘the mapping function’), and returns a new Parser. The new parser invokes the original parser, takes the output from that parser, invokes the mapping function on that and returns the result of applying the mapping function as the parsing result. For example, a mapping parser could take a parser that parsed a textual representation of a number, and then convert that textual representation from string to number.

With TypeScript, declaring a strictly typed map() function is quite straightforward:

    map<V extends Parser<unknown>, T>(parser: V, 
            mapper: (i: ParserType<V>) => T): Parser<T>

Note the use of ParserType above. When instantiated with a Parser<T>, this will return T -- that is, it transforms a parser type to the return type of that parser. It's defined below:

type ParserType<pt> = pt extends Parser<infer T> ? T : never;

This uses a conditional type with inferred arguments: TypeScript allows defining a type based on a conditional check on an input type (and specifically whether it conforms, or extends a given type). What more, a conditional check can also capture type parameters of the type against which conformance is checked — using the infer keyword as above. You can read more about the details here, but the net result is that ParserType<P> eventually returns the return type of the parser that was passed to it.

Now that we got the basics, let's move to something more challenging.

Strictly Typing a Sequence Combinator

The sequence parser combinator receives a set of parsers as operands (the “sequenced parsers”), and creates from them a parser that attempts to parse each of the sequenced parsers by turn. So, given a parser A that attempts to parse a number (probably parsing a string consisting of digits and then mapping this to a number value), and a parser B that attempts to parse an arbitrary word (sequence of non-space characters) and returning the string, a sequence combinator operating on A and B will try to parse a number followed by a string. What’s the return type of this parser though? That would be any representation for a sequence — in TypeScript probably the most trivial to use would be an array or a tuple — consisting of the values of the individual sequenced parsers.

The sequence parser combinator is implemented as a TypeScript function, receiving a variadic argument list — of sequences from different types. What we would like to express is that this function returns a parser whose return type is the sequence of return types of the sequenced parsers. How do we go about this? One step at a time… :)

Let's start with constructing a type by mapping a tuple of Parser<R1>, Parser<R2>, .... types to a tuple of R1, R2, .... So basically we want to ‘map’ all types of the parser tuple by applying the ParserType<> type that we defined above. This can be achieved using TypeScript mapped types. With TypeScript mapped types, we can define a type based on the members of a different type — by mapping them. We can change member names (by providing member name templates), types or other attributes (such as mutability). We can also decide to exclude some members.

A definition for mapping a tuple of Parser<R> types to a tuple of the underlying R’s, based on mapping types, is given below:

export type SeqType<TS extends Parser<unknown>[]> = {
    [i in keyof TS]: ParserType<TS[i]>
};

So type SeqType<T> maps each member in the original type (that is, each element in the tuple) to a corresponding member in the target type (again, a corresponding element in the result type tuple), declaring the type of the element in the result type tuple as ParserType<TS[i]> — that is, the type of the parser in the original tuple.

But wait, there’s more! Sometimes it’s convenient for a parser to not return any value. This is especially useful within sequences. Consider for example a theoretical parser for a C statement block:

sequence(token("{"), many(statement_parser), token("}"))

It would be great if the above statement could return an array of statements, ignoring the opening and closing braces, wouldn’t it? Consider altering the above expression to the following:

function ignore(parser: Parser<string>): Parser<void> {
        return map(parser, (s) => {});
}

sequence(ignore(token("{")), many(statement_parser), ignore(token("}")))

Now type type of our sequence is [void, string, void] -- as opposed to [string, string, string] prior to the change. With this change, we want to:

Have the sequence combinator drop undefined values; and,
Adjust our typing accordingly.

This post, as you may have noticed, is mostly about typing so I'm going to elegantly ignore point number 1 above. As to adjusting our types:

export type FilterVoid<T> = 
    T extends [infer Head, ...infer Rest] ? 
       (Head extends void ? 
           [...FilterVoid<Rest>] : 
           [Head, ...FilterVoid<Rest>]) : T;
           
export type SeqType<TS extends Parser<unknown>[]> = FilterVoid<{
    [i in keyof TS]: ParserType<TS[i]>
}>;

The above definition utilizes TypeScript’s recursive type definitions support to define a filter on a sequence of types, dropping void types. In each step of the FilterVoid<T> recursion — we include first type in the currently processed list of types if it’s not void followed by a recursive definition of the type of the rest of the list, or omit the first type and just include the recursive definition for the rest of the list. The result is that FilterVoid<[void, string, void]> ultimately becomes [string]. Having defined this void filter gadget, we can now apply it on the mapped type to complete the definition of SeqType as shown above.

A choice combinator

Similarly to the sequence combinator, the choice combinator receives a sequence of parsers. This combinator will attempt invoking each of the parsers until one succeeds; the successful result is then returned. If no parser is able to parse the input — the choice parser fails.

The type of the result of a choice operator may thus be any of the types of the input parsers. In TypeScript we can express this as a union of types: type A | B describes the set of values that are compatible with either type A or B.

Here again we’re using generic tuples and recursive type definitions to create a union type of all input parser types, as below:

type ChooseResult<E> = 
    E extends [infer Head, ...infer Tails] ? 
       ParserType<Head> | ChooseResult<Tails> : 
       never;

Supporting postfix operators

There are several useful “operators” on parsers — functions that take parsers and transform them to other parsers. Consider for example the Mapping transform discussed above — as a quick reminder, this transform takes a parser and a transformation function, and returns a parser that returns the result of applying the transformation function on the result of the input parser. The naïve approach to declaring the mapping transform was presented above -- a function, map, defined this way:

map<V extends Parser<unknown>, T>(parser: V, 
            mapper: (i: ParserType<V>) => T): Parser<T>

However given this declaration, parser definition code quickly becomes cumbersome. To state this with a little more rigor, this declaration leads to a big distance between the transformation function and the syntax cue that indicates it is a transformation function:

map(P.sequence(P.token('WHILE'),
    P.cut(),
    P.ref(() => block),
    P.token('DO’),
    P.ref(() => block),
    P.token('END’)), 
  (args: [string, Block, string, Block, string]) => 
      new WhileLoop(args[1], args[4]))

A much more fluent to read version is:

P.sequence(P.token('WHILE'),
    P.cut(),
    P.ref(() => block),
    P.token('DO'),
    P.ref(() => block),
    P.token('END')).map((args: [string, Block, string, Block, string]) => new WhileLoop(args[1], args[4]))

However this would require the Parser interface to include a map method. Given that map is just one example for an operator (there are quite a few others), and that we’d like clients of the library to introduce new operators and benefit from a simple invocation syntax — that’s not a real option. Some common javascript libraries (e.g rx.js) solve this by defining operators as functions, and adding a single method to the operator target interface that receives the operator as an argument and applies it to the target. rx.js calls that method pipe. For brevity, in this parsing generator library we call the the application function “_”. So our syntax now is:

P.sequence(P.token('WHILE'),
    P.cut(),
    P.ref(() => block),
    P.token('DO'),
    P.ref(() => block),
    P.token('END'))._(map((args: [string, Block, string, Block, string]) => new WhileLoop(args[1], args[4])))

Which is almost as fluent as including the map method directly in the Parser<T> interface. How do we declare this interface?

type WithPostfixSupport<T> = T & {
    _<R>(f: (target: WithPostfixSupport<T>) => R): WithPostfixSupport<R>;
}

export function addPostfixSupport<T>(who: T): WithPostfixSupport<T> {
    const ret = who as WithPostfixSupport<T>;
    ret._ = function(f: (target: WithPostfixSupport<T>) => any) {
        return addPostfixSupport(f(ret));
    }

    return ret;
}

The WithPostfixSupport<T> type above declares the postfix operator application method "_" on a target type. Note that the application method is a generic function parameterized by the return type — to allow strict typing. The return type of this method is the return type of the operator, but again with postfix operator application support — this is to allow chaining operators, such as parser.map(…).omit().

Diving into implementation for once, addPostfixSupport reminds us that we’re actually living in the javascript world where everything's possible and types, much like points in Who's Line Is It Anyway, don't matter. This function patches the input target adding the application method to it. The application method’s definition is quite trivial — it basically applies the transformation to the target, and recursively converts the result of the transformation to an object that supports postfix application.

With these definitions, the implementation of the map postfix operator would look like something along these lines:

export function map<S, T>(mapper: (s: S) => T) {
   return (p: Parser<S>) => {
      return new MapParser(p, mapper);
   }
}

The use of generics in the definitions of map and WithPostfixSupport allows the typescript compiler to catch a wide range of errors, such as providing a mapping function with bad input our output types. It also allows IDEs such as VSCode to offer intelligent code completion options.

Conclusion

Mapped types, conditional types, recursive type and typed tuples comprise a very powerful type system in TypeScript. Familiarizing yourself with these capability will allow you to write developer-friendly APIs that both help prevent a wide range of mistakes, and contributes to API discoverability.

As for parzing -- I'm still working on this library. Right now writing some documentation. Once this is ready, expect to find it in npm.

Oh, and also, calculators are cool.

Hmm. Perhaps I should rename Meculcalator to Calcoolator. Darn. Already taken.

I'm only doing this so you won't have to

Tales about a hobbyist's Programming and Making Endeavours