Understanding and Implementing Pattern Matching

2023-10-21 :: racket, tutorials, programming-languages, understand-and-implement

By: Mike Delmonaco

Pattern matching is a very powerful tool used to destructure and perform case analysis on data. It’s commonly found in more academic functional languages and has recently made its way into Python. In this post, we’ll discover pattern matching and implement it in Racket.

I will assume that you have some familiarity with Racket. We’re going to be writing some macros, but general familiarity with macros should be enough; we’re not doing anything fancy.

1 Motivation

2 match

3 Implementation

4 Bonus: Core Patterns

1 Motivation

If you’re already familiar with pattern matching, feel free to skip to the implementation.

Before we get to pattern matching, let’s talk about trees. Let’s say we’re trying to find the largest element in a binary tree. You can do this with predicates and accessors:

> (struct node [left right] #:transparent)

> (struct leaf [data] #:transparent)

> (define (bt-max bt)
    (cond
      [(node? bt) (max (bt-max (node-left bt)) (bt-max (node-right bt)))]
      [(leaf? bt) (leaf-data bt)]))

> (bt-max (leaf 1))

1

> (bt-max (node (leaf 1) (node (leaf 3) (leaf 2))))

3

Easy enough. Now, let’s reflect a binary tree to create its mirror image:

> (define (bt-reflect bt)
    (cond
      [(node? bt)
       (node (bt-reflect (node-right bt))
             (bt-reflect (node-left bt)))]
      [(leaf? bt) bt]))

> (bt-reflect (leaf 1))

(leaf 1)

> (bt-reflect (node (leaf 1) (leaf 2)))

(node (leaf 2) (leaf 1))

> (bt-reflect (node (leaf 1) (node (leaf 2) (leaf 3))))

(node (node (leaf 3) (leaf 2)) (leaf 1))

This looks pretty similar to the previous function. In fact, it’s not hard to imagine pretty much every function on trees looking just like this: check if it’s a node with node?, use field accessors to get the left and right subtrees, check if it’s a leaf with leaf?, and use field accessors to get the data.

Let’s be good little programmers and avoid repeating ourselves by creating an abstraction:

> (define (bt-cases bt on-node on-leaf)
    (cond
      [(node? bt) (on-node (node-left bt) (node-right bt))]
      [(leaf? bt) (on-leaf (leaf-data bt))]))

> (define (bt-max bt)
    (bt-cases bt
              (lambda (left right) (max (bt-max left) (bt-max right)))
              (lambda (data) data)))

> (bt-max (leaf 1))

1

> (bt-max (node (leaf 1) (node (leaf 3) (leaf 2))))

3

This is a little cleaner. It got rid of the predicates and accessors, but there is still a little boilerplate with those lambdas. To fix this, we can go one step further and make a macro!

> (define-syntax-rule
    (bt-match bt [(left right) node-body ...] [(data) leaf-body ...])
    (bt-cases bt
              (lambda (left right) node-body ...)
              (lambda (data) leaf-body ...)))

> (define (bt-max bt)
    (bt-match bt
      [(left right) (max (bt-max left) (bt-max right))]
      [(data) data]))

> (bt-max (leaf 1))

1

> (bt-max (node (leaf 1) (node (leaf 3) (leaf 2))))

3

Very nice! We have a concise syntax for defining functions on binary trees. One limitation is that we can’t look deeper than one level into the data structure. If we were doing something like tree rotations, we’d need to look two levels into the structure, but these tools wouldn’t support that. Another limitation is that this only works for binary trees! Do we have to write a new macro like this every time we work with union data?

Let’s look at some examples with lists. First, let’s split a list into pairs:

> (define (to-pairs lst)
    (if (<= 2 (length lst))
        (cons (list (first lst) (second lst)) (to-pairs (rest (rest lst))))
        '()))

> (to-pairs '())

'()

> (to-pairs '(1))

'()

> (to-pairs '(1 2))

'((1 2))

> (to-pairs '(1 2 3))

'((1 2))

> (to-pairs '(1 2 3 4))

'((1 2) (3 4))

> (to-pairs '(1 2 3 4 5))

'((1 2) (3 4))

> (to-pairs '(1 2 3 4 5 6))

'((1 2) (3 4) (5 6))

Now let’s zip two lists together:

> (define (zip xs ys)
    (cond
      [(and (cons? xs) (cons? ys))
       (cons (list (first xs) (first ys)) (zip (rest xs) (rest ys)))]
      [else '()]))

> (zip '() '())

'()

> (zip '(1) '())

'()

> (zip '(1) '(a))

'((1 a))

> (zip '(1 2 3) '(a b c))

'((1 a) (2 b) (3 c))

> (zip '(1 2) '(a b c d))

'((1 a) (2 b))

There is a similar pattern here, but it’s more general: we have a few possible cases for the shape of our data. We check the possible cases and, based on the shape, extract pieces of our data and operate on them. This isn’t as straightforward to abstract as our very repetitive binary tree operations, though. There is still an abstraction to be made, but it’s a much more general and powerful one. This abstraction is, of course, pattern matching.

2 match

Here is how we would implement a tree operation with pattern matching:

> (define (bt-max bt)
    (match bt
      [(node left right) (max (bt-max left) (bt-max right))]
      [(leaf data) data]))

> (bt-max (leaf 1))

1

> (bt-max (node (leaf 1) (node (leaf 3) (leaf 2))))

3

It’s pretty much the same as our bt-match, except now we have to specify which constructor we’re matching in which case.

The general form for using match is (match val [pattern body] ...). Here, val is the value you’re destructuring, pattern specifies the shape of the data for that case and may bind variables to its fields, and body is the code to run, which can refer to those variables. A match form can have many cases. The first case with a pattern that matches the shape of the data binds the variables in its pattern to the corresponding pieces of val and runs that case’s body.

In this example, we have a node pattern which binds the subtrees to variables called left and right. This pattern matches when the value being matched is a node and matches the sub-patterns against the sub-trees. Variable patterns like left match any type of value and bind the value to that variable for use in the body.
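To make the “first matching case wins” rule concrete, here is a small sketch using Racket’s built-in match. The function describe is a made-up helper for illustration only; it is not part of the implementation we build later.

```racket
#lang racket

(struct node [left right] #:transparent)
(struct leaf [data] #:transparent)

;; describe is a hypothetical helper, used only to show case ordering.
(define (describe v)
  (match v
    ;; Tried first: matches only leaf structs and binds the field to data.
    [(leaf data) (format "leaf holding ~a" data)]
    ;; Tried second: matches only node structs.
    [(node left right) "a node"]
    ;; A variable pattern matches any value, so this case catches the rest.
    [other (format "not a tree: ~a" other)]))

(describe (leaf 5))                ; "leaf holding 5"
(describe (node (leaf 1) (leaf 2))) ; "a node"
(describe 42)                      ; "not a tree: 42"
```

If the catch-all case were listed first, it would match every value and the other two cases would never run. If no case matches at all, Racket’s match raises an error, which is one reason a catch-all case often goes last.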

Here is how we would implement the list operations with pattern matching:

> (define (to-pairs lst)
    (match lst
      [(cons x (cons y lst))
       (cons (list x y) (to-pairs lst))]
      [_ '()]))

> (define (zip xs ys)
    (match (list xs ys)
      [(list (cons x xs) (cons y ys))
       (cons (list x y) (zip xs ys))]
      [_ '()]))

In the to-pairs example, we first check for the case of a list with at least two elements by using two cons patterns. A cons pattern matches values that are cons pairs and matches the first sub-pattern against the car and the second against the cdr of the value.
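As a smaller illustration of a single cons pattern, here is a sketch that sums a list. sum-list is a hypothetical helper, and the quoted '() is a literal pattern that matches only the empty list.

```racket
#lang racket

;; sum-list is a hypothetical example, not part of the original post.
(define (sum-list lst)
  (match lst
    ;; x is bound to the car, more is bound to the cdr.
    [(cons x more) (+ x (sum-list more))]
    ;; The empty list is not a cons pair, so it needs its own case.
    ['() 0]))

(sum-list '(1 2 3)) ; 6
(sum-list '())      ; 0
```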

We also see the underscore pattern _, which matches against any value, like a variable pattern, and ignores the value. It is often used like else in a cond, but can also be used to ignore a field as a subpattern.
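Here is a quick sketch of the second use, ignoring a field with _ as a sub-pattern. It assumes the node and leaf structs from earlier; bt-leftmost is a made-up name for this illustration.

```racket
#lang racket

(struct node [left right] #:transparent)
(struct leaf [data] #:transparent)

;; bt-leftmost is a hypothetical helper: it returns the data in the
;; leftmost leaf and never needs the right subtree.
(define (bt-leftmost bt)
  (match bt
    ;; _ matches the right subtree but does not bind it to anything.
    [(node left _) (bt-leftmost left)]
    [(leaf data) data]))

(bt-leftmost (node (node (leaf 1) (leaf 2)) (leaf 3))) ; 1
(bt-leftmost (leaf 9))                                 ; 9
```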

The to-pairs example also uses a nested pattern: (cons x (cons y lst)). The pattern that matches the cdr of lst is itself another cons pattern. The ability to nest patterns like this allows us to check deeply into data structures, which is something we couldn’t do with our bt-match macro.
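The tree rotation mentioned in the motivation needs exactly this kind of two-level look. Here is a minimal sketch, assuming the node and leaf structs from earlier. bt-rotate-right is a hypothetical name, and since our trees only store data at the leaves, the rotation just regroups the subtrees.

```racket
#lang racket

(struct node [left right] #:transparent)
(struct leaf [data] #:transparent)

;; bt-rotate-right is a hypothetical helper for illustration.
(define (bt-rotate-right bt)
  (match bt
    ;; The nested node pattern requires the left child to be a node too.
    [(node (node a b) c) (node a (node b c))]
    ;; Anything else (a leaf, or a node whose left child is a leaf)
    ;; cannot be rotated this way, so return it unchanged.
    [_ bt]))

(bt-rotate-right (node (node (leaf 1) (leaf 2)) (leaf 3)))
; (node (leaf 1) (node (leaf 2) (leaf 3)))
(bt-rotate-right (node (leaf 1) (leaf 2)))
; (node (leaf 1) (leaf 2))
```

Writing this with predicates and accessors would take a node? check on bt, another on (node-left bt), and three accessor calls, while the pattern states the required shape directly.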

In the zip example, we’re matching against two values, xs and ys. A simple trick for doing this is creating a list with two values and matching the list. Here, we use the list pattern, which can take an arbitrary number of sub-patterns. Intuitively, it matches values which are lists with length equal to the number of sub-patterns and matches each element of the list against its corresponding sub-pattern. In this case, since we’re matching against the value (list xs ys), we use two sub-patterns, (cons x xs) and (cons y ys). If both values are conses, we pair up their first elements and cons that pair onto the result of zipping the rest. Otherwise, one of the lists must be empty, so we just return the empty list.
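To see the length requirement of the list pattern in isolation, here is a small sketch with Racket’s built-in match. describe-list is a hypothetical helper used only for this illustration.

```racket
#lang racket

;; describe-list is a hypothetical helper showing that a list pattern
;; only matches lists with exactly as many elements as sub-patterns.
(define (describe-list v)
  (match v
    [(list) 'empty]
    [(list a) (list 'one a)]
    [(list a b) (list 'two a b)]
    ;; Longer lists and non-lists all end up here.
    [_ 'other]))

(describe-list '())        ; 'empty
(describe-list '(1))       ; '(one 1)
(describe-list '(1 2))     ; '(two 1 2)
(describe-list '(1 2 3))   ; 'other
(describe-list (cons 1 2)) ; 'other, since an improper pair is not a list
```

This is why (list (cons x xs) (cons y ys)) is safe in zip: the value being matched is always a two-element list, so only the two cons sub-patterns decide whether the case matches.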

Pattern matching is very powerful. We can check the shape of our data structures, reach deeply into them, check multiple cases, and even perform case analysis on multiple values at once.

Let’s do one more example just for fun. Let’s take in a list of left/right steps representing a path in a binary tree and get the data at the leaf specified by the path:

> (define (bt-get bt path)
    (match (list bt path)
      [(list (node bt _) (cons 'left path))
       (bt-get bt path)]
      [(list (node _ bt) (cons 'right path))
       (bt-get bt path)]
      [(list (leaf data) '())
       data]
      [(list _ '())
       (error 'bt-get "path too short")]
      [(list _ (cons _ _))
       (error 'bt-get "path too long")]))
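To see each case fire, here is a sketch that runs bt-get on a small tree. The definition is copied from above so the block runs on its own. The names tree and try-get are made up for this illustration; try-get just returns the error message instead of stopping the program.

```racket
#lang racket

(struct node [left right] #:transparent)
(struct leaf [data] #:transparent)

;; Copied from the definition above.
(define (bt-get bt path)
  (match (list bt path)
    [(list (node bt _) (cons 'left path))
     (bt-get bt path)]
    [(list (node _ bt) (cons 'right path))
     (bt-get bt path)]
    [(list (leaf data) '())
     data]
    [(list _ '())
     (error 'bt-get "path too short")]
    [(list _ (cons _ _))
     (error 'bt-get "path too long")]))

;; Hypothetical sample tree and wrapper, for illustration only.
(define tree (node (leaf 1) (node (leaf 3) (leaf 2))))

(define (try-get bt path)
  (with-handlers ([exn:fail? exn-message])
    (bt-get bt path)))

(try-get tree '(left))            ; 1
(try-get tree '(right left))      ; 3
(try-get tree '(right))           ; "bt-get: path too short"
(try-get tree '(left left))       ; "bt-get: path too long"
```

One thing to notice: a step that is neither 'left nor 'right at a node also falls through to the last case, so it is reported as "path too long" as well.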