What may not be clear to readers in a lot of the previous discussions is the use of Algebraic Data Types in combination with patterm matching to define functions. It’s really quite simple, conceptually (implementation may be a different matter, we’ll see.) Here’s an example we’ve seen before, I’ll just be more descriptive this time:
1 |
typedef list(t) { cons(t, list(t)) | null } |
This declaration achieves two things:
- It defines a type
list(t) (
list of
t) where t is a type variable that can stand for any type. - It creates two constructor functions, called cons and null, that accept arguments of the specified types (none in the case of null,) and return data of type list(t).
Reading it aloud, it says define a type
which is either a list
of some unspecified type t
cons
of a
t and a
list of
t, or a
null.
Once defined, we use these type costructors to create lists of a concrete type:
1 2 3 |
def a = cons(true, cons(false, cons(false, null))); |
After the above definition, a
has type
list(bool). The following, on the other hand, would fail to type check:
1 |
def a = cons(true, cons('x', null)); |
It fails because:
- cons('x', null) is of type list(char) .
- The outer
cons
expects arguments <t> and list(<t>) , but it gets bool and list(char) . - The outer
cons
cannot reconcile <t> = bool with <t> = char so the type check fails.
That’s all very nice, but how can we use Algeraic Data Types? It turns out that they become very useful in combination with pattern matching in case statements. Consider:
1 2 3 4 5 6 7 8 9 10 |
fn length(a) { switch(a) { case (cons(head, tail)) { 1 + length(tail) } case (null) { 0 } } } |
In that case statement, a
must match either
cons(head, tail) or
null. Now if it matches
cons(head, tail), the (normal) variables
head and
tail are automatically created and instantiated as the relevant components of the
cons in the body of the case statement. This kind of behaviour is so commonplace in languages like ML that special syntax for functions has evolved, which I’m borrowing for F♮:
1 2 3 4 5 6 7 8 |
fn length { (cons(head, tail)) { 1 + length(tail) } (null) { 0 } } |
This version of length, instead of having a single formal argument list outside the body, has alternative formal argument lists inside the body, with mini bodies of their own, just like a case statement. It’s functionally identical to the previous version, but a good deal more concise and readable.
One thing to bear in mind, in both versions, is that length has type list(t) → int. That is to say, each of the formal argument lists inside the body of a function, or the alternative cases in a case statement, must agree in the number and types of the arguments, and must return the same type of result.
Now, it becomes obvious that, just as we can rewrite a let to be a lambda, this case statement is in fact just syntactic sugar for an anonymous function call. The earlier definition of length above, using a case statement, can be re-written as:
1 2 3 4 5 6 7 8 9 10 |
fn length(a) { fn { (cons(head, tail)) { 1 + length(tail) } (null) { 0 } }(a) } |
so we get case statements: powerful, pattern matching ones, allowing more than one argument, for free if we take this approach.
length is polymorphic. It does not do anything to the value of head
so does not care about its type. Therefore the type of length
, namely
list(t) → int actually contains a type variable
t.
Here’s a function that does care about the type of the list:
1 2 3 4 5 6 7 8 |
fn sum_strlen { (cons(head, tail)) { strlen(head) + sum_strlen(tail) } (null) { 0 } } |
Assuming strlen
has type
string → int, that would constrain
sum_strlen to have type
list(string) → int. Of course that’s a rather silly function, we would be better passing in a function like this:
1 2 3 4 5 6 7 8 |
fn sum { (fn, cons(h, t)) { fn(h) + sum(fn, tail) } (fn, null) { 0 } } |
That would give sum
a type:
1 |
(t -> int) -> list(t) -> int |
and we could call it like:
1 2 3 4 |
def a = cons("hello", cons(" ", cons("there", null))); def len = sum(strlen, a); |
or even, with a Curried application:
1 |
def sum_strlen = sum(strlen); |
This is starting to look like map-reduce. More on that later.
Real-World Applications
Algebraic Data Types really come in to their own when it comes to tree walking. Consider the following definitions:
1 2 3 4 5 6 7 |
typedef expr(t) { value(t) | add(expr(t), expr(t)) | sub(expr(t), expr(t)) | mul(expr(t), expr(t)) | div(expr(t), expr(t)) } |
Given that, we can write an evaluator for arithmetic expressions very easily:
1 2 3 4 5 6 7 |
fn eval { (value(i)) { i } (add(l, r)) { eval(l) + eval(r) } (sub(l, r)) { eval(l) - eval(r) } (mul(l, r)) { eval(l) * eval(r) } (div(l, r)) { eval(l) / eval(r) } } |
So eval
has type
expr(int) → int . We can call it like:
1 2 3 |
eval(add(value(2), mul(value(3), value(5)))); |
to get 17.
Pattern matching not only covers variables and type constructors, it can also cope with constants. For example here’s a definition offactorial
:
1 2 3 4 |
fn factorial { (0) { 1 } (n) { n * factorial(n - 1) } } |
For this and other examples to work, the cases must be checked in order and the first case that matches is selected. so the argument to factorial would only match n if it failed to match .
As another example, here’s member:
1 2 3 4 5 |
fn member { (item, item @ tail) { true } (item, _ @ tail) { member(item, tail) } (item, []) { false } } |
Here I’m using F♮’s built-in list type constructors
@, (pronounced cons
,) and
[] (pronounced null
,) and a wildcard
_ to indicate a don’t care
variable that always unifies, but apart from that it’s just the same as the
cons and
null constructors. Anyway, the cases say:
- member(item, list) is true if item is at the head of the list.
-
member(item, list) is
true if
item
is amember
of the tail of the list. - item is not a member of the empty list.
Problems and Solutions
You’ve probably realised that given a type like list(t) above, it’s not possible to directly create lists of mixed type. That is because it is usually a very bad idea to do so. However if you need to do so, you can get around the restriction without breaking any rules, as follows:
- Create a container type for your mixed types:
1typedef either(t, u) { first(t) | second(u) } - Create lists of that type:
1def a = [ first("a string") , second(20) ]
After the above definition, a
has type list(either(string, int))
, and you can’t get at the data without knowing its type:
1 2 3 4 5 |
fn sum_numbers { (first(_) @ tail) { sum_numbers(tail) } (second(n) @ tail) { n + sum_numbers(tail) } ([]) { 0 } } |
Here, sum_numbers has type [either(<t>, int)] → int. e.g. it doesn’t care what type first holds. We could have written first(s) instead of first(_), but the use of a wildcard _explicitly says we don’t care, stops any potential warnings about unused variables, and is more efficient.