# PHOAS encoding of pattern matching notations

Hello everyone,

Today I was inspired by this comment of @herbelin to try and embed a programming language that supports pattern matching of product types in Coq using the PHOAS method. And of course, this needs to come with some reasonable notations.

So, I’m starting with a simple PHOAS encoding of lambda calculus with products types, that can be pattern matched in `Lam`:

``````From Ltac2 Require Import Ltac2.
Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint p2p p : Type :=
match p with
| leaf      => X
| ppair l r => p2p l * p2p r
end.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).
``````

Now I want to create some notations for this, such as `[λ (a, (b, c)), a b]`. This notation should elaborate to `Lam (ppair leaf (ppair leaf leaf)) (fun '(a, (b, c)) => a b)`. This is a bit of a problem though, because the `(a , (b, c))` part of the notation needs to generate both `ppair leaf (ppair leaf leaf)` and the real Coq pattern `'(a, (b, c))`. This is problematic, because `pattern`s need to be applicative in Coq, and cannot be computed within the notation. I came up with a solution using Ltac that converts a nested products to `prod_pattern`:

``````Ltac to_pattern x :=
match x with
| (?x , ?y) => let a := to_pattern x in
let b := to_pattern y in
constr:(ppair a b)
| _ => constr:(leaf)
end.

Notation "[ 'λ' pat , k ]" :=
(Lam ltac:(let r := to_pattern pat in exact r) (fun pat => k))
(at level 0, pat pattern, only parsing).
``````

Crazily, this actually almost works! The only problem is that when the `ltac` part gets evaluated, the system tries to resolve the `binder`s in the pattern, and errors out:
`Definition test := [λ (a, (b, c)) , App (Var a) (Var b)]. (* The variable a was not found in the current environment. *)`
This can be solved by introducing some bogus variables:

``````Section test.
Variable a b c d e f : X.
Definition test := [λ (a, (b, c)) , App (Var a) (Var b)].
End test.
Print test. (* test = Lam (ppair leaf (ppair leaf leaf)) (fun '(a, (b, _)) => App (Var a) (Var b)) : term *)
``````

I tried if Ltac2 may be better suited for this problem because it appears to be working on preterms, but something as simple as the following already errors out:

``````Notation "[[ 'λ' pat , k ]]" :=
(Lam ltac2:(exact leaf) (fun pat => k))
(at level 0, pat pattern, only parsing).
Check [[λ a, Var a]]. (* Missing notation term for variable k, probably an ill-typed expression *)
``````

Does anybody have a better solution for this problem?

You could use canonical structures to reify the pattern:

``````Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Record R := { pat : prod_pattern; p2p : Type }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Canonical Structure Rppair l r :=
{| pat := ppair (pat l) (pat r); p2p := (p2p l * p2p r) |}.
Canonical Structure Rleaf :=
{| pat := leaf; p2p := X |}.

Notation "[ 'λ' pat , k ]" :=
(Lam _ (fun pat => k))
(at level 0, pat pattern).

Definition test := [λ (a, (b, c)) , App (Var a) (App (Var c) (Var b))].
``````

Note that it works for printing too.

There is still a limitation though: all variables have to be bound in the right-hand side otherwise, it fails to infer that variables have type `X`. Maybe experts in canonical structures have a solution though.

Edit: I renamed `interp` into `p2p`.

1 Like

This is a very intriguing problem…

The problem with the CS-based approach seems that Coq can’t infer the type of the unused variables. Sadly, patterns don’t support casts (like `(a : X, b)`), but there is a hacky way to force Coq to unify a variable with a type. However, this intrudes some clutter in the notation…

``````Definition K : X -> term -> term := fun x y => y.

(** Casts [x] to the type [X] *)
Notation "'castvar' x 'in' y" :=
(ltac:(match type of (K x y) with
| _ => exact y
end))
(at level 0, only parsing, y at level 200).

Check fun '(x,y) => castvar x in y.
(* fun '(_, y) => y *)
(*      : X * term -> term *)

Definition ex :=
[λ (a, (b, c)), castvar c in App (Var a) (Var b)].
Print ex.
(* ex = [ λ (a, (b, _)), App (Var a) (Var b)] *)
(*      : term *)
``````
2 Likes

Ah, finally a reason for me to learn how Canonical Structures work! Normally I tend to use typeclasses. Apart from the problems pointed out, there is another issue. The way `R` is stated now, means that one cannot prove anything about the `f` parameter of `Lam`. This is because there are no (provable) restrictions on its domain. We can fix this using a correctness predicate.

``````Require Import Program.
Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint interp p : Type :=
match p with
| leaf      => X
| ppair l r => interp l * interp r
end.

Record R := { pat : prod_pattern; p2p : Type; correct : interp pat = p2p }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Program Canonical Structure Rppair l r :=
{| pat := ppair (pat l) (pat r); p2p := (p2p l * p2p r) |}.
Next Obligation.
destruct l, r; f_equal; cbn; assumption.
Defined.
Program Canonical Structure Rleaf :=
{| pat := leaf; p2p := X |}.

Notation "[ 'λ' pat , k ]" :=
(Lam _ (fun pat => k))
(at level 0, pat pattern).

End term.
``````

I suspect that it may be possible to simplify this further. I’ll have to take another look tomorrow. Thanks!

1 Like

Actually, this enters in the general category of “unification hints”. There exists Coq support for inverting equations of the form `proj ?x = value` (for `?x` of type record), but there could as well be a support for inverting recursive functions, i.e. for inverting equations of the form `interp ?x = value` (referring to the recursive function `interp` of your last post).

So I did some reading on this, this and the paper suggested by @herbelin, and I think I have come up with a real Frankenstein solution that does not require mentioning names like @mwuttke97’s solution needs .

There are two problems with the canonical structures solution. (1) It cannot infer un-named variables. (2) Until now, we have done all notations within the `term` section, but outside the section terms actually need to have the type `forall X, term X`. So the notation have to generate this too.

For problem (2) it makes sense to use a typeclass `Class var := { X : Type }` to infer the type `X`. For (1) after reading the suggestion in the second paper for a `failsafe` canonical structure, and the option to combine typeclasses and canonical structures, I came up with a hybrid solution where the base case is solved by an instance and the recursive cases are solved by a canonical structure.

There are now two issues left

1. The generated terms are a bit messy because they contain proofs. Not the end of the world though.
2. I have not yet managed to convince Coq to print the notations.

Any suggestions to improve this further?

``````Require Import Program.
Section term.

Context {X : Type}.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint interp p : Type :=
match p with
| leaf      => X
| ppair l r => interp l * interp r
end.

Record R := { pat : prod_pattern; p2p : Type; _ : interp pat = p2p }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Program Canonical Structure Rppair  l r :=
Build_R (ppair (pat l) (pat r)) (p2p l * p2p r) _.
Next Obligation.
destruct l as [? ? <-], r as [? ? <-]. reflexivity.
Qed.

End term.
Arguments term X : clear implicits.

Class var := { X : Type }.
Existing Class R.
Instance Rleaf `(var) : R := Build_R leaf X eq_refl.

Declare Custom Entry dsl.
Notation "[ t ]" := (fun `(var) => t) (t custom dsl).
Notation "x" := (Var x) (in custom dsl, x ident).
Notation "( x )" := x (in custom dsl).
Notation "x y" := (App x y) (in custom dsl at level 0).
Notation "'λ' pat , k" :=
(@Lam X _ (fun pat => k))
(in custom dsl at level 0, pat pattern, k at level 0).
Notation "'λ' pat , k" :=
(Lam _ (fun pat => k))
(in custom dsl at level 0, pat pattern, k at level 0, only printing).

Definition test := [λ (a, (b, _)) , a (a b) b ((b) b) a ].
Print test.
Eval compute in test.
``````
1 Like

By the way, has any work been done on these unification hints in practice, or is this just a theoretical exercise at the moment?