Reasoning about higher-order program transformation?

Hello !

I am trying to formalize a program transformation which consists of lifting
“ifs” to the top-level of the term. The natural way to express this
transformation is to use a higher-order function taking a continuation as a
second argument (see function phi below).

However, this definition is hard to reason about due to the higher-order argument.
For example, I would like to prove Lemma A and Lemma B below.

I tried “defunctionalizing” the transformation, defining an inductive type for the different
continuations. However, I cannot get around the structural recursion check in Coq.

I feel this kind of problem must have a standard solution, but I’m stuck so far.

Any help or pointers to possible approaches would be appreciated!

Thanks!
Nicolas

Inductive exp :=
| CONST : nat -> exp
| IFV : exp -> exp -> exp -> exp
| ADD : exp -> exp -> exp.

(* Semantics *)

Fixpoint eval (e : exp) : nat :=
  match e with
  | CONST n => n
  | IFV e1 e2 e3 =>
    match eval e1 with
    | S _ => eval e2
    | O => eval e3
    end
  | ADD e1 e2 =>
    eval e1 + eval e2
  end.

(* Lift IFV to the top-level, duplicating the continuation *)

Fixpoint phi e k {struct e} :=
  match e with
  | CONST n => k (CONST n)
  | IFV e1 e2 e3 =>
    phi e1 (fun e1 => IFV e1 (phi e2 k) (phi e3 k))
  | ADD e1 e2 =>
    phi e1 (fun e1 => phi e2 (fun e2 => k (ADD e1 e2)))
  end.

(* The program transformation [phi] should preserve the semantics. *)

Lemma A: forall e, eval (phi e (fun x => x)) = eval e.
Abort.

(* The above is not general enough for induction on [e], so instead I try to
   prove [Lemma B] below. As stated, [B] is false because [k]
   can "inspect" its argument and do something different depending on its shape. *)

Lemma B: forall e k, eval (phi e k) = eval (k e).
Abort.

(* "Defunctionalization" : a datatype for the possible continuations *)

Inductive cont :=
| Kid : cont
| Kifv : exp -> exp -> cont
| Kadd1 : exp (* "suspended" e2 *) -> cont -> cont
| Kadd2 : exp (* "lifted" e1 *) -> cont -> cont.

(* The following does not work *)

Fixpoint apply k e :=
  match k with
  | Kid => e
  | Kifv e2 e3 => IFV e e2 e3
  | Kadd1 e2 k => phik e2 (Kadd2 e k)
  | Kadd2 e1 k => apply k (ADD e1 e)
  end

with phik e (k : cont) :=
  match e with
  | CONST n => apply k (CONST n)
  | IFV e1 e2 e3 => phik e1 (Kifv (phik e2 k) (phik e3 k))
  | ADD e1 e2 => phik e1 (Kadd1 e2 k)
  end.

A common way to phrase inductive lemmas for functions in CPS is to associate, to the continuation k, a hypothesis of the form H : forall r, Q r -> P (k r), where P is mostly arbitrary (because k is whatever the caller chooses), and Q describes the results k is applied to. Q is the post-condition of your program phi e, expected before calling the continuation, and P is the post-condition of the continuation.

(* Common form of specification for phi, a function in CPS. For some fixed Q: *)
Lemmas my_lemma : forall e k P, (forall r, Q e r -> P (k r)) -> P (phi e k).

In this case, phi e k intends to apply the continuation k to a subexpression that’s equivalent to e (also a bunch more which will be ignored at run-time), so that’s our Q. As it turns out, the post-condition P must be semantic, so it is composed with eval below, that’s why it has type nat -> Prop instead of exp -> Prop that one might try naively.

Lemma A' : forall e k (P : nat -> Prop),
  (forall e', eval e' = eval e -> P (eval (k e'))) -> P (eval (phi e k)).
Proof.
  intros e; induction e; cbn; intros.
  - apply H. reflexivity.
  - apply IHe1; intros; cbn. rewrite H0.
    destruct (eval e1) eqn:E1.
    + apply IHe3. auto.
    + apply IHe2. auto.
  - apply IHe1; intros; cbn. apply IHe2; intros; cbn.
    apply H; cbn. congruence.
Qed.

Lemma A: forall e, eval (phi e (fun x => x)) = eval e.
Proof.
  intros e; apply A'. auto.
Qed.

Lemma B: forall e k, (forall e', eval e' = eval e -> eval (k e') = eval (k e)) -> eval (phi e k) = eval (k e).
Proof.
  intros e k; apply A'; auto.
Qed.

A more syntactic alternative to CPS is to break phi into two steps conceptually: (1) construct a tree whose leaves are where the continuation needs to be applied; (2) apply the continuation. In this case, (1) is arguably what phi is all about, and its implementation happens to interleave (1) (phi) and (2) (subst).

Fixpoint subst (e : exp) (k : exp -> exp) : exp :=
  match e with
  | IFV e1 e2 e3 => IFV e1 (subst e2 k) (subst e3 k)
  | e => k e
  end.

Fixpoint phik (e : exp) : exp :=
  match e with
  | CONST n => CONST n
  | IFV e1 e2 e3 => subst (phik e1) (fun e1 => IFV e1 (phik e2) (phik e3))
  | ADD e1 e2 =>
    subst (phik e1) (fun e1 =>
    subst (phik e2) (fun e2 =>
    ADD e1 e2))
  end.

It makes the correctness proof somewhat easier, since you are going to prove simply that eval e = eval (phik e), by induction on e. (However it requires some auxiliary lemmas about subst that involve additional uses of induction.)


If you squint, or make the invariants more explicit in the types, you can spot a monad in all of this (cf bind), as they have a lot to do with continuations:

Set Implicit Arguments.

Inductive aexp : Type :=
| AAdd : aexp -> aexp -> aexp
| AConst : nat -> aexp
.

Inductive bexp (A : Type) : Type :=
| If : aexp -> bexp A -> bexp A -> bexp A
| Const : A -> bexp A
.

Fixpoint bind {A B} (e : bexp A) (k : A -> bexp B) : bexp B :=
  match e with
  | If e1 e2 e3 => If e1 (bind e2 k) (bind e3 k)
  | Const a => k a
  end.

Fixpoint phik (e : exp) : bexp aexp :=
  match e with
  | CONST n => Const (AConst n)
  | IFV e1 e2 e3 => bind (phik e1) (fun e1 => If e1 (phik e2) (phik e3))
  | ADD e1 e2 =>
    bind (phik e1) (fun e1 =>
    bind (phik e2) (fun e2 =>
    Const (AAdd e1 e2)))
  end.
1 Like

Thanks for the insighful answer @Lyxia! I have two follow-up questions:

Wouldn’t this approach just shift the “continuation induction” stuff that you explained above to the auxiliary lemmas about subst?

Nice. Assuming one switches to using this monad, would that simplify the proof or one would have to perform “continuation induction” as above but for bind this time?

Thanks!
Nicolás

In a way this is only shifting the burden of proof around without truly reducing it, but once you spot subst (aka bind), one idea is that subst and evalb are both examples of “folds” over expressions, and that enables an equational flavor of reasoning (see also Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire), as opposed to reasoning about arbitrary continuations.

Fixpoint fold {A R} (f : aexp -> R -> R -> R) (g : A -> R) (e : bexp A) : R :=
  match e with
  | If e1 e2 e3 => f e1 (fold f g e2) (fold f g e3)
  | Const a => g a
  end.

fold satisfies the following equation, when f/f' and g/g' are suitably related by h:

    fold f' g' e = h (fold f g e).

Full version:

Lemma fold_mor {A R R'} (f : aexp -> R -> R -> R) (f' : aexp -> R' -> R' -> R') (g : A -> R) (g' : A -> R') (h : R -> R') (e : bexp A)
  : (forall a x1 x2, f' a (h x1) (h x2) = h (f a x1 x2)) ->
    (forall x, g' x = h (g x)) ->
    fold f' g' e = h (fold f g e).
Proof.
  intros H1 H2; induction e; cbn; congruence.
Qed.

This is “just” a repackaging of induction on bexp, but where you don’t have to worry about induction hypotheses. Case in point, in the final theorem evalb (phik e) = eval e, the induction hypotheses can be used immediately and the rest of the proof is pure equational reasoning about evalb and subst using the above lemma: link to code.