Homotopy type theory (HoTT) and Cubical type theory (CubicalTT) have been used to formalize foundations of mathematics, and have inspired extensions to Coq such as SProp.
However, demonstrated applications of HoTT and CubicalTT in regular verification tasks, such as for program verification, are to my knowledge few (e.g., SQL rewriting, patch theory, encoding pi-calculus, and probabilistic programming), and the general extent to which univalence increases the productivity of proof engineers is unknown.
Based on a discussion with @spitters on Coq’s Gitter, I would like to raise the issue of finding relevant benchmarks and case studies for comparing verification using vanilla Coq (pCUIC), HoTT for Coq, and Cubical Agda.
Interesting hard measures to compare include:
- lines of spec code
- lines of proof script code
- effort to complete in person hours
Softer measures such as reusability, comprehensibility, extendability could also be considered.
Here are some candidates for benchmarks:
- correctness of abstract data type implementations and algorithms from Okasaki’s book Purely Functional Data Structures (several of these have already been done for HOL4 in CakeML)
- definitions and (category) theory from the book Algebra of Programming by Bird and de Moor (some of this has already been done in plain Agda)
- metatheory and examples for some powerful modal logic, such as the modal ÎĽ-calculus, e.g., as presented by Bradfield and Stirling
The comparison approach could be similar to a code golfing competition, i.e., the goal is to get definitions and proofs that are as elegant and short as possible.
The literature contains many comparisons of different systems for specific verification tasks that can provide inspiration, e.g., correctness of a graph algorithm by Tarjan in Coq, Why3, and Isabelle/HOL.