Announcing Graph2Tac, a prover based on Tactician's new API

We are pleased to announce Tactician’s API, a new AI interface for theorem proving, building on Tactician. This includes a new graph-based dataset of over 520k definitions (of which 260k are theorems) in 120 Coq packages, one of the largest datasets for AI interactive theorem proving. We also present a new state-of-the-art neural theorem proving solver Graph2Tac, designed for proving theorems in Coq projects with new definitions not seen during training.

The main contributions in this work are as follows:

  1. A novel method of calculating an internal representation of definitions and theorems, giving Graph2Tac a deeper semantic understanding of a proof state and which lemmas are appropriate for it. Graph2Tac can create representations of objects that were not seen during training, allowing it to perform well even on new Coq projects.
  2. One of the most comprehensive studies yet of various AI methods in interactive theorem proving including k-NN solvers, transformers, graph-based models, and hammers.
  3. An interface to Coq making it possible to train and connect your own custom machine learning models.
  4. A benchmarking system making it easy to give an apples-to-apples comparison to our work.

For Coq users, our neural models Graph2Tac and Text2Tac, are available as part of Tactician and can be run on a laptop (no GPU required). One can use Tactician’s Suggest command to suggest tactics, and synth tactic to find a complete proof of a proof state.

The details are spelled out in these three papers:

Dataset: Our dataset, which can be explored online, faithfully represents the internal representation of Coq’s universe of mathematical knowledge as a single interconnected graph. The visualization includes hierarchies of modules, global context information, definitions, tactical proofs, and tactic proof transformations.

The dataset contains two representations. The text-based, human-readable representation is useful for training language models. The graph-based representation is designed such that two terms are alpha-equivalent terms if and only if the forward closure of their graphs is equivalent (bisimilar). This allows us to merge alpha-equivalent subterms, heavily compressing the dataset. Having a densely connected graph makes it ideal for graph-based machine learning models. To support this term-sharing, we introduce a novel graph-sharing algorithm with O(n log n) complexity.

Graph2Tac: In practical AI theorem proving, one of the main challenges is handling new definitions and theorems not seen during training. We want a model which can work online, adapting itself to users’ new projects in real time, so we train on one set of projects and test on another set never seen during training. Our novel neural theorem proving architecture, Graph2Tac, adds a new definition task mechanism that improves theorems solved from 17.4% to 26.1%. For example, even though our model has never before seen the Poltac package, it can solve 86.7% of Poltac theorems, more than any other Coq theorem prover in the literature, including ProverBot9001 and CoqHammer.

Our definition task works by learning an embedding for each definition and theorem in the training set, and then simultaneously training a model to predict those learned embeddings. At inference time, when we encounter a new definition not seen during training, we use this definition task to compute its embedding directly.

Our work contains one of the most extensive comparisons with other proving methods, including CoqHammer, a baseline transformer, and Tactician’s built-in k-NN model. The transformer model is similar to those used in GPT-f, PACT, Lisa, etc. The k-NN model is also an online model, learning from proofs or recent theorems, and is actually still an excellent model for short time periods, say one minute, whereas Graph2Tac excels at the longer 10-minute mark. Appendix H of our paper also provides an informal comparison with Proverbot9001 and CoqGym family of solvers. We hope these comparisons will provide a lot of discussion and move the field forward.

Tools for AI research We provide a lot of tools for AI researchers who would like to compare with or build on our results and for Coq developers who would like to build practical tools for Coq users.

  • Interaction protocols: External agents can interface with Coq, either by providing tactic predictions for Tactician’s search procedure for the synth tactic, or by exploring the proof tree themselves through the Tactician Explore command. Agents receive a full copy of Coq’s internal kernel knowledge, which they can utilize to make decisions.

  • PyTactician: A Python library to facilitate reading the dataset and interface with Coq.

  • Benchmarking: Tools to evaluate the proving strength of agents on arbitrary Opam packages. Benchmarks can be run on a laptop, a high-powered server and even massively parallelized on a High Performance Computing (HPC) cluster.

We would love to receive feedback both from Coq users and AI researchers, including possible future collaborators!

Lasse Blaauwbroek, Jason Rute, Miroslav Olšák, Fidel Ivan Schaposnik Massolo, Jelle Piepenbrock, Vasily Pestun

3 Likes