Machine learning and hammers for Coq

CoqHammer has some limited support for MathComp and proofs by reflection out of the box, see the website. However, for arithmetic goals, it’s possible to escape from the MathComp world via mczify. The idea is to first canonize a MathComp arithmetic goal via the zify tactic, and then apply the hammer tactic on the resulting goal (assuming lia with mczify doesn’t do the job directly).

2 Likes

Thank you very much! :slight_smile: thank you @spitters for your response as well. :slight_smile:

note CoqGym has it’s own discussion forum (and gitissues), that might be a better place to direct your questions about CoqGym and related topics to that project: https://github.com/princeton-vl/CoqGym/discussions

1 Like

Hey all, Alex Sanchez-Stern here, author of Proverbot9001 (which @palmskog graciously mentioned earlier in the thread). My team just put together an extended evaluation which compares our work to CoqGym’s model (ASTactic) and another more recent system by some folks from UMass Amherst (shout out to Emily First, Yuriy Brun, and Arjun Guha), TacTok. You can check it out here. Gist of it is, on the CoqGym benchmarks, we can automatically produce significantly more proofs than both tools combined!

More specifically for the topic of conversation here, our system is built without modifying Coq at all, and currently works out-of-the-box for Coq 8.9, 8.10, 8.11, and 8.12. We do this through a custom python module which launches a Serapi instance, and communicates to it through standard in and out. This has the advantage of allowing us to work on many Coq versions, and not requiring any custom building, but it can be kind of brittle at times; support for a more stable python API would be awesome, but I’m also happy to help anyone use/extend the one we’ve built.

Very nice!
The CoqGym was a bit restricted due to the problems mentioned here. Has this been addressed?

Also, you’re comment ““coquelicot” Failed (needs Coq 8.7)” seems surprising.

Yeah, I was a little surprised too, but their webpage says that 8.7 is the latest supported version, and I wasn’t able to get it to build on 8.9/8.10 at all.

As for the CoqGym issues stated, I’m not sure if we’ve addressed them, but I think that Proverbot9001 has a different approach to extracting data which might make those issues moot, or at least less of a problem. With CoqGym, there is an instrumented version of Coq, run by the authors on the projects to be used, producing a canonical set of json data. Users are expected to download and work with this data, so it’s useful for it to have a direct connection to the Coq source from which it was extracted. With Proverbot9001 however, there is no canonical extracted dataset; Proverbot9001 runs directly on uninstrumented Coq, and the user can extract data directly from any Coq source file that they can build (command line options specify a prelude directory where the _CoqProject file is sourced). Therefore, when they have extracted data, it should correspond to source files that they have directly in front of them, so there is less of a need for some other direct connection.

However, I imagine that some users would want to be able to emulate the CoqGym approach in the future, so if someone ends up doing that, and wants the ability to embed metadata in the extracted proofs which points to a git repo, I’m happy to add that support.

The coquelicot website is outdated:
http://coq.io/opam/coq-coquelicot.3.2.0.html

The observation about CoqGym is more that their database is skewed, and may not be a good benchmark.

1 Like

As pointed out by Bas, the website is outdated. (That is kind of the issue with institutional websites on which you do not have complete control.) That said, if you actually follow the link “download the latest version”, you will get Coquelicot 3.2.0, which supports all versions of Coq from 8.8 to now.

If you need a version of Coquelicot that supports Coq 8.7, you can use Coquelicot 3.0.2 (or 3.0.1). If you are using Opam, it will automatically take care of choosing a version of Coquelicot compatible with your version Coq.

Ah, great! I’ll use the source from the opam page to build, and update the results graph webpage when it finishes running.

Hi! I am curious, what are the issues? sorry I’m not a coq expert so I don’t fully understand it. Is it a coq issue or a coqgym issue?

(is it this link? Generated JSON data does not contain any link to the sources of the packages. · Issue #804 · coq/opam · GitHub)

It was a database issue. Coqgym pulled the urls from the wrong fields, thus leaving out many big libraries.
The databases have changed now; see above, but I don’t think coqgym has been updated to reflect that, which is a pity.

Thanks to the helpful comments here, I was able to get coquelicot running as part of the evaluation; I’ve updated the comparison page here to include coquelicot 3.2.0

I would be very interested what your tool does on the corn and math-classes libraries. They are very big complex libraries that were not included in CoqGym because of the indexing issue stated above.

@HazardousPeach Hi Alex! trying to understand this data extraction from coq…prover9001 vs CoqGym data extraction…but the CoqGym author mentioned he instrumented Coq because he needed to get the local context & goals (which I also need). So how did you get that without instrumenting Coq?

Also, why is instrumenting Coq so bad? Mainly due to it pinning it to a version…right?

Thanks in advance!

Hi! To your first question, I’m not sure why the CoqGym author
said they needed instrumentation to get local context and
goals. You can run a Coq file using one of the Coq interfaces
like coqtop or coq-serapi, and observe the state after each
command, including all goals and context.

It might be the case that they wanted to observe the state more
frequently than once-per-command. For instance, if you run a command
with semicolons for sequencing, like induction; simpl, you might
want to get the state after the induction but before the simpl. In
Proverbot, this is handled with a linearizing preprocessor which
unfolds semicolons wherever possible (because of propagation of
existential variables, this is sometimes not possible). But I don’t
recall any reference to that sort of thing in their paper.

I published my code for interacting with Coq through coq-serapi here:
https://github.com/HazardousPeach/coq_serapy . I haven’t included too
much documentation yet, but there’s a simple example that allows you
to step through a Coq proof in Python, and view each context. If you
find it useful, or have any questions, please get in touch!

As for the downsides of instrumenting Coq, I suppose it isn’t really
that bad. Pinning to a version isn’t desirable, but there isn’t a
totally stable api on coq-serapi or coqtop either, so you end up
requiring a version or range of versions in your interfacing code
anyway. Other than that, I’ve found that using a version of Coq that
isn’t the one in Opam can make for annoying development, making it
hard to make sure your path variable is properly synced, and I imagine
it’s even worse if you decide later you want to pull in coq libraries
(since you can’t use opam to manage them).

1 Like

For sure! Will let you know. Thanks for your kind offer :slight_smile:

I still need to decide what benchmarks I want to use. It seems there is a clear line of work on CoqGym i.e. easy to compare results with previous work so that attracted me to CoqGym…plus it’s the most popular? Also, it seemed a good size with good diverse data. But I don’t see a tone of enthusiasm for it and trying to understand the pros & cons of each option offered.

It’s useful to know the version pinning that coqtop & coq-serapi does. I would have assumed honestly that coqtop would have allowed for version changing but it does not – weird. Why is that? doesn’t it just return strings from Coq?

Ah, luckily, the choice of benchmark is orthogonal to the choice of instrumenting vs parsing serapi output.

CoqGym as a benchmark set is great IMO. While Proverbot9001 primarily evaluated on CompCert as one large project, the evaluation on CoqGym benchmarks really helped with comparing; I’m actually primarily using CoqGym in my new work. I’m not sure really what the downsides are, I think it’s primarily been used as a benchmark within the ASTactic framework, which has it’s own pros and cons, but purely as a benchmark set I don’t see anything else that compares.

As for the version thing, I guess I don’t know coqtop that well, it could be that it’s totally stable across versions. I’ve been primarily working with coq-serapi, where the same information is returned but wrapped in a bunch of metadata about the state machine, which appears to have changed between 8.8 and 8.12. I guess I just assumed small formatting things would differ in coqtop too, but maybe not.

Curious, (will number to ease response) @HazardousPeach :

  1. Does your new paper passport use your Coq-Serapi?
  2. What do you think of PyCoq (emilio’s and IBM’s)?
  3. So for using the CoqGym benchmark do you generate the data set you need using your own interaction with coq (your coq-serapi) but using some sort of json/config file that coqgym provides to know which folder/files you need to use? (I assume you use your coq-serapi to get the version of coq you personally want?)
  4. If I wanted to play around with Proverbot9001 or your current favorite prover (Passport’s?) – how would I do that?

Thanks in advance for the help!

@yangky11 just want to confirm Kaiyu, is this accurate? Thanks in advance for your time!

official response here I think: Why did CoqGym instrument/modify Coq? · princeton-vl/CoqGym · Discussion #74 · GitHub

useful general discussions, related links/discussions:

Just to clarify, the problem wasn’t how to get local context and goals, but how to get correct paths (PWD, LOAD_PATH, ML_PATH) so that running the Coq scripts wouldn’t throw errors. I wanted to record the exact paths in the compilation (when running make for a Coq project) so that I can restore them later when running a Coq script. My method was to instrument Coq to print these paths. I am not aware of an alternative method (though there could be one).

1 Like