Is there a full documentation of Coq's grammar?

Looking for the grammar at the doc requires too much time. Is there a nice page or pdf that full grammar of coq is listed?

1 Like

The closest thing is in the files fullGrammar and orderedGrammar in the source code. The first shows the grammar without the actions and minimal changes compared to the .mlg files. The second shows a grammar that has been edited for readability. These are big files, over a thousand lines each. But neither is intended to be read by end users.

We’re in the midst of updating the grammar in the doc, which was very much out of date. Look at the Terms section of the Gallina chapter to see what the updated doc looks like. I think the hyperlinks of nonterminals make that easier to read.

The updates to the Terms section that I mentioned don’t appear in the HTML doc linked from the website. You would need the next release or to build from the source code to see them.

The updates Jim is talking about appear in the “development” version of the reference manual, which is continuously deployed at https://coq.github.io/doc/master/refman/.

But more importantly the question is: why do you need to see the full Coq grammar? If this is to write your own parser of Coq’s grammar, that’s something that is very much advised against. I could write more about why it is so and what are the alternative ways of parsing a Coq document if you want.

It should also be mentioned that the Coq grammar is extensible, some more information about this here.

2 Likes

I was just hoping if I could get better understandings in coq with the grammar, but I would like to try what others don’t try at all. Why do you not recommend it, and what are the alternative ways?

Apologies, I read it but 100% understand it. Does it mean that Coqs language is ever evolving?

I am not a FM/PL expert but somehow had the impressions grammars where fixed a prior.

First, I should mention that the manual now contains some warning regarding parsing your own grammar (cf. the note and warning at https://coq.inria.fr/refman/language/core/basic.html#syntax-and-lexical-conventions).

Now, to answer Brando’s question, the language of Coq is not like any standard programming language. Standard programming languages do indeed generally have a small grammar that is rarely extended and most of the features are provided, not through special parsing constructs, but through primitive functions. On the contrary, the grammar of Coq is very large and constantly expanding. Most commands / tactics have their own parsing constructs, plugins add new ones, and the notation system can be used to expand the grammar even further (and that’s what most Coq libraries do). [Even the simple task of splitting a Coq document into sentences is something complex that most Coq editors do not actually handle properly unless they delegate this task to Coq.]

3 Likes

Thanks Theo! That was very useful.

I have two question:

  1. I am very curious to understand why Coq has such a seemingly complicated grammar lifestyle. Why does it seem so different?
  2. How do you “parse” a Coq file into it’s individual sentences? I was interested in extracting all tactics of a Coq script (and execute them with serapi, basically execute the script with serapi). I know tactics end with a .. Is that enough to parse a proof script?

Thanks for the informative feedback!

I could explain why it’s useful to have such complicated grammar but the question of “why” should rather be answered by people that were there in the beginnings of Coq. But for instance, the very powerful notation system is generally considered to be an ingredient of the success of Coq.

If you are planning to rely on SerAPI, then it is best that you let SerAPI do the splitting into sentences for you (this is something that it supports). Splitting sentences based on . is an approximation that works relatively well but there will be edge cases. An example (which I didn’t invent) is the following:

Notation "( a . b )" := (a, b).
Check (1 . 2).

It is an extreme example because most IDEs (including CoqIDE) won’t accept it (because they try to do the sentence splitting by themselves but are not aware of the notation system) but coqc will. And yet, the sentence splitting code in CoqIDE is already quite complex (cf. https://github.com/coq/coq/blob/51ecccef0308eceec1ddd9776a03fd993b3ea71a/ide/coqide/coq_lex.mll#L102-L154).

2 Likes

Thanks Theo. Let’s hope we someone can chip in for the rich lifestyle that Coq’s grammar has. I am very curious about it!

Also, is there a difference between the (evolving) tactic grammar and the (evolving) proof script grammar and (evolving) formula Coq term grammar? Or are they all the same under the (evolving) proof script grammar?

Thanks again!

Hi Theo,

Thanks for your response.

What I guess I am still confused is how it is possible to specify a correct Coq program if the grammar is always extensible. Let me put it this way. If I want to specify a Coq Tactic since it’s simply a program I could just write OCaml code for it (as far as I understand). If that is correct, isn’t it that at the core the grammar of Coq is the same as the grammar that implements Coq (in this case OCaml if I am understanding correctly).

Thanks for your time!


cross-posted:

If you implement a tactic using OCaml, you are extending Coq with a plugin. Your tactic is not part of your Coq code.

Furthermore, you don’t need to know about the full grammar to prove a program correct. The grammar is just the surface language by which you specify a program, but it has nothing to do with correctness.

@qawbecrdtey did you try doing Print Grammar term. ?

My understanding from reading the Coq documentation is that types can have terms (dependent types) so they have the production rule type := term (https://coq.inria.fr/distrib/current/refman/language/core/basic.html#grammar-token-type). In addition they say:

Terms are the basic expressions of Coq. Terms can represent mathematical expressions, propositions and proofs, but also executable programs and program types.

So terms seem to be basically everything - modulo strings that build proofs (e.g. tactics) and probably other stuff I might not be aware of (e.g. I don’t yet fully understand the extensible grammar part so maybe there is something to) but it seems that prints a lot of information:

Entry constr is
[ LEFTA
  [ "@"; global; univ_annot
  | term LEVEL "8" ] ]
and lconstr is
[ LEFTA
  [ term LEVEL "200" ] ]
where binder_constr is
[ LEFTA
...
  | atomic_constr
  | term_match
  | test_array_opening; "["; "|"; array_elems; "|"; lconstr; type_cstr;
    test_array_closing; "|"; "]"; univ_annot ] ]

shortened since it’s quite long.

Hi all,
I would like an ebnf file for the basic grammar for feeding to llama dot cpp (see github com ggerganov llama.cpp grammars%2FREADME.md ) even if it is extensible the we would need more adapted versions of it. I am working on creating enhanced versions of llama.cpp that integrates coq in checking the generated scripts and just need starting point.

Look at the files fullGrammar and orderedGrammar in the source tree for a start. Correctly parsing everything, if that’s your goal, would be a huge amount of work. In particular, read about notations in the documentation. Coq’s parser is based on Camlp5.

1 Like