I want to understand clearly affects how many steps it might take to completely proof a single goal. i.e. I want to understand the interface to the current function:
steps(goal,...) = # steps to prove goal
intuitively it must depend on the current tactic applied to goal
and the knowledge we have (e.g. environment & context).
Does this mean that the function has the following signature?
steps(goal, Tactic, Env, Context)
my hunch is that it actually only requires
steps(goal, Tactic)
since the information needed to break up the current proof obligation/goal is all in Tactic (e.g. it refers to the knowledge in the context or environment needed to correctly apply to the current goal).
Is this correct?
I guess in principle the # of steps for the current goal depends on the steps for all subsequent goals…and those might need specific knowledge that might not be current reachable from only the tactic (i.e. we do need the current context AND the environment).
As long as a proof does NOT do random imports in the middle now I believe that:
steps(goal,Tactic,context,env)
is necessary (sorry changed my mind).
Is this correct? What’s the right interface to that function?
Just curious what factors affect how long a proof is basically.
Sorry to say, but this question is ambiguous and lacks context, and what you ask might even be impossible (or not), depending on what it is.
Maybe you could elaborate on the context of your question? If you are talking about an existing steps
function, could you give a reference? If you’re trying to design such a function, can you give an example, and explain in which language is steps
supposed to be a “function” (Gallina, Ltac, Ocaml, something else…).
Taking a wild guess, it sounds like you want to measure something about… either how many steps it’ll take to prove a goal without knowing the full proof script (undecidable), or how many steps it’ll take to execute a given tactic (decidable, if you can define “step”) — or maybe what’s the result of running a tactic?
If you want to count proof steps, beware it’s hard to define what a “proof step” should mean — you can measure the length of a completed proof script, or print a finished proof term and measure its size; counting the calls to “atomic” tactics (in whatever sense, say those implemented in OCaml) is already harder; it’s unclear that either measure means much.
Thanks for taking time to reply.
This is just a thought experiment. It’s not a function I am trying to compute (and I am only assuming that the goals this function would ever “run on” are things with proofs, so steps exists by definition). The only way I know to compute it is by running Coq and counting the steps Coq does (which is my definition of steps). If this are finer definitions that’s fine, but counting the # of times a tactic is applied is enough.
I am curious to know if in theory, if we only know the current goal and the tactic (with the right arguments) I applied if that would be enough to “run coq to completion assuming I would keep applying the tactics some human proof was used”, assuming that perhaps the context and imported libraries are missing.
Does that make sense?
Okay, I think I can make sense of the question.
In short, I don’t think you can count steps without actually running the proof scripts, and you can’t run the proof scripts without loading everything that Coq loads — which includes all libraries that are Require
d (directly or indirectly), including definition/lemma bodies except the ones closed by Qed.
The reason is that, in complex proof scripts, some steps only execute if some other tactic fails or succeeds; to test if a tactic fails, you must run it. And to run a tactic, you need all of the above.
For instance, a proof script can contain (tactic1; solve [tactic2]) || tactic3.
— where each of tactic[1-3]
is a different tactic, that takes a different number of steps.
To execute that, Coq runs tactic1
; if that succeeds, Coq runs solve [tactic2]
on the resulting goals — which runs tactic2
, but fails if that doesn’t completely solve a goal. If tactic1; solve [tactic2]
fails, then Coq undoes their effects and runs tactic3
. And for instance, tactic2
could be eauto
, which can do almost anything (depending on the Hint
and Hint Extern
commands in required libraries).
2 Likes
Great comment. Very interesting feedback!
I have two main things that I am curious:
- When exactly are the conditions that future tactics can only execute if passed tactics failed? The reason that I am asking is with the definition of “steps” I had in mind
(tactic1; solve [tactic2]) || tactic3.
would be 1 tactic and the output of is the new subgoals (even if we don’t know which one executed).
- I had in mind to only consider atomic tactics. Does that still require we have all the imports and context? I am wondering how much I can strip down the arguments of
steps(...)
until it stops computing steps correctly (when I say compute I always mean if we were to run an actual proof, so the assumption is we have the proof script if we wanted it)
Actually on a second thought on my point 1, as you said, we can always run the proof scripts (I’m fine with that) and figure out which tactics were actually applied, right? either by asking coq or just applying the tactics in the compound tactics until we form the atomic tactic that produce that step.
Btw, I also improved the title. I hope that helps.
That suggests you’re counting sentences in the proof script. It seems that for that you only need to parse the script; which still requires to look at imported modules (not required ones), but only to know the extra parsing rules.
That seems to contradict the above: if you’re counting sentences, whether a tactic is atomic or not doesn’t seem to matter?
Perhaps reading a formal definition of a sentence might help me clarify what I have in mind. Is there a specification of that somewhere?
Thanks for the help!