Proof of concept: getting Coq running inside llama.cpp, need help

Dear Coq Community,

I am working on proof-of-concept code that modifies the llama.cpp LLM engine with callbacks for calling native code.

The goals are that we can construct proofs about the state of neurons or tokens inside a language model, check the output of the LLM in real time, provide a feedback loop from Coq back to the LLM, generate better outputs, and fine-tune the model.

This allows callbacks in script engines to be invoked while tokens are being generated, or even earlier, inside the LLM's tensor computations.
My goal is to have it validate Coq vernacular syntax; I was fighting with coq-lsp, so instead I am embedding Coq natively.
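
To make the callback idea concrete, here is a minimal sketch of what the OCaml side of such a per-token hook could look like. The registration name `llama_on_token` and the string-in/bool-out shape are assumptions for illustration, not the actual interface in my branch.

```ocaml
(* Minimal sketch of the OCaml side of a per-token callback.
   "llama_on_token" is an assumed registration key: the C++ side would look
   it up via caml_named_value and invoke it for every decoded token. *)

let generated = Buffer.create 1024

(* Receive one decoded token, accumulate it, and report whether generation
   should continue. For now every token is accepted; a real version would
   run checks (e.g. Coq vernacular parsing) on the accumulated text. *)
let on_token (tok : string) : bool =
  Buffer.add_string generated tok;
  true

let () = Callback.register "llama_on_token" on_token
```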

My architecture is: a driver application (main, web server, Ollama, or a Python application with the libllama.cpp plugin), then llama.cpp itself, and inside llama.cpp we have plugins for Python, Node.js, metacall.io, and now OCaml.

The OCaml side now has the Coq modules loaded and I am trying to parse the text.

I am in Discord voice chat (the str server) or text chat if anyone wants to go over this.

The code branch is here: https://github.com/meta-introspector/llama.cpp/pull/8
Subreddit: (link omitted)
Twitter: twitter.com/introsp3ctor

Work in progress, proof of concept, needs help. This is the start of embedding Coq into llama.cpp for smaller inner loops.

What: OCaml (ocamlopt) linked into llama.cpp.
Proof-of-concept code showing that we can take tokens and feed them to Coq; getting it to parse the vernacular is failing, and I need help.
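
For anyone who wants to dig into the failing parse, here is a rough sketch of the shape I am aiming for. `Coq_bridge.parse_vernac` is a hypothetical stand-in for whatever Coq parsing entry point ends up being used; the module and function names here are mine, not the branch's, and wiring it to Coq's actual parser is exactly the part that is failing. The OCaml code would be compiled with ocamlopt (for example via -output-obj) so the resulting object can be linked into llama.cpp and reached through named callbacks.

```ocaml
(* Sketch: whenever the accumulated text looks like a completed sentence
   (ends with '.'), hand it to a vernacular check and return a message that
   the C++ side can log or feed back into generation. *)

module Coq_bridge = struct
  (* HYPOTHETICAL placeholder standing in for a real call into Coq's parser. *)
  let parse_vernac (_src : string) : (unit, string) result = Ok ()
end

let check_vernac (text : string) : string =
  let t = String.trim text in
  let len = String.length t in
  if len > 0 && t.[len - 1] = '.' then
    (match Coq_bridge.parse_vernac t with
     | Ok () -> "parse ok"
     | Error msg -> "parse error: " ^ msg)
  else "incomplete sentence"

let () = Callback.register "llama_check_vernac" check_vernac
```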

Why?
Embedding Coq in the same address space as the LLM, together with a scripting language, can lead to smaller loops and shorter turnaround for running checks on code generation. We can compile code for CPU and GPU and move data as needed for optimal speed.
Even sampling the tensor values in the forward pass can be done in real time for interesting applications. Giving the LLM access to the proof state via the latent space and vectorization will become possible.

I got the basics working and get different parse messages on the output; next is to use a grammar to constrain the output (see grammars/README.md in the ggerganov/llama.cpp repository on GitHub) and better prompting.
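
As an illustration of the grammar idea, here is an untested GBNF sketch in the format that README describes; it only forces output into a trivially shaped lemma-with-admitted-proof block and would need a lot of refinement before being useful.

```
root  ::= "Lemma " ident " : " expr ". Proof. admit. Admitted."
ident ::= [a-zA-Z_] [a-zA-Z0-9_]*
expr  ::= term (" " term)*
term  ::= [a-zA-Z0-9_=<>+*]+
```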
