Proofchat: A TCP client/server chat application written in Coq

Hi all. I just got through writing:

A chat application written in Coq that utilizes extraction facilities and RecDef to generate performant OCaml networking code.

This was a fun experiment and I encountered a ton of quirks regarding extraction that I hadn’t encountered before. I’m likely going to put together a document covering all of the challenges I encountered, hopefully it can be a comprehensive introduction for other people just getting into the extraction facilities.

Small problem

I started this project with the goal of verifying some properties of the application - message serialization/deserialization safety, concurrent behavior, etc. I ran into a few issues and now I’m no longer sure what the path forward is for my verification target:

  • Message serialization/deserialization doesn’t work because messages can have arbitrary length. This means that the message receiving functions need to receive some bytes from the socket, deserialize them, and use that information to decide how many more bytes should be read from the socket to further deserialize. Receiving from a socket is implemented via an Axiom that maps to Unix.recv during extraction, so it can’t be reasoned about in Coq. Serialization safety seems like a very important property, is there any way I can get around this?
  • I started to look at using Iris to verify concurrent properties and was immediately floored by the documentation: every document I’ve found hits you with mathematical descriptions of the system right away - useful to convey information to reviewers right away but un-parse-able to my inexperienced, first-year-PhD-student brain. Iris from the ground up is the closest I’ve come to being able to understand the basics of the system. Iris seems like the “assembly” form of the system I imagined when I started thinking about concurrency. Not sure what I expected. Not sure what the question is here, I guess: is this the only way to approach concurrent verification? Are there even any interesting concurrent properties for this software?
  • Writing proofs about RecDef’s Functions is already pretty tough given their structure - mixing in Sint63 makes it even harder. Any pointers to resources about writing proofs about these things?

Thanks for the input!

3 Likes

About serialization/deserialization, since you mention it in connection with receiving, but (hoping I am not just misunderstanding the problem) there should ideally be no such connection:

In general, serialization/deserialization is a self-contained protocol, including checking validity of a candidate for deserialization, and correctness reduces to the fact that to every “plain message” corresponds one and only one “serialized message”. OTOH, sending/receiving over TCP is of arbitrary sequences of bytes, it is independent of the specific shape or content of the data. So, and essentially for modularity, the two components/concerns should be kept distinct.

I initially modeled the S/DS pipeline like you’re describing - just pass a string to the socket, get a string from a socket. Unfortunately this means that I have to know how many bytes to receive, since Unix.recv hangs until it receives some specified number of bytes. I’d prefer not to do this, it limits message sizes which is annoying.

I think I could handle this another way: when sending a message, serialize it and then prepend its length to the byte sequence before sending it. The receiver receives the length field, then knows how much data to listen for before deserializing. This way, I could write a theorem of inversion for deserialize(serialize(x)) and just trust the length field segment. I like this method, I’ll look into restructuring appropriately.

Unix.recv hangs until it receives some specified number of bytes

That just doesn’t sound right, e.g. see here:
https://www.man7.org/linux/man-pages/man2/recv.2.html
<< The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested. >>

I’ll try and have a better look at your code tomorrow, plus I guess I’ll have to check OCaml docs…

serialize it and then prepend its length to the byte sequence before sending it

Your phrasing is a bit ambiguous, so let me reiterate for clarity: even if you do that, that should be a responsibility of the sending/receiving “component”, and transparent to the user, just an internal way to handle the transmission protocol: in particular, the serialization/deserialization “component” should not know anything about it. Modularity, as from the separation of concerns, is paramount.

I agree on the recv note. My experimental evidence seemed to contradict documentation, but I’m not certain enough to raise an issue about it, seems like something that would’ve been caught.

Right, in my mind this prepended length field is handled entirely by the send/receive wrappers, and serialization/deserialization know nothing about that process.

Unfortunately I cannot build manually (make after git-clone) because Coq is at versions 8.18.0 in the Snap package for Ubuntu. In fact, AFAICT, running ‘make’ almost works, just the extracted files are all over the place and here is the warning I get multiple times:

File "./theories/PCExtract.v", line 78, characters 0-55:
Warning: There is no flag or option with this name:
"Extraction Output Directory". [unknown-option,default]

But I don’t need to look at the Coq code, I just need to look at the extracted OCaml. Could you please confirm that the ‘proofchat’ folder in the GutHub repo already contains everything after extraction (i.e. is complete and ready to be compiled)?

Yes, the contents of proofchat/bin/{client,server}/{client,server}.ml are extracted from Coq. Everything else in there is handwritten boilerplate.

I have looked at the OCaml (I haven’t got to decipher the retry/timeout logic, or the details of the message protocol, or the use of threads, etc., just looked around the use of send and recv) and I cannot see anything obviously wrong, you are not even doing anything fancy with those sockets, but the code is a bit messy, especially quite unstructured, so it’s hard to read and little to even logical bugs are just likely… :slight_smile:

@jp-diegidio All of this is documented in the appropriate files in theories, but yes the auto-generated OCaml tends to be difficult to read on its own.

Unreadable as unstructured: I am using technical SE terminology, the OCaml per se is actually very tidy. But I’ll have to play with extraction before I can continue this conversation (and your project is a nice case study), I don’t know how much level of control there is there…

Do you accept issues/PRs for this project? (Will you keep developing it?)

I plan to keep developing it, but I don’t have a lot of spare bandwidth so this is a low priority project for me at the moment. I’ll gladly accept issues/PRs and help merge them in.

1 Like