How to make verified software more accessible

Sorry if this is a naive question. I’m new to the world of proof assistants, and my goal for the year is essentially to become a proficient user, able to write verified software in some domain. Longer term, I’d really like to figure out how to make it possible for people in industry to write “better” code (more correct, verified, etc.).

I’m a deep believer in two things: 1. the state of software is abysmal (look at the 737 MAX, for example) and 2. we need tools like proof assistants and formal verification to help solve 1. That said, this stuff is not easy. My own background is in data engineering, largely in Java and Scala, and even getting people to use Scala over, say, Python can be a real battle. People heavily resist the idea of having to “think with a computer” (e.g. with the compiler), even if ultimately it saves them a lot of time and pain. Interestingly, Rust has been a bit of a different experience…I don’t know how, but a lot of people I know who would never get excited about Scala or Haskell are excited to write Rust, even though it’s much lower level. This is probably for social reasons rather than technical ones, though I think that Rust has done a good job, marketing-wise, of explaining the problem that the compiler solves for the user.

Regardless…I’m curious what approaches there are to make formally verified software more accessible. There are multiple ways to approach this. One is more theoretical: is it even possible? Is it desirable? Perhaps it will always be that a cabal of PhDs will hop from trillion-dollar company to trillion-dollar company helping them verify critical pieces of infrastructure, while the programming-quality tire fire rages on. But the state of security on the web, as well as the increasing complexity of software in systems like airplanes and cars, makes me hope otherwise.

That said, as I’m new to this stuff myself, it’s sort of hard to imagine where the sweet spot is! Writing Scala or Haskell is easy enough, relatively speaking. Once you have to actually prove things beyond “this function takes these types and outputs this type,” even things that seem obvious can get really rough (I’m having flashbacks to the Color module of Software Foundations part 4). I’m currently working through the beta Software Foundations part 5 on Verifiable C, and I think it’s extremely awesome, but the proofs are not trivial…most critically, figuring out how to represent what is happening in Coq so that a proof is possible is the hardest part, along with coming up with invariants (though I think if you have a good representation the invariants get a lot easier).
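As a toy illustration of that gap (my own example, not from the book): the type of an insertion function says nothing about sortedness, and even the simplest fact about its behavior already calls for an inductive proof:

```coq
Require Import List Arith.
Import ListNotations.

(* Insert x into a (presumably sorted) list of naturals. *)
Fixpoint insert (x : nat) (l : list nat) : list nat :=
  match l with
  | [] => [x]
  | y :: ys => if x <=? y then x :: y :: ys else y :: insert x ys
  end.

(* The type [nat -> list nat -> list nat] can't even express that the
   result is one element longer; that already takes a small induction. *)
Lemma insert_length : forall x l, length (insert x l) = S (length l).
Proof.
  intros x l; induction l as [| y ys IH]; simpl.
  - reflexivity.
  - destruct (x <=? y); simpl.
    + reflexivity.
    + rewrite IH; reflexivity.
Qed.
```

And proving that the result is actually *sorted* is a bigger jump again, which is exactly the “beyond types” cliff I mean.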

So I dunno. I’m wondering if there might be the possibility for domain-specific tools (could be a language, but could also be something GUI-based, or anything) that sort of…hide the creation of all of this, but do it under the covers? Where things are correct by construction, and things like representation and invariants are perhaps done by the user, but due to the constraints of the domain-specific whatever, it’s simpler, or a good tool can at least make it all more tractable. I have no idea if that is possible or what it would look like, just thinking aloud, and am wondering if there is any work in this direction. I feel like things like Heartbleed and what happened with the 737 MAX should be enough to convince industry people that it’s worth the effort but…it seems like that’s not the case, and more effort has to be expended to save software engineers from themselves :wink: Or perhaps, to save us from the software engineers!

Edit: I want to add that I’m aware that in the dependently typed language world there are languages like Idris explicitly focused on being something you’d want to write a network driver in. That said, I think the fundamental issue is the same: whether it is Coq or Idris, figuring out how to represent your logic in the system and then verify it is really hard. Idris may have some language constructs to support writing code (whereas most agree Coq excels at writing proofs), but I think the fundamental issue is the same, that the current model of code verification is unlikely to be scalable and we likely will need some other model of doing things if there’s a hope of getting people using these systems. That said, I am also watching the Idris community as I think there’s cool stuff happening there too. But I feel this community has a large, diverse range of practitioners which can lead to cross-pollination of ideas.

I agree that the state of software today is poor. I certainly agree we should encourage industry to use proofs and tools such as Coq to improve software quality.

I think the first issue for industry is that it still takes a very large effort and PhD-level expertise to do proofs on large pieces of code. The field is still immature and an active research area. We’re still learning how to make such proofs easier to create and maintain. As the field matures, the techniques should become more accessible to those with less expertise. I’m sure we’ll need to evolve the tools, such as Coq and its proof libraries, as well.

The second issue is whether the benefits are perceived as sufficient to justify the effort. The benefit of adding a new feature or fixing a bug is often pretty clear. With verification, the benefit may be harder to see; if Heartbleed had been rendered impossible through verification, the cost of the vulnerability it exploited would be much less apparent.

We need to reduce the difficulty of doing these proofs by 10- or 100-fold. Furthermore, we should think more about useful partial verification of existing large systems. We need techniques that will let us prove important properties of (and find bugs in), say, Linux or Windows. I expect this will be an iterative process: partial verification will expose bugs to fix.

Some things are not amenable to formal verification. For example, while machine learning algorithms can be verified, it’s hard to imagine that the learning can be formalized and proven correct. (That learning would include software for self-driving cars.) Evaluating whether a web page has a good layout may have some aspects that could be verified but some of it is subjective.

Creating domain-specific tools and languages may be a useful approach and can certainly be fun. But there’s a risk of putting a great deal of effort into these to replicate necessary features already available in existing tools. Then the tool/language needs to attract a critical mass of users and developers to be self-sustaining. This is a very high bar.

If you can figure out how to make it easier to write proofs and/or generate invariants with existing tools, that will be much easier for the user community to adopt. That may mean finding a better way to do these things with existing tool features or, if necessary, adding incremental new features to the tools. If these aren’t enough, I’d next consider whether you can justify a major enhancement to existing tools. It may be possible to get much closer to “correct by construction” just by creating new libraries and documenting how to apply them.
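As a small sketch of what “correct by construction via libraries” might look like (a toy of my own, assuming nothing beyond the Coq standard library): a library can export a subset type that bundles a value with a proof of its invariant, so client code never re-checks it:

```coq
Require Import List Sorted Arith.
Import ListNotations.

(* A list of naturals packaged with a proof that it is sorted. *)
Definition sorted_list := { l : list nat | Sorted le l }.

(* Smart constructors discharge the proof once, at creation time. *)
Definition empty : sorted_list.
Proof. exists []. constructor. Defined.

Definition singleton (x : nat) : sorted_list.
Proof. exists [x]. repeat constructor. Defined.
```

A real library would also provide, say, an `insert : nat -> sorted_list -> sorted_list` with its proof done once and for all, so clients compose these pieces without ever writing a tactic.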

Better documentation of Coq will make it much easier to learn and use it–a practical step toward getting industry to do more verification. I’ve been working on this. See https://github.com/coq/coq/wiki/Refman-improvements-in-8.12-and-beyond.

I think the fundamental issue is the same, that the current model of code verification is likely to be scalable and we likely will need some other model of doing things if there’s a hope of getting people using these systems.

You probably meant “unlikely to be scalable” here?


Whoops, thanks for pointing out that rather unfortunate typo. I fixed it.

Thanks for sharing your thoughts and wisdom. I think you’re on point with everything. I want to believe that tools like Coq can be a part of helping software engineers reorganize the field and help with the crisis of quality and safety that exists. You’re right that this is all very early stage, which is exciting, but also frustrating :slight_smile: I think the thing I’ve been trying to figure out as I dive into all of this stuff is what the most practical path forward is in the medium term to try and get at least a little bit of traction in industry.

When I say DSLs/tools, they could be implemented in Coq or on top of Coq…you’re right that it’s yet another tool for people to learn, but I’m extremely skeptical that even a highly motivated programmer is likely to use Coq directly unless it evolves significantly. Proofs are really hard. I have the wherewithal to stick with it (thus far!), but 99% of my Scala and Haskell-loving friends will not…even though they want their code to be correct. I’m not sure how to bridge that gap!


I believe there are some software verification research positions at some of the larger tech companies such as Microsoft, Google and Facebook. I expect these positions all require a PhD with a relevant research background since it’s an emerging technology for industry.

For example, Microsoft Research supports development of the Lean proof assistant. An MIT group funded in part by Google implemented proven-correct, high-speed cryptographic arithmetic verified with Coq. IIUC Google is using the generated code in their SSL library. (Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, Adam Chlipala. Simple High-Level Code For Cryptographic Arithmetic – With Proofs, Without Compromises. Proceedings of the IEEE Symposium on Security & Privacy 2019 (S&P’19), May 2019.)

Another interesting tool is “Infer” created by Peter O’Hearn and others at Facebook Research, which uses separation logic to automatically check for various errors (partial verification).

You may also want to attend some conferences. They can be a good way to meet people with common interests–though maybe easier in a non-pandemic year when it’s face to face rather than virtual. For example, the Coq Workshop, which was just this Sunday and Monday and can be seen online. There’s also an annual Coq Implementors Workshop. DeepSpec (project associated with Software Foundations) has had sessions at PLDI. See https://conf.researchr.org/series/deepspec.

If you want to play a significant role in promoting software verification, you should consider getting a PhD. Pick a research focus such as how to make proof assistants/provers more powerful/easier to use. It could be an opportunity to spend full time on software verification while you work toward a degree–that’s more likely to produce something useful than working on your own time while holding down an unrelated job. (FWIW, I never pursued a PhD.)

Proofs are really hard.

I think it’s a skill that one develops over time. But it’s hard to learn because Coq is poorly documented; you have to be very motivated to deduce what Coq is actually doing. Proofs in Coq are not structured the way most of us were introduced to them, such as in math classes. That takes some getting used to.

I think the fundamental issue is the same: whether it is Coq or Idris, figuring out how to represent your logic in the system and then verify it is really hard.

Perhaps the issue is that the representation in the tools doesn’t match how programmers think when they write code. Generally I have a good idea why my code ought to be correct (even though I make many mistakes along the way).


This has been done multiple times already (including with Coq), and there are numerous research papers on the topic. Basically, the theorem that you formally prove is that, if there exists at least one neural network (or whatever you are trying to train) that could successfully categorize all the inputs from your training set, then your learning algorithm is sure to find such a neural network, and conversely.

Obviously, this does not tell you anything about the behavior of your trained neural network outside of the training set, e.g., overfitting. But that does not change the point: Learning algorithms can be formally proved to be correct.
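For concreteness, the shape of such a theorem (my own informal notation, not from any particular paper), with $T$ the training set, $f_\theta$ the model family, and $\mathrm{learn}$ the training algorithm:

```latex
\left(\exists\,\theta.\ \forall (x, y) \in T.\ f_{\theta}(x) = y\right)
\;\iff\;
\forall (x, y) \in T.\ f_{\mathrm{learn}(T)}(x) = y
```

The right-to-left direction is immediate (take $\theta = \mathrm{learn}(T)$); the substantive content is that training finds a consistent $\theta$ whenever one exists.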


Thank you for the response! I definitely agree that going to the right conferences is probably the best next step to meet people face to face and talk about this stuff. I appreciate the suggestions – I definitely want to go to some once the global pandemic is over. And thanks for the link to the paper! It seems like security is one of the big areas where industry is willing to fund this work (though I’ve heard it’s present in hardware as well?).

As far as a PhD…that certainly would be easiest :slight_smile: But it would be so difficult to go back to subsistence wages…I know a number of people who have sort of backdoored their way into roles with Google Research, though generally in machine learning. So I’m sort of holding out hope that if I meet the right people, and maybe contribute to the right open-source project, I can at least get into a support role. In my experience there is often a need for people with engineering experience to do all the stuff that the big-brain PhDs don’t want to do, and then you can build from there. But I don’t know. I do think about getting a PhD a lot, but…the financial aspect is hard, as is the sorry state of academia (plus the Trump/GOP assault on H-1Bs and international students is almost certainly going to lead to some big changes, in my opinion). Perhaps I have worked with too many bitter ex-academics :slight_smile: But in my heart I’d love to get a PhD.

I think this is certainly part of it, but I think the bigger part is your next point.

I think this is more of the issue. Plus, I think that many programmers are more than happy to trade up-front reasoning for a lot of debugging pain later. I mean, think about how much resistance there is to typed languages at all…despite the fact that, in my experience, a medium-sized or larger dynamically typed code base is a nightmare of land mines. In that milieu, forcing people to figure out how to prove things is…not easy. Which of course academics don’t necessarily have to worry about, which is why I’m curious who does care, because I care a lot.

This is why I put out the idea of DSLs…a lot of my professional experience is in data engineering, and I think the trajectory there is very interesting. For a long while, you just had…expensive chaos. Companies that had the data volumes to need large-scale processing largely rolled it themselves, and it was often very brittle, poorly abstracted, a complete mess. Then Google published the MapReduce paper, which led to Hadoop’s HDFS and MapReduce, and that was a huge sea change. It was all about the right abstraction, the right DSL, the right tradeoffs…from there there was an explosion in creativity and scale (buoyed, of course, by other trends, namely the commodification of hardware for distributed storage and computing). I got addicted to programming languages working on the Pig project, a DSL for data analysis. At the time there was also Hive, SQL on MapReduce. Eventually I worked on the Scalding project, which presented an API that would evolve into something like Spark. The experience has left a deep impression on me. Whereas in general-purpose computing, programmers could argue that Ruby or Python let them solve their problems faster, in the data domain the success of these models was undeniable, and a lot of people who would never otherwise consider Scala used it for data processing because of the benefits the platform provided. Of course these days PySpark is also popular, but still, I think this all goes to show that if you can provide the right abstraction, you can really blow open a use case.

So that’s why I’m sort of wondering if there might be ways to make the proofs easier by constraining what one can express, or to provide building blocks so that at least common things are easier. I’m working through the Verifiable C stuff right now and it’s hard, but also…it feels like there’s something underneath that could be abstracted out further for common use cases. Could all ADTs be given a common framework to simplify proofs? Is there some way to provide, say, a GUI in which people build their program logic such that invariant declaration can be facilitated? Or, if that stuff isn’t possible generically, is it possible if we narrow the scope of problems we solve? What if we just do control systems? Or GUIs? Or server topologies? Or who knows what.

But what I do know is that we have to build a bridge with programmers where they are. I don’t think it is practical that all programmers will write this stuff, but I think it’s possible to make things accessible beyond just PhDs in programming languages. I’ve seen some interest and excitement around TLA+, for example…I personally don’t like that approach, but it offers a lot of advantages that programmers care about in practice. I’d love for the creators of distributed systems, of critical libraries, etc. in industry to be able to leverage more than just a Scala- or Haskell-level type system to reason about their software. FWIW I know a lot of professional programmers who are intrigued by my passion for the space, but are sort of waiting for me to show them that it’s not just a pipe dream.

I’m sort of just rambling out loud I guess. I’m currently unemployed waiting out the pandemic in a small city in China trying to absorb as much as possible. It’s no PhD, though!

This is super interesting. Do you know of any good papers on the topic?


Here are some examples:


For example, while machine learning algorithms can be verified, it’s hard to imagine that the learning can be formalized and proven correct.

@silene
Learning algorithms can be formally proved to be correct.

My wording was a little casual. The theorems you describe are part of what I meant by “algorithms can be verified”. By “learning” I meant whether the system is any good at detecting cats, pedestrians or skin cancer. Hard to imagine how you formalize “cat”–but I’d be fascinated to see an attempt.


security is one of the big areas where industry is willing to fund this work (though I’ve heard it’s present in hardware as well?)

Software for aviation, medical equipment and nuclear power plants is held to a higher standard with different development procedures. For example, very methodical, very lengthy code reviews. I think there is some use of formal methods in these areas. For hardware, the cost and delay involved in revising a chip design is very high, so the CAD and verification tools for hardware are much more developed than they are for software. Hardware companies are making some use of formal methods, including Coq.

But what I do know is that we have to build a bridge with programmers where they are.

I think developers would be more open to approaches that begin with partial verification, such as verification of invariants that can be applied incrementally during the software development process to find bugs in the program. Current proof assistants such as Coq require considerable work to prove theorems. If you’re unable to prove the theorem, it takes more work to figure out whether the problem is in the proving or a bug in the code, which may not be easy to determine. It would be great to have a verifier that could generate counterexamples to what you’re trying to prove.
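A toy illustration of that last point (my own sketch, not an existing tool): even a few dozen lines of random testing can surface a counterexample to a property you were hoping to prove, which tells you the bug is in the code rather than in your proving.

```python
import random

def check(prop, gen, tries=1000, seed=42):
    """Minimal property-based checker: return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        case = gen(rng)
        if not prop(*case):
            return case
    return None

def gen_case(rng):
    """A random element plus a random sorted list of small naturals."""
    xs = sorted(rng.randrange(10) for _ in range(rng.randrange(6)))
    return rng.randrange(10), xs

def buggy_insert(x, xs):
    """Insert x into sorted xs -- with a deliberate off-by-one bug
    that overwrites an element instead of inserting before it."""
    for i, y in enumerate(xs):
        if x < y:
            return xs[:i] + [x] + xs[i + 1:]  # bug: should be xs[i:]
    return xs + [x]

def prop_insert_ok(x, xs):
    """The 'theorem' we would like to prove about insert."""
    return buggy_insert(x, xs) == sorted(xs + [x])

counterexample = check(prop_insert_ok, gen_case)
print(counterexample)  # a concrete (x, xs) pair that falsifies the property
```

(QuickChick, volume 4 of Software Foundations, offers exactly this kind of random testing inside Coq, so a failed proof attempt can be triaged before you spend hours on tactics.)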

Writing software is a process of incremental refinement. You have some code that you believe is good and then you add to it while trying not to break what was already there. It’s not clear to me that proof assistants support this model well. (I’ve not tried this with Coq. Maybe it’s not so much of an issue.) A “waterfall” development process in which verification starts only when you think the code is done makes little sense to me.

In developing verification tools, it makes sense to focus on things that can’t be handled well by existing techniques. For example, testing and in particular automated testing is good at showing that code works for particular examples. But testing is not very good at, say, ensuring there are no memory leaks or corruption (e.g. in C code), that the threads in your multi-threaded application don’t deadlock or corrupt data structures, or that your hash tables and B trees are correct.

Good luck with your endeavors.


I brought up security only because I know of some formal methods being used there…I would be extremely interested to know if they’re being used in the other areas you mentioned. I certainly think they should be!

To everything else you wrote: I agree. I think more people just need to make attempts at the problem. I suppose I made this thread hoping to see if there might be some research groups currently thinking deeply about this, or if there might be areas in industry I don’t know about.