How to use binary floats in Flocq

Hi! I’m very new to Coq (and provers in general). I’m trying to figure out how binary floats in Flocq work. My end goal is performing calculations on binary32’s and binary64’s and retrieving the result. I’ve figured out how to create a binary_float (based on this thread on stackoverflow). Here’s the code I ended up with:

Require Import BinNums BinInt ZArith Floats.
From Flocq Require Import Core FLT Operations Binary Bits BinarySingleNaN PrimFloat.

Goal Prec_gt_0 24.
Proof.
  reflexivity.
Qed.

Definition my_binary32 (sign: bool) (exponent mantissa: Z): binary_float 24 128 :=
  binary_normalize 24 128 eq_refl eq_refl mode_NE mantissa exponent sign.

My main questions are:

  1. How do I create a specific float (eg. a binary32 that represents the decimal value 2.5)?

  2. How do I interpret a binary float? (I.e. how do I “translate” it to a decimal value.) For instance, Eval simpl in (Bplus mode_NE (my_binary32 false 1 2) (my_binary32 false 1 2)). returns the following:

     = B754_finite false 8388608 (-20)
         (proj1
            (binary_round_correct 24 128 ?prec_gt_0_ 
               ?prec_lt_emax_ mode_NE false 16777216
               (Z.min (fexp 24 128 3) (fexp 24 128 3))))
     : binary_float 24 128
where
?prec_gt_0_ : [ |- Prec_gt_0 24]
?prec_lt_emax_ : [ |- Prec_lt_emax 24 128]

I can tell that the resulting value is positive and finite, but I don’t know about the rest.

Any help or resource would be greatly appreciated!

Regarding computations, I suggest to use vm_compute instead of simpl, and also to compose your computation with a call to erase, so as to get something more readable:

Eval vm_compute in (erase (Bplus mode_NE (my_binary32 false 1 2) (my_binary32 false 1 2))).
(* = B754_finite false 8388608 (-20) eq_refl : binary_float 24 128 *)

The resulting number is 8388608 * 2^(-20). Up to you to match it to get whichever representation fits your bill (e.g., rational numbers).

To create a constant, just call your function my_binary32. For example, to get 2.5, you can do the following:

Eval vm_compute in erase (my_binary32 false (-1) 5).
(* = B754_finite false 10485760 (-22) eq_refl : binary_float 24 128 *)

If your constant is not exactly representable, you can use the following code instead, which goes through the division:

Definition b32_of_rat (s:bool) (num den: positive): binary_float 24 128 :=
  erase (SF2B _ (proj1 (Bdiv_correct_aux 24 128 eq_refl eq_refl mode_NE s num 0 false den 0))).

(* Constant 0.3 *)
Eval vm_compute in b32_of_rat false 3 10.
(* = B754_finite false 10066330 (-25) eq_refl : binary_float 24 128 *)
1 Like

Thank you for the detailed response! I do have a follow-up question: Would there be any difference in using the binary32 and binary64 types along with their respective operations, instead of the generic format binary_float? And if so, how could I create a float of these types?

Note that the types binary32 and binary64 are built on top of Binary.binary_float, while you have been using the type BinarySingleNaN.binary_float till now. The critical difference between both binary_float type is that the Binary one supports NaN signs and payloads, while the BinarySingleNaN one conflates all the NaN values into one. So, it is really a matter of which of these two theories you want to use.

As for creating numbers, the method above works for both theories, although the functions from Binary might take a few more arguments than those of BinarySingleNaN to deal with NaNs.

1 Like