12. Binary Data Format
Futhark programs compiled to an executable support both textual and binary input. Both are read via standard input, and can be mixed, such that one argument to an entry point may be binary, and another may be textual. The binary input format takes up significantly less space on disk, and can be read much faster than the textual format. This chapter describes the binary input format and its current limitations. The input formats (whether textual or binary) are not used for Futhark programs compiled to libraries, which instead use whichever format is supported by their host language.
Currently, reading binary input is only supported for compiled programs. It is not supported for futhark run.
You can generate random data in the binary format with futhark dataset (futhark-dataset). This tool can also be used to convert between binary and textual data.
Futhark-generated executables can be asked to generate binary output with the -b option.
12.1. Specification
Elements that are bigger than one byte are always stored using little endian – we mostly run our code on x86 hardware so this seemed like a reasonable choice.
When reading input for an argument to the entry function, we need to be able to differentiate between text and binary input. If the first non-whitespace character of the input is a b, we will parse this argument as binary; otherwise we will parse it in text format. Allowing preceding whitespace characters makes it easy to use binary input for some arguments, and text input for others.
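As a rough illustration of this dispatch rule, the following Python sketch skips leading whitespace and checks whether the next byte is b. The function name is hypothetical, and using Python's bytes.isspace as the definition of whitespace is an assumption; the exact set of whitespace characters accepted by Futhark executables is not specified here.

```python
import io


def is_binary_value(stream):
    """Return True if the next non-whitespace byte in `stream` is b'b'.

    Hypothetical helper: it consumes the whitespace and the byte it inspects.
    Whitespace is whatever bytes.isspace() accepts (an assumption).
    """
    while True:
        byte = stream.read(1)
        if byte == b"":
            # End of input before any value was found.
            return False
        if not byte.isspace():
            return byte == b"b"


# A binary i32 header preceded by whitespace, and a textual value.
binary_input = io.BytesIO(b"  \n\tb\x02\x00 i32")
text_input = io.BytesIO(b"  42i32")
print(is_binary_value(binary_input), is_binary_value(text_input))
```

Because the check is made per argument, the same loop could be run again before each subsequent value on the stream.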
The general format has this header:
b <version> <num_dims> <type> <values...>
Here, version is a byte containing the version of the binary format used for encoding (currently 2), num_dims is the number of dimensions in the array as a single byte (0 for a scalar), and type is a 4 character string describing the type of the value(s). See below for more details.
Encoding a scalar value is done by treating it as a 0-dimensional array:
b <version> 0 <type> <value>
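A minimal Python sketch of the scalar case, assuming the spaces in the schematic above are only visual separators and the fields are written back to back. The function name is hypothetical, and the example is limited to the i32 type.

```python
import struct

BINARY_FORMAT_VERSION = 2  # "currently 2" according to the text above


def encode_i32_scalar(value):
    """Encode one i32 as a 0-dimensional array (hypothetical helper)."""
    header = (
        b"b"                                # binary marker
        + bytes([BINARY_FORMAT_VERSION])    # version, one byte
        + bytes([0])                        # num_dims = 0 for a scalar
        + b" i32"                           # four-byte type string
    )
    # "<i" is a little-endian signed 32-bit integer.
    return header + struct.pack("<i", value)


encoded = encode_i32_scalar(42)
# 11 bytes in total: 1 + 1 + 1 + 4 for the header, 4 for the value.
print(len(encoded), encoded)
```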
To encode an array, we encode the number of dimensions n as a single byte, each dimension dim_i as an unsigned 64-bit little endian integer, and finally all the values in row-major order in their binary little endian representation:
b <version> <n> <type> <dim_1> <dim_2> ... <dim_n> <values...>
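The array layout can be sketched in Python as follows, here for a two-dimensional f32 array given as a list of equally long rows. The function name is hypothetical, and the fields are assumed to be written back to back with no separator bytes.

```python
import struct


def encode_f32_array(rows):
    """Encode a rectangular list of rows as a 2-D f32 array (hypothetical helper)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0

    out = bytearray(b"b")   # binary marker
    out.append(2)           # format version
    out.append(2)           # number of dimensions, one byte
    out += b" f32"          # four-byte type string
    # Each dimension is an unsigned 64-bit little-endian integer.
    out += struct.pack("<QQ", n_rows, n_cols)
    # Values follow in row-major order: all of row 0, then row 1, and so on.
    for row in rows:
        # struct.pack raises struct.error if a row has a different length.
        out += struct.pack(f"<{n_cols}f", *row)
    return bytes(out)


data = encode_f32_array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# 7 header bytes + 2 * 8 dimension bytes + 6 * 4 value bytes.
print(len(data))
```

Writing the result to a file and redirecting it to the standard input of a compiled program would be one way to feed it to an entry point that expects a [][]f32 argument.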
12.1.1. Type Values
A type is identified by a 4 character ASCII string (four bytes). Names shorter than four characters are padded on the left with spaces. Valid types are:
"  i8"
" i16"
" i32"
" i64"
"  u8"
" u16"
" u32"
" u64"
" f16"
" f32"
" f64"
"bool"
Note that unsigned and signed integers have the same byte-level representation.
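One way to picture the type strings and the note on signedness is a lookup table from type string to a Python struct format code. The table and function names below are hypothetical. The sketch assumes that names shorter than four characters are left-padded with spaces (as the four-byte requirement implies) and that signed integers use two's complement.

```python
import struct

# Four-byte type string -> struct format code (all read as little endian).
STRUCT_CODES = {
    b"  i8": "b", b" i16": "h", b" i32": "i", b" i64": "q",
    b"  u8": "B", b" u16": "H", b" u32": "I", b" u64": "Q",
    b" f16": "e", b" f32": "f", b" f64": "d",
    b"bool": "?",
}


def decode_values(type_tag, payload):
    """Decode a flat run of values of the given type (hypothetical helper)."""
    code = STRUCT_CODES[type_tag]
    size = struct.calcsize("<" + code)
    count = len(payload) // size
    return list(struct.unpack(f"<{count}{code}", payload[: count * size]))


# The same four bytes read as i32 and as u32: only the interpretation differs.
raw = bytes([0xFF, 0xFF, 0xFF, 0xFF])
print(decode_values(b" i32", raw))  # [-1]
print(decode_values(b" u32", raw))  # [4294967295]
```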
Values of type bool are encoded with a byte each. The results are undefined if this byte is anything other than 0 or 1.
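Since other byte values give undefined results, a tool that consumes binary data might choose to reject them outright. The following Python sketch shows one such defensive check; the function name and the decision to raise an error are assumptions, not behaviour of Futhark itself.

```python
def decode_bools(payload):
    """Decode one bool per byte, rejecting anything other than 0 or 1.

    Hypothetical helper: the strict check is a design choice of this sketch.
    """
    result = []
    for offset, byte in enumerate(payload):
        if byte not in (0, 1):
            raise ValueError(f"invalid bool byte {byte:#04x} at offset {offset}")
        result.append(byte == 1)
    return result


print(decode_bools(b"\x01\x00\x01"))  # [True, False, True]
```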