2. Basic Usage¶

Futhark contains several code generation backends. Each is provided as subcommand of the futhark binary. For example, futhark c compiles a Futhark program by translating it to sequential C code, while futhark pyopencl generates Python code with calls to the PyOpenCL library. The different compilers all contain the same frontend and optimisation pipeline - only the code generator is different. They all provide roughly the same command line interface, but there may be minor differences and quirks due to characteristics of the specific backends.

There are three main ways of compiling a Futhark program: to an ordinary executable (by using --executable, which is the default), to a server executable (--server), and to a library (--library). Plain executables can be run immediately, but are useful mostly for testing and benchmarking. Server executables are discussed in Server Protocol. Libraries can be called from non-Futhark code.

2.1. Compiling to Executable¶

A Futhark program is stored in a file with the extension .fut. It can be compiled to an executable program as follows:

$ futhark c prog.fut

This makes use of the futhark c compiler, but any other will work as well. The compiler will automatically invoke cc to produce an executable binary called prog. If we had used futhark py instead of futhark c, the prog file would instead have contained Python code, along with a shebang for easy execution. In general, when compiling file foo.fut, the result will be written to a file foo (i.e. the extension will be stripped off). This can be overridden using the -o option. For more details on specific compilers, see their individual manual pages.

Executables generated by the various Futhark compilers share a common command-line interface, but may also individually support more options. When a Futhark program is run, execution starts at one of its entry points. By default, the entry point named main is run. An alternative entry point can be indicated by using the -e option. All entry point functions must be declared appropriately in the program (see Entry Points). If the entry point takes any parameters, these will be read from standard input in a subset of the Futhark syntax. A binary input format is also supported; see Binary Data Format. The result of the entry point is printed to standard output.

Only a subset of all Futhark values can be passed to an executable. Specifically, only primitives and arrays of primitive types are supported. In particular, nested tuples and arrays of tuples are not permitted. Non-nested tuples are supported are supported as simply flat values. This restriction is not present for Futhark programs compiled to libraries. If an entry point returns any such value, its printed representation is unspecified. As a special case, an entry point is allowed to return a flat tuple.

Instead of compiling, there is also an interpreter, accessible as futhark run and futhark repl. The latter is an interactive prompt, useful for experimenting with Futhark expressions. Be aware that the interpreter runs code very slowly.

2.1.1. Executable Options¶

All generated executables support the following options.

-h/--help

Print help text to standard output and exit.

-D/--debugging

Print debugging information on standard error. Exactly what is printed, and how it looks, depends on which Futhark compiler is used. This option may also enable more conservative (and slower) execution, such as frequently synchronising to check for errors. This implies --log.

-L/--log

Print low-overhead logging information during initialisation and during execution of entry points. Enabling this option should not affect program performance.

--cache-file FILE

Create (if necessary) and use data in the provided cache file to speed up subsequent launches of the same program. The cache file is automatically updated by the running program as necessary. It is safe to delete at any time, and will be recreated as necessary.

--print-params

Print a list of tuning parameters followed by their parameter class in parentheses, which indicates what they are used for.

--param SIZE=VALUE

Set one of the tunable sizes to the given value. Using the --tuning option is more convenient.

--tuning FILE

Load tuning options from the indicated tuning file. The file must contain lines of the form SIZE=VALUE, where each SIZE must be one of the sizes listed by the --print-params option (without size class), and the VALUE must be a non-negative integer. Extraneous spaces or blank lines are not allowed. A zero means to use the default size, whatever it may be. In case of duplicate assignments to the same size, the last one takes predecence. This is equivalent to passing each size setting on the command line using the --params option, but more convenient.

2.1.2. Non-Server Executable Options¶

The following options are only supported on non-server executables, because they make no sense in a server context.

-t/--write-runtime-to FILE

Print the time taken to execute the program to the indicated file, an integral number of microseconds. The time taken to perform setup or teardown, including reading the input or writing the result, is not included in the measurement. See the documentation for specific compilers to see exactly what is measured.

-r/--runs RUNS

Run the specified entry point the given number of times (plus a warmup run). The program result is only printed once, after the last run. If combined with -t, one measurement is printed per run. This is a good way to perform benchmarking.

-b/--binary-output

Print the result using the binary data format (Binary Data Format). For large outputs, this is significantly faster and takes up less space.

-n/--no-print-result

Do not print the result of running the program.

2.1.2.1. GPU Options¶

The following options are supported by executables generated with the GPU backends (opencl, pyopencl, hip, and cuda).

-d/--device DEVICE

Pick the first device whose name contains the given string. The special string #k, where k is an integer, can be used to pick the k-th device, numbered from zero.

--default-thread-block-size INT

The default size of GPU thread blocks that are launched. Capped to the hardware limit if necessary.

--default-num-thread-blocks INT

The default number of GPU thread blocks that are launched.

-P/--profile

Measure the time taken by various GPU operations (such as kernels) and print a summary at the end. Unfortunately, it is currently nontrivial (and manual) to relate these operations back to source Futhark code.

--unified-memory INT

Corresponds to futhark_context_config_set_unified_memory().

2.1.2.2. OpenCL-specific Options¶

The following options are supported by executables generated with the OpenCL backends (opencl, pyopencl):

-p/--platform PLATFORM

Pick the first OpenCL platform whose name contains the given string. The special string #k, where k is an integer, can be used to pick the k-th platform, numbered from zero. If used in conjunction with -d, only the devices from matching platforms are considered.

--default-group-size INT

The default size of OpenCL workgroups that are launched. Capped to the hardware limit if necessary.

--default-num-groups INT

The default number of OpenCL workgroups that are launched.

--dump-opencl FILE

Don’t run the program, but instead dump the embedded OpenCL program to the indicated file. Useful if you want to see what is actually being executed.

--load-opencl FILE

Instead of using the embedded OpenCL program, load it from the indicated file. This is extremely unlikely to result in succesful execution unless this file is the result of a previous call to --dump-opencl (perhaps lightly modified).

--dump-opencl-binary FILE

Don’t run the program, but instead dump the compiled version of the embedded OpenCL program to the indicated file. On NVIDIA platforms, this will be PTX code. If this option is set, no entry point will be run.

--load-opencl-binary FILE

Load an OpenCL binary from the indicated file.

--build-option OPT

Add an additional build option to the string passed to clBuildProgram(). Refer to the OpenCL documentation for which options are supported. Be careful - some options can easily result in invalid results.

--list-devices

List all OpenCL devices and platforms available on the system.

There is rarely a need to use both -p and -d. For example, to run on the first available NVIDIA GPU, -p NVIDIA is sufficient, as there is likely only a single device associated with this platform. On *nix (including macOS), the clinfo tool (available in many package managers) can be used to determine which OpenCL platforms and devices are available on a given system.

2.1.2.3. CUDA-specific Options¶

The following options are supported by executables generated by the cuda backend:

--dump-cuda FILE

Don’t run the program, but instead dump the embedded CUDA program to the indicated file. Useful if you want to see what is actually being executed.

--load-cuda FILE

Instead of using the embedded CUDA program, load it from the indicated file. This is extremely unlikely to result in succesful execution unless this file is the result of a previous call to --dump-cuda (perhaps lightly modified).

--dump-ptx FILE

As --dump-cuda, but dumps the compiled PTX code instead.

--load-ptx FILE

Instead of using the embedded CUDA program, load compiled PTX code from the indicated file.

--nvrtc-option OPT

Add the given option to the command line used to compile CUDA kernels with NVRTC. The list of supported options varies with the CUDA version but can be found in the NVRTC documentation.

For convenience, CUDA executables also accept the same --default-num-groups and --default-group-size options that the OpenCL backend uses. These then refer to grid size and thread block size, respectively.

2.1.2.4. Multicore options¶

The following options are supported by executables generated by the multicore backend:

--num-threads INT

The number of threads used to run parallel operations. If set to a value less than 1, then the runtime system will use one thread per detected core.

-P/--profile

Measure the time taken by various parallel sections and print a summary at the end. Unfortunately, it is currently nontrivial (and manual) to relate these operations back to source Futhark code.

2.2. Compiling to Library¶

While compiling a Futhark program to an executable is useful for testing, it is not suitable for production use. Instead, a Futhark program should be compiled into a reusable library in some target language, enabling integration into a larger program.

2.2.1. General Concerns¶

Futhark entry points are mapped to some form of function or method in the target language. Generally, an entry point taking n parameters will result in a function taking n parameters. If the entry point returns an m-element tuple, then the function will return m values (although the tuple can be replaced with a single opaque value, see below). Extra parameters may be added to pass in context data, or out-parameters for writing the result, for target languages that do not support multiple return values from functions.

The entry point should have a name that is also a valid identifier in the target language (usually C).

Not all Futhark types can be mapped cleanly to the target language. Arrays of tuples, for example, are a common issue. In such cases, opaque types are used in the generated code. Values of these types cannot be directly inspected, but can be passed back to Futhark entry points. In the general case, these types will be named with a random hash. However, if you insert an explicit type annotation (and the type name contains only characters valid for identifiers for the used backend), the indicated name will be used. Note that arrays contain brackets, which are usually not valid in identifiers. Defining and using a type abbreviation is the best way around this.

2.2.1.1. Value Mapping¶

The rules for how Futhark values are mapped to target language values are as follows:

Primitive types or arrays of primitive types are mapped transparently (although for the C backends, this still involves a distinct type for arrays).
All other types are mapped to an opaque type. Use a type ascription with a type abbreviation to give it a specific name, otherwise one will be generated.

Return types follow these rules, with one addition:

If the return type is an m-element tuple, then the function returns m values, mapped according to the rules above (but not including this one - nested tuples are not mapped directly). This rule does not apply when the entry point has been given a return type ascription that is not syntactically a tuple type.

2.2.1.2. Consumption and Aliasing¶

Futhark’s support for In-place Updates has implications for the generated API. Unfortunately, The type system of most languages (e.g. C) is not rich enough to express the rules, so they are not statically (or currently even dynamically checked). Since Futhark will never infer a unique/consuming type for an entry point parameter, this section can be ignored unless uniqueness annotations have been manually added to the entry points parameter types. The rules are essentially the same as in the language itself:

Each entry point input parameter is either consuming or nonconsuming (the default). This corresponds to unique and nonunique types in the original Futhark program. A value passed for a consuming parameter is considered consumed, now has an unspecified value, and may never be used again. It must still be manually freed, if applicable. Further, any aliases of that value are also considered consumed and may not be used.
Each entry point output is either unique or nonunique. A unique output has no aliases. A nonunique output aliases every nonconsuming input parameter.

Note that these distinctions are currently usually not visible in the generated API, and so correct usage requires knowledge of the original types in the Futhark function. The safest strategy is to not expose unique types in entry points.

2.2.2. Generating C¶

A Futhark program futlib.fut can be compiled to reusable C code using either:

$ futhark c --library futlib.fut

Or:

$ futhark opencl --library futlib.fut

This produces three files in the current directory: futlib.c, futlib.h, and futlib.json ( see Manifest for more on the latter).

If we wish (and are on a Unix system), we can then compile futlib.c to an object file like this:

$ gcc futlib.c -c

This produces a file futlib.o that can then be linked with the main application. Details of how to link the generated code with other C code is highly system-dependent, and outside the scope of this manual. On Unix, we can simply add futlib.o to the final compiler or linker command line:

$ gcc main.c -o main futlib.o

Depending on the Futhark backend you are using, you may need to add some linker flags. For example, futhark opencl requires -lOpenCL (-framework OpenCL on macOS). See the manual page for each compiler for details.

It is also possible to simply add the generated .c file to the C compiler command line used for compiling our whole program (here main.c):

$ gcc main.c -o main futlib.c

The downside of this approach is that the generated .c file may contain code that causes the C compiler to warn (for example, unused support code that is not needed by the Futhark program).

The generated header file (here, futlib.h) specifies the API, and is intended to be human-readable. See C API Reference for more information.

The basic usage revolves around creating a configuration object, which can then be used to obtain a context object, which must be passed whenever entry points are called.

The configuration object is created using the following function:

struct futhark_context_config *futhark_context_config_new();

Depending on the backend, various functions are generated to modify the configuration. The following is always available:

void futhark_context_config_set_debugging(struct futhark_context_config *cfg,
                                          int flag);

A configuration object can be used to create a context with the following function:

struct futhark_context *futhark_context_new(struct futhark_context_config *cfg);

Context creation may fail. Immediately after futhark_context_new(), call futhark_context_get_error() (see below), which will return a non-NULL error string if context creation failed. The API functions are all thread safe.

Memory management is entirely manual. Deallocation functions are provided for all types defined in the header file. Everything returned by an entry point must be manually deallocated.

For now, many internal errors, such as failure to allocate memory, will cause the function to abort() rather than return an error code. However, all application errors (such as bounds and array size checks) will produce an error code.

2.2.2.1. C with OpenCL¶

When generating C code with futhark opencl, you will need to link against the OpenCL library when linking the final binary:

$ gcc main.c -o main futlib.o -lOpenCL

When using the OpenCL backend, extra API functions are provided for directly accessing or providing the OpenCL objects used by Futhark. Take care when using these functions. In particular, a Futhark context can now be configured with the command queue to use:

void futhark_context_config_set_command_queue(struct futhark_context_config *cfg, cl_command_queue queue);

As a cl_command_queue specifies an OpenCL device, this is also how manual platform and device selection is possible. A function is also provided for retrieving the command queue used by some Futhark context:

cl_command_queue futhark_context_get_command_queue(struct futhark_context *ctx);

This can be used to connect two separate Futhark contexts that have been loaded dynamically.

The raw cl_mem object underlying a Futhark array can be accessed with the function named futhark_values_raw_type, where type depends on the array in question. For example:

cl_mem futhark_values_raw_i32_1d(struct futhark_context *ctx, struct futhark_i32_1d *arr);

The array will be stored in row-major form in the returned memory object. The function performs no copying, so the cl_mem still belongs to Futhark, and may be reused for other purposes when the corresponding array is freed. A dual function can be used to construct a Futhark array from a cl_mem:

struct futhark_i32_1d *futhark_new_raw_i32_1d(struct futhark_context *ctx,
                                              cl_mem data,
                                              int offset,
                                              int dim0);

This function does copy the provided memory into fresh internally allocated memory. The array is assumed to be stored in row-major form offset bytes into the memory region.

2.2.3. Generating Python¶

The futhark py and futhark pyopencl compilers both support generating reusable Python code, although the latter of these generates code of sufficient performance to be worthwhile. The following mentions options and parameters only available for futhark pyopencl. You will need at least PyOpenCL version 2015.2.

We can use futhark pyopencl to translate the program futlib.fut into a Python module futlib.py with the following command:

$ futhark pyopencl --library futlib.fut

This will create a file futlib.py, which contains Python code that defines a class named futlib. This class defines one method for each entry point function (see Entry Points) in the Futhark program. The methods take one parameter for each parameter in the corresponding entry point, and return a tuple containing a value for every value returned by the entry point. For entry points returning a single (non-tuple) value, just that value is returned (that is, single-element tuples are not returned).

After the class has been instantiated, these methods can be invoked to run the corresponding Futhark function. The constructor for the class takes various keyword parameters:

interactive=BOOL

If True (the default is False), show a menu of available OpenCL platforms and devices, and use the one chosen by the user.

platform_pref=STR

Use the first platform that contains the given string. Similar to the -p option for executables.

device_pref=STR

Use the first device that contains the given string. Similar to the -d option for executables.

Futhark arrays are mapped to either the Numpy ndarray type or the pyopencl.array type. Scalars are mapped to Numpy scalar types.

2.3. Reproducibility¶

The Futhark compiler is deterministic by design, meaning that repeatedly compiling the same program with the same compilation flags and using the same version of the compiler will produce identical output every time.

Note that this only applies to the code generated by the Futhark compiler itself. When compiling to an executable with one of the C backends (see Compiling to Executable), Futhark will invoke a C compiler that may not be perfectly reproducible. In such cases the generated .c and .h files will be reproducible, but the final executable may not.