futhark cuda [options…] <program.fut>
futhark cuda translates a Futhark program to C code invoking CUDA
kernels, and either compiles that C code with gcc(1) to an executable
binary program, or produces a
.c file that can be
linked with other code. The standard Futhark optimisation pipeline is
used, and GCC is invoked with
resulting program will otherwise behave exactly as one compiled with
futhark cuda uses
-lcuda -lnvrtc to link. If using
--library, you will need to do the same when linking the final
Print help text to standard output and exit.
Generate a library instead of an executable. Appends
.h to the name indicated by the
-o option to determine output file names.
- -o outfile
Where to write the result. If the source program is named
foo.fut, this defaults to
unsafe in program and perform safety checks unconditionally.
- -v verbose
Enable debugging output. If compilation fails due to a compiler error, the result of the last successful compiler step will be printed to standard error.
Print version information on standard output and exit.
Do not print any warnings.
Treat warnings as errors.
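The library workflow described above (compile with --library, then link with -lcuda -lnvrtc) could look roughly like the following. This is a minimal sketch, not a definitive recipe: foo.fut and main.c are hypothetical file names, and adding -lm to the link flags is an assumption that the text above does not state.

```shell
# Hypothetical names: foo.fut is the Futhark source, main.c is your own C code.
SRC=foo.fut
# -lcuda -lnvrtc come from the text above; -lm is an assumption.
LINK_FLAGS="-lcuda -lnvrtc -lm"

if command -v futhark >/dev/null 2>&1 && [ -f "$SRC" ] && [ -f main.c ]; then
    # With -o foo and --library, the output names are derived from "foo"
    # (foo.c and foo.h).
    futhark cuda --library -o foo "$SRC"
    # Link the generated C file with your own code.
    # $LINK_FLAGS is deliberately unquoted so it splits into separate flags.
    gcc main.c foo.c -o main $LINK_FLAGS
else
    echo "skipping: futhark, $SRC, or main.c not found"
fi
```

The guard only makes the sketch safe to paste on a machine without Futhark installed; in a real build script you would normally let a missing tool fail loudly.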
Generated executables accept the same options as those generated by
-t option behaves as with
futhark-opencl. For commonality, the options use OpenCL
nomenclature (“group” instead of “thread block”).
The following additional options are accepted.
The default size of thread blocks that are launched. Capped to the hardware limit if necessary.
The default number of thread blocks that are launched.
The default parallelism threshold used for comparisons when selecting between code versions generated by incremental flattening. Intuitively, the amount of parallelism needed to saturate the GPU.
The default tile size used when performing two-dimensional tiling (the workgroup size will be the square of the tile size).
Don’t run the program, but instead dump the embedded CUDA kernels to the indicated file. Useful if you want to see what is actually being executed.
Don’t run the program, but instead dump the PTX-compiled version of the embedded kernels to the indicated file.
Instead of using the embedded CUDA kernels, load them from the indicated file.
Load PTX code from the indicated file.
Add an additional build option to the string passed to NVRTC. Refer to the CUDA documentation for which options are supported. Be careful - some options can easily result in invalid results.
Print all sizes that can be set with
Set a configurable run-time parameter to the given value.
ASSIGNMENT must be of the form
--print-sizes to see which names are available.
Read size=value assignments from the given file.
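As an illustration of the size=value file format mentioned above, the sketch below writes a one-line tuning file and passes it to a generated executable. Everything named here is an assumption: ./foo stands for an executable built by futhark cuda, input.txt for its input, and the size name is made up (real names are listed by --print-sizes). The --tuning flag name is taken from the Futhark manual, not from the text above, so check it against your installed version.

```shell
# Hypothetical tuning file; replace the made-up size name with one
# reported by ./foo --print-sizes before using it for real.
TUNING_FILE=foo.tuning
cat > "$TUNING_FILE" <<'EOF'
main.group_size_1234=256
EOF

# Only run if the hypothetical executable and input file actually exist.
if [ -x ./foo ] && [ -f input.txt ]; then
    # List the sizes that this particular program exposes.
    ./foo --print-sizes
    # Run the program with the assignments read from the tuning file.
    ./foo --tuning "$TUNING_FILE" < input.txt
fi
```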
If run without --library, futhark cuda will invoke gcc(1)
to compile the generated C program into a binary. This only works if
gcc can find the necessary CUDA libraries. On most systems, CUDA
is installed in
/usr/local/cuda, which is not part of the default
gcc search path. You may need to set the following environment
variables before running futhark cuda:
LIBRARY_PATH=/usr/local/cuda/lib64
LD_LIBRARY_PATH=/usr/local/cuda/lib64/
CPATH=/usr/local/cuda/include
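If these variables may already be set on your system, it can be safer to prepend the CUDA directories than to overwrite the existing values. A minimal sketch, assuming CUDA is installed under /usr/local/cuda; the CUDA_HOME variable and the foo.fut file name are illustrative choices, not something futhark cuda itself reads or requires.

```shell
# Assumption: CUDA lives under /usr/local/cuda unless CUDA_HOME says otherwise.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the CUDA directories, keeping any existing entries after a colon.
export LIBRARY_PATH="$CUDA_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CPATH="$CUDA_HOME/include${CPATH:+:$CPATH}"

# Compile only if the compiler and the (hypothetical) source file are present.
if command -v futhark >/dev/null 2>&1 && [ -f foo.fut ]; then
    futhark cuda foo.fut
fi
```

LIBRARY_PATH and CPATH affect gcc at compile and link time, while LD_LIBRARY_PATH affects the dynamic loader when the resulting binary runs.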