futhark-cuda¶

SYNOPSIS¶

futhark cuda [options…] <program.fut>

DESCRIPTION¶

futhark cuda translates a Futhark program to C code invoking CUDA kernels, and either compiles that C code with a C compiler to an executable binary program, or produces a .h and .c file that can be linked with other code. The standard Futhark optimisation pipeline is used.

futhark cuda uses -lcuda -lcudart -lnvrtc to link. If using --library, you will need to do the same when linking the final binary.
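As a rough sketch of the --library workflow, the commands below show how the link step might look. The file names prog.fut and main.c are hypothetical, and the sketch assumes that compiling with --library leaves prog.c and prog.h next to the source. The link command is printed instead of executed, so the snippet can be tried on a machine without CUDA.

```shell
# Hypothetical names: prog.fut is the Futhark source, main.c is your own C code.
# Step 1 (needs futhark installed): futhark cuda --library prog.fut
#   Assumed to leave prog.c and prog.h next to prog.fut.

# Step 2: link with the same libraries that futhark cuda itself uses.
cuda_libs="-lcuda -lcudart -lnvrtc"
link_cmd="cc -o main main.c prog.c $cuda_libs"

# Print the command rather than run it, so the sketch works without CUDA present.
echo "$link_cmd"
```

If CUDA is not on the default compiler search path, the same command also needs the search-path settings described under ENVIRONMENT.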

The generated CUDA code can be called from multiple CPU threads, as it brackets every API operation with cuCtxPushCurrent() and cuCtxPopCurrent().

OPTIONS¶

Accepts the same options as futhark-c.

ENVIRONMENT VARIABLES¶

CC

The C compiler used to compile the program. Defaults to cc if unset.

CFLAGS

Space-separated list of options passed to the C compiler. Defaults to -O -std=c99 if unset.
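The defaulting behaviour described above can be mimicked in plain shell. This is only an illustration of the documented rule (environment value if set, documented default otherwise), not how futhark cuda is implemented. The helper names effective_cc and effective_cflags are made up for this sketch.

```shell
# Illustrative helpers (hypothetical names): report which compiler and flags
# would apply under the documented rule "use the variable, else the default".
effective_cc() {
    # ${CC-cc} falls back to "cc" only when CC is unset.
    printf '%s\n' "${CC-cc}"
}

effective_cflags() {
    # ${CFLAGS--O -std=c99} falls back only when CFLAGS is unset.
    printf '%s\n' "${CFLAGS--O -std=c99}"
}

echo "compiler: $(effective_cc)"
echo "flags:    $(effective_cflags)"

# Overriding both for a single invocation (prog.fut is a hypothetical file):
#   CC=clang CFLAGS="-O2 -std=c99" futhark cuda prog.fut
```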

EXECUTABLE OPTIONS¶

Generated executables accept the same options as those generated by futhark-c. The -t option behaves as with futhark-opencl. For commonality, the options use OpenCL nomenclature (“group” instead of “thread block”).

The following additional options are accepted.

-h, --help: Print help text to standard output and exit.
--default-group-size=INT: The default size of thread blocks that are launched. Capped to the hardware limit if necessary.
--default-num-groups=INT: The default number of thread blocks that are launched.
--default-threshold=INT: The default parallelism threshold used for comparisons when selecting between code versions generated by incremental flattening. Intuitively, the amount of parallelism needed to saturate the GPU.
--default-tile-size=INT: The default tile size used when performing two-dimensional tiling (the workgroup size will be the square of the tile size).
--dump-cuda=FILE: Don’t run the program, but instead dump the embedded CUDA kernels to the indicated file. Useful if you want to see what is actually being executed.
--dump-ptx=FILE: Don’t run the program, but instead dump the PTX-compiled version of the embedded kernels to the indicated file.
--load-cuda=FILE: Instead of using the embedded CUDA kernels, load them from the indicated file.
--load-ptx=FILE: Load PTX code from the indicated file.
-n, --no-print-result: Do not print the program result.
--nvrtc-option=OPT: Add an additional build option to the string passed to NVRTC. Refer to the CUDA documentation for which options are supported. Be careful: some options can easily produce invalid results.
--param=ASSIGNMENT: Set a tuning parameter to the given value. ASSIGNMENT must be of the form NAME=INT. Use --print-params to see which names are available.
--print-params: Print all tuning parameters that can be set with --param or --tuning.
--tuning=FILE: Read size=value assignments from the given file.
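A minimal sketch of how the tuning options might be combined is shown here. The parameter names in the tuning file are placeholders, since the real names depend on the compiled program and are listed by --print-params. The executable name ./prog is also hypothetical. As before, the command is printed rather than run.

```shell
# Write a tuning file with one NAME=INT assignment per line.
# The names below are placeholders; list the real ones with:
#   ./prog --print-params
tuning_file="$(mktemp)"
cat > "$tuning_file" <<'EOF'
example_threshold=2000
example_tile_size=16
EOF

# ./prog is a hypothetical executable built with `futhark cuda prog.fut`.
# A single parameter could instead be set with --param=NAME=INT.
run_cmd="./prog --default-group-size=256 --tuning=$tuning_file"
echo "$run_cmd"
```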

ENVIRONMENT¶

If run without --library, futhark cuda will invoke a C compiler to compile the generated C program into a binary. This only works if the C compiler can find the necessary CUDA libraries. On most systems, CUDA is installed in /usr/local/cuda, which is usually not part of the default compiler search path. You may need to set the following environment variables before running futhark cuda:

LIBRARY_PATH=/usr/local/cuda/lib64
LD_LIBRARY_PATH=/usr/local/cuda/lib64
CPATH=/usr/local/cuda/include

At runtime the generated program must be able to find the CUDA installation directory, which is normally located at /usr/local/cuda. If you have CUDA installed elsewhere, set any of the CUDA_HOME, CUDA_ROOT, or CUDA_PATH environment variables to the proper directory.
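For a CUDA installation outside /usr/local/cuda, the settings above might be collected in a small shell snippet like the following. The prefix /opt/cuda and the variable CUDA_PREFIX are assumptions made for illustration. Existing values of the search-path variables are preserved by appending them after the CUDA directories.

```shell
# Hypothetical install prefix, used only for this illustration.
CUDA_PREFIX=/opt/cuda

# Compile time: let the C compiler find the CUDA headers and libraries.
# ${VAR:+:$VAR} appends the previous value only if there was one.
export CPATH="$CUDA_PREFIX/include${CPATH:+:$CPATH}"
export LIBRARY_PATH="$CUDA_PREFIX/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_PREFIX/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Run time: tell the generated program where CUDA is installed.
export CUDA_HOME="$CUDA_PREFIX"
```

Sourcing such a snippet before running futhark cuda and the resulting executable keeps the compile-time and runtime settings consistent.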
