futhark-cuda

SYNOPSIS

futhark cuda [options…] infile

DESCRIPTION

futhark cuda translates a Futhark program to C code invoking CUDA kernels, and either compiles that C code with gcc(1) to an executable binary program, or produces a .h and .c file that can be linked with other code. The standard Futhark optimisation pipeline is used, and GCC is invoked with -O3, -lm, and -std=c99. The resulting program will otherwise behave exactly as one compiled with futhark c.

futhark cuda uses -lcuda -lnvrtc to link. If using --library, you will need to do the same when linking the final binary.
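As a rough illustration of the library workflow, the sketch below assumes a Futhark source file named `foo.fut` and a hand-written `main.c` that calls into the generated code. Both file names are hypothetical, as are the `SRC`, `MAIN`, and `LINK_FLAGS` shell variables. The compiler and linker flags are the ones named above. The commands only run when `futhark` and both files are present.

```shell
# Hypothetical inputs: foo.fut (Futhark program) and main.c (your own C code).
SRC=foo.fut
MAIN=main.c
# Link flags named in the text above; libraries are listed after the sources.
LINK_FLAGS="-lcuda -lnvrtc -lm"

if command -v futhark >/dev/null 2>&1 && [ -f "$SRC" ] && [ -f "$MAIN" ]; then
    # --library appends .c/.h to the -o name, giving foo.c and foo.h.
    futhark cuda --library -o foo "$SRC"
    # Compile the generated C together with your own code.
    # LINK_FLAGS is left unquoted on purpose so it splits into separate flags.
    gcc -O3 -std=c99 "$MAIN" foo.c -o main $LINK_FLAGS
else
    echo "skipping: futhark, $SRC or $MAIN not found"
fi
```

If gcc cannot find the CUDA headers or libraries, see the ENVIRONMENT section.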

OPTIONS

`-h`	Print help text to standard output and exit.
`--library`	Generate a library instead of an executable. Appends `.c`/`.h` to the name indicated by the `-o` option to determine output file names.
`-o outfile`	Where to write the result. If the source program is named `foo.fut`, this defaults to `foo`.
`--safe`	Ignore `unsafe` in program and perform safety checks unconditionally.
`-v verbose`	Enable debugging output. If compilation fails due to a compiler error, the result of the last successful compiler step will be printed to standard error.
`-V`	Print version information on standard output and exit.
`-W`	Do not print any warnings.
`--Werror`	Treat warnings as errors.

EXECUTABLE OPTIONS

Generated executables accept the same options as those generated by futhark-c. The -t option behaves as with futhark-opencl. For commonality, the options use OpenCL nomenclature (“group” instead of “thread block”).

The following additional options are accepted.

`--default-group-size=INT`
	The default size of thread blocks that are launched. Capped to the hardware limit if necessary.
`--default-num-groups=INT`
	The default number of thread blocks that are launched.
`--default-threshold=INT`
	The default parallelism threshold used for comparisons when selecting between code versions generated by incremental flattening. Intuitively, the amount of parallelism needed to saturate the GPU.
`--default-tile-size=INT`
	The default tile size used when performing two-dimensional tiling (the workgroup size will be the square of the tile size).
`--dump-cuda=FILE`
	Don’t run the program, but instead dump the embedded CUDA kernels to the indicated file. Useful if you want to see what is actually being executed.
`--dump-ptx=FILE`
	Don’t run the program, but instead dump the PTX-compiled version of the embedded kernels to the indicated file.
`--load-cuda=FILE`
	Instead of using the embedded CUDA kernels, load them from the indicated file.
`--load-ptx=FILE`
	Load PTX code from the indicated file.
`--nvrtc-option=OPT`
	Add an additional build option to the string passed to NVRTC. Refer to the CUDA documentation for which options are supported. Be careful: some options can easily produce invalid results.
`--print-sizes`	Print all sizes that can be set with `--size` or `--tuning`.
`--size=NAME=INT`
	Set a configurable run-time parameter to the given value. Use `--print-sizes` to see which are available.
`--tuning=FILE`
	Read size=value assignments from the given file.
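The size-related options above can be combined roughly as follows. This is only a sketch: the size names `main.group_size_1` and `main.tile_size_2`, the file name `foo.tuning`, and the executable `./foo` are all hypothetical, and the one-assignment-per-line layout of the tuning file is an assumption. Use `--print-sizes` on your own binary to find the names it actually accepts.

```shell
# Hypothetical size names; run --print-sizes on your own binary for real ones.
TUNING_FILE=foo.tuning

# Assumed layout: one size=value assignment per line.
printf '%s\n' \
    'main.group_size_1=256' \
    'main.tile_size_2=16' > "$TUNING_FILE"

# ./foo stands for an executable produced by `futhark cuda foo.fut`.
if [ -x ./foo ]; then
    # List the sizes this particular program exposes.
    ./foo --print-sizes
fi

# A tuned run would then look like this (plus whatever input the program expects):
#   ./foo --tuning=foo.tuning
# A single parameter can also be set directly:
#   ./foo --size=main.group_size_1=256
```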

ENVIRONMENT

If run without --library, futhark cuda will invoke gcc(1) to compile the generated C program into a binary. This only works if gcc can find the necessary CUDA libraries. On most systems, CUDA is installed in /usr/local/cuda, which is not part of the default gcc search path. You may need to set the following environment variables before running futhark cuda:

LIBRARY_PATH=/usr/local/cuda/lib64
LD_LIBRARY_PATH=/usr/local/cuda/lib64/
CPATH=/usr/local/cuda/include
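In a POSIX shell, these settings could be applied along the following lines. `CUDA_HOME` is used here purely as a convenience variable for this sketch (it is an assumption, not something `futhark cuda` is documented to read), and the snippet prepends the CUDA directories rather than overwriting whatever is already set.

```shell
# CUDA_HOME is an assumed convenience variable; adjust it if CUDA lives elsewhere.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the CUDA directories, keeping any existing entries after a colon.
export LIBRARY_PATH="$CUDA_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CPATH="$CUDA_HOME/include${CPATH:+:$CPATH}"
```

Run this in the same shell session before invoking `futhark cuda`, or add it to your shell profile.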
