futhark-cuda
SYNOPSIS
futhark cuda [options…] infile
DESCRIPTION
futhark cuda translates a Futhark program to C code invoking CUDA kernels, and either compiles that C code with gcc(1) to an executable binary program, or produces a .h and .c file that can be linked with other code. The standard Futhark optimisation pipeline is used, and GCC is invoked with -O, -lm, and -std=c99. The resulting program will otherwise behave exactly as one compiled with futhark c.
futhark cuda uses -lcuda -lnvrtc to link. If using --library, you will need to do the same when linking the final binary.
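As a rough sketch of the library workflow, assuming a hypothetical program foo.fut and a hypothetical hand-written host file main.c that includes the generated foo.h, the build could look like the following. The function name build_with_library and both file names are illustrative assumptions; the compiler and linker flags are the ones named above.

```shell
# Sketch: generate a CUDA-backed library from foo.fut and link it with
# a hand-written host program. foo.fut and main.c are hypothetical names.
build_with_library() {
    # With --library and -o foo, the compiler writes foo.c and foo.h
    # instead of an executable.
    futhark cuda --library -o foo foo.fut || return 1

    # Link the final binary with the same libraries futhark cuda uses.
    gcc -O -std=c99 main.c foo.c -o main -lcuda -lnvrtc -lm
}
```

Calling build_with_library in a directory containing both files should leave an executable named main, provided gcc can locate the CUDA headers and libraries.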
OPTIONS
-h | Print help text to standard output and exit. |
--library | Generate a library instead of an executable. Appends .c/.h to the name indicated by the -o option to determine output file names. |
-o outfile | Where to write the result. If the source program is named foo.fut, this defaults to foo. |
--safe | Ignore unsafe in program and perform safety checks unconditionally. |
-v verbose | Enable debugging output. If compilation fails due to a compiler error, the result of the last successful compiler step will be printed to standard error. |
-V | Print version information on standard output and exit. |
-W | Do not print any warnings. |
--Werror | Treat warnings as errors. |
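To illustrate how these compiler options combine, here is a minimal sketch that builds an executable with safety checks forced on and warnings treated as errors. The file name foo.fut and the function name build_checked are hypothetical; only the flags come from the table above.

```shell
# Sketch: compile foo.fut (hypothetical name) to an executable named foo,
# ignoring "unsafe" annotations and failing on any compiler warning.
build_checked() {
    futhark cuda --safe --Werror -o foo foo.fut
}
```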
EXECUTABLE OPTIONS
Generated executables accept the same options as those generated by futhark-c. The -t option behaves as with futhark-opencl. For commonality, the options use OpenCL nomenclature (“group” instead of “thread block”).
The following additional options are accepted.
--default-group-size=INT | The default size of thread blocks that are launched. Capped to the hardware limit if necessary. |
--default-num-groups=INT | The default number of thread blocks that are launched. |
--default-threshold=INT | The default parallelism threshold used for comparisons when selecting between code versions generated by incremental flattening. Intuitively, the amount of parallelism needed to saturate the GPU. |
--default-tile-size=INT | The default tile size used when performing two-dimensional tiling (the workgroup size will be the square of the tile size). |
--dump-cuda=FILE | Don’t run the program, but instead dump the embedded CUDA kernels to the indicated file. Useful if you want to see what is actually being executed. |
--dump-ptx=FILE | Don’t run the program, but instead dump the PTX-compiled version of the embedded kernels to the indicated file. |
--load-cuda=FILE | Instead of using the embedded CUDA kernels, load them from the indicated file. |
--load-ptx=FILE | Load PTX code from the indicated file. |
--nvrtc-option=OPT | Add an additional build option to the string passed to NVRTC. Refer to the CUDA documentation for which options are supported. Be careful: some options can easily produce invalid results. |
--print-sizes | Print all sizes that can be set with --size or --tuning. |
--size=ASSIGNMENT | Set a configurable run-time parameter to the given value. ASSIGNMENT must be of the form NAME=INT. Use --print-sizes to see which names are available. |
--tuning=FILE | Read size=value assignments from the given file. |
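The options above might be used from a shell roughly as sketched below. All function names are hypothetical helpers, the program path is passed in by the caller, and the size name must be one reported by --print-sizes for that particular program. The tuning file layout of one NAME=INT assignment per line is an assumption based on the description of --tuning.

```shell
# Sketch: helpers around a compiled Futhark CUDA executable.
# $1 is always the path to the executable (for example ./foo).

# Run with one tunable size overridden on the command line.
run_with_size() {
    prog=$1; name=$2; value=$3
    "$prog" --size="$name=$value"
}

# Write a single assignment to a tuning file, then run with that file.
# Assumes the file holds one NAME=INT assignment per line.
run_with_tuning_file() {
    prog=$1; file=$2; name=$3; value=$4
    printf '%s=%s\n' "$name" "$value" > "$file" || return 1
    "$prog" --tuning="$file"
}

# Dump the embedded CUDA kernels for inspection instead of running.
dump_kernels() {
    prog=$1; file=$2
    "$prog" --dump-cuda="$file"
}

# Run using (possibly hand-edited) kernels loaded from a file.
run_with_kernels() {
    prog=$1; file=$2
    "$prog" --load-cuda="$file"
}
```

A typical session would first run the executable with --print-sizes to discover valid names, then pass one of them to run_with_size. Since the generated program reads its input from standard input, callers can redirect an input file into any of these helpers.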
ENVIRONMENT
If run without --library, futhark cuda will invoke gcc(1) to compile the generated C program into a binary. This only works if gcc can find the necessary CUDA libraries. On most systems, CUDA is installed in /usr/local/cuda, which is not part of the default gcc search path. You may need to set the following environment variables before running futhark cuda:
LIBRARY_PATH=/usr/local/cuda/lib64
LD_LIBRARY_PATH=/usr/local/cuda/lib64/
CPATH=/usr/local/cuda/include
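In a POSIX shell, one way to apply these settings without discarding existing values is sketched below. CUDA_HOME is used here as a hypothetical convenience variable, defaulting to the /usr/local/cuda location mentioned above; adjust it if CUDA is installed elsewhere.

```shell
# Sketch: make the CUDA headers and libraries visible to gcc and to the
# dynamic linker. The CUDA_HOME default is an assumption.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the CUDA directories, keeping any entries that are already set.
export LIBRARY_PATH="$CUDA_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CPATH="$CUDA_HOME/include${CPATH:+:$CPATH}"
```

Placing these lines in a shell startup file, or sourcing them before a build, avoids repeating them for every invocation.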