.. role:: ref(emphasis)

.. _futhark-cuda(1):

==============
futhark-cuda
==============

SYNOPSIS
========

futhark cuda [options...] <program.fut>

DESCRIPTION
===========

``futhark cuda`` translates a Futhark program to C code invoking CUDA
kernels. It then either compiles that C code with gcc(1) to an
executable binary program, or produces a ``.h`` and ``.c`` file that
can be linked with other code. The standard Futhark optimisation
pipeline is used, and GCC is invoked with ``-O``, ``-lm``, and
``-std=c99``.

The resulting program will otherwise behave exactly as one compiled
with ``futhark c``.

``futhark cuda`` uses ``-lcuda -lnvrtc`` to link. If using
``--library``, you will need to pass the same flags when linking the
final binary.

OPTIONS
=======

-h
  Print help text to standard output and exit.

--library
  Generate a library instead of an executable. Appends ``.c``/``.h``
  to the name indicated by the ``-o`` option to determine output file
  names.

-o outfile
  Where to write the result. If the source program is named
  ``foo.fut``, this defaults to ``foo``.

--safe
  Ignore ``unsafe`` in the program and perform safety checks
  unconditionally.

-v verbose
  Enable debugging output. If compilation fails due to a compiler
  error, the result of the last successful compiler step will be
  printed to standard error.

-V
  Print version information on standard output and exit.

-W
  Do not print any warnings.

--Werror
  Treat warnings as errors.

EXECUTABLE OPTIONS
==================

Generated executables accept the same options as those generated by
:ref:`futhark-c(1)`. The ``-t`` option behaves as with
:ref:`futhark-opencl(1)`.

For commonality, the options use OpenCL nomenclature ("group" instead
of "thread block").

The following additional options are accepted.

--default-group-size=INT
  The default size of thread blocks that are launched. Capped to the
  hardware limit if necessary.

--default-num-groups=INT
  The default number of thread blocks that are launched.

--default-threshold=INT
  The default parallelism threshold used for comparisons when
  selecting between code versions generated by incremental
  flattening. Intuitively, this is the amount of parallelism needed
  to saturate the GPU.

--default-tile-size=INT
  The default tile size used when performing two-dimensional tiling
  (the group size will be the square of the tile size).

--dump-cuda=FILE
  Don't run the program, but instead dump the embedded CUDA kernels
  to the indicated file. Useful if you want to see what is actually
  being executed.

--dump-ptx=FILE
  Don't run the program, but instead dump the PTX-compiled version of
  the embedded kernels to the indicated file.

--load-cuda=FILE
  Instead of using the embedded CUDA kernels, load them from the
  indicated file.

--load-ptx=FILE
  Load PTX code from the indicated file.

--nvrtc-option=OPT
  Add an additional build option to the string passed to NVRTC. Refer
  to the CUDA documentation for which options are supported. Be
  careful: some options can easily lead to invalid results.

--print-sizes
  Print all sizes that can be set with ``--size`` or ``--tuning``.

--size=ASSIGNMENT
  Set a configurable run-time parameter to the given value.
  ``ASSIGNMENT`` must be of the form ``NAME=INT``. Use
  ``--print-sizes`` to see which names are available.

--tuning=FILE
  Read size=value assignments from the given file.

ENVIRONMENT
===========

If run without ``--library``, ``futhark cuda`` will invoke ``gcc(1)``
to compile the generated C program into a binary. This only works if
``gcc`` can find the necessary CUDA libraries. On most systems, CUDA
is installed in ``/usr/local/cuda``, which is not part of the default
``gcc`` search path. You may need to set the following environment
variables before running ``futhark cuda``::

  LIBRARY_PATH=/usr/local/cuda/lib64
  LD_LIBRARY_PATH=/usr/local/cuda/lib64/
  CPATH=/usr/local/cuda/include

SEE ALSO
========

:ref:`futhark-opencl(1)`
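
EXAMPLES
========

The following shell session is a sketch of two independent workflows
described above, not an authoritative recipe. The file names
``foo.fut``, ``main.c``, ``foo.cu``, and ``input.data`` are
hypothetical. ``main.c`` is assumed to include the generated ``foo.h``
and call its API, and CUDA is assumed to be reachable by ``gcc`` as
described under ENVIRONMENT::

  # Workflow 1: build a library. This writes foo.c and foo.h
  # instead of an executable.
  futhark cuda --library -o foo foo.fut

  # Link the final binary with the same CUDA libraries that
  # futhark cuda uses itself, plus the math library.
  gcc -O -std=c99 main.c foo.c -o main -lcuda -lnvrtc -lm

  # Workflow 2: build an executable, then dump its embedded CUDA
  # kernels to foo.cu (the program itself is not run).
  futhark cuda -o foo foo.fut
  ./foo --dump-cuda=foo.cu

  # Run the program using the (possibly edited) kernels in foo.cu,
  # reading the program input from standard input.
  ./foo --load-cuda=foo.cu < input.data

The same dump-and-reload pattern should apply to ``--dump-ptx`` and
``--load-ptx`` when working at the PTX level instead.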