.. role:: ref(emphasis)

.. _futhark-cuda(1):

==============
futhark-cuda
==============

SYNOPSIS
========

futhark cuda [options...] <program.fut>

DESCRIPTION
===========

``futhark cuda`` translates a Futhark program to C code invoking CUDA
kernels, and either compiles that C code with gcc(1) to an executable
binary program, or produces a ``.h`` and ``.c`` file that can be
linked with other code. The standard Futhark optimisation pipeline is
used, and GCC is invoked with ``-O``, ``-lm``, and ``-std=c99``. The
resulting program will otherwise behave exactly as one compiled with
``futhark c``.

``futhark cuda`` uses ``-lcuda -lnvrtc`` to link.  If using
``--library``, you will need to do the same when linking the final
binary.
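
As a rough sketch of the library workflow, assume a Futhark source file
``foo.fut`` and a hand-written C driver ``main.c`` that includes the
generated ``foo.h`` (both file names are hypothetical).  The steps might
then look like this::

  # Produces foo.c and foo.h instead of an executable.
  futhark cuda --library foo.fut

  # Compile the driver together with the generated code, passing the
  # same C flags and libraries that futhark cuda itself uses.
  gcc -O -std=c99 main.c foo.c -o main -lm -lcuda -lnvrtc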

OPTIONS
=======

-h
  Print help text to standard output and exit.

--library
  Generate a library instead of an executable.  Appends ``.c``/``.h``
  to the name indicated by the ``-o`` option to determine output
  file names.

-o outfile
  Where to write the result.  If the source program is named
  ``foo.fut``, this defaults to ``foo``.

--safe
  Ignore ``unsafe`` in program and perform safety checks unconditionally.

-v verbose
  Enable debugging output.  If compilation fails due to a compiler
  error, the result of the last successful compiler step will be
  printed to standard error.

-V
  Print version information on standard output and exit.

-W
  Do not print any warnings.

--Werror
  Treat warnings as errors.

EXECUTABLE OPTIONS
==================

Generated executables accept the same options as those generated by
:ref:`futhark-c(1)`.  The ``-t`` option behaves as with
:ref:`futhark-opencl(1)`.  For commonality, the options use OpenCL
nomenclature ("group" instead of "thread block").

The following additional options are accepted.

--default-group-size=INT

  The default size of thread blocks that are launched.  Capped to the
  hardware limit if necessary.

--default-num-groups=INT

  The default number of thread blocks that are launched.

--default-threshold=INT

  The default parallelism threshold used for comparisons when
  selecting between code versions generated by incremental flattening.
  Intuitively, the amount of parallelism needed to saturate the GPU.

--default-tile-size=INT

  The default tile size used when performing two-dimensional tiling
  (the workgroup size will be the square of the tile size).

--dump-cuda=FILE

  Don't run the program, but instead dump the embedded CUDA kernels to
  the indicated file.  Useful if you want to see what is actually
  being executed.

--dump-ptx=FILE

  Don't run the program, but instead dump the PTX-compiled version of
  the embedded kernels to the indicated file.

--load-cuda=FILE

  Instead of using the embedded CUDA kernels, load them from the
  indicated file.

--load-ptx=FILE

  Load PTX code from the indicated file.

--nvrtc-option=OPT

  Add an additional build option to the string passed to NVRTC.  Refer
  to the CUDA documentation for which options are supported.  Be
  careful: some options can easily lead to invalid results.

--print-sizes

  Print all sizes that can be set with ``--size`` or ``--tuning``.

--size=ASSIGNMENT

  Set a configurable run-time parameter to the given value.
  ``ASSIGNMENT`` must be of the form ``NAME=INT``.  Use
  ``--print-sizes`` to see which names are available.

--tuning=FILE

  Read ``NAME=INT`` size assignments, in the form accepted by
  ``--size``, from the given file.
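
The options above might be combined as in the following sketch.  It
assumes an executable ``foo`` compiled from ``foo.fut`` that takes its
input from a hypothetical file ``foo.in``.  The size name
``main.group_size_1234`` and the value ``256`` are placeholders, not
real names; substitute whatever ``--print-sizes`` reports for your
program::

  # List the sizes that this program allows you to set.
  ./foo --print-sizes

  # Override one size on the command line (placeholder name and value).
  ./foo --size=main.group_size_1234=256 < foo.in

  # Or keep the same assignment in a file and pass it with --tuning.
  echo 'main.group_size_1234=256' > foo.tuning
  ./foo --tuning=foo.tuning < foo.in

  # Dump the embedded CUDA kernels for inspection or editing, then run
  # the program with the (possibly modified) copy instead.
  ./foo --dump-cuda=foo.cu
  ./foo --load-cuda=foo.cu < foo.in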

ENVIRONMENT
===========

If run without ``--library``, ``futhark cuda`` will invoke ``gcc(1)``
to compile the generated C program into a binary.  This only works if
``gcc`` can find the necessary CUDA libraries.  On most systems, CUDA
is installed in ``/usr/local/cuda``, which is not part of the default
``gcc`` search path.  You may need to set the following environment
variables before running ``futhark cuda``::

  LIBRARY_PATH=/usr/local/cuda/lib64
  LD_LIBRARY_PATH=/usr/local/cuda/lib64/
  CPATH=/usr/local/cuda/include
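
In a POSIX shell, these variables must be exported for ``futhark cuda``
and ``gcc`` to see them.  The following is a sketch, assuming CUDA is
installed under ``/usr/local/cuda``, that prepends the CUDA directories
while preserving any entries already present (``cuda_dir`` is a
hypothetical helper variable)::

  cuda_dir=/usr/local/cuda

  # ${VAR:+:$VAR} expands to ":$VAR" only when VAR is set and non-empty,
  # which avoids leaving a stray trailing colon in the search path.
  export LIBRARY_PATH="$cuda_dir/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
  export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export CPATH="$cuda_dir/include${CPATH:+:$CPATH}"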

SEE ALSO
========

:ref:`futhark-opencl(1)`