Shader Authoring and Dispatch
=============================

vkdispatch lets you write compute logic in Python syntax and compile it to GLSL at runtime.
This page covers shader launch patterns and the key semantics of vkdispatch's
runtime shader generation model.

Examples below omit ``vd.initialize()`` and ``vd.make_context()`` because vkdispatch
creates them automatically on first runtime use. That default path is intentional:
generated shaders are specialized against the current machine/runtime unless you
explicitly choose dummy-mode codegen.

Runtime Generation Model
------------------------

``@vd.shader`` executes your Python function with tracing objects and emits shader code
as each operation runs.

In practice:

1. vkdispatch inspects type-annotated arguments and creates shader variables.
2. arithmetic, indexing, swizzles, and assignment append GLSL statements.
3. the generated source is compiled into a compute plan and then dispatched.

This is different from AST/IR compilers: it is a forward streaming model, so explicit
register materialization and explicit shader control-flow helpers matter for
performance and correctness.

Default Runtime-Coupled Generation
----------------------------------

By default, ``vkdispatch`` generates shaders for the active runtime backend and uses
that runtime's limits when choosing implicit launch defaults such as ``local_size``.

This is the normal mode for end-to-end execution:

1. define the kernel with ``@vd.shader``
2. let ``vkdispatch`` auto-initialize or call ``vd.initialize(...)`` yourself
3. execute the shader or inspect ``get_src()`` for the current machine

If you want controlled source generation without relying on the active runtime, use
the dummy backend explicitly.

Imports and Type Annotations
----------------------------

Most shader examples use these imports:

.. code-block:: python

   import vkdispatch as vd
   import vkdispatch.codegen as vc
   from vkdispatch.codegen.abbreviations import *

* ``Buff[...]`` is a shader buffer argument type.
* ``Const[...]`` is a uniform/constant argument type.
* Dtype aliases such as ``f32``, ``i32``, and ``v2`` come from abbreviations.

Basic In-Place Kernel
---------------------

.. code-block:: python

   import numpy as np
   import vkdispatch as vd
   import vkdispatch.codegen as vc
   from vkdispatch.codegen.abbreviations import *

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def add_scalar(buff: Buff[f32], bias: Const[f32]):
       tid = vc.global_invocation_id().x
       buff[tid] = buff[tid] + bias

   arr = np.arange(32, dtype=np.float32)
   buff = vd.asbuffer(arr)
   add_scalar(buff, 1.5)

   result = buff.read(0)
   print(result[:4])  # [1.5 2.5 3.5 4.5]

Launch Configuration
--------------------

Use one of these launch patterns:

* String expression (evaluated from function argument names):

  .. code-block:: python

     # @vd.shader(exec_size=lambda args: args.in_buf.size)
     @vd.shader("in_buf.size")
     def kernel(in_buf: Buff[f32], out_buf: Buff[f32]):
         ...

* Fixed total dispatch size:

  .. code-block:: python

     @vd.shader(exec_size=(1024, 1, 1))
     def kernel(...):
         ...

* Dynamic size from call arguments:

  .. code-block:: python

     @vd.shader(exec_size=lambda args: args.in_buf.size)
     def kernel(in_buf: Buff[f32], out_buf: Buff[f32]):
         ...

* Explicit workgroups instead of ``exec_size``:

  .. code-block:: python

     @vd.shader(workgroups=(64, 1, 1), local_size=(128, 1, 1))
     def kernel(...):
         ...

``exec_size`` and ``workgroups`` are mutually exclusive.

The string form is often the most concise option for argument-dependent dispatch size.
It is evaluated dynamically, so it is slightly more brittle than the lambda form.
When you want the declaration itself to be more explicit and deterministic, prefer
``exec_size=lambda args: ...``.

You can also override launch parameters per call:

.. code-block:: python

   # Reuse the same compiled shader with different dispatch sizes.
   add_scalar(buff, 1.5, exec_size=buff.size)

Symbolic Expressions vs Mutable Registers
-----------------------------------------

vkdispatch variables are symbolic by default. Reusing an expression in multiple places
inlines that expression each time in generated code.

To materialize a value once and mutate it, convert it to a register with
``to_register()``:

.. code-block:: python

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def register_example(buff: Buff[f32]):
       tid = vc.global_invocation_id().x

       # Expression variable: may be inlined at each use.
       expr = vc.sin(tid * 0.1)

       # Register variable: emitted once, then reused.
       cached = expr.to_register("cached")

       buff[tid] = cached * 2.0 + cached / 3.0

Register Store Syntax (``[:]``)
-------------------------------

Python assignment rebinding (``x = ...``) changes the Python name, not the generated
shader register. To emit a GLSL assignment into an existing register, use full-slice
store syntax ``x[:] = ...``.

.. code-block:: python

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def register_store(buff: Buff[f32]):
       tid = vc.global_invocation_id().x
       value = buff[tid].to_register("value")
       value[:] = value * 0.5 + 1.0
       buff[tid] = value

Shader Control Flow vs Python Control Flow
------------------------------------------

Native Python control flow with vkdispatch variables is intentionally blocked:

.. code-block:: python

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def bad_branch(buff: Buff[f32]):
       tid = vc.global_invocation_id().x
       if tid < 10:
           # Raises ValueError: vkdispatch variables are not Python booleans.
           buff[tid] = 1.0

Use shader control-flow helpers so both branches are emitted into generated code:

.. code-block:: python

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def threshold(buff: Buff[f32], cutoff: Const[f32]):
       tid = vc.global_invocation_id().x

       vc.if_statement(buff[tid] > cutoff)
       buff[tid] = 1.0
       vc.else_statement()
       buff[tid] = 0.0
       vc.end()

Generation-Time Specialization (Meta-Programming)
-------------------------------------------------

Because kernel bodies execute as normal Python during generation, Python loops and
conditionals are useful for specialization and unrolling.

.. code-block:: python

   def make_unrolled_sum(unroll: int):
       # @vd.shader(exec_size=lambda args: args.dst.size)
       @vd.shader("dst.size")
       def unrolled_sum(src: Buff[f32], dst: Buff[f32]):
           tid = vc.global_invocation_id().x
           base = (tid * unroll).to_register("base")
           acc = vc.new_float_register(0.0)

           # Unrolled at generation time.
           for i in range(unroll):
               acc += src[base + i]

           dst[tid] = acc

       return unrolled_sum

   sum4 = make_unrolled_sum(4)
   sum8 = make_unrolled_sum(8)
   # sum4 and sum8 compile to different shaders with different unrolled bodies.

Mapping Functions
-----------------

Mapping functions are reusable typed snippets (often used with reductions and FFT I/O).

.. code-block:: python

   @vd.map
   def square_value(x: Buff[f32]) -> f32:
       idx = vd.reduce.mapped_io_index()
       return x[idx] * x[idx]

You can pass mapping functions into APIs that accept ``mapping_function``,
``input_map``, or ``output_map`` arguments.

Inspecting Generated Shader Source
----------------------------------

``get_src()`` returns the generated source for the currently selected runtime/codegen
configuration. In the default mode, that means the generated shader is tied to the
current machine/runtime by design.

For explicit codegen-only workflows, initialize the dummy backend first and select the
output backend you want:

.. code-block:: python

   import vkdispatch as vd
   import vkdispatch.codegen as vc
   from vkdispatch.codegen.abbreviations import Buff, Const, f32

   vd.initialize(backend="dummy")
   vd.set_dummy_context_params(
       subgroup_size=32,
       max_workgroup_size=(128, 1, 1),
       max_workgroup_count=(65535, 65535, 65535),
   )
   vc.set_codegen_backend("cuda")

   # @vd.shader(exec_size=lambda args: args.buff.size)
   @vd.shader("buff.size")
   def add_scalar(buff: Buff[f32], bias: Const[f32]):
       tid = vc.global_invocation_id().x
       buff[tid] = buff[tid] + bias

   print(add_scalar.get_src(line_numbers=True))

A built shader can be printed for debugging:

.. code-block:: python

   print(add_scalar)

This prints GLSL-like generated source with line numbers, which is useful when
debugging type issues or unsupported expressions.

Common Notes
------------

* All shader parameters must be type annotated.
* Buffer/image arguments must use codegen types (for example, ``Buff[f32]``, ``Img2[f32]``).
* If you need batched submissions, prefer :doc:`Command Graph Recording `.

Shader API Reference
--------------------

See the :doc:`Full Python API Reference <../python_api>` for complete API details on:

* ``vkdispatch.shader``
* ``vkdispatch.map``
* ``vkdispatch.ShaderFunction``
* ``vkdispatch.MappingFunction``
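
Illustrative Sketch of Streaming Generation
-------------------------------------------

The forward streaming model described under "Runtime Generation Model", and the
distinction between symbolic expressions and registers, can be pictured with a small
pure-Python toy. This is only a sketch and is not vkdispatch's implementation: the
``Tracer``, ``Expr``, and ``Register`` classes below are hypothetical names invented for
illustration, and the emitted strings are merely GLSL-like. It assumes that operator
overloading builds source text and that a full-slice store appends an assignment.

.. code-block:: python

   class Tracer:
       """Hypothetical code collector: statements are appended as operations run."""

       def __init__(self):
           self.lines = []


   class Expr:
       """Symbolic value: carries source text and emits nothing by itself."""

       def __init__(self, tracer, text):
           self.tracer = tracer
           self.text = text

       def _binary(self, op, other):
           rhs = other.text if isinstance(other, Expr) else repr(other)
           return Expr(self.tracer, f"({self.text} {op} {rhs})")

       def __add__(self, other):
           return self._binary("+", other)

       def __mul__(self, other):
           return self._binary("*", other)

       def to_register(self, name):
           # Materialize once: emit a declaration, then refer to it by name.
           self.tracer.lines.append(f"float {name} = {self.text};")
           return Register(self.tracer, name)


   class Register(Expr):
       """Named variable: a full-slice store emits an assignment into it."""

       def __setitem__(self, key, value):
           if key != slice(None):
               raise IndexError("only full-slice stores ([:]) are supported")
           rhs = value.text if isinstance(value, Expr) else repr(value)
           self.tracer.lines.append(f"{self.text} = {rhs};")


   tracer = Tracer()
   x = Expr(tracer, "x")

   expr = x * 0.1 + 1.0                  # symbolic: nothing emitted yet
   inlined = expr * expr                 # the text of expr appears twice
   cached = expr.to_register("cached")   # emits one declaration
   reused = cached * cached              # refers to the register by name
   cached[:] = cached + 2.0              # emits a store into the same register

   print("\n".join(tracer.lines))
   print(inlined.text)
   print(reused.text)

In this toy, rebinding with ``cached = cached + 2.0`` would only change the Python name
and append nothing, which is the same reason this page recommends ``x[:] = ...`` for
register stores.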