Shader Authoring and Dispatch¶
vkdispatch lets you write compute logic in Python syntax and compile it to GLSL at runtime. This page covers shader launch patterns and the key semantics of vkdispatch’s runtime shader generation model.
Examples below omit `vd.initialize()` and `vd.make_context()` because vkdispatch
creates them automatically on first runtime use. That default path is intentional:
generated shaders are specialized against the current machine/runtime unless you
explicitly choose dummy-mode codegen.
Runtime Generation Model¶
`@vd.shader` executes your Python function with tracing objects and emits shader code
as each operation runs. In practice:

- vkdispatch inspects type-annotated arguments and creates shader variables.
- Arithmetic, indexing, swizzles, and assignment append GLSL statements.
- The generated source is compiled into a compute plan and then dispatched.
This differs from AST- or IR-based compilers: generation is a forward streaming model, so explicit register materialization and explicit shader control-flow helpers matter for both performance and correctness.
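As an analogy, the forward streaming model can be mimicked in a few lines of plain Python (this toy tracer is illustrative only and is not vkdispatch's implementation): each operation on a traced value appends one statement to a shared source listing, in the order Python executes it.

```python
# Toy analogy for forward streaming codegen: every operation on a traced
# value appends one statement to a growing source listing.
class TracedValue:
    def __init__(self, name, lines):
        self.name = name
        self.lines = lines  # shared statement list

    def __mul__(self, other):
        out = TracedValue(f"t{len(self.lines)}", self.lines)
        self.lines.append(f"float {out.name} = {self.name} * {other};")
        return out

lines = []
x = TracedValue("x", lines)
y = x * 2.0
z = y * 3.0
print(lines)
# Two statements were streamed out, one per operation, in execution order.
```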
Default Runtime-Coupled Generation¶
By default, vkdispatch generates shaders for the active runtime backend and uses that
runtime's limits when choosing implicit launch defaults such as `local_size`.
This is the normal mode for end-to-end execution:
- define the kernel with `@vd.shader`
- let vkdispatch auto-initialize, or call `vd.initialize(...)` yourself
- execute the shader, or inspect `get_src()` for the current machine
If you want controlled source generation without relying on the active runtime, use the dummy backend explicitly.
Imports and Type Annotations¶
Most shader examples use these imports:
```python
import vkdispatch as vd
import vkdispatch.codegen as vc
from vkdispatch.codegen.abbreviations import *
```

- `Buff[...]` is a shader buffer argument type.
- `Const[...]` is a uniform/constant argument type.
- Dtype aliases such as `f32`, `i32`, and `v2` come from abbreviations.
Basic In-Place Kernel¶
```python
import numpy as np
import vkdispatch as vd
import vkdispatch.codegen as vc
from vkdispatch.codegen.abbreviations import *

# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def add_scalar(buff: Buff[f32], bias: Const[f32]):
    tid = vc.global_invocation_id().x
    buff[tid] = buff[tid] + bias

arr = np.arange(32, dtype=np.float32)
buff = vd.asbuffer(arr)

add_scalar(buff, 1.5)

result = buff.read(0)
print(result[:4])  # [1.5 2.5 3.5 4.5]
```
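As a host-side sanity check, the kernel above is equivalent to adding the bias to every element (plain Python here, matching the printed NumPy result):

```python
# Host-side reference for add_scalar: add the bias to every element.
arr = [float(i) for i in range(32)]
expected = [v + 1.5 for v in arr]
print(expected[:4])  # [1.5, 2.5, 3.5, 4.5]
```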
Launch Configuration¶
Use one of these launch patterns:
String expression (evaluated from function argument names):

```python
# @vd.shader(exec_size=lambda args: args.in_buf.size)
@vd.shader("in_buf.size")
def kernel(in_buf: Buff[f32], out_buf: Buff[f32]): ...
```

Fixed total dispatch size:

```python
@vd.shader(exec_size=(1024, 1, 1))
def kernel(...): ...
```

Dynamic size from call arguments:

```python
@vd.shader(exec_size=lambda args: args.in_buf.size)
def kernel(in_buf: Buff[f32], out_buf: Buff[f32]): ...
```

Explicit workgroups instead of `exec_size`:

```python
@vd.shader(workgroups=(64, 1, 1), local_size=(128, 1, 1))
def kernel(...): ...
```

`exec_size` and `workgroups` are mutually exclusive.
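For intuition, the two styles are related by ceiling division per axis: a total `exec_size` is covered by `ceil(exec_size / local_size)` workgroups. The helper below is illustrative arithmetic, not vkdispatch's internal code:

```python
import math

# Illustrative only: map a total exec_size to a workgroup count for a given
# local_size, using ceiling division per axis.
def workgroups_for(exec_size, local_size):
    return tuple(math.ceil(e / l) for e, l in zip(exec_size, local_size))

print(workgroups_for((1024, 1, 1), (128, 1, 1)))  # (8, 1, 1)
```

Note that when the total size is not a multiple of `local_size`, the last workgroup overshoots, which is why kernels often bounds-check the invocation index.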
The string form is often the most concise option for argument-dependent dispatch size.
Because it is evaluated dynamically, it is slightly more brittle than the lambda form.
When you want the declaration itself to be explicit and deterministic, prefer
`exec_size=lambda args: ...`.
You can also override launch parameters per call:
```python
# Reuse the same compiled shader with different dispatch sizes.
add_scalar(buff, 1.5, exec_size=buff.size)
```
Symbolic Expressions vs Mutable Registers¶
vkdispatch variables are symbolic by default. Reusing an expression in multiple places inlines that expression each time in generated code.
To materialize a value once and mutate it, convert it to a register with
`to_register()`:
```python
# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def register_example(buff: Buff[f32]):
    tid = vc.global_invocation_id().x

    # Expression variable: may be inlined at each use.
    expr = vc.sin(tid * 0.1)

    # Register variable: emitted once, then reused.
    cached = expr.to_register("cached")

    buff[tid] = cached * 2.0 + cached / 3.0
```
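The difference can be seen in miniature with plain string templates (an analogy for the generated GLSL, not actual vkdispatch output): the symbolic form duplicates the expression text at every use, while the register form emits it once and reuses the name.

```python
# Analogy: a reused symbolic expression is pasted at each use site, while a
# register is emitted once and referenced by name afterwards.
expr = "sin(tid * 0.1)"

inlined = f"buff[tid] = {expr} * 2.0 + {expr} / 3.0;"  # expression appears twice
registered = "\n".join([
    f"float cached = {expr};",                         # emitted once
    "buff[tid] = cached * 2.0 + cached / 3.0;",        # reused by name
])

print(inlined)
print(registered)
```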
Register Store Syntax ([:])¶
Python assignment rebinding (`x = ...`) changes the Python name, not the generated
shader register. To emit a GLSL assignment into an existing register, use full-slice
store syntax `x[:] = ...`.
```python
# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def register_store(buff: Buff[f32]):
    tid = vc.global_invocation_id().x

    value = buff[tid].to_register("value")
    value[:] = value * 0.5 + 1.0
    buff[tid] = value
```
Shader Control Flow vs Python Control Flow¶
Native Python control flow with vkdispatch variables is intentionally blocked:
```python
# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def bad_branch(buff: Buff[f32]):
    tid = vc.global_invocation_id().x
    if tid < 10:  # Raises ValueError: vkdispatch variables are not Python booleans.
        buff[tid] = 1.0
```
Use shader control-flow helpers so both branches are emitted into generated code:
```python
# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def threshold(buff: Buff[f32], cutoff: Const[f32]):
    tid = vc.global_invocation_id().x

    vc.if_statement(buff[tid] > cutoff)
    buff[tid] = 1.0
    vc.else_statement()
    buff[tid] = 0.0
    vc.end()
```
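The reason the Python `if` fails is ordinary Python semantics: a traced comparison returns a symbolic object, and `if` forces it through `__bool__`. A toy sketch (not vkdispatch internals) of how such a guard works:

```python
# Toy sketch: a symbolic comparison whose truthiness is deliberately blocked.
class SymbolicBool:
    def __init__(self, expr):
        self.expr = expr

    def __bool__(self):
        raise ValueError(
            "vkdispatch variables are not Python booleans; "
            "use vc.if_statement(...) instead"
        )

cond = SymbolicBool("tid < 10")
try:
    if cond:  # `if` calls __bool__, which raises.
        pass
except ValueError as err:
    print(err)
```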
Generation-Time Specialization (Meta-Programming)¶
Because kernel bodies execute as normal Python during generation, Python loops and conditionals are useful for specialization and unrolling.
```python
def make_unrolled_sum(unroll: int):
    # @vd.shader(exec_size=lambda args: args.dst.size)
    @vd.shader("dst.size")
    def unrolled_sum(src: Buff[f32], dst: Buff[f32]):
        tid = vc.global_invocation_id().x
        base = (tid * unroll).to_register("base")
        acc = vc.new_float_register(0.0)

        # Unrolled at generation time.
        for i in range(unroll):
            acc += src[base + i]

        dst[tid] = acc

    return unrolled_sum

sum4 = make_unrolled_sum(4)
sum8 = make_unrolled_sum(8)
# sum4 and sum8 compile to different shaders with different unrolled bodies.
```
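The same trick works on any source-building function. A plain-Python sketch (illustrative, producing GLSL-like text rather than real vkdispatch output) shows why each factory call bakes a different fixed body:

```python
# The Python `for` loop runs while the source is being built, so the unroll
# factor is frozen into the emitted text.
def make_unrolled_src(unroll):
    body = [f"acc += src[base + {i}];" for i in range(unroll)]
    return "\n".join(body)

print(make_unrolled_src(2))
# acc += src[base + 0];
# acc += src[base + 1];
```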
Mapping Functions¶
Mapping functions are reusable typed snippets (often used with reductions and FFT I/O).
```python
@vd.map
def square_value(x: Buff[f32]) -> f32:
    idx = vd.reduce.mapped_io_index()
    return x[idx] * x[idx]
```
You can pass mapping functions into APIs that accept `mapping_function`,
`input_map`, or `output_map` arguments.
Inspecting Generated Shader Source¶
`get_src()` returns the generated source for the currently selected runtime/codegen
configuration. In the default mode, that means the generated shader is tied to the
current machine/runtime by design.
For explicit codegen-only workflows, initialize the dummy backend first and select the output backend you want:
```python
import vkdispatch as vd
import vkdispatch.codegen as vc
from vkdispatch.codegen.abbreviations import Buff, Const, f32

vd.initialize(backend="dummy")
vd.set_dummy_context_params(
    subgroup_size=32,
    max_workgroup_size=(128, 1, 1),
    max_workgroup_count=(65535, 65535, 65535),
)
vc.set_codegen_backend("cuda")

# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def add_scalar(buff: Buff[f32], bias: Const[f32]):
    tid = vc.global_invocation_id().x
    buff[tid] = buff[tid] + bias

print(add_scalar.get_src(line_numbers=True))
```
A built shader can be printed for debugging:
```python
print(add_scalar)
```
This prints GLSL-like generated source with line numbers, which is useful when debugging type issues or unsupported expressions.
Common Notes¶
- All shader parameters must be type annotated.
- Buffer/image arguments must use codegen types (for example, `Buff[f32]`, `Img2[f32]`).
- If you need batched submissions, prefer Command Graph Recording.
Shader API Reference¶
See the Full Python API Reference for complete API details on:
- `vkdispatch.shader`
- `vkdispatch.map`
- `vkdispatch.ShaderFunction`
- `vkdispatch.MappingFunction`