Mojo module
trace_buf
Zero-overhead per-CTA trace buffer for GPU kernel instrumentation.
A TraceBuf is a kernel-arg–shaped handle to a per-CTA timestamp slot
buffer. Implementations:
- NullTrace is zero-sized; passing it as a kernel argument adds no bytes to the kernel ABI. Its store is pass, so the body of the surrounding comptime if enable_trace: strips entirely at compile time.
- GmemTrace wraps a single UnsafePointer[UInt64] to a buffer sized for num_blocks * events_per_block slots and records timestamps via PTX globaltimer (lowered from global_perf_counter_ns).
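
The split between a no-op implementation and a recording one can be illustrated with a minimal host-side sketch. Everything here is a hypothetical stand-in: TraceSink, NullSink, PrintSink, and record are invented for illustration and are not the module's definitions, and the store signature (a slot index plus a UInt64 value) is only inferred from the usage pattern on this page.

```mojo
# Hypothetical stand-ins; NOT the trace_buf module's real definitions.
trait TraceSink:
    fn store(self, idx: Int, val: UInt64):
        ...


struct NullSink(TraceSink):
    fn __init__(out self):
        pass

    fn store(self, idx: Int, val: UInt64):
        pass  # no-op, mirroring the NullTrace description


struct PrintSink(TraceSink):
    fn __init__(out self):
        pass

    fn store(self, idx: Int, val: UInt64):
        # Stands in for a real write; just shows which slot was targeted.
        print("slot", idx, "=", val)


fn record[T: TraceSink](sink: T, idx: Int, val: UInt64):
    # The caller is generic over the sink type, so the choice of
    # implementation is resolved at compile time.
    sink.store(idx, val)


def main():
    record(NullSink(), 0, 123)   # does nothing
    record(PrintSink(), 0, 123)  # prints the slot and value
```

Because the sink type is a compile-time parameter, the same call site serves both the tracing and the non-tracing build without a runtime branch.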
Usage pattern (see nn/gemv_partial_norm.mojo and the SM100 grouped
SwiGLU+NVFP4 kernel):

    fn my_kernel[..., enable_trace: Bool = False, TraceBufT: TraceBuf](
        ..., trace_buf: TraceBufT
    ):
        comptime if enable_trace:
            if thread_idx.x == 0:
                trace_buf.store(
                    Int(block_idx.x) * EVENTS_PER_BLOCK + role,
                    UInt64(global_perf_counter_ns()),
                )

When enable_trace=False (default), every comptime if block strips
to nothing and the resulting PTX is byte-identical to a build with no
trace plumbing at all.
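
The slot arithmetic in the store call can be worked through on the host. This is a sketch under stated assumptions: EVENTS_PER_BLOCK = 4 and a grid of 8 blocks are made-up values, and slot_index is a hypothetical helper that only reproduces the index expression from the usage pattern.

```mojo
comptime EVENTS_PER_BLOCK: Int = 4  # hypothetical; the real value is kernel-specific


fn slot_index(block: Int, role: Int) -> Int:
    # Same arithmetic as the first argument of trace_buf.store(...) above.
    return block * EVENTS_PER_BLOCK + role


def main():
    var num_blocks = 8  # hypothetical grid size
    # Total slots the buffer must hold: num_blocks * events_per_block.
    print(num_blocks * EVENTS_PER_BLOCK)  # 32
    # Block 3, role 2 lands in slot 3 * 4 + 2.
    var slot = slot_index(3, 2)
    print(slot)  # 14
    # Invert the mapping when decoding the buffer on the host.
    print(slot // EVENTS_PER_BLOCK, slot % EVENTS_PER_BLOCK)  # 3 2
```

Each block owns a contiguous run of EVENTS_PER_BLOCK slots, so blocks never write to each other's slots and a flat buffer can be decoded with one division and one remainder per entry.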
Structs
- GmemTrace: HBM-backed trace buffer.
- NullTrace: Zero-sized no-op trace buffer.
Traits
- TraceBuf: Trace-buffer interface. Implementations: NullTrace, GmemTrace.