
Mojo module

matmul_1x1x1_conv3d

Direct _matmul_gpu dispatch for 1x1x1 3D convolutions.

A 1x1x1 conv (Q=R=S=1) with stride=1, dilation=1, groups=1, and no padding (pad=0) is algebraically identical to a single matmul:

output[b, d, h, w, f] = Σ_c input[b, d, h, w, c] * filter[0, 0, 0, c, f]
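The equivalence above can be checked numerically with a small pure-Python sketch. This is an illustration only, not the Mojo implementation: the function names `conv1x1x1_direct` and `matmul_flat` are hypothetical, buffers are flat row-major lists, and the filter is assumed to be in QRSCF layout with Q=R=S=1, so it is read as a [C, F] matrix.

```python
def conv1x1x1_direct(x, w, B, D, H, W, C, F):
    """Evaluate the 1x1x1 conv formula with explicit 5D NDHWC indexing.

    x is a flat NDHWC buffer, w is a flat [C, F] buffer.
    """
    out = [0] * (B * D * H * W * F)
    for b in range(B):
        for d in range(D):
            for h in range(H):
                for wi in range(W):
                    # Row-major offset of pixel (b, d, h, wi), in units of pixels.
                    pix = ((b * D + d) * H + h) * W + wi
                    for f in range(F):
                        acc = 0
                        for c in range(C):
                            acc += x[pix * C + c] * w[c * F + f]
                        out[pix * F + f] = acc
    return out


def matmul_flat(a, b, M, K, N):
    """Plain [M, K] x [K, N] matmul on flat row-major buffers."""
    out = [0] * (M * N)
    for m in range(M):
        for n in range(N):
            out[m * N + n] = sum(a[m * K + k] * b[k * N + n] for k in range(K))
    return out


if __name__ == "__main__":
    B, D, H, W, C, F = 2, 3, 2, 2, 4, 5
    x = [(i * 7) % 11 - 5 for i in range(B * D * H * W * C)]
    w = [(i * 3) % 5 - 2 for i in range(C * F)]
    # The same input buffer is reinterpreted as [M, C] with M = B*D*H*W.
    same = conv1x1x1_direct(x, w, B, D, H, W, C, F) == matmul_flat(x, w, B * D * H * W, C, F)
    print("conv == matmul:", same)
```

Integer inputs are used so the comparison is exact; with floating-point data the two paths may differ by rounding, depending on accumulation order.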

NDHWC input is already C-innermost contiguous, so we can view it as [M, C_in] with M = B × D × H × W on the same pointer. Filter FCQRS/QRSCF with Q=R=S=1 reduces to [F, C] or [C, F] respectively, so no transpose kernel is needed. Output NDHWC collapses to [M, F]. No scratch allocation is required, and the epilogue unflattens (m, f) -> (b, d, h, w, f) in one call to the 5D lambda.
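The index arithmetic behind these views can be sketched in Python as follows. The helper names are hypothetical and the code assumes dense row-major storage; it only illustrates why the filter needs no transpose and how a flattened row index maps back to 5D coordinates, and it does not reproduce the actual epilogue lambda.

```python
def fcqrs_offset(f, c, q, r, s, C, Q, R, S):
    """Row-major offset into a dense FCQRS filter."""
    return (((f * C + c) * Q + q) * R + r) * S + s


def qrscf_offset(q, r, s, c, f, R, S, C, F):
    """Row-major offset into a dense QRSCF filter."""
    return (((q * R + r) * S + s) * C + c) * F + f


def unflatten_m(m, D, H, W):
    """Map a flattened row index m in [0, B*D*H*W) back to (b, d, h, w)."""
    m, w = divmod(m, W)
    m, h = divmod(m, H)
    b, d = divmod(m, D)
    return b, d, h, w


if __name__ == "__main__":
    C, F = 4, 5
    # With Q=R=S=1 the spatial indices are all zero, so the offsets
    # degenerate to plain 2D row-major offsets on the same buffer.
    print(fcqrs_offset(2, 3, 0, 0, 0, C, 1, 1, 1) == 2 * C + 3)  # [F, C] view
    print(qrscf_offset(0, 0, 0, 3, 2, 1, 1, C, F) == 3 * F + 2)  # [C, F] view
    print(unflatten_m(17, D=3, H=2, W=2))
```

Because the offsets collapse to `f * C + c` and `c * F + f`, either layout can be handed to a matmul directly, as a [F, C] or a [C, F] operand, without moving data.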

Covers every 1x1x1 case in the WAN VAE (post_quant_conv, per-block conv_shortcut). Used in conv_gpu's 5D arm as the first branch in the QRSCF dispatch chain, before dispatch_im2col_matmul_conv3d.
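As a rough illustration of the conditions under which such a fast path applies, the preconditions listed earlier (Q=R=S=1, stride=1, dilation=1, groups=1, no padding) can be written as a single predicate. The function name and argument shapes below are assumptions made for this sketch; the actual check inside conv_gpu is not shown in this page and may differ.

```python
def is_1x1x1_matmul_eligible(filter_qrs, stride, dilation, padding, groups):
    """Return True when a 3D conv reduces to a single matmul.

    Hypothetical helper: filter_qrs, stride, and dilation are 3-element
    sequences; padding is any sequence of per-side pad amounts.
    """
    return (
        tuple(filter_qrs) == (1, 1, 1)
        and tuple(stride) == (1, 1, 1)
        and tuple(dilation) == (1, 1, 1)
        and all(p == 0 for p in padding)
        and groups == 1
    )


if __name__ == "__main__":
    # A pointwise 3D conv qualifies.
    print(is_1x1x1_matmul_eligible((1, 1, 1), (1, 1, 1), (1, 1, 1), (0, 0, 0), 1))
    # A strided 1x1x1 conv does not: it would skip input rows.
    print(is_1x1x1_matmul_eligible((1, 1, 1), (2, 1, 1), (1, 1, 1), (0, 0, 0), 1))
```

A dispatcher built this way would try the predicate first and fall through to a general path, such as an im2col-based one, when it returns False.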

Functions