Mojo module
mha_mask
Mask functor application for the MhaPrefillV2 att_block.
MaskApplier[mask_t, Q_BLOCK_SIZE, KV_BLOCK_SIZE] (below) bundles the
mask functor with the comptime block sizes and exposes a single
apply() entry point that comptime-dispatches over the mask type:
- NullMask: comptime-elided no-op. The mask trait would always report NO_MASK, so the entry is statically dead.
- CausalMask: 16-wide SIMD fast path (one v_cmp + one v_cndmask per stripe), generalized for start_pos so the causal cap moves with the cache start position. Gated on the runtime q_start_pos < kv_end_pos shortcut, so fully unmasked tiles bypass the work entirely.
- Anything else (SlidingWindowCausalMask, ChunkedCausalMask, MaterializedMask, fused combinations): runtime mask_functor.status(...) dispatch over NO_MASK (return), FULL_MASK (fill with -inf), and PARTIAL (per-element loop calling mask_functor.mask(coord, score) over the 16 fragment slots).
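The three-way dispatch in apply() can be modeled in Python to make the control flow concrete. This is a rough sketch, not the Mojo implementation: comptime dispatch is emulated with isinstance checks, a tile is a plain nested list, masks receive (q, k) positions instead of the kernel's coordinate type, and start_pos is stored on the mask object. The TileStatus enum, the status() signature, the window semantics of SlidingWindowCausalMask, and the causal convention (key k is visible to query q when k <= q + start_pos) are all assumptions chosen for illustration.

```python
import math
from enum import Enum

NEG_INF = -math.inf


class TileStatus(Enum):
    # Hypothetical mirror of the NO_MASK / FULL_MASK / PARTIAL tile states.
    NO_MASK = 0
    FULL_MASK = 1
    PARTIAL = 2


class NullMask:
    """Masks nothing; apply() returns before touching the tile."""


class CausalMask:
    """Key k is visible to query q iff k <= q + start_pos (assumed convention)."""

    def __init__(self, start_pos=0):
        self.start_pos = start_pos  # assumption: stored on the mask for the sketch


class SlidingWindowCausalMask:
    """Hypothetical generic mask: causal, limited to the last `window` keys."""

    def __init__(self, window, start_pos=0):
        self.window = window
        self.start_pos = start_pos

    def mask(self, q, k, score):
        qa = q + self.start_pos
        return score if qa - self.window < k <= qa else NEG_INF

    def status(self, q0, k0, nq, nk):
        # Brute-force classification of the tile; a real mask decides analytically.
        keep = [self.mask(q, k, 0.0) == 0.0
                for q in range(q0, q0 + nq) for k in range(k0, k0 + nk)]
        if all(keep):
            return TileStatus.NO_MASK
        if not any(keep):
            return TileStatus.FULL_MASK
        return TileStatus.PARTIAL


class MaskApplier:
    """Bundles a mask with the tile sizes; apply() masks a score tile in place."""

    def __init__(self, mask, q_block, kv_block):
        self.mask = mask
        self.q_block = q_block
        self.kv_block = kv_block

    def apply(self, scores, q_tile_start, kv_tile_start):
        m = self.mask
        if isinstance(m, NullMask):
            return  # statically dead in the real kernel
        if isinstance(m, CausalMask):
            q_start_pos = q_tile_start + m.start_pos
            kv_end_pos = kv_tile_start + self.kv_block
            if q_start_pos >= kv_end_pos:
                return  # every key in the tile is visible to every query row
            for r in range(self.q_block):
                cap = q_start_pos + r  # last visible key for this query row
                for c in range(self.kv_block):
                    if kv_tile_start + c > cap:
                        scores[r][c] = NEG_INF
            return
        # Generic path: classify the tile first, then fill or loop per element.
        status = m.status(q_tile_start, kv_tile_start, self.q_block, self.kv_block)
        if status is TileStatus.NO_MASK:
            return
        for r in range(self.q_block):
            for c in range(self.kv_block):
                if status is TileStatus.FULL_MASK:
                    scores[r][c] = NEG_INF
                else:
                    scores[r][c] = m.mask(q_tile_start + r, kv_tile_start + c, scores[r][c])


# Usage: causal mask on a 2x4 tile at the origin with no cache offset.
scores = [[1.0] * 4 for _ in range(2)]
MaskApplier(CausalMask(), q_block=2, kv_block=4).apply(scores, 0, 0)
print(scores)  # [[1.0, -inf, -inf, -inf], [1.0, 1.0, -inf, -inf]]
```

The sketch only mirrors the branching. In the kernel the NullMask branch disappears at compile time, and the causal and PARTIAL paths presumably operate on each lane's 16 accumulator slots rather than on a whole tile held in one place.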
Per-element row-within-tile mapping comes from the
v_mfma_f32_32x32x16_bf16 accumulator fragment geometry; see
MhaMmaOp.ACC_ROW_OFFSETS_32x32.
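The row-within-tile mapping can be illustrated with a Python model of a 32x32 accumulator tile spread across a 64-lane wavefront, which gives 16 f32 slots per lane. The layout used here (lane l owns column l % 32, and slot i holds row 8 * (i // 4) + 4 * (l // 32) + i % 4) is an assumption based on the commonly documented CDNA 32x32 MFMA output layout, not a copy of MhaMmaOp.ACC_ROW_OFFSETS_32x32; whether tile rows index queries or keys in this kernel should also be checked against the source. ACC_ROW_OFFSETS, fragment_coord, and mask_fragment_causal are hypothetical names.

```python
# Hypothetical model of a 32x32 MFMA accumulator tile on a 64-lane wavefront.
# Assumed layout (commonly documented CDNA 32x32 output layout, not taken
# from MhaMmaOp): lane l owns column l % 32; slot i holds row
# 8 * (i // 4) + 4 * (l // 32) + i % 4.
WAVE_SIZE = 64
TILE = 32
SLOTS = TILE * TILE // WAVE_SIZE  # 16 f32 accumulator slots per lane

# Hypothetical stand-in for MhaMmaOp.ACC_ROW_OFFSETS_32x32: the row offset of
# each slot before the lane's half-wave base (0 or 4) is added.
ACC_ROW_OFFSETS = [8 * (i // 4) + i % 4 for i in range(SLOTS)]


def fragment_coord(lane, slot):
    """Return (row, col) inside the 32x32 tile for one accumulator slot."""
    row = ACC_ROW_OFFSETS[slot] + 4 * (lane // TILE)
    col = lane % TILE
    return row, col


def mask_fragment_causal(frag, lane, q_tile_start, kv_tile_start, start_pos=0):
    """Causally mask one lane's 16 slots in place.

    Assumes tile rows are query positions and columns are key positions.
    """
    for slot in range(SLOTS):
        row, col = fragment_coord(lane, slot)
        cap = q_tile_start + start_pos + row  # last key this query may see
        if kv_tile_start + col > cap:
            frag[slot] = float("-inf")


# Sanity check: the 64 lanes x 16 slots cover each tile element exactly once.
coords = {fragment_coord(l, s) for l in range(WAVE_SIZE) for s in range(SLOTS)}
print(len(coords))  # 1024
```

Under this assumed layout, all 16 slots of a lane share one key column, so a causal check compares a single key index against 16 per-slot row caps, which fits the 16-wide compare-and-select shape of the CausalMask fast path.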
Structs
- MaskApplier: Mask functor + comptime block-size bundle for MhaPrefillV2.