Mojo module
mla_components
MLA-prefill math components for AMD MI355X (gfx950).
This module is owned by MlaPrefillV2 and holds the MLA-prefill numeric
closure that mla_prefill_v2.mojo consumes: the MlaPrefillV2Core[config]
struct (trimmed to the methods and constants that _attend_exact and the
host launcher actually reference), the _MlaKDmaPair single-base K DMA
helper, and the module-level scheduling primitives (_sched_barrier_zero,
_s_barrier_raw, _s_setprio).
The MLA-prefill MATH (QK with nope d=128 + rope d=64; FlashAttention-2
online softmax via OnlineSoftmax; in-place FP8 P collapse; PV
accumulate; normalize + store; causal / null mask) lives here so the
inner loop can reuse the verified primitives while emitting its own
cluster cadence in mla_prefill_v2.mojo. The shared building blocks
(MhaMmaOp/MlaConfigV2 from mha_mma_op.mojo, the OnlineSoftmax
recurrence, the MaskApplier mask dispatch, the SubTileLoaderLDS_*
loaders from amd_tile_io.mojo) are imported in place; this module owns
only the MLA-prefill-specific assembly.
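The math sequence above (split nope + rope QK, online softmax, PV accumulate, normalize, causal / null mask) can be sketched for a single query row as follows. This is a minimal illustration in plain Python, not the Mojo kernel: it works on scalars in full precision rather than FP8 MMA tiles, it omits the FP8 P collapse entirely, and the function names, the tile size, and the 1/sqrt(d_nope + d_rope) score scale are all assumptions made for the example.

```python
import math


def attend_tiled(q_nope, q_rope, k_nope, k_rope, v, q_pos, tile=4, causal=True):
    """Hypothetical sketch: one query row attended over K/V in tiles,
    using the FlashAttention-2 style online-softmax recurrence."""
    # Assumed scale; the real kernel's scale is not stated in the source.
    scale = 1.0 / math.sqrt(len(q_nope) + len(q_rope))
    m = -math.inf             # running row maximum
    l = 0.0                   # running sum of exp(score - m)
    acc = [0.0] * len(v[0])   # unnormalized PV accumulator

    for start in range(0, len(v), tile):
        idx = range(start, min(start + tile, len(v)))
        scores = []
        for j in idx:
            # QK score is the sum of the nope and rope dot products.
            s = sum(a * b for a, b in zip(q_nope, k_nope[j]))
            s += sum(a * b for a, b in zip(q_rope, k_rope[j]))
            s *= scale
            if causal and j > q_pos:
                s = -math.inf  # causal mask; causal=False is the null mask
            scores.append(s)

        m_new = max(m, max(scores))
        if m_new == -math.inf:
            continue  # every key seen so far is masked; nothing to add

        correction = math.exp(m - m_new)  # rescales earlier tiles (0.0 on first hit)
        p = [math.exp(s - m_new) for s in scores]
        l = l * correction + sum(p)
        acc = [
            a * correction + sum(pj * v[j][d] for pj, j in zip(p, idx))
            for d, a in enumerate(acc)
        ]
        m = m_new

    # Normalize once at the end, then "store" by returning the row.
    return [a / l for a in acc]


def attend_reference(q_nope, q_rope, k_nope, k_rope, v, q_pos, causal=True):
    """Untiled softmax attention for the same row, used as a cross-check."""
    scale = 1.0 / math.sqrt(len(q_nope) + len(q_rope))
    scores = []
    for j in range(len(v)):
        s = sum(a * b for a, b in zip(q_nope, k_nope[j]))
        s += sum(a * b for a, b in zip(q_rope, k_rope[j]))
        s *= scale
        scores.append(-math.inf if causal and j > q_pos else s)
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [sum(wj * v[j][d] for j, wj in enumerate(w)) / total
            for d in range(len(v[0]))]
```

The tiled and untiled versions should agree up to floating-point rounding for any tile size, which is the property that lets the inner loop process K/V in chunks while keeping only the running maximum, running sum, and accumulator per row.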
MlaPrefillV2Core's _FP32_SOFTMAX_SCORES gate is on by default for the
FP8 / KV>=128 / 32x32x64 shape, so every reused primitive exercises the
same codegen that MlaPrefillV2 ships.
Structs
- MlaPrefillV2Core: 8-warp MLA forward kernel parameterized by MlaConfigV2.