Mojo package
sm100
NVIDIA SM100 (Blackwell) attention kernels.
Covers MHA (flash-attention v4) and MLA (multi-head latent attention) for both prefill and decode, including FP8 and block-scaled quantization variants.
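The package's internals are not shown on this page, but as a rough illustration of what "per-token scale" means in the FP8 variants, here is a minimal NumPy sketch. The constant and function names are illustrative assumptions, not APIs from this package, and real kernels additionally round values onto the FP8 mantissa grid:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the float8 e4m3 format

def quantize_per_token(x: np.ndarray):
    """Scale each token (row) into FP8 range with its own scale factor.

    Illustrative sketch only: it models the per-token scaling step, not
    the FP8 rounding itself. Returns (scaled values, per-token scales),
    where x_q * scale recovers x wherever nothing was clipped.
    """
    amax = np.abs(x).max(axis=-1, keepdims=True)           # per-token abs-max
    scale = np.where(amax > 0.0, amax / FP8_E4M3_MAX, 1.0)  # avoid div by zero
    x_q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return x_q, scale
```

Per-token (rather than per-tensor) scales keep one outlier token from destroying the quantization precision of every other token in the batch.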
Modules

- attention: FA4 (Flash Attention 4) configuration for SM100 (Blackwell) kernels.
- attention_utils: Shared SM100 attention primitives used by both MHA and MLA kernels.
- correction_warp: Correction warp group logic for FA4 (SM100 Flash Attention).
- dispatch
- kernel
- load_warp: TMA load warp logic for FA4 (SM100 Flash Attention).
- mha_1q
- mla_decode_combine: MLA decode split-K combine kernel for SM100 (B200).
- mla_decode_dispatch
- mla_decode_kv_bf16
- mla_decode_kv_fp8
- mla_decode_qkv_fp8: Native FP8 MLA decode kernel for SM100 (B200).
- mla_decode_qkv_fp8_per_token_scale_rope_aware: SnapMLA FP8+BF16 MLA decode kernel for SM100 (B200).
- mla_decode_sparse
- mla_decode_sparse_kv_fp8
- mla_decode_utils
- mla_prefill
- mla_prefill_blockscale
- mla_prefill_generic
- mla_prefill_per_token_scale: Per-token-scale MLA prefill kernel.
- mla_prefill_utils
- mma_warp: MMA warp logic for FA4 (SM100 Flash Attention).
- smem: Shared memory layout for SM100 attention kernels.
- softmax_warp: Softmax warp group logic for FA4 (SM100 Flash Attention).
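The "split-K combine" in mla_decode_combine refers to a standard flash-attention decode technique: each split computes attention over a disjoint slice of the KV sequence along with the log-sum-exp (LSE) of its logits, and the partial outputs are then merged with an LSE-weighted average. A minimal NumPy sketch of that merge step (function names are illustrative, not this package's API):

```python
import numpy as np

def combine_splits(partial_out: np.ndarray, partial_lse: np.ndarray) -> np.ndarray:
    """Merge split-K partial attention outputs into the full result.

    partial_out: (num_splits, d) softmax-weighted outputs, each computed
                 over a disjoint slice of the KV sequence.
    partial_lse: (num_splits,) log-sum-exp of the attention logits per slice.
    """
    m = partial_lse.max()                      # shared max for numerical stability
    w = np.exp(partial_lse - m)                # each split's share of total softmax mass
    return (w[:, None] * partial_out).sum(axis=0) / w.sum()
```

Because each split's output is already normalized by its own softmax denominator, reweighting by exp(lse_i - max_lse) and renormalizing reproduces the softmax over the whole sequence exactly.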