
Mojo module

attention

Registers attention graph ops (MHA, MLA, fused QKV) and dispatches them to the nn.attention kernels.

`comptime` values

`logger`

comptime logger = Logger(stdout, prefix=String(""), source_location=False)

Structs

FlashAttentionGPU: Registers the mo.mha.no_cache graph op with the graph compiler.
MaskedFlashAttentionGPU: Registers the mo.composite.masked_flash_attention_gpu graph op with the graph compiler.
MLAIndexerRaggedFloat8Paged: Registers the mo.mla.indexer.ragged.float8.paged graph op with the graph compiler.
NoMaskFlashAttentionCPU: Registers the mo.composite.no_mask_flash_attention_cpu graph op with the graph compiler.
PaddedFlashAttentionGPU: Registers the mo.mha.padded.no_cache graph op with the graph compiler.
RaggedFlashAttentionGPU: Registers the mo.mha.ragged.no_cache graph op with the graph compiler.
Struct_cross_attention_ragged_paged: Registers the mo.cross_attention.ragged.paged graph op with the graph compiler.
Struct_fused_qk_rope_padded_paged: Registers the mo.fused_qk_rope.padded.paged graph op with the graph compiler.
Struct_fused_qk_rope_ragged_paged: Registers the mo.fused_qk_rope.ragged.paged graph op with the graph compiler.
Struct_fused_qk_rope_ragged_paged_with_position_id: Registers the mo.fused_qk_rope.ragged.paged.with_position_id graph op with the graph compiler.
Struct_fused_qkv_index_matmul_padded_ragged:
Struct_fused_qkv_index_matmul_padded_ragged_scale_mxfp8: Registers the mo.fused_qkv_index_matmul.ragged.paged.scale.mxfp8 graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_paged: Registers the mo.fused_qkv_matmul.padded.paged graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged: Registers the mo.fused_qkv_matmul.ragged.paged graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_bias: Registers the mo.fused_qkv_matmul.ragged.paged.bias graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_bias_quantized: Registers the mo.fused_qkv_matmul.ragged.paged.bias.quantized graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_quantized: Registers the mo.fused_qkv_matmul.ragged.paged.quantized graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_scale: Registers the mo.fused_qkv_matmul.ragged.paged.scale graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_scale_bias: Registers the mo.fused_qkv_matmul.ragged.paged.scale.bias graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_scale_float4: Registers the mo.fused_qkv_matmul.ragged.paged.scale.float4 graph op with the graph compiler.
Struct_fused_qkv_matmul_padded_ragged_scale_mxfp8: Registers the mo.fused_qkv_matmul.ragged.paged.scale.mxfp8 graph op with the graph compiler.
Struct_mha_decode_num_partitions: Registers the mo.mha.decode.get_num_partitions graph op with the graph compiler.
Struct_mha_padded_paged: Registers the mo.mha.padded.paged graph op with the graph compiler.
Struct_mha_ragged_paged_scalar_args: Registers the mo.mha.ragged.paged graph op with the graph compiler.
Struct_mha_ragged_paged_sink_weights_scalar_args: Registers the mo.mha.ragged.paged.sink_weights graph op with the graph compiler.
Struct_mla_compute_dispatch_args_scalar: Registers the mo.mla.compute_dispatch_args.scalar graph op with the graph compiler.
Struct_mla_decode_graph_bf16_paged: Registers the mo.mla.graph.decode.paged graph op with the graph compiler.
Struct_mla_decode_graph_bf16_paged_sparse: Registers the mo.mla.graph.decode.paged.sparse graph op with the graph compiler.
Struct_mla_decode_graph_paged_fp8: Registers the mo.mla.graph.decode.paged.fp8 graph op with the graph compiler.
Struct_mla_decode_graph_paged_fp8_sparse: Registers the mo.mla.graph.decode.paged.fp8.sparse graph op with the graph compiler.
Struct_mla_decode_ragged_paged: Registers the mo.mla.decode.ragged.paged graph op with the graph compiler.
Struct_mla_decode_ragged_paged_scaled: Registers the mo.mla.decode.ragged.paged.scaled graph op with the graph compiler.
Struct_mla_decompress_k_cache_ragged_paged: Registers the mo.mla.decompress.k.cache.ragged.paged graph op with the graph compiler.
Struct_mla_prefill_graph_bf16_paged: Registers the mo.mla.graph.prefill.paged graph op with the graph compiler.
Struct_mla_prefill_graph_decode_bf16_paged: Registers the mo.mla.graph.prefill.decode.paged graph op with the graph compiler.
Struct_mla_prefill_graph_decode_bf16_paged_quantized: Registers the mo.mla.graph.prefill.decode.paged.quantized graph op with the graph compiler.
Struct_mla_prefill_graph_decode_paged_fp8: Registers the mo.mla.graph.prefill.decode.paged.fp8 graph op with the graph compiler.
Struct_mla_prefill_graph_decode_paged_fp8_sparse: Registers the mo.mla.graph.prefill.decode.paged.fp8.sparse graph op with the graph compiler.
Struct_mla_prefill_graph_decode_paged_sparse:
Struct_mla_prefill_graph_paged: Registers the mo.mla.graph.prefill.paged.fp8 graph op with the graph compiler.
Struct_mla_prefill_ragged_paged: Registers the mo.mla.prefill.ragged.paged graph op with the graph compiler.
Struct_mla_prefill_ragged_plan: Registers the mo.mla.prefill.ragged.plan graph op with the graph compiler.
Struct_mla_prefill_sparse_paged: Registers the mo.mla.prefill.sparse.paged graph op with the graph compiler.
Struct_mla_prefill_sparse_paged_fp8: Registers the mo.mla.prefill.sparse.paged.fp8 graph op with the graph compiler.
WithMaskFlashAttentionCPU: Registers the mo.composite.masked_flash_attention_cpu graph op with the graph compiler.
WithMaskFlashAttentionSplitKVCPU: Registers the with_mask_flash_attention_split_kv_cpu graph op with the graph compiler.

Functions

with_mask_flash_attention_split_kv_cpu_shape: Computes the output shape for the with_mask_flash_attention_split_kv_cpu graph op.
