Mojo module

mla

`comptime` values

`AMD_MLA_DECODE_FOLD_M_MAX`

comptime AMD_MLA_DECODE_FOLD_M_MAX = 128

`AMD_MLA_DECODE_FOLD_MAX_NUM_HEADS`

comptime AMD_MLA_DECODE_FOLD_MAX_NUM_HEADS = 16

`MLA_DECODE_MAX_SEQ_LEN`

comptime MLA_DECODE_MAX_SEQ_LEN = 8

Functions

copy_fn_unified:
flare_mla_decoding: MLA decoding kernel, called only from the optimized compute graph.
flare_mla_decoding_dispatch:
flare_mla_prefill: MLA prefill kernel, called only from the optimized compute graph. Supports only ragged Q/K/V inputs.
flare_mla_prefill_dispatch:
mla_decode_max_seq_len: Max query tokens (S) the MLA decode branch can fold for this config.
mla_decoding:
mla_decoding_single_batch: Flash attention v2 algorithm.
mla_prefill:
mla_prefill_plan: Calls a GPU kernel that plans how to process a batch of variable-length sequences using a fixed-size buffer.
mla_prefill_plan_kernel:
mla_prefill_single_batch: MLA for encoding where seqlen > 1.
mla_splitk_reduce:
q_block_idx:
set_buffer_lengths_to_zero:
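
The planning idea behind `mla_prefill_plan` (processing a batch of variable-length sequences through a fixed-size buffer) can be illustrated with a small host-side sketch. This is plain Python rather than Mojo, and it is not the actual kernel: `plan_chunks`, its parameters, and the `(seq_idx, start, length)` layout are hypothetical names chosen for illustration, assuming the buffer capacity is measured in tokens and that a long sequence may be split across passes.

```python
from typing import List, Tuple

# Hypothetical illustration only; not the real mla_prefill_plan signature.
Slice = Tuple[int, int, int]  # (sequence index, start token, token count)


def plan_chunks(seq_lens: List[int], buffer_size: int) -> List[List[Slice]]:
    """Greedily pack ragged sequences into passes of at most buffer_size tokens."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")

    passes: List[List[Slice]] = []
    current: List[Slice] = []
    free = buffer_size  # tokens still available in the current pass

    for seq_idx, seq_len in enumerate(seq_lens):
        start = 0
        while start < seq_len:
            # Take as many tokens as fit; the rest spills into the next pass.
            take = min(seq_len - start, free)
            current.append((seq_idx, start, take))
            start += take
            free -= take
            if free == 0:
                # Buffer is full: close this pass and start a fresh one.
                passes.append(current)
                current = []
                free = buffer_size

    if current:
        passes.append(current)
    return passes


if __name__ == "__main__":
    for i, p in enumerate(plan_chunks([3, 5, 2], buffer_size=4)):
        print(i, p)
```

In this sketch, each pass is a list of slices whose lengths sum to at most `buffer_size`, so one pass corresponds to one fill of the buffer. How the real kernel lays out its plan is not described on this page.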
