Mojo struct

Struct_msa_indexer_ragged_paged

struct Struct_msa_indexer_ragged_paged

Implemented traits

AnyType, ImplicitlyDeletable

Methods

execute

static def execute[*, num_index_heads: Int, idx_head_dim: Int, block_size: Int, topk: Int, init_blocks: Int, local_blocks: Int](out_idxs: ManagedTensorSlice[Output, static_spec=out_idxs.static_spec], q: ManagedTensorSlice[Input, static_spec=q.static_spec], input_row_offsets: ManagedTensorSlice[Input, static_spec=input_row_offsets.static_spec], prefix_lens: ManagedTensorSlice[Input, static_spec=prefix_lens.static_spec], k_blocks: ManagedTensorSlice[MutableInput, static_spec=k_blocks.static_spec], k_cache_lengths: ManagedTensorSlice[Input, static_spec=k_cache_lengths.static_spec], k_lookup_table: ManagedTensorSlice[Input, static_spec=k_lookup_table.static_spec], k_max_lengths: ManagedTensorSlice[Input, static_spec=k_max_lengths.static_spec], layer_idx: UInt32, scale: Float32, ctx: DeviceContext)

Select top-k key blocks per (index head, query token).

Dispatches to the decode kernel when kv_collection.max_seq_length == 1 (one new index-K token per sequence) and to the prefill kernel otherwise.

Parameters:

  • num_index_heads (Int): Number of index (query) heads.
  • idx_head_dim (Int): Dimension of each index head.
  • block_size (Int): KV block size in tokens (equal to page_size).
  • topk (Int): Number of blocks to select per token.
  • init_blocks (Int): Number of leading blocks that are always kept (their scores are forced high).
  • local_blocks (Int): Number of trailing (local) blocks that are always kept (their scores are forced high).
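
The interaction between topk, init_blocks, and local_blocks can be illustrated with a small reference sketch in plain Python. This is not the Mojo kernel and makes several assumptions that this page does not confirm: the function name select_topk_blocks is hypothetical, per-block scores are taken as already computed for one (index head, query token) pair, "forced" is modeled by setting the score to +infinity, ties are broken by lower block index, and the result is returned in ascending block order.

```python
import math


def select_topk_blocks(block_scores, topk, init_blocks, local_blocks):
    """Hypothetical reference for top-k block selection with forced blocks.

    block_scores: one score per KV block for a single (head, token) pair.
    Returns the selected block indices in ascending order (an assumption;
    the real kernel's output ordering is not documented here).
    """
    num_blocks = len(block_scores)
    scores = list(block_scores)

    # Force the leading `init_blocks` blocks to win the selection.
    for i in range(min(init_blocks, num_blocks)):
        scores[i] = math.inf

    # Force the trailing `local_blocks` blocks to win the selection.
    for i in range(max(0, num_blocks - local_blocks), num_blocks):
        scores[i] = math.inf

    # Never select more blocks than exist.
    k = min(topk, num_blocks)

    # Rank by descending score, breaking ties by lower block index.
    ranked = sorted(range(num_blocks), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.05]
    print(select_topk_blocks(scores, topk=4, init_blocks=1, local_blocks=1))
```

With these inputs, blocks 0 and 5 are kept regardless of their low scores, and the two remaining slots go to the highest-scoring blocks, 1 and 3. If init_blocks plus local_blocks exceeds topk, this sketch keeps the leading blocks first; how the actual kernel resolves that case is not stated on this page.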

Args: