IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo function

generic_fused_qk_rope_bshd_paged

def generic_fused_qk_rope_bshd_paged[dtype: DType, //, *, interleaved: Bool, target: StringSlice[StaticConstantOrigin]](q_proj: TileTensor[dtype, address_space=q_proj.address_space, linear_idx_type=q_proj.linear_idx_type, element_size=q_proj.element_size], kv_collection: PagedKVCacheCollection, freqs_cis: TileTensor[dtype, address_space=freqs_cis.address_space, linear_idx_type=freqs_cis.linear_idx_type, element_size=freqs_cis.element_size], layer_idx: UInt32, valid_lengths: TileTensor[DType.uint32, address_space=valid_lengths.address_space, linear_idx_type=valid_lengths.linear_idx_type, element_size=valid_lengths.element_size], output: TileTensor[dtype, address_space=output.address_space, linear_idx_type=output.linear_idx_type, element_size=output.element_size], context: DeviceContext)

Performs a fused RoPE projection for Q and K with paged KV cache.

This is the paged equivalent of generic_fused_qk_rope_bshd_continuous_batch. It applies RoPE to both Q (returned) and K (in paged cache) to ensure proper dependency ordering after fused_qkv_padded_matmul.

Args: