Mojo function
generic_fused_qk_rope_bshd_paged
```mojo
generic_fused_qk_rope_bshd_paged[
    dtype: DType, //,
    *,
    interleaved: Bool,
    target: StringSlice[StaticConstantOrigin],
](
    q_proj: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    kv_collection: PagedKVCacheCollection[dtype_, kv_params_, page_size],
    freqs_cis: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    layer_idx: UInt32,
    valid_lengths: LayoutTensor[DType.uint32, Layout.row_major(-1), MutAnyOrigin],
    output: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    context: DeviceContextPtr = DeviceContextPtr(),
)
```
Performs a fused RoPE projection for Q and K with paged KV cache.
This is the paged equivalent of generic_fused_qk_rope_bshd_continuous_batch. It applies RoPE to both Q (written to output) and K (updated in place in the paged cache), ensuring proper dependency ordering after fused_qkv_padded_matmul.
Args:
- q_proj (LayoutTensor): Query projection tensor of shape [batch, seq_len, n_heads, head_dim].
- kv_collection (PagedKVCacheCollection): The paged KV cache collection.
- freqs_cis (LayoutTensor): Frequency tensor for RoPE of shape [max_seq_len, head_dim].
- layer_idx (UInt32): The layer index for accessing the correct cache.
- valid_lengths (LayoutTensor): Tensor of shape [batch] containing the valid length for each sequence. RoPE is only applied to positions within these lengths.
- output (LayoutTensor): Output tensor for Q with RoPE applied, same shape as q_proj.
- context (DeviceContextPtr): Device context pointer for execution.
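
As a usage illustration, here is a minimal sketch of invoking the kernel at one decoder layer. Everything besides the call itself is an assumption: the import path, the `target="gpu"` value, and the pre-built inputs (q_proj, kv_collection, freqs_cis, valid_lengths, output, ctx) all depend on your MAX version and serving setup; only the call follows the signature documented above.

```mojo
# Sketch of a single call site at one decoder layer. All values below are
# assumed to exist already (produced by the QKV projection matmul and the
# paged cache manager); the import path is an assumption for illustration.
from nn.fused_qk_rope import generic_fused_qk_rope_bshd_paged

# Assumed inputs:
#   q_proj        : LayoutTensor [batch, seq_len, n_heads, head_dim]
#   kv_collection : PagedKVCacheCollection for this model's KV cache
#   freqs_cis     : LayoutTensor [max_seq_len, head_dim] of RoPE frequencies
#   valid_lengths : LayoutTensor [batch], DType.uint32
#   output        : LayoutTensor, same shape and dtype as q_proj
#   ctx           : DeviceContextPtr for the target device
var layer_idx: UInt32 = 0  # index of this transformer layer in the cache

generic_fused_qk_rope_bshd_paged[interleaved=True, target="gpu"](
    q_proj,
    kv_collection,
    freqs_cis,
    layer_idx,
    valid_lengths,
    output,
    ctx,
)
```

Note that interleaved typically selects between the two common RoPE element pairings (rotating adjacent pairs versus split halves of head_dim); which one is correct depends on how the model's weights were exported.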