Mojo function
rope_split_store_paged_ragged
def rope_split_store_paged_ragged[
    dtype: DType,
    freq_dtype: DType,
    q_out_dtype: DType = dtype,
    target: StringSlice[StaticConstantOrigin] = StringSlice("cpu"),
    interleaved: Bool = True
](
    qkv: TileTensor[dtype, Storage=qkv.Storage, address_space=qkv.address_space, linear_idx_type=qkv.linear_idx_type, element_size=qkv.element_size],
    input_row_offsets: TileTensor[DType.uint32, Storage=input_row_offsets.Storage, address_space=input_row_offsets.address_space, linear_idx_type=input_row_offsets.linear_idx_type, element_size=input_row_offsets.element_size],
    freqs_cis: TileTensor[freq_dtype, Storage=freqs_cis.Storage, address_space=freqs_cis.address_space, linear_idx_type=freqs_cis.linear_idx_type, element_size=freqs_cis.element_size],
    kv_collection: PagedKVCacheCollection[scale_dtype_=kv_collection.scale_dtype_, quantization_granularity_=kv_collection.quantization_granularity_],
    layer_idx: UInt32,
    q_output: TileTensor[q_out_dtype, Storage=q_output.Storage, address_space=q_output.address_space, linear_idx_type=q_output.linear_idx_type, element_size=q_output.element_size],
    ctx: DeviceContext
)
Fused RoPE, split, and store for ragged QKV input with a paged KV cache collection.
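The page gives no further detail on the semantics, so the following is only a rough reference sketch in plain Python of what a "RoPE + split" step over a ragged batch commonly looks like. It is not the Mojo kernel and does not use its API. Everything here is an assumption made for illustration: the helper names (`rope_rotate`, `rope_split_ragged_reference`), the `[q | k | v]` row layout, the head-count parameters, positions that restart at 0 for each sequence (no cached prefix), RoPE being applied to Q and K but not V, and returning K/V rows as lists instead of writing them into a paged cache.

```python
def rope_rotate(head, freqs, interleaved=True):
    """Rotate one head vector by per-pair complex frequencies (hypothetical helper)."""
    half = len(head) // 2
    out = list(head)
    for i in range(half):
        # Pair layout: (2i, 2i+1) when interleaved, otherwise (i, i + half).
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        z = complex(head[a], head[b]) * freqs[i]
        out[a], out[b] = z.real, z.imag
    return out


def _rope_heads(flat, freqs, head_dim, interleaved):
    """Apply rope_rotate to each head_dim-sized slice of a flat row segment."""
    out = []
    for h in range(0, len(flat), head_dim):
        out.extend(rope_rotate(flat[h:h + head_dim], freqs, interleaved))
    return out


def rope_split_ragged_reference(qkv, input_row_offsets, freqs_cis,
                                num_q_heads, num_kv_heads, head_dim,
                                interleaved=True):
    """Reference semantics only; assumes each qkv row is laid out as [q | k | v]."""
    q_size = num_q_heads * head_dim
    k_size = num_kv_heads * head_dim
    q_out, k_rows, v_rows = [], [], []
    # Sequence `seq` owns rows input_row_offsets[seq] .. input_row_offsets[seq + 1] - 1.
    for seq in range(len(input_row_offsets) - 1):
        start, end = input_row_offsets[seq], input_row_offsets[seq + 1]
        for row in range(start, end):
            pos = row - start  # assumed: position within the sequence, no cached prefix
            freqs = freqs_cis[pos]
            x = qkv[row]
            q = x[:q_size]
            k = x[q_size:q_size + k_size]
            v = x[q_size + k_size:]
            q_out.append(_rope_heads(q, freqs, head_dim, interleaved))
            # A real kernel would write these into the KV cache for the given layer.
            k_rows.append((seq, pos, _rope_heads(k, freqs, head_dim, interleaved)))
            v_rows.append((seq, pos, list(v)))
    return q_out, k_rows, v_rows
```

In this sketch, `input_row_offsets = [0, 2, 3]` describes two sequences packed into three rows: rows 0 and 1 belong to the first sequence and row 2 to the second. The `interleaved` flag only changes which two elements of a head form a rotation pair.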