Mojo function
mla_decoding_single_batch
mla_decoding_single_batch[q_type: DType, k_t: MHAOperand, output_type: DType, mask_t: MHAMask, *, BM: UInt, BN: UInt, BK: UInt, WM: UInt, WN: UInt, depth: UInt, depth_v: UInt, num_heads: UInt, num_threads: UInt, num_pipeline_stages: UInt, group: UInt = 1, decoding_warp_split_k: Bool = False](q_ptr: UnsafePointer[Scalar[q_type], MutAnyOrigin], k: k_t, output_ptr: UnsafePointer[Scalar[output_type], MutAnyOrigin], exp_sum_ptr: UnsafePointer[Scalar[get_accum_type[q_type]()], MutAnyOrigin], qk_max_ptr: UnsafePointer[Scalar[get_accum_type[q_type]()], MutAnyOrigin], scale: Float32, num_keys: UInt, num_partitions: UInt, max_cache_valid_length: UInt, mask: mask_t, batch_idx: Int)
Flash attention v2 algorithm.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!