
Mojo function

mla_decoding

mla_decoding[
    q_type: DType,
    k_t: MHAOperand,
    output_type: DType,
    mask_t: MHAMask,
    ValidLT: TensorLayout,
    BM: Int,
    BN: Int,
    BK: Int,
    WM: Int,
    WN: Int,
    depth: Int,
    num_heads: Int,
    num_threads: Int,
    num_pipeline_stages: Int,
    group: Int = 1,
    ragged: Bool = False,
    _use_valid_length: Bool = False,
    _is_cache_length_accurate: Bool = False,
    decoding_warp_split_k: Bool = False,
](
    q_ptr: UnsafePointer[Scalar[q_type], MutAnyOrigin],
    k: k_t,
    output_ptr: UnsafePointer[Scalar[output_type], MutAnyOrigin],
    exp_sum_ptr: UnsafePointer[Scalar[get_accum_type[q_type]()], MutAnyOrigin],
    qk_max_ptr: UnsafePointer[Scalar[get_accum_type[q_type]()], MutAnyOrigin],
    scale: Float32,
    batch_size: Int,
    num_partitions: Int,
    max_cache_valid_length: Int,
    valid_length_tt: TileTensor[DType.uint32, ValidLT, MutAnyOrigin],
    mask: mask_t,
)
