Mojo function

mla_prefill_single_batch

mla_prefill_single_batch[q_type: DType, k_t: MHAOperand, v_t: MHAOperand, k_rope_t: MHAOperand, output_type: DType, mask_t: MHAMask, score_mod_t: ScoreModTrait, *, config: MHAConfig[dtype], group: Int = 1, q_depth: Int = 192, cache_depth: Int = 576, use_score_mod: Bool = False, write_softmax_info: Bool = False, use_cascade_attention: Bool = False](q_ptr: LegacyUnsafePointer[Scalar[q_type]], k: k_t, v: v_t, k_rope: k_rope_t, output_ptr: LegacyUnsafePointer[Scalar[output_type]], softmax_info_ptr: LegacyUnsafePointer[Scalar[get_accum_type[q_type]()]], prev_output_ptr: LegacyUnsafePointer[Scalar[output_type]], prev_softmax_info_ptr: LegacyUnsafePointer[Scalar[get_accum_type[q_type]()]], scale: Float32, seq_len: Int, max_seq_len: Int, start_pos: UInt32, cache_start_pos: UInt32, num_keys: Int, mask: mask_t, score_mod: score_mod_t, batch_idx: Int)

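Multi-head Latent Attention (MLA) for the prefill (context encoding) phase, where seq_len > 1. Processes a single batch entry, selected by batch_idx.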
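
As a rough illustration of what a prefill attention pass computes, here is a minimal reference sketch in plain Python. It is not the library's implementation and does not call this function. It assumes that each query and key is the concatenation of a non-rotary part and a rotary part (inferred only from the separate k and k_rope operands), that the mask is causal (the real function takes a generic mask_t), and that start_pos counts cached keys preceding the first query token. The name prefill_attention and all of its parameters are hypothetical.

```python
import math


def prefill_attention(q_nope, q_rope, k_nope, k_rope, v, scale, start_pos=0):
    """Reference causal attention for one sequence and one head (hypothetical helper).

    q_nope[i], q_rope[i]: the two parts of the query for token i.
    k_nope[j], k_rope[j], v[j]: the two parts of key j and its value.
    start_pos: assumed number of cached keys before the first query token.
    Assumes at least one key is present.
    """
    out = []
    dim = len(v[0])
    for i, (qn, qr) in enumerate(zip(q_nope, q_rope)):
        q = qn + qr  # list concatenation builds the full query vector
        # Causal mask (assumption): key j is visible iff j <= start_pos + i.
        visible = min(len(k_nope), start_pos + i + 1)
        scores = []
        for j in range(visible):
            k = k_nope[j] + k_rope[j]  # full key vector
            scores.append(scale * sum(a * b for a, b in zip(q, k)))
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        denom = sum(weights)
        # Weighted sum of the visible value vectors.
        out.append(
            [sum(w * v[j][d] for j, w in enumerate(weights)) / denom for d in range(dim)]
        )
    return out


# Two query tokens, two keys, identical scores: token 0 sees only key 0,
# token 1 averages the values of keys 0 and 1.
result = prefill_attention(
    q_nope=[[1.0], [1.0]],
    q_rope=[[0.0], [0.0]],
    k_nope=[[0.0], [0.0]],
    k_rope=[[0.0], [0.0]],
    v=[[1.0], [3.0]],
    scale=1.0,
)
```

Because seq_len > 1 in prefill, every query token needs its own row of scores over the keys it is allowed to see. A single-token decode step would only need the last row.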