Mojo module
mla
Functions
- flare_mla_decoding: MLA decoding kernel that would only be called in the optimized compute graph.
- flare_mla_decoding_dispatch
- flare_mla_prefill: MLA prefill kernel that would only be called in the optimized compute graph. Only supports ragged Q/K/V inputs.
- flare_mla_prefill_dispatch
- mla_decoding
- mla_decoding_single_batch: Flash attention v2 algorithm.
- mla_prefill
- mla_prefill_plan: This calls a GPU kernel that plans how to process a batch of sequences with varying lengths using a fixed-size buffer.
- mla_prefill_plan_kernel
- mla_prefill_single_batch: MLA for encoding where seqlen > 1.
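
The entry for mla_decoding_single_batch names the Flash attention v2 algorithm. As a rough illustration of the core idea only, the following plain-Python sketch processes keys and values one block at a time for a single query vector, keeping a running maximum, a running softmax denominator, and an unnormalized output accumulator that is divided once at the end. It is not the Mojo kernel: the function names, the block_size parameter, and the list-of-lists layout are hypothetical, and it leaves out everything specific to MLA and to the GPU implementation.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query vector, used for comparison."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_tiled(q, keys, values, block_size=2):
    """Same result, computed one key/value block at a time.

    Hypothetical helper: block_size stands in for the tile size a real
    kernel would choose based on on-chip memory.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator relative to running_max
    acc = [0.0] * dim         # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so they are relative to new_max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            for d in range(dim):
                acc[d] += p * v[d]
        running_max = new_max

    # Normalize once, after the last block.
    return [a / running_sum for a in acc]
```

Because the tiled version never materializes the full score vector, its working set depends on block_size rather than on sequence length, which is what makes this style of attention attractive for long key/value caches during decoding.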