Mojo module
mha

Multi-head attention (MHA) and flash attention kernels, including dispatch helpers, decoding variants, and split-K reduction utilities.

Functions
- depth_supported_by_gpu
- flash_attention
- flash_attention_dispatch
- flash_attention_hw_supported
- flash_attention_ragged
- get_mha_decoding_num_partitions
- managed_tensor_slice_to_ndbuffer
- mha
- mha_decoding
- mha_decoding_single_batch: Flash attention v2 algorithm (see the online-softmax sketch after this list).
- mha_decoding_single_batch_pipelined: Flash attention v2 algorithm; pipelined variant.
- mha_gpu_naive
- mha_single_batch: MHA for token generation, where seqlen = 1 and num_keys >= 1.
- mha_single_batch_pipelined: MHA for token generation, where seqlen = 1 and num_keys >= 1; pipelined variant.
- mha_splitk_reduce (see the split-K reduction sketch after this list)
- q_num_matrix_view_rows
- scale_and_mask_helper
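
For orientation, here is a minimal sketch of the online-softmax accumulation that flash-attention-style kernels such as mha_decoding_single_batch are built around. It is an illustrative toy (scalar state, one query, 1-D values invented for the example), not the module's actual implementation, which tiles this computation across GPU threads and memory.

```mojo
from math import exp

fn main():
    # Toy scores for one query against four keys, and matching 1-D values
    # (both invented for illustration).
    var scores = List[Float32](0.1, 0.5, -0.2, 0.9)
    var values = List[Float32](1.0, 2.0, 3.0, 4.0)

    # Flash-attention running state: max, softmax normalizer, accumulator.
    var m = Float32(-1e30)
    var l = Float32(0.0)
    var acc = Float32(0.0)

    for i in range(len(scores)):
        var m_new = max(m, scores[i])
        var correction = exp(m - m_new)  # rescale old state to the new max
        var p = exp(scores[i] - m_new)
        l = l * correction + p
        acc = acc * correction + p * values[i]
        m = m_new

    # Normalizing at the end gives softmax(scores) applied to the values.
    print(acc / l)
```

Because each step only updates (m, l, acc), a kernel can stream over num_keys in tiles without ever materializing the full attention row.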
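
When decoding work is split into partitions (see get_mha_decoding_num_partitions), each partition produces partial state of the same (max, normalizer, accumulator) form, and those partials must be merged. The sketch below shows the standard rescale-and-sum reduction over hypothetical per-partition partials; it illustrates the math a split-K reduction such as mha_splitk_reduce performs, not that function's actual signature.

```mojo
from math import exp

fn main():
    # Hypothetical partial results from three partitions for one query head:
    # running max, softmax normalizer, and weighted value accumulator.
    var ms = List[Float32](0.9, 1.4, 0.3)
    var ls = List[Float32](2.1, 1.7, 2.8)
    var accs = List[Float32](3.0, 5.2, 4.1)

    # Global max across partitions.
    var m = ms[0]
    for i in range(1, len(ms)):
        m = max(m, ms[i])

    # Rescale every partition's partials to the global max, then sum.
    var l = Float32(0.0)
    var acc = Float32(0.0)
    for i in range(len(ms)):
        var scale = exp(ms[i] - m)
        l += ls[i] * scale
        acc += accs[i] * scale

    print(acc / l)  # matches attending over all keys in one pass
```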
