Mojo module
mha_sm100_2q
Aliases
LocalTensor
alias LocalTensor[dtype: DType, layout: Layout, element_layout: Layout = Layout(IntTuple(1), IntTuple(1))] = LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(5), element_layout=element_layout]
Parameters
- dtype (DType):
- layout (Layout):
- element_layout (Layout):
MBarType
alias MBarType = UnsafePointer[SharedMemBarrier, address_space=AddressSpace(3)]
SharedMemPointer
alias SharedMemPointer[type: AnyType] = UnsafePointer[type, address_space=AddressSpace(3)]
Parameters
- type (AnyType):
SharedMemTensor
alias SharedMemTensor[dtype: DType, layout: Layout] = LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(3), layout_int_type=DType.int32, linear_idx_type=DType.int32, alignment=128]
Parameters
- dtype (DType):
- layout (Layout):
Structs
- ConsumerPipeline:
- FA4Config:
- FA4MiscMBars:
- KVConsumerPipeline: Pipeline for managing the consumption of K and V. This follows the order of the Tri Dao and Cutlass implementations (modulo any rotation of the ops through the iterations).
- KVPipeline: KVPipeline has num_kv_stages * num_mma_stages stages. num_kv_stages refers to how many K and V tiles we pipeline for performing the S = Q@K' and O += P@V MMAs. Each of these MMAs is broken up into num_mma_stages pipelined MMAs. We set step=False for all but the last MMA that completes the operation. An alternative implementation would separate the two, and potentially allow for more overall stages at the cost of slightly more bookkeeping.
- KVProducerPipeline:
- MBarPipeline:
- ProducerPipeline:
- SM100MHA2Q:
- SM100TensorAccumulatorSS:
- SM100TensorAccumulatorTS:
- STMatrixLayout: Layout for using st_matrix for writing the final accumulator to smem.
- STMatrixOffsets:
- TMADestination:
- TMemTile:
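The stage arithmetic described for KVPipeline can be illustrated with a small model. This is a plain-Python sketch of the bookkeeping only, not the Mojo implementation: the class KVPipelineModel, its kv_index counter, the consume method, and the flat index layout (kv_index * num_mma_stages + mma_stage) are all assumptions made for illustration. It also assumes that step decides whether the K/V tile counter advances, which is how the description above reads.

```python
from dataclasses import dataclass


@dataclass
class KVPipelineModel:
    """Toy model of the stage bookkeeping (hypothetical, not the real struct)."""

    num_kv_stages: int   # how many K/V tiles are pipelined
    num_mma_stages: int  # how many pipelined MMAs each tile's MMA is split into
    kv_index: int = 0    # assumed counter: which K/V tile slot is current

    @property
    def num_stages(self) -> int:
        # Total stages, as stated in the KVPipeline description.
        return self.num_kv_stages * self.num_mma_stages

    def consume(self, mma_stage: int, step: bool) -> int:
        # Assumed flat layout: the sub-stages of one K/V tile are contiguous.
        idx = self.kv_index * self.num_mma_stages + mma_stage
        if step:
            # Only the MMA that completes the operation advances the tile slot.
            self.kv_index = (self.kv_index + 1) % self.num_kv_stages
        return idx


def run_tile(pipe: KVPipelineModel) -> list[int]:
    """Issue every sub-MMA for one tile; step=True only on the last one."""
    visited = []
    for mma_stage in range(pipe.num_mma_stages):
        is_last = mma_stage == pipe.num_mma_stages - 1
        visited.append(pipe.consume(mma_stage, step=is_last))
    return visited


pipe = KVPipelineModel(num_kv_stages=2, num_mma_stages=2)
first = run_tile(pipe)   # stages used by the first tile
second = run_tile(pipe)  # stages used by the second tile
third = run_tile(pipe)   # wraps around to the first tile's stages
print(pipe.num_stages, first, second, third)
```

With two K/V stages and two MMA stages, the model walks four distinct stages before wrapping. Separating the two counters, as the alternative mentioned above suggests, would replace the single flat index with two independent ones.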
Functions