Mojo function
tma_wait_pipelined
tma_wait_pipelined[c_type: DType, c_layout: Layout, c_desc_layout: Layout, is_last_stage: Bool](c_tma_op: TMATensorTile[c_type, c_layout, c_desc_layout])
Wait for TMA stores with pipelining.
For SM100 output pipeline:
- Non-last stages: Keep 1 store in flight for pipelining
- Last stage: Wait for all stores to complete
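The two cases above can be illustrated with a small host-side model. This is a conceptual sketch in plain Python, not Mojo GPU code and not the library's implementation: `StoreQueue`, `wait_until_at_most`, `wait_pipelined`, and `run_pipeline` are hypothetical names introduced only for illustration, and the model assumes stores complete in the order they were issued.

```python
from collections import deque


class StoreQueue:
    """Toy model of outstanding asynchronous stores (hypothetical, not a Mojo API)."""

    def __init__(self):
        self.in_flight = deque()

    def issue(self, stage):
        # Record a store as issued but not yet complete.
        self.in_flight.append(stage)

    def wait_until_at_most(self, n):
        # Block (here: drain) until at most n stores remain in flight.
        # Assumption: stores retire in issue order (FIFO).
        completed = []
        while len(self.in_flight) > n:
            completed.append(self.in_flight.popleft())
        return completed


def wait_pipelined(queue, is_last_stage):
    # Non-last stages keep 1 store in flight; the last stage waits for all.
    allowed = 0 if is_last_stage else 1
    return queue.wait_until_at_most(allowed)


def run_pipeline(num_stages):
    # Returns one (stage, stores_completed_by_wait, stores_still_in_flight)
    # tuple per output stage.
    queue = StoreQueue()
    trace = []
    for stage in range(num_stages):
        queue.issue(stage)
        done = wait_pipelined(queue, is_last_stage=(stage == num_stages - 1))
        trace.append((stage, done, len(queue.in_flight)))
    return trace


if __name__ == "__main__":
    for entry in run_pipeline(3):
        print(entry)
```

In this model, each non-last stage returns while its own store is still outstanding, so the next stage's work can overlap with it; only the final wait drains everything.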
Template Parameters:
- c_type (DType): Output data type.
- c_layout (Layout): Global memory layout for C.
- c_desc_layout (Layout): TMA descriptor layout for C.
- is_last_stage (Bool): If True, wait for all stores; otherwise keep 1 in flight.
Args:
- c_tma_op (TMATensorTile): TMA tensor tile descriptor.