Mojo struct
OutputPipelineConfig
@register_passable(trivial)
struct OutputPipelineConfig
Configuration for the MMA-to-Epilogue output pipeline.
Bundles the three parameters that jointly define TMEM accumulator stage management for MMA/epilogue synchronization:
- num_stages: Number of accumulator pipeline stages (typically 1 or 2).
- stage_stride_cols: TMEM column stride between accumulator stages.
- cta_group: CTA group size (1 or 2).
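How the three parameters fit together can be sketched in plain Python. This is only an illustrative model, not the Mojo API: the class name `OutputPipelineConfigModel` and the helper `stage_base_col` are hypothetical. It also assumes that stage `i` begins `i * stage_stride_cols` columns past the start of the accumulator allocation, which is the usual meaning of a stride but is not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPipelineConfigModel:
    """Hypothetical Python stand-in; field names mirror the Mojo struct."""

    num_stages: int         # number of accumulator pipeline stages
    stage_stride_cols: int  # TMEM column stride between stages
    cta_group: int          # CTA group size (1 or 2)

    def stage_base_col(self, stage: int) -> int:
        # Assumption: stages are laid out back to back, one stride apart.
        if not 0 <= stage < self.num_stages:
            raise IndexError("stage index out of range")
        return stage * self.stage_stride_cols


cfg = OutputPipelineConfigModel(num_stages=2, stage_stride_cols=256, cta_group=1)
print([cfg.stage_base_col(s) for s in range(cfg.num_stages)])  # [0, 256]
```

The frozen dataclass loosely mirrors the value-like, trivially copyable nature of the Mojo struct: the configuration is built once and then passed around unchanged.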
stage_stride_cols computation: Two strategies are used depending on the kernel family:
- Standard kernels (default, blockwise_fp8): NUM_TMEM_COLS // num_stages (= 512 // stages). Divides all 512 TMEM columns evenly among stages.
- Block-scaled kernels (block_scaled, grouped, 1d1d variants): MMA_N. Sizes each stage to match the MMA output width, which may be smaller than half of TMEM when MMA_N < 256.
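The two strategies reduce to simple arithmetic, sketched here in plain Python. The function names `standard_stage_stride` and `block_scaled_stage_stride` are hypothetical and are not part of the library. The value 512 for `NUM_TMEM_COLS` comes from the description above, and `MMA_N = 128` is just an example input.

```python
NUM_TMEM_COLS = 512  # total TMEM columns, as stated in the description


def standard_stage_stride(num_stages: int) -> int:
    # Standard kernels: split all TMEM columns evenly among the stages.
    return NUM_TMEM_COLS // num_stages


def block_scaled_stage_stride(mma_n: int) -> int:
    # Block-scaled kernels: each stage is exactly as wide as the MMA output.
    return mma_n


print(standard_stage_stride(1))        # 512
print(standard_stage_stride(2))        # 256
print(block_scaled_stage_stride(128))  # 128
```

With two stages, the standard strategy gives each stage 256 columns, which is half of TMEM. The block-scaled strategy with `MMA_N = 128` gives each stage 128 columns, so two stages span only 256 of the 512 columns.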
Constructed once per kernel struct and propagated to all pipeline types (OutputTilePipeline, warp contexts, TileWriter, etc.).
Fields
- num_stages (Int)
- stage_stride_cols (Int)
- cta_group (Int)
Implemented traits
AnyType, Copyable, Equatable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable
comptime members
__copy_ctor_is_trivial
comptime __copy_ctor_is_trivial = True
__del__is_trivial
comptime __del__is_trivial = True
__move_ctor_is_trivial
comptime __move_ctor_is_trivial = True