Mojo struct

OutputPipelineConfig

@register_passable(trivial) struct OutputPipelineConfig

Configuration for the MMA-to-Epilogue output pipeline.

Bundles the three parameters that jointly define TMEM accumulator stage management for MMA/epilogue synchronization:

  • num_stages: Number of accumulator pipeline stages (typically 1 or 2).
  • stage_stride_cols: TMEM column stride between accumulator stages.
  • cta_group: CTA group size (1 or 2).

stage_stride_cols computation: Two strategies are used depending on the kernel family:

  • Standard kernels (default, blockwise_fp8): NUM_TMEM_COLS // num_stages (= 512 // stages). Divides all 512 TMEM columns evenly among stages.
  • Block-scaled kernels (block_scaled, grouped, 1d1d variants): MMA_N. Sizes each stage to match the MMA output width, which is smaller than half of TMEM (256 columns) whenever MMA_N < 256.
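The two strategies above reduce to simple integer arithmetic. The following is a minimal Python sketch of that arithmetic only, not the Mojo implementation; the function names `standard_stage_stride` and `block_scaled_stage_stride` are hypothetical and do not appear in the library.

```python
# Illustrative model of the two stage_stride_cols strategies.
# Function names are hypothetical; only the formulas come from the text.

NUM_TMEM_COLS = 512  # total TMEM columns, as stated in the description


def standard_stage_stride(num_stages: int) -> int:
    """Standard kernels: divide all TMEM columns evenly among stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return NUM_TMEM_COLS // num_stages


def block_scaled_stage_stride(mma_n: int) -> int:
    """Block-scaled kernels: each stage is as wide as the MMA output."""
    if mma_n < 1:
        raise ValueError("mma_n must be at least 1")
    return mma_n


# With two stages, a standard kernel gives each stage half of TMEM,
# while a block-scaled kernel with MMA_N = 128 uses a narrower stride.
standard_two_stage = standard_stage_stride(2)
block_scaled_128 = block_scaled_stage_stride(128)
```

With `num_stages = 2`, the standard strategy yields a stride of 256 columns, so the two strategies coincide only when `MMA_N` is also 256.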

Constructed once per kernel struct and propagated to all pipeline types (OutputTilePipeline, warp contexts, TileWriter, etc.).
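To make the "construct once, propagate everywhere" idea concrete, here is a hedged Python model of the config as an immutable value. The class name `OutputPipelineConfigModel`, the validation rules, and the `stage_offsets` helper are assumptions for illustration; in particular, the sketch assumes stage `i` starts at column `i * stage_stride_cols`, which the word "stride" suggests but the page does not state.

```python
from dataclasses import dataclass

NUM_TMEM_COLS = 512  # total TMEM columns, as stated in the description


@dataclass(frozen=True)
class OutputPipelineConfigModel:
    """Hypothetical Python stand-in for the three-field Mojo struct."""

    num_stages: int
    stage_stride_cols: int
    cta_group: int

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real struct may not enforce these.
        if self.num_stages < 1:
            raise ValueError("num_stages must be at least 1")
        if self.cta_group not in (1, 2):
            raise ValueError("cta_group must be 1 or 2")
        if self.num_stages * self.stage_stride_cols > NUM_TMEM_COLS:
            raise ValueError("stages do not fit in TMEM")

    def stage_offsets(self) -> list[int]:
        # Assumption: stage i begins at column i * stage_stride_cols.
        return [s * self.stage_stride_cols for s in range(self.num_stages)]


# Build the config once, then hand the same value to every consumer.
cfg = OutputPipelineConfigModel(num_stages=2, stage_stride_cols=128, cta_group=1)
offsets = cfg.stage_offsets()
```

Making the model frozen mirrors the trivially copyable, value-like nature of the struct: every consumer sees the same three numbers, so the MMA and epilogue sides cannot disagree about stage layout.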

Fields

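  • num_stages (Int): Number of accumulator pipeline stages.
  • stage_stride_cols (Int): TMEM column stride between accumulator stages.
  • cta_group (Int): CTA group size (1 or 2).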

Implemented traits

AnyType, Copyable, Equatable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

__copy_ctor_is_trivial

comptime __copy_ctor_is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__move_ctor_is_trivial

comptime __move_ctor_is_trivial = True