Mojo struct

Grouped1D1DSmem

struct Grouped1D1DSmem[a_type: DType, b_type: DType, c_type: DType, sfa_dtype: DType, sfb_dtype: DType, transpose_b: Bool, *, config: BlockScaledMatmulConfig[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b]]

Shared-memory (SMEM) struct for grouped 1D-1D block-scaled GEMM without a CLC scheduler.

Thin wrapper over BlockScaledTileCore and SmemPipelineBundleNoClc. Uses 3-warp specialization (Load, MMA, Epilogue) with no scheduler warp.

Fields

  • core (Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core): Tile storage core; see the Core comptime member.
  • pipelines (Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Pipelines): Pipeline bundle; see the Pipelines comptime member.
  • sfb_load_barriers (RawBarrierStorage[Int[Int](Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_group_pipeline_stages)]): Barrier storage for SFB loads, sized by Core.num_group_pipeline_stages.
  • sfb_tma_barriers (BarrierPair[Int[Int](Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_group_pipeline_stages)]): Barrier pair for the SFB TMA pipeline, sized by Core.num_group_pipeline_stages.

Implemented traits

AnyType, ImplicitlyDestructible

comptime members

Core

comptime Core = BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config]

Pipelines

comptime Pipelines = SmemPipelineBundleNoClc[Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_group_pipeline_stages, Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_accum_pipeline_stages, BlockScaledTilePayload[a_type, b_type, sfa_dtype, sfb_dtype, IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BM, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BK, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BN, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BK, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFA_DIM0, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFA_DIM1, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFB_DIM0, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFB_DIM1, __list_literal__=Tuple()), BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].num_pipeline_stages]]

Methods

sfb_load_mbars_ptr

sfb_load_mbars_ptr(ref[AddressSpace._value] self) -> MbarPtr

Get pointer to SFB-load mbarrier array (SFB Load→MMA).

Returns:

MbarPtr

sfb_tma_mbars_ptr

sfb_tma_mbars_ptr(ref[AddressSpace._value] self) -> MbarPtr

Get pointer to SFB TMA pipeline mbarrier array (SfbTMALoad↔MMA).

Returns:

MbarPtr

a_tiles

a_tiles(ref[AddressSpace._value] self) -> Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.ATileArray

Get A tile array accessor.

Returns:

Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.ATileArray

b_tiles

b_tiles(ref[AddressSpace._value] self) -> Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.BTileArray

Get B tile array accessor.

Returns:

Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.BTileArray

c_tiles

c_tiles(ref[AddressSpace._value] self) -> Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.CTileArray

Get C tile array accessor.

Returns:

Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.CTileArray

sfa_tiles

sfa_tiles(ref[AddressSpace._value] self) -> Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFATileArray

Get SFA tile array accessor.

Returns:

Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFATileArray

sfb_tiles

sfb_tiles(ref[AddressSpace._value] self) -> Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFBTileArray

Get SFB tile array accessor.

Returns:

Grouped1D1DSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFBTileArray

ab_pipeline_size

static ab_pipeline_size() -> Int

Total size of A+B tiles for all pipeline stages (in elements).

Returns:

Int

sf_pipeline_size

static sf_pipeline_size() -> Int

Total size of SFA+SFB tiles for all pipeline stages (in elements).

Returns:

Int

c_output_size

static c_output_size() -> Int

Size of C tiles for all output stages (in elements).

Returns:

Int

total_tile_size

static total_tile_size() -> Int

Total tile storage size (A+B+SFA+SFB+C) in elements.

Returns:

Int
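
Example (illustrative sketch)

The four static size helpers above appear to compose additively: total_tile_size covers A+B, SFA+SFB, and C. The Python model below is only a sketch of that arithmetic, not the library's implementation. The TileShape class, its per-stage element formulas, and the num_output_stages, out_m, and out_n fields are assumptions introduced for illustration; only the names BM, BN, BK, SFA_DIM0/1, SFB_DIM0/1, and num_pipeline_stages come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileShape:
    """Hypothetical stand-in for the compile-time tile parameters."""
    bm: int                   # BM: rows of an A tile
    bn: int                   # BN: rows of a B tile
    bk: int                   # BK: shared K extent of A and B tiles
    sfa_dim0: int             # SFA_DIM0
    sfa_dim1: int             # SFA_DIM1
    sfb_dim0: int             # SFB_DIM0
    sfb_dim1: int             # SFB_DIM1
    num_pipeline_stages: int  # input pipeline depth
    num_output_stages: int    # assumption: number of C output stages
    out_m: int                # assumption: C tile rows
    out_n: int                # assumption: C tile columns


def ab_pipeline_size(s: TileShape) -> int:
    # Assumed: one (BM x BK) A tile plus one (BN x BK) B tile per stage.
    return (s.bm * s.bk + s.bn * s.bk) * s.num_pipeline_stages


def sf_pipeline_size(s: TileShape) -> int:
    # Assumed: one SFA tile plus one SFB tile per stage.
    per_stage = s.sfa_dim0 * s.sfa_dim1 + s.sfb_dim0 * s.sfb_dim1
    return per_stage * s.num_pipeline_stages


def c_output_size(s: TileShape) -> int:
    # Assumed: one (out_m x out_n) C tile per output stage.
    return s.out_m * s.out_n * s.num_output_stages


def total_tile_size(s: TileShape) -> int:
    # A+B+SFA+SFB+C, all counted in elements (not bytes).
    return ab_pipeline_size(s) + sf_pipeline_size(s) + c_output_size(s)


# Made-up shape, chosen only to exercise the arithmetic.
shape = TileShape(bm=128, bn=128, bk=64,
                  sfa_dim0=128, sfa_dim1=4,
                  sfb_dim0=128, sfb_dim1=4,
                  num_pipeline_stages=4,
                  num_output_stages=2, out_m=128, out_n=64)

if __name__ == "__main__":
    print(ab_pipeline_size(shape), sf_pipeline_size(shape),
          c_output_size(shape), total_tile_size(shape))
```

Because the sizes are in elements, converting to bytes would require multiplying each term by the element size of its own dtype (a_type, b_type, sfa_dtype, sfb_dtype, c_type) before summing.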