
Mojo function

grouped_matmul_1d1d_nvfp4

grouped_matmul_1d1d_nvfp4[
    a_type: DType,
    b_type: DType,
    c_type: DType,
    sfa_dtype: DType,
    sfb_dtype: DType,
    transpose_b: Bool,
    *,
    config: BlockScaledMatmulConfig[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b]
](
    c_device: TileTensor[c_device.dtype, c_device.LayoutType, c_device.origin, address_space=c_device.address_space, linear_idx_type=c_device.linear_idx_type, element_shape_types=c_device.element_shape_types],
    a_device: TileTensor[a_device.dtype, a_device.LayoutType, a_device.origin, address_space=a_device.address_space, linear_idx_type=a_device.linear_idx_type, element_shape_types=a_device.element_shape_types],
    a_offsets: TileTensor[a_offsets.dtype, a_offsets.LayoutType, a_offsets.origin, address_space=a_offsets.address_space, linear_idx_type=a_offsets.linear_idx_type, element_shape_types=a_offsets.element_shape_types],
    a_scale_offsets: TileTensor[a_scale_offsets.dtype, a_scale_offsets.LayoutType, a_scale_offsets.origin, address_space=a_scale_offsets.address_space, linear_idx_type=a_scale_offsets.linear_idx_type, element_shape_types=a_scale_offsets.element_shape_types],
    _b_device: TileTensor[_b_device.dtype, _b_device.LayoutType, _b_device.origin, address_space=_b_device.address_space, linear_idx_type=_b_device.linear_idx_type, element_shape_types=_b_device.element_shape_types],
    expert_ids: TileTensor[expert_ids.dtype, expert_ids.LayoutType, expert_ids.origin, address_space=expert_ids.address_space, linear_idx_type=expert_ids.linear_idx_type, element_shape_types=expert_ids.element_shape_types],
    a_scales: TileTensor[a_scales.dtype, a_scales.LayoutType, a_scales.origin, address_space=a_scales.address_space, linear_idx_type=a_scales.linear_idx_type, element_shape_types=a_scales.element_shape_types],
    _b_scales: TileTensor[_b_scales.dtype, _b_scales.LayoutType, _b_scales.origin, address_space=_b_scales.address_space, linear_idx_type=_b_scales.linear_idx_type, element_shape_types=_b_scales.element_shape_types],
    expert_scales: TileTensor[expert_scales.dtype, expert_scales.LayoutType, expert_scales.origin, address_space=expert_scales.address_space, linear_idx_type=expert_scales.linear_idx_type, element_shape_types=expert_scales.element_shape_types],
    num_active_experts: Int,
    ctx: DeviceContext
)

Launch the grouped 1D-1D block-scaled matmul kernel.

This function sets up TMA descriptors and launches the kernel with the proper configuration for the 1D-1D tensor layout.
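In a block-scaled matmul, each small contiguous block of low-precision values along K shares one scale factor, so the effective operand value is the quantized element times its block's scale. A minimal sketch of that idea in plain Python (the 16-element block size and the helper name are illustrative assumptions, not this kernel's API):

```python
# Sketch: block-scaled dequantization of one row along K.
# Assumption (illustrative): each contiguous block of BLOCK quantized
# values shares a single scale factor.
BLOCK = 16

def dequantize_row(quantized, scales, block=BLOCK):
    """Expand a block-scaled row: out[i] = quantized[i] * scales[i // block]."""
    assert len(quantized) == len(scales) * block
    return [q * scales[i // block] for i, q in enumerate(quantized)]

# Two blocks of 16 quantized values with per-block scales 0.5 and 2.0.
row = [1.0] * 16 + [3.0] * 16
scales = [0.5, 2.0]
out = dequantize_row(row, scales)
# First block is scaled by 0.5, second block by 2.0.
```

The kernel performs the equivalent rescaling on the fly using the `a_scales` and `_b_scales` operands rather than materializing a dequantized tensor.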

Args:

  • c_device (TileTensor): Output tensor (total_tokens, N).
  • a_device (TileTensor): Input A tensor (total_tokens, K).
  • a_offsets (TileTensor): Per-expert offsets (num_active_experts + 1).
  • a_scale_offsets (TileTensor): Per-expert scale offsets (num_active_experts).
  • _b_device (TileTensor): Weight tensor B (num_experts, N, K).
  • expert_ids (TileTensor): Active expert IDs (num_active_experts).
  • a_scales (TileTensor): Scale factors for A (5D).
  • _b_scales (TileTensor): Scale factors for B (6D).
  • expert_scales (TileTensor): Per-expert output scaling (num_experts).
  • num_active_experts (Int): Number of active experts.
  • ctx (DeviceContext): Device context.
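Since `a_offsets` has `num_active_experts + 1` entries, it presumably follows the usual grouped/ragged convention: entries `g` and `g + 1` bound the rows of `a_device` routed to the g-th active expert, whose weight slice in `_b_device` is selected by `expert_ids[g]`. A hedged Python sketch of that indexing (the helper name and the exact convention are assumptions for illustration, not this kernel's code):

```python
# Sketch (assumed convention): per-expert offsets partition the token
# dimension of A. Rows [a_offsets[g], a_offsets[g+1]) belong to the
# g-th active expert, whose weights are _b_device[expert_ids[g]].

def expert_row_ranges(a_offsets, expert_ids):
    """Return (expert_id, row_start, row_end) for each active expert."""
    assert len(a_offsets) == len(expert_ids) + 1
    return [
        (expert_ids[g], a_offsets[g], a_offsets[g + 1])
        for g in range(len(expert_ids))
    ]

# 3 active experts routing 7 total tokens: 3, 0, and 4 tokens each.
ranges = expert_row_ranges([0, 3, 3, 7], [5, 2, 9])
# -> [(5, 0, 3), (2, 3, 3), (9, 3, 7)]
```

Under this reading, an expert that receives no tokens simply produces an empty row range.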
