
Mojo struct

Struct_grouped_matmul_swiglu_nvfp4

struct Struct_grouped_matmul_swiglu_nvfp4

MOGG wrapper for fused grouped NVFP4 matmul + SwiGLU + NVFP4 quant.

Fuses the MoE gate/up grouped matmul, the SwiGLU activation, and per-block NVFP4 quantization into a single SM100 kernel. The caller must pre-permute the weight b and its scale tile b_scales along the N axis using the permutation sigma(2i) = i, sigma(2i+1) = D + i for 0 <= i < D, where D = moe_dim and N = 2D.
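The permutation above can be sketched in plain Python. This is an illustration only, not the kernel's own preprocessing code: it assumes the unpermuted weight stores the D gate rows first and the D up rows second, and that the permuted layout is built as permuted[j] = original[sigma(j)]. The helper names sigma and permute_rows are hypothetical. The tiled layout of b_scales may need additional handling that is not shown here.

```python
def sigma(j, moe_dim):
    """Source row for permuted row j: sigma(2i) = i, sigma(2i+1) = D + i."""
    i, is_odd = divmod(j, 2)
    return moe_dim + i if is_odd else i


def permute_rows(rows, moe_dim):
    """Interleave N = 2*D rows so that permuted[j] = rows[sigma(j)].

    Assumption (not stated in the reference): the input holds the D gate
    rows followed by the D up rows.
    """
    if len(rows) != 2 * moe_dim:
        raise ValueError("expected N = 2 * moe_dim rows")
    return [rows[sigma(j, moe_dim)] for j in range(2 * moe_dim)]


D = 3
rows = [f"gate{i}" for i in range(D)] + [f"up{i}" for i in range(D)]
interleaved = permute_rows(rows, D)
print(interleaved)
# ['gate0', 'up0', 'gate1', 'up1', 'gate2', 'up2']
```

Under these assumptions, each even position holds a gate row and the following odd position holds the matching up row, which is the pairing the fused epilogue relies on.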

Implemented traits

AnyType, ImplicitlyDestructible

Methods

execute

static def execute[
    a_type: DType,
    b_type: DType,
    scales_type: DType,
    //,
    target: StringSlice[StaticConstantOrigin]
](
    c_packed: ManagedTensorSlice[Output, static_spec=c_packed.static_spec],
    c_swiglu_scales: ManagedTensorSlice[Output, static_spec=c_swiglu_scales.static_spec],
    a: ManagedTensorSlice[Input, static_spec=a.static_spec],
    b: ManagedTensorSlice[Input, static_spec=b.static_spec],
    a_scales: ManagedTensorSlice[Input, static_spec=a_scales.static_spec],
    b_scales: ManagedTensorSlice[Input, static_spec=b_scales.static_spec],
    expert_start_indices: ManagedTensorSlice[Input, static_spec=expert_start_indices.static_spec],
    expert_ids: ManagedTensorSlice[Input, static_spec=expert_ids.static_spec],
    a_scale_offsets: ManagedTensorSlice[Input, static_spec=a_scale_offsets.static_spec],
    expert_scales: ManagedTensorSlice[Input, static_spec=expert_scales.static_spec],
    c_input_scales: ManagedTensorSlice[Input, static_spec=c_input_scales.static_spec],
    estimated_total_m: UInt32,
    num_active_experts: UInt32,
    context: DeviceContext
)

Executes fused grouped NVFP4 matmul + SwiGLU + NVFP4 quant.

Computes (c_packed, c_swiglu_scales) = quantize_nvfp4(silu(C[..., even]) * C[..., odd], c_input_scales) where C = A @ B^T for multiple expert groups. Because B is sigma-permuted on N, adjacent matmul-output columns carry (gate, up) pairs that the epilogue consumes in-place.
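As a rough reference for the activation step in the formula above, the following Python sketch applies silu(even) * odd to one row of C whose columns are already interleaved as (gate, up) pairs. It covers only the SwiGLU stage: the grouped matmul and the NVFP4 quantization with c_input_scales are omitted. The function names silu and swiglu_interleaved are illustrative and do not belong to the MAX API.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu_interleaved(c_row):
    """Compute silu(C[even]) * C[odd] for one row of interleaved matmul output.

    A row of length N = 2*D yields D activated values, one per (gate, up) pair.
    """
    if len(c_row) % 2 != 0:
        raise ValueError("row length must be even (N = 2 * moe_dim)")
    return [silu(c_row[k]) * c_row[k + 1] for k in range(0, len(c_row), 2)]


row = [0.0, 5.0, 1.0, 2.0]  # (gate0, up0, gate1, up1)
out = swiglu_interleaved(row)
print(out)
```

The first pair yields 0.0 because silu(0) is 0; the second yields 2 * silu(1), roughly 1.462. Because each pair sits in adjacent columns, the result has half as many columns as C, and no separate gather of gate and up values is needed.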

Parameters:

  • a_type (DType): The input A data type. Constraints: Must be uint8.
  • b_type (DType): The input B data type. Constraints: Must be uint8.
  • scales_type (DType): The scale factor data type. Constraints: Must be float8_e4m3fn.
  • target (StringSlice[StaticConstantOrigin]): The target GPU device.

Args: