MoEQuantized

class max.nn.MoEQuantized(devices, hidden_dim, num_experts, num_experts_per_token, moe_dim, gate_cls=MoEGate, mlp_cls=MLP, has_shared_experts=False, shared_experts_dim=0, ep_size=1, dtype=bfloat16, apply_router_weight_first=False, swiglu_limit=0.0, ep_batch_manager=None, quant_config=None, is_sharding=False)

Bases: MoE

Mixture of Experts (MoE) layer with FP8 or NVFP4 quantization. A construction sketch follows the parameter list below.

Parameters:

  • devices (list[DeviceRef])
  • hidden_dim (int)
  • num_experts (int)
  • num_experts_per_token (int)
  • moe_dim (int)
  • gate_cls (Callable[..., MoEGate])
  • mlp_cls (Callable[..., MLP])
  • has_shared_experts (bool)
  • shared_experts_dim (int)
  • ep_size (int)
  • dtype (DType)
  • apply_router_weight_first (bool)
  • swiglu_limit (float)
  • ep_batch_manager (EPBatchManager | None)
  • quant_config (QuantConfig | None)
  • is_sharding (bool)
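
A minimal construction sketch, assuming the import paths max.nn.MoEQuantized, max.graph.DeviceRef, and max.dtype.DType; every dimension value below is illustrative rather than taken from this page, and quant_config is left at its None default for brevity (a real FP8 or NVFP4 setup would supply a QuantConfig):

from max.dtype import DType
from max.graph import DeviceRef
from max.nn import MoEQuantized

# Illustrative sizes only; real values come from the model configuration.
moe = MoEQuantized(
    devices=[DeviceRef.GPU(0)],   # single-GPU placement
    hidden_dim=4096,              # model hidden size
    num_experts=64,               # total routed experts
    num_experts_per_token=8,      # top-k experts selected per token
    moe_dim=1408,                 # per-expert intermediate dimension
    dtype=DType.bfloat16,
    # quant_config=QuantConfig(...),  # supply FP8/NVFP4 scales in real use
)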

down_proj_scales

property down_proj_scales: TensorValue

Returns the stacked down-projection weight scales.

gate_up_proj_scales

property gate_up_proj_scales: TensorValue

Returns the stacked gate/up projection weight scales for the grouped matmul.
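
Both properties yield TensorValues, so they are read while building a graph. A short access sketch, reusing the moe instance from the construction example above:

# Stacked quantization scales for the expert projections; shapes follow
# the layer's expert count and projection dimensions.
down_scales = moe.down_proj_scales        # down-projection weight scales
gate_up_scales = moe.gate_up_proj_scales  # gate/up scales for the grouped matmul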