Python function

forward_moe_sharded_layers

max.nn.forward_moe_sharded_layers(shards, xs, eplb_counter_buffers=None, layer_idx_per_device=None)

Forward pass through DP-sharded layers (EP MoE or replicated MLP/MoE).

For EP-enabled MoE shards, this runs the full expert-parallel (EP) communication path: dispatch -> local compute -> combine. For all other shards (replicated MLP, non-EP MoE), it falls back to `forward_sharded_layers()`.
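The dispatch -> local compute -> combine path can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not MAX's implementation: it assumes top-1 routing with no gating weights, uses Python lists in place of tensors and devices, and every name in it (`ep_forward_sketch`, `experts_per_device`, `route`) is hypothetical.

```python
from typing import Callable


def ep_forward_sketch(
    experts_per_device: list[dict[int, Callable[[float], float]]],
    tokens_per_device: list[list[float]],
    route: Callable[[float], int],
) -> list[list[float]]:
    """Toy expert-parallel forward pass (hypothetical, for illustration).

    experts_per_device[d] maps expert id -> expert function hosted on device d.
    tokens_per_device[d] holds the tokens that start on device d.
    route(token) returns the id of the single expert that handles the token.
    """
    num_devices = len(experts_per_device)
    # Which device hosts each expert.
    owner = {
        expert_id: device
        for device, experts in enumerate(experts_per_device)
        for expert_id in experts
    }

    # Dispatch: send every token to the device that hosts its expert,
    # remembering where the token came from.
    inbox = [[] for _ in range(num_devices)]
    for src, tokens in enumerate(tokens_per_device):
        for pos, token in enumerate(tokens):
            expert_id = route(token)
            inbox[owner[expert_id]].append((src, pos, expert_id, token))

    # Local compute: each device runs only the experts it hosts.
    outbox = [
        [
            (src, pos, experts_per_device[device][expert_id](token))
            for src, pos, expert_id, token in inbox[device]
        ]
        for device in range(num_devices)
    ]

    # Combine: return every result to its source device and position.
    outputs = [[0.0] * len(tokens) for tokens in tokens_per_device]
    for results in outbox:
        for src, pos, value in results:
            outputs[src][pos] = value
    return outputs


# Two devices, one expert each: expert 0 doubles, expert 1 negates.
experts = [{0: lambda t: 2.0 * t}, {1: lambda t: -t}]
tokens = [[1.0, -2.0], [3.0, -4.0]]
result = ep_forward_sketch(experts, tokens, lambda t: 0 if t >= 0 else 1)
```

The point of the sketch is the shape of the data flow: the outputs come back with the same per-device layout as the inputs, even though the computation for a given token may have happened on another device.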

Parameters:

shards (Sequence[Callable[[TensorValue], TensorValue]]) – Per-device shard callables (MoE, MLP, etc.).
xs (list[TensorValue]) – Input tensors, one per shard.
eplb_counter_buffers (list[BufferValue] | None)
layer_idx_per_device (list[TensorValue] | None)

Returns:

Output tensors, one per shard.

Return type:

outputs
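The one-input-per-shard, one-output-per-shard contract can be sketched in plain Python. This assumes the non-EP fallback simply applies each shard callable to its matching input, which this page does not spell out. The helper name `apply_shards_sketch` is hypothetical, and ordinary numbers stand in for `TensorValue` objects.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def apply_shards_sketch(
    shards: Sequence[Callable[[T], T]], xs: Sequence[T]
) -> list[T]:
    """Apply shard i to input i (hypothetical stand-in for the fallback)."""
    if len(shards) != len(xs):
        raise ValueError(
            f"expected one input per shard, got {len(shards)} shards "
            f"and {len(xs)} inputs"
        )
    return [shard(x) for shard, x in zip(shards, xs)]


# Two stand-in "shards" operating on plain numbers instead of tensors.
outputs = apply_shards_sketch([lambda x: x + 1, lambda x: x * 2], [1, 10])
```

The length check makes the pairing explicit: `zip` alone would silently drop unmatched shards or inputs.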
