
Mojo trait

TokenFormat

Specifies the wire format for a single mixture-of-experts (MoE) token in expert-parallel (EP) dispatch/combine.

Implementors encode how a token's hidden-state vector is packed for cross-GPU transfer (quantization, scale placement, alignment) and how the received bytes are unpacked into the output tensor. The graph compiler selects a concrete implementation based on the model's quantization config.

All size and alignment properties are compile-time constants so the dispatch kernel can allocate receive buffers and issue vectorized copies without runtime branching.

Implemented traits

AnyType, DevicePassable, ImplicitlyDeletable

`comptime` members

`alignment`

comptime alignment

`device_type`

comptime device_type

Indicates the type used on accelerator devices.

`dispatch_smem_size`

comptime dispatch_smem_size

`dispatch_wait_tile_shape`

comptime dispatch_wait_tile_shape

`hid_dim`

comptime hid_dim

`top_k`

comptime top_k

Required methods

`token_size`

static def token_size() -> Int

Returns the size of the (quantized) token in bytes.

Returns:

Int
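As a rough illustration of what `token_size` accounts for, the Python sketch below computes the packed size of two hypothetical formats: an unquantized BF16 token, and an FP8 token that carries one float32 scale per block of 128 elements. The two formats, the block size, and the helper names are assumptions made for illustration and are not taken from this trait or its implementors; Python is used only to keep the arithmetic runnable on its own.

```python
def bf16_token_size(hid_dim: int) -> int:
    """Bytes for a hypothetical unquantized BF16 token: 2 bytes per element."""
    return hid_dim * 2


def fp8_blockwise_token_size(hid_dim: int, block: int = 128) -> int:
    """Bytes for a hypothetical FP8 token with one float32 scale per block.

    ASSUMPTION: the block size of 128 and the "payload then scales" accounting
    are illustrative; a real format may place or size its scales differently.
    """
    if hid_dim % block != 0:
        raise ValueError("hid_dim must be a multiple of block")
    payload = hid_dim * 1            # one byte per FP8 element
    scales = (hid_dim // block) * 4  # one float32 scale per block
    return payload + scales
```

With `hid_dim = 7168`, the BF16 sketch gives 14336 bytes and the FP8 sketch gives 7392 bytes, which shows why the quantized size has to be reported by the format itself rather than derived from `hid_dim` alone.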

`copy_token_to_send_buf`

static def copy_token_to_send_buf[src_type: DType, block_size: Int, buf_addr_space: AddressSpace = AddressSpace.GENERIC](buf_p: Pointer[UInt8, address_space=buf_addr_space, _safe=False], src_p: Pointer[Scalar[src_type], address_space=src_p.address_space, _safe=False], input_scale: Float32)

Copies the token to the send buffer. All threads in the block must call this function.

`copy_msg_to_output_tensor`

def copy_msg_to_output_tensor[buf_addr_space: AddressSpace = AddressSpace.GENERIC](self, buf_p: Pointer[UInt8, address_space=buf_addr_space, _safe=False], token_index: Int, expert_slot: Int = Int(0), expert_start: Int = Int(0))

Copies the message to the output tensor. All threads in a warp must call this function.

`expert_slot` (equal to `expert_id + shared_expert_offset`) and `expert_start` (the expert's first output row) are supplied by the tile loop. They are used only by formats that fold the grouped-matmul scale preshuffle into this copy (MXFP4 KS224); other formats ignore them.

`get_type_name`

static def get_type_name() -> String

Gets the name of the host type (the type implementing this trait). For example, `Int` returns "Int" and `DeviceBuffer[DType.float32]` returns "DeviceBuffer[DType.float32]". The name is used in error messages when passing types to the device. TODO: This method will be retired once better kernel call error messages arrive.

Returns:

String: The host type's name.

Provided methods

`src_info_size`

static def src_info_size() -> Int

Returns the size of the source info in bytes. Currently, source info is a single int32 that stores a token's index in the original rank.

Returns:

Int

`topk_info_size`

static def topk_info_size() -> Int

Returns the size of the top-k info in bytes. Currently, top-k info is an array of uint16 that stores a token's top-k expert IDs.

Returns:

Int

`msg_size`

static def msg_size() -> Int

Returns the size of the message in bytes.

Returns:

Int

`src_info_offset`

static def src_info_offset() -> Int

Returns the offset of the source info in the message.

Returns:

Int

`topk_info_offset`

static def topk_info_offset() -> Int

Returns the offset of the top-k info in the message.

Returns:

Int
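The size and offset methods above together describe one message per token. The Python sketch below models one plausible layout so the relationships are easier to see. The int32 source info (4 bytes) and the uint16 top-k entries (2 bytes each, assuming one entry per selected expert) follow the descriptions above; the field order (token, then source info, then top-k info) and the rounding of `msg_size` up to `alignment` are assumptions, and `MessageLayout` and `align_up` are hypothetical names.

```python
from dataclasses import dataclass


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


@dataclass(frozen=True)
class MessageLayout:
    token_size: int  # bytes of the packed (possibly quantized) token
    top_k: int       # number of experts selected per token
    alignment: int   # message alignment in bytes

    def src_info_size(self) -> int:
        return 4  # a single int32

    def topk_info_size(self) -> int:
        return 2 * self.top_k  # one uint16 per expert ID (assumed top_k entries)

    # ASSUMPTION: the token comes first, followed by source info, then top-k info.
    def src_info_offset(self) -> int:
        return self.token_size

    def topk_info_offset(self) -> int:
        return self.token_size + self.src_info_size()

    # ASSUMPTION: the total is rounded up to the format's alignment.
    def msg_size(self) -> int:
        raw = self.token_size + self.src_info_size() + self.topk_info_size()
        return align_up(raw, self.alignment)
```

For example, `MessageLayout(token_size=7392, top_k=8, alignment=16)` places the source info at byte 7392 and the top-k info at byte 7396, and rounds the 7412 raw bytes up to a 7424-byte message under these assumptions.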

`pad_expert_offsets`

def pad_expert_offsets[n_groups: Int](self, row_offsets: Pointer[UInt32, address_space=row_offsets.address_space, _safe=False])

Pads the expert row offsets in `row_offsets` to satisfy the grouped matmul alignment requirement.
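One way to picture this padding is sketched below in Python. It treats the offsets as a prefix sum with one entry per group plus a final end offset, and rounds each group's row count up to a multiple of a hypothetical `align` value. The offset convention, the `align` parameter, and the `pad_row_offsets` name are all assumptions; the actual requirement depends on the grouped matmul kernel and the token format.

```python
def pad_row_offsets(row_offsets: list[int], align: int) -> list[int]:
    """Return offsets in which every group's row count is a multiple of align.

    ASSUMPTION: row_offsets is a prefix sum starting at 0 with
    len(row_offsets) == n_groups + 1.
    """
    padded = [0]
    for start, end in zip(row_offsets, row_offsets[1:]):
        rows = end - start
        padded_rows = (rows + align - 1) // align * align  # round up
        padded.append(padded[-1] + padded_rows)
    return padded
```

Under these assumptions, groups of 3, 0, and 7 rows with `align = 4` become groups of 4, 0, and 8 rows, so every group starts on an aligned row.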

`init_smem_resources`

def init_smem_resources(self)

Initializes the shared memory resources for the token format.

`copy_msg_tile_to_output_tensor`

def copy_msg_tile_to_output_tensor[extract_topk_info_func: def(Pointer[UInt8, MutUntrackedOrigin, _safe=False], Int) -> None, recv_buf_ptr_func: def(Int) -> Pointer[UInt8, MutUntrackedOrigin, _safe=False], //, n_warps: Int, shared_expert_offset: Int = Int(0)](self, expert_id: Int, expert_start_pos: Int, tile_id: Int, tile_end: Int, extract_topk_info_functor: extract_topk_info_func, recv_buf_ptr_functor: recv_buf_ptr_func)

Copies a tile of tokens from the receive buffer to the output tensor.
