Mojo function

gumbel_sampling_gpu

def gumbel_sampling_gpu[dtype: DType, out_idx_type: DType, //, TemperatureLayoutType: TensorLayout = Layout[*?, *?], SeedLayoutType: TensorLayout = Layout[*?, *?]](ctx: DeviceContext, input: TileTensor[dtype, Storage=input.Storage, address_space=input.address_space, linear_idx_type=input.linear_idx_type, element_size=input.element_size], out_idxs: TileTensor[out_idx_type, Storage=out_idxs.Storage, address_space=out_idxs.address_space, linear_idx_type=out_idxs.linear_idx_type, element_size=out_idxs.element_size], temperature: Optional[TileTensor[DType.float32, TemperatureLayoutType, ImmutAnyOrigin]] = None, seed: Optional[TileTensor[DType.uint64, SeedLayoutType, ImmutAnyOrigin]] = None)

Gumbel sampling using the Gumbel-max trick for categorical distributions.

Adds independent Gumbel(0, 1) noise to the input logits, then selects the argmax of the perturbed values for each row. The selected index is distributed exactly as a sample from softmax(logits / temperature), so the function avoids computing the softmax normalization explicitly.
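The Gumbel-max trick described above can be illustrated with a small CPU reference sketch. This is plain Python rather than the Mojo kernel; the helper name `gumbel_max_sample` and its parameters are hypothetical, and the GPU implementation may organize the computation differently.

```python
import math
import random


def gumbel_max_sample(logits, temperature=1.0, rng=random):
    """Return one index distributed as softmax(logits / temperature).

    Hypothetical reference helper, not part of the Mojo API.
    """
    best_idx, best_val = -1, -math.inf
    for i, logit in enumerate(logits):
        # Draw u in (0, 1); random() can return 0.0, which would break log().
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        # Inverse-CDF transform: g ~ Gumbel(0, 1).
        g = -math.log(-math.log(u))
        # Perturb the temperature-scaled logit and track the running argmax.
        perturbed = logit / temperature + g
        if perturbed > best_val:
            best_idx, best_val = i, perturbed
    return best_idx
```

Only one pass over the logits is needed, with no exponentials summed across the vocabulary, which is why this approach sidesteps the softmax normalization step.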

Args:

ctx (DeviceContext): Device context for GPU operations.
input (TileTensor[dtype, Storage=input.Storage, address_space=input.address_space, linear_idx_type=input.linear_idx_type, element_size=input.element_size]): Input logits tensor of shape [batch, vocab_size].
out_idxs (TileTensor[out_idx_type, Storage=out_idxs.Storage, address_space=out_idxs.address_space, linear_idx_type=out_idxs.linear_idx_type, element_size=out_idxs.element_size]): Output tensor of shape [batch, 1] that receives the sampled indices.
temperature (Optional[TileTensor[DType.float32, TemperatureLayoutType, ImmutAnyOrigin]]): Optional per-token temperatures of shape [batch], used to scale the logits.
seed (Optional[TileTensor[DType.uint64, SeedLayoutType, ImmutAnyOrigin]]): Optional per-token random seeds of shape [batch], for reproducible sampling.
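To make the argument shapes concrete, here is a hedged CPU sketch in plain Python that mirrors the documented shapes: logits of [batch, vocab_size], optional per-row temperature and seed of [batch], and an output of [batch, 1]. The function name `gumbel_sample_batch` is hypothetical, and the defaults used when an optional argument is omitted (temperature 1.0, unseeded generator) are assumptions of this sketch, not documented behavior of the Mojo function.

```python
import math
import random


def gumbel_sample_batch(logits, temperature=None, seed=None):
    """Sample one index per row; returns a [batch, 1] nested list.

    Hypothetical reference helper, not part of the Mojo API.
    """
    out_idxs = []
    for row, row_logits in enumerate(logits):
        # Assumption: a missing temperature behaves like 1.0.
        t = 1.0 if temperature is None else temperature[row]
        # Assumption: a missing seed means a non-reproducible generator.
        rng = random.Random() if seed is None else random.Random(seed[row])
        best_idx, best_val = 0, -math.inf
        for i, logit in enumerate(row_logits):
            # Clamp away from 0.0 so the inner log() stays finite.
            u = max(rng.random(), 1e-12)
            g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
            perturbed = logit / t + g
            if perturbed > best_val:
                best_idx, best_val = i, perturbed
        out_idxs.append([best_idx])
    return out_idxs
```

In this sketch, passing the same per-row seeds twice yields identical indices, and a very small temperature makes the result effectively greedy, since the scaled logit gap dwarfs the noise.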