
Mojo function

topk_sampling_from_prob

def topk_sampling_from_prob[dtype: DType, out_idx_type: DType, block_size: Int = Int(1024), TopKArrLayoutType: TensorLayout = Layout[*?, *?], IndicesLayoutType: TensorLayout = Layout[*?, *?]](ctx: DeviceContext, probs: TileTensor[dtype, Storage=probs.Storage, address_space=probs.address_space, linear_idx_type=probs.linear_idx_type, element_size=probs.element_size], output: TileTensor[out_idx_type, Storage=output.Storage, address_space=output.address_space, linear_idx_type=output.linear_idx_type, element_size=output.element_size], top_k_val: Int, deterministic: Bool = False, rng_seed: UInt64 = UInt64(0), rng_offset: UInt64 = UInt64(0), indices: Optional[TileTensor[out_idx_type, IndicesLayoutType, MutUntrackedOrigin]] = None, top_k_arr: Optional[TileTensor[out_idx_type, TopKArrLayoutType, MutUntrackedOrigin]] = None)

Top-K sampling from a probability distribution.

Samples one token index per row from a probability distribution, considering only the top-k most probable tokens. Uses rejection sampling with ternary search to find accepted samples efficiently.
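To illustrate the idea of rejection sampling for top-k, here is a minimal CPU sketch in plain Python. It is not the kernel's implementation: it uses a simplified single-pivot rejection loop instead of the ternary search mentioned above, and the name `topk_sample_rejection` is hypothetical. It assumes `probs` has positive total mass and `k >= 1`.

```python
import random


def topk_sample_rejection(probs, k, rng):
    """Sample an index from `probs`, restricted to its k largest entries.

    Simplified single-pivot rejection sampling (an assumption for
    illustration; the documented kernel uses a ternary-search variant).
    """
    pivot = 0.0  # entries with probability <= pivot are excluded
    while True:
        # Draw from the distribution restricted to entries above the pivot.
        candidates = [(i, p) for i, p in enumerate(probs) if p > pivot]
        total = sum(p for _, p in candidates)
        u = rng.random() * total
        acc = 0.0
        chosen_i, chosen_p = candidates[-1]  # fallback for rounding error
        for i, p in candidates:
            acc += p
            if u < acc:
                chosen_i, chosen_p = i, p
                break
        # Accept when fewer than k entries are strictly larger than the
        # sample, i.e. the sample is one of the top-k entries.
        larger = sum(1 for p in probs if p > chosen_p)
        if larger < k:
            return chosen_i
        # Otherwise raise the pivot so the next draw excludes this entry
        # and everything less probable.
        pivot = chosen_p


rng = random.Random(0)
probs = [0.1, 0.4, 0.05, 0.3, 0.15]
samples = [topk_sample_rejection(probs, 2, rng) for _ in range(200)]
```

With `k = 2`, every entry of `samples` is index 1 or index 3, the two most probable tokens. Each rejection shrinks the candidate set, so the loop terminates without ever sorting the full row.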

Args:

ctx (DeviceContext): Device context for kernel execution.
probs (TileTensor[dtype, Storage=probs.Storage, address_space=probs.address_space, linear_idx_type=probs.linear_idx_type, element_size=probs.element_size]): Input probability distribution [batch_size, d].
output (TileTensor[out_idx_type, Storage=output.Storage, address_space=output.address_space, linear_idx_type=output.linear_idx_type, element_size=output.element_size]): Output sampled indices [batch_size].
top_k_val (Int): Default top-k value (number of top tokens to consider), used when top_k_arr is not provided.
deterministic (Bool): Whether to use deterministic sampling.
rng_seed (UInt64): Seed for the random number generator.
rng_offset (UInt64): Offset for the random number generator.
indices (Optional[TileTensor[out_idx_type, IndicesLayoutType, MutUntrackedOrigin]]): Optional row indices for batch indexing [batch_size].
top_k_arr (Optional[TileTensor[out_idx_type, TopKArrLayoutType, MutUntrackedOrigin]]): Optional per-row top-k values [batch_size].
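The interaction between `top_k_val`, `top_k_arr`, and `indices` can be sketched with a small CPU reference in plain Python. This is an illustration under stated assumptions, not the kernel: the function name `sample_batch` is hypothetical, `indices[i]` is assumed to select which row of `probs` feeds output `i`, `top_k_arr[i]` is assumed to override `top_k_val` for output `i`, and the seed is modeled with Python's `random.Random` (`rng_offset` and `deterministic` are not modeled).

```python
import random


def sample_batch(probs, top_k_val, seed=0, indices=None, top_k_arr=None):
    """Reference top-k sampling over a batch of probability rows."""
    rng = random.Random(seed)  # same seed gives the same samples
    batch = len(indices) if indices is not None else len(probs)
    out = []
    for i in range(batch):
        # Assumption: `indices` maps output position i to a row of `probs`.
        row = probs[indices[i]] if indices is not None else probs[i]
        # Assumption: a per-row k, when given, replaces the default k.
        k = top_k_arr[i] if top_k_arr is not None else top_k_val
        # Keep the k most probable token ids, then sample among them
        # in proportion to their probabilities.
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        out.append(rng.choices(top, weights=[row[j] for j in top])[0])
    return out


probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
greedy = sample_batch(probs, top_k_val=1)
gathered = sample_batch(probs, top_k_val=1, indices=[1, 1, 0])
```

With `top_k_val=1` the sampling is effectively greedy, so `greedy` holds the argmax of each row, and `gathered` holds the argmax of rows 1, 1, and 0 in that order.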

Raises:

Error: If tensor ranks or shapes are invalid.