Mojo function

TopKTopPSamplingFromProbKernel

def TopKTopPSamplingFromProbKernel[ProbsLayoutType: TensorLayout, probs_origin: ImmutOrigin, OutputLayoutType: TensorLayout, output_origin: MutOrigin, block_size: Int, vec_size: Int, dtype: DType, out_idx_type: DType, deterministic: Bool](probs: TileTensor[dtype, ProbsLayoutType, probs_origin], output: TileTensor[out_idx_type, OutputLayoutType, output_origin], indices: Optional[UnsafePointer[Scalar[out_idx_type], ImmutAnyOrigin]], top_k_arr: Optional[UnsafePointer[Scalar[out_idx_type], ImmutAnyOrigin]], top_k_val: Int, top_p_arr: Optional[UnsafePointer[Float32, ImmutAnyOrigin]], top_p_val: Float32, d: Int, rng_seed: Optional[UnsafePointer[UInt64, ImmutAnyOrigin]], rng_offset: UInt64)

Kernel for joint top-k and top-p sampling from a probability distribution.

Identical to TopKSamplingFromProbKernel, except that it also enforces a nucleus (top-p) constraint: a token is accepted only when the number of tokens above the pivot is less than k and the cumulative probability of those tokens is less than p.
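
The acceptance rule described above can be illustrated with a small host-side sketch. This is plain Python rather than the Mojo kernel, and it is not the kernel's actual code: the helper name `accepts`, the list-based inputs, and the use of a strict `>` comparison for "above the pivot" are assumptions made for illustration only.

```python
def accepts(probs, pivot, k, p):
    """Hypothetical helper: True when the pivot passes both tests.

    Assumes "above the pivot" means strictly greater than the pivot.
    """
    above = [q for q in probs if q > pivot]  # tokens strictly above the pivot
    count_ok = len(above) < k                # top-k: fewer than k tokens above
    mass_ok = sum(above) < p                 # top-p: their total mass is below p
    return count_ok and mass_ok


probs = [0.5, 0.25, 0.125, 0.125]

print(accepts(probs, pivot=0.25, k=2, p=0.9))   # True: one token above, mass 0.5
print(accepts(probs, pivot=0.125, k=2, p=0.9))  # False: two tokens above, fails top-k
print(accepts(probs, pivot=0.25, k=2, p=0.4))   # False: mass 0.5 is not below 0.4
```

Both conditions must hold at once, so tightening either k or p can reject a pivot that the other constraint alone would accept.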

When top_p_val = 1.0 and top_p_arr is null, this reduces to top-k-only sampling with zero overhead, since the top-p check sum < 1.0 is always true.
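
A quick way to see why the top-p test becomes a no-op at p = 1.0 is to compare the two acceptance rules directly. The Python sketch below is an illustration under stated assumptions, not the kernel itself: the function names are hypothetical, and it assumes each pivot is the (positive) probability of some candidate token, so the mass strictly above the pivot always excludes at least that token and stays below 1.0.

```python
def above_pivot(probs, pivot):
    # Tokens strictly above the pivot (strictness is an assumption).
    return [q for q in probs if q > pivot]


def accepts_top_k(probs, pivot, k):
    # Top-k-only rule: fewer than k tokens above the pivot.
    return len(above_pivot(probs, pivot)) < k


def accepts_top_k_top_p(probs, pivot, k, p):
    # Joint rule: top-k test and top-p test must both pass.
    above = above_pivot(probs, pivot)
    return len(above) < k and sum(above) < p


probs = [0.5, 0.25, 0.125, 0.125]

# Try every token probability as the pivot and every k from 1 to len(probs).
agree = all(
    accepts_top_k(probs, pivot, k) == accepts_top_k_top_p(probs, pivot, k, 1.0)
    for pivot in probs
    for k in range(1, len(probs) + 1)
)
print(agree)  # True: with p = 1.0 the joint rule matches top-k-only
```

The example uses probabilities that are exact in binary floating point, so the comparison is not affected by rounding. With arbitrary inputs, accumulated rounding error could in principle push a sum to 1.0.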

Args: