
Mojo function

TopKTopPSamplingFromProbKernel

TopKTopPSamplingFromProbKernel[ProbsLayoutType: TensorLayout, probs_origin: ImmutOrigin, OutputLayoutType: TensorLayout, output_origin: MutOrigin, block_size: Int, vec_size: Int, dtype: DType, out_idx_type: DType, deterministic: Bool](probs: TileTensor[dtype, ProbsLayoutType, probs_origin], output: TileTensor[out_idx_type, OutputLayoutType, output_origin], indices: Optional[UnsafePointer[Scalar[out_idx_type], ImmutAnyOrigin]], top_k_arr: Optional[UnsafePointer[Scalar[out_idx_type], ImmutAnyOrigin]], top_k_val: Int, top_p_arr: Optional[UnsafePointer[Float32, ImmutAnyOrigin]], top_p_val: Float32, d: Int, rng_seed: Optional[UnsafePointer[UInt64, ImmutAnyOrigin]], rng_offset: UInt64)

Kernel for joint top-k + top-p sampling from a probability distribution.

Identical to TopKSamplingFromProbKernel, but additionally enforces a nucleus (top-p) constraint: a token is accepted only when the count of tokens above the pivot is less than k AND the cumulative probability of those tokens is less than p.

When top_p_val = 1.0 and top_p_arr is null, the kernel reduces to top-k-only sampling with zero overhead, since the check sum < 1.0 is always true.
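The acceptance rule described above can be illustrated with a minimal CPU sketch in Python. This is not the kernel's Mojo implementation: the helper name `accepts`, the use of a plain list for one row of `probs`, and the assumption that "above the pivot" means strictly greater than the pivot value are all hypothetical and chosen only for illustration.

```python
def accepts(probs, pivot, top_k, top_p):
    """Hypothetical helper: joint top-k + top-p acceptance test for one row.

    Assumes "above the pivot" means strictly greater than `pivot`.
    """
    above = [q for q in probs if q > pivot]
    count = len(above)   # number of tokens above the pivot
    total = sum(above)   # cumulative probability of those tokens
    # Both constraints must hold for the candidate to be accepted.
    return count < top_k and total < top_p


# One row of probabilities (powers of two, so the sums are exact in floating point).
probs = [0.5, 0.25, 0.125, 0.125]

# One token (0.5) lies above the pivot 0.25: count 1 < 2 and sum 0.5 < 1.0.
print(accepts(probs, pivot=0.25, top_k=2, top_p=1.0))

# Same pivot, but the nucleus bound now rejects: sum 0.5 is not < 0.5.
print(accepts(probs, pivot=0.25, top_k=2, top_p=0.5))
```

With `top_p=1.0` the second condition never rejects in this sketch as long as the pivot is the probability of a token in the row, because the tokens strictly above it cannot sum to the full mass. The result is then decided by the count check alone, which matches the top-k-only behavior described above.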

Args: