Mojo function

min_p_sampling_gpu

def min_p_sampling_gpu[dtype: DType, out_idx_type: DType, //, _test_sort: Bool = False](
    ctx: DeviceContext,
    min_ps: TileTensor[dtype, Storage=min_ps.Storage, linear_idx_type=min_ps.linear_idx_type, element_size=min_ps.element_size],
    input_logits: TileTensor[dtype, Storage=input_logits.Storage, linear_idx_type=input_logits.linear_idx_type, element_size=input_logits.element_size],
    out_token_ids: TileTensor[out_idx_type, Storage=out_token_ids.Storage, linear_idx_type=out_token_ids.linear_idx_type, element_size=out_token_ids.element_size],
    temperature: Scalar[dtype] = 1
)

GPU implementation of Min-P sampling for token selection. This function applies temperature scaling, a softmax, and a radix sort, and then samples tokens based on the calculated probability threshold (Min-P).
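
To make the sequence of steps concrete, the following is a minimal CPU reference sketch in Python for a single row of logits. It is not the GPU kernel and does not use the Mojo API. The function name `min_p_sample` and its parameters are hypothetical. It also assumes the common definition of Min-P, in which the threshold is `min_p` times the largest token probability; the source does not spell out how the kernel computes its threshold. A built-in sort stands in for the radix sort.

```python
import math
import random


def min_p_sample(logits, min_p, temperature=1.0, rng=None):
    """Hypothetical CPU reference for Min-P sampling over one row of logits."""
    rng = rng or random.Random()

    # 1. Temperature scaling.
    scaled = [x / temperature for x in logits]

    # 2. Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Order token ids by descending probability
    #    (a plain sort here; the GPU function uses a radix sort).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 4. Assumed Min-P rule: keep tokens whose probability is at least
    #    min_p times the largest probability.
    threshold = min_p * probs[order[0]]
    kept = [i for i in order if probs[i] >= threshold]

    # 5. Sample one token id from the kept set, weighted by probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Under this assumed rule, `min_p = 1.0` keeps only the most probable token, so sampling becomes greedy, while smaller values admit more of the distribution's tail.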