Mojo function

fused_token_sampling_gpu

def fused_token_sampling_gpu[
    dtype: DType,
    out_idx_type: DType,
    //,
    KLayoutType: TensorLayout = Layout[*?, *?],
    TemperatureLayoutType: TensorLayout = Layout[*?, *?],
    TopPLayoutType: TensorLayout = Layout[*?, *?],
    MinPLayoutType: TensorLayout = Layout[*?, *?],
    SeedLayoutType: TensorLayout = Layout[*?, *?]
](
    ctx: DeviceContext,
    max_k: Int,
    min_top_p: Float32,
    input: TileTensor[dtype, address_space=input.address_space, linear_idx_type=input.linear_idx_type, element_size=input.element_size],
    out_idxs: TileTensor[out_idx_type, address_space=out_idxs.address_space, linear_idx_type=out_idxs.linear_idx_type, element_size=out_idxs.element_size],
    block_size: Optional[Int] = None,
    num_blocks_per_input: Optional[Int] = None,
    k: Optional[TileTensor[DType.int64, KLayoutType, ImmutAnyOrigin]] = None,
    temperature: Optional[TileTensor[DType.float32, TemperatureLayoutType, ImmutAnyOrigin]] = None,
    top_p: Optional[TileTensor[DType.float32, TopPLayoutType, ImmutAnyOrigin]] = None,
    min_p: Optional[TileTensor[DType.float32, MinPLayoutType, ImmutAnyOrigin]] = None,
    seed: Optional[TileTensor[DType.uint64, SeedLayoutType, ImmutAnyOrigin]] = None
)

Top-K algorithm with fused sampling. For each row (or subvolume) of the input tensor, samples an index from the top-K elements along the innermost dimension. The signature declares no return value, so the sampled indices are written to out_idxs.
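
To make the row-wise behavior concrete, here is a minimal CPU reference sketch in Python of top-K selection followed by sampling. It is an illustration under stated assumptions, not the GPU kernel: the helper names `sample_top_k` and `sample_rows` are hypothetical, the inputs are assumed to be logits, and the sketch assumes temperature scaling followed by a softmax over the K candidates. It does not model `top_p`, `min_p`, per-row `k`, or the kernel's actual random number generation, since this page does not specify how those are applied.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=None):
    """Sample one index from the k largest values of a single row.

    Hypothetical helper for illustration; not part of the Mojo API.
    Assumes temperature > 0 and 1 <= k <= len(logits).
    """
    if rng is None:
        rng = random.Random()
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the k candidates only
    # (subtracting the max keeps exp() numerically stable).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one candidate index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


def sample_rows(rows, k, temperature=1.0, seed=0):
    """Apply sample_top_k to every row, one sampled index per row."""
    rng = random.Random(seed)
    return [sample_top_k(row, k, temperature, rng) for row in rows]
```

With `k=1` the sketch reduces to a per-row argmax, which is a convenient sanity check; with larger `k` every returned index is guaranteed to be one of the row's K largest entries.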