IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo function

fused_token_sampling_cpu

def fused_token_sampling_cpu[dtype: DType, out_idx_type: DType, KLayoutType: TensorLayout = Layout[*?, *?], TemperatureLayoutType: TensorLayout = Layout[*?, *?], TopPLayoutType: TensorLayout = Layout[*?, *?], SeedLayoutType: TensorLayout = Layout[*?, *?]](max_k: Int, input: TileTensor[dtype, address_space=input.address_space, linear_idx_type=input.linear_idx_type, element_size=input.element_size], out_idxs: TileTensor[out_idx_type, address_space=out_idxs.address_space, linear_idx_type=out_idxs.linear_idx_type, element_size=out_idxs.element_size], k: Optional[TileTensor[DType.int64, KLayoutType, ImmutAnyOrigin]] = None, temperature: Optional[TileTensor[DType.float32, TemperatureLayoutType, ImmutAnyOrigin]] = None, top_p: Optional[TileTensor[DType.float32, TopPLayoutType, ImmutAnyOrigin]] = None, seed: Optional[TileTensor[DType.uint64, SeedLayoutType, ImmutAnyOrigin]] = None)

Generalized implementation of the Top K algorithm with sampling. Returns the sampled index from the innermost dimension of the input tensor for each row/subvolume.

Parameters:

  • ​dtype (DType): Data type of the input buffer.
  • ​out_idx_type (DType): Data type of the output indices.
  • ​KLayoutType (TensorLayout): Layout type of the k buffer.
  • ​TemperatureLayoutType (TensorLayout): Layout type of the temperature buffer.
  • ​TopPLayoutType (TensorLayout): Layout type of the top_p buffer.
  • ​SeedLayoutType (TensorLayout): Layout type of the seed buffer.

Args: