Python class

FusedSamplingProcessor

class max.pipelines.sampling.FusedSamplingProcessor(sampler, pipeline_config, context_batch, num_steps, device, bitmask=None, vocab_size=None, pinned_new_tokens=None, identity_logit_offsets=None)

Bases: object

Applies sampling parameters to logits and stores the chosen tokens.

Parameters:

allocate_identity_logit_offsets()​

static allocate_identity_logit_offsets(pipeline_config, device)

Returns a preallocated [0, 1, ..., max_batch_size] index buffer.

Used by logits_for_sampling when sampling from next_token_logits with a variable-logit sampler. Returns None when the buffer is not needed (variable-logit sampling disabled, or running in virtual-device mode).

Parameters:

Return type:

Buffer | None
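
As a rough illustration of the buffer described above, the following NumPy sketch builds the same index sequence on the host. The helper name `identity_logit_offsets`, the `uint32` dtype, the inclusive upper bound, and the slicing step are all assumptions made for illustration. The real method returns a device `Buffer` and takes `pipeline_config` and `device`.

```python
import numpy as np


def identity_logit_offsets(max_batch_size: int) -> np.ndarray:
    """Hypothetical host-side stand-in for the preallocated index buffer.

    Assumes the range [0, 1, ..., max_batch_size] is inclusive, which gives
    max_batch_size + 1 entries.
    """
    return np.arange(max_batch_size + 1, dtype=np.uint32)


offsets = identity_logit_offsets(4)

# Assumption: with exactly one logit row per sequence, the first
# batch_size + 1 entries delimit the rows of a smaller batch.
batch_size = 3
batch_offsets = offsets[: batch_size + 1]
print(offsets, batch_offsets)
```

Preallocating the largest buffer once and slicing it per batch avoids rebuilding the offsets on every step, which is one plausible reason for the static allocator.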

generated_tokens​

generated_tokens: Buffer

The generated tokens that have been sampled so far.

get_new_tokens_numpy()​

get_new_tokens_numpy()

Waits for the device-to-host (D2H) copy and returns the new tokens as a NumPy array.

If an asynchronous copy was started via start_async_token_copy(), this method waits for the copy event. Otherwise, it falls back to a synchronous copy.

Returns:

NumPy array of the new tokens with shape (batch_size,).

Return type:

ndarray[tuple[Any, ...], dtype[int64]]
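
The wait-or-fall-back behavior described above can be sketched on the host with a thread and a `threading.Event` standing in for the device stream and copy event. The class `TokenCopySketch` and its private fields are hypothetical and exist only for this example. It does not use the MAX driver API and is not the library's implementation.

```python
import threading
from typing import Optional

import numpy as np


class TokenCopySketch:
    """Hypothetical stand-in that mimics the async-copy-then-wait pattern."""

    def __init__(self, new_tokens) -> None:
        # Pretend this array lives on the device.
        self._device_tokens = np.asarray(new_tokens, dtype=np.int64)
        # Pretend this is the pinned host buffer.
        self._pinned = np.empty_like(self._device_tokens)
        self._copy_done: Optional[threading.Event] = None

    def start_async_token_copy(self) -> None:
        done = threading.Event()

        def _copy() -> None:
            np.copyto(self._pinned, self._device_tokens)
            done.set()  # Plays the role of the event recorded after the copy.

        self._copy_done = done
        threading.Thread(target=_copy, daemon=True).start()

    def get_new_tokens_numpy(self) -> np.ndarray:
        if self._copy_done is not None:
            # An async copy is in flight: wait only for that copy.
            self._copy_done.wait()
            self._copy_done = None
        else:
            # No async copy was started: fall back to a synchronous copy.
            np.copyto(self._pinned, self._device_tokens)
        return self._pinned


proc = TokenCopySketch([11, 22, 33])
proc.start_async_token_copy()
tokens = proc.get_new_tokens_numpy()
print(tokens)
```

Waiting on a dedicated event rather than on the whole stream is what lets the caller read the tokens while later work continues to run.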

logits_for_sampling()​

logits_for_sampling(*, logits, next_token_logits, logit_offsets)

Returns the logits and offsets to pass to logits processors.

Parameters:

Return type:

tuple[Buffer, Buffer | None]

new_tokens​

new_tokens: Buffer | None = None

The new tokens that were sampled.

start_async_token_copy()​

start_async_token_copy()

Starts a device-to-host (D2H) copy of new_tokens to a pinned buffer on the default stream.

The copy happens on the default stream after sampling completes. An event is recorded after the copy so that get_new_tokens_numpy() can wait for just the copy, without waiting for subsequent GPU operations (such as the next forward pass).

Return type:

None

update_bitmask()​

update_bitmask(packed_bitmask)

Updates the GPU bitmask with the new FSM state for multi-step execution.

This method unpacks the packed-int bitmask from llguidance, copies it to the pinned host buffer, and transfers it to the GPU. This keeps the bitmask synchronized with the FSM state after each token is sampled.

Parameters:

packed_bitmask (ndarray[tuple[Any, ...], dtype[int32]]) – Packed int32 bitmask from llguidance.numpy.allocate_token_bitmask. Shape is [batch_size, ceil(vocab_size/32)].

Return type:

None
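
To make the unpacking step concrete, here is a minimal NumPy sketch that expands a packed int32 bitmask of shape [batch_size, ceil(vocab_size/32)] into a boolean mask of shape [batch_size, vocab_size]. The helper name `unpack_bitmask` is hypothetical. The bit layout is an assumption: token i is taken to live in word i // 32 at bit i % 32, least significant bit first, with a set bit meaning the token is allowed. The pinned-buffer copy and GPU transfer are not shown.

```python
import numpy as np


def unpack_bitmask(packed: np.ndarray, vocab_size: int) -> np.ndarray:
    """Expand a packed int32 bitmask into a boolean [batch, vocab] mask.

    Assumes least-significant-bit-first order within each 32-bit word.
    """
    # Reinterpret as unsigned so right shifts do not sign-extend.
    words = packed.view(np.uint32)
    shifts = np.arange(32, dtype=np.uint32)
    # Shape: [batch, num_words, 32], one entry per bit.
    bits = (words[:, :, None] >> shifts) & np.uint32(1)
    # Flatten the bits of each row and drop the padding past vocab_size.
    return bits.reshape(packed.shape[0], -1)[:, :vocab_size].astype(bool)


vocab_size = 40
packed = np.zeros((2, (vocab_size + 31) // 32), dtype=np.int32)
packed[0, 0] = 1   # Row 0 allows token 0 ...
packed[0, 1] = 2   # ... and token 33 (word 1, bit 1).
packed[1, :] = -1  # Row 1 allows every token (all bits set).

mask = unpack_bitmask(packed, vocab_size)

# Disallowed tokens get -inf so they can never be sampled.
logits = np.zeros((2, vocab_size), dtype=np.float32)
masked_logits = np.where(mask, logits, -np.inf)
print(np.flatnonzero(mask[0]))
```

Packing 32 tokens per word keeps the host-to-device transfer small, at the cost of an unpack step like the one above before the mask is applied to the logits.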