Python class

StandaloneSpeculativeDecodingPipeline

final class max.pipelines.lib.StandaloneSpeculativeDecodingPipeline(pipeline_config, pipeline_model, eos_token_id, weight_adapters, tokenizer, draft_pipeline_model=None, draft_weight_adapters=None)

Bases: SpeculativeDecodingPipelineBase

Standalone speculative decoding, in which the draft model runs independently of the target model.

In this approach, the draft model generates tokens without any information from the target model, and the target model then verifies those tokens.

execute()

execute(inputs)

Executes standalone speculative decoding.

In standalone mode:

  1. The draft model generates tokens independently.
  2. The target model verifies the draft tokens.
  3. Rejection sampling accepts or rejects each draft token.
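
The three steps above can be sketched in plain Python. This is a minimal illustration of the standard speculative sampling rule (accept a draft token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q)), not the code this pipeline runs. The names `speculative_step`, `draft_probs_fn`, and `target_probs_fn` are hypothetical, and the toy models return a fixed distribution over a two-token vocabulary.

```python
import random


def sample(probs, rng):
    """Draw one token id from a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(draft_probs_fn, target_probs_fn, prefix, num_steps, rng):
    """One draft-then-verify round (hypothetical helper, not a MAX API).

    Returns the accepted draft tokens followed by one extra token: a
    corrected token on the first rejection, or a bonus token from the
    target model when every draft token is accepted.
    """
    # Step 1: the draft model proposes num_steps tokens on its own.
    drafted, draft_dists = [], []
    ctx = list(prefix)
    for _ in range(num_steps):
        q = draft_probs_fn(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        draft_dists.append(q)
        ctx.append(tok)

    # Step 2: the target model scores each drafted position. A real system
    # would do this in one batched forward pass; a loop keeps the sketch short.
    out = []
    ctx = list(prefix)
    for tok, q in zip(drafted, draft_dists):
        p = target_probs_fn(ctx)
        # Step 3: accept with probability min(1, p[tok] / q[tok]).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            ctx.append(tok)
            continue
        # Rejected: resample from the normalized residual max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)
        out.append(sample([r / total for r in residual], rng))
        return out

    # Every draft token was accepted, so take one more token from the target.
    out.append(sample(target_probs_fn(ctx), rng))
    return out


# Toy usage: the draft model is uniform, the target model prefers token 0.
tokens = speculative_step(
    draft_probs_fn=lambda ctx: [0.5, 0.5],
    target_probs_fn=lambda ctx: [0.9, 0.1],
    prefix=[0],
    num_steps=4,
    rng=random.Random(0),
)
print(tokens)
```

Because acceptance depends on the ratio of target to draft probability, a draft model that matches the target closely has most of its tokens accepted, and each round then yields up to `num_steps + 1` tokens for a single target verification.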

Parameters:

inputs (TextGenerationInputs[TextContext])

Return type:

dict[RequestID, TextGenerationOutput]

generate_draft_tokens()

generate_draft_tokens(batch, num_steps, model_inputs)

Generates draft tokens for the batch using the draft model.

Return type:

tuple[int, Buffer, Buffer, ModelInputs, Buffer | None]

prepare_batch()

prepare_batch(model, batch, replica_batches, return_n_logits, is_draft=False, draft_inputs=None, merged_draft_tokens=None, merged_draft_offsets=None)

Prepares batch inputs and the KV cache for the draft or target model.

Return type:

tuple[ModelInputs, int]

spec_decode_metrics()

spec_decode_metrics()

Returns the draft token acceptance metrics for speculative decoding.

Return type:

SpeculativeDecodingMetrics
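
As a rough illustration of what a draft token acceptance metric measures, the sketch below tracks how many draft tokens were generated and how many the target model accepted. The `AcceptanceStats` class and its fields are hypothetical stand-ins. They are not the fields of `SpeculativeDecodingMetrics`, which this page does not list.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceStats:
    """Hypothetical counter; not the layout of SpeculativeDecodingMetrics."""

    draft_tokens_generated: int = 0
    draft_tokens_accepted: int = 0

    def update(self, generated: int, accepted: int) -> None:
        """Record one verification round."""
        if not 0 <= accepted <= generated:
            raise ValueError("accepted must be between 0 and generated")
        self.draft_tokens_generated += generated
        self.draft_tokens_accepted += accepted

    @property
    def acceptance_rate(self) -> float:
        """Fraction of draft tokens accepted so far (0.0 before any update)."""
        if self.draft_tokens_generated == 0:
            return 0.0
        return self.draft_tokens_accepted / self.draft_tokens_generated


stats = AcceptanceStats()
stats.update(generated=4, accepted=3)
stats.update(generated=4, accepted=1)
print(stats.acceptance_rate)
```

A low acceptance rate suggests the draft model diverges from the target often enough that the extra draft passes may cost more than they save.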

verify_draft_tokens_with_target_model()

verify_draft_tokens_with_target_model(draft_inputs, context_batch, replica_batches, num_draft_tokens_generated, draft_tokens, draft_logits, merged_draft_tokens, merged_draft_offsets, all_draft_logits)

Verifies draft tokens against the target model and returns merged outputs.

Return type:

tuple[Buffer, Buffer, Buffer | None]