Python module
max.pipelines.architectures.qwen3_embedding
Qwen3 architecture for embedding generation.
Qwen3EmbeddingConfig
class max.pipelines.architectures.qwen3_embedding.Qwen3EmbeddingConfig(*, pipeline_config)
Bases: ArchConfig
Qwen3 embedding model configuration.
Parameters:
pipeline_config (PipelineConfig)
get_max_seq_len()
get_max_seq_len()
Returns the default maximum sequence length for the model.
Subclasses should determine whether this value can be overridden by
setting the --max-length (pipeline_config.model.max_length) flag.
Return type:
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize the config from a PipelineConfig.
Parameters:
- pipeline_config (PipelineConfig) – The pipeline configuration.
- model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.
Return type:
pipeline_config
pipeline_config: PipelineConfig
Qwen3EmbeddingInputs
class max.pipelines.architectures.qwen3_embedding.Qwen3EmbeddingInputs(tokens, input_row_offsets, return_n_logits, *, kv_cache_inputs=None, lora_ids=None, lora_ranks=None, hidden_states=None)
Bases: ModelInputs
Input structure for Qwen3 embedding models.
Parameters:
input_row_offsets
input_row_offsets: Buffer
Row offsets for ragged tensors, shape [batch_size + 1]
return_n_logits
return_n_logits: Buffer
Number of logits to return (kept for interface compatibility)
tokens
tokens: Buffer
Input token IDs, shape [total_seq_len]
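The ragged layout above can be sketched with plain NumPy: token IDs for the whole batch are concatenated into one flat tokens array, and input_row_offsets marks where each sequence starts, with one extra entry for the total length. This is illustrative only; the actual inputs are Buffer objects, not NumPy arrays, and the token IDs below are made up.

```python
import numpy as np

# Three sequences of token IDs with different lengths (a ragged batch).
sequences = [
    [101, 7592, 2088],        # length 3
    [101, 2023, 2003, 1037],  # length 4
    [101, 102],               # length 2
]

# Flatten into a single [total_seq_len] tokens array.
tokens = np.concatenate([np.asarray(s, dtype=np.int64) for s in sequences])

# input_row_offsets has batch_size + 1 entries: sequence i occupies
# tokens[input_row_offsets[i] : input_row_offsets[i + 1]].
lengths = [len(s) for s in sequences]
input_row_offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.uint32)

print(tokens.shape)       # (9,)
print(input_row_offsets)  # [0 3 7 9]
```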
Qwen3EmbeddingModel
class max.pipelines.architectures.qwen3_embedding.Qwen3EmbeddingModel(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL)
Bases: PipelineModel[TextContext]
Qwen3 embedding pipeline model without KV caching.
This model is optimized for embedding generation with:
- No KV cache overhead
- Single-pass forward computation
- Flash attention without cache operations
- Last token pooling with L2 normalization
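The pooling step in the list above can be sketched in NumPy. This is an assumed reference implementation based only on the description ("last token pooling with L2 normalization"), not the model's actual kernel: each sequence's last-token hidden state is selected via the row offsets, then divided by its L2 norm.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, row_offsets: np.ndarray) -> np.ndarray:
    """Select each sequence's last-token hidden state and L2-normalize it.

    hidden_states: [total_seq_len, hidden_dim] ragged batch of hidden states.
    row_offsets:   [batch_size + 1] offsets delimiting each sequence.
    """
    # The last token of sequence i sits just before offset i + 1.
    last_indices = row_offsets[1:] - 1
    pooled = hidden_states[last_indices]  # [batch_size, hidden_dim]
    norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
    return pooled / norms

# Two sequences (lengths 2 and 3) with hidden_dim = 2, made-up values.
hidden = np.array([[1.0, 0.0], [3.0, 4.0],
                   [0.0, 1.0], [0.0, 2.0], [0.0, 5.0]])
offsets = np.array([0, 2, 5])
emb = last_token_pool(hidden, offsets)
print(emb)  # rows [0.6, 0.8] and [0.0, 1.0]
```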
Initialize the Qwen3 embedding pipeline model.
Parameters:
- pipeline_config (PipelineConfig) – Pipeline configuration
- session (InferenceSession) – Inference session
- devices (list[Device]) – List of devices
- kv_cache_config (KVCacheConfig) – KV cache configuration
- weights (Weights) – Model weights
- adapter (WeightsAdapter | None) – Optional weight adapter
- return_logits (ReturnLogits) – Return logits mode
attention_bias
attention_bias: bool = False
Whether to use attention bias.
calculate_max_seq_len()
classmethod calculate_max_seq_len(pipeline_config, huggingface_config)
Calculate maximum sequence length.
Parameters:
- pipeline_config (PipelineConfig) – Pipeline configuration
- huggingface_config (AutoConfig) – HuggingFace configuration
Returns:
Maximum sequence length
Return type:
execute()
execute(model_inputs)
Execute the model.
Parameters:
model_inputs (ModelInputs) – Model inputs
Returns:
Model outputs with embeddings in the logits field
Return type:
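Because the returned embeddings are L2-normalized (see the pooling description above), cosine similarity between any two of them reduces to a plain dot product. A minimal NumPy sketch, with made-up unit vectors standing in for the embeddings read from the logits field:

```python
import numpy as np

# A batch of L2-normalized embeddings, standing in for the model's
# output (embeddings are returned in the logits field).
embeddings = np.array([
    [0.6, 0.8],
    [1.0, 0.0],
    [0.0, 1.0],
])

# For unit vectors, pairwise cosine similarity is just a matrix product.
similarity = embeddings @ embeddings.T
print(similarity[0, 1])  # 0.6
print(similarity[0, 2])  # 0.8
```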
model
model: Model
Compiled and initialized model.
norm_method
norm_method: Literal['rms_norm'] | Literal['layer_norm'] = 'rms_norm'
Normalization method.
prepare_initial_token_inputs()
prepare_initial_token_inputs(replica_batches, kv_cache_inputs=None, return_n_logits=1)
Prepare initial inputs for embedding generation.
Parameters:
- replica_batches (Sequence[Sequence[TextContext]]) – Batches of text contexts
- kv_cache_inputs (KVCacheInputs[Buffer, Buffer] | None) – Ignored (no KV cache for embeddings)
- return_n_logits (int) – Number of logits (ignored for embeddings)
Returns:
Prepared inputs
Return type:
prepare_next_token_inputs()
prepare_next_token_inputs(next_tokens, prev_model_inputs)
Prepare next token inputs (not supported for embedding models).
Parameters:
- next_tokens (Buffer) – Next tokens
- prev_model_inputs (ModelInputs) – Previous inputs
Raises:
NotImplementedError – Embedding models don’t support autoregressive generation
Return type:
state_dict
Model weights.