For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).
Python module
max.pipelines.architectures.mpnet_modulev3
MPNet sentence transformer architecture for embeddings generation.
MPNetConfig
class max.pipelines.architectures.mpnet_modulev3.MPNetConfig(*, pool_embeddings, huggingface_config, max_seq_len)
Bases: ArchConfigWithBoundedMaxSeqLen, ArchConfig
Configuration for MPNet V3 models.
huggingface_config
huggingface_config: AutoConfig
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize the config from a PipelineConfig.
Parameters:
- pipeline_config (PipelineConfig): The pipeline configuration.
- model_config (MAXModelConfig | None): The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.
Return type:
max_seq_len
max_seq_len: int
pool_embeddings
pool_embeddings: bool
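The model_config=None fallback described for initialize() can be illustrated with a small stand-alone sketch. The classes below (FakeModelConfig, FakePipelineConfig, SketchConfig) are hypothetical stand-ins rather than the MAX types, the max_length and source_model fields are invented for illustration, and huggingface_config is left out. Only the fallback pattern itself comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModelConfig:
    """Hypothetical stand-in for MAXModelConfig."""

    name: str
    max_length: int  # invented field, for illustration only


@dataclass
class FakePipelineConfig:
    """Hypothetical stand-in for PipelineConfig."""

    model: FakeModelConfig
    draft_model: Optional[FakeModelConfig] = None


@dataclass
class SketchConfig:
    """Hypothetical stand-in that mirrors two of the documented fields."""

    pool_embeddings: bool
    max_seq_len: int
    source_model: str  # invented field, records which config was read

    @classmethod
    def initialize(cls, pipeline_config, model_config=None):
        # Fall back to pipeline_config.model when no explicit config is given.
        if model_config is None:
            model_config = pipeline_config.model
        return cls(
            pool_embeddings=True,  # assumed value for the sketch
            max_seq_len=model_config.max_length,
            source_model=model_config.name,
        )


pipeline = FakePipelineConfig(
    model=FakeModelConfig(name="main", max_length=512),
    draft_model=FakeModelConfig(name="draft", max_length=128),
)
default_cfg = SketchConfig.initialize(pipeline)
draft_cfg = SketchConfig.initialize(pipeline, model_config=pipeline.draft_model)
```

With no model_config argument the sketch reads from pipeline.model. Passing pipeline.draft_model explicitly switches the source without touching the pipeline configuration.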
MPNetInputs
class max.pipelines.architectures.mpnet_modulev3.MPNetInputs(next_tokens_batch, attention_mask, *, kv_cache_inputs=None, lora_ids=None, lora_ranks=None, hidden_states=None)
Bases: ModelInputs
Input tensors for the MPNet model.
Parameters:
attention_mask
attention_mask: Buffer
next_tokens_batch
next_tokens_batch: Buffer
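As a rough illustration of how a token batch and its attention mask relate, the sketch below pads variable-length token sequences to a common length and builds a matching 0/1 mask. It uses plain Python lists in place of Buffer, the helper name pad_batch is hypothetical, and the padding ID and token IDs are assumptions. How MAX actually constructs these tensors is not described on this page.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    token_ids: Sequence[Sequence[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad sequences to the longest one and build a matching 0/1 mask.

    Assumes token_ids contains at least one sequence.
    """
    max_len = max(len(seq) for seq in token_ids)
    tokens, mask = [], []
    for seq in token_ids:
        n_pad = max_len - len(seq)
        tokens.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(seq) + [0] * n_pad)
    return tokens, mask


# pad_id=1 and the token IDs are assumptions made for this sketch.
next_tokens_batch, attention_mask = pad_batch([[0, 2023, 2], [0, 2]], pad_id=1)
```

Both results have the same shape, so each mask entry tells the model whether the token at that position is real input or padding.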
MPNetPipelineModel
class max.pipelines.architectures.mpnet_modulev3.MPNetPipelineModel(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL)
Bases: PipelineModel[TextContext]
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
execute()
execute(model_inputs)
Executes the graph with the given inputs.
Parameters:
- model_inputs (ModelInputs): The model inputs to execute, containing tensors and any other required data for model execution.
Returns:
ModelOutputs containing the pipeline's output tensors.
Return type:
ModelOutputs
This is an abstract method that must be implemented by concrete PipelineModels to define their specific execution logic.
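For an embeddings model, per-token outputs are often reduced to one vector per sequence. A common strategy in sentence-transformer models is masked mean pooling, sketched below with plain Python lists. This page does not state which pooling strategy MPNetPipelineModel applies when pool_embeddings is set, so treat this as an assumption. The function name masked_mean_pool and the sample values are hypothetical.

```python
from typing import List, Sequence


def masked_mean_pool(
    hidden: Sequence[Sequence[Sequence[float]]],
    mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Average token vectors per sequence, skipping positions where mask is 0."""
    pooled = []
    for seq_vectors, seq_mask in zip(hidden, mask):
        dim = len(seq_vectors[0])
        total = [0.0] * dim
        count = 0
        for vector, keep in zip(seq_vectors, seq_mask):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    total[i] += value
        # Guard against an all-padding row to avoid dividing by zero.
        denom = max(count, 1)
        pooled.append([t / denom for t in total])
    return pooled


# Shape: [batch=2][seq_len=3][hidden_dim=2]; values are made up.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
attention_mask = [[1, 1, 0], [1, 1, 1]]
embeddings = masked_mean_pool(hidden_states, attention_mask)
```

The padded position in the first sequence is ignored, so its large values do not skew the resulting embedding.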
load_model()
load_model()
model_config_cls
model_config_cls
alias of MPNetConfig
prepare_initial_token_inputs()
prepare_initial_token_inputs(replica_batches, kv_cache_inputs=None, return_n_logits=1)
Prepares the initial inputs to be passed to execute().
The inputs and functionality can vary per model. For example, model
inputs could include encoded tensors, unique IDs per tensor when using
a KV cache manager, and kv_cache_inputs (or None if the model does
not use KV cache). This method typically batches encoded tensors,
claims a KV cache slot if needed, and returns the inputs and caches.
Parameters:
- replica_batches (Sequence[Sequence[TextContext]])
- kv_cache_inputs (KVCacheInputs[Buffer, Buffer] | None)
- return_n_logits (int)
Return type: