Python module
max.pipelines.architectures.olmo_modulev3
OLMo transformer architecture for text generation.
OlmoConfig
class max.pipelines.architectures.olmo_modulev3.OlmoConfig(*, hidden_size, num_attention_heads, num_key_value_heads, num_hidden_layers, rope_theta, rope_scaling_params, max_seq_len, intermediate_size, interleaved_rope_weights, vocab_size, dtype, kv_params, return_logits=ReturnLogits.LAST_TOKEN, norm_method='rms_norm', attention_bias=False, rms_norm_eps=None, tie_word_embeddings=False, stacked_mlp=False, stacked_qkv=False, attention_multiplier, embedding_multiplier, residual_multiplier, devices, clip_qkv=None, norm_elementwise_affine=False, longrope_scaling_params=None, logits_scaling=1.0, return_hidden_states=ReturnHiddenStates.NONE)
Bases: Llama3Config
Model configuration for OLMo graph construction and execution.
Parameters:
- hidden_size (int)
- num_attention_heads (int)
- num_key_value_heads (int)
- num_hidden_layers (int)
- rope_theta (float)
- rope_scaling_params (Llama3RopeScalingParams | None)
- max_seq_len (int)
- intermediate_size (int)
- interleaved_rope_weights (bool)
- vocab_size (int)
- dtype (DType)
- kv_params (KVCacheParams)
- return_logits (ReturnLogits)
- norm_method (Literal['rms_norm', 'layer_norm'])
- attention_bias (bool)
- rms_norm_eps (float | None)
- tie_word_embeddings (bool)
- stacked_mlp (bool)
- stacked_qkv (bool)
- attention_multiplier (float)
- embedding_multiplier (float)
- residual_multiplier (float)
- devices (list[DeviceRef])
- clip_qkv (float | None)
- norm_elementwise_affine (bool)
- longrope_scaling_params (LongRoPEScalingParams | None)
- logits_scaling (float)
- return_hidden_states (ReturnHiddenStates)
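The following is a minimal construction sketch, not taken from this page: the hyperparameter values are illustrative (roughly OLMo-1B-sized), and the import paths, the DeviceRef.GPU() helper, and the KVCacheParams field names are assumptions to check against the rest of the MAX API reference.

```python
# Hypothetical sketch of building an OlmoConfig by hand.
# Import paths, KVCacheParams field names, and all hyperparameter values are
# assumptions for illustration only.
from max.dtype import DType
from max.graph import DeviceRef
from max.nn.kv_cache import KVCacheParams
from max.pipelines.architectures.olmo_modulev3 import OlmoConfig

device = DeviceRef.GPU()  # assumed single-GPU target

kv_params = KVCacheParams(
    dtype=DType.bfloat16,  # field names assumed; see the KVCacheParams reference
    n_kv_heads=16,
    head_dim=128,
)

config = OlmoConfig(
    hidden_size=2048,
    num_attention_heads=16,
    num_key_value_heads=16,
    num_hidden_layers=16,
    rope_theta=10000.0,
    rope_scaling_params=None,
    max_seq_len=4096,
    intermediate_size=8192,
    interleaved_rope_weights=False,
    vocab_size=50304,
    dtype=DType.bfloat16,
    kv_params=kv_params,
    attention_multiplier=1.0,
    embedding_multiplier=1.0,
    residual_multiplier=1.0,
    devices=[device],
    norm_method="layer_norm",       # OLMo-style LayerNorm rather than RMSNorm
    norm_elementwise_affine=False,
)
```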
finalize()
finalize(huggingface_config, state_dict, return_logits, return_hidden_states=ReturnHiddenStates.NONE, norm_method='rms_norm', attention_bias=False)
Define parameters that can’t be determined just from the pipeline config.
Parameters:
- huggingface_config (AutoConfig)
- state_dict (dict[str, WeightData])
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
- norm_method (Literal['rms_norm', 'layer_norm'])
- attention_bias (bool)
Return type:
None
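A hedged sketch of how finalize might be invoked once a Hugging Face config and a weight state dict are available, called here on the config instance built above as the method listing suggests. The checkpoint name and the load_olmo_weights helper are placeholders, and ReturnLogits is used as shown in the signature; its import path is not given on this page.

```python
# Hypothetical sketch: filling in fields that depend on the checkpoint.
# "allenai/OLMo-1B-hf" is only an example name, and load_olmo_weights is a
# placeholder for however the pipeline materializes its WeightData mapping.
from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("allenai/OLMo-1B-hf")
state_dict = load_olmo_weights(...)  # hypothetical helper -> dict[str, WeightData]

config.finalize(
    huggingface_config=hf_config,
    state_dict=state_dict,
    return_logits=ReturnLogits.LAST_TOKEN,  # enum from the signature above
    norm_method="layer_norm",
    attention_bias=False,
)
```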
norm_elementwise_affine
norm_elementwise_affine: bool = False
OlmoModel
class max.pipelines.architectures.olmo_modulev3.OlmoModel(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE)
Bases: Llama3Model
OLMo pipeline model implementation.
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
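In normal use the serving pipeline constructs the model for you (via config_class below), but a direct construction sketch, with every argument a placeholder assembled elsewhere, looks roughly like this:

```python
# Hypothetical sketch: every argument is a placeholder object that the
# pipeline framework would normally assemble from the serving configuration.
model = OlmoModel(
    pipeline_config=pipeline_config,  # PipelineConfig for this run
    session=session,                  # InferenceSession bound to the devices
    devices=devices,                  # list[Device]
    kv_cache_config=kv_cache_config,  # KVCacheConfig controlling the KV cache
    weights=weights,                  # Weights loaded from the checkpoint
    adapter=None,                     # optional WeightsAdapter
)
```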
config_class
config_class
alias of OlmoConfig
norm_method
norm_method: Literal['rms_norm'] | Literal['layer_norm'] = 'layer_norm'
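Two class-level overrides distinguish OlmoModel from its Llama3Model base, and they can be checked directly; this small sketch assumes only what is documented above.

```python
# Class-level overrides documented above.
assert OlmoModel.config_class is OlmoConfig
assert OlmoModel.norm_method == "layer_norm"  # OLMo uses LayerNorm rather than RMSNorm
```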