Python module
max.pipelines.architectures.olmo_modulev3
OLMo transformer architecture for text generation.
OlmoConfig
class max.pipelines.architectures.olmo_modulev3.OlmoConfig(*, hidden_size, num_attention_heads, num_key_value_heads, num_hidden_layers, rope_theta, rope_scaling_params, max_seq_len, intermediate_size, interleaved_rope_weights, vocab_size, dtype, kv_params, return_logits=ReturnLogits.LAST_TOKEN, norm_method='rms_norm', attention_bias=False, rms_norm_eps=None, tie_word_embeddings=False, stacked_mlp=False, stacked_qkv=False, attention_multiplier, embedding_multiplier, residual_multiplier, devices, clip_qkv=None, norm_elementwise_affine=False, longrope_scaling_params=None, logits_scaling=1.0, return_hidden_states=ReturnHiddenStates.NONE)
Bases: Llama3Config
Model configuration for OLMo graph construction and execution.
Parameters:
- hidden_size (int)
- num_attention_heads (int)
- num_key_value_heads (int)
- num_hidden_layers (int)
- rope_theta (float)
- rope_scaling_params (Llama3RopeScalingParams | None)
- max_seq_len (int)
- intermediate_size (int)
- interleaved_rope_weights (bool)
- vocab_size (int)
- dtype (DType)
- kv_params (KVCacheParams)
- return_logits (ReturnLogits)
- norm_method (Literal['rms_norm', 'layer_norm'])
- attention_bias (bool)
- rms_norm_eps (float | None)
- tie_word_embeddings (bool)
- stacked_mlp (bool)
- stacked_qkv (bool)
- attention_multiplier (float)
- embedding_multiplier (float)
- residual_multiplier (float)
- devices (list[DeviceRef])
- clip_qkv (float | None)
- norm_elementwise_affine (bool)
- longrope_scaling_params (LongRoPEScalingParams | None)
- logits_scaling (float)
- return_hidden_states (ReturnHiddenStates)
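The following is a minimal construction sketch, not taken from this page: the hyperparameter values are illustrative (roughly OLMo-1B-sized), and the import paths, the DeviceRef.GPU() helper, and the KVCacheParams field names are assumptions to check against the rest of the MAX API reference.

```python
# Hypothetical sketch of building an OlmoConfig by hand.
# Import paths, KVCacheParams field names, and all hyperparameter values are
# assumptions for illustration only.
from max.dtype import DType
from max.graph import DeviceRef
from max.nn.kv_cache import KVCacheParams
from max.pipelines.architectures.olmo_modulev3 import OlmoConfig

device = DeviceRef.GPU()  # assumed single-GPU target

kv_params = KVCacheParams(
    dtype=DType.bfloat16,  # field names assumed; see the KVCacheParams reference
    n_kv_heads=16,
    head_dim=128,
)

config = OlmoConfig(
    hidden_size=2048,
    num_attention_heads=16,
    num_key_value_heads=16,
    num_hidden_layers=16,
    rope_theta=10000.0,
    rope_scaling_params=None,
    max_seq_len=4096,
    intermediate_size=8192,
    interleaved_rope_weights=False,
    vocab_size=50304,
    dtype=DType.bfloat16,
    kv_params=kv_params,
    attention_multiplier=1.0,
    embedding_multiplier=1.0,
    residual_multiplier=1.0,
    devices=[device],
    norm_method="layer_norm",       # OLMo-style LayerNorm rather than RMSNorm
    norm_elementwise_affine=False,
)
```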
finalize()
finalize(huggingface_config, state_dict, return_logits, return_hidden_states=ReturnHiddenStates.NONE, norm_method='rms_norm', attention_bias=False)
Define parameters that can’t be determined just from the pipeline config.
Parameters:
- huggingface_config (AutoConfig)
- state_dict (dict[str, WeightData])
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
- norm_method (Literal['rms_norm', 'layer_norm'])
- attention_bias (bool)
Return type:
None
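A hedged sketch of how finalize might be invoked once a Hugging Face config and a weight state dict are available, called here on the config instance built above as the method listing suggests. The checkpoint name and the load_olmo_weights helper are placeholders, and ReturnLogits is used as shown in the signature; its import path is not given on this page.

```python
# Hypothetical sketch: filling in fields that depend on the checkpoint.
# "allenai/OLMo-1B-hf" is only an example name, and load_olmo_weights is a
# placeholder for however the pipeline materializes its WeightData mapping.
from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("allenai/OLMo-1B-hf")
state_dict = load_olmo_weights(...)  # hypothetical helper -> dict[str, WeightData]

config.finalize(
    huggingface_config=hf_config,
    state_dict=state_dict,
    return_logits=ReturnLogits.LAST_TOKEN,  # enum from the signature above
    norm_method="layer_norm",
    attention_bias=False,
)
```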
norm_elementwise_affine
norm_elementwise_affine: bool = False
OlmoModel
class max.pipelines.architectures.olmo_modulev3.OlmoModel(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE)
Bases: Llama3Model
OLMo pipeline model implementation.
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
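In normal use the serving pipeline constructs the model for you (via config_class below), but a direct construction sketch, with every argument a placeholder assembled elsewhere, looks roughly like this:

```python
# Hypothetical sketch: every argument is a placeholder object that the
# pipeline framework would normally assemble from the serving configuration.
model = OlmoModel(
    pipeline_config=pipeline_config,  # PipelineConfig for this run
    session=session,                  # InferenceSession bound to the devices
    devices=devices,                  # list[Device]
    kv_cache_config=kv_cache_config,  # KVCacheConfig controlling the KV cache
    weights=weights,                  # Weights loaded from the checkpoint
    adapter=None,                     # optional WeightsAdapter
)
```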
config_class
config_class
alias of OlmoConfig
norm_method
norm_method: Literal['rms_norm'] | Literal['layer_norm'] = 'layer_norm'
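Two class-level overrides distinguish OlmoModel from its Llama3Model base, and they can be checked directly; this small sketch assumes only what is documented above.

```python
# Class-level overrides documented above.
assert OlmoModel.config_class is OlmoConfig
assert OlmoModel.norm_method == "layer_norm"  # OLMo uses LayerNorm rather than RMSNorm
```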