Python module
rotary_embedding
The RoPE (Rotary Position Embedding) used within the model.
DeepseekYarnRopeScalingParams
class max.nn.rotary_embedding.DeepseekYarnRopeScalingParams(scaling_factor: float, original_max_position_embeddings: int, beta_fast: int, beta_slow: int, mscale: float, mscale_all_dim: float)
Parameters:
beta_fast
beta_fast: int
Fast interpolation rate.
beta_slow
beta_slow: int
Slow interpolation rate.
mscale
mscale: float
Scaling factor for middle frequencies.
mscale_all_dim
mscale_all_dim: float
Scaling factor applied to all dimensions.
original_max_position_embeddings
original_max_position_embeddings: int
Original maximum sequence length during training.
scaling_factor
scaling_factor: float
Scaling factor for frequency interpolation.
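As an illustrative, unofficial sketch (not the library's implementation), the `mscale` fields typically feed a YaRN-style magnitude-scaling formula of the form `0.1 * mscale * ln(s) + 1`, where `s` is the context-extension factor; the function name below is hypothetical:

```python
import math

def yarn_mscale(scaling_factor: float, mscale: float = 1.0) -> float:
    """Illustrative YaRN magnitude scaling (assumption, not the MAX API):
    grows logarithmically with the context-extension factor and is the
    identity when no extension is applied."""
    if scaling_factor <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scaling_factor) + 1.0

# No extension -> no rescaling; a 4x extension rescales only mildly.
print(yarn_mscale(1.0))
print(yarn_mscale(4.0))
```

The logarithm keeps attention magnitudes roughly stable as the context window grows, which is why the correction stays small even for large extension factors.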
DeepseekYarnRotaryEmbedding
class max.nn.rotary_embedding.DeepseekYarnRotaryEmbedding(dim, n_heads, theta, max_seq_len, device, head_dim=None, _freqs_cis=None, interleaved=True, scaling_params=None)
Deepseek’s YaRN (Yet another RoPE eNhancement) Rotary Position Embedding layer.
Unlike Llama3RotaryEmbedding, the dim argument here is the rope dimension of the model, not the hidden dimension.
compute_scale()
compute_scale(user_scale=None)
freqs_cis_base()
freqs_cis_base()
Computes the frequency tensor for complex exponentials (cis) for a given seq_len. The tensor is scaled with the theta parameter and is required to apply Rotary Position Embedding (RoPE) to a tensor. See ‘RoFormer: Enhanced Transformer with Rotary Position Embedding’ (arxiv.org/pdf/2104.09864).
Returns:
The frequency tensor for complex exponentials with shape (max_seq_len, rope_dim // 2, 2)
Return type:
TensorValue
scaling_params
scaling_params: DeepseekYarnRopeScalingParams | None = None
LinearScalingParams
class max.nn.rotary_embedding.LinearScalingParams(factor: float)
Parameters:
factor (float)
factor
factor: float
Main scaling factor for the frequency components of the rope.
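Linear (position-interpolation) scaling divides every rotary frequency by `factor`, which is equivalent to squeezing the position indices. A minimal sketch, assuming the standard theta-based inverse-frequency formula (the function name is hypothetical, not part of the MAX API):

```python
import math

def linear_scaled_inv_freqs(dim: int, theta: float, factor: float) -> list[float]:
    """Inverse frequencies 1 / theta**(2i/dim), each divided by the linear
    scaling factor so positions are effectively interpolated."""
    return [1.0 / (theta ** (2 * i / dim)) / factor for i in range(dim // 2)]

# With factor=2.0, every frequency is halved; the first becomes 1/factor.
freqs = linear_scaled_inv_freqs(dim=8, theta=10000.0, factor=2.0)
```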
Llama3RopeScalingParams
class max.nn.rotary_embedding.Llama3RopeScalingParams(factor: float, low_freq_factor: float, high_freq_factor: float, orig_max_position: int)
Parameters:
factor
factor: float
Main scaling factor for the frequency components of the rope.
high_freq_factor
high_freq_factor: float
Factor to scale the high frequency components of the rope.
low_freq_factor
low_freq_factor: float
Factor to scale the low frequency components of the rope.
orig_max_position
orig_max_position: int
The original maximum position length supported by the model.
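To show how the four parameters interact, here is a pure-Python sketch of the published Llama 3.1 scaling recipe (an assumption about how these fields are used, not the MAX implementation): high frequencies pass through, low frequencies are divided by `factor`, and wavelengths in between are blended linearly.

```python
import math

def llama3_scale_inv_freq(
    inv_freq: float,
    factor: float,
    low_freq_factor: float,
    high_freq_factor: float,
    orig_max_position: int,
) -> float:
    """Sketch of Llama 3.1-style frequency scaling (illustrative only)."""
    wavelen = 2 * math.pi / inv_freq
    if wavelen < orig_max_position / high_freq_factor:
        return inv_freq  # high frequency: keep as-is
    if wavelen > orig_max_position / low_freq_factor:
        return inv_freq / factor  # low frequency: fully scale
    # In between: linear blend of scaled and unscaled frequency.
    smooth = (orig_max_position / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    return (1 - smooth) * inv_freq / factor + smooth * inv_freq
```

The blend avoids a discontinuity at the boundary between the scaled and unscaled frequency bands.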
Llama3RotaryEmbedding
class max.nn.rotary_embedding.Llama3RotaryEmbedding(dim, n_heads, theta, max_seq_len, device, head_dim=None, _freqs_cis=None, interleaved=True, scaling_params=None)
RotaryEmbedding for Llama3 that takes rope scaling into account.
Parameters:
scaling_params
scaling_params: Llama3RopeScalingParams | None = None
Scaling parameters that enable Llama to function with a longer context length.
RotaryEmbedding
class max.nn.rotary_embedding.RotaryEmbedding(dim, n_heads, theta, max_seq_len, device, head_dim=None, _freqs_cis=None, interleaved=True)
RotaryEmbedding layer to calculate and apply the frequency tensor for complex exponentials.
compute_scale()
compute_scale(user_scale=None)
device
device: DeviceRef
dim
dim: int
freqs_cis
property freqs_cis: TensorValue
freqs_cis_base()
freqs_cis_base()
Computes the frequency tensor for complex exponentials (cis) for a given seq_len. The tensor is scaled with the theta parameter and is required to apply Rotary Position Embedding (RoPE) to a tensor. See ‘RoFormer: Enhanced Transformer with Rotary Position Embedding’ (arxiv.org/pdf/2104.09864).
Returns:
The frequency tensor for complex exponentials with shape (max_seq_len * 2, head_dim // 2, 2)
Return type:
TensorValue
head_dim
head_dim = dim // n_heads if not specified in the config.
interleaved
interleaved: bool = True
max_seq_len
max_seq_len: int
The maximum sequence length of the model’s input.
n_heads
n_heads: int
theta
theta: float
Hyperparameter used to control the frequency scaling of the sinusoidal components of the embeddings.
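Putting the attributes above together, a minimal pure-Python sketch (not the library's implementation, which returns a TensorValue) of the (cos, sin) frequency table that freqs_cis_base describes:

```python
import math

def freqs_cis_sketch(head_dim: int, theta: float, max_seq_len: int):
    """Illustrative frequency table: one [cos(p*f), sin(p*f)] pair per
    position p and rotary frequency f, shaped
    (max_seq_len * 2, head_dim // 2, 2) as documented above."""
    inv_freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [
        [[math.cos(p * f), math.sin(p * f)] for f in inv_freqs]
        for p in range(max_seq_len * 2)
    ]

# 32 positions x 4 frequency pairs, each entry a [cos, sin] pair.
table = freqs_cis_sketch(head_dim=8, theta=10000.0, max_seq_len=16)
```

Position 0 always yields [1.0, 0.0] (cos 0, sin 0), which leaves the first token's embedding unrotated.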