LongRoPERotaryEmbedding
class max.nn.LongRoPERotaryEmbedding(dim, n_heads, theta, max_seq_len, head_dim=None, _freqs_cis=None, interleaved=True, scaling_params=None)
Bases: RotaryEmbedding
Applies RoPE with LongRoPE scaling for Phi-3.5 models.
Uses a stitched frequency table where positions up to
original_max_position use short_factor scaling and positions
beyond use long_factor scaling.
Parameters:

- dim (int) – The model's hidden dimension.
- n_heads (int) – The number of attention heads.
- theta (float) – The base for computing frequencies. A common value is 10000.0.
- max_seq_len (int) – The maximum sequence length for model input.
- head_dim (int) – The per-head dimension. Defaults to dim // n_heads if None.
- _freqs_cis (Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray | None) – A pre-computed frequency tensor. Defaults to None.
- interleaved (bool) – Whether to apply RoPE using interleaved complex representation. Defaults to True.
- scaling_params (LongRoPEScalingParams | None) – The LongRoPE scaling configuration. Defaults to None, in which case standard RoPE is used.
compute_scale()
compute_scale(user_scale=None)
Returns the attention scale factor with LongRoPE adjustment.
Applies a logarithmic attention factor when the context length exceeds the original training maximum.
freqs_cis_base()
freqs_cis_base()
Computes the frequency cosine-sine tensor with LongRoPE scaling.
Creates a stitched table where:
- Positions 0 to original_max_position use short_factor scaling.
- Positions from original_max_position onward use long_factor scaling.
Returns:

The frequency tensor with shape (max_seq_len * 2, head_dim // 2, 2).
Return type:
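The stitching described above can be sketched as follows. This is a simplified, self-contained model with illustrative names: it produces per-position rotation angles only, whereas the real table also packs cosine/sine pairs into the (max_seq_len * 2, head_dim // 2, 2) shape:

```python
import math

def stitched_angles(head_dim, max_seq_len, original_max_position,
                    short_factor, long_factor, theta=10000.0):
    """Per-position rotation angles with LongRoPE stitching (illustrative sketch).

    Positions below original_max_position divide the base inverse frequencies
    by short_factor; positions at or beyond it divide by long_factor.
    """
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    table = []
    for pos in range(max_seq_len):
        factors = short_factor if pos < original_max_position else long_factor
        table.append([pos * f / s for f, s in zip(inv_freq, factors)])
    return table

# Toy sizes: head_dim=4 gives two frequency pairs; factors of 4.0 slow
# rotation for positions past the original maximum.
t = stitched_angles(head_dim=4, max_seq_len=8, original_max_position=4,
                    short_factor=[1.0, 1.0], long_factor=[4.0, 4.0])
```

Note the discontinuity at the stitch point: position 3 has angle 3.0 in the first pair, while position 4, now divided by long_factor, drops back to 1.0.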