Mojo module
learnable_2d_interp_pos_emb
Learnable 2D interpolated position embedding (Kimi K2.5 MoonViT3d).
Equivalent to Learnable2DInterpPosEmbDivided_fixed.forward() from
nvidia/Kimi-K2.5-NVFP4/modeling_kimi_k25.py.
For each video described by grid_thws:
- Bicubic-interpolates the learnable 2D weight grid from (H, W) to (h, w). When (h, w) == (H, W), the grid is used directly.
- If t > 1, adds a 1D sincos temporal embedding per frame.
- Adds the result element-wise to x.
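The per-video steps above can be sketched as a plain-Python CPU reference. This is not the Mojo GPU kernel or its API, and it makes several assumptions:
- Bicubic resizing follows PyTorch `F.interpolate(mode="bicubic", align_corners=False)` conventions: cubic coefficient A = -0.75, half-pixel centers, and edge taps clamped.
- The rows of x for a video are ordered frame by frame, then row-major over (h, w).
- The 2D embedding is repeated for every frame, and frame f of a video uses time_weight[f].

The names `bicubic_resize` and `apply_pos_emb` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical CPU reference, not the Mojo kernel's API.
Grid = List[List[List[float]]]  # (rows, cols, dim)
A = -0.75  # cubic convolution coefficient (assumed to match torch bicubic)


def _cubic_weights(t: float) -> Tuple[float, float, float, float]:
    """Keys cubic weights for taps at offsets -1, 0, +1, +2 given fraction t."""
    def near(d: float) -> float:  # kernel for |d| <= 1
        return ((A + 2) * d - (A + 3)) * d * d + 1

    def far(d: float) -> float:  # kernel for 1 < |d| < 2
        return ((A * d - 5 * A) * d + 8 * A) * d - 4 * A

    return far(t + 1), near(t), near(1 - t), far(2 - t)


def _taps(out_size: int, in_size: int) -> List[Tuple[List[int], Tuple[float, ...]]]:
    """For each output index: four clamped source indices and their weights."""
    scale = in_size / out_size
    taps = []
    for dst in range(out_size):
        src = (dst + 0.5) * scale - 0.5  # half-pixel centers (align_corners=False)
        base = math.floor(src)
        idx = [min(max(base + k, 0), in_size - 1) for k in (-1, 0, 1, 2)]
        taps.append((idx, _cubic_weights(src - base)))
    return taps


def bicubic_resize(weight: Grid, h: int, w: int) -> Grid:
    """Resize an (H, W, dim) grid to (h, w, dim) with separable bicubic filtering."""
    dim = len(weight[0][0])
    tx = _taps(w, len(weight[0]))
    out = []
    for iy, wy in _taps(h, len(weight)):
        row = []
        for ix, wx in tx:
            vec = [0.0] * dim
            for ya, a in zip(iy, wy):
                for xb, b in zip(ix, wx):
                    for c in range(dim):
                        vec[c] += a * b * weight[ya][xb][c]
            row.append(vec)
        out.append(row)
    return out


def apply_pos_emb(x: List[List[float]], weight: Grid,
                  grid_thws: Sequence[Tuple[int, int, int]],
                  time_weight: List[List[float]]) -> List[List[float]]:
    """Return x + position embedding (row order assumed: video, frame, row, col)."""
    if sum(t * h * w for t, h, w in grid_thws) != len(x):
        raise ValueError("sum of t*h*w over grid_thws must equal L")
    H, W = len(weight), len(weight[0])
    out: List[List[float]] = []
    for t, h, w in grid_thws:
        # Same-size grids skip interpolation, as the module description states.
        grid = weight if (h, w) == (H, W) else bicubic_resize(weight, h, w)
        flat = [vec for grid_row in grid for vec in grid_row]  # (h*w, dim)
        for f in range(t):
            for pos in flat:
                xi = x[len(out)]
                if t > 1:  # temporal term only for multi-frame videos
                    out.append([a + p + q for a, p, q in zip(xi, pos, time_weight[f])])
                else:
                    out.append([a + p for a, p in zip(xi, pos)])
    return out
```

Under the Keys kernel with A = -0.75, resizing to the same (H, W) reproduces the grid exactly. The explicit shortcut therefore only saves work and does not change the result.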
Tensor layout (all row-major):
- x: (L, dim) patch embeddings
- weight: (H, W, dim) learnable 2D grid
- grid_thws: (N, 3) int64, per-video (t, h, w)
- time_weight: (num_frames, dim) float32, 1D sincos temporal embedding
- output: (L, dim), x + interpolated position embedding
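The layout sketch below assumes that the rows of x are concatenated video by video. Under that assumption, each video owns a contiguous run of t*h*w rows, and those counts sum to L. The sketch computes the row ranges and builds a table with the shape of time_weight.

The sincos formula is also an assumption. It follows the common MAE-style 1D convention: base 10000, with the sine half followed by the cosine half. The module text does not state which variant time_weight uses. Python floats are float64, whereas the module expects float32. The names `sincos_1d` and `video_row_ranges` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sincos_1d(num_frames: int, dim: int) -> List[List[float]]:
    """Assumed MAE-style 1D sincos table with shape (num_frames, dim)."""
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    omega = [1.0 / 10000 ** (i / half) for i in range(half)]  # frequencies
    table = []
    for pos in range(num_frames):
        angles = [pos * w for w in omega]
        # First half: sin of each angle; second half: cos of each angle.
        table.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return table


def video_row_ranges(grid_thws: Sequence[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """[start, end) rows of x for each video; the last end should equal L."""
    ranges, start = [], 0
    for t, h, w in grid_thws:
        ranges.append((start, start + t * h * w))
        start += t * h * w
    return ranges
```

For example, grid_thws = [(1, 2, 2), (2, 3, 4)] implies L = 4 + 24 = 28. The second video occupies rows 4 through 27.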
Functions
- learnable_2d_interp_pos_emb: Applies learnable 2D interpolated position embedding on GPU.