Python module

max.pipelines.architectures.kimik2_5

Kimi K2.5 mixture-of-experts architecture for text generation.

`KimiK2_5Config`

class max.pipelines.architectures.kimik2_5.KimiK2_5Config(*, devices, dtype, bos_token_id, eos_token_id, ignore_index, media_placeholder_token_id, pad_token_id, tie_word_embeddings, use_unified_vision_chunk, video_placeholder, vision_config, llm_config)

source

Bases: ArchVLConfigWithTextSubconfig, ArchConfigWithKVCache

Configuration for Kimi-K2.5 models.

Parameters:

devices (list[DeviceRef])
dtype (DType)
bos_token_id (int)
eos_token_id (int)
ignore_index (int)
media_placeholder_token_id (int)
pad_token_id (int)
tie_word_embeddings (bool)
use_unified_vision_chunk (bool | None)
video_placeholder (str | None)
vision_config (VisionConfig)
llm_config (KimiK2_5TextConfig)

`bos_token_id`

bos_token_id: int

source

ID of the beginning-of-sequence (BOS) token.

`devices`

devices: list[DeviceRef]

source

Devices that the Kimi-K2.5 model is parallelized over.

`dtype`

dtype: DType

source

DType of the Kimi-K2.5 model weights.

`eos_token_id`

eos_token_id: int

source

ID of the end-of-sequence (EOS) token.

`get_kv_params()`

get_kv_params()

source

Returns the KV cache parameters from the embedded LLM config.

Return type: KVCacheParamInterface

`get_num_layers()`

static get_num_layers(huggingface_config)

source

Parameters: huggingface_config (AutoConfig)
Return type: int

`ignore_index`

ignore_index: int

source

Index that should be ignored when calculating loss (e.g., for padding).

`initialize()`

classmethod initialize(pipeline_config, model_config=None)

source

Initializes a KimiK2_5Config instance from pipeline configuration.

Parameters:

pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
model_config (MAXModelConfig | None)

Returns:

A KimiK2_5Config instance with fields initialized from config.

Return type:

Self

`initialize_from_config()`

classmethod initialize_from_config(pipeline_config, huggingface_config, llm_config=None)

source

Initializes a KimiK2_5Config from pipeline and HuggingFace configs.

This method creates a config instance with all fields that can be determined from the pipeline and HuggingFace configurations, without needing the state_dict. Fields that depend on the state_dict should be set via the finalize() method.

Parameters:

pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
huggingface_config (AutoConfig) – HuggingFace model configuration.
llm_config (KimiK2_5TextConfig | None) – Pre-initialized DeepseekV3 configuration.

Returns:

A KimiK2_5Config instance ready for finalization.

Return type:

Self
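The two-phase pattern described above (build what the configs allow, then fill in the fields that depend on the weights) can be sketched as follows. `SketchConfig`, its fields, and the `embed_tokens.weight` key are hypothetical stand-ins for illustration only; they are not the MAX classes or their real signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical two-phase config; not the real KimiK2_5Config."""

    hidden_size: int
    dtype: Optional[str] = None  # unknown until the weights are inspected

    @classmethod
    def initialize_from_config(cls, hf_config: dict) -> "SketchConfig":
        # Phase 1: only fields derivable from the configuration.
        return cls(hidden_size=hf_config["hidden_size"])

    def finalize(self, state_dict_dtypes: dict) -> None:
        # Phase 2: fields that can only be read from the loaded weights.
        self.dtype = state_dict_dtypes["embed_tokens.weight"]


cfg = SketchConfig.initialize_from_config({"hidden_size": 7168})
cfg.finalize({"embed_tokens.weight": "bfloat16"})
```

Splitting initialization this way lets callers size buffers and caches before the weights are read.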

`llm_config`

llm_config: KimiK2_5TextConfig

source

Language model configuration using DeepseekV3 architecture.

`media_placeholder_token_id`

media_placeholder_token_id: int

source

Token ID used as a placeholder for media (e.g., images, video frames) within sequences.

`pad_token_id`

pad_token_id: int

source

Token ID used for padding sequences to uniform length.

`tie_word_embeddings`

tie_word_embeddings: bool

source

Whether to share (tie) the input and output word embeddings in the language model.

`use_unified_vision_chunk`

use_unified_vision_chunk: bool | None

source

Whether to use a unified chunk for vision inputs.

`video_placeholder`

video_placeholder: str | None

source

Placeholder string used to represent video segments in input text.

`vision_config`

vision_config: VisionConfig

source

Vision encoder configuration.

`KimiK2_5Model`

class max.pipelines.architectures.kimik2_5.KimiK2_5Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE, max_batch_size=1)

source

Bases: AlwaysSignalBuffersMixin, MultiGraphPipelineModelWithKVCache[KimiK2_5TextAndVisionContext]

A Kimi-K2.5 pipeline model for multimodal text generation.

Parameters:

pipeline_config (PipelineConfig)
session (InferenceSession)
devices (list[Device])
kv_cache_config (KVCacheConfig)
weights (Weights)
adapter (WeightsAdapter | None)
return_logits (ReturnLogits)
return_hidden_states (ReturnHiddenStates)
max_batch_size (int)

`batch_processor_cls`

batch_processor_cls

source

alias of KimiK2_5BatchProcessor

`execute()`

execute(model_inputs)

source

Executes the graph with the given inputs.

Parameters: model_inputs (ModelInputs) – The model inputs to execute, containing tensors and any other required data for model execution.
Returns: ModelOutputs containing the pipeline’s output tensors.
Return type: ModelOutputs

This is an abstract method that must be implemented by concrete PipelineModels to define their specific execution logic.

`get_kv_params()`

classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

source

Returns the KV cache params for the pipeline model.

Delegates to model_config_cls.construct_kv_params(...). Subclasses with custom KV behavior should override this method.

Parameters:

huggingface_config (AutoConfig)
pipeline_config (PipelineConfig)
devices (list[DeviceRef])
kv_cache_config (KVCacheConfig)
cache_dtype (DType)

Return type:

KVCacheParamInterface

`language_model`

language_model: Model

source

The compiled language model for text generation.

`load_model()`

load_model(session)

source

Build, compile, and load vision and language graphs into session.

Parameters: session (InferenceSession)
Return type: tuple[Model | None, Model]

`model`

property model: Model

source

Exposes the language model for graph capture and replay.

Only the language model is captured, because vision runs during prefill.

`model_config_cls`

model_config_cls

source

alias of KimiK2_5Config

`prepare_initial_token_inputs()`

prepare_initial_token_inputs(replica_batches, kv_cache_inputs=None, return_n_logits=1)

source

Delegates to the batch processor; typed for Eagle subclasses.

Parameters:

replica_batches (Sequence[Sequence[KimiK2_5TextAndVisionContext]])
kv_cache_inputs (KVCacheInputsInterface[Buffer, Buffer] | None)
return_n_logits (int)

Return type:

KimiK2_5ModelInputs

`release()`

release(request_id)

source

Release vision encoder cache entries for a completed request.

Parameters: request_id (RequestID)
Return type: None

`vision_model`

vision_model: Model | None

source

The compiled vision model for processing images.

`KimiK2_5ModelInputs`

class max.pipelines.architectures.kimik2_5.KimiK2_5ModelInputs(tokens, input_row_offsets, signal_buffers, host_input_row_offsets, batch_context_lengths, image_token_indices=None, precomputed_image_embeddings=None, pixel_values=None, grid_thws=None, cu_seqlens=None, max_seqlen=None, vision_position_ids=None, language_image_embeddings=<factory>, language_image_token_indices=<factory>, eplb_counter_buffers=<factory>, *, kv_cache_inputs=None, lora=None, hidden_states=None, return_n_logits, data_parallel_splits, ep_inputs=())

source

Bases: DeepseekV3Inputs

A class representing inputs for the Kimi-K2.5 model.

This class encapsulates the input tensors required for Kimi-K2.5 model execution, including both text and vision inputs. Vision inputs are optional and can be None for text-only processing.

Parameters:

tokens (Buffer)
input_row_offsets (Buffer)
signal_buffers (list[Buffer])
host_input_row_offsets (Buffer)
batch_context_lengths (list[Buffer])
image_token_indices (list[Buffer] | None)
precomputed_image_embeddings (list[Buffer] | None)
pixel_values (list[Buffer] | None)
grid_thws (list[Buffer] | None)
cu_seqlens (list[Buffer] | None)
max_seqlen (list[Buffer] | None)
vision_position_ids (list[Buffer] | None)
language_image_embeddings (list[Buffer])
language_image_token_indices (list[Buffer])
eplb_counter_buffers (list[Buffer])
kv_cache_inputs (KVCacheInputsInterface[Buffer, Buffer] | None)
lora (LoRAInputs | None)
hidden_states (Buffer | list[Buffer] | None)
return_n_logits (Buffer)
data_parallel_splits (Buffer)
ep_inputs (tuple[Buffer, ...])

`buffers`

property buffers: tuple[Buffer, ...]

source

Returns the language model input ABI tuple.

`cu_seqlens`

cu_seqlens: list[Buffer] | None = None

source

Cumulative sequence lengths for full attention per device.

`eplb_counter_buffers`

eplb_counter_buffers: list[Buffer]

source

Per-device EP counter buffers for the language model graph.

`grid_thws`

grid_thws: list[Buffer] | None = None

source

Grid dimensions (temporal, height, width) for each image/video, shape (n_images, 3) per device.
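As a rough illustration of how `grid_thws`, `cu_seqlens`, and `max_seqlen` relate, the sketch below derives patch counts and cumulative offsets from (temporal, height, width) triples. It assumes each image contributes `t * h * w` patches as a single attention sequence; the actual batch processor may segment sequences differently (for example per frame), and the helper names are illustrative.

```python
from itertools import accumulate


def patches_per_image(grid_thws):
    """Number of patches for each (t, h, w) grid, assuming t * h * w."""
    return [t * h * w for t, h, w in grid_thws]


def cumulative_seqlens(grid_thws):
    """Prefix sums with a leading 0, the usual cu_seqlens layout."""
    return [0, *accumulate(patches_per_image(grid_thws))]


grid_thws = [(1, 4, 6), (2, 2, 2)]      # two images, shape (n_images, 3)
counts = patches_per_image(grid_thws)    # [24, 8]
cu_seqlens = cumulative_seqlens(grid_thws)  # [0, 24, 32]
max_seqlen = max(counts)                 # 24
```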

`has_vision_inputs`

property has_vision_inputs: bool

source

Check if this input contains vision data.

`image_token_indices`

image_token_indices: list[Buffer] | None = None

source

Per-device pre-computed multimodal merge indices for the image embeddings.

These are the locations of the image_token_id in the inputs fed to the model.

Some indices may be negative, which means that they are ignored by the multimodal merge.
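A minimal sketch of a merge that skips negative indices, using plain Python lists in place of device buffers. It assumes `indices[i]` is the destination row for image embedding `i`; the real kernel and its exact index semantics are not shown on this page, so treat the function as illustrative.

```python
def merge_image_embeddings(text_embeds, image_embeds, indices):
    """Scatter image rows into a copy of text_embeds, skipping negatives."""
    merged = [row[:] for row in text_embeds]
    for src, dst in enumerate(indices):
        if dst < 0:
            # Negative destination: this image row is ignored.
            continue
        merged[dst] = image_embeds[src][:]
    return merged


text = [[0.0, 0.0] for _ in range(4)]
image = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
merged = merge_image_embeddings(text, image, [1, -1, 3])
# Rows 1 and 3 now hold image embeddings; the second image row was skipped.
```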

`language_image_embeddings`

language_image_embeddings: list[Buffer]

source

Per-device image embeddings for the language model graph. Shape [0, hidden_size] during decode, [num_patches, hidden_size] during prefill.

`language_image_token_indices`

language_image_token_indices: list[Buffer]

source

Per-device scatter indices for the language model graph. Shape [0] during decode, [num_image_tokens] during prefill.

`max_seqlen`

max_seqlen: list[Buffer] | None = None

source

Maximum sequence length for full attention for vision inputs per device.

`pixel_values`

pixel_values: list[Buffer] | None = None

source

Pixel values for vision inputs.

`precomputed_image_embeddings`

precomputed_image_embeddings: list[Buffer] | None = None

source

Pre-computed image embeddings from VisionEncoderCache.

`vision_position_ids`

vision_position_ids: list[Buffer] | None = None

source

Vision rotary position IDs per device.

`KimiK2_5ReasoningParser`

class max.pipelines.architectures.kimik2_5.KimiK2_5ReasoningParser(think_start_token_id, think_end_token_id, tool_section_start_token_id=None)

source

Bases: ReasoningParser

Kimi K2.5 reasoning parser for … sections.

Per Moonshot’s “Interleaved Thinking” design (see https://platform.moonshot.ai/docs/guide/use-kimi-k2-thinking-model and https://huggingface.co/moonshotai/Kimi-K2.5), a single assistant turn can interleave multiple <think>...</think> blocks with <|tool_calls_section_begin|>...<|tool_calls_section_end|> blocks.

A reasoning span ends on </think> or <|tool_calls_section_begin|>; the model may open the tool-call section directly from inside the prefilled <think> block without a closing </think>. The section marker is left as content rather than consumed as a delimiter, so the tool parser (which only sees content) receives the whole section.

Reasoning may begin implicitly, without an explicit <think> token, when the chat template prefilled the assistant turn already inside a thinking block.

Reasoning can be disabled through the chat template by including a </think> token at the end of the prompt; this is detected by will_reason_after_prompt().

Parameters:

think_start_token_id (int)
think_end_token_id (int)
tool_section_start_token_id (int | None)

`from_tokenizer()`

async classmethod from_tokenizer(tokenizer)

source

Construct a reasoning parser from a tokenizer.

Parameters: tokenizer (PipelineTokenizer[Any, Any, Any])
Return type: KimiK2_5ReasoningParser

`reasoning_end_token_id()`

async classmethod reasoning_end_token_id(tokenizer)

source

Returns the </think> token id.

Parameters: tokenizer (PipelineTokenizer[Any, Any, Any])
Return type: int | None

`stream()`

stream(delta_token_ids, is_currently_reasoning=True)

source

Identify a reasoning span within a streaming delta chunk.

When is_currently_reasoning=False and the chunk contains no <think> opener, returns an empty span so non-reasoning chunks (turns where the chat template prefilled </think>, or any chunk after reasoning ended in a prior chunk) aren’t misclassified as reasoning.

Parameters:

delta_token_ids (Sequence[int])
is_currently_reasoning (bool)

Return type:

ParsedReasoningDelta
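The chunk-splitting behavior described above can be approximated as follows. This is a sketch under stated assumptions: the token IDs are made up, the function returns a plain tuple rather than a ParsedReasoningDelta (whose fields are not documented here), and it handles at most one reasoning span per chunk.

```python
THINK_START, THINK_END, TOOL_SECTION_START = 1, 2, 3  # hypothetical token IDs


def split_delta(delta_token_ids, is_currently_reasoning=True):
    """Return (reasoning_ids, content_ids, still_reasoning) for one chunk."""
    ids = list(delta_token_ids)
    if is_currently_reasoning:
        start = 0
    elif THINK_START in ids:
        start = ids.index(THINK_START)
    else:
        # Not reasoning and no opener: everything is ordinary content.
        return [], ids, False

    # First end-of-reasoning delimiter at or after the span start.
    end = next(
        (i for i in range(start, len(ids))
         if ids[i] in (THINK_END, TOOL_SECTION_START)),
        len(ids),
    )
    reasoning = [t for t in ids[start:end] if t != THINK_START]
    if end < len(ids) and ids[end] == THINK_END:
        tail = ids[end + 1:]   # the closing marker is consumed
    else:
        tail = ids[end:]       # the tool-section marker stays in the content
    return reasoning, ids[:start] + tail, end == len(ids)
```

For example, `split_delta([10, 3, 20])` keeps token `3` at the head of the content so that a downstream tool parser still sees the section marker.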

`will_reason_after_prompt()`

will_reason_after_prompt(prompt_token_ids)

source

Predicts whether the model will emit reasoning after this prompt.

Kimi K2.5 chat templates emit <think> to open the new assistant turn’s reasoning section, and </think> to close the prior assistant turn’s reasoning section.

Scan right-to-left and return based on the first delimiter seen:

<think> → reasoning is currently open → True.
</think> (or <|tool_calls_section_begin|>) → reasoning is currently closed → False.
No delimiters at all → reasoning is not in use → False.

Uses the same end-of-reasoning delimiters as stream() so both agree on where reasoning ends.

Parameters: prompt_token_ids (Sequence[int])
Return type: bool
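The right-to-left scan can be sketched in a few lines. The token IDs below are placeholders chosen for illustration; the real parser resolves them from the tokenizer.

```python
THINK_START, THINK_END, TOOL_SECTION_START = 1, 2, 3  # hypothetical token IDs


def will_reason(prompt_token_ids):
    """Decide from the last delimiter in the prompt whether reasoning is open."""
    for token in reversed(prompt_token_ids):
        if token == THINK_START:
            return True   # reasoning is currently open
        if token in (THINK_END, TOOL_SECTION_START):
            return False  # reasoning was closed
    return False          # no delimiters: reasoning is not in use
```

Scanning from the end means earlier turns in a multi-turn prompt cannot mask the state of the newest assistant turn.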

`KimiK2_5TextConfig`

class max.pipelines.architectures.kimik2_5.KimiK2_5TextConfig(*, dtype: 'DType', kv_params: 'KVCacheParamInterface', devices: 'list[DeviceRef]', use_subgraphs: 'bool' = True, data_parallel_degree: 'int' = 1, vocab_size: 'int' = 129280, hidden_size: 'int' = 7168, intermediate_size: 'int' = 18432, moe_intermediate_size: 'int' = 2048, moe_layer_freq: 'int' = 1, num_hidden_layers: 'int' = 61, num_attention_heads: 'int' = 128, num_key_value_heads: 'int' = 128, n_shared_experts: 'int' = 1, n_routed_experts: 'int' = 256, routed_scaling_factor: 'float' = 2.5, kv_lora_rank: 'int' = 512, q_lora_rank: 'int' = 1536, qk_rope_head_dim: 'int' = 64, v_head_dim: 'int' = 128, qk_nope_head_dim: 'int' = 128, topk_method: 'str' = 'greedy', n_group: 'int' = 8, topk_group: 'int' = 4, num_experts_per_tok: 'int' = 8, first_k_dense_replace: 'int' = 3, norm_topk_prob: 'bool' = True, hidden_act: 'str' = 'silu', max_position_embeddings: 'int' = 4096, max_seq_len: 'int' = 163840, rms_norm_eps: 'float' = 1e-06, tie_word_embeddings: 'bool' = False, rope_theta: 'float' = 10000.0, rope_scaling: 'dict[str, Any] | None' = None, rope_interleave: 'bool' = True, scoring_func: 'str' = 'sigmoid', attention_bias: 'bool' = False, attention_dropout: 'float' = 0.0, norm_dtype: 'DType' = bfloat16, gate_dtype: 'DType | None' = None, correction_bias_dtype: 'DType | None' = None, max_batch_context_length: 'int' = 131072, quant_config: 'QuantConfig | None' = None, dense_mlp_layers_without_quant: 'frozenset[int]' = frozenset(), ep_config: 'EPConfig | None' = None, graph_mode: 'str' = 'auto', return_logits: 'ReturnLogits' = <ReturnLogits.LAST_TOKEN: 'last_token'>, return_hidden_states: 'ReturnHiddenStates' = <ReturnHiddenStates.NONE: 'none'>, eagle_aux_hidden_state_layer_ids: 'list[int] | None' = None, eplb_profile_enabled: 'bool' = False)

source

Bases: DeepseekV3Config

Parameters:

dtype (DType)
kv_params (KVCacheParamInterface)
devices (list[DeviceRef])
use_subgraphs (bool)
data_parallel_degree (int)
vocab_size (int)
hidden_size (int)
intermediate_size (int)
moe_intermediate_size (int)
moe_layer_freq (int)
num_hidden_layers (int)
num_attention_heads (int)
num_key_value_heads (int)
n_shared_experts (int)
n_routed_experts (int)
routed_scaling_factor (float)
kv_lora_rank (int)
q_lora_rank (int)
qk_rope_head_dim (int)
v_head_dim (int)
qk_nope_head_dim (int)
topk_method (str)
n_group (int)
topk_group (int)
num_experts_per_tok (int)
first_k_dense_replace (int)
norm_topk_prob (bool)
hidden_act (str)
max_position_embeddings (int)
max_seq_len (int)
rms_norm_eps (float)
tie_word_embeddings (bool)
rope_theta (float)
rope_scaling (dict[str, Any] | None)
rope_interleave (bool)
scoring_func (str)
attention_bias (bool)
attention_dropout (float)
norm_dtype (DType)
gate_dtype (DType | None)
correction_bias_dtype (DType | None)
max_batch_context_length (int)
quant_config (QuantConfig | None)
dense_mlp_layers_without_quant (frozenset[int])
ep_config (EPConfig | None)
graph_mode (str)
return_logits (ReturnLogits)
return_hidden_states (ReturnHiddenStates)
eagle_aux_hidden_state_layer_ids (list[int] | None)
eplb_profile_enabled (bool)
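To make the signature defaults more concrete, the sketch below derives a few quantities from them. The values are copied from the signature above and may be overridden by a checkpoint's HuggingFace config. The layer rule (a layer is MoE when its index is at least `first_k_dense_replace` and divisible by `moe_layer_freq`) and the latent KV width are assumptions taken from the DeepSeek-V3 reference design, not from this page.

```python
# Defaults copied from the KimiK2_5TextConfig signature; real checkpoints
# may override any of them.
num_hidden_layers = 61
first_k_dense_replace = 3
moe_layer_freq = 1
n_routed_experts = 256
n_shared_experts = 1
num_experts_per_tok = 8
qk_nope_head_dim = 128
qk_rope_head_dim = 64
kv_lora_rank = 512

# Assumed DeepSeek-V3 convention for choosing MoE layers.
moe_layers = [
    i for i in range(num_hidden_layers)
    if i >= first_k_dense_replace and i % moe_layer_freq == 0
]
num_moe_layers = len(moe_layers)                        # 58
num_dense_layers = num_hidden_layers - num_moe_layers   # 3

# Query/key head width is the non-rotary part plus the rotary part.
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim       # 192

# Experts evaluated per token: routed top-k plus the shared experts.
active_experts_per_token = num_experts_per_tok + n_shared_experts  # 9
routed_fraction = num_experts_per_tok / n_routed_experts           # 0.03125

# Assumed MLA cache width per token per layer: latent KV plus rotary key.
latent_kv_width = kv_lora_rank + qk_rope_head_dim       # 576
```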

`calculate_max_seq_len()`

classmethod calculate_max_seq_len(pipeline_config, huggingface_config, model_config=None)

source

Parameters:

pipeline_config (PipelineConfig)
huggingface_config (AutoConfig)
model_config (MAXModelConfig | None)

Return type:

int

`initialize()`

classmethod initialize(pipeline_config, model_config=None)

source

Initializes a DeepseekV3Config instance from pipeline configuration.

This method creates a config instance with all fields that can be determined from the pipeline configuration, without needing the state_dict. Fields that depend on the state_dict (like norm_dtype, quant_config, etc.) should be set via the finalize() method.

Parameters:

pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
model_config (MAXModelConfig | None)

Returns:

An initialized DeepseekV3Config instance.

Return type:

Self

`KimiToolParser`

class max.pipelines.architectures.kimik2_5.KimiToolParser

source

Bases: StructuralTagToolParser

Parses Kimi K2.5-style tool calls from model responses.

Kimi K2.5 wraps tool calls in section/call markers and embeds the function name as a compound functions.{name}:{idx} identifier before a dedicated argument-begin marker. Arguments are raw JSON, which the base class can diff directly.
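A standalone sketch of the wire format, for orientation only. The section and call markers match the class constants documented below; the argument-begin marker name `<|tool_call_argument_begin|>` is an assumption not stated on this page, and `parse_tool_calls` is a hypothetical helper, not the parser's real API.

```python
import json
import re

SECTION_BEGIN = "<|tool_calls_section_begin|>"
SECTION_END = "<|tool_calls_section_end|>"
CALL_BEGIN = "<|tool_call_begin|>"
CALL_END = "<|tool_call_end|>"
ARG_BEGIN = "<|tool_call_argument_begin|>"  # assumed marker name

CALL_RE = re.compile(
    re.escape(CALL_BEGIN)
    + r"\s*functions\.(?P<name>[\w.-]+):(?P<idx>\d+)\s*"
    + re.escape(ARG_BEGIN)
    + r"(?P<args>.*?)"
    + re.escape(CALL_END),
    re.DOTALL,
)


def parse_tool_calls(text):
    """Extract (name, index, arguments) triples from one tool-call section."""
    start = text.find(SECTION_BEGIN)
    if start < 0:
        return []
    end = text.find(SECTION_END, start)
    body = text[start + len(SECTION_BEGIN): end if end >= 0 else len(text)]
    return [
        (m["name"], int(m["idx"]), json.loads(m["args"]))
        for m in CALL_RE.finditer(body)
    ]


sample = (
    SECTION_BEGIN + CALL_BEGIN + "functions.get_weather:0" + ARG_BEGIN
    + '{"city": "Paris"}' + CALL_END + SECTION_END
)
calls = parse_tool_calls(sample)
```

This version parses a complete section in one pass; the documented parser diffs the raw JSON arguments incrementally through its base class, which this sketch does not attempt.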

`CALL_BEGIN`

CALL_BEGIN: ClassVar[str] = '<|tool_call_begin|>'

source

`CALL_END`

CALL_END: ClassVar[str] = '<|tool_call_end|>'

source

`SECTION_BEGIN`

SECTION_BEGIN: ClassVar[str] = '<|tool_calls_section_begin|>'

source

`SECTION_END`

SECTION_END: ClassVar[str] = '<|tool_calls_section_end|>'

source

`XGRAMMAR_FORMAT`

XGRAMMAR_FORMAT = 'kimi'

source

`generate_tool_call_grammar()`

static generate_tool_call_grammar(response_format_schema=None, tools=None, tokenizer=None, backend='xgrammar', tool_choice=None, **kwargs)

source

Generates a grammar for constrained decoding of Kimi tool calls.

With the default backend="xgrammar" this returns a serialized xgrammar StructuralTag (which constrains each call’s arguments to that tool’s JSON schema). With backend="llguidance" it returns a Lark grammar whose argument body is freeform.

Kimi K2.5 performs “interleaved thinking”: a single assistant turn can interleave multiple <think>...</think> reasoning blocks with multiple <|tool_calls_section_begin|>...<|tool_calls_section_end|> tool-call sections, and ends the turn with <|im_end|>. The grammar admits up to _MAX_TOOL_CALL_SECTIONS sections, an optional reasoning block before each, and an optional trailing <|im_end|> so the model can stop before the cap.

Structural markers, <think>/</think>, and <|im_end|> are referenced as single-token symbols (<[id]>) resolved from tokenizer — they are atomic special tokens, so the freeform /[\s\S]*/ argument and reasoning bodies terminate cleanly at the closing marker. Reasoning enforced this way is plain text; a mid-reasoning special token is not admitted under forced decoding.

When response_format_schema is provided, the grammar also accepts a JSON response matching the schema (the model’s first tokens select the branch).

Parameters:

response_format_schema (dict[str, Any] | None) – Optional JSON schema dict. When provided, the grammar also accepts a JSON response matching the schema.
tools (list[dict[str, Any]] | None) – Optional list of OpenAI-style tool dicts. None accepts any length-capped identifier as the function name.
tokenizer (PipelineTokenizer[Any, Any, Any] | None) – Pipeline tokenizer used to resolve special-token IDs. Required.
backend (str) – Grammar backend, either "xgrammar" (the default) or "llguidance".
tool_choice (str | dict[str, Any] | None)
**kwargs (Any) – Ignored; accepts future kwargs.

Returns:

A grammar string compatible with the selected backend.

Return type:

str

`VisionConfig`

class max.pipelines.architectures.kimik2_5.VisionConfig(dtype, devices, init_pos_emb_height, init_pos_emb_time, init_pos_emb_width, merge_kernel_size, mm_hidden_size, patch_size, projector_ln_eps, text_hidden_size, vt_hidden_size, vt_intermediate_size, vt_num_attention_heads, vt_num_hidden_layers, merge_type=None, mm_projector_type=None, model_type='', pos_emb_type=None, projector_hidden_act=None, video_attn_type=None, has_bias=True, in_channels=3, rope_max_height=512, rope_max_width=512, rope_theta=10000.0)

source

Bases: object

Vision configuration for Kimi-K2.5 models with required fields.

Parameters:

dtype (DType)
devices (list[DeviceRef])
init_pos_emb_height (int)
init_pos_emb_time (int)
init_pos_emb_width (int)
merge_kernel_size (list[int])
mm_hidden_size (int)
patch_size (int)
projector_ln_eps (float)
text_hidden_size (int)
vt_hidden_size (int)
vt_intermediate_size (int)
vt_num_attention_heads (int)
vt_num_hidden_layers (int)
merge_type (str | None)
mm_projector_type (str | None)
model_type (str)
pos_emb_type (str | None)
projector_hidden_act (str | None)
video_attn_type (str | None)
has_bias (bool)
in_channels (int)
rope_max_height (int)
rope_max_width (int)
rope_theta (float)

`devices`

devices: list[DeviceRef]

source

Devices that the Kimi-K2.5 vision encoder model is parallelized over.

`dtype`

dtype: DType

source

DType of the Kimi-K2.5 vision model weights.

`finalize()`

finalize(vision_dtype)

source

Finalize VisionConfig with state_dict dependent fields.

Parameters: vision_dtype (DType)
Return type: None

`has_bias`

has_bias: bool = True

source

Whether linear projections in the vision transformer include bias terms.

`in_channels`

in_channels: int = 3

source

Number of input image channels (3 for RGB).

`init_pos_emb_height`

init_pos_emb_height: int

source

Height of the initial position embedding.

`init_pos_emb_time`

init_pos_emb_time: int

source

Temporal size of the initial position embedding.

`init_pos_emb_width`

init_pos_emb_width: int

source

Width of the initial position embedding.

`initialize_from_config()`

classmethod initialize_from_config(pipeline_config, hf_vision_config, huggingface_config=None)

source

Initialize VisionConfig from HuggingFace vision config.

Parameters:

pipeline_config (PipelineConfig) – MAX Engine pipeline configuration.
hf_vision_config (AutoConfig) – HuggingFace vision sub-config.
huggingface_config (AutoConfig | None) – Full HuggingFace model config, used to derive text_hidden_size from text_config.hidden_size when hf_vision_config does not carry the attribute directly (e.g. moonshotai/Kimi-VL-A3B-Instruct vs nvidia/Kimi-K2.5-NVFP4).

Return type:

VisionConfig

Note: dtype fields will be set to defaults and should be updated via finalize() once state_dict is available.
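The fallback for `text_hidden_size` described in the parameters can be sketched with simple attribute lookups. `resolve_text_hidden_size` is a hypothetical helper, and `SimpleNamespace` objects stand in for the HuggingFace config objects; the real method may resolve the value differently.

```python
from types import SimpleNamespace


def resolve_text_hidden_size(hf_vision_config, huggingface_config=None):
    """Prefer the vision sub-config, then fall back to text_config.hidden_size."""
    value = getattr(hf_vision_config, "text_hidden_size", None)
    if value is not None:
        return value
    if huggingface_config is not None:
        return huggingface_config.text_config.hidden_size
    raise ValueError("text_hidden_size is not available from either config")


# The vision sub-config carries the attribute directly.
direct = resolve_text_hidden_size(SimpleNamespace(text_hidden_size=7168))

# The attribute is missing, so the full model config supplies it.
full = SimpleNamespace(text_config=SimpleNamespace(hidden_size=2048))
fallback = resolve_text_hidden_size(SimpleNamespace(), full)
```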

`merge_kernel_size`

merge_kernel_size: list[int]

source

Kernel size for the merge operation.

`merge_type`

merge_type: str | None = None

source

Type of the merge operation.

`mm_hidden_size`

mm_hidden_size: int

source

Hidden size of the multi-modal hidden layer.

`mm_projector_type`

mm_projector_type: str | None = None

source

Type of the multi-modal projector.

`model_type`

model_type: str = ''

source

Type of the model.

`patch_size`

patch_size: int

source

Size of the patch.

`pos_emb_type`

pos_emb_type: str | None = None

source

Type of the position embedding.

`projector_hidden_act`

projector_hidden_act: str | None = None

source

Activation function for the projector.

`projector_ln_eps`

projector_ln_eps: float

source

Epsilon for the layer normalization.

`rope_max_height`

rope_max_height: int = 512

source

Maximum grid height for RoPE frequency precomputation. Hardcoded to 512 in https://huggingface.co/nvidia/Kimi-K2.5-NVFP4/blob/main/modeling_kimi_k25.py#L571

`rope_max_width`

rope_max_width: int = 512

source

Maximum grid width for RoPE frequency precomputation. Hardcoded to 512 in https://huggingface.co/nvidia/Kimi-K2.5-NVFP4/blob/main/modeling_kimi_k25.py#L571

`rope_theta`

rope_theta: float = 10000.0

source

Base for the RoPE inverse-frequency exponent. Hardcoded to 10000 in https://huggingface.co/nvidia/Kimi-K2.5-NVFP4/blob/main/modeling_kimi_k25.py#L379
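For reference, the standard RoPE inverse-frequency formula raises the base to the power `-2i / dim`. The sketch below applies it in one dimension and precomputes an angle table for a range of positions, in the way a bound such as `rope_max_height` would be used. The function names and the `dim` argument are illustrative, and the vision encoder's actual height and width layout is not reproduced here.

```python
def rope_inv_freq(dim, theta=10000.0):
    """Inverse frequencies theta ** (-2i / dim) for i in [0, dim / 2)."""
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]


def rope_angles(max_positions, dim, theta=10000.0):
    """Precomputed rotation angles, one row per position."""
    inv_freq = rope_inv_freq(dim, theta)
    return [[pos * f for f in inv_freq] for pos in range(max_positions)]


freqs = rope_inv_freq(4)          # two frequencies: 1.0 and about 0.01
table = rope_angles(512, 64)      # 512 positions, 32 angles each
```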

`text_hidden_size`

text_hidden_size: int

source

Hidden size of the text hidden layer.

`video_attn_type`

video_attn_type: str | None = None

source

Type of the video attention.

`vt_hidden_size`

vt_hidden_size: int

source

Hidden size of the video hidden layer.

`vt_intermediate_size`

vt_intermediate_size: int

source

Intermediate size of the video hidden layer.

`vt_num_attention_heads`

vt_num_attention_heads: int

source

Number of attention heads of the video hidden layer.

`vt_num_hidden_layers`

vt_num_hidden_layers: int

source

Number of hidden layers of the video hidden layer.
