IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python class

MemoryEstimator

class max.pipelines.MemoryEstimator

Bases: object

Estimates available memory for pipeline model allocation.

available_kv_cache_memory()

classmethod available_kv_cache_memory(model_weights_size, activation_memory_size, model_config, devices, signal_buffer_size=0)

Estimates the memory available for the KV cache after accounting for model weights, activations, and signal buffers.

Parameters:

  • model_weights_size (int) – Size of the model weights, in bytes.
  • activation_memory_size (int) – Size of the activation memory, in bytes.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • signal_buffer_size (int) – Size of the P2P signal buffers, in bytes. Defaults to 0.

Returns:

Available KV cache memory in bytes.

Return type:

int
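
The arithmetic implied by this description can be sketched in plain Python. This is a minimal sketch, not the library's implementation: `sketch_available_kv_cache` and its `free_bytes` argument are hypothetical names, clamping at zero is an assumption, and the real method may apply further adjustments derived from `model_config` that this page does not document.

```python
def sketch_available_kv_cache(
    free_bytes: int,
    model_weights_size: int,
    activation_memory_size: int,
    signal_buffer_size: int = 0,
) -> int:
    """Hypothetical helper: bytes left for the KV cache."""
    # Static usage: weights plus activations plus signal buffers.
    static_bytes = model_weights_size + activation_memory_size + signal_buffer_size
    # Assumption: clamp at zero instead of returning a negative budget.
    return max(0, free_bytes - static_bytes)


GiB = 1024**3
# 80 GiB free, 16 GiB of weights, 4 GiB of activations, no signal buffers.
budget = sketch_available_kv_cache(80 * GiB, 16 * GiB, 4 * GiB)
print(budget // GiB)  # 60
```

Whatever remains after the static allocations is the most the KV cache could claim under these assumptions.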

estimate_memory_footprint()

classmethod estimate_memory_footprint(pipeline_config, model_config, arch_config, devices, model_weights_size, activation_memory_size, signal_buffer_size=0, arch=None)

Estimates the memory footprint and validates that the configured max_length and max_batch_size fit.

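Parameters:

  • pipeline_config
  • model_config (MAXModelConfig) – The model configuration.
  • arch_config
  • devices (list[Device]) – The list of devices on which the model will run.
  • model_weights_size (int) – Size of model weights.
  • activation_memory_size (int) – Size of activation memory.
  • signal_buffer_size (int) – Size of P2P signal buffers. Defaults to 0.
  • arch – Defaults to None.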

Return type:

None

free_memory()

classmethod free_memory(devices)

Returns the total free memory available across all provided devices.

Parameters:

  • devices (list[Device]) – The devices whose free memory is summed.

Return type:

int
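
As an illustration of "total across all provided devices", the sketch below sums a per-device free-memory figure. `FakeDevice` and its `free_bytes` field are hypothetical stand-ins invented for this example. They are not the real `Device` class, and how the library queries each device is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device; not the real Device class."""

    label: str
    free_bytes: int


def sketch_free_memory(devices: list[FakeDevice]) -> int:
    # Total free memory is the sum over every device provided.
    return sum(d.free_bytes for d in devices)


GiB = 1024**3
devices = [FakeDevice("gpu:0", 40 * GiB), FakeDevice("gpu:1", 38 * GiB)]
total = sketch_free_memory(devices)
print(total // GiB)  # 78
```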

max_supported_sequence_length()

classmethod max_supported_sequence_length(model_weights_size, activation_memory_size, model_config, devices, arch_config, signal_buffer_size=0)

Computes the hard upper bound on tokens for a single request.

Mirrors the paged KV cache constraint: per replica, a single request cannot exceed the total number of pages per device multiplied by the page size.

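Parameters:

  • model_weights_size (int) – Size of model weights.
  • activation_memory_size (int) – Size of activation memory.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • arch_config
  • signal_buffer_size (int) – Size of P2P signal buffers. Defaults to 0.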

Return type:

int | None
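
The paged-cache bound described for this method reduces to one multiplication. The sketch below is illustrative only: `sketch_max_tokens_per_request`, `total_pages_per_device`, and `page_size` are hypothetical names, and it assumes the page size is measured in tokens. The real method derives these quantities from its arguments, and this sketch does not model the cases in which it returns `None`.

```python
def sketch_max_tokens_per_request(total_pages_per_device: int, page_size: int) -> int:
    """Hypothetical helper: upper bound on tokens for one request, per replica."""
    if total_pages_per_device < 0 or page_size <= 0:
        raise ValueError("page count must be >= 0 and page size must be > 0")
    # A request can occupy at most every page on the device.
    return total_pages_per_device * page_size


# 4096 pages of 128 tokens each.
print(sketch_max_tokens_per_request(4096, 128))  # 524288
```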

static_memory_size()

classmethod static_memory_size(model_weights_size, activation_memory_size, signal_buffer_size=0)

Calculates static memory usage: model weights plus activations plus signal buffers.

Parameters:

  • model_weights_size (int) – Size of the model weights, in bytes.
  • activation_memory_size (int) – Size of the activation memory, in bytes.
  • signal_buffer_size (int) – Size of the P2P signal buffers, in bytes. These are fixed-size allocations used by collective communication kernels. Defaults to 0.

Returns:

Total static memory usage in bytes.

Return type:

int
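
The sum described for this method is small enough to write out. This is a sketch of the documented formula using a hypothetical function name, not the library's source.

```python
def sketch_static_memory_size(
    model_weights_size: int,
    activation_memory_size: int,
    signal_buffer_size: int = 0,
) -> int:
    """Hypothetical helper: weights + activations + signal buffers, in bytes."""
    return model_weights_size + activation_memory_size + signal_buffer_size


MiB = 1024**2
# 14000 MiB of weights, 2000 MiB of activations, 512 MiB of signal buffers.
static_total = sketch_static_memory_size(
    14_000 * MiB, 2_000 * MiB, signal_buffer_size=512 * MiB
)
print(static_total // MiB)  # 16512
```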