Python module

hf_utils

Utilities for interacting with Hugging Face files and repos.

HuggingFaceFile

class max.pipelines.lib.hf_utils.HuggingFaceFile(repo_id, filename, revision=None)

A simple object for tracking Hugging Face model metadata. The repo_id is often used to load a tokenizer, whereas filename identifies the model weights file to download.

Parameters:

  • repo_id (str)
  • filename (str)
  • revision (str | None)

download()

download(force_download=False)

Download the file and return the file path where the data is saved locally.

Parameters:

force_download (bool)

Return type:

Path

exists()

exists()

Return type:

bool

filename

filename: str

repo_id

repo_id: str

revision

revision: str | None = None

size()

size()

Return type:

int | None

HuggingFaceRepo

class max.pipelines.lib.hf_utils.HuggingFaceRepo(repo_id, revision='main', trust_remote_code=False, repo_type=None)

A class for interacting with HuggingFace Repos.

Parameters:

  • repo_id (str)
  • revision (str)
  • trust_remote_code (bool)
  • repo_type (RepoType | None)

download()

download(filename, force_download=False)

Parameters:

  • filename (str)
  • force_download (bool)

Return type:

Path

encoding_for_file()

encoding_for_file(file)

Parameters:

file (str | Path)

Return type:

SupportedEncoding

file_exists()

file_exists(filename)

Parameters:

filename (str)

Return type:

bool

files_for_encoding()

files_for_encoding(encoding, weights_format=None)

Parameters:

  • encoding (SupportedEncoding)
  • weights_format (WeightsFormat | None)

Return type:

dict[WeightsFormat, list[Path]]

formats_available

property formats_available: list[WeightsFormat]

info

property info: ModelInfo

repo_id

repo_id: str

The Hugging Face repo ID. Despite its name, repo_id can be either a remote Hugging Face repo ID or a local path.

repo_type

repo_type: RepoType | None = None

The type of repo. This is inferred from the repo_id.
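Because repo_id may name either a remote repo or a local path, and repo_type is inferred from it, one plausible inference rule is sketched below. It is only an illustration under assumptions: RepoKind is a hypothetical stand-in for RepoType, and treating any existing directory as local is a guess, not the documented behavior of HuggingFaceRepo.

```python
import os
from enum import Enum


class RepoKind(Enum):
    """Hypothetical stand-in for RepoType; the real enum may differ."""

    LOCAL = "local"
    ONLINE = "online"


def infer_repo_kind(repo_id: str) -> RepoKind:
    """Guess whether repo_id names a local directory or a remote repo."""
    # Assumption: an existing directory on disk is treated as a local repo.
    if os.path.isdir(repo_id):
        return RepoKind.LOCAL
    # Anything else is assumed to be a Hub ID such as "org/model".
    return RepoKind.ONLINE
```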

revision

revision: str = 'main'

The revision to use for the repo.

size_of()

size_of(filename)

Parameters:

filename (str)

Return type:

int | None

supported_encodings

property supported_encodings: list[SupportedEncoding]

trust_remote_code

trust_remote_code: bool = False

Whether to trust remote code.

weight_files

property weight_files: dict[WeightsFormat, list[str]]
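weight_files and formats_available are documented only by their types. The sketch below shows one way such a format-to-files mapping could be built by grouping repo filenames by extension. The extension table and the plain-string format names are hypothetical stand-ins for WeightsFormat, and the real class may classify files differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension-to-format table standing in for WeightsFormat;
# the extensions the real class recognizes are not documented here.
EXTENSION_TO_FORMAT = {
    ".gguf": "gguf",
    ".safetensors": "safetensors",
    ".bin": "pytorch",
}


def group_weight_files(filenames: list[str]) -> dict[str, list[str]]:
    """Group repo-relative filenames by weights format, skipping others."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        fmt = EXTENSION_TO_FORMAT.get(PurePosixPath(name).suffix)
        if fmt is not None:  # config, tokenizer, README, etc. are ignored
            grouped[fmt].append(name)
    return dict(grouped)
```

Under the same assumptions, a formats_available-style list would simply be the keys of this mapping, e.g. list(group_weight_files(files)).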

download_weight_files()

max.pipelines.lib.hf_utils.download_weight_files(huggingface_model_id, filenames, revision=None, force_download=False, max_workers=8)

Given a Hugging Face model ID and a list of filenames, download the weight files and return a list of their local paths.

Parameters:

  • huggingface_model_id (str) – The Hugging Face model identifier, e.g. modularai/Llama-3.1-8B-Instruct-GGUF.
  • filenames (list[str]) – A list of file paths relative to the root of the Hugging Face repo. If the files are available locally, the download is skipped and the local files are used.
  • revision (str | None) – The Hugging Face revision to use. If provided, the local cache is checked directly without querying Hugging Face, saving a network call.
  • force_download (bool) – Whether to force the files to be re-downloaded, even if they are already available in the local cache or at a provided path.
  • max_workers (int) – The number of worker threads used to download files concurrently.

Return type:

list[Path]
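The parameters above describe a fan-out download: files already on disk are reused unless force_download is set, and the rest are fetched by up to max_workers threads. A minimal sketch of that pattern with concurrent.futures follows. The fetch callable is hypothetical (in practice it might wrap a huggingface_hub download call), and the revision-based cache shortcut is omitted.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def download_all(
    filenames: list[str],
    fetch: Callable[[str], Path],
    force_download: bool = False,
    max_workers: int = 8,
) -> list[Path]:
    """Resolve each filename to a local path, fetching with a thread pool.

    `fetch` is a hypothetical callable that downloads one repo-relative
    file and returns the local path it was saved to.
    """

    def resolve(name: str) -> Path:
        # Reuse a file that already exists locally unless forced to re-download.
        if not force_download and os.path.exists(name):
            return Path(name)
        return fetch(name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, not completion order.
        return list(pool.map(resolve, filenames))
```

Because ThreadPoolExecutor.map returns results in input order, the returned paths line up with filenames even though individual downloads may finish out of order.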

generate_local_model_path()

max.pipelines.lib.hf_utils.generate_local_model_path(repo_id, revision)

Generate the local filesystem path where a HuggingFace model repo is cached.

This function takes a Hugging Face repository ID and revision hash and returns the full local filesystem path where the huggingface_hub library caches the model files. The path follows the standard Hugging Face caching convention: ~/.cache/huggingface/hub/models--{org}--{model}/snapshots/{revision}

Parameters:

  • repo_id (str) – The Hugging Face repository ID in the format “org/model” (e.g. “HuggingFaceTB/SmolLM2-135M”).
  • revision (str) – The specific model revision hash to use, typically from a repo lock file.

Returns:

The absolute path to the cached model files for the specified revision. For example: “~/.cache/huggingface/hub/models--HuggingFaceTB--SmolLM2-135M/snapshots/abc123”

Return type:

str

Raises:

FileNotFoundError – If the model path does not exist locally
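The caching convention above can be reproduced with plain path arithmetic. The helper below, local_model_path, is a hypothetical sketch rather than the library's implementation: it takes the cache root as an explicit argument, whereas huggingface_hub itself also honors environment overrides such as HF_HOME and HF_HUB_CACHE.

```python
import os


def local_model_path(
    repo_id: str,
    revision: str,
    cache_dir: str = "~/.cache/huggingface/hub",
    check_exists: bool = True,
) -> str:
    """Build the huggingface_hub snapshot path for repo_id at revision."""
    # "org/model" becomes the cache folder name "models--org--model".
    folder = "models--" + repo_id.replace("/", "--")
    root = os.path.abspath(os.path.expanduser(cache_dir))
    path = os.path.join(root, folder, "snapshots", revision)
    # Mirror the documented behavior: fail if the snapshot is not on disk.
    if check_exists and not os.path.exists(path):
        raise FileNotFoundError(f"Model path does not exist locally: {path}")
    return path
```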

repo_exists_with_retry()

max.pipelines.lib.hf_utils.repo_exists_with_retry(repo_id, revision)

Wrapper around huggingface_hub.revision_exists with retry logic. Uses exponential backoff with 25% jitter, starting at 1s and doubling each retry.

We use revision_exists here instead of repo_exists because repo_exists does not accept a revision parameter.

See huggingface_hub.revision_exists for details.

Parameters:

  • repo_id (str)
  • revision (str)

Return type:

bool
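The retry schedule described above (start at 1s, double on each retry, 25% jitter) can be sketched as a generic helper. Several details are assumptions rather than documented behavior: the attempt limit (max_attempts=5), reading "25% jitter" as a random factor in [0.75, 1.25], and retrying on any Exception. The injectable sleep parameter exists only to make the helper testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    jitter: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts - 1):
        try:
            return fn()
        except Exception:
            # Nominal delays are 1s, 2s, 4s, ... each scaled by a random
            # factor in [1 - jitter, 1 + jitter] (+/-25% by default).
            delay = base_delay * (2 ** attempt)
            sleep(delay * random.uniform(1 - jitter, 1 + jitter))
    # Final attempt: any exception propagates to the caller.
    return fn()
```

To approximate repo_exists_with_retry under these assumptions, the Hub call could be wrapped in a lambda, e.g. call_with_retry(lambda: huggingface_hub.revision_exists(repo_id, revision)).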