z4ai

A lossless storage-and-distribution layer for AI model checkpoints — bit-for-bit reversible, with per-tensor random access. Most useful on collections of related checkpoints (training runs, fine-tune families, model registries) and in environments the Hugging Face Hub's Xet backend doesn't cover: self-hosted registries, internal MLOps, plain object storage.

import z4ai

blob = z4ai.compress(weights_bytes)   # smaller, self-describing
data = z4ai.decompress(blob)          # byte-identical original
assert data == weights_bytes

Its strongest case is a sequence of related checkpoints: consecutive ones are ~95–99% identical, so each can be stored as a small delta from the one before:

# store checkpoint N as the bit-exact delta from checkpoint N-1
delta    = z4ai.compress_delta(step_2000, reference=step_1000)
restored = z4ai.decompress_delta(delta, reference=step_1000)   # exact == step_2000
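To give a feel for why a delta between near-identical checkpoints is small, here is a minimal sketch that does not use z4ai at all. It assumes a plain byte-wise XOR against the reference followed by `zlib` from the standard library; z4ai's actual delta format is not described on this page and may work differently. The synthetic arrays, the 2% change rate, and the helper names `xor_delta` and `apply_delta` are all hypothetical.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two consecutive checkpoints (hypothetical data):
# step_2000 differs from step_1000 in about 2% of its values.
step_1000 = rng.standard_normal(100_000).astype(np.float32)
step_2000 = step_1000.copy()
changed = rng.choice(step_1000.size, size=2_000, replace=False)
step_2000[changed] += rng.standard_normal(changed.size).astype(np.float32) * 1e-3


def xor_delta(current: bytes, reference: bytes) -> bytes:
    """XOR two equal-length buffers, then compress the mostly-zero result."""
    a = np.frombuffer(current, dtype=np.uint8)
    b = np.frombuffer(reference, dtype=np.uint8)
    return zlib.compress(np.bitwise_xor(a, b).tobytes(), 9)


def apply_delta(delta: bytes, reference: bytes) -> bytes:
    """Invert xor_delta: decompress, then XOR with the same reference."""
    x = np.frombuffer(zlib.decompress(delta), dtype=np.uint8)
    b = np.frombuffer(reference, dtype=np.uint8)
    return np.bitwise_xor(x, b).tobytes()


cur, ref = step_2000.tobytes(), step_1000.tobytes()
delta = xor_delta(cur, ref)
full = zlib.compress(cur, 9)  # same codec, no reference, for comparison

assert apply_delta(delta, ref) == cur  # bit-exact round trip
print(f"raw={len(cur)}  standalone={len(full)}  delta={len(delta)}")
```

XOR is its own inverse, so the round trip is exact by construction; the saving comes entirely from the unchanged bytes turning into long runs of zeros.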

Or from the command line, on files:

z4ai compress   weights.bin -o weights.z4ai
z4ai decompress weights.z4ai -o weights.bin
z4ai info       weights.z4ai            # ratio + per-plane breakdown

It is a small, pure-Python package (NumPy + zstandard; no PyTorch, no compiled toolchain required) with optional native acceleration.

🚀 Install

pip install z4ai and you’re ready — pure-Python, no build step.

Installation

⚡ Quickstart

Compress a buffer, an ndarray, or a whole .safetensors file in a few lines.

Quickstart

📦 Usage

Files, per-tensor random access, sparse / quantized weights, checkpoint deltas.

Usage

🧠 How it works

Field decorrelation, whole-tensor matching, and the best-of candidate selector.

How it works

🖥️ CLI

z4ai compress / decompress / info — pipe-friendly, self-describing frames.

Command line

📚 API reference

Every public function, generated from the source docstrings.

API reference

Honest framing

What lossless can — and cannot — do for weights

In a dense checkpoint, a trained float’s mantissa is near-random and its exponent carries only ~2.6 bits of entropy, which caps any lossless codec at ~1.5× (bf16) / ~1.2× (fp32). z4ai cannot meaningfully beat that limit on dense weights, and says so.
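The two ceilings follow from simple arithmetic on the float layouts. The sketch below assumes the mantissa bits are fully incompressible, the exponent costs ~2.6 bits as stated above, and the sign bit costs one full bit; that last point is an assumption of this example, not something the page states. The function name `lossless_ratio_bound` is hypothetical.

```python
def lossless_ratio_bound(total_bits: int, mantissa_bits: int,
                         exponent_entropy_bits: float = 2.6,
                         sign_bits: float = 1.0) -> float:
    """Best-case ratio if only the exponent field is compressible.

    Assumes mantissa and sign bits are incompressible (illustrative model).
    """
    irreducible = sign_bits + exponent_entropy_bits + mantissa_bits
    return total_bits / irreducible


bf16 = lossless_ratio_bound(total_bits=16, mantissa_bits=7)   # 1 + 8 + 7 layout
fp32 = lossless_ratio_bound(total_bits=32, mantissa_bits=23)  # 1 + 8 + 23 layout
print(f"bf16: {bf16:.2f}x, fp32: {fp32:.2f}x")  # bf16: 1.51x, fp32: 1.20x
```

Under this model the fp32 bound is lower simply because a larger share of each word is mantissa.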

The large wins come from redundancy the entropy bound assumes away, which z4ai auto-detects and exploits:

Reduced precision

fp32 files carrying fp16/bf16-origin values have dead low mantissa bits → 2.3–3.0×, automatically.

Quantized weights

INT4/INT8/FP8 values dequantised into a wider container (a common deployed format) → a lossless palette transform → 2.4–10.8×.

Structure & sparsity

Tied embeddings, duplicated layers, pruned zeros → whole-tensor long-distance matching and a zero-aware path.

Checkpoint deltas

Consecutive checkpoints are ~95–99% identical → store each as a tiny delta → 10–180×.
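The reduced-precision case in the list above can be checked directly with NumPy. This is a minimal sketch, not z4ai's detection logic: it emulates bf16-origin values by truncating fp32 words to their top 16 bits (real conversions usually round rather than truncate), confirms the low half of every word is zero, and shows that dropping it is reversible. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)

# Emulate fp32 values that originated as bf16: bf16 is the top 16 bits of
# an fp32 word (sign, 8 exponent bits, 7 mantissa bits). Truncation is
# used here for simplicity.
bits = w.view(np.uint32) & np.uint32(0xFFFF0000)
w_bf16_origin = bits.view(np.float32)

# The low 16 mantissa bits of every value are dead (all zero).
low_half = w_bf16_origin.view(np.uint32) & np.uint32(0x0000FFFF)
assert not low_half.any()

# Keeping only the high half loses nothing: it can be widened back exactly.
high = (w_bf16_origin.view(np.uint32) >> np.uint32(16)).astype(np.uint16)
restored = (high.astype(np.uint32) << np.uint32(16)).view(np.float32)
assert restored.tobytes() == w_bf16_origin.tobytes()

print(f"fp32 bytes: {w_bf16_origin.nbytes}, high-half bytes: {high.nbytes}")
```

Dropping the dead half alone accounts for a 2× reduction; the sketch does not attempt to reproduce the 2.3–3.0× figure quoted above, which also depends on how the remaining bits are coded.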

See How it works for the mechanism and Background & references for prior art.