z4ai

A lossless storage-and-distribution layer for AI model checkpoints: bit-for-bit reversible, with per-tensor random access. Most useful on collections of related checkpoints (training runs, fine-tune families, model registries) and in environments the Hugging Face Hub's Xet backend doesn't cover: self-hosted registries, internal MLOps, plain object storage.

import z4ai

blob = z4ai.compress(weights_bytes)   # smaller, self-describing
data = z4ai.decompress(blob)          # byte-identical original
assert data == weights_bytes

Its strongest case is a sequence of related checkpoints - consecutive ones are ~95-99% identical, so each is stored as a tiny delta from the one before:

# store checkpoint N as the bit-exact delta from checkpoint N-1
delta    = z4ai.compress_delta(step_2000, reference=step_1000)
restored = z4ai.decompress_delta(delta, reference=step_1000)   # exact == step_2000
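To see why such a delta is small, here is a minimal stdlib-only sketch. It is not z4ai's delta format, which this page does not describe. It assumes two equal-length buffers, uses a bytewise XOR as the delta, and substitutes zlib for the real codec. The data is synthetic (random bytes with about 2% of positions changed), and the helper name xor_delta is hypothetical.

```python
import random
import zlib


def xor_delta(current: bytes, reference: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers; unchanged bytes become 0x00."""
    if len(current) != len(reference):
        raise ValueError("this sketch only handles equal-length buffers")
    n = len(current)
    x = int.from_bytes(current, "little") ^ int.from_bytes(reference, "little")
    return x.to_bytes(n, "little")


rng = random.Random(0)
n = 200_000

# Synthetic stand-in for checkpoint N-1: incompressible random bytes.
step_1000 = rng.getrandbits(8 * n).to_bytes(n, "little")

# Synthetic stand-in for checkpoint N: same bytes, ~2% of positions changed.
changed = bytearray(step_1000)
for i in rng.sample(range(n), k=n // 50):
    changed[i] ^= rng.randrange(1, 256)  # non-zero XOR guarantees a real change
step_2000 = bytes(changed)

# XOR is its own inverse, so applying the same reference again restores the data.
delta = zlib.compress(xor_delta(step_2000, step_1000), 9)
restored = xor_delta(zlib.decompress(delta), step_1000)
assert restored == step_2000

print("standalone:", len(zlib.compress(step_2000, 9)), "bytes")
print("as delta:  ", len(delta), "bytes")
```

The XOR turns every unchanged byte into zero, so the codec sees long zero runs instead of near-random data. Real checkpoints will not match this toy distribution, so the printed sizes say nothing about z4ai's actual ratios.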

Or from the command line, on files:

z4ai compress   weights.bin -o weights.z4ai
z4ai decompress weights.z4ai -o weights.bin
z4ai info       weights.z4ai            # ratio + per-plane breakdown

It is a small, pure-Python package (NumPy + zstandard; no PyTorch, no compiled toolchain required) with optional native acceleration.

🚀 Install

pip install z4ai and you're ready: pure Python, no build step.

Installation
⚡ Quickstart

Compress a buffer, an ndarray, or a whole .safetensors file in a few lines.

Quickstart
📦 Usage

Files, per-tensor random access, sparse / quantized weights, checkpoint deltas.

Usage
🧠 How it works

Field decorrelation, whole-tensor matching, and the best-of candidate selector.

How it works
🖥️ CLI

z4ai compress / decompress / info: pipe-friendly, self-describing frames.

Command line
📚 API reference

Every public function, generated from the source docstrings.

API reference

Honest framing

What lossless can, and cannot, do for weights

On a dense checkpoint a trained float's mantissa is near-random and its exponent carries only ~2.6 bits, capping any lossless codec at ~1.5× (bf16) / ~1.2× (fp32). z4ai cannot meaningfully out-ratio that wall on dense weights, and says so.
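The ceiling follows from simple bit accounting, sketched below. The ~2.6-bit exponent entropy is the figure quoted above. Treating the sign bit as incompressible alongside the mantissa is an assumption of this sketch, and the function name lossless_ratio_cap is hypothetical.

```python
def lossless_ratio_cap(total_bits: int, exponent_bits: int,
                       exponent_entropy: float = 2.6) -> float:
    """Best-case ratio if only the exponent field is compressible."""
    # Sign + mantissa bits, assumed to be effectively random.
    incompressible = total_bits - exponent_bits
    return total_bits / (incompressible + exponent_entropy)


bf16_cap = lossless_ratio_cap(total_bits=16, exponent_bits=8)  # 16 / 10.6
fp32_cap = lossless_ratio_cap(total_bits=32, exponent_bits=8)  # 32 / 26.6

print(f"bf16: {bf16_cap:.2f}x, fp32: {fp32_cap:.2f}x")
# prints "bf16: 1.51x, fp32: 1.20x"
```

Both formats have an 8-bit exponent, so fp32 pays for 16 extra near-random mantissa bits and ends up with the lower cap.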

The large wins come from redundancy the entropy bound assumes away, which z4ai auto-detects and exploits:

Reduced precision

fp32 files carrying fp16/bf16-origin values have dead low mantissa bits → 2.3–3.0×, automatically.

Quantized weights

INT4/INT8/FP8 dequantised into a wide container → a lossless palette transform → 2.4–10.8× (the common deployed format).

Structure & sparsity

Tied embeddings, duplicated layers, pruned zeros → whole-tensor long-distance matching and a zero-aware path.

Checkpoint deltas

Consecutive checkpoints are ~95–99% identical → store each as a tiny delta → 10–180×.
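Two of these redundancies, dead low mantissa bits and a small value palette, can be shown in a few lines of stdlib Python. This is an illustrative sketch, not z4ai's detector or transform. The helper names, the sample values, and the INT4 scale are all hypothetical, and bf16-origin values are simulated by masking off the low 16 bits of an fp32 pattern (truncation, not rounding).

```python
import random
import struct
from typing import Iterable, List, Tuple


def f32_bits(x: float) -> int:
    """Raw 32-bit pattern of x when stored as an IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def dead_low_bits(patterns: Iterable[int]) -> int:
    """Count the low bits that are zero in every 32-bit pattern."""
    combined = 0
    for p in patterns:
        combined |= p
    if combined == 0:
        return 32
    return (combined & -combined).bit_length() - 1  # trailing zeros


def palette_encode(patterns: List[int]) -> Tuple[List[int], List[int]]:
    """Replace each pattern by its index into a sorted table of distinct patterns."""
    palette = sorted(set(patterns))
    index_of = {p: i for i, p in enumerate(palette)}
    return palette, [index_of[p] for p in patterns]


def palette_decode(palette: List[int], indices: List[int]) -> List[int]:
    return [palette[i] for i in indices]


# Reduced precision: fp32 containers holding bf16-origin values.
full = [f32_bits(v) for v in (0.1, -1.5, 3.14159, 0.007)]
bf16_origin = [b & 0xFFFF0000 for b in full]  # keep sign, exponent, top 7 mantissa bits
full_dead = dead_low_bits(full)
bf16_dead = dead_low_bits(bf16_origin)
print("dead low bits:", full_dead, "(full fp32) vs", bf16_dead, "(bf16-origin)")

# Quantized weights: INT4 codes dequantised into fp32 with one hypothetical scale.
rng = random.Random(0)
scale = 0.0125
codes = [rng.randrange(-8, 8) for _ in range(1000)]
dequantised = [f32_bits(c * scale) for c in codes]
palette, indices = palette_encode(dequantised)
assert palette_decode(palette, indices) == dequantised  # bit-exact round trip
print("distinct patterns:", len(palette))
```

Working on raw bit patterns rather than float values keeps both steps bit-exact, including for NaN payloads and signed zeros. With at most 16 distinct patterns, each index fits in 4 bits instead of 32, before any entropy coding is applied.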

See How it works for the mechanism and Background & references for prior art.