z4ai
A lossless storage-and-distribution layer for AI model checkpoints: bit-for-bit reversible, with per-tensor random access. Most useful for collections of related checkpoints (training runs, fine-tune families, model registries) and in environments the Hugging Face Hub's Xet backend doesn't cover: self-hosted registries, internal MLOps, plain object storage.
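```python
import z4ai

# weights_bytes: the raw bytes of a checkpoint file (filename is illustrative)
with open("weights.bin", "rb") as f:
    weights_bytes = f.read()
blob = z4ai.compress(weights_bytes)  # smaller, self-describing
data = z4ai.decompress(blob)         # byte-identical original
assert data == weights_bytes
```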
Its strongest case is a sequence of related checkpoints. Consecutive ones are ~95–99% identical, so each can be stored as a tiny delta from the one before:
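```python
# step_1000 and step_2000 hold two consecutive checkpoints (e.g. their raw bytes)
# Store checkpoint N as the bit-exact delta from checkpoint N-1:
delta = z4ai.compress_delta(step_2000, reference=step_1000)
restored = z4ai.decompress_delta(delta, reference=step_1000)
assert restored == step_2000  # exact
```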
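Why does a near-identical checkpoint shrink so much? A minimal sketch, assuming the simplest possible delta: XOR the new checkpoint's bytes against the reference so every unchanged byte becomes zero, then compress the result. This is not z4ai's own delta encoding; `xor_delta` is a hypothetical helper, `zlib` stands in for zstandard so only the standard library is needed, and the two "checkpoints" are synthetic float32 weights with about 2% of values perturbed.

```python
import random
import struct
import zlib


def xor_delta(new: bytes, ref: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers; unchanged bytes become 0."""
    if len(new) != len(ref):
        raise ValueError("buffers must have the same length")
    return bytes(a ^ b for a, b in zip(new, ref))


# Synthetic stand-ins for two checkpoints: 100,000 float32 weights, of which
# about 2% are nudged between the two steps.
rng = random.Random(0)
prev_weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]
next_weights = list(prev_weights)
for i in rng.sample(range(len(next_weights)), 2_000):
    next_weights[i] += rng.gauss(0.0, 1e-4)

ref_bytes = struct.pack(f"<{len(prev_weights)}f", *prev_weights)
new_bytes = struct.pack(f"<{len(next_weights)}f", *next_weights)

# zlib stands in for zstandard here.
packed_full = zlib.compress(new_bytes, 9)
packed_delta = zlib.compress(xor_delta(new_bytes, ref_bytes), 9)

# XOR is its own inverse: reference XOR delta restores the new bytes exactly.
restored = xor_delta(zlib.decompress(packed_delta), ref_bytes)
assert restored == new_bytes
print(f"compressed full: {len(packed_full):,} B, compressed delta: {len(packed_delta):,} B")
```

Because XOR is its own inverse, the same function both applies and reverses the delta. Decoding needs the byte-identical reference, mirroring the `reference=` argument that `decompress_delta` takes above.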
Or from the command line, on files:
```shell
z4ai compress weights.bin -o weights.z4ai
z4ai decompress weights.z4ai -o weights.bin
z4ai info weights.z4ai   # ratio + per-plane breakdown
```
It is a small, pure-Python package (NumPy + zstandard; no PyTorch, no compiled toolchain required) with optional native acceleration.
pip install z4ai and you're ready: pure-Python, no build step.
Compress a buffer, an ndarray, or a whole .safetensors file in a few lines.
Files, per-tensor random access, sparse / quantized weights, checkpoint deltas.
Field decorrelation, whole-tensor matching, and the best-of candidate selector.
z4ai compress / decompress / info: pipe-friendly, self-describing frames.
Every public function, generated from the source docstrings.
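The "field decorrelation" named above is covered on the How it works page. As a rough, hedged illustration of the idea (not z4ai's own transform), the sketch below uses byte-plane splitting, often called shuffling: the bytes of each float32 are regrouped so that all the most-significant bytes, which hold the sign and most of the exponent, sit next to each other and compress better. `split_planes` and `merge_planes` are hypothetical helpers, the weights are synthetic Gaussian values, and `zlib` substitutes for zstandard.

```python
import random
import struct
import zlib


def split_planes(buf: bytes, width: int) -> bytes:
    """Regroup fixed-width values into byte planes: byte 0 of every value,
    then byte 1 of every value, and so on."""
    if len(buf) % width:
        raise ValueError("buffer length must be a multiple of width")
    return b"".join(buf[i::width] for i in range(width))


def merge_planes(planes: bytes, width: int) -> bytes:
    """Exact inverse of split_planes."""
    n = len(planes) // width
    out = bytearray(len(planes))
    for i in range(width):
        out[i::width] = planes[i * n:(i + 1) * n]
    return bytes(out)


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
raw = struct.pack(f"<{len(weights)}f", *weights)  # little-endian float32

shuffled = split_planes(raw, 4)
assert merge_planes(shuffled, 4) == raw  # the regrouping is lossless

# zlib stands in for zstandard so only the standard library is needed.
print("interleaved:", len(zlib.compress(raw, 9)))
print("byte planes:", len(zlib.compress(shuffled, 9)))
```

The gain comes entirely from the top-byte plane; the low mantissa planes stay close to random either way, which is consistent with the entropy ceiling described under Honest framing.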
Honest framing
What lossless can (and cannot) do for weights
On a dense checkpoint, a trained float's mantissa bits are close to random and its exponent carries only ~2.6 bits of entropy. Because the sign and mantissa are essentially incompressible, any lossless codec is capped at about 1.5× (bf16) or 1.2× (fp32). z4ai cannot meaningfully beat that ceiling on dense weights, and says so.
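The ceiling is plain bit counting. If the sign (1 bit) and mantissa (7 bits in bf16, 23 in fp32) stay incompressible and the exponent shrinks to its ~2.6-bit entropy, the best case is 16 / (1 + 2.6 + 7) ≈ 1.5× and 32 / (1 + 2.6 + 23) ≈ 1.2×. The sketch below does that arithmetic and also measures exponent entropy on synthetic Gaussian weights, an assumption standing in for a real trained tensor; the helper names are hypothetical.

```python
import math
import random
import struct
from collections import Counter


def exponent_entropy_bits(values: list[float]) -> float:
    """Empirical Shannon entropy, in bits per value, of the float32 exponent field."""
    n = len(values)
    words = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    counts = Counter((w >> 23) & 0xFF for w in words)  # 8-bit biased exponent
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def lossless_ceiling(total_bits: int, mantissa_bits: int, exp_bits: float) -> float:
    """Best-case ratio if sign and mantissa stay incompressible and the exponent
    shrinks to exp_bits: total / (1 sign bit + exp_bits + mantissa_bits)."""
    return total_bits / (1 + exp_bits + mantissa_bits)


# Plugging in the ~2.6-bit exponent figure quoted above:
print(f"bf16 ceiling: {lossless_ceiling(16, 7, 2.6):.2f}x")   # 16 / 10.6 -> 1.51x
print(f"fp32 ceiling: {lossless_ceiling(32, 23, 2.6):.2f}x")  # 32 / 26.6 -> 1.20x

# Synthetic Gaussian weights stand in for a real trained tensor.
rng = random.Random(0)
sample = [rng.gauss(0.0, 0.02) for _ in range(100_000)]
print(f"measured exponent entropy: {exponent_entropy_bits(sample):.2f} bits")
```

On this synthetic sample the measured exponent entropy falls between 2 and 3 bits, the same neighbourhood as the ~2.6 bits quoted above.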
The large wins come from redundancy the entropy bound assumes away, which z4ai auto-detects and exploits:
fp32 files carrying fp16/bf16-origin values have dead (always-zero) low mantissa bits → 2.3–3.0×, automatically.
INT4/INT8/FP8 weights dequantised into a wide container (the common deployed format) → a lossless palette transform → 2.4–10.8×.
Tied embeddings, duplicated layers, pruned zeros → whole-tensor long-distance matching and a zero-aware path.
Consecutive checkpoints are ~95–99% identical → store each as a tiny delta → 10–180×.
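The dead-mantissa-bit case is easy to check directly: OR together the raw 32-bit words of every value and count the trailing zeros of the mantissa field. A minimal sketch with hypothetical helper names; the fp16 round trip uses the standard `struct` half-precision format `e`, the bf16 conversion truncates for simplicity (real converters usually round to nearest even), and how z4ai then packs the surviving bits is not shown.

```python
import random
import struct


def f32_words(values: list[float]) -> tuple[int, ...]:
    """Reinterpret values as float32 and return their raw 32-bit words."""
    return struct.unpack(f"<{len(values)}I", struct.pack(f"<{len(values)}f", *values))


def dead_low_bits(values: list[float]) -> int:
    """Count the low float32 mantissa bits that are zero in every value."""
    union = 0
    for w in f32_words(values):
        union |= w
    union &= 0x7FFFFF  # keep only the 23 mantissa bits
    if union == 0:
        return 23
    return (union & -union).bit_length() - 1  # trailing zeros of the OR


def via_fp16(x: float) -> float:
    """Round-trip through IEEE half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def via_bf16(x: float) -> float:
    """Truncate to bf16 by zeroing the low 16 bits of the float32 word."""
    (w,) = f32_words([x])
    return struct.unpack("<f", struct.pack("<I", w & 0xFFFF0000))[0]


rng = random.Random(0)
native = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
fp16_origin = [via_fp16(x) for x in native]
bf16_origin = [via_bf16(x) for x in native]

print(dead_low_bits(native), dead_low_bits(fp16_origin), dead_low_bits(bf16_origin))
# -> 0 13 16
```

Dropping 13 or 16 always-zero bits from each 32-bit word removes roughly 40% (fp16 origin) to 50% (bf16 origin) of the payload before any entropy coding of what remains.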
See How it works for the mechanism and Background & references for prior art.
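The palette transform from the list above can be sketched in a few lines: a tensor quantised to INT4 and then dequantised to float32 holds at most 16 distinct values, so it can be stored exactly as a small value table plus one index per weight. This sketch assumes a single palette per tensor and at most 256 distinct values (one byte per index); `palette_encode` and `palette_decode` are hypothetical names, and the scale and zero point are made up.

```python
import random
import struct


def palette_encode(values: list[float]) -> tuple[list[float], bytes]:
    """Replace each value by its index in a sorted table of distinct values.
    Assumes at most 256 distinct values, so every index fits in one byte."""
    palette = sorted(set(values))
    if len(palette) > 256:
        raise ValueError("more than 256 distinct values: not a palette candidate")
    index = {v: i for i, v in enumerate(palette)}
    return palette, bytes(index[v] for v in values)


def palette_decode(palette: list[float], codes: bytes) -> list[float]:
    """Look every index back up in the table; exact by construction."""
    return [palette[c] for c in codes]


# Simulated INT4 weights dequantised to float32: w = scale * (q - zero_point).
rng = random.Random(0)
scale, zero_point = 0.0123, 8
q = [rng.randrange(16) for _ in range(100_000)]
n = len(q)
values = (scale * (x - zero_point) for x in q)
dequant = list(struct.unpack(f"<{n}f", struct.pack(f"<{n}f", *values)))

palette, codes = palette_encode(dequant)
assert palette_decode(palette, codes) == dequant  # exact round trip

raw_size = 4 * n                             # float32 storage
coded_size = 4 * len(palette) + len(codes)   # float32 table + one byte per index
print(len(palette), round(raw_size / coded_size, 2))  # -> 16 4.0
```

One byte per index gives roughly 4× here; packing two 4-bit indices per byte or entropy-coding the indices would shrink it further.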