Quickstart¶
Compress a buffer¶
import z4ai
# `dtype` tells z4ai the element width so it can split the float planes.
blob = z4ai.compress(weights_bytes, dtype="bf16") # -> bytes, self-describing
data = z4ai.decompress(blob) # -> bytes, byte-identical
assert data == weights_bytes
The frame is self-describing: decompress needs no side-channel metadata, and the
output is never larger than the input.
Compress a NumPy array (dtype + shape preserved)¶
import numpy as np
import z4ai
w = np.random.randn(1024, 1024).astype(np.float16)
blob = z4ai.compress_array(w)
back = z4ai.decompress_array(blob)
assert np.array_equal(w, back) and back.dtype == w.dtype
Store a sequence of checkpoints as deltas¶
The single largest win: consecutive checkpoints (fine-tuning, RLHF, continued pre-training) are ~95-99% identical, so each is stored as a tiny delta.
import z4ai
# store checkpoint N as the bit-exact delta from checkpoint N-1
delta = z4ai.compress_delta(step_2000, reference=step_1000, dtype="bf16")
restored = z4ai.decompress_delta(delta, reference=step_1000)
assert bytes(restored) == step_2000 # lossless
Shrink a .safetensors file, keep per-tensor random access¶
z4ai.safetensors compresses a .safetensors file into a ZSTN container that
stays byte-identical on the round trip - but unlike a plain zip, it keeps a
per-tensor index, so you can lazily read a single tensor without touching the
rest of the file (the same access pattern safetensors gives you).
from z4ai import safetensors
# whole-file: ZSTN is byte-identical to the original .safetensors on decompress
stats = safetensors.compress_file("model.safetensors", "model.zstn")
print(f"{stats['ratio']:.2f}x ({stats['saved_pct']:.1f}% smaller)")
safetensors.decompress_file("model.zstn", "restored.safetensors")
# random access: read just one tensor, seeking past the rest of the file
with safetensors.ZstnReader.open("model.zstn") as r:
print(r.names()[:3]) # tensor names, no decode
w = r.read_numpy("model.embed_tokens.weight") # decodes only this tensor
From the command line¶
The CLI is the generic byte codec (one frame in, one frame out). For the
tensor-aware ZSTN container with per-tensor random access, use the
z4ai.safetensors API above.
z4ai compress weights.bin -o weights.z4ai
z4ai decompress weights.z4ai -o weights.bin
z4ai info weights.z4ai # ratio + per-plane breakdown
See Command line for every flag (effort tiers, threads, dtype).
Load into PyTorch / transformers¶
With the hf extra (pip install "z4ai[hf]"), load compressed weights straight
into a torch state dict, or patch transformers to do it transparently:
import z4ai
z4ai.save_file(model.state_dict(), "model.z4ai.safetensors")
state = z4ai.load_file("model.z4ai.safetensors") # -> {name: torch.Tensor}
# ...or transparent from_pretrained (call before importing transformers):
from z4ai import enable_hf
enable_hf()
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("your-org/model", variant="z4ai")
See Usage for the full HuggingFace integration.
The three mental tiers of the API¶
You have… |
Use |
Notes |
|---|---|---|
a raw buffer or NumPy array |
|
the core codec |
a checkpoint similar to one you already stored |
|
stores only what changed |
a |
|
keeps lazy single-tensor loads |
See Usage for the detailed recipes and Command line for the command line.