ZenFlow — stall-free offloading
How asynchronous gradient updates keep the GPU busy while offloading to the CPU.
The idea — split gradients by impact
top-k important → applied on GPU now
the rest → offloaded to CPU, applied async
GPU: forward / backward
CPU: async update
High-impact gradients are applied immediately on the GPU; the
remaining gradients are offloaded to the CPU, where their updates run
asynchronously, fully overlapped with the next step's compute.
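The split described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: "impact" is taken to mean absolute gradient magnitude, the update rule is plain SGD, and `split_by_impact`, `apply_update`, and all the numbers are hypothetical names and values, not ZenFlow's actual API or selection criterion.

```python
import heapq


def split_by_impact(grads, k):
    """Split gradients into (important, rest) as index -> value dicts.

    Assumption: impact is measured by absolute magnitude. The page only
    says "top-k important", so the real criterion may differ.
    """
    top = set(heapq.nlargest(k, range(len(grads)), key=lambda i: abs(grads[i])))
    important = {i: g for i, g in enumerate(grads) if i in top}
    rest = {i: g for i, g in enumerate(grads) if i not in top}
    return important, rest


def apply_update(params, updates, lr):
    """Plain SGD step on the selected indices only (hypothetical helper)."""
    for i, g in updates.items():
        params[i] -= lr * g


params = [1.0, 1.0, 1.0, 1.0]
grads = [0.5, -4.0, 0.25, 2.0]  # made-up values for illustration

important, rest = split_by_impact(grads, k=2)

# Fast path: the top-k gradients are applied right away ("on GPU now").
apply_update(params, important, lr=0.5)
after_fast_path = list(params)

# Deferred path: stands in for the CPU update, which in the real system
# would run asynchronously while the next step computes.
apply_update(params, rest, lr=0.5)
```

In this toy run the two largest-magnitude gradients (indices 1 and 3) take the fast path, and the other two are applied later. The deferred call runs inline here purely to keep the sketch deterministic.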
The result — no more GPU stalls
ZeRO-Offload — GPU sits idle while the CPU updates
GPU compute → stall → GPU compute → stall
ZenFlow — GPU never pauses; CPU updates overlap underneath
GPU compute — uninterrupted
Legend: GPU compute; GPU stall (idle, waiting on CPU); CPU async update (overlapped); important gradient → instant GPU update
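The two timelines can be compared with a back-of-the-envelope timing model. This is a toy sketch, not a measurement: the step counts and millisecond figures are hypothetical, and the model assumes at most one CPU update is in flight at a time, which the page does not state. Under that assumption the GPU stays stall-free whenever the CPU update fits inside one step's compute.

```python
def blocking_schedule(steps, gpu_ms, cpu_ms):
    """Synchronous offload: every step waits for its CPU update to finish."""
    total = steps * (gpu_ms + cpu_ms)
    stall = steps * cpu_ms
    return total, stall


def overlapped_schedule(steps, gpu_ms, cpu_ms):
    """Async offload: the CPU update for step i runs under step i+1's compute.

    Assumption: only one CPU update is in flight, so the GPU waits only
    when an update takes longer than one step of compute.
    """
    stall_per_step = max(0, cpu_ms - gpu_ms)
    stall = (steps - 1) * stall_per_step
    # The final update has no following compute step to hide under.
    total = steps * gpu_ms + stall + cpu_ms
    return total, stall


# Hypothetical numbers: 4 steps, 100 ms of GPU compute, 60 ms of CPU update.
blocking_total, blocking_stall = blocking_schedule(4, 100, 60)
overlap_total, overlap_stall = overlapped_schedule(4, 100, 60)
```

With these made-up inputs the blocking schedule takes 640 ms, 240 ms of it stalled, while the overlapped schedule takes 460 ms with no stall. The only CPU time still exposed is the last update.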