Standard LLM fine-tuning requires storing model weights, gradients, optimizer states, and activations, often exceeding GPU memory for models above 70B parameters. DistZO2 eliminates backpropagation entirely, estimating gradients through forward-pass-only perturbation. By distributing these forward passes across multiple GPUs, DistZO2 enables fine-tuning of 100B+ models on hardware that cannot run standard training.
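To make the forward-pass-only idea concrete, below is a minimal sketch of a two-point zeroth-order (MeZO-style) update step in PyTorch. It is illustrative only, not DistZO2's actual distributed implementation: the function name `zo_step`, the `loss_fn(model, batch)` interface, and the hyperparameter values are assumptions. The key memory trick is regenerating the perturbation noise from a seed rather than storing it.

```python
# A minimal sketch of a two-point zeroth-order update (MeZO-style), assuming a
# model whose loss can be evaluated with forward passes alone. Names and
# hyperparameters are illustrative, not DistZO2's API.
import torch


def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=None):
    """One zeroth-order SGD step: no .backward(), no stored gradients."""
    seed = seed if seed is not None else torch.randint(0, 2**31, (1,)).item()
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Regenerate the same Gaussian noise z from the seed instead of
        # storing it, keeping the memory overhead near zero.
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(torch.randn_like(p), alpha=scale * eps)

    with torch.no_grad():
        perturb(+1)                         # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2)                         # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1)                         # restore theta

        # Scalar projected-gradient estimate: (L+ - L-) / (2 * eps)
        grad_est = ((loss_plus - loss_minus) / (2 * eps)).item()

        # Update the parameters along the same noise direction z.
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(torch.randn_like(p), alpha=-lr * grad_est)

    return (loss_plus + loss_minus) / 2
```

Because each step needs only two forward passes and a scalar, the work parallelizes naturally: in a distributed setting, replicas can evaluate the perturbed losses independently and exchange only seeds and scalar loss values.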
zeroth-order optimization, LLM fine-tuning, memory-efficient