Fine-tuning many LoRA adapters concurrently on multiple GPUs with no idle bubbles.
Naive / sequential — adapters run one at a time, GPU stages sit idle (bubbles)
mLoRA — adapters interleaved; bubbles filled, all GPUs stay busy
In the naive schedule, each adapter's four pipeline stages run one at a time, so three of the four GPU stages sit idle (a bubble) at every step. mLoRA's LoRA-aware pipeline interleaves adapters A–D so that every GPU executes a stage at every time step, eliminating bubbles and cutting average fine-tuning time by ~30%.
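
The difference between the two schedules can be made concrete with a small simulation. The sketch below is a toy model and not mLoRA's actual scheduler: it assumes one GPU per pipeline stage, unit-time stages, and a forward pass only, and the function names (`naive_schedule`, `interleaved_schedule`, `utilization`) are hypothetical names chosen for illustration.

```python
from typing import List, Optional

# grid[t][gpu] holds the adapter running on that GPU at step t, or None if idle.
Schedule = List[List[Optional[str]]]


def naive_schedule(adapters: List[str], num_stages: int) -> Schedule:
    """Run each adapter through every stage before the next adapter starts."""
    steps = len(adapters) * num_stages
    grid: Schedule = [[None] * num_stages for _ in range(steps)]
    for a, name in enumerate(adapters):
        for s in range(num_stages):
            grid[a * num_stages + s][s] = name
    return grid


def interleaved_schedule(adapters: List[str], num_stages: int, rounds: int) -> Schedule:
    """Admit one adapter batch per step, cycling through adapters round-robin."""
    jobs = [adapters[k % len(adapters)] for k in range(rounds * len(adapters))]
    steps = len(jobs) + num_stages - 1
    grid: Schedule = [[None] * num_stages for _ in range(steps)]
    for k, name in enumerate(jobs):
        for s in range(num_stages):
            grid[k + s][s] = name  # batch k reaches stage s at step k + s
    return grid


def utilization(grid: Schedule) -> float:
    """Fraction of (step, GPU) slots that are doing work."""
    busy = sum(cell is not None for row in grid for cell in row)
    return busy / (len(grid) * len(grid[0]))


adapters = ["A", "B", "C", "D"]
naive = naive_schedule(adapters, num_stages=4)
inter = interleaved_schedule(adapters, num_stages=4, rounds=4)

for label, grid in (("naive", naive), ("interleaved", inter)):
    print(f"{label}: {len(grid)} steps, utilization {utilization(grid):.0%}")
    for row in grid:
        print(" ".join(cell or "." for cell in row))
```

In this toy model the naive schedule keeps exactly one of the four GPUs busy at each step (25% utilization), which matches the "three of four idle" description above. The interleaved schedule fills every slot once the pipeline is full; the only idle slots left are the warm-up and drain steps at the two ends, which shrink relative to the total as more batches are admitted. The model ignores backward passes and optimizer steps, so it illustrates the scheduling idea rather than reproducing the reported ~30% figure.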