training

[DeepSpeed, PyTorch] ZenFlow
Stall-free async offloading for LLM training. Integrated into DeepSpeed via official PR.
Links: paper, code, blog, demo

[VLDB '25] mLoRA
Fine-tuning LoRA adapters via highly efficient pipeline parallelism across multiple GPUs.
Links: paper, code, demo

[VLDB '24] DLRover-RM
Resource optimization for deep recommendation model training in the cloud.
Links: paper, code, demo

storage

[Preprint] TStore
Tensor-centric storage layer for AI model hubs: compress checkpoints by exploiting their internal structure.
Links: paper, demo

[NSDI '26] ZipLLM
Efficient LLM storage via model-aware synergistic data deduplication and compression.
Links: paper, code, demo

inference

[MLSys '26] MorphServe
Workload-aware LLM serving via runtime layer swapping and KV cache resizing.
Links: paper, demo

[MLSys '26] λScale
Fast scaling for serverless LLM inference: cold start is no longer a death sentence.
Links: paper, code, demo

[Preprint] Scorpio
SLO-aware LLM serving: TTFT / TPOT guards with credit-based batching for heterogeneous deadlines.
Links: paper, code, demo

vis

[ACL '26] IGenBench
First benchmark for text-to-infographic generation: 600 tests, 30 types, automated reliability checks.
Links: paper, demo

[ACL '26 Demo] ViviDoc
Human-agent collaborative system for generating interactive educational documents from a single topic input.
Links: paper, demo