Tingfeng (Felix) Lan

first author

ZenFlow: Stall-Free Offloading Training via Asynchronous Updates

preprint · 2025

Async updates kill the offloading stall — train bigger models on the same GPUs.

[pdf]

co-author

mLoRA: Fine-Tuning LoRA Adapters via Pipeline Parallelism

VLDB '25

Pipeline-parallel LoRA training across multiple GPUs.

[pdf]

co-first author

DLRover-RM: Resource Optimization for Deep Recommendation Models in the Cloud

VLDB '24

Resource autoscaling that understands recommendation training workloads.

[pdf]

first author

TStore: Rethinking AI Model Hub with Tensor-Centric Compression

preprint · 2026

A tensor-centric storage layer for AI model hubs — compressing checkpoints by exploiting their internal structure.

[pdf] [code]

co-author

ZipLLM: Efficient LLM Storage via Model-Aware Deduplication and Compression

NSDI '26

Synergistic dedup + compression tuned to how LLM weights actually look on disk.

[pdf]

co-author

MorphServe: Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing

MLSys '26

Adapt the serving stack at runtime — swap layers, resize KV cache, ride the workload.

[pdf]

co-author

λScale: Fast Scaling for Serverless LLM Inference

MLSys '26

Cold start is no longer a death sentence for serverless LLMs.

[pdf]

co-author

Scorpio: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference

preprint · 2025

SLO-aware LLM serving — TTFT/TPOT guards with credit-based batching for workloads with heterogeneous deadlines.

[pdf]

co-author

IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation

preprint · 2026

First benchmark for text-to-infographic generation — 600 tests across 30 infographic types, automated reliability checks via atomic yes/no questions.

[pdf]

co-author

Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration

preprint · 2026

Human-agent system for interactive educational documents — multi-agent pipeline (Planner / Executor / Evaluator) plus a human-readable DocSpec IR.

[pdf]
