SCORPIO — SLO-aware LLM serving

Serving the right requests at the right time by enforcing per-request SLOs (service-level objectives) on TTFT (time to first token) and TPOT (time per output token).

The idea — least-deadline-first + reject the doomed

[Figure: incoming requests, each labeled with a deadline, pass through the LDF scheduler and are either served or marked REJECT.]

Incoming requests each carry a deadline (short bar = tight, long bar = relaxed). The LDF scheduler reorders them — tightest deadline goes first. Requests whose deadline is already unattainable are immediately rejected, freeing capacity for requests that can still be served in time.
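The reordering and rejection described above can be sketched in a few lines of Python. This is a minimal illustration, not SCORPIO's actual scheduler: the `Request` class, the `est_service_time` field, and the one-request-at-a-time service model are all assumptions made for the example, and a real system would estimate service time from TTFT and TPOT targets and serve requests in batches.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Only the deadline takes part in ordering, so the heap pops the
    # tightest deadline first.
    deadline: float                                   # absolute time the request must finish by
    name: str = field(compare=False)
    est_service_time: float = field(compare=False)    # hypothetical cost estimate


def ldf_schedule(requests, now=0.0):
    """Serve requests tightest-deadline-first and reject the doomed ones.

    Returns (served, rejected) as lists of request names.
    """
    heap = list(requests)
    heapq.heapify(heap)                  # min-heap keyed on deadline
    served, rejected = [], []
    while heap:
        req = heapq.heappop(heap)        # tightest remaining deadline
        if now + req.est_service_time > req.deadline:
            rejected.append(req.name)    # cannot finish in time: spend no capacity on it
            continue
        now += req.est_service_time      # simplification: requests run one at a time
        served.append(req.name)
    return served, rejected


incoming = [
    Request(deadline=10.0, name="relaxed", est_service_time=3.0),
    Request(deadline=2.0, name="urgent", est_service_time=1.0),
    Request(deadline=0.5, name="doomed", est_service_time=2.0),
    Request(deadline=5.0, name="medium", est_service_time=2.0),
]
served, rejected = ldf_schedule(incoming)
print(served, rejected)
```

In this toy run the request named "doomed" is rejected up front, and the remaining three are served in deadline order regardless of arrival order.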

The result — SLO-aware vs FCFS

[Figure: request timelines under FCFS and under SCORPIO. Legend: urgent deadline, medium deadline, relaxed deadline; met SLO, missed SLO, rejected (doomed).]

FCFS (first-come-first-served) ignores deadlines: some requests meet their SLO and others miss it, so goodput is low.

SCORPIO (LDF + credit batching) keeps all requests on time: all requests met SLO, with goodput up to 14.4x higher.
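For readers unfamiliar with the metric, goodput can be computed as sketched below. The sketch assumes goodput means the number of requests that finish within their SLO deadline per second of wall-clock time; the function name, its arguments, and the sample numbers are hypothetical and are not measurements from SCORPIO.

```python
def goodput(finish_times, deadlines, window_seconds):
    """Requests that met their deadline, per second of the measurement window.

    A finish time of None marks a request that was rejected or never completed.
    """
    met = sum(
        1
        for finish, deadline in zip(finish_times, deadlines)
        if finish is not None and finish <= deadline
    )
    return met / window_seconds


# Illustrative values only: four requests observed over a 4-second window.
finish_times = [1.0, 3.5, None, 4.0]
deadlines = [2.0, 3.0, 0.5, 10.0]
print(goodput(finish_times, deadlines, window_seconds=4.0))
```

Under this definition a request that completes late counts the same as one that was rejected, which is why spending capacity on doomed requests lowers goodput.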