MorphServe — runtime-adaptive LLM serving

How quantized layer swapping and KV cache resizing keep SLOs intact under load spikes.

Mechanism — layer swapping + KV cache resizing

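[Figure: transformer layers, KV cache, and memory pressure shown at LOW PRESSURE and HIGH PRESSURE. Under high pressure, some FP16 layers are swapped to INT8 and KV cache capacity drops from HI to LO, while a P95 TTFT trace stays below the SLO limit.]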

As memory pressure spikes, MorphServe swaps low-impact layers to quantized INT8 versions and shrinks the KV cache to free capacity; both are restored once pressure drops. Throughout the spike, P95 time-to-first-token (TTFT) stays well within the SLO threshold, with ~92% fewer SLO violations than full-precision serving.
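The policy described above can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not MorphServe's implementation: the `Layer` type, the per-layer `sensitivity` score, the byte-level "deficit" input, and the greedy order (swap the least sensitive layers first, then shrink the KV cache only if the swaps do not free enough memory) are all hypothetical. It counts 2 bytes per FP16 weight and 1 byte per INT8 weight and ignores quantization scales and other metadata.

```python
from dataclasses import dataclass

FP16_BYTES = 2  # bytes per weight at full precision
INT8_BYTES = 1  # bytes per weight after quantization (scales ignored)


@dataclass(frozen=True)
class Layer:
    name: str
    params: int         # number of weights in the layer
    sensitivity: float  # hypothetical score: lower = quantizing hurts accuracy less


def plan_memory(
    layers: list[Layer],
    deficit_bytes: int,
    full_kv_blocks: int,
    kv_block_bytes: int,
    min_kv_blocks: int,
) -> tuple[set[str], int]:
    """Return (names of layers to run in INT8, number of KV blocks to keep).

    deficit_bytes is how far memory use exceeds a target; <= 0 means pressure
    has dropped, so every layer returns to FP16 and the KV cache to full size.
    """
    if deficit_bytes <= 0:
        return set(), full_kv_blocks

    int8_layers: set[str] = set()
    remaining = deficit_bytes
    # Swap the least sensitive layers first; each swap frees 1 byte per weight.
    for layer in sorted(layers, key=lambda candidate: candidate.sensitivity):
        if remaining <= 0:
            break
        int8_layers.add(layer.name)
        remaining -= layer.params * (FP16_BYTES - INT8_BYTES)

    kv_blocks = full_kv_blocks
    if remaining > 0:
        # Ceiling division: blocks to evict so the leftover deficit is covered.
        evict = -(-remaining // kv_block_bytes)
        kv_blocks = max(min_kv_blocks, full_kv_blocks - evict)
    return int8_layers, kv_blocks
```

For example, with a single 100-weight layer and a 150-byte deficit, the swap frees 100 bytes and the remaining 50 bytes are covered by evicting 3 KV blocks of 20 bytes each, so a 10-block cache shrinks to 7. Passing a deficit of 0 once pressure drops restores every layer to FP16 and the cache to its full size.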

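The SLO claims above reduce to simple bookkeeping over per-request TTFT samples. This sketch assumes a nearest-rank P95 and counts a violation as any request whose TTFT exceeds the SLO; the function names are hypothetical, and the page does not say how MorphServe defines or counts violations.

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def violations(ttfts: list[float], slo: float) -> int:
    """Count requests whose TTFT exceeds the SLO (assumed definition)."""
    return sum(1 for t in ttfts if t > slo)


def meets_slo(ttfts: list[float], slo: float) -> bool:
    """True when the window's P95 TTFT is at or below the SLO limit."""
    return p95(ttfts) <= slo


def reduction(baseline_violations: int, adaptive_violations: int) -> float:
    """Fractional drop in violations, e.g. 0.92 means '92% fewer'."""
    return (baseline_violations - adaptive_violations) / baseline_violations
```

Under this definition, a "~92% fewer violations" result corresponds to `reduction(baseline, adaptive)` returning about 0.92, e.g. 8 violations where full-precision serving had 100 (illustrative numbers only).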