When we first launched the Halcyon runtime, cold start latency wasn't a priority. We were focused on correctness, security isolation, and API compatibility. But as adoption grew and customers started deploying latency-sensitive workloads, cold starts became our number one support ticket category.

This post walks through the problem, the experiments we ran, and the architecture changes that ultimately brought our p99 cold start from 1.2 seconds down to 68 milliseconds.

The Problem

Every time a request arrived for a deployment whose isolate had been evicted (or never created), we had to spin up a fresh V8 instance, load the user's bundled code, initialize their module graph, and run any top-level setup. Under our original architecture, each of these steps was sequential and fully blocking.
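The sequential path amounts to a plain sum of per-step costs. Here's a sketch; the step names and millisecond splits are assumptions chosen to be consistent with the per-step savings quoted later in this post, not measurements of the real runtime:

```typescript
// Sketch of the original, fully sequential cold-start path.
// Step names and costs are illustrative, not Halcyon's actual breakdown.
type Step = { name: string; ms: number };

// Assumed split of an ~1s cold start:
const steps: Step[] = [
  { name: "boot V8 isolate", ms: 250 },
  { name: "load user bundle", ms: 150 },
  { name: "parse + compile + init module graph", ms: 400 },
  { name: "run top-level setup", ms: 150 },
];

// Because every step blocks the next, total latency is simply the sum.
export function sequentialColdStartMs(steps: Step[]): number {
  return steps.reduce((total, step) => total + step.ms, 0);
}
```

With this split, no single step dominates, which is why no single fix was enough: you have to attack several steps at once to get an order-of-magnitude win.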

Our monitoring showed a clear bimodal distribution: warm requests averaged 4ms, but cold starts spiked to 800ms-1.2s depending on bundle size. For customers running API gateways or auth middleware on our edge, that was a deal-breaker.

isolate.config.yaml
# Before: original isolate config
runtime:
  isolate:
    pool_size: 0            # no pre-warming
    max_idle_ms: 30000      # evict after 30s idle
    snapshot: false          # full boot every time
    memory_limit_mb: 128
    startup_timeout_ms: 5000

# After: optimized config
runtime:
  isolate:
    pool_size: 8            # pre-warmed isolate pool
    max_idle_ms: 300000     # 5 min idle window
    snapshot: true           # V8 heap snapshots
    snapshot_mode: "eager"   # snapshot at deploy time
    memory_limit_mb: 128
    startup_timeout_ms: 500

The Solution

We attacked cold starts from three angles simultaneously. Each optimization on its own wasn't enough, but combined they produced the 94% reduction we were after.

  1. V8 heap snapshots at deploy time. Instead of parsing and compiling JavaScript on every cold start, we now create a serialized V8 heap snapshot when code is deployed. On cold start, we deserialize the snapshot directly into memory -- skipping parse, compile, and top-level execution entirely. This alone cut 400-600ms from cold starts.
  2. Pre-warmed isolate pools. We maintain a pool of 8 generic isolates per edge node that have already booted the runtime but haven't loaded user code yet. When a cold start hits, we grab a pre-warmed isolate and inject the user's snapshot, rather than booting from zero. This eliminated another 200-300ms of V8 initialization overhead.
  3. Predictive pre-warming based on traffic patterns. We built a lightweight traffic predictor that identifies isolates likely to be needed in the next 60 seconds based on recent request patterns. These get speculatively warmed before any request arrives. For high-traffic deployments, this turns most cold starts into warm starts.
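The pool in step 2 can be sketched as a simple checkout-and-refill structure. Everything below is an assumption for illustration (the `Isolate` shape, the `boot()` factory, binding a deployment by name); it is not Halcyon's real API, and snapshot injection is represented by just tagging the isolate with its deployment:

```typescript
// Minimal sketch of a pre-warmed isolate pool.
interface Isolate {
  warm: boolean;        // booted ahead of any request
  deployment?: string;  // set when the user's snapshot is bound
}

export class IsolatePool {
  private pool: Isolate[] = [];

  constructor(private capacity: number, private boot: () => Isolate) {
    // Pay the generic V8 boot cost up front, before any request arrives.
    while (this.pool.length < capacity) this.pool.push(boot());
  }

  size(): number {
    return this.pool.length;
  }

  // On a cold start: take a pre-booted isolate, bind the user's snapshot,
  // and refill the pool. Real code would refill asynchronously.
  acquire(deployment: string): Isolate {
    const isolate = this.pool.pop() ?? this.boot(); // pool drained: boot from zero
    this.pool.push(this.boot());
    isolate.deployment = deployment;
    return isolate;
  }
}
```

Note that the pool only removes the generic V8 boot step; the per-deployment snapshot still has to be deserialized into the acquired isolate, which is why the pool and snapshot optimizations compound rather than overlap.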

Results

After rolling these changes out region by region over three weeks, the numbers spoke for themselves:

  - p50 cold start: 12ms
  - p99 cold start: 68ms (down from 1.2s, a 94% reduction)

Cold start support tickets dropped to near zero within a week. More importantly, we saw a measurable uptick in adoption for latency-sensitive use cases -- auth middleware, API gateways, and real-time data transforms -- that customers had previously avoided deploying to the edge.

Conclusion

The lesson here wasn't just technical. We spent months trying to micro-optimize within our existing architecture before stepping back and rethinking the isolation model from scratch. Sometimes the biggest performance wins come from changing the shape of the problem, not grinding on the existing one.

If you're building something similar, the V8 snapshot approach is the single highest-leverage change you can make. The pool and prediction layers are nice-to-haves that compound on top of it, but snapshots alone will get you most of the way there.

Want to try the optimized runtime? All Halcyon deployments on the Pro plan and above automatically get snapshot-based cold starts. Check our migration guide for details.