Why your Kubernetes rolling deploy still drops requests

"Zero-downtime" is the thing everyone assumes they get for free from a Kubernetes rolling update. You bump the image tag, the new pods come up, the old ones go away, kubectl rollout status turns green. Done, right?

Then someone watches the dashboards during a deploy and sees a little spike of 502s and reset connections every single time. Not an outage — just a handful of users who got an error mid-rollout, on every release, forever. I shipped that exact pattern with .NET microservices on EKS for longer than I would like to admit, because the deploy looked successful. Zero-downtime is not automatic. It is four specific things, and Kubernetes does none of them for you by default.

1. Probes: gate traffic with readiness, not liveness

The single most common mistake is conflating the three probe types. They do completely different jobs.

startupProbe — "has the app finished booting?" While it runs, the other two are paused. This is what stops a liveness probe from killing a slow-cold-starting service before it is even up.
readinessProbe — "should this pod receive traffic right now?" Fail it and the pod is pulled from the Service endpoints. It is not restarted. This is your traffic gate.
livenessProbe — "is this container wedged and in need of a restart?" Fail it and the kubelet kills the container.

The dangerous anti-pattern is pointing liveness at a deep health endpoint that also checks dependencies. Your database has a 30-second blip, the liveness probe fails three times in a row (the default failureThreshold) on every replica at once, and Kubernetes "helpfully" restarts your entire fleet at the exact moment it was already under stress. A liveness probe should be shallow and almost always pass: it answers "is the process alive," nothing more. Put the dependency checks in readiness, where a failure takes pods out of rotation until the dependency recovers instead of restarting them.

startupProbe:
  httpGet: { path: /health/live, port: 8080 }
  failureThreshold: 30
  periodSeconds: 2
livenessProbe:
  httpGet: { path: /health/live, port: 8080 }  # shallow: process alive
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /health/ready, port: 8080 }  # deep: deps reachable
  periodSeconds: 5
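
To make the numbers in that probe config concrete, here is a rough sketch of the time budgets they imply. The helper names are made up for illustration, and the arithmetic ignores timeoutSeconds and initialDelaySeconds, so treat the results as approximations rather than anything the kubelet guarantees.

```javascript
// Hypothetical helpers: approximate probe timing from periodSeconds and failureThreshold.
// Ignores timeoutSeconds and initialDelaySeconds, so the results are rough.

// Longest boot the startupProbe tolerates before the kubelet restarts the container.
function startupBudgetSeconds({ failureThreshold, periodSeconds }) {
  return failureThreshold * periodSeconds;
}

// Shortest continuous failure that can yield `failureThreshold` failed probes in a row:
// N consecutive probes span (N - 1) periods.
function minFailureSpanSeconds({ failureThreshold = 3, periodSeconds = 10 }) {
  return (failureThreshold - 1) * periodSeconds;
}

const startup = { failureThreshold: 30, periodSeconds: 2 };
const liveness = { periodSeconds: 10 }; // failureThreshold left at the Kubernetes default of 3

console.log(startupBudgetSeconds(startup));   // 60
console.log(minFailureSpanSeconds(liveness)); // 20
```

With these values the service gets about 60 seconds to boot, and a liveness check would need to fail for roughly 20 seconds straight before the kubelet restarts the container.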

2. The shutdown race — the real source of dropped requests

This is the one almost nobody gets right, and it is the actual cause of those deploy-time 502s.

When a pod is deleted, two things happen concurrently: the kubelet sends SIGTERM to your container, and the endpoints controller starts removing the pod from the Service. Those are not coordinated. For a short window, kube-proxy (or your ingress) on other nodes still has the dying pod in its routing table and keeps sending it new connections — after it already got the signal to shut down.

So if your app does the "correct" thing and exits immediately on SIGTERM, you drop every request that arrives during that propagation window, plus any still in flight. The deploy succeeded; the users got connection-refused.
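
The race is easier to see as a toy timeline. The sketch below is a deliberately simplified model, not how kube-proxy works internally: the function name and the 3-second propagation delay are assumptions chosen for illustration.

```javascript
// Toy model of the shutdown race. All times are milliseconds after the pod is deleted.
// routedUntilMs:    when the last proxy stops sending new connections (assumed value).
// listeningUntilMs: when the app stops accepting connections.
function refusedRequests(arrivalsMs, { routedUntilMs, listeningUntilMs }) {
  // A request is refused if it is still routed to the pod after the pod stopped listening.
  return arrivalsMs.filter((t) => t < routedUntilMs && t >= listeningUntilMs);
}

const arrivals = [100, 900, 1800, 2500, 4000];

// App exits immediately on SIGTERM while proxies take ~3 s to catch up.
console.log(refusedRequests(arrivals, { routedUntilMs: 3000, listeningUntilMs: 0 }));
// [ 100, 900, 1800, 2500 ]

// App keeps serving for 10 s after deletion: nothing is routed to a closed socket.
console.log(refusedRequests(arrivals, { routedUntilMs: 3000, listeningUntilMs: 10000 }));
// []
```

The only variable you control is how long the pod keeps listening; the propagation delay belongs to the cluster.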

The fix has two halves. First, a preStop hook that simply waits, so endpoint removal propagates before your app starts shutting down:

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
terminationGracePeriodSeconds: 40

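The kubelet runs preStop, then sends SIGTERM, and sends SIGKILL once terminationGracePeriodSeconds has elapsed. The grace-period clock starts when the pod is marked for deletion, so the preStop sleep counts against it: with the values above, the app has about 30 seconds left to drain after SIGTERM. That sleep is unglamorous and it is the highest-leverage line in this whole post. Second, your app has to actually drain: stop accepting new connections on SIGTERM, finish in-flight requests, then exit, all within what remains of the grace period.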
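
Because the grace-period countdown starts before the preStop hook runs, the sleep and the drain have to share one budget. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and the 25-second drain is the cap used in the application code later in this post.

```javascript
// Hypothetical helper: seconds of slack left before SIGKILL.
// Assumes the grace-period clock starts at deletion, so preStop time counts against it.
function shutdownSlackSeconds({ graceSeconds, preStopSeconds, drainSeconds }) {
  return graceSeconds - preStopSeconds - drainSeconds;
}

// Values from this post: 40 s grace, sleep 10, and a 25 s drain cap in the app.
console.log(shutdownSlackSeconds({ graceSeconds: 40, preStopSeconds: 10, drainSeconds: 25 })); // 5

// The Kubernetes default grace period of 30 s would leave the same app short.
console.log(shutdownSlackSeconds({ graceSeconds: 30, preStopSeconds: 10, drainSeconds: 25 })); // -5
```

A negative result means the container can be killed while requests are still in flight, so either raise the grace period or lower the drain cap.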

3. PodDisruptionBudgets — survive node drains

Rolling deploys are not the only way replicas disappear. Cluster upgrades, node-group rotations, and the autoscaler removing an underused node all drain nodes, which evicts your pods. Without a PodDisruptionBudget, a drain can legally evict every replica of a service at once, and now you have a real outage during what was supposed to be routine maintenance.

A PDB tells Kubernetes the floor it must respect during voluntary disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: api }
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: api }

Now a kubectl drain blocks rather than taking you below two healthy pods. This is the difference between a node rotation being a non-event and being an incident.
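
The rule a drain follows can be sketched as simple arithmetic. This is an illustrative helper with a made-up name, covering only the integer minAvailable case used above.

```javascript
// Hypothetical helper mirroring the PDB arithmetic for an integer minAvailable:
// evictions allowed right now = healthy pods - minAvailable, never below zero.
function allowedDisruptions({ healthyPods, minAvailable }) {
  return Math.max(0, healthyPods - minAvailable);
}

console.log(allowedDisruptions({ healthyPods: 3, minAvailable: 2 })); // 1
console.log(allowedDisruptions({ healthyPods: 2, minAvailable: 2 })); // 0, so the drain waits
```

One consequence worth noticing: if minAvailable equals the replica count, the result is always 0 and a drain can never make progress, so leave at least one pod of headroom.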

4. Rolling-update config — never dip below capacity

The default rolling update (maxUnavailable: 25%, maxSurge: 25%) allows up to a quarter of your replicas, rounded down, to be unavailable mid-rollout. Under real load that dip is enough to shed requests. Pin it so new pods are ready before old ones leave:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

maxUnavailable: 0 means Kubernetes will not terminate an old pod until a new one is ready (per the readiness probe above). Combined with proper readiness gating, capacity never drops during a rollout.
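
To see what those two settings do to capacity, here is a small sketch that resolves them into pod-count bounds. The function names are hypothetical; the rounding (maxSurge up, maxUnavailable down) and the 25% defaults follow the documented Deployment behavior.

```javascript
// Resolve a rollingUpdate value that may be an integer or a percentage string like '25%'.
function resolve(value, replicas, round) {
  if (typeof value === 'number') return value;
  return round((parseInt(value, 10) / 100) * replicas);
}

// Bounds the Deployment controller keeps during a rollout.
// Kubernetes rounds maxUnavailable down and maxSurge up.
function rolloutBounds(replicas, { maxUnavailable = '25%', maxSurge = '25%' } = {}) {
  const unavailable = resolve(maxUnavailable, replicas, Math.floor);
  const surge = resolve(maxSurge, replicas, Math.ceil);
  return { minReady: replicas - unavailable, maxPods: replicas + surge };
}

console.log(rolloutBounds(4));                                     // { minReady: 3, maxPods: 5 }
console.log(rolloutBounds(4, { maxUnavailable: 0, maxSurge: 1 })); // { minReady: 4, maxPods: 5 }
```

With four replicas, the defaults let the rollout run on three ready pods, while the pinned config holds all four and pays for it with one extra pod at peak.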

The .NET specifics

ASP.NET Core actually has good bones for this, but the defaults need nudging. The generic host already listens for SIGTERM and triggers graceful shutdown: Kestrel stops accepting new connections and drains in-flight ones. The catch is the shutdown timeout. The default depends on the runtime version (5 seconds on older .NET releases, 30 seconds on recent ones), so either a long request gets cut off or the host's budget does not line up with your pod's. Set it explicitly so the host can drain inside the grace period left over after the preStop sleep:

builder.Services.Configure<HostOptions>(o =>
    o.ShutdownTimeout = TimeSpan.FromSeconds(25));

And split your health endpoints so readiness checks dependencies (the database you hit through Dapper, the message broker) while liveness stays shallow:

builder.Services.AddHealthChecks()
    .AddNpgSql(conn, name: "db", tags: ["ready"]);

app.MapHealthChecks("/health/live", new() { Predicate = _ => false });
app.MapHealthChecks("/health/ready", new()
{
    Predicate = c => c.Tags.Contains("ready"),
});

Predicate = _ => false runs no checks for liveness — it returns healthy as long as the process can serve the request, which is exactly what liveness should mean. Readiness runs only the checks tagged ready, so a database hiccup pulls the pod from rotation instead of restarting it.

One more .NET-on-Kubernetes detail: make sure your container actually receives SIGTERM as PID 1. If you wrap the app in a shell script (sh -c "dotnet App.dll"), the shell is PID 1 and may not forward the signal, so your graceful shutdown never runs. Exec the binary directly, or use an init that forwards signals.

The Node.js specifics

Same four controls, different shutdown ergonomics — and Node makes you do more of the work yourself. Node does not shut down gracefully on its own: if you do not handle SIGTERM, the process exits immediately and drops every in-flight request. You have to wire it.

const server = app.listen(8080);

process.on('SIGTERM', () => {
  // Stop accepting new connections; the callback fires once in-flight finish.
  server.close(() => process.exit(0));
  // Keep-alive sockets can hold close() open — release idle ones (Node 18.2+).
  server.closeIdleConnections?.();
  // Hard cap so a stuck request can't outlast the grace period.
  setTimeout(() => process.exit(1), 25_000).unref();
});

The subtle part is HTTP keep-alive: idle-but-open sockets keep server.close() from ever completing, so on Node 18.2+ call closeIdleConnections() and set server.keepAliveTimeout below your grace period. Fastify and NestJS wrap all of this — await app.close() performs the drain — but with raw Express you own it.
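
closeIdleConnections() only helps with sockets that are idle at that instant. One complementary technique, which is an addition here and not something the setup above requires, is to answer with Connection: close once shutdown has begun so that busy keep-alive sockets are not reused. A minimal sketch, where createDrainState is a hypothetical helper and not part of Express or Node:

```javascript
// Hypothetical helper, not from any library: tracks shutdown and exposes
// an Express-style middleware that stops reuse of keep-alive sockets.
function createDrainState() {
  let shuttingDown = false;
  return {
    begin() {
      shuttingDown = true;
    },
    middleware(_req, res, next) {
      // Once draining, ask the client to close its socket after this response.
      if (shuttingDown) res.setHeader('Connection', 'close');
      next();
    },
  };
}

// Assumed wiring in an Express app like the one above:
//   const drain = createDrainState();
//   app.use(drain.middleware);
//   process.on('SIGTERM', () => { drain.begin(); /* then server.close(...) as before */ });
```

Node closes the socket after a response that carries Connection: close, so each in-flight client releases its connection as soon as its current request finishes.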

Health checks are two routes: shallow liveness, dependency-aware readiness.

app.get('/health/live', (_req, res) => res.sendStatus(200));   // process alive
app.get('/health/ready', async (_req, res) => {
  try {
    await db.query('select 1');                                // deps reachable
    res.sendStatus(200);
  } catch {
    res.sendStatus(503);
  }
});
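
A readiness route that awaits a hung dependency never answers, and the probe then fails on its own timeout (timeoutSeconds defaults to 1). It can be cleaner to bound the check yourself and return 503 deliberately. The sketch below is one way to do that; withTimeout, readinessStatus, and the 800 ms default are illustrative assumptions.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Returns the HTTP status a readiness route would send for a given dependency check.
async function readinessStatus(check, timeoutMs = 800) {
  try {
    await withTimeout(check(), timeoutMs);
    return 200;
  } catch {
    return 503;
  }
}

// Assumed usage inside the route above:
//   res.sendStatus(await readinessStatus(() => db.query('select 1')));
```

Keeping the timeout below the probe's own timeout means the pod reports "not ready" explicitly instead of leaving the kubelet to infer it.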

And the PID-1 trap is sharper in Node: npm does not reliably forward SIGTERM to your process. CMD ["npm", "start"] leaves npm as PID 1, your handler never runs, and the kubelet SIGKILLs you after the grace period, dropping requests on every single deploy. Run Node directly (CMD ["node", "server.js"]) or make an init such as tini the image's entrypoint so signals are forwarded (docker run --init does not carry over to Kubernetes, so the init has to live in the image). The Deployment manifest below is identical for Node: only the image and that entrypoint change.

Putting it together

Here is a real orders-api with every one of these controls in one place — the manifest, and the ASP.NET Core wiring it expects.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: shop
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0                                # never dip below capacity
      maxSurge: 1
  selector:
    matchLabels: { app: orders-api }
  template:
    metadata:
      labels: { app: orders-api }
    spec:
      terminationGracePeriodSeconds: 40
      containers:
        - name: app
          image: registry.example.com/orders-api:1.8.3 # pinned, never :latest
          ports: [{ containerPort: 8080 }]
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]       # win the endpoint-removal race
          startupProbe:
            httpGet: { path: /health/live, port: 8080 }
            failureThreshold: 30
            periodSeconds: 2
          livenessProbe:
            httpGet: { path: /health/live, port: 8080 }  # shallow: process alive
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: 8080 } # deep: gates traffic
            periodSeconds: 5
          resources:
            requests: { cpu: 250m, memory: 256Mi }
            limits: { memory: 512Mi }                     # memory limit, no CPU limit
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api
  namespace: shop
spec:
  minAvailable: 3
  selector:
    matchLabels: { app: orders-api }

And the matching Program.cs — the shutdown budget, a dependency-aware readiness check, and a shallow liveness check, wired to the exact /health/* paths the probes hit:

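using System.Data;
using Dapper;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Npgsql;

var builder = WebApplication.CreateBuilder(args);

// Let in-flight requests finish on SIGTERM, inside the pod's grace period.
builder.Services.Configure<HostOptions>(o =>
    o.ShutdownTimeout = TimeSpan.FromSeconds(25));

// Readiness checks the database (reached via Dapper); liveness checks nothing.
builder.Services.AddHealthChecks()
    .AddNpgSql(
        builder.Configuration.GetConnectionString("Orders")!,
        name: "db",
        tags: ["ready"]);

// Assumed wiring: one Npgsql connection per request for the Dapper query below.
builder.Services.AddScoped<IDbConnection>(_ =>
    new NpgsqlConnection(builder.Configuration.GetConnectionString("Orders")));

var app = builder.Build();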

// Liveness: no checks — "is the process able to answer at all?"
app.MapHealthChecks("/health/live", new HealthCheckOptions { Predicate = _ => false });

// Readiness: only the "ready"-tagged checks. A DB blip pulls THIS pod from
// rotation instead of restarting the whole fleet.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
});

app.MapGet("/orders/{id:guid}", async (Guid id, IDbConnection db) =>
    await db.QuerySingleOrDefaultAsync<Order>(
        "select * from orders where id = @id", new { id }));

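app.Run();

// Hypothetical row shape for illustration; the real orders columns are not shown in this post.
public sealed class Order
{
    public Guid Id { get; set; }
}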

Nothing here is exotic. It is the same handful of controls, assembled: readiness gates traffic, preStop wins the race, the host drains in-flight work on SIGTERM, the PDB survives node drains, and maxUnavailable: 0 holds capacity through the rollout. Wire it once per service and the deploy-time 502s are gone.

The checklist

[ ] startupProbe for slow boots; liveness shallow; readiness gates traffic
[ ] liveness does NOT check dependencies (or a DB blip restarts everything)
[ ] preStop sleep (~10s) so endpoint removal beats SIGTERM
[ ] app drains in-flight on SIGTERM; preStop sleep + drain time fit inside terminationGracePeriodSeconds
[ ] PodDisruptionBudget so node drains can't evict every replica
[ ] rollingUpdate maxUnavailable: 0
[ ] app is PID 1 (or signals are forwarded)

The principle

A green kubectl rollout status tells you the new pods are running. It tells you nothing about whether a user got a reset connection on the way there. Real zero-downtime is the boring sum of four controls: readiness gates traffic, the preStop sleep wins the endpoint-removal race, the PDB survives drains, and maxUnavailable: 0 holds your capacity. None of it is automatic, but all of it is config you write once per service and then forget. The 502s during deploys are not the cost of doing business on Kubernetes; they are a missing sleep 10.