#MLOps
2 posts tagged.
01
The request is the wrong unit of scale for LLMs on Kubernetes
Your dashboard says traffic is flat while latency drifts and the GPU strains. The HTTP request is only the envelope; the real work is token processing. Why tokens, not requests, are the unit of scale for LLMs on Kubernetes.
#kubernetes
#LLM
#Platform Engineering
#AI Infrastructure
#MLOps
#DevOps
2026-05-21 · kubernetes
02
Building a production LLM platform on Kubernetes
I have run Kubernetes in production for microservices, not LLMs. Serving large language models breaks the assumptions that make K8s good at web apps. Here is how I would architect a vendor-neutral production LLM platform, with the router, token accounting, and autoscaling that Kubernetes will not give you.
#kubernetes
#LLM
#Platform Engineering
#AI Infrastructure
#MLOps
2026-05-21 · kubernetes