Deployment Strategy as the "Rules of the Game": How Kubernetes Replaces Your Old Code
The "Rules of the Game"
Every time you push a new version of your application to a Kubernetes cluster, you need a strategy to ensure the old code is seamlessly replaced by the new code. These strategies act as the Rules of the Game.
Here, we will break down a specific resource-constrained configuration (the "Tight Cluster"), compare it against the standard strategies available in Kubernetes natively, and explore advanced paradigms.
1. The "Tight Cluster" Strategy
In environments with heavy workloads—such as large AI models—where your nodes have virtually no spare CPU, a standard deployment can lead to "Insufficient CPU" errors. To solve this, we adjust the deployment strategy parameters.
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
```
- `maxSurge: 0`: This tells Kubernetes: "Do not create any extra pods." Usually, Kubernetes tries to start the new pod before killing the old one (to avoid downtime). By setting this to 0, you force it to wait.
- `maxUnavailable: 1`: This tells Kubernetes: "It is okay if 1 pod is offline during the update."
The Result
Because you have no extra CPU space on your nodes, Kubernetes intentionally kills the old pod first (releasing its CPU reservation), and then starts the new pod in that freshly emptied space.
- PRO: Solves "Insufficient CPU" errors completely.
- CON: If you only have 1 replica, you will experience a brief window of downtime (on the order of 10–30 seconds) while the old pod stops and the new model loads.
2. The Two Main Kubernetes Strategies
Type A: RollingUpdate (The Default & Professional Choice)
This is the standard for modern web applications. It replaces pods incrementally, ensuring minimal disruption.
- Default Behavior: `maxSurge: 25%` and `maxUnavailable: 25%`.
- Use Case: High-availability apps where you demand Zero Downtime.
- Requirement: Your cluster must have enough "spare" capacity (CPU/RAM) to run a few extra pods during the transition.
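These defaults are applied implicitly when you omit the strategy block entirely; written out in full, they look like this:

```yaml
# Spelling out the defaults Kubernetes applies when no strategy is specified.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # up to 25% extra pods may be created during the rollout
    maxUnavailable: 25%  # up to 25% of desired pods may be down at once
```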
Type B: Recreate (The "Clean Slate" Choice)
This is a brutal but sometimes necessary "Kill all, then Start all" approach.
- Mechanism: It completely shuts down all version 1 pods, waits for them to terminate, and only then starts the version 2 pods.
- Use Case (Singleton Apps): If your application (like a legacy database engine) cannot tolerate two versions running simultaneously without corrupting state.
- Use Case (Resource Constrained): When the cluster is exceptionally small and absolutely cannot afford any surge.
- Downside: Guaranteed total downtime during the swap.
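In the Deployment spec, Recreate takes no tuning parameters; the entire strategy stanza is just:

```yaml
# All old pods are terminated before any new pod is scheduled.
strategy:
  type: Recreate
```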
3. Comparison & Use Cases
| Strategy | maxSurge | maxUnavailable | Downtime? | Best For... |
|---|---|---|---|---|
| Standard Rolling | 25% | 25% | None | Web APIs, Frontends, Microservices. |
| Tight Rolling (The AI Setup) | 0 | 1 | Short | Heavy AI Engines, tightly-packed clusters. |
| Recreate | N/A | N/A | Longer | Batch jobs, DB migrations, Singleton apps. |
4. Advanced "Pro" Strategies (Argo Rollouts)
Standard Kubernetes natively supports only the strategies mentioned above. However, by introducing tools like Argo Rollouts (often paired with ArgoCD), you can implement considerably more robust release pipelines:
Blue/Green Deployments
You spin up a completely isolated second environment (Green). You test it thoroughly. Once validated, you flip a switch at the Service or Ingress layer to route all traffic there instantly.
- Pro: Instant, zero-friction rollback.
- Con: Temporarily consumes double the hardware resources (CPU/RAM) during the test phase.
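With Argo Rollouts, Blue/Green is declared on a `Rollout` resource instead of a Deployment. A minimal sketch (the Service names `my-app-active` and `my-app-preview` are placeholders for Services you define yourself; the selector and pod template are omitted for brevity):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    blueGreen:
      activeService: my-app-active    # Service receiving live (Blue) traffic
      previewService: my-app-preview  # Service pointing at the new (Green) pods
      autoPromotionEnabled: false     # require a manual "promote" after testing
  # selector and template omitted for brevity
```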
Canary Deployments
You route a small slice of traffic, say 10%, to the new version while the other 90% stays on the old. Check the metrics. If there are no 500 errors, dial it up to 50%, then 100%.
- Pro: Arguably the safest way to release code to production, since a bad release only affects a small slice of users.
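In Argo Rollouts, that staged traffic shift is expressed as a list of `steps` in the canary strategy. A sketch (the weights and pause durations are illustrative, not prescribed values):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10            # send 10% of traffic to the new version
        - pause: {}                # wait indefinitely for a manual promotion
        - setWeight: 50
        - pause: {duration: 10m}   # soak at 50% for 10 minutes
      # after the final step, the rollout proceeds to 100%
  # selector and template omitted for brevity
```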
5. Best Practice Recommendations
- For Heavy AI Engines: Stick to the `maxSurge: 0` & `maxUnavailable: 1` setup. Because your pods are heavy, trying to "surge" will predictably fail on saturated nodes unless you drastically autoscale your cluster.
- For Small Web Apps: Use the default `RollingUpdate` (remove the maxSurge/maxUnavailable lines completely). The seamless cutover provides the best UX.
- Always Use Probes: No matter your strategy, if you omit a `readinessProbe`, Kubernetes assumes the new pod is "Ready" the millisecond the container process starts, even if your heavyweight AI model is still loading 5GB into RAM. This leads to dropped traffic.
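A readinessProbe for a heavyweight model server might look like this (the `/healthz` path and port 8080 are assumptions; use whatever endpoint your app exposes once the model is actually loaded):

```yaml
readinessProbe:
  httpGet:
    path: /healthz           # should return 200 only after the model is in RAM
    port: 8080
  initialDelaySeconds: 30    # don't start checking until 30s after startup
  periodSeconds: 10          # re-check every 10s
  failureThreshold: 6        # mark Unready after 6 consecutive failures
```

With this in place, the rolling update will not route traffic to (or count as available) a pod whose model is still loading.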
Summary
If you're operating on a cluster that's almost full, the "Tight Cluster" configuration (RollingUpdate with `maxSurge: 0`) is a perfectly professional way to handle high-resource pods. It guarantees the old pod's resource reservation is released before the new reservation is requested, safeguarding your cluster from frustrating scheduling deadlocks.