Kubernetes Storage Incident Deep Dive (RKE2)
From Failure to Best Practices
Architecture Diagram (Storage Flow)
This diagram outlines the flow of ephemeral storage resources from the application pod down to the node disk, highlighting the key monitoring checkpoints.
```mermaid
flowchart TD
    subgraph NodeLayer["Node Layer"]
        App["Application Pod"]
        Ephemeral["Ephemeral Storage<br/>emptyDir / /tmp FS"]
        RunK3s["/run/k3s (runtime)<br/>tmpfs / mounts"]
        VarLib["/var/lib/rancher<br/>rke2/agent<br/>container layers & volumes"]
        NodeDisk["Node Disk VM"]
    end
    subgraph Monitoring["Monitoring Layer"]
        Prom["Prometheus + Grafana<br/>node_filesystem_usage<br/>container_fs_usage"]
    end
    App -->|Temp Data| Ephemeral
    Ephemeral -->|Runtime Data| RunK3s
    Ephemeral -->|Persistent Data| VarLib
    RunK3s -->|Saturates| NodeDisk
    VarLib -->|Stores to| NodeDisk
    Prom -.->|Monitors| NodeDisk
    classDef default fill:#f9fafb,stroke:#d1d5db,stroke-width:2px;
    classDef disk fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#991b1b;
    classDef monitor fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#1e40af;
    class NodeDisk disk;
    class Prom monitor;
```
Introduction
In modern cloud-native environments, most engineers focus heavily on CPU and memory. However, during a recent production-like scenario, I encountered a critical issue that highlighted a less-discussed but equally important resource: Ephemeral Storage in Kubernetes.
This article walks through:
- How the issue happened
- How RKE2 manages storage internally
- Step-by-step troubleshooting methodology
- Immediate recovery actions
- Long-term best practices
Environment:
- Kubernetes: RKE2 (v1.31)
- OS: Ubuntu 22.04
- Runtime: containerd
The Incident
Pods started getting evicted with the following event log:
The node was low on resource: ephemeral-storage
At the same time, the cluster experienced cascading effects:
- Nodes became unstable
- Disk usage reached 100%
- Services degraded
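Evictions like this surface as cluster events; a quick way to confirm the scope (assuming kubectl access; pod and namespace names are placeholders):

```bash
# List eviction events across all namespaces
kubectl get events -A --field-selector reason=Evicted

# Full detail for one affected pod, including the eviction message
kubectl describe pod <pod-name> -n <namespace>
```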
Investigation
1. Disk Check
df -h
Result: Root disk fully saturated.
2. Identify Large Directories
sudo du -h / --max-depth=1 | sort -rh
Key findings:
- /var → very large
- /run → abnormally large
3. Deep Dive into Specific Paths
sudo du -h /var/lib/rancher --max-depth=2 | sort -rh
sudo du -h /run --max-depth=1 | sort -rh
Findings:
- /var/lib/rancher/rke2/agent → bloated container layers & volumes
- /run/k3s → temporary runtime data (unexpectedly huge)
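du points at directories, not workloads. One way to attribute ephemeral usage to individual pods is the kubelet's stats summary, queried through the API server (a sketch assuming jq is installed; the node name is hypothetical):

```bash
# Ephemeral-storage bytes per pod, largest first
NODE=worker-1   # hypothetical node name
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/stats/summary" \
  | jq -r '.pods[] | "\(.["ephemeral-storage"].usedBytes // 0) \(.podRef.namespace)/\(.podRef.name)"' \
  | sort -rn | head
```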
Root Cause
The issue was primarily caused by:
- Applications generating temporary files (images, processing data, etc.)
- Files stored directly in ephemeral storage
- No cleanup mechanism in place
Application → Temp files → Node filesystem → No cleanup → Disk full → Pod eviction
How RKE2 Manages Storage
Understanding internal storage management is key to resolving capacity issues.
1. Persistent Node Storage
Located at /var/lib/rancher/rke2/. This path contains:
- Container images
- Writable layers (overlayfs)
- Volumes
- Cluster data (etcd)
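To see how this path breaks down on a live node, the runtime itself can report image filesystem usage (a quick sketch; paths follow the default RKE2 layout):

```bash
# Image filesystem usage as reported by the container runtime
sudo crictl imagefsinfo

# Per-directory breakdown of the containerd state
sudo du -sh /var/lib/rancher/rke2/agent/containerd/* 2>/dev/null | sort -rh | head
```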
2. Runtime Storage
Located at /run/k3s/. This manages:
- Temporary mounts
- Sockets
- Runtime data
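A quick way to inspect it (note that /run is tmpfs, i.e. RAM-backed, so saturating it pressures memory as well as disk):

```bash
# Mounts the runtime keeps under /run/k3s
findmnt -rn | grep /run/k3s | head

# Overall /run usage
df -h /run
```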
3. Logs (stdout)
Maintained under /var/log/containers/ and /var/log/pods/.
- All container stdout/stderr logs flow here
- Rotation is handled by the kubelet and container runtime; the RKE2 services' own logs go to journald
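Their footprint is easy to check directly (a quick sketch):

```bash
# Actual log files live under /var/log/pods;
# /var/log/containers only holds symlinks into it
sudo du -sh /var/log/pods
ls -l /var/log/containers | head
```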
Immediate Recovery Actions
sudo systemctl restart rke2-agent
Effect: Recreates the runtime state and releases stale temporary mounts under /run/k3s.
sudo crictl rmi --prune
Effect: Removes unused images filling up /var/lib/rancher.
sudo crictl pods --state NotReady -q | xargs -r sudo crictl rmp
Effect: Removes stopped pod sandboxes safely.
sudo journalctl --vacuum-size=500M
Effect: Reduces system logs footprint.
Troubleshooting Methodology (Reusable)
Step 1: Check overall disk usage
df -h
Step 2: Find the biggest directories
du -h / --max-depth=1 | sort -rh
Step 3: Drill down into persistent storage
du -h /var/lib/rancher --max-depth=2
Step 4: Check runtime storage
du -h /run
Step 5: Inspect containers and images
crictl ps -a and crictl images
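The five steps bundle naturally into a small triage script (a minimal sketch; the filename is arbitrary, and it assumes root on the affected node):

```bash
#!/usr/bin/env bash
# storage-triage.sh - quick ephemeral-storage triage on an RKE2 node
# (run as root; du errors on unreadable paths are silenced)

echo "== Step 1: Overall disk =="
df -h /

echo "== Step 2: Largest top-level directories =="
du -xh / --max-depth=1 2>/dev/null | sort -rh | head   # -x stays on the root filesystem

echo "== Step 3: Persistent (RKE2) storage =="
du -h /var/lib/rancher --max-depth=2 2>/dev/null | sort -rh | head

echo "== Step 4: Runtime storage =="
du -h /run --max-depth=1 2>/dev/null | sort -rh | head

echo "== Step 5: Containers and images =="
crictl ps -a
crictl images
```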
Best Practices (Production Ready)
1. Set Ephemeral Storage Limits
```yaml
resources:
  requests:
    ephemeral-storage: "200Mi"
  limits:
    ephemeral-storage: "1Gi"
```
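In a full manifest, the block sits under each container; a minimal sketch (pod and image names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: image-processor               # hypothetical
spec:
  containers:
    - name: worker
      image: example.com/worker:latest   # hypothetical
      resources:
        requests:
          ephemeral-storage: "200Mi"
        limits:
          ephemeral-storage: "1Gi"
```

A container that exceeds its ephemeral-storage limit gets its own pod evicted, before it can drive the whole node into disk pressure.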
2. Clean Temporary Files (Application Level)
- Always delete temporary files immediately after processing (see the sketch below)
- Avoid uncontrolled /tmp usage in application code
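For shell-driven processing jobs, mktemp plus a trap guarantees cleanup even when the job fails (a minimal sketch, not tied to any particular workload):

```bash
# Create an isolated scratch directory and remove it on exit,
# whether the job succeeds, fails, or is interrupted
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# ... process files inside "$WORKDIR" instead of bare /tmp ...
```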
3. Use Controlled Volumes
```yaml
emptyDir:
  sizeLimit: 1Gi
```
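Wired into a pod spec, with the volume mounted over /tmp so temporary files land on a bounded volume (volume and container names are illustrative):

```yaml
spec:
  containers:
    - name: worker
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 1Gi
```

If the volume grows past sizeLimit, the kubelet evicts that pod alone, which keeps one noisy workload from taking the node down.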
4. Monitoring (Critical)
Deploy standard tools like Prometheus & Grafana to track metrics:
- Node disk usage
- Container filesystem usage
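As a starting point, an alert on node_exporter filesystem metrics (a sketch in standard Prometheus rule format; the 10% threshold and names are arbitrary):

```yaml
groups:
  - name: node-storage
    rules:
      - alert: NodeDiskAlmostFull
        expr: |
          (node_filesystem_avail_bytes{mountpoint="/"}
            / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Root filesystem on {{ $labels.instance }} is below 10% free"
```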
5. Log Management (stdout)
Configure journald limits inside /etc/systemd/journald.conf:
SystemMaxUse=500M
MaxFileSec=7day
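The limits take effect once journald restarts; the current footprint can be verified immediately:

```bash
sudo systemctl restart systemd-journald
journalctl --disk-usage
```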
6. Capacity Planning
- A minimum of 50 GB per node is recommended
- Consider a dedicated disk or partition for /var
Key Takeaways & Conclusion
- Kubernetes does NOT manage disk automatically out of the box.
- Ephemeral storage is frequently overlooked but ultimately critical for stability.
- AI & processing workloads can silently fill up entire disks if untracked.
- Without well-defined limits and active cleanup, complete cluster instability is inevitable.
Proper management requires application discipline (cleanup), Kubernetes configuration (limits), and infrastructure monitoring.