Resolving Argo CD: "dial tcp 10.43.222.132:6379: i/o timeout" Redis Error
This error is specific to Argo CD. It indicates that the Argo CD Repo Server is unable to communicate with the Redis instance used for caching.
Here is a detailed breakdown of what it means, why it happens, and how to fix it.
1. What does this error mean?
- "Failed to generate manifest": Argo CD is trying to convert your Git files (Helm, Kustomize, or YAML) into Kubernetes objects, but it failed.
- "failed to list refs": To generate the manifest, Argo CD first lists the repository's Git refs (branches and tags). It checks the Redis cache before hitting Git, and writes the result back to the cache afterwards.
- "dial tcp 10.43.222.132:6379: i/o timeout": This is the smoking gun. Port 6379 is the default port for Redis. The Repo Server tried to connect to Redis at that IP address, but the connection timed out.
In short: The Argo CD component that processes your code cannot talk to the Argo CD cache (Redis).
2. What is the source of the error?
The source is a network connectivity or service availability issue within your Kubernetes cluster, specifically between the argocd-repo-server pod and the argocd-redis pod.
Common causes:
- Redis Pod is down: The Redis pod crashed or was evicted.
- Network Policies: You have a NetworkPolicy in the namespace that is blocking traffic on port 6379.
- Resource Exhaustion: Redis is out of memory (OOMKilled) or CPU-throttled, making it unresponsive.
- Service Mesh (Istio/Linkerd): If you use a service mesh, mTLS or proxy issues might be blocking the connection.
- Corrupt Service/Endpoint: The Kubernetes service for Redis is pointing to an old or non-existent IP.
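A quick way to rule out the last cause is to compare the Redis Service with the endpoints behind it (these commands assume the default argocd namespace and service name):

```shell
# Show the ClusterIP clients connect to
kubectl get svc argocd-redis -n argocd

# Show the pod IPs the Service actually routes to.
# An empty ENDPOINTS column means the selector matches no ready pod.
kubectl get endpoints argocd-redis -n argocd
```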
3. How to fix it?
Step 1: Check the Redis Pod Status
Run this command to see if the Redis pod is running:
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-redis
If it is CrashLoopBackOff or Pending, describe the pod to see why:
kubectl describe pod <redis-pod-name> -n argocd
Step 2: Check Redis Logs
If the pod is running but the error persists, check the logs for errors:
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-redis
Step 3: Verify the Service and IP
Check if the IP in your error message (10.43.222.132) matches the ClusterIP of your Redis service:
kubectl get svc argocd-redis -n argocd
If the IP doesn't match, the Repo Server might be using cached, outdated DNS info. Restarting the Repo Server usually fixes this.
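That restart can be done with a standard rollout, which replaces the pods without downtime for the rest of Argo CD:

```shell
# Recreate the Repo Server pods so they re-resolve the Redis Service
kubectl rollout restart deployment argocd-repo-server -n argocd

# Wait until the new pods are ready
kubectl rollout status deployment argocd-repo-server -n argocd
```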
Step 4: Test Connectivity (The "Telnet" test)
Try to reach Redis from a temporary pod inside the same namespace:
kubectl run busybox --rm -ti --image=busybox --restart=Never -n argocd -- telnet argocd-redis 6379
If this times out, you likely have a NetworkPolicy blocking traffic.
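To verify that Redis itself is responsive (not just reachable), you can also exec into the Redis pod and issue a PING with redis-cli, which ships with the standard Redis image. Note that if your installation enables Redis authentication, you will need to supply the password from the corresponding secret:

```shell
# A healthy, unauthenticated Redis replies with PONG
kubectl exec -n argocd deploy/argocd-redis -- redis-cli ping
```

If telnet from busybox times out but this in-pod PING succeeds, the problem is on the network path, not Redis itself.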
Step 5: Check for Network Policies
List network policies in the namespace:
kubectl get netpol -n argocd
If policies exist, ensure they allow traffic between argocd-repo-server and argocd-redis on port 6379.
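If a default-deny policy is in place, a minimal allow rule might look like the following. This is a sketch: the policy name is hypothetical, and the pod labels shown match a standard Argo CD install but should be verified against your actual pod labels.

```shell
kubectl apply -n argocd -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-repo-server-to-redis   # hypothetical name
spec:
  # Apply to the Redis pods
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-redis
  policyTypes:
  - Ingress
  ingress:
  # Allow the Repo Server to reach Redis on 6379
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: argocd-repo-server
    ports:
    - port: 6379
      protocol: TCP
EOF
```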
4. Best Practices & Prevention
To prevent this from happening again, follow these best practices:
- Resource Requests & Limits: Ensure Redis has enough memory. Redis stores data in RAM; if your Argo CD manages hundreds of apps, a default of 256Mi might not be enough.
- Recommendation: Set limits to at least 512Mi or 1Gi for larger environments.
- Use High Availability (HA): If this is a production environment, use the Argo CD HA installation (ha-install.yaml). This sets up a Redis Sentinel cluster instead of a single pod, preventing a single point of failure.
- Monitor Redis Health: Use a Prometheus exporter for Redis to alert you if memory usage is high or if the hit rate drops significantly.
- Pod Anti-Affinity: If running in HA, ensure Redis pods are scheduled on different nodes so that a single node failure doesn't take down the entire cache.
- Restart Strategy: If you encounter transient network issues, you can perform a rollout restart of the Argo CD components to clear stale connections:
kubectl rollout restart deployment argocd-repo-server -n argocd
kubectl rollout restart deployment argocd-redis -n argocd
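The memory recommendation above can be applied without hand-editing manifests, assuming the default deployment name argocd-redis (note that HA installs use a StatefulSet instead):

```shell
# Raise Redis memory requests/limits; adjust values to your environment
kubectl -n argocd set resources deployment argocd-redis \
  --requests=memory=256Mi --limits=memory=512Mi
```

If you manage Argo CD via Helm or Kustomize, prefer setting these values in your values file or overlay so they survive upgrades.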