Share via

AKS konnectivity-agent tunnel instability causing intermittent 503s on admin operations

Bernard Kowalski 0 Reputation points
2026-04-09T02:51:53.66+00:00

Cluster aks-asclepius-prod (resource group rg-aks, East US 2, K8s 1.35.1, private cluster with KMS etcd encryption) experiences persistent konnectivity tunnel failures.

Symptoms:

  • konnectivity-agent pods enter cannot connect once error loops, unable to re-establish tunnel to control plane proxy servers
  • rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF"
  • Intermittent 503 Service Unavailable on kubectl exec, port-forward, and logs
  • Pod-to-pod application traffic is unaffected

Environment:

  • Single node: Standard_D4ps_v6 (ARM64)
  • KMS etcd encryption via public Key Vault (asclepius-infra-prod) — ruled out as cause (0 throttled requests, ~36ms avg latency)
  • Dev cluster (same region, also private, no KMS) shows identical EOF churn but functions correctly

Troubleshooting performed:

  1. Rollout restart of konnectivity-agent — no improvement
  2. az aks update --yes control plane reconcile — temporarily restores tunnel, but issue recurs

Ask: Investigate why konnectivity-agent pods intermittently fail to reconnect to control plane proxy servers, requiring manual control plane reconciliation to restore admin operations.

Azure Kubernetes Service
Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.