
AKS Node Image Upgrade Failures

Andreea Stoenescu 20 Reputation points
2026-03-25T16:13:54.1533333+00:00

Hello Support Team,

We are experiencing issues with node image upgrades across our Azure Kubernetes Service (AKS) clusters.

On 24th March, the scheduled node image upgrades failed across all Kubernetes clusters. The logs show the following message: “Upgrade Failed with status Unspecified, error: Unknown error”

User's image

Could you please investigate the root cause of these upgrade failures?

Thank you,

Andreea Stoenescu

Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.


1 answer

  1. Q&A Assist
    2026-03-25T16:14:25.78+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    The activity log message “Upgrade Failed with status Unspecified, error: Unknown error” indicates that the node image upgrade PUT operation on the agent pool failed, but the platform didn’t surface a specific error code in that view. To identify the root cause and remediate, use the standard AKS failed-operation troubleshooting flow and then check for the common failure conditions that affect upgrades.

    1. In the Azure portal, open the AKS cluster → Activity log and filter Status = Failed and the time range around 24 March. Open the failed Create or Update Agent Pool event and review the Status section and the JSON payload for any error code and message. This is the primary place where the underlying failure reason is exposed.
    2. If an error code is present (for example QuotaExceeded, RequestDisallowedByPolicy, PublicIPCountLimitReached, SubnetIsFull, OverconstrainedAllocationRequest, OutboundConnFailVMExtensionError, or a drain/PDB-related error), use the corresponding remediation:
      • RequestDisallowedByPolicy: a policy is blocking the operation. Review the policy assignment on the subscription/resource group and modify or exclude it so the agent pool update is allowed.
      • QuotaExceeded: the region or SKU quota is exhausted. Check current usage and either delete unused resources or request a quota increase before retrying the node image upgrade.
      • PublicIPCountLimitReached: the subscription/region has hit the public IP limit. Free unused public IPs or request a higher limit.
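      • OverconstrainedAllocationRequest or allocation failures: the requested VM size/zone has no capacity. Retry later, or add a node pool that uses a different VM size or availability zone. A node pool always runs in the same region as its cluster, so changing region means deploying a new cluster.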
      • OutboundConnFailVMExtensionError: the VM extension cannot reach required endpoints. Verify that NSG and firewall rules allow the outbound traffic AKS nodes require (see https://aka.ms/aks/outbound, https://aka.ms/aks/ssh, https://aka.ms/aks/agent, and https://aka.ms/aks/containerinsights).
      • Drain/PDB-related errors: pods cannot be drained from nodes. Use kubectl get pods --all-namespaces -o wide and kubectl describe pod to find stuck pods; temporarily relax or remove PodDisruptionBudgets, or force-delete problematic pods, then retry the upgrade.
      • Subnet too small or SubnetIsFull: the subnet cannot allocate more IPs for the extra (surge) nodes created during the upgrade. Free addresses in the subnet, or create a replacement node pool in a larger subnet and move workloads to it.
    3. If the cluster itself is marked as Failed after the upgrade attempt, follow the failed-cluster guidance: use the activity log and the Diagnose and solve problems blade on the AKS resource to detect and fix common issues, including network, DNS, or infrastructure problems.
    4. After correcting the identified issue (quota, policy, networking, PDB, subnet, or capacity), re-run the node image upgrade for the affected node pools.
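The lookup in steps 1 and 2 can be sketched in Python. This is only an illustration, not an official tool: it assumes the failed activity log event has been exported as JSON and that the error details sit in `properties.statusMessage` as a JSON string with an `error` object, which may not match every event. The function names and the shortened remediation hints are hypothetical; the error codes are the ones listed above.

```python
import json
from typing import Tuple

# Short hints paraphrased from the list above, keyed by error code.
REMEDIATIONS = {
    "RequestDisallowedByPolicy": "Review the policy assignment blocking the agent pool update.",
    "QuotaExceeded": "Free unused resources or request a quota increase.",
    "PublicIPCountLimitReached": "Free unused public IPs or request a higher limit.",
    "OverconstrainedAllocationRequest": "Use a VM size or zone that has capacity.",
    "OutboundConnFailVMExtensionError": "Check NSG and firewall rules for required outbound traffic.",
    "SubnetIsFull": "Free addresses in the subnet or use a larger subnet.",
}


def extract_error(event: dict) -> Tuple[str, str]:
    """Return (code, message) from an exported activity log event.

    Assumption: properties.statusMessage holds a JSON string such as
    {"status": "Failed", "error": {"code": "...", "message": "..."}}.
    """
    properties = event.get("properties") or {}
    raw = properties.get("statusMessage", "")
    if isinstance(raw, str):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            # Not JSON (or empty): keep the raw text as the message.
            return ("Unknown", raw)
    else:
        payload = raw
    if not isinstance(payload, dict):
        return ("Unknown", str(raw))
    error = payload.get("error")
    if not isinstance(error, dict):
        # Some payloads may carry code/message at the top level.
        error = payload
    return (str(error.get("code", "Unknown")), str(error.get("message", "")))


def suggest_remediation(code: str) -> str:
    """Map an error code to a hint, falling back to the diagnostics blade."""
    return REMEDIATIONS.get(code, "No match; check Diagnose and solve problems.")


if __name__ == "__main__":
    # Hypothetical sample event, for illustration only.
    sample = {"properties": {"statusMessage": json.dumps(
        {"status": "Failed", "error": {"code": "SubnetIsFull", "message": "No free IPs"}}
    )}}
    code, message = extract_error(sample)
    print(code, "->", suggest_remediation(code))
```

If the event carries no parsable error object, the sketch returns the code `Unknown`, which corresponds to the “Unknown error” case in the question.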

    Because the screenshot shows only “Unknown error”, the concrete root cause has to come from the detailed activity log JSON or the Diagnose and solve problems output. Use those tools to determine which of the failure patterns above applies to the 24 March failures.
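Since the upgrades failed on every cluster on the same day, it may help to compare the error codes across clusters once they have been collected. The following is a minimal Python sketch of that comparison; the function name and cluster names are hypothetical, and the input is assumed to be (cluster, error code) pairs gathered by hand from each cluster's activity log.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_error_code(failures: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group cluster names by error code, most widespread code first."""
    grouped = defaultdict(list)
    for cluster, code in failures:
        grouped[code].append(cluster)
    # Sort by number of affected clusters (descending), then by code name.
    ordered = sorted(grouped.items(), key=lambda item: (-len(item[1]), item[0]))
    return {code: sorted(clusters) for code, clusters in ordered}


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    observed = [
        ("cluster-a", "RequestDisallowedByPolicy"),
        ("cluster-b", "RequestDisallowedByPolicy"),
        ("cluster-c", "SubnetIsFull"),
    ]
    for code, clusters in group_by_error_code(observed).items():
        print(f"{code}: {', '.join(clusters)}")
```

A single code shared by all clusters would suggest a subscription-wide condition such as a policy or quota, while differing codes would suggest per-cluster issues; either reading still needs to be confirmed against the detailed logs.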

