network-plugin & settings for large AKS cluster?

Zach Howell 115 Reputation points
2026-04-01T17:53:26.34+00:00

I'm trying to create a large AKS cluster with 1k nodes. I'm running this command:

az aks create --name myname --location mylocation --enable-managed-identity --nodepool-name default --resource-group myresourcegroup --node-vm-size Standard_B2s --node-count=1 --pod-cidr 100.64.0.0/10 --tier standard --outbound-type managedNATGateway --nat-gateway-managed-outbound-ip-count 4 --nat-gateway-idle-timeout 10 --enable-cluster-autoscaler --min-count=1 --max-count=1000

With a lot of effort put into the pod-cidr bits. This actually works on one machine but fails on another, and I believe the difference is that the second machine uses aks-preview to get access to some other features. On that second machine, I get the following error:
Please explicitly specify the network plugin type

I set it to azure & get this error:
Please specify network plugin mode `overlay` when using pod_cidr or use network plugin `kubenet`. For more information about Azure CNI Overlay please see https://aka.ms/aksoverlay

I try out overlay:
az aks create: 'overlay' is not a valid value for '--network-plugin'. Allowed values: kubenet, azure, none.

Fine, only one more option then - I try out kubenet & get this error:
Cluster's upper limit node count is 1000 which exceeds the limit for Kubenet 400. If Autoscaler is enabled, the max-count from each nodepool is counted towards this total. https://aka.ms/aks/kubenet-networking-overview
What the heck? I've tried out all the allowed values & none of them work. What value should I be setting here?
Additional info:
az --version gives 2.81 on my first machine & 2.84 on my second failing preview machine.

Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

Answer accepted by question author
  1. Jilakara Hemalatha 11,520 Reputation points Microsoft External Staff Moderator
    2026-04-01T19:19:04.55+00:00

    Hello Zach,

    Thank you for providing the details about the AKS cluster creation issue.

    The behavior you’re seeing is due to stricter validation in newer Azure CLI versions (for example, az 2.84 with the aks-preview extension). When you specify a custom --pod-cidr such as 100.64.0.0/10, the CLI now requires you to explicitly define both the network plugin and the plugin mode.

    In your case, using --network-plugin kubenet will not work because kubenet has a scaling limit of 400 nodes when cluster autoscaler is enabled. Since your configuration allows scaling up to 1000 nodes, this results in the error you observed.

    To support this scale, you should use Azure CNI with overlay mode. This setup allows pod IPs to be allocated from the specified pod CIDR, independent of the VNet subnet, which enables larger cluster sizes without hitting kubenet limitations.

    To resolve the issue, please update your command to include the following parameters:

    az aks create \
      --name myname \
      --location mylocation \
      --resource-group myresourcegroup \
      --nodepool-name default \
      --node-vm-size Standard_B2s \
      --node-count 1 \
      --enable-managed-identity \
      --tier standard \
      --outbound-type managedNATGateway \
      --nat-gateway-managed-outbound-ip-count 4 \
      --nat-gateway-idle-timeout 10 \
      --enable-cluster-autoscaler \
      --min-count 1 \
      --max-count 1000 \
      --network-plugin azure \
      --network-plugin-mode overlay \
      --pod-cidr 100.64.0.0/10
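    Once creation succeeds, the effective network settings can be verified. A sketch using the placeholder names (myname, myresourcegroup) from the question:

    ```shell
    # Show the plugin, mode, and pod CIDR the cluster ended up with:
    az aks show --name myname --resource-group myresourcegroup \
      --query "{plugin: networkProfile.networkPlugin, mode: networkProfile.networkPluginMode, podCidr: networkProfile.podCidr}" \
      --output table
    ```

    The mode column should read overlay if the cluster was created with Azure CNI Overlay.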
    

    Also, please make sure that:

    • Your Azure CLI version is 2.48 or later (required for --network-plugin-mode)
    • The pod CIDR range does not overlap with any existing VNet or subnet
    • The aks-preview extension is up to date, if you are using preview features
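    The first and third checks can be run from a terminal. A sketch, assuming the Azure CLI is installed:

    ```shell
    # Confirm the CLI version (2.48 or later is recommended above):
    az --version

    # If the aks-preview extension is installed, bring it up to date:
    az extension update --name aks-preview
    ```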

    Hope this helps! Please let me know if you have any queries.

    1 person found this answer helpful.

1 additional answer

  1. Q&A Assist
    2026-04-01T17:53:50.5866667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The behavior comes from new AKS networking rules introduced in newer API/CLI versions and the limits of each network plugin.

    Key points from the error messages and documented behavior:

    1. --pod-cidr usage
      • podCidr is only valid when using kubenet or when using BYO CNI / overlay scenarios where the control plane still needs a pod CIDR.
      • In the ARM API, podCidr is explicitly described as: “A CIDR notation IP range from which to assign pod IPs when kubenet is used.”
      • When using Azure CNI (standard --network-plugin azure), --pod-cidr is not used; pods get IPs from the subnet instead.
    2. Kubenet node limit
      • With kubenet, the documented behavior is that the pod CIDR is split into /24 blocks per node, and the address space must be large enough for the expected node count.
      • There is also a hard scale limit: kubenet is supported only up to 400 nodes. The error seen:

        “Cluster's upper limit node count is 1000 which exceeds the limit for Kubenet 400.”

      • Because --max-count=1000, kubenet cannot be used for this cluster.
    3. Azure CNI and large clusters
      • For clusters beyond 400 nodes, Azure CNI is required. The troubleshooting guidance for large clusters explicitly states:

        “To scale beyond 400 nodes, you have to use the Azure CNI networking plug-in.”

      • With Azure CNI, IPs come from the subnet, not from --pod-cidr. Planning must ensure the subnet has enough IPs for nodes * maxPodsPerNode, and if insufficientSubnetSize occurs, the fix is to move to a larger subnet.
    4. Overlay / network-plugin-mode
      • Newer API versions introduce networkPluginMode with value overlay for Azure CNI overlay, but in the ARM schema this is a separate property:
        • networkPlugin: azure | kubenet | none
        • networkPluginMode: overlay
      • The CLI version on the “preview” machine is enforcing the new rule: if --pod-cidr is specified with Azure CNI, it expects overlay mode (network-plugin azure + network-plugin-mode overlay) or kubenet. However, the installed CLI version does not yet expose --network-plugin-mode overlay as a valid parameter, hence the confusing error.
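    The two numeric limits above (kubenet's /24-per-node allocation and flat Azure CNI subnet sizing) can be sanity-checked with simple shell arithmetic. A sketch; the value 30 for max pods per node is an assumption matching the Azure CNI default:

    ```shell
    # Kubenet: each node consumes a /24 from the pod CIDR, so a /10 yields
    # 2^(24-10) blocks -- address space is not the constraint here, the
    # 400-node cap is.
    kubenet_blocks=$(( 1 << (24 - 10) ))
    echo "kubenet /24 blocks in a /10: $kubenet_blocks"

    # Flat Azure CNI: the subnet must cover roughly nodes * maxPodsPerNode
    # pod IPs, plus one IP per node.
    nodes=1000
    max_pods=30
    required_ips=$(( nodes * max_pods + nodes ))
    echo "flat Azure CNI IPs needed for $nodes nodes: $required_ips"
    ```

    For 1,000 nodes this comes to roughly 31,000 IPs, which is why flat Azure CNI at that scale needs a very large subnet, and why overlay mode (which decouples pod IPs from the subnet) is attractive.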

    Given these constraints, for a 1,000-node cluster:

    • kubenet cannot be used because of the 400-node limit.
    • --pod-cidr should not be used with standard Azure CNI.
    • The correct supported configuration for 1,000 nodes today is:
      • Use --network-plugin azure (Azure CNI).
      • Remove --pod-cidr from the command.
      • Ensure the chosen subnet is large enough to avoid insufficientSubnetSize (plan IPs as nodes * maxPodsPerNode). If IP exhaustion occurs, the documented mitigation is to create a larger, non-overlapping subnet, move node pools, and delete the old subnet.
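    The "larger subnet" mitigation above can be sketched as follows. All names, ranges, and IDs here are hypothetical placeholders, not values from the question:

    ```shell
    # Create a larger, non-overlapping subnet in the existing VNet:
    az network vnet subnet create \
      --resource-group myresourcegroup \
      --vnet-name myvnet \
      --name aks-large-subnet \
      --address-prefixes 10.10.0.0/17

    # Add a node pool that uses the new subnet; workloads can then be
    # migrated off the old pool before it and the old subnet are deleted:
    az aks nodepool add \
      --resource-group myresourcegroup \
      --cluster-name myname \
      --name largepool \
      --node-count 3 \
      --vnet-subnet-id "/subscriptions/<subscription-id>/resourceGroups/myresourcegroup/providers/Microsoft.Network/virtualNetworks/myvnet/subnets/aks-large-subnet"
    ```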

    The difference between the two machines is that the newer CLI / API combination is enforcing these newer rules more strictly, so the command that “worked” on the older CLI is not a valid or supported combination going forward.

    A minimal pattern for a 1,000-node cluster would therefore be:

    • Use Azure CNI (--network-plugin azure).
    • Do not specify --pod-cidr.
    • Keep --outbound-type managedNATGateway for large-scale egress (as recommended for large clusters).

    If overlay is desired, wait for a CLI version that supports --network-plugin-mode overlay and then follow the Azure CNI overlay guidance, but still do not use kubenet for 1,000 nodes.


