
Azure Kubernetes Fleet Manager: L4 Load Balancing (MCS) Traffic Failing for Specific Member Cluster

김선교 0 Reputation points
2026-04-09T03:12:10.7266667+00:00

I'm currently testing multi-cluster L4 load balancing with Azure Kubernetes Fleet Manager. The setup is partially working, but I'm encountering a connectivity issue with one of the member clusters.

1. Environment Setup

  • Fleet Manager: hub cluster enabled.

  • Member clusters:
    • aks-member-1: private AKS (same VNet, Azure CNI) - working
    • aks-member-2: public AKS (same VNet, Azure CNI) - failing

  • Region: both clusters are in the same region.

  • Configuration: ServiceExport and ClusterResourcePlacement (CRP) are applied and verified.
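
For reference, the applied objects look roughly like the following (a hedged sketch; names, the namespace, and API versions are placeholders and may differ by Fleet release):

```yaml
# On the hub cluster: place the namespace (and its resources) onto all members.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: fleet-demo-crp
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: fleet-demo
  policy:
    placementType: PickAll
---
# On each member cluster: export the Service for multi-cluster consumption.
apiVersion: networking.fleet.azure.com/v1alpha1
kind: ServiceExport
metadata:
  name: demo-app
  namespace: fleet-demo
---
# On one member cluster: expose the imported service behind an Azure Load Balancer.
apiVersion: networking.fleet.azure.com/v1alpha1
kind: MultiClusterService
metadata:
  name: demo-app
  namespace: fleet-demo
spec:
  serviceImport:
    name: demo-app
```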

2. The Issue

I’ve successfully deployed the workload and created a MultiClusterService (MCS), which received a valid EXTERNAL-IP on the member cluster.

Result: When I curl the EXTERNAL-IP, I get responses from Pods in aks-member-1.

Problem: Connections to Pods in aks-member-2 consistently fail with "Failed to connect" or a timeout.

3. Current Status & Observations

Both ServiceExport resources show IS-VALID: True and IS-CONFLICTED: False.

The ProvisioningState for both members in the Fleet is Succeeded.

Internal Pod IPs are reachable within the VNet, but the MCS Load Balancer doesn't seem to route traffic correctly to the aks-member-2 endpoints.

4. Questions

Sample Image Issue: The official guide () uses gcr.io/kuar-demo/kuard-amd64:blue, which seems to be outdated or unavailable. Are there any recommended MCR-based alternative images for testing Fleet MCS?

Network Requirements: When mixing Private and Public AKS as Fleet members in the same VNet, are there specific NSG (Network Security Group) rules or route table configurations required for the Fleet Load Balancer to reach the Public AKS pods?

Troubleshooting: Is there a way to verify if the Hub's ServiceImport has successfully aggregated all endpoints from both member clusters?

Any insights or similar experiences would be greatly appreciated.
Azure Kubernetes Service

1 answer

  1. Q&A Assist
    2026-04-09T03:12:43.53+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    Multi-cluster L4 load balancing with Fleet Manager relies on a few specific primitives and network assumptions that can be used to troubleshoot this scenario.

    1. Sample image

    The documentation’s walkthrough uses a simple HTTP workload (kuard) only as an example to demonstrate that traffic is distributed across pods in multiple member clusters. Any HTTP container image that exposes a TCP port can be used instead, as long as:
    • The same Service name, namespace, and port are used on all member clusters.
    • The ServiceExport and MultiClusterService objects reference that Service.

    The docs don’t prescribe a Microsoft Container Registry (MCR) image for this scenario; they only require that the workload be reachable on the configured port so that the curl <EXTERNAL-IP>:<port> test described in the guide works.
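
    That said, an MCR-hosted HTTP image commonly used elsewhere in the AKS docs can stand in for kuard. The manifest below is a hedged sketch (name and namespace are placeholders; apply the same manifest on every member cluster so the Service identity matches):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: fleet-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          # MCR-hosted sample HTTP app (serves on port 80).
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: fleet-demo
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
```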

    2. Network requirements (private + public AKS in the same VNet)

    From the conceptual and how-to documentation:
    • All member clusters that participate in L4 multi-cluster load balancing must:
      • Use Azure CNI networking so that pod IPs are directly routable on the VNet.
      • Be on the same virtual network or on peered VNets.
    • Fleet’s L4 capability then configures each member cluster’s Azure Load Balancer to route traffic not only to local endpoints but also to endpoints of the same Service in other member clusters.

    The documentation does not add extra NSG or route-table requirements beyond those implied by “same VNet / peered VNets with Azure CNI.” In particular:

    • Pod IPs must be reachable over the VNet between clusters.
    • Azure Load Balancer must be able to reach those pod IPs.

    When mixing private and public AKS clusters, this means:

    • Ensure that any NSGs applied to the subnets or NICs of the public cluster do not block inbound traffic from the private cluster’s subnet(s) to the pod CIDR ranges used by that public cluster.
    • Ensure that any custom route tables do not blackhole traffic between the clusters’ subnets.

    The Fleet documentation does not define additional, Fleet-specific NSG or UDR rules beyond these general Azure CNI/VNet requirements.
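
    As a concrete check (a hedged sketch; resource group, NSG name, and CIDR are placeholders for your environment), you can inspect the NSG on the public cluster's node subnet and, if a rule blocks cross-cluster traffic, allow the peer cluster's pod/node CIDR:

```shell
# List NSG rules attached to the public cluster's node subnet.
az network nsg rule list \
  --resource-group MC_myRG_aks-member-2_myregion \
  --nsg-name aks-agentpool-nsg \
  --output table

# If a rule blocks cross-cluster traffic, allow the peer cluster's CIDR.
az network nsg rule create \
  --resource-group MC_myRG_aks-member-2_myregion \
  --nsg-name aks-agentpool-nsg \
  --name AllowFleetPeerPods \
  --priority 300 \
  --direction Inbound \
  --access Allow \
  --protocol '*' \
  --source-address-prefixes 10.224.0.0/16 \
  --destination-address-prefixes '*' \
  --destination-port-ranges '*'
```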

    3. Verifying that ServiceImport has aggregated endpoints from both clusters

    The L4 flow is:
    • ServiceExport is created on each member cluster.
    • Fleet creates a corresponding ServiceImport on the hub cluster and other member clusters to build awareness of the service.
    • A MultiClusterService is created to configure the Azure Load Balancer in each member cluster to distribute traffic across endpoints in multiple clusters.

    The how-to guide shows a basic validation step:

    • On a member cluster, run kubectl get multiclusterservice <name> -n <namespace> and confirm IS-VALID is true and that an EXTERNAL-IP is assigned.
    • Then repeatedly curl <EXTERNAL-IP>:<port> and observe that the pod IPs serving the request change and correspond to pods in multiple member clusters.

    In addition, the conceptual documentation explains that ServiceExport creation “results in a ServiceImport being created on the fleet cluster, and all other member clusters to build the awareness of the service.” This means that, for troubleshooting aggregation:

    • Confirm that a ServiceImport object exists for the service on the hub cluster.
    • Confirm that ServiceImport objects exist on the member clusters that should be aware of the service.

    The documentation does not provide a specific field-level example of the ServiceImport status or an explicit command to list per-cluster endpoint membership, but the presence of ServiceImport on hub and members, combined with IS-VALID: True on the MultiClusterService and successful cross-cluster pod IP rotation when curling the external IP, is the documented way to validate that endpoints are being aggregated and used.
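
    Concretely, that validation can be scripted as follows (a hedged sketch; kubeconfig context names, the namespace, and the external IP are placeholders):

```shell
# On the hub cluster: confirm a ServiceImport exists for the exported Service.
kubectl --context hub get serviceimport -n fleet-demo

# On each member: confirm the ServiceImport and the MultiClusterService status
# (IS-VALID should be true and an EXTERNAL-IP assigned).
kubectl --context aks-member-1 get serviceimport,multiclusterservice -n fleet-demo
kubectl --context aks-member-2 get serviceimport,multiclusterservice -n fleet-demo

# Repeatedly curl the external IP; the responding pod IPs should rotate
# across pods from both member clusters if aggregation is working.
for i in $(seq 1 10); do curl -s --max-time 3 http://<EXTERNAL-IP>:80/; echo; done
```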


