Azure Cosmos DB for MongoDB vCore cluster stuck in "Updating" state for 11 days after scaling from M10 to M20

Alice da Lufia 0 Reputation points
2026-03-17T18:32:34.85+00:00

Resource type: Microsoft.DocumentDB/mongoClusters (Cosmos DB for MongoDB vCore)

Region: Brazil South

Issue:

On March 6, 2026, I initiated a scale-up operation from M10 to M20 compute tier via the Azure Portal. The operation was accepted, and the compute tier configuration now shows M20. However, both the clusterStatus and provisioningState have been stuck on Updating for 11 days and show no signs of progressing.

Current behavior:

properties.clusterStatus: Updating

properties.provisioningState: Updating

properties.compute.tier: M20

systemData.lastModifiedAt: 2026-03-06T00:49:20Z

• The cluster appears to be running and accepting connections normally.

• I am unable to perform any further scaling operations (up or down) or configuration changes because Azure blocks modifications while the resource is in Updating state.

• There are no entries in the Activity Log related to this resource for the affected time period.

• There are no stuck ARM deployments associated with this operation.
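
The Activity Log and deployment checks described in the bullets above can also be run from the CLI. The following is a minimal sketch, not a definitive diagnostic: the function names are hypothetical, and the resource ID, resource group, and time window are placeholders that the caller supplies.

```shell
# List Activity Log events recorded for one resource in a time window.
# $1 = full ARM resource ID of the cluster (placeholder)
# $2 = start time, $3 = end time (ISO 8601, e.g. 2026-03-05T00:00:00Z)
list_cluster_activity() {
  az monitor activity-log list \
    --resource-id "$1" \
    --start-time "$2" \
    --end-time "$3" \
    --query "[].{time:eventTimestamp, operation:operationName.value, status:status.value}" \
    --output table
}

# List resource-group deployments that are not in the Succeeded state.
# $1 = resource group name (placeholder)
list_unfinished_deployments() {
  az deployment group list \
    --resource-group "$1" \
    --query "[?properties.provisioningState!='Succeeded'].{name:name, state:properties.provisioningState, time:properties.timestamp}" \
    --output table
}
```

An empty table from both functions would match the situation described here: no logged operations and no pending or failed deployments.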

Steps taken:

  1. Verified the resource status via az resource show and az rest — both confirm the stuck state.
  2. Checked Activity Log — no operations logged for this resource since before the scaling attempt.
  3. Checked ARM deployments — no pending or failed deployments related to this scaling operation.
  4. Waited 11 days — the status has not changed.
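
For anyone reproducing step 1, the status check might look like the sketch below. The function names are hypothetical, the arguments are placeholders, and the api-version is the one quoted elsewhere in this thread, so adjust it if your subscription expects a different one.

```shell
# Print only the provisioningState of the cluster.
# $1 = resource group, $2 = cluster name (placeholders)
cluster_provisioning_state() {
  az resource show \
    --resource-group "$1" \
    --name "$2" \
    --resource-type "Microsoft.DocumentDB/mongoClusters" \
    --query "properties.provisioningState" \
    --output tsv
}

# Same check through the ARM REST endpoint, returning the three fields
# discussed in this question as JSON.
# $1 = subscription ID, $2 = resource group, $3 = cluster name (placeholders)
cluster_state_via_rest() {
  az rest \
    --method GET \
    --url "https://management.azure.com/subscriptions/$1/resourceGroups/$2/providers/Microsoft.DocumentDB/mongoClusters/$3?api-version=2025-09-01" \
    --query "{provisioningState:properties.provisioningState, clusterStatus:properties.clusterStatus, tier:properties.compute.tier}" \
    --output json
}
```

Both paths read the same resource, so they should agree; a mismatch between them would itself be worth reporting.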

Expected behavior:

The scaling operation should have completed (or failed with a clear error), and the cluster status should have returned to Ready with provisioningState set to Succeeded.

Questions:

  1. Is there a way to cancel or force-complete a stuck provisioning operation on a MongoDB vCore cluster without a support plan?
  2. Is this a known issue with MongoDB vCore scaling in the Brazil South region?
  3. Are there any CLI or REST API commands that can reset the provisioning state?

Environment: Azure CLI 2.x on Linux

Any help would be greatly appreciated. Since I don't have a paid support plan, I'm relying on community guidance to unblock this resource.

Azure Cosmos DB

An Azure NoSQL database service for app development.


2 answers

  1. Saraswathi Devadula 15,515 Reputation points Microsoft External Staff Moderator
    2026-03-18T14:03:24.21+00:00

    Hello **Alice da Lufia**,

    Unfortunately, there isn’t a public CLI/REST command you can run to “cancel” or “force-complete” a stuck scale operation, and there’s no known Brazil South–specific bug here. In most cases where a vCore cluster hangs in Updating, it’s because an underlying replica has run out of disk due to a runaway WAL or broken replication slot, which blocks the platform job. Here’s what you can try to unblock yourself:

    1. Check per-node disk usage:
      • In the Azure portal go to your Cosmos DB vCore cluster > Monitoring > Metrics > Storage used, then filter by individual replica.
      • If any replica is at or near 100% capacity, the cluster controller will keep retrying the update and never finish.
    2. Fix server-parameter misconfiguration:
      • In the portal, open Settings > Server parameters for your cluster or specific replica.
      • Set wal_keep_size to a modest value (for example, 1–5 GB).
      • If available, configure replication_slot_timeout to something reasonable.
      • Remove any unused replication slots.
    3. Free up disk space by deleting old WAL logs. Use the az cosmosdb CLI extension to drop excess logs so the node can recover:
    az cosmosdb mongodb vcore replica delete-log --name --resource-group --replica-name --log-type wal 
    

    Repeat for each overloaded replica until storage used drops back into healthy range.

    4. Monitor and wait. After you’ve reduced disk usage and corrected the parameters, the platform’s background update job should be able to finish. Give it a few minutes and then run:
    az resource show --ids /subscriptions//resourceGroups//providers/Microsoft.DocumentDB/mongoClusters/
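
The "monitor and wait" step can be scripted rather than rerun by hand. This is a rough sketch under stated assumptions: the function name, attempt count, and polling interval are all hypothetical choices, and it only reads the state; it cannot change it.

```shell
# Poll the cluster until provisioningState is no longer "Updating".
# $1 = resource group, $2 = cluster name (placeholders)
# $3 = maximum number of checks (default 30)
# $4 = seconds to sleep between checks (default 60)
# Returns 0 when the state left "Updating", 1 when checks ran out,
# 2 when the az call failed or returned nothing.
wait_for_terminal_state() {
  attempts="${3:-30}"
  interval="${4:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(az resource show \
      --resource-group "$1" \
      --name "$2" \
      --resource-type "Microsoft.DocumentDB/mongoClusters" \
      --query "properties.provisioningState" \
      --output tsv)" || return 2
    [ -n "$state" ] || return 2
    echo "provisioningState: $state"
    if [ "$state" != "Updating" ]; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another check will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "still Updating after $attempts checks" >&2
  return 1
}
```

For example, `wait_for_terminal_state my-rg my-cluster 10 30` would check up to ten times, thirty seconds apart, and stop as soon as the state changes.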

    If none of that helps, please share the cluster details in private message for further troubleshooting.
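
For the disk-usage check in step 1, the exact metric names exposed for a given cluster are not stated in this thread, so a cautious first step is to list whatever metric definitions Azure Monitor reports for the resource. The sketch below assumes hypothetical function names and placeholder arguments.

```shell
# Build the ARM resource ID of a MongoDB vCore cluster.
# $1 = subscription ID, $2 = resource group, $3 = cluster name (placeholders)
cluster_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DocumentDB/mongoClusters/%s\n' "$1" "$2" "$3"
}

# Show the metric definitions Azure Monitor exposes for that cluster,
# so a storage-related metric can be picked by its real name.
list_cluster_metric_definitions() {
  az monitor metrics list-definitions \
    --resource "$(cluster_resource_id "$1" "$2" "$3")" \
    --output table
}
```

Once a storage metric name is known from that listing, it can be charted in the portal as described in step 1.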

  2. Q&A Assist
    2026-03-17T18:33:01.2666667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more
    1. There’s no supported way in CLI or ARM/REST to cancel or force-complete a stuck provisioning operation for a vCore-based Azure Cosmos DB for MongoDB cluster. Provisioning and scaling state (provisioningState, clusterStatus) are controlled by the service; client-side tools (Azure portal, CLI, REST) can only request changes such as updating the compute tier, not override or reset the internal state machine.
      For vCore-based MongoDB clusters, scaling is done by changing the compute tier on the cluster’s physical shards. This is exposed as an update operation on the cluster resource (for example, changing properties.compute.tier from M10 to M20 via the portal, az resource update, or REST). Once the service has accepted the update, the operation must complete or fail on the service side; there’s no documented API to roll back or clear a stuck Updating state from the client side.
      The only supported remediation when a control-plane operation is stuck is to wait for the service to transition the resource to a terminal state (Succeeded/Failed) and then either retry the operation or delete/recreate the resource. If the resource never leaves Updating, this requires intervention from Azure support.
    2. The provided material doesn’t document any known, region-specific issue with vCore-based Azure Cosmos DB for MongoDB scaling in Brazil South. The general guidance for stuck cluster operations in Azure is:
      • Check the current cluster status with the appropriate show/get command (for example, az aks show for AKS) and inspect the provisioning state.
      • If the cluster is actively updating, wait until the operation finishes.
      • If the update failed, inspect activity logs and then retry the operation.
      • If an operation (such as deletion) is in progress, no other operation is allowed until it completes.
      This pattern is described for AKS clusters, but the same control-plane behavior applies broadly across Azure resource types: while a resource is in an active update or delete operation, other operations are blocked, and there is no generic “cancel provisioning” API exposed to customers. There is no documentation in the supplied context that calls out MongoDB vCore cluster scaling in Brazil South as having a known systemic issue.
    3. There are no documented CLI or REST API commands that can directly reset provisioningState or clusterStatus for a vCore-based Azure Cosmos DB for MongoDB cluster. The supported operations for scaling cluster compute are:
      • Azure portal: use the Scale blade to change the cluster tier and save.
      • Azure CLI: update the cluster resource and set properties.compute.tier to the desired MXXX value:
             az resource update \
               --resource-type "Microsoft.DocumentDB/mongoClusters" \
               --name "<cluster-name>" \
               --resource-group "<resource-group>" \
               --set properties.compute.tier="<compute-tier>"
        
      • REST/az rest: send a PATCH to the cluster resource with the new compute.tier:
             az rest \
               --method "PATCH" \
               --url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/mongoClusters/<cluster-name>?api-version=2025-09-01" \
               --body "{\"location\":\"<cluster-region>\",\"properties\":{\"compute\":{\"tier\":\"<compute-tier>\"}}}"
        
      These operations request a change in compute tier; they do not expose any parameter to override or clear the provisioning state. Similarly, for other Azure services (for example, AKS), the guidance is to:
      • Use show commands (such as az aks show) to inspect ProvisioningState.
      • Wait for active updates to complete.
      • If an operation is stuck due to a transient issue, retry the same operation once the service transitions to a terminal state.
      There is no documented, supported API to directly manipulate provisioningState for Cosmos DB MongoDB vCore clusters.

    Given the constraints in the documentation, the only supported paths are:

    • Continue to monitor the resource to see if it eventually transitions to Succeeded or Failed, at which point another scale operation or a delete/recreate can be attempted.
    • Engage Azure support (even via basic support channels available with the subscription) so the service team can inspect and, if necessary, repair the stuck control-plane state from the backend.

