
How to get the GPU metrics for Virtual Machine

RajKumar Kannan 120 Reputation points
2026-04-07T12:28:41.4533333+00:00

Hello Team,

I am trying to collect GPU utilization metrics for an Azure Virtual Machine. I understand that GPU metrics are not available as platform metrics and may require Azure Monitor Agent and Log Analytics configuration.

Could you please clarify:

  1. The correct approach to collect GPU utilization metrics for a GPU-enabled VM (NC/ND/NV series).
  2. Whether these metrics should appear in the InsightsMetrics table or any other table.
  3. Any required configuration steps (Azure Monitor Agent, Data Collection Rules, GPU drivers, etc.).
  4. The exact metric names or performance counters for GPU utilization and memory.

Currently, I am able to retrieve CPU, disk, and other metrics, but I am not seeing any GPU-related metrics.

Please provide official guidance or documentation to enable GPU monitoring.

Thank you.

Azure Monitor

An Azure service that is used to collect, analyze, and act on telemetry data from Azure and on-premises environments.


1 answer

  Marcin Policht 85,250 Reputation points MVP Volunteer Moderator
    2026-04-07T13:19:09.16+00:00

    Yes. GPU metrics for Azure VMs (NC, ND, and NV series) are not available as host-level platform metrics, so you need a guest-based collection strategy: software inside the VM reads the GPU counters and sends them to Azure Monitor.

    1. Primary documentation/approach

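    The current guidance for monitoring NVIDIA GPUs on Azure combines NVIDIA's in-guest tooling (NVIDIA DCGM Exporter or nvidia-smi) with an agent that forwards the readings to Azure Monitor, such as the Azure Monitor Agent (AMA) or Telegraf.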

    • Linux VMs: A common path is the NVIDIA DCGM Exporter, which exposes GPU metrics in Prometheus text format on an HTTP endpoint (port 9400 by default) for a Prometheus-compatible collector to scrape and forward. Alternatively, Microsoft provides a comprehensive guide (linked at the end of this answer) on using Telegraf's nvidia_smi input plugin with its Azure Monitor output plugin.
    • Windows VMs: Configure a Data Collection Rule (DCR) that collects the GPU performance counters (for example, \GPU Engine(*)\Utilization Percentage), provided the installed driver exposes them to Windows Performance Monitor.
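    For the Linux DCGM path, it can help to confirm the exporter is actually producing data before wiring up a collector. Below is a minimal Python sketch, assuming dcgm-exporter runs on the VM with its default port 9400; it fetches the Prometheus text page and keeps only the utilization, framebuffer-memory, temperature, and power fields per GPU. DCGM_URL, WANTED, parse_dcgm, and scrape are illustrative names, not part of any NVIDIA or Azure API.

```python
import re
from urllib.request import urlopen

# Assumption: dcgm-exporter runs on this VM with its default listen port (9400).
DCGM_URL = "http://localhost:9400/metrics"

# DCGM fields for utilization (%), framebuffer used (MiB),
# temperature (C) and power draw (W).
WANTED = {
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_FB_USED",
    "DCGM_FI_DEV_GPU_TEMP",
    "DCGM_FI_DEV_POWER_USAGE",
}

# A sample line looks like: DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-1234"} 37
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_dcgm(text):
    """Return {gpu_index: {metric_name: value}} for the WANTED metrics."""
    result = {}
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) not in WANTED:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        gpu = labels.get("gpu", "unknown")
        result.setdefault(gpu, {})[match.group(1)] = float(match.group(3))
    return result


def scrape(url=DCGM_URL, timeout=5):
    """Fetch the exporter's Prometheus text page and parse it."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_dcgm(resp.read().decode("utf-8"))
```

    On the VM, print(scrape()) should show one entry per GPU; an empty dict or a connection error points at the exporter rather than at Azure Monitor.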
    2. Log Analytics tables

    Depending on your collection method, GPU metrics land in different places:

    • Perf table: Standard performance counters (such as CPU and memory) and any custom counters collected through a DCR appear here, including Windows GPU counters.
    • InsightsMetrics table: VM Insights uses this table for its standard guest metrics (processor, memory, disk, network). GPU metrics are not part of that set, so GPU data will not appear here by default.
    • Azure Monitor Metrics (not a Log Analytics table): Metrics sent through Telegraf's Azure Monitor output plugin appear in Metrics Explorer under a custom namespace such as Telegraf/nvidia_smi.
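    To check whether Windows GPU counters are reaching the Perf table, a KQL query over the GPU counter objects is usually the quickest test. The Python sketch below builds that query and, optionally, runs it with the azure-monitor-query SDK; it assumes the counters were collected by a DCR under the object names "GPU Engine" and "GPU Adapter Memory". The function names are hypothetical, and the query string can equally be pasted into the Logs blade.

```python
from datetime import timedelta

# Windows GPU counter objects assumed to be collected into Perf by a DCR.
GPU_OBJECTS = ("GPU Engine", "GPU Adapter Memory")


def gpu_perf_query(hours=1):
    """Build a KQL query that returns raw GPU counter samples from Perf."""
    objects = ", ".join(f'"{name}"' for name in GPU_OBJECTS)
    return (
        "Perf\n"
        f"| where TimeGenerated > ago({int(hours)}h)\n"
        f"| where ObjectName in ({objects})\n"
        "| project TimeGenerated, Computer, ObjectName, CounterName, "
        "InstanceName, CounterValue\n"
        "| order by TimeGenerated desc"
    )


def run_query(workspace_id, hours=1):
    """Run the query; needs azure-monitor-query and azure-identity installed."""
    # Imported here so gpu_perf_query() works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient, LogsQueryStatus

    client = LogsQueryClient(DefaultAzureCredential())
    response = client.query_workspace(
        workspace_id, gpu_perf_query(hours), timespan=timedelta(hours=hours)
    )
    # A partial result exposes its tables as partial_data instead of tables.
    tables = (response.tables if response.status == LogsQueryStatus.SUCCESS
              else response.partial_data)
    return [list(row) for table in tables for row in table.rows]
```

    The query projects raw rows instead of averaging, because GPU Engine instances are per process and per engine, and a naive average across them would understate utilization.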
    3. Implementation steps
       1. Driver installation: Ensure current NVIDIA GPU drivers are installed; running nvidia-smi inside the VM should list every GPU. Using the NVIDIA GPU-Optimized VMI is often the easiest starting point.
       2. Enable VM Insights: This installs the Azure Monitor Agent and creates a default Data Collection Rule for the standard guest metrics.
       3. Deploy DCGM Exporter (Linux): Run the exporter as a service; it exposes GPU telemetry as Prometheus-format metrics on an HTTP endpoint that a collector can scrape.
       4. Configure a custom DCR: The default rule does not include GPU counters, so create an additional Data Collection Rule that captures them and associate it with the VM.
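    For the custom DCR step on Windows, the rule body is JSON with data sources, destinations, and data flows. The Python sketch below builds such a body for the GPU counters and writes it to a file; it assumes the Microsoft-Perf stream and a Log Analytics destination, and the rule-internal names gpuCounters and gpuWorkspace are hypothetical. Compare it against the current DCR schema documentation before deploying.

```python
import json

# Windows performance counters exposed by the GPU driver stack (WDDM).
GPU_COUNTERS = [
    "\\GPU Engine(*)\\Utilization Percentage",
    "\\GPU Adapter Memory(*)\\Dedicated Usage",
    "\\GPU Adapter Memory(*)\\Shared Usage",
]


def build_gpu_dcr(location, workspace_resource_id, sampling_seconds=60):
    """Return a DCR body that routes GPU counters to the Perf table."""
    return {
        "location": location,
        "kind": "Windows",
        "properties": {
            "dataSources": {
                "performanceCounters": [
                    {
                        "name": "gpuCounters",  # hypothetical name
                        "streams": ["Microsoft-Perf"],
                        "samplingFrequencyInSeconds": sampling_seconds,
                        "counterSpecifiers": GPU_COUNTERS,
                    }
                ]
            },
            "destinations": {
                "logAnalytics": [
                    {
                        "name": "gpuWorkspace",  # hypothetical name
                        "workspaceResourceId": workspace_resource_id,
                    }
                ]
            },
            # Send the Perf stream to the workspace declared above.
            "dataFlows": [
                {"streams": ["Microsoft-Perf"], "destinations": ["gpuWorkspace"]}
            ],
        },
    }


def write_dcr_file(path, location, workspace_resource_id):
    """Serialize the DCR body to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_gpu_dcr(location, workspace_resource_id), f, indent=2)
```

    Whichever way the rule is created (portal, template, or az monitor data-collection rule create), it still has to be associated with the VM; an unassociated rule collects nothing.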
    4. Key metric names

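    The exact names depend on the collector (DCGM Exporter / Telegraf nvidia_smi / Windows counter):

    • GPU utilization, the percentage of time one or more kernels were executing: DCGM_FI_DEV_GPU_UTIL / utilization_gpu / \GPU Engine(*)\Utilization Percentage.
    • GPU memory used, the framebuffer memory currently in use: DCGM_FI_DEV_FB_USED (MiB) / memory_used (MiB) / \GPU Adapter Memory(*)\Dedicated Usage (bytes).
    • GPU temperature, the current core temperature in °C: DCGM_FI_DEV_GPU_TEMP / temperature_gpu.
    • Power draw, the real-time power draw in watts: DCGM_FI_DEV_POWER_USAGE / power_draw.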
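    The same four readings can be checked locally with nvidia-smi's CSV query mode, which is a quick way to confirm the driver reports them before troubleshooting the collection pipeline. A minimal Python sketch, assuming nvidia-smi is on PATH; FIELDS, parse_smi_csv, and query_gpus are illustrative names.

```python
import csv
import io
import subprocess

# Field names accepted by nvidia-smi --query-gpu.
FIELDS = ["index", "utilization.gpu", "memory.used", "temperature.gpu", "power.draw"]


def _to_number(value):
    """Convert a CSV cell to float; nvidia-smi prints e.g. "[N/A]" when unsupported."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_smi_csv(text):
    """Parse --format=csv,noheader,nounits output into one dict per GPU."""
    gpus = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        cells = [cell.strip() for cell in record]
        gpu = {name: _to_number(cell) for name, cell in zip(FIELDS, cells)}
        gpu["index"] = int(cells[0])
        gpus.append(gpu)
    return gpus


def query_gpus():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_smi_csv(result.stdout)
```

    Unsupported fields come back as None instead of raising, since some GPU models report power draw as "[N/A]".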

    More at https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/comprehensive-nvidia-gpu-monitoring-for-azure-n-series-vms-using-telegraf-with-a/4257402


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin
