How to get the GPU metrics for Virtual Machine

Question

RajKumar Kannan 120

Hello Team,

I am trying to collect GPU utilization metrics for an Azure Virtual Machine. I understand that GPU metrics are not available as platform metrics and may require Azure Monitor Agent and Log Analytics configuration.

Could you please clarify:

The correct approach to collect GPU utilization metrics for a GPU-enabled VM (NC/ND/NV series).
Whether these metrics should appear in the InsightsMetrics table or any other table.
Any required configuration steps (Azure Monitor Agent, Data Collection Rules, GPU drivers, etc.).
The exact metric names or performance counters for GPU utilization and memory.

Currently, I am able to retrieve CPU, disk, and other metrics, but I am not seeing any GPU-related metrics.

Please provide official guidance or documentation to enable GPU monitoring.

Thank you.

Suchitra Suregaunkar 11,470 Reputation points Microsoft External Staff Moderator

2026-04-07T14:10:05.9466667+00:00
Hello @RajKumar Kannan, thank you for posting your query on the Microsoft Q&A platform.

Thanks for your question. This is a valid concern and the behavior you are seeing is expected with Azure GPU‑enabled virtual machines.

Azure does not expose GPU utilization or GPU memory as platform (host) metrics for virtual machines (including NC, ND, and NV series). Only CPU, disk, network, and similar host‑level metrics are available by default.

GPU metrics must be collected from inside the guest OS using guest‑based monitoring.

This is by design in Azure Monitor.

Here is why GPU metrics don’t appear by default:

Azure Monitor separates VM monitoring into:

Platform metrics – collected by Azure automatically (CPU, disk, network)

Guest metrics – collected from inside the VM using Azure Monitor Agent (AMA)

GPU telemetry is only available inside the guest OS through NVIDIA drivers and tooling. Azure Monitor will not collect it unless you explicitly configure guest‑level data collection.

Reference: https://dotnet.territoriali.olinfo.it/en-us/azure/azure-monitor/vm/data-collection-performance

The correct and supported way to collect GPU metrics is as follows:

Prerequisites (mandatory):

GPU VM (NC / ND / NV series)

NVIDIA GPU drivers installed (Microsoft recommends using GPU‑optimized marketplace images)

For Linux GPU VMs, Microsoft and NVIDIA support collecting GPU metrics using:

NVIDIA DCGM (Data Center GPU Manager) to expose GPU metrics

Azure Monitor Agent (AMA) or a Prometheus‑compatible collector to ingest them

DCGM exposes GPU utilization, memory usage, and other metrics from inside the VM.

Reference describing this architecture and Azure ingestion (IBM documentation):

https://www.ibm.com/docs/en/tarm/8.15.x?topic=resources-azure-vm-gpu-metrics-collection

For Windows GPU VMs:

Azure Monitor Agent must be installed

Custom Data Collection Rules (DCRs) are required

GPU metrics are collected only if NVIDIA drivers expose counters

Alternatively, tools like nvidia-smi combined with a collector can be used

Azure Monitor does not auto‑discover GPU counters on Windows.

Microsoft DCR documentation:

https://dotnet.territoriali.olinfo.it/azure/azure-monitor/vm/data-collection-performance
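
As a rough illustration of the custom DCR mentioned above, the following Python sketch builds the properties section of a Data Collection Rule that sends performance counters to the Perf table. The counter paths (GPU Engine, GPU Adapter Memory), the names gpuPerfCounters and gpuWorkspace, and the placeholder workspace ID are assumptions for illustration only; whether those counters exist on a given VM depends on what the installed driver registers with Windows Performance Monitor.

```python
import json


def build_gpu_perf_dcr(workspace_resource_id: str, sampling_seconds: int = 60) -> dict:
    """Return a DCR 'properties' fragment that collects GPU performance counters."""
    # ASSUMPTION: these counter paths are only present if the GPU driver
    # exposes them to Windows Performance Monitor. Check on the VM first
    # (for example in Performance Monitor) before relying on them.
    counters = [
        "\\GPU Engine(*)\\Utilization Percentage",
        "\\GPU Adapter Memory(*)\\Dedicated Usage",
    ]
    return {
        "dataSources": {
            "performanceCounters": [
                {
                    "name": "gpuPerfCounters",  # hypothetical data source name
                    "streams": ["Microsoft-Perf"],  # routes samples to the Perf table
                    "samplingFrequencyInSeconds": sampling_seconds,
                    "counterSpecifiers": counters,
                }
            ]
        },
        "destinations": {
            "logAnalytics": [
                {
                    "name": "gpuWorkspace",  # hypothetical destination name
                    "workspaceResourceId": workspace_resource_id,
                }
            ]
        },
        "dataFlows": [
            {"streams": ["Microsoft-Perf"], "destinations": ["gpuWorkspace"]}
        ],
    }


if __name__ == "__main__":
    # Placeholder resource ID; substitute your own Log Analytics workspace.
    dcr = build_gpu_perf_dcr(
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
        "Microsoft.OperationalInsights/workspaces/<workspace>"
    )
    print(json.dumps(dcr, indent=2))
```

The printed JSON is only the properties portion; a complete rule also needs a location and must be associated with the VM before the agent starts collecting.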

Where GPU metrics appear depends entirely on how they are collected:

VM Insights with the default configuration: GPU metrics are not included and will not appear anywhere, because VM Insights only collects a predefined set of guest metrics.

Azure Monitor Agent (AMA) with custom performance counters or Data Collection Rules: GPU metrics, if exposed by the GPU driver, are ingested into the Perf table in Log Analytics.

NVIDIA DCGM or other custom GPU collectors: GPU data is written to the Perf table or to a custom Log Analytics table, depending on how ingestion is set up.

Telegraf sending GPU metrics to Azure Monitor Metrics: the metrics appear in Metrics Explorer under a custom namespace (for example, a Telegraf or NVIDIA-related namespace), rather than in VM Insights or platform metrics.

GPU metrics do not appear in InsightsMetrics by default.

VM Insights only includes a predefined set of guest metrics:

Reference: https://dotnet.territoriali.olinfo.it/azure/azure-monitor/vm/monitor-virtual-machine-data-collection
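
To check whether any GPU counters are actually reaching the Perf table, a query along these lines can be run in Log Analytics. The sketch below just assembles the KQL text in Python; the function name and defaults are hypothetical, and it assumes the GPU counters carry "GPU" somewhere in ObjectName or CounterName, which depends on the collector you configured.

```python
def build_gpu_perf_query(lookback: str = "1h", bin_size: str = "5m") -> str:
    """Build a KQL query that looks for GPU-related rows in the Perf table."""
    # ASSUMPTION: GPU counters contain "GPU" in ObjectName or CounterName.
    # Adjust the filter to match the counter names your collector emits.
    return (
        "Perf\n"
        f"| where TimeGenerated > ago({lookback})\n"
        '| where ObjectName contains "GPU" or CounterName contains "GPU"\n'
        "| summarize AvgValue = avg(CounterValue) "
        f"by Computer, ObjectName, CounterName, bin(TimeGenerated, {bin_size})\n"
        "| order by TimeGenerated desc"
    )


if __name__ == "__main__":
    # Paste the printed query into the Log Analytics query editor.
    print(build_gpu_perf_query())
```

An empty result usually means no GPU counters are being collected yet, which matches the behavior described above.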

Example of GPU metric names when using NVIDIA DCGM.

These are vendor‑defined metrics exposed inside the VM:

DCGM_FI_DEV_GPU_UTIL – GPU utilization (%)

DCGM_FI_DEV_FB_USED – GPU memory used (MiB)

DCGM_FI_PROF_PIPE_TENSOR_ACTIVE – Tensor core activity

Reference: https://www.ibm.com/docs/en/tarm/8.15.x?topic=resources-azure-vm-gpu-metrics-collection
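
For illustration, here is a minimal Python sketch that pulls the two headline metrics out of Prometheus-style text such as the DCGM exporter produces. The sample payload, the label values, and the function name are made up for the example; it assumes the exporter labels each sample with a gpu index, and a real deployment would read the text from the exporter's metrics endpoint rather than from a string.

```python
import re

# One Prometheus sample line: metric_name{label="value",...} number
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')

WANTED = frozenset({"DCGM_FI_DEV_GPU_UTIL", "DCGM_FI_DEV_FB_USED"})


def parse_dcgm_metrics(text: str, wanted=WANTED) -> dict:
    """Return {gpu_index: {metric_name: value}} for the wanted metric names."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = _SAMPLE.match(line)
        if not match or match.group("name") not in wanted:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        # ASSUMPTION: samples carry a "gpu" label holding the device index.
        gpu = labels.get("gpu", "unknown")
        result.setdefault(gpu, {})[match.group("name")] = float(match.group("value"))
    return result


if __name__ == "__main__":
    # Made-up sample payload for demonstration only.
    sample = (
        "# TYPE DCGM_FI_DEV_GPU_UTIL gauge\n"
        'DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-example"} 37\n'
        'DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-example"} 2048\n'
        'DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-example"} 55\n'
    )
    print(parse_dcgm_metrics(sample))
```

Metrics not listed in WANTED (the temperature line in the sample) are ignored, so the same function can be narrowed or widened by passing a different set of names.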

Seeing CPU and disk metrics but no GPU metrics is expected when:

Only platform metrics are enabled

VM Insights is enabled without GPU‑specific configuration

Azure Monitor Agent is installed but no GPU exporter, counters, or scripts are configured

Azure intentionally requires explicit guest configuration for GPU telemetry.

So, GPU utilization and GPU memory metrics are not available as Azure platform metrics for NC/ND/NV virtual machines. To collect GPU metrics, guest‑based monitoring must be configured using NVIDIA GPU drivers and tools such as DCGM or equivalent, together with Azure Monitor Agent and Data Collection Rules. These metrics are collected as guest performance data and do not appear in VM Insights or InsightsMetrics by default.

Thanks,

Suchitra.
Suchitra Suregaunkar 11,470 Reputation points Microsoft External Staff Moderator

2026-04-07T14:10:44.18+00:00

Hello @RajKumar Kannan

Kindly let us know if the solution provided worked for you.

If you need any further assistance, please feel free to reach out.

If you found the comment helpful, please consider clicking "Upvote it".

Thanks,

Suchitra.

Answer 1

Yep. To enable GPU monitoring for Azure VMs (NC, ND, NV series), you need to implement a guest-based collection strategy, because GPU metrics are not available as standard host-level platform metrics.

Primary documentation/approach

The current official guidance for monitoring NVIDIA GPUs on Azure involves using the Azure Monitor Agent (AMA) and NVIDIA DCGM Exporter.

Linux VMs: The standard recommended path is to use the NVIDIA DCGM Exporter to expose metrics, which can then be scraped by the Azure Monitor Agent. Alternatively, Microsoft provides a comprehensive guide on using Telegraf with the Azure Monitor output plugin.
Windows VMs: You must configure Data Collection Rules (DCRs) to ingest specific GPU performance counters if the drivers expose them to the Windows Performance Monitor.

Log Analytics tables

Depending on your collection method, metrics will populate different tables:

Perf Table: Standard performance counters (like CPU and Memory) and custom counters collected via DCRs appear here.
InsightsMetrics Table: VM Insights uses this table for its standard metrics; custom GPU metrics usually land in a separate namespace instead (e.g., Telegraf/nvidia-smi).
Azure Monitor Metrics: Metrics sent via the Telegraf plugin can be viewed in the Metrics Explorer under the telegraf/nvidia-smi namespace.

Implementation steps

Driver Installation: Ensure the latest NVIDIA GPU Drivers are installed. Using the NVIDIA GPU-Optimized VMI is often the easiest starting point.
Enable VM Insights: This installs the Azure Monitor Agent and creates a default Data Collection Rule.
Deploy DCGM Exporter (Linux): Run the exporter as a service to translate GPU telemetry into a format the agent can read.
Configure Custom DCR: For specific counters not in the default set, create a new Data Collection Rule to capture additional performance metrics.

Key metric names

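Using the NVIDIA DCGM/Telegraf method, you can track metrics such as the following. The names below are descriptive; the exact metric names depend on the collector (for example, DCGM reports utilization as DCGM_FI_DEV_GPU_UTIL):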

gpu_utilization: Percentage of time the kernels were active.
gpu_memory_used: Current framebuffer memory in use.
gpu_temperature: Current core temperature.
gpu_power_usage: Real-time power draw in Watts.
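
To see what these four values look like at the source, here is a small Python sketch that reads them from nvidia-smi and parses the CSV output. The function names are hypothetical, and the sketch assumes nvidia-smi is on the PATH inside the VM. It is only meant to show the raw data a collector works from, not to replace Telegraf or DCGM.

```python
import subprocess

# nvidia-smi query fields: utilization (%), memory used (MiB),
# temperature (C), power draw (W).
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "temperature.gpu", "power.draw"]


def parse_nvidia_smi_csv(output: str) -> list:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in output.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip malformed lines
        row = {"index": parts[0]}
        for field, raw in zip(QUERY_FIELDS[1:], parts[1:]):
            try:
                row[field] = float(raw)
            except ValueError:
                row[field] = None  # non-numeric values such as "[N/A]"
        rows.append(row)
    return rows


def read_gpu_metrics() -> list:
    """Run nvidia-smi once and return the parsed metrics."""
    # ASSUMPTION: the NVIDIA driver is installed and nvidia-smi is on PATH.
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_nvidia_smi_csv(out)


if __name__ == "__main__":
    for gpu in read_gpu_metrics():
        print(gpu)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on a machine without a GPU.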

More at https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/comprehensive-nvidia-gpu-monitoring-for-azure-n-series-vms-using-telegraf-with-a/4257402

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin
