Windows 365 for Agents MCP server reference (preview)

Important

  • This is a preview feature.
  • Preview features aren't meant for production use and might have restricted functionality. These features are subject to supplemental terms of use, and are available before an official release so that customers can get early access and provide feedback.

Windows 365 for Agents is an MCP server for full operational control of a Windows 365 cloud PC. Use this MCP server to drive a real Windows environment through desktop interaction (mouse, keyboard, screen capture, command execution), browser automation via Microsoft Edge, and semantic UI inspection via Windows UI Automation.

Note

  • Existing connections that use previous versions of Microsoft MCP servers remain supported.
  • For all new connections, use the latest Windows 365 for Agents MCP server, which exposes tools across desktop, browser, and accessibility capabilities.
  • Browser automation operates on Microsoft Edge. Edge launches automatically on the first browser tool call. mcp_desktop_focus_browser can also target Chrome or Firefox, but DOM-level browser tools only operate on the Edge instance.

To learn more about Windows 365 for Agents, see Windows 365 for Agents documentation.

Overview

  • Server ID: mcp_W365AServer
  • Tenant-level URL: https://agent365.svc.cloud.microsoft/agents/tenants/{tenantId}/servers/mcp_W365AServer
  • Display name: Windows 365 for Agents MCP server
  • Description: Full operational control of a Windows 365 cloud PC, including desktop interaction, browser automation, and UI inspection.
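
As a small illustration, the tenant-level URL can be assembled from a tenant ID. This is a minimal sketch: the helper name server_url and the sample tenant ID are hypothetical, and only the URL pattern itself comes from the overview above.

```python
from urllib.parse import quote

BASE = "https://agent365.svc.cloud.microsoft"
SERVER_ID = "mcp_W365AServer"


def server_url(tenant_id: str) -> str:
    """Fill {tenantId} into the tenant-level URL pattern (hypothetical helper)."""
    if not tenant_id:
        raise ValueError("tenant_id must be non-empty")
    # Percent-encode the ID so it cannot alter the path structure.
    return f"{BASE}/agents/tenants/{quote(tenant_id, safe='')}/servers/{SERVER_ID}"


# Placeholder tenant ID, for illustration only.
print(server_url("00000000-0000-0000-0000-000000000000"))
```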

Available tools

mcp_desktop_move_mouse

Move the cursor to a screen position. Use mcp_desktop_click instead if you intend to click at the destination.

Required parameters:

  • x: X coordinate in screen pixels
  • y: Y coordinate in screen pixels

mcp_desktop_click

Click at a position, or at the current cursor location if coordinates are omitted. Supports single-click, double-click, and all five mouse buttons.

Optional parameters:

  • x: X coordinate in screen pixels (omit for current position)
  • y: Y coordinate in screen pixels (omit for current position)
  • button: Left, Right, Middle, Forward, or Backward (default Left)
  • clickCount: 1 = single click, 2 = double click (default 1)
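
The parameters above might be packaged into a request as follows. This sketch assumes the server is reached through the standard MCP JSON-RPC tools/call method. The helper name click_request is hypothetical, and the rule that x and y must be supplied together is an assumption, not something the reference states.

```python
VALID_BUTTONS = {"Left", "Right", "Middle", "Forward", "Backward"}


def click_request(request_id, x=None, y=None, button="Left", click_count=1):
    """Build a JSON-RPC tools/call message for mcp_desktop_click."""
    # Assumption: coordinates are passed as a pair or not at all.
    if (x is None) != (y is None):
        raise ValueError("pass both x and y, or neither")
    if button not in VALID_BUTTONS:
        raise ValueError(f"unknown button: {button}")
    if click_count not in (1, 2):
        raise ValueError("clickCount must be 1 or 2")
    arguments = {"button": button, "clickCount": click_count}
    if x is not None:
        arguments.update({"x": x, "y": y})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "mcp_desktop_click", "arguments": arguments},
    }


# Double right-click at (640, 360).
print(click_request(1, x=640, y=360, button="Right", click_count=2))
```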

mcp_desktop_get_cursor_position

Return the current cursor coordinates. No parameters. Returns {cursorX, cursorY}.

mcp_desktop_drag_mouse

Drag from one position to another. Useful for moving objects, resizing windows, or pixel-precise scrolling.

Required parameters:

  • startX: Start X coordinate.
  • startY: Start Y coordinate.
  • endX: End X coordinate.
  • endY: End Y coordinate.

Optional parameters:

  • button: Left, Right, or Middle (default Left)

mcp_desktop_scroll

Scroll at a position using notch units (not pixels). Three notches is approximately one page.

Required parameters:

  • x: Scroll position X
  • y: Scroll position Y

Optional parameters:

  • deltaX: Horizontal notches, positive = right (default 0)
  • deltaY: Vertical notches, positive = down (default 0)

Note

Values are clamped to the range [-20, 20].
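
Because values outside [-20, 20] are clamped, a long scroll may need several calls. The sketch below converts a page count to notches using the approximate three-notches-per-page ratio and splits the total into chunks that fit the range. The helper names are hypothetical.

```python
MAX_NOTCHES = 20  # per-call limit from the note above


def split_notches(total):
    """Break a scroll distance into chunks that each fit in [-20, 20]."""
    sign = 1 if total >= 0 else -1
    remaining = abs(total)
    chunks = []
    while remaining > 0:
        step = min(MAX_NOTCHES, remaining)
        chunks.append(sign * step)
        remaining -= step
    return chunks


def scroll_calls(x, y, pages_down):
    """Return one mcp_desktop_scroll argument set per chunk."""
    notches = round(pages_down * 3)  # three notches is roughly one page
    return [{"x": x, "y": y, "deltaY": d} for d in split_notches(notches)]


print(scroll_calls(400, 300, pages_down=15))
```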

mcp_desktop_type_text

Type text via keyboard simulation. For keyboard shortcuts, use mcp_desktop_press_keys. For web form fields, use mcp_browser_type.

Required parameters:

  • text: Text to type

mcp_desktop_press_keys

Press a key combination simultaneously. Supports modifier keys, function keys, and standard keys.

Required parameters:

  • keys: Array of key names to press together (for example, ["ctrl","c"], ["alt","tab"], ["ctrl","shift","s"])

mcp_desktop_take_screenshot

Capture the full screen or a cropped region as a PNG image (base64-encoded).

Optional parameters:

  • x: Crop region left edge
  • y: Crop region top edge
  • width: Crop region width
  • height: Crop region height

Note

Provide all four crop parameters together, or omit all four for a full-screen capture.
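
A client might enforce the all-or-none rule before sending the call. This is a minimal sketch, and the function name screenshot_arguments is hypothetical.

```python
def screenshot_arguments(x=None, y=None, width=None, height=None):
    """Return arguments for mcp_desktop_take_screenshot, or raise on a partial crop."""
    crop = {"x": x, "y": y, "width": width, "height": height}
    given = {key: value for key, value in crop.items() if value is not None}
    if given and len(given) != 4:
        missing = sorted(set(crop) - set(given))
        raise ValueError(f"missing crop parameters: {missing}")
    return given  # empty dict means full-screen capture


print(screenshot_arguments())                  # full screen
print(screenshot_arguments(0, 0, 800, 600))    # cropped region
```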

mcp_desktop_zoom_region

Capture a screen region at native resolution as a PNG image (base64-encoded). Use this to inspect small text or dense UI that's hard to read in a downscaled full-screen screenshot.

Required parameters:

  • x: Left edge X coordinate in screen pixels
  • y: Top edge Y coordinate in screen pixels
  • width: Region width in pixels
  • height: Region height in pixels

Note

Maximum region size is 1920x1080 pixels.
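
To inspect an area larger than the maximum region, one option is to tile it into several zoom calls. The sketch below shows one way to compute the tiles. The helper name zoom_tiles is hypothetical.

```python
MAX_W, MAX_H = 1920, 1080  # maximum region size from the note above


def zoom_tiles(x, y, width, height):
    """Split a region into tiles no larger than 1920x1080."""
    tiles = []
    for top in range(y, y + height, MAX_H):
        for left in range(x, x + width, MAX_W):
            tiles.append({
                "x": left,
                "y": top,
                "width": min(MAX_W, x + width - left),
                "height": min(MAX_H, y + height - top),
            })
    return tiles


# A 2560x1440 screen needs four mcp_desktop_zoom_region calls.
for tile in zoom_tiles(0, 0, 2560, 1440):
    print(tile)
```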

mcp_desktop_analyze_screen

Perform OCR on the entire screen. No parameters. Returns {fullText, averageConfidence, boxes[{text, confidence, x, y, width, height}], width, height}.
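
The OCR boxes can be turned into a click target. The following sketch assumes that x and y in each box are the top-left corner and that confidence is on a 0 to 1 scale. Neither detail is stated in this reference, and the sample result is made up for illustration.

```python
def find_text_center(ocr_result, needle, min_confidence=0.5):
    """Return the center (x, y) of the first OCR box containing needle, or None."""
    needle = needle.lower()
    for box in ocr_result["boxes"]:
        if needle in box["text"].lower() and box["confidence"] >= min_confidence:
            # Assumption: (x, y) is the top-left corner of the box.
            return (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)
    return None


# Made-up result in the documented shape.
sample = {
    "fullText": "File Save",
    "averageConfidence": 0.93,
    "boxes": [
        {"text": "File", "confidence": 0.95, "x": 10, "y": 40, "width": 40, "height": 20},
        {"text": "Save", "confidence": 0.91, "x": 100, "y": 40, "width": 60, "height": 20},
    ],
    "width": 1920,
    "height": 1080,
}
print(find_text_center(sample, "save"))
```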

mcp_desktop_get_screen_size

Return the screen resolution. No parameters. Returns {width, height}.

mcp_desktop_list_windows

List all visible windows with their titles, positions, and dimensions. No parameters. Returns an array of {title, processName, handle, x, y, width, height}.

mcp_desktop_activate_window

Bring a window to the foreground using a fuzzy title match.

Required parameters:

  • titlePattern: Partial window title (case-insensitive substring)

mcp_desktop_focus_browser

Focus a browser window (Edge, Chrome, or Firefox), optionally filtered by URL or title.

Optional parameters:

  • pattern: URL or title substring to match (omit for any browser window)

mcp_desktop_close_window

Gracefully close a window by fuzzy title match. System-critical processes are protected and cannot be closed.

Required parameters:

  • titlePattern: Partial window title (80% match threshold)

Returns {matchedTitle, processName, closed}.

mcp_desktop_resize_window

Resize, move, maximize, minimize, or restore a window using a fuzzy title match.

Required parameters:

  • title: Window title to match (case-insensitive fuzzy match)
  • action: Action to perform — Resize, Move, Maximize, Minimize, or Restore

Optional parameters:

  • x: Left edge X coordinate (used with Resize or Move)
  • y: Top edge Y coordinate (used with Resize or Move)
  • width: Width in pixels (used with Resize)
  • height: Height in pixels (used with Resize)

mcp_desktop_execute_shell_command

Run a shell command in a sandboxed environment. Commands are validated against an allow list and dangerous patterns are blocked.

Required parameters:

  • command: Command to execute

Optional parameters:

  • cwd: Working directory. Use forward slashes (for example, C:/Users/me/project).
  • timeoutMs: Timeout in milliseconds (default 30000, max 30000)

Note

  • Allowed commands: git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type, and notepad.
  • Blocked patterns include shell metacharacters (|, ;, &, <, >), environment variable expansion (%VAR%), interpreter eval flags (python -c or node -e), git config --global, npm -g, path-prefixed executables, rm -rf, sudo, and disk/system commands.
  • stdout and stderr are each truncated at 32 KB. Use mcp_desktop_execute_python_code for arbitrary computation.

Returns {stdout, stderr, exitCode, success, timedOut, resourceLimitsApplied}.
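
A client can catch obvious rejections before making the call. The check below is a rough local approximation of the rules listed in the note, not the server's actual validation, which may be stricter or differ in detail (for example, in case handling). The function name precheck is hypothetical.

```python
import re

ALLOWED = {"git", "npm", "dotnet", "python", "cargo", "node", "pip", "dir",
           "mkdir", "del", "copy", "move", "robocopy", "findstr", "where",
           "type", "notepad"}
METACHARS = set("|;&<>")
EVAL_FLAGS = {"python": "-c", "node": "-e"}


def precheck(command):
    """Return reasons the command would likely be rejected (empty list if none)."""
    parts = command.split()
    if not parts:
        return ["empty command"]
    problems = []
    exe = parts[0].lower()
    if "/" in exe or "\\" in exe:
        problems.append("path-prefixed executable")
    elif exe not in ALLOWED:
        problems.append(f"'{exe}' is not on the allow list")
    if METACHARS & set(command):
        problems.append("shell metacharacter")
    if re.search(r"%[^%\s]+%", command):
        problems.append("environment variable expansion")
    if EVAL_FLAGS.get(exe) in parts[1:]:
        problems.append("interpreter eval flag")
    return problems


print(precheck("git log --oneline -20"))
print(precheck("git log | findstr fix"))
```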

mcp_desktop_execute_python_code

Execute Python code in a sandboxed environment with resource limits. Ideal for data processing, calculations, file I/O, and any computation that goes beyond simple shell commands.

Required parameters:

  • code: Python code (max 262,144 characters).

Optional parameters:

  • cwd: Working directory. Use forward slashes.
  • timeoutMs: Timeout in milliseconds (default 30000, max 30000).

Returns the same schema as mcp_desktop_execute_shell_command.

Note

The sandbox enforces a 512 MB memory limit and a 30-second timeout.
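
The limits above can be checked on the client side before the call is sent. This is a minimal sketch. The helper name python_code_arguments is hypothetical, and the lower bound of 1 ms on the timeout is an assumption.

```python
MAX_CODE_CHARS = 262_144
MAX_TIMEOUT_MS = 30_000


def python_code_arguments(code, cwd=None, timeout_ms=MAX_TIMEOUT_MS):
    """Build arguments for mcp_desktop_execute_python_code within documented limits."""
    if len(code) > MAX_CODE_CHARS:
        raise ValueError(f"code is {len(code)} characters; limit is {MAX_CODE_CHARS}")
    arguments = {
        "code": code,
        # Assumption: keep the timeout positive and cap it at the 30-second maximum.
        "timeoutMs": max(1, min(timeout_ms, MAX_TIMEOUT_MS)),
    }
    if cwd is not None:
        arguments["cwd"] = cwd.replace("\\", "/")  # the tool expects forward slashes
    return arguments


print(python_code_arguments("print(sum(range(10)))", cwd="C:\\Users\\me\\project"))
```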

mcp_desktop_wait_milliseconds

Pause execution to allow animations or transitions to complete. Do not use in polling loops—use mcp_browser_wait_for for DOM polling.

Required parameters:

  • ms: Wait duration in milliseconds (clamped to [0, 5000])

mcp_desktop_clipboard_read

Read the current content of the system clipboard. No parameters. Returns a JSON object describing the clipboard format and payload — either a text string or a base64-encoded image.

mcp_desktop_clipboard_write

Write text to the system clipboard, replacing the current content.

Required parameters:

  • text: Text to write to the clipboard

Returns a confirmation including the character count.

mcp_desktop_list_processes

List running processes in the current session. Each entry includes the PID, process name, memory usage, window title (if any), and startTimeTicks. Pair startTimeTicks with mcp_desktop_kill_process to prevent killing a recycled PID.

Optional parameters:

  • maxCount: Maximum number of processes to return (default 200)

Returns a JSON array of process info objects.

mcp_desktop_kill_process

Terminate a process by PID. Pass the startTimeTicks value reported by mcp_desktop_list_processes as startTime to guard against PID recycling.

Required parameters:

  • pid: Process ID returned by mcp_desktop_list_processes
  • startTime: Process start time ticks returned by mcp_desktop_list_processes

Optional parameters:

  • force: Force-kill without a graceful shutdown (default false)

Returns a JSON result describing the outcome.
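
The pairing of PID and start time might look like the sketch below. Only startTimeTicks is named in this reference. The pid and name keys on each process entry are assumptions, as is the sample data.

```python
def kill_arguments(processes, name, force=False):
    """Build mcp_desktop_kill_process arguments for the first process named `name`."""
    for proc in processes:
        # Assumption: entries expose "pid" and "name" keys.
        if proc["name"].lower() == name.lower():
            return {
                "pid": proc["pid"],
                "startTime": proc["startTimeTicks"],  # guards against PID recycling
                "force": force,
            }
    return None


# Made-up output in the shape assumed above.
listing = [
    {"pid": 4312, "name": "notepad", "startTimeTicks": 638412345678901234},
    {"pid": 9920, "name": "msedge", "startTimeTicks": 638412345999999999},
]
print(kill_arguments(listing, "Notepad"))
```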

mcp_desktop_launch_application

Launch a GUI application from an allowed directory. Use mcp_desktop_execute_shell_command for CLI commands instead.

Required parameters:

  • path: Absolute path to the executable. Use forward slashes (for example, C:/Program Files/app.exe).

Optional parameters:

  • args: Array of command-line arguments

Returns {path, pid}.

mcp_desktop_get_system_info

Return OS version, CPU, RAM, available disk space, and display resolution. No parameters. Returns a JSON object containing the system information.

mcp_browser_navigate

Navigate to a URL and wait for the page to load.

Required parameters:

  • url: Full URL including protocol (for example, https://example.com)

mcp_browser_back

Navigate back in browser history. No parameters.

mcp_browser_forward

Navigate forward in browser history. No parameters.

mcp_browser_reload

Reload the current page. No parameters.

mcp_browser_get_url

Return the current page URL as a plain string. No parameters.

mcp_browser_get_title

Return the current page title as a plain string. No parameters.

mcp_browser_get_text

Return the visible page text content as a plain string. No parameters. Truncated at 512 KB.

mcp_browser_get_html

Return the full page HTML source as a plain string. No parameters. Truncated at 512 KB.

mcp_browser_click

Click a DOM element by CSS selector. More reliable than coordinate-based clicking for web content.

Required parameters:

  • selector: CSS selector (for example, #submit-btn or a.nav-link)

mcp_browser_type

Type text into a form element by CSS selector.

Required parameters:

  • selector: CSS selector of the input element.
  • text: Text to type

mcp_browser_query_text

Get the text content of the first element matching a CSS selector.

Required parameters:

  • selector: CSS selector

mcp_browser_wait_for

Wait for a DOM element to appear. Useful for dynamic content that loads asynchronously.

Required parameters:

  • selector: CSS selector to wait for

Optional parameters:

  • timeoutMs: Timeout in milliseconds (default 5000, max 30000)

mcp_browser_eval_js

Evaluate a JavaScript expression in the page context and return the result as a string.

Required parameters:

  • expression: JavaScript expression that returns a string

Note

If your expression returns an object or number, convert it to a string explicitly (for example, JSON.stringify(obj) or .toString()).
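
One way to follow this advice is to wrap every expression in JSON.stringify and decode the returned string on the client. The helper names below are hypothetical.

```python
import json


def json_expression(expression):
    """Wrap a JavaScript expression so the page returns a JSON string."""
    return f"JSON.stringify({expression})"


def parse_result(result):
    """Decode the string returned by mcp_browser_eval_js back into Python data."""
    return json.loads(result)


arguments = {"expression": json_expression("document.links.length")}
print(arguments)
print(parse_result('{"count": 3}'))
```

Note that JSON.stringify(undefined) evaluates to undefined rather than a string, so an expression that can be undefined needs a fallback value.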

mcp_browser_list_tabs

List all open tabs with their index, title, and URL. No parameters. Returns an array of {index, title, url}.

mcp_browser_switch_tab

Switch to a tab by index.

Required parameters:

  • tabIndex: 0-based tab index
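
Since tabs are addressed by index, a client would typically look up the index from the mcp_browser_list_tabs result first. A minimal sketch, with a hypothetical helper name and made-up tab data:

```python
def find_tab_index(tabs, url_part):
    """Return the index of the first tab whose URL contains url_part, or None."""
    for tab in tabs:
        if url_part in tab["url"]:
            return tab["index"]
    return None


# Made-up result in the documented {index, title, url} shape.
tabs = [
    {"index": 0, "title": "New tab", "url": "about:blank"},
    {"index": 1, "title": "Example Domain", "url": "https://example.com/"},
]
index = find_tab_index(tabs, "example.com")
if index is not None:
    print({"tabIndex": index})  # arguments for mcp_browser_switch_tab
```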

mcp_browser_new_tab

Open a new tab, optionally navigating to a URL.

Optional parameters:

  • url: URL to open (blank tab if omitted)

Returns {index, title, url}.

mcp_browser_close_tab

Close a tab by index.

Required parameters:

  • tabIndex: 0-based tab index

mcp_browser_screenshot

Capture a PNG screenshot of the browser viewport only (not the full screen). No parameters. Returns a base64-encoded PNG.

mcp_browser_select_option

Select one or more options in a <select> element by their value attribute.

Required parameters:

  • selector: CSS selector for the <select> element
  • values: Array of option value(s) to select

Returns a confirmation with the count of selected options.

mcp_browser_fill_form

Fill multiple form fields in a single call. Each entry is a {selector, value} pair. Stops on first failure and reports which fields succeeded.

Required parameters:

  • fields: Array of {selector, value} pairs

Returns a confirmation with the count of filled fields.
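
The fields array could be built from a mapping of selectors to values, as sketched below. Order is preserved, which matters because the tool stops at the first failure. The selectors and the helper name are hypothetical, and the envelope assumes the standard MCP tools/call method.

```python
def fill_form_request(request_id, values):
    """Build a tools/call message for mcp_browser_fill_form from a dict."""
    fields = [{"selector": selector, "value": value}
              for selector, value in values.items()]
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "mcp_browser_fill_form",
                   "arguments": {"fields": fields}},
    }


# Hypothetical selectors for illustration.
request = fill_form_request(7, {"#first-name": "Ada", "#email": "ada@example.com"})
print(request["params"]["arguments"])
```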

mcp_browser_drag

Drag a source element onto a target element. Both elements are identified by CSS selector.

Required parameters:

  • sourceSelector: CSS selector of the drag source
  • targetSelector: CSS selector of the drop target

mcp_browser_pdf_save

Save the current page as a PDF file. Destination paths are restricted to %USERPROFILE% or %TEMP%.

Required parameters:

  • filePath: Destination file path under %USERPROFILE% or %TEMP%. Use forward slashes.

Returns a confirmation including the saved file path.
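
A client might verify the destination before calling the tool. The sketch below takes the resolved values of %USERPROFILE% and %TEMP% as inputs, because they are only known on the cloud PC. The helper name and sample paths are hypothetical, and the server's own check may differ.

```python
from pathlib import PureWindowsPath


def pdf_save_path(file_path, user_profile, temp):
    """Return file_path with forward slashes if it sits under an allowed root."""
    path = PureWindowsPath(file_path)
    if ".." in path.parts:
        raise ValueError("parent-directory segments are not accepted in this sketch")
    roots = [PureWindowsPath(user_profile), PureWindowsPath(temp)]
    if not any(path.is_relative_to(root) for root in roots):
        raise ValueError(f"{file_path} is outside the allowed directories")
    return path.as_posix()


# Hypothetical resolved environment values.
profile = "C:\\Users\\me"
temp_dir = "C:\\Users\\me\\AppData\\Local\\Temp"
print({"filePath": pdf_save_path("C:\\Users\\me\\Documents\\report.pdf", profile, temp_dir)})
```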

mcp_browser_handle_dialog

Accept or dismiss a pending browser dialog (alert, confirm, prompt, or beforeunload). Returns "No dialog pending" if no dialog is active.

Required parameters:

  • action: accept or dismiss

Optional parameters:

  • promptText: Text to supply to a prompt dialog (ignored for alert and confirm)

mcp_browser_snapshot

Capture the page's accessibility tree with stable ref IDs (for example, e5) that map to DOM nodes. Use the refs with mcp_browser_click_ref, mcp_browser_type_ref, and mcp_browser_hover_ref. Refs expire when the page navigates — retake a snapshot after navigation.

Optional parameters:

  • maxDepth: Maximum tree depth, 1-10 (default 5)
  • includeIframes: Include cross-origin iframes (default true)

Returns a JSON object containing the accessibility snapshot and ref IDs.

mcp_browser_click_ref

Click an element by ref ID from mcp_browser_snapshot. A hit-test verifies that no other element overlays the target. Fails if the snapshot has expired — retake the snapshot in that case.

Required parameters:

  • snapshotId: Snapshot ID returned by mcp_browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes

Optional parameters:

  • button: Left, Right, or Middle (default Left)
  • clickCount: 1 = single click, 2 = double click (default 1)

Returns a confirmation including the clicked coordinates.

mcp_browser_type_ref

Type text into an element by ref ID from mcp_browser_snapshot. The element is focused first, and existing text is cleared by default. Fails if the snapshot has expired.

Required parameters:

  • snapshotId: Snapshot ID returned by mcp_browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes
  • text: Text to type

Optional parameters:

  • clear: Clear existing text first (default true)

Returns a confirmation including the character count.

mcp_browser_hover_ref

Hover over an element by ref ID from mcp_browser_snapshot. Returns immediately. Fails if the snapshot has expired — retake the snapshot in that case.

Required parameters:

  • snapshotId: Snapshot ID returned by mcp_browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes

Returns a confirmation including the hover coordinates.

mcp_accessibility_get_accessibility_tree

Retrieve the UI element tree for the foreground window. Each element includes its role, name, value, and screen coordinates.

Optional parameters:

  • maxDepth: Maximum tree traversal depth, 1-10 (default 3)
  • maxElements: Maximum elements to return, 1-2000 (default 500)

Returns a hierarchical tree of {role, name, value, x, y, width, height, children[...]}.
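
The returned tree can be walked recursively to locate a control and derive a click point. This sketch assumes x and y are the element's top-left corner, which this reference does not state. The sample tree and helper names are made up.

```python
def iter_elements(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_elements(child)


def click_point(tree, role, name):
    """Return the center of the first element matching role and name, or None."""
    for element in iter_elements(tree):
        label = (element.get("name") or "").lower()
        if element.get("role") == role and name.lower() in label:
            # Assumption: (x, y) is the top-left corner of the element.
            return (element["x"] + element["width"] // 2,
                    element["y"] + element["height"] // 2)
    return None


# Made-up tree in the documented shape.
tree = {
    "role": "Window", "name": "Settings", "value": "",
    "x": 0, "y": 0, "width": 800, "height": 600,
    "children": [
        {"role": "Button", "name": "Apply", "value": "",
         "x": 600, "y": 540, "width": 80, "height": 30, "children": []},
    ],
}
print(click_point(tree, "Button", "apply"))
```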

mcp_accessibility_find_ui_element

Search for UI elements by text content, accessibility role, or name (case-insensitive substring). Returns matching elements with their clickable screen coordinates.

Optional parameters:

  • text: Text to search for (used as name if name omitted)
  • role: UI role filter — Button, TextBox, CheckBox, MenuItem, ComboBox, and more
  • name: Accessible name (takes precedence over text if both provided)
  • windowHandle: Target window handle (null = foreground window)

Key features

Desktop interaction

  • Click, double-click, right-click, and five-button mouse control.
  • Pixel-precise drag and drop.
  • Notch-based scrolling (three notches ≈ one page).
  • Keyboard typing and multi-key shortcut combos.
  • Cursor position tracking.
  • Screen resolution detection.

Screen capture and analysis

  • Full-screen or cropped PNG screenshots.
  • OCR of the full screen with per-region confidence scores and bounding boxes.
  • Browser-viewport-only screenshots for web content.

Window management

  • Enumerate all visible windows with positions and dimensions.
  • Activate windows by fuzzy title match.
  • Focus browser windows (Edge, Chrome, Firefox) optionally filtered by URL or title.
  • Graceful window close with protection for system-critical processes.

Command execution

  • Sandboxed shell commands with an allow list (git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type).
  • Sandboxed Python execution up to 262,144 characters of code.
  • Working-directory and per-call timeout control (max 30 seconds).
  • Resource limits and hardened block list against shell metacharacters, eval flags, privilege escalation, and destructive operations.

Browser automation

  • Navigate, back, forward, reload.
  • Read page URL, title, visible text (512 KB cap), and full HTML (512 KB cap).
  • DOM-level click, type, and text query by CSS selector.
  • Wait for dynamic elements with configurable timeout.
  • Evaluate JavaScript expressions in the page context.
  • Multi-tab management: list, switch, open, close.
  • Runs on Microsoft Edge, launched automatically on first use.

UI accessibility

  • Retrieve the Windows UI Automation tree for the foreground window with configurable depth and element count.
  • Find UI elements by text, role, or accessible name.
  • Returns clickable screen coordinates for precise targeting of buttons, text boxes, checkboxes, menu items, and combo boxes.

Timing and synchronization

  • Short one-shot pauses via mcp_desktop_wait_milliseconds (max five seconds).
  • DOM-level polling via mcp_browser_wait_for (max 30 seconds).

Notes

  • All coordinates are in screen pixels with (0,0) at the top-left corner. Coordinates from mcp_desktop_take_screenshot, mcp_desktop_analyze_screen, mcp_accessibility_find_ui_element, and mcp_desktop_list_windows all share the same coordinate space.
  • A cursor failsafe is active: If the cursor moves within five pixels of any screen corner, mouse operations are cancelled. Avoid targeting the extreme edges of the screen.
  • Shell pipe operators (|), semicolons (;), ampersands (&), and output redirection (>, <) are blocked. To transform command output, capture it and process it with mcp_desktop_execute_python_code.
  • Interpreter eval flags are blocked, so python -c "..." and node -e "..." are rejected. Use mcp_desktop_execute_python_code for Python code, or write the code to a file first.
  • Command stdout/stderr is truncated at 32 KB each. Use flags to limit verbose output (for example, git log --oneline -20). Because shell redirection is blocked, save larger output to a file from mcp_desktop_execute_python_code and read it separately.
  • Maximum timeout for mcp_desktop_execute_shell_command and mcp_desktop_execute_python_code is 30 seconds. For longer work, break it into smaller steps or launch a background process from Python and poll.
  • There is no dedicated file read/write tool. Read files with mcp_desktop_execute_shell_command using the type command; write files with mcp_desktop_execute_python_code using Python's built-in file I/O. Shell output redirection (>, >>) is blocked.
  • mcp_browser_eval_js always returns a string. Convert objects or numbers explicitly before returning.
  • Browser DOM tools (mcp_browser_click, mcp_browser_type, mcp_browser_eval_js, etc.) operate only on the Microsoft Edge instance. mcp_desktop_focus_browser can focus Chrome or Firefox windows, but DOM tools will not target them.
  • mcp_desktop_take_screenshot requires all four crop parameters (x, y, width, height) together, or none for a full-screen capture.
  • mcp_desktop_scroll uses notch units (clamped to [-20, 20]), not pixels. Three notches is approximately one page.
  • mcp_accessibility_find_ui_element requires at least one of text, role, or name. When both text and name are provided, name takes precedence.
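
The cursor failsafe can be anticipated on the client side. The check below is a conservative sketch: it treats "within five pixels" as a square around each corner, which is an assumption about how the server measures distance. The function name is hypothetical.

```python
def near_corner(x, y, screen_width, screen_height, margin=5):
    """Return True if (x, y) is within `margin` pixels of any screen corner."""
    corners = [
        (0, 0),
        (screen_width - 1, 0),
        (0, screen_height - 1),
        (screen_width - 1, screen_height - 1),
    ]
    return any(abs(x - cx) <= margin and abs(y - cy) <= margin
               for cx, cy in corners)


# Screen size as reported by mcp_desktop_get_screen_size (sample values).
print(near_corner(3, 3, 1920, 1080))      # close to the top-left corner
print(near_corner(960, 540, 1920, 1080))  # center of the screen
```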

Common use cases

Fill out a web form

  • Call mcp_browser_navigate to open the target page.
  • Call mcp_browser_wait_for to wait for the form to load.
  • Call mcp_browser_type to fill each field by CSS selector.
  • Call mcp_browser_click to submit the form.
  • Call mcp_browser_wait_for to wait for the confirmation element.
  • Call mcp_browser_get_text to read and verify the result.
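
The steps above can be sketched as an ordered list of requests. This assumes the standard MCP tools/call envelope. The URL, selectors, the 10-second timeout, and the helper name are hypothetical, and a real client would send each message and check its result before continuing.

```python
def form_workflow(url, fields, submit_selector, confirm_selector):
    """Return the tools/call messages for the web form steps, in order."""
    first_field = next(iter(fields))
    steps = [
        ("mcp_browser_navigate", {"url": url}),
        ("mcp_browser_wait_for", {"selector": first_field, "timeoutMs": 10000}),
    ]
    steps += [("mcp_browser_type", {"selector": s, "text": t})
              for s, t in fields.items()]
    steps += [
        ("mcp_browser_click", {"selector": submit_selector}),
        ("mcp_browser_wait_for", {"selector": confirm_selector, "timeoutMs": 10000}),
        ("mcp_browser_get_text", {}),
    ]
    return [
        {"jsonrpc": "2.0", "id": i, "method": "tools/call",
         "params": {"name": name, "arguments": arguments}}
        for i, (name, arguments) in enumerate(steps, start=1)
    ]


messages = form_workflow(
    "https://example.com/signup",
    {"#name": "Ada", "#email": "ada@example.com"},
    submit_selector="#submit-btn",
    confirm_selector="#confirmation",
)
for message in messages:
    print(message["id"], message["params"]["name"])
```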

Automate a desktop application

  • Call mcp_desktop_activate_window to bring the application to the foreground.
  • Call mcp_desktop_take_screenshot to capture the current state.
  • Call mcp_accessibility_find_ui_element to locate a button or field by name.
  • Call mcp_desktop_click on the element's reported coordinates.
  • Call mcp_desktop_type_text to enter data.
  • Call mcp_desktop_press_keys for shortcuts (for example, ["ctrl","s"] to save).
  • Call mcp_desktop_take_screenshot to verify the result.

Extract data from a web page

  • Call mcp_browser_navigate to open the page.
  • Call mcp_browser_get_text to extract visible text content.
  • Call mcp_desktop_execute_python_code to parse and process the extracted data.
  • Call mcp_browser_eval_js to query specific values via JavaScript when text extraction isn't enough.

Run development tasks

  • Call mcp_desktop_execute_shell_command for git pull, npm install, and dotnet build.
  • Call mcp_desktop_take_screenshot to capture build output.
  • Call mcp_desktop_execute_python_code to analyze logs or test results.
  • Call mcp_browser_navigate to open a local dev server in the browser.
  • Call mcp_browser_screenshot to capture the rendered page.

Read and write files

  • Read a file with mcp_desktop_execute_shell_command using type C:\path\to\file.txt.
  • Write a file with mcp_desktop_execute_python_code using Python's open(...) and write(...).
  • Verify with mcp_desktop_execute_shell_command using dir C:\path\to\output.txt.

Navigate a desktop UI with accessibility tools

  • Call mcp_accessibility_get_accessibility_tree to understand the full UI structure.
  • Call mcp_accessibility_find_ui_element to find a specific control (for example, role: "MenuItem", name: "Settings").
  • Call mcp_desktop_click using the element's reported coordinates.
  • Call mcp_accessibility_find_ui_element again to find the next control in the dialog.
  • Call mcp_desktop_type_text or mcp_desktop_click to interact with it.

Keep a long-running session alive

  • Send any MCP request at least once every 30 minutes to prevent idle eviction.
  • mcp_desktop_get_screen_size is lightweight and works well as a heartbeat.
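
A heartbeat timer along these lines could decide when to send the keep-alive call. This is a sketch: the 25-minute interval is an assumed safety margin under the 30-minute limit, the class and function names are hypothetical, and the request uses the standard MCP tools/call envelope.

```python
import time


class Heartbeat:
    """Track time since the last MCP request and report when a ping is due."""

    def __init__(self, interval_s=25 * 60, clock=time.monotonic):
        self._interval = interval_s
        self._clock = clock
        self._last = clock()

    def mark_request(self):
        """Call after any MCP request, since every request resets the idle timer."""
        self._last = self._clock()

    def due(self):
        return self._clock() - self._last >= self._interval


def heartbeat_request(request_id):
    """Build a lightweight keep-alive call using mcp_desktop_get_screen_size."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "mcp_desktop_get_screen_size", "arguments": {}},
    }


heartbeat = Heartbeat()
if heartbeat.due():
    print(heartbeat_request(1))
```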