This article provides a complete command-line reference for the runevals command, which is part of the @microsoft/m365-copilot-eval package.
Note
The Agent Evaluations CLI is currently in preview. Features and functionality are subject to change.
Synopsis
runevals [options]
runevals cache-info
runevals cache-clear
runevals cache-dir
Description
The runevals command evaluates Microsoft 365 Copilot agents by sending test prompts and scoring the responses with Azure AI Evaluation metrics. The tool supports batch evaluation from JSON files, inline prompts, and interactive testing.
Options
-V, --version
Output the version number of the CLI tool.
Example:
runevals --version
Output:
1.3.0-preview.1
--log-level [level]
Set the logging verbosity level. Available levels: debug, info, warning, error.
- Default: When you use the flag without a value, it defaults to info.
- debug: Detailed debugging information, including API payloads.
- info: General information about evaluation progress.
- warning: Warning messages only.
- error: Error messages only.
Examples:
# Info level (default when flag is present)
runevals --log-level
# Debug level
runevals --log-level debug
# Error level only
runevals --log-level error
Warning
The debug level might include raw API payloads and response data in console output. Redaction is pattern-based and might not catch all PII or credentials. Don't share debug output publicly without manual review.
--prompts <prompts...>
Specify one or more prompts directly on the command line for quick testing without creating a file.
Examples:
# Single prompt
runevals --prompts "What is Microsoft 365?"
# Multiple prompts
runevals --prompts "What is Teams?" "What is SharePoint?" "What is OneDrive?"
--expected <responses...>
Provide expected responses to accompany prompts specified with --prompts. The number of responses must match the number of prompts.
Example:
runevals --prompts "What is Microsoft Graph?" \
--expected "Microsoft Graph is the API gateway to Microsoft 365 data and intelligence."
Multiple prompts and responses:
runevals --prompts "What is Teams?" "What is SharePoint?" \
--expected "Teams is a collaboration platform" "SharePoint is a content management system"
--prompts-file <file>
Specify a custom JSON file containing test prompts. When you use this option, the tool reads this file instead of the auto-discovered dataset file.
Example:
runevals --prompts-file ./tests/my-custom-tests.json
File format:
[
{
"prompt": "Test question",
"expected_response": "Expected answer"
}
]
For the full dataset schema, see Dataset schema and test design.
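A file with several test cases uses the same shape, with one object per case. The following sketch writes such a file from a shell script by using a here-document. The path and the prompt/response pairs are illustrative examples only, only the prompt and expected_response fields shown above are used, and the script overwrites any existing file at that path.

```shell
# Create the folder for the dataset (illustrative path).
mkdir -p ./tests

# Write two test cases; the quoted 'EOF' keeps the shell from expanding
# anything inside the JSON.
cat > ./tests/my-custom-tests.json <<'EOF'
[
  {
    "prompt": "What is Microsoft Graph?",
    "expected_response": "Microsoft Graph is the API gateway to Microsoft 365 data and intelligence."
  },
  {
    "prompt": "What is Teams?",
    "expected_response": "Teams is a collaboration platform"
  }
]
EOF
```

Then pass the file to the tool with runevals --prompts-file ./tests/my-custom-tests.json.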
-o, --output <file>
Specify the output file path and format. The format is determined by the file extension.
Supported formats:
- .html: HTML report (default; opens automatically in the browser)
- .json: JSON results
- .csv: CSV spreadsheet
Examples:
# HTML output
runevals --output ./reports/results.html
# JSON output
runevals --output ./results/eval-results.json
# CSV output
runevals --output ./data/scores.csv
Default behavior:
Without --output, the command saves results to ./.evals/YYYY-MM-DD_HH-MM-SS.html.
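Because each default report name is a fixed-width timestamp, the names sort chronologically as plain strings, so a script can find the most recent run without parsing dates. The following is a minimal POSIX shell sketch; latest_report is a hypothetical helper name, and it assumes reports stay in the default ./.evals directory.

```shell
# Print the path of the newest .html report in a directory
# (default: ./.evals). Shell globs expand in sorted order, and
# YYYY-MM-DD_HH-MM-SS names sort chronologically, so the last
# match is the most recent run. Prints an empty line if none exist.
latest_report() {
  dir="${1:-./.evals}"
  latest=""
  for f in "$dir"/*.html; do
    # An unmatched glob stays literal, so check that the file exists.
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  printf '%s\n' "$latest"
}
```

For example, on Linux, xdg-open "$(latest_report)" reopens the newest report in the default browser.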
-i, --interactive
Enter interactive mode for manual prompt entry and testing.
Example:
runevals --interactive
In interactive mode, you enter prompts one at a time, which is useful for exploratory testing.
--m365-agent-id <id>
Override the agent ID to evaluate a specific agent. This parameter is useful when testing multiple agents or when the agent ID can't be auto-detected.
Example:
runevals --m365-agent-id "U_0dc4a8a2-b95f-edac-91c8-d802023ec2d4"
Agent ID formats:
- User-scoped: U_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
- Tenant-scoped: T_agent-name.declarativeAgent
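When a script receives agent IDs from another source, a quick shape check can catch typos before a run. The following sketch is a hypothetical helper, agent_id_scope, that recognizes only the two formats listed above. It's a local sanity check, not the CLI's own validation, so an ID that it reports as unknown isn't necessarily rejected by runevals.

```shell
# Classify an agent ID by the documented shapes:
#   U_ followed by an 8-4-4-4-12 GUID  -> user
#   T_<name>.declarativeAgent          -> tenant
# Anything else is reported as unknown.
agent_id_scope() {
  case "$1" in
    U_????????-????-????-????-????????????) echo "user" ;;
    T_*.declarativeAgent) echo "tenant" ;;
    *) echo "unknown" ;;
  esac
}
```

For example, agent_id_scope "U_0dc4a8a2-b95f-edac-91c8-d802023ec2d4" prints user.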
--env <environment>
Specify the environment configuration to load. This parameter loads env/.env.<environment>.
Default: dev (loads env/.env.dev)
Examples:
# Load env/.env.dev (default)
runevals --env dev
# Load env/.env.prod
runevals --env prod
# Load env/.env.staging
runevals --env staging
Environment file precedence:
- .env.local (auto-detected for Agents Toolkit projects)
- .env.local.user (secrets, auto-loaded if present)
- env/.env.<environment> (specified by --env)
- System environment variables
--init-only
Initializes the Python environment and downloads dependencies without running evaluations. This option is useful for:
- Prewarming the cache in CI/CD pipelines
- Troubleshooting installation problems
- Verifying the setup before running tests
Example:
runevals --init-only
For troubleshooting, combine this option with --log-level debug:
runevals --init-only --log-level debug
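In a CI/CD pipeline, you can run --init-only as its own step so that a setup failure is reported separately instead of surfacing partway through an evaluation. The following POSIX shell sketch assumes a hypothetical RUNEVALS variable for overriding the command path and a hypothetical ci_eval function name; any arguments you pass are forwarded to the evaluation run.

```shell
# Command to invoke; RUNEVALS is a hypothetical override that defaults
# to the runevals found on PATH.
RUNEVALS="${RUNEVALS:-runevals}"

# Prewarm the Python environment, then run the evaluation with the given
# options only if setup succeeded. The setup step's exit code is returned
# unchanged so the pipeline log shows why the run stopped.
ci_eval() {
  "$RUNEVALS" --init-only || {
    code=$?
    echo "runevals setup failed (exit $code); skipping evaluation" >&2
    return "$code"
  }
  "$RUNEVALS" "$@"
}
```

For example: ci_eval --env staging --output ./reports/ci-results.json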
-h, --help
Displays help information about available commands and options.
Example:
runevals --help
Cache commands
The evaluation tool uses a local cache for the Python runtime and dependencies. These commands help you manage the cache.
cache-info
Displays statistics about the cached Python environment, including size, location, and installed packages.
Example:
runevals cache-info
Output:
Cache Information
Location: C:\Users\YourName\.m365-copilot-eval\cache
Size: 245 MB
Python Version: 3.11.5
Packages: 42 installed
Last updated: 2026-04-10 14:23:15
cache-clear
Removes the cached Python environment and all downloaded dependencies. Use this command when troubleshooting installation issues or freeing disk space.
Example:
runevals cache-clear
Follow-up:
After clearing the cache, reinitialize:
runevals --init-only
cache-dir
Prints the absolute path to the cache directory. This command is useful in scripts or for manual inspection.
Example:
runevals cache-dir
Output:
C:\Users\YourName\.m365-copilot-eval\cache
Usage in scripts:
# Grant your user write permission on the cache directory (Unix/macOS)
chmod -R u+w "$(runevals cache-dir)"
# View cache contents
ls -lah "$(runevals cache-dir)"
Environment variables
The tool reads configuration from environment files and system variables. For step-by-step instructions on obtaining these values, see Required environment variables.
Required variables
| Variable | Description | Example |
|---|---|---|
| TENANT_ID | Microsoft Entra tenant ID | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| AZURE_AI_OPENAI_ENDPOINT | Azure OpenAI in Foundry Models endpoint URL | https://your-resource.openai.azure.com/ |
| AZURE_AI_API_KEY | Azure OpenAI API key | your-api-key-here |
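Before a run, a script can confirm that an environment file defines the required variables. The following sketch is a hypothetical check_required_vars helper that inspects a single file and assumes plain KEY=value lines. It doesn't see values supplied by other environment files or by system environment variables, so treat a reported gap as a reason to check the other sources listed in the environment file precedence under --env.

```shell
# Print "missing: NAME" for each required variable that is absent or
# empty in the given env file. Returns 0 only if all are present.
check_required_vars() {
  file="$1"
  missing=0
  for var in TENANT_ID AZURE_AI_OPENAI_ENDPOINT AZURE_AI_API_KEY; do
    # Require a non-empty value after "NAME=" at the start of a line.
    if ! grep -Eq "^${var}=.+" "$file" 2>/dev/null; then
      echo "missing: $var"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_required_vars env/.env.dev && runevals --env dev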
Optional variables
| Variable | Description | Default |
|---|---|---|
| M365_AGENT_ID | Agent ID to evaluate | Auto-detected from M365_TITLE_ID |
| M365_TITLE_ID | Agent title ID (Agents Toolkit) | None |
| AZURE_AI_API_VERSION | Azure OpenAI API version | 2024-12-01-preview |
| AZURE_AI_MODEL_NAME | Model for evaluations | gpt-4o-mini |
Examples
Basic usage
Evaluate by using the auto-discovered dataset file:
cd /path/to/your-agent-project
runevals
Specify environment
Use production environment configuration:
runevals --env prod
Custom dataset file
Use a specific test file:
runevals --prompts-file ./tests/regression-tests.json
Inline testing
Quick test with inline prompts:
runevals --prompts "What is Microsoft 365?" \
--expected "Microsoft 365 is a cloud-based productivity suite"
Interactive mode
Enter prompts manually:
runevals --interactive
Custom output format
Generate JSON results:
runevals --output ./results/eval-$(date +%Y%m%d).json
Debug mode
Run with detailed logging:
runevals --log-level debug --output ./debug-results.json
Setup only
Pre-cache the Python environment without running tests:
runevals --init-only --log-level info
Override agent ID
Test a specific agent:
runevals --m365-agent-id "U_0dc4a8a2-b95f-edac-91c8-d802023ec2d4"
Combined options
Comprehensive evaluation with custom settings:
runevals \
--env staging \
--prompts-file ./evals/full-suite.json \
--output ./reports/staging-eval-$(date +%Y%m%d).html \
--log-level info \
--m365-agent-id "T_my-agent.declarativeAgent"
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Environment configuration error |
| 4 | Agent not found |
| 5 | Authentication failure |
| 10 | Python environment setup failure |
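In scripts, it can help to log the meaning of a nonzero exit code next to the number. The following POSIX shell sketch maps the codes in the preceding table to their descriptions; describe_exit_code and run_and_report are hypothetical helper names, and codes that aren't in the table are reported as unknown rather than guessed.

```shell
# Translate a runevals exit code into its documented meaning.
describe_exit_code() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Invalid arguments" ;;
    3) echo "Environment configuration error" ;;
    4) echo "Agent not found" ;;
    5) echo "Authentication failure" ;;
    10) echo "Python environment setup failure" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Run a command, print its exit code and meaning to stderr, and pass
# the exit code through unchanged.
run_and_report() {
  code=0
  "$@" || code=$?
  echo "exit $code: $(describe_exit_code "$code")" >&2
  return "$code"
}
```

For example: run_and_report runevals --env staging --prompts-file ./evals/full-suite.json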
Troubleshooting
For common issues with installation, authentication, runtime errors, cache problems, and proxy setup, see the Troubleshooting article.