An agent test evaluates the end-to-end behavior of a Business Central agent by sending it natural-language input and then asserting that it took the correct actions in the system. Unlike AI tests that evaluate prompt outputs, agent tests verify the full task execution loop: the agent receives input, decides what to do, interacts with Business Central, and completes — or pauses to ask the user for input — before continuing.
The AI Test Suite is the main driver of an agent test. It iterates over the dataset, dispatches each turn to your test codeunit, and collects evaluation results. The Library - Agent codeunit is a reusable helper for interacting with the agent during a test. It reads the YAML structure defined in the dataset and streamlines the agent interactions, so individual tests don't have to handle that wiring directly.
Note
Agent tests run against a live agent service and consume Copilot credits. Run them in sandbox environments with prepaid credits. See Evaluation for credit limits.
Tip
A complete working sample is available at SalesValidationAgent3P on the BCTech repository.
Step-by-step: write an agent test
The recommended pattern is data-driven: the dataset describes the input to the agent and the expected outcome, and your test code runs a small turn loop that delegates the agent interaction to Library - Agent.
Step 1 - define the test codeunit and run the turn loop
An agent test is a standard AL test codeunit with TestType = AITest. It uses Library - Agent to run each turn and AIT Test Context to read dataset values. The codeunit follows a small pattern: an Initialize procedure that resolves and activates the agent, and a [Test] procedure that drives a repeat ... until loop over the dataset turns.
codeunit 50200 "My Agent Accuracy Test"
{
    Subtype = Test;
    TestType = AITest;
    TestPermissions = Disabled;
    RequiredTestIsolation = Disabled;

    var
        AITTestContext: Codeunit "AIT Test Context";
        LibraryAgent: Codeunit "Library - Agent";
        Assert: Codeunit "Library Assert";
        AgentUserSecurityId: Guid;
        Initialized: Boolean;
}
RequiredTestIsolation = Disabled is required because agent tasks run in a separate session and span multiple transactions.
Note
Agent tests depend on datasets that describe the input sent to the agent and the expected outcome. Learn how to create one in Datasets.
Initialize the agent
Initialize runs before the first turn. It resolves the agent under test, cleans up any tasks left from prior runs, and activates the agent. The agent identity can come from the evaluation suite (when set) or from your own setup — fall back to creating the agent programmatically if neither is available.
local procedure Initialize()
begin
    if Initialized then
        exit;

    // Optionally read the agent under test from the evaluation suite.
    // This enables A/B testing — you can point a suite at a different agent
    // without changing test code. Skip the call if your test always targets
    // a specific agent.
    LibraryAgent.GetAgentUnderTest(AgentUserSecurityId);

    // GetAgentUnderTest returns a null GUID when no agent is configured
    // on the suite. Fall back to your own lookup or to creating the agent
    // programmatically.
    if IsNullGuid(AgentUserSecurityId) then
        AgentUserSecurityId := GetOrCreateAgent();

    // Clean up any tasks left from prior runs before activating the agent.
    // Running cleanup at the start of Initialize replaces a separate
    // [TearDown] step and makes the test resilient to previous failures
    // that didn't shut down cleanly.
    LibraryAgent.StopTasks(AgentUserSecurityId);
    LibraryAgent.EnsureAgentIsActive(AgentUserSecurityId);

    Initialized := true;
end;
GetOrCreateAgent is application-specific. A common pattern is to look up the agent identity in a setup record and create it on first run by using the Agent SDK:
local procedure GetOrCreateAgent(): Guid
var
    MyAgentSetup: Record "My Agent Setup";
begin
    if MyAgentSetup.FindFirst() then
        exit(MyAgentSetup."Agent User Security ID");

    // First run — create the agent and persist its identity.
    // See "Define and register an agent programmatically" for the APIs.
    exit(CreateMyAgent());
end;
Tip
Calling GetAgentUnderTest is optional. Use it when you want the evaluation suite to control which agent the test targets — for example, to A/B test two agent versions against the same dataset. Tests that always run against a specific agent can skip the call and assign AgentUserSecurityId directly. For details on creating an agent in code, see Define and register an agent programmatically.
Run the turn loop
With Initialize in place, write the [Test] procedure. It runs a repeat ... until loop that delegates each turn to Library - Agent.RunTurnAndWait and then to Library - Agent.FinalizeTurn. RunTurnAndWait reads the current turn's query:, dispatches it to the agent, and waits for completion. FinalizeTurn writes the turn output, validates intervention expectations declared in the dataset, and advances to the next turn.
[Test]
procedure TestAgentReleasesOrders()
var
    AgentTask: Record "Agent Task";
    TurnSuccessful, ContinueWithNextTurn : Boolean;
    ErrorReason: Text;
    AgentStatusErr: Label 'The agent task did not complete successfully. Status: %1.', Comment = '%1 = task status';
begin
    Initialize();

    repeat
        Clear(ErrorReason);

        // [GIVEN] Prepare Business Central state for this turn
        ApplyTurnSetup();

        // [WHEN] Send the turn input to the agent and wait
        TurnSuccessful := LibraryAgent.RunTurnAndWait(AgentUserSecurityId, AgentTask);
        if TurnSuccessful then
            TurnSuccessful := ValidateTurnCompletedSuccessfully(ErrorReason)
        else
            ErrorReason := StrSubstNo(AgentStatusErr, AgentTask.Status);

        // [THEN] Record turn outcome and decide whether to continue
        ContinueWithNextTurn := LibraryAgent.FinalizeTurn(AgentTask, TurnSuccessful, ErrorReason);
    until not ContinueWithNextTurn;

    Assert.IsTrue(TurnSuccessful, ErrorReason);
end;
Validation reads the turn's expected_data from AIT Test Context and checks Business Central state. Return false with a populated ErrorReason on mismatch rather than calling Error() — that lets FinalizeTurn log the failure on the turn and (optionally) continue with the next turn.
local procedure ValidateTurnCompletedSuccessfully(var ErrorReason: Text): Boolean
var
    ExpectedData: Codeunit "Test Input Json";
    ExpectedReleased: Integer;
    ActualReleased: Integer;
begin
    ExpectedData := AITTestContext.GetExpectedData();
    ExpectedReleased := ExpectedData.Element('expected_released_count').ValueAsInteger();
    ActualReleased := CountReleasedSalesOrders();

    if ActualReleased <> ExpectedReleased then begin
        ErrorReason := StrSubstNo('Expected %1 released orders, found %2.', ExpectedReleased, ActualReleased);
        exit(false);
    end;

    exit(true);
end;
Step 2 - author the dataset
Agent test datasets use the query: element to describe what the agent receives. A task-input query has at minimum a title (used as the task subject) and typically a from (sender display name) and a message (the natural-language instruction).
suite_setup: MY-AGENT-SETUP
tests:
  - name: AGENT_TEST_01
    description: Process sales orders with a future shipment date
    turns:
      - query:
          from: Jane Doe
          title: "Release all sales orders for $DateFormula-<CW+1M>$"
          message: "Release all sales orders for $DateFormula-<CW+1M>$"
        expected_data:
          expected_released_count: 2
A few things to call out:
- Always use the turns: array, even for single-turn tests. The multi-turn syntax is the supported format for both single-turn and multi-turn tests.
- Dates are expressed through placeholders. $DateFormula-<formula>$ is calculated relative to WorkDate so the test doesn't drift over time. See Placeholders for dates for the full reference.
- expected_data is your validation contract. Most keys you put there are read by your own validator. One sub-key, intervention_request, is recognized by the framework — see Step 3.
Note
Older datasets use the legacy question: element instead of query:. Both are accepted by the framework; new datasets should use query:. See Datasets — backwards compatibility for migration details.
Step 3 - handle interventions declaratively
Some agents pause to ask the user for input before continuing. The framework validates these interventions for you based on what the dataset declares.
Declare an expected intervention in expected_data.intervention_request. The type value must be one of Assistance, Review, or Message (the English names of the Agent User Int Request Type enum). suggestions is optional — when present, every code listed must appear in the agent's actual request.
turns:
  - query:
      from: Jane Doe
      title: "Release all open sales orders"
      message: "Release all open sales orders"
    expected_data:
      intervention_request:
        type: Assistance
        suggestions:
          - PROVIDE_DATE
      expected_released_count: 0
  - query:
      intervention:
        instruction: "Use shipment date $DateFormula-<CW+1M>$"
    expected_data:
      expected_released_count: 1
The second turn uses a continuation query instead of a task input. You can respond either with an instruction: (free-text reply) or a suggestion: (one of the codes offered by the agent — mutually exclusive with instruction).
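To reply with a suggestion instead, name one of the codes the agent offered. A minimal sketch reusing the PROVIDE_DATE code from the first turn (whether a bare suggestion carries enough information depends on how the agent defines it):

- query:
    intervention:
      suggestion: PROVIDE_DATE
  expected_data:
    expected_released_count: 1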
FinalizeTurn enforces the contract automatically:
- If a turn declares intervention_request, the agent must have paused for an intervention with the matching type. If suggestions is listed, all expected codes must be present.
- If a turn does not declare intervention_request, the agent must not have paused for one. An unexpected intervention fails the turn.
So declare intervention_request on every turn where you expect the agent to pause; otherwise omit it.
Step 4 - prepare per-turn data
A turn's turn_setup: block contains data to materialize in Business Central before the agent runs. The shape is application-defined — the framework hands the YAML sub-tree to your test code, which dispatches to record-creation handlers.
A common convention is to use setup_actions with action_type and action_data:
turn_setup:
  setup_actions:
    - action_type: create_sales_order
      action_data:
        - "Sell-to Customer No.": CUST001
          "Shipment Date": "$DateFormula-<CW+1M>$"
In AL, read the block via AITTestContext.GetTurnSetup and walk the JSON with Test Input Json:
local procedure ApplyTurnSetup()
var
    TurnSetup: Codeunit "Test Input Json";
    Found: Boolean;
begin
    TurnSetup := AITTestContext.GetTurnSetup(Found);
    if not Found then
        exit;

    // Dispatch each action_type entry to a handler that creates records.
    // Resolve $DateFormula-... values via Test Input Data Tools — values
    // read through Test Input Json's typed accessors are resolved
    // automatically.
end;
For a complete dispatcher implementation, check out the SalesValidationAgent3P sample.
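As a starting point, one possible dispatcher shape is sketched below. It assumes Test Input Json exposes GetElementCount and ElementAt (zero-based) for walking arrays; confirm both against the codeunit symbols. CreateSalesOrderFromSetup is a hypothetical handler for this sample's action type:

local procedure ApplyTurnSetup()
var
    TurnSetup: Codeunit "Test Input Json";
    SetupActions: Codeunit "Test Input Json";
    SetupAction: Codeunit "Test Input Json";
    ActionType: Text;
    Index: Integer;
    Found: Boolean;
begin
    TurnSetup := AITTestContext.GetTurnSetup(Found);
    if not Found then
        exit;

    // Walk the setup_actions array and dispatch on action_type.
    SetupActions := TurnSetup.Element('setup_actions');
    for Index := 0 to SetupActions.GetElementCount() - 1 do begin
        SetupAction := SetupActions.ElementAt(Index);
        ActionType := SetupAction.Element('action_type').ValueAsText();
        case ActionType of
            'create_sales_order':
                // Hypothetical handler; $DateFormula-... values are resolved
                // when read through the typed accessors.
                CreateSalesOrderFromSetup(SetupAction.Element('action_data'));
            else
                Error('Unhandled action_type: %1', ActionType);
        end;
    end;
end;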
Step 5 - configure suite-level setup (optional)
Suite-level setup holds data that needs to be created only once across all tests in a suite — typical examples are master records every test relies on (locations, customers, posting groups). Define it in a separate YAML file under .resources/suite_setup/ and reference it by name from the test dataset:
# .resources/suite_setup/MY-AGENT-SETUP.yaml
name: MY-AGENT-SETUP
description: Shared master data for the My Agent suite.
suite_setup:
  setup_actions:
    - action_type: create_customers
      action_data:
        - "No.": CUST001
          Name: Validation Customer 01
Apply it once per suite in your test codeunit, gated by IsSuiteSetupDone:
local procedure SetupPerSuiteTestData()
var
    SuiteSetup: Codeunit "Test Input Json";
begin
    if AITTestContext.IsSuiteSetupDone() then
        exit;

    SuiteSetup := AITTestContext.GetEvalSuiteSetupDataInput();
    ApplySuiteSetup(SuiteSetup);
    AITTestContext.SetEvalSuiteSetupCompleted();
end;
Call SetupPerSuiteTestData from Initialize. The IsSuiteSetupDone flag is sticky across runs — use the Reset Suite Setup action on the AI Eval Suite page after editing the suite_setup: content.
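For example, the Initialize procedure from step 1 can call it right after task cleanup, before activating the agent; a sketch of the relevant lines:

LibraryAgent.StopTasks(AgentUserSecurityId);
SetupPerSuiteTestData(); // no-op on every test after the first in the suite
LibraryAgent.EnsureAgentIsActive(AgentUserSecurityId);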
Configure the evaluation suite
Set TestType="Agent" and TestRunnerId="130451" (Test Runner - Isol. Disabled) on the suite XML to mark it as an agent evaluation. Each <Line> points at a dataset and your test codeunit.
<Root>
  <AITSuite Code="MY-AGENT" Description="My Agent Accuracy Tests" Dataset="MY-DATASET.YAML"
            Capability="My Agent Capability" Frequency="Daily" TestRunnerId="130451" TestType="Agent">
    <Line CodeunitID="50200" Description="My Agent - Accuracy" Dataset="MY-DATASET.YAML" />
  </AITSuite>
</Root>
Advanced - manual task management
RunTurnAndWait and FinalizeTurn cover the data-driven flow used by most agent tests. For scenarios that need finer control — for example, starting a task from product code, constructing custom messages, or polling intervention state — Library - Agent exposes lower-level helpers. The descriptions below summarize what each method does; see the codeunit symbols for full parameter lists.
Task lifecycle
- CreateTaskAndWait(var AgentTaskBuilder) — Create a task from a manually configured Agent Task Builder and wait for completion.
- CreateMessageAndWait(var AgentTaskMessageBuilder) — Append a message to an existing task and wait.
- ContinueTaskAndWait(var AgentTask) / (var AgentTask, UserInput) — Continue a paused task with default or custom free-text input.
- WaitForTaskToComplete(var AgentTask) — Block until a task finishes. Use this in end-to-end scenarios that start the task from product code (for example, by invoking an action) rather than from the library.
- StopTasks(AgentUserSecurityId) / StopAllTasks() — Cleanup helpers. The recommended flow calls StopTasks from Initialize; use these directly only when you need ad-hoc cleanup outside the standard turn loop.
- SetAgentTaskTimeout(NewTimeout) — Override the 30-minute default wait for all …AndWait methods.
Example: building and starting a task manually.
[Test]
procedure TestAgentReleasesOrders_Manual()
var
    AgentTask: Record "Agent Task";
    AgentTaskBuilder: Codeunit "Agent Task Builder";
    AgentTaskMessageBuilder: Codeunit "Agent Task Message Builder";
    TaskSuccessful: Boolean;
begin
    Initialize();

    AgentTaskMessageBuilder.Initialize('Jane Doe', 'Release all open sales orders');
    AgentTaskBuilder.Initialize(AgentUserSecurityId, 'Release all open sales orders');
    AgentTaskBuilder.AddTaskMessage(AgentTaskMessageBuilder);

    TaskSuccessful := LibraryAgent.CreateTaskAndWait(AgentTaskBuilder, AgentTask);
    Assert.IsTrue(TaskSuccessful, StrSubstNo('Agent task did not complete. Status: %1', AgentTask.Status));
end;
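When a task can legitimately run longer than the default wait, raise the timeout before starting it. A minimal sketch, assuming the parameter is an AL Duration (milliseconds); confirm the type against the codeunit symbols:

// Two hours instead of the 30-minute default.
LibraryAgent.SetAgentTaskTimeout(2 * 60 * 60 * 1000);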
Manual intervention handling
- RequiresUserIntervention(AgentTask) — Returns true when the task is paused waiting for user input.
- GetLastUserInterventionRequestDetails(...) — Read the most recent intervention request, its annotations, and any offered suggestions.
- GetUserInterventionRequestDetails(LogEntry, ...) — Read the intervention request attached to a specific log entry.
- CreateUserInterventionAndWait(var AgentTask, UserInput) — Reply with free-text input and wait.
- CreateUserInterventionFromSuggestionAndWait(var AgentTask, SuggestionCode) — Reply with a suggestion code and wait.
- ParseUserInterventionRequestType(Text) — Convert a dataset string (Assistance, Review, Message) to the corresponding Agent User Int Request Type enum value.
Example: detect an intervention and respond.
if LibraryAgent.RequiresUserIntervention(AgentTask) then
    TaskSuccessful := LibraryAgent.CreateUserInterventionFromSuggestionAndWait(AgentTask, 'CONFIRM');
The valid suggestion codes are defined by the agent developer in the agent's IAgentTaskExecution implementation, specifically in GetAgentTaskUserInterventionSuggestions. Check the agent's implementation to find the available codes before writing assertions against them.
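If you validate intervention details manually instead of relying on FinalizeTurn, ParseUserInterventionRequestType maps the dataset's type string onto the platform enum. A minimal sketch, assuming ExpectedData is a Codeunit "Test Input Json" from GetExpectedData and ExpectedType is declared as Enum "Agent User Int Request Type":

ExpectedData := AITTestContext.GetExpectedData();
// Convert the declared string (Assistance, Review, or Message) to the enum value.
ExpectedType := LibraryAgent.ParseUserInterventionRequestType(
    ExpectedData.Element('intervention_request').Element('type').ValueAsText());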
Manual output
- WriteTaskToOutput(var AgentTask, var Output) — Serialize task details and log entries to JSON.
- WriteTaskToOutput(var AgentTask, var Output, FromDateTime) — Same, filtered to entries from a specific point in time. Useful in multi-turn scenarios where you want only the current turn's activity.
- WriteTurnToOutput(var AgentTask, TurnSuccessful, ErrorReason) — Sets the answer used by evaluators. FinalizeTurn calls this automatically; use it directly only when you bypass FinalizeTurn.
Backwards compatibility: question: and GetQuestion
Earlier agent test datasets used a question: element instead of query:, and the matching API on AIT Test Context was GetQuestion(). Both forms are still accepted by the framework, so existing tests don't have to be rewritten. New tests should use query: and GetQuery() because:
- query: supports the task-input/intervention split (title / from / message / attachments vs. intervention.suggestion / intervention.instruction).
- RunTurnAndWait dispatches based on the query: shape — it doesn't recognize question: as input.
- The validation and continuation features (expected_data.intervention_request, multi-turn continuation queries) are layered on the query: form.
When migrating, keep this mapping in mind:
| Legacy | Current |
|---|---|
| question: (scalar text) | query: with title (and optional from / message) |
| AITTestContext.GetQuestion() | AITTestContext.GetQuery() |
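For example, a single legacy turn and its query: equivalent (a sketch; the exact surrounding layout depends on your dataset version):

# Legacy
- question: "Release all open sales orders"

# Current
- query:
    title: "Release all open sales orders"
    message: "Release all open sales orders"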
Related information
Business Central Copilot Evaluation
SalesValidationAgent3P sample on BCTech
Test the Copilot capability in AL
Write AI tests
Datasets
Evaluation
Permissions in Evaluation
Best practices for testing the Copilot capability