Evaluate AL coding agents with BC-Bench

Important

Some of the functionality described in this release plan has not been released. Delivery timelines may change and projected functionality may not be released (see Microsoft policy). Learn more: What's new and planned

Enabled for: Admins, makers, marketers, or analysts, automatically
Public preview: -
General availability: Apr 2026

Business value

BC‑Bench provides a repeatable benchmark for AL bug-fix and test-creation tasks in Business Central, producing measurable results that increase trust in GitHub Copilot agents and accelerate adoption across partners and the community.

Feature details

BC‑Bench is a benchmarking framework for evaluating agent performance on real‑world Dynamics 365 Business Central AL coding tasks. Inspired by SWE‑Bench, it provides measurable, repeatable results instead of subjective impressions, helping developers understand what improvements actually work.

The benchmark focuses on realistic Business Central development scenarios, such as bug fixes and test creation, using curated AL code problems derived from real pull requests. It establishes a consistent evaluation baseline across agent context, instructions, and tooling, enabling objective comparison over time.
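
Microsoft hasn't published BC‑Bench's task format, but a SWE‑Bench-style task typically pairs a code change with tests that validate it. As a minimal sketch, a bug-fix task could pair a small AL object containing a defect with a test the agent's patch must make pass. Every object name, ID, and the discount scenario below is hypothetical, invented for illustration, and not actual BC‑Bench content.

```al
codeunit 50140 "BCB Discount Calc."
{
    // Hypothetical object under test. An earlier (buggy) version applied the
    // line discount twice; the fix below applies it exactly once.
    procedure LineAmount(Quantity: Decimal; UnitPrice: Decimal; DiscountPct: Decimal): Decimal
    begin
        exit(Quantity * UnitPrice * (1 - DiscountPct / 100));
    end;
}

codeunit 50141 "BCB Discount Calc. Tests"
{
    // Hypothetical test an agent might be asked to create: it fails on the
    // buggy version and passes once the fix is in place.
    Subtype = Test;

    [Test]
    procedure DiscountIsAppliedExactlyOnce()
    var
        DiscountCalc: Codeunit "BCB Discount Calc.";
        Assert: Codeunit "Library Assert";
    begin
        // 2 units at 100.00 each with a 10% discount should total 180.00.
        Assert.AreEqual(180.0, DiscountCalc.LineAmount(2, 100, 10), 'Discount must be applied exactly once');
    end;
}
```

In a SWE‑Bench-style setup, the agent would receive the buggy version plus the failing test (or be asked to write the test itself), and the benchmark would score whether the resulting code compiles and the test passes.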

By producing transparent metrics, BC‑Bench increases trust in Copilot capabilities, supports data‑driven investment decisions, and helps partners and the community clearly understand an agent's strengths and limitations for Business Central development.

Geographic areas

Visit the Explore Feature Geography report to see the Microsoft Azure geographic areas where this feature is planned or available.

Language availability

Visit the Explore Feature Language report for information on this feature's availability.

Tell us what you think

Help us improve Dynamics 365 Business Central by discussing ideas, providing suggestions, and giving feedback. Use the forum at https://aka.ms/bcideas.