Ollama

Ollama allows you to run open-source models locally and use them with Agent Framework. This is ideal for development, testing, and scenarios where you need to keep data on-premises.

The following example shows how to create an agent using Ollama:

using System;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

// Create an Ollama agent using Microsoft.Extensions.AI.Ollama
// Requires: dotnet add package Microsoft.Extensions.AI.Ollama --prerelease
var chatClient = new OllamaChatClient(
    new Uri("http://localhost:11434"),
    modelId: "llama3.2");

AIAgent agent = chatClient.AsAIAgent(
    instructions: "You are a helpful assistant running locally via Ollama.");

Console.WriteLine(await agent.RunAsync("What is the largest city in France?"));

Prerequisites

Ensure Ollama is installed and running locally with a model downloaded before running any examples:

ollama pull llama3.2

Note

Not all models support function calling. For tool usage, try llama3.2 or qwen3:4b.

pip install agent-framework-ollama --pre

pip install agent-framework

Configuration

Native Ollama
OpenAI Compatible

OLLAMA_MODEL="llama3.2"

The native client connects to http://localhost:11434 by default. You can override this by passing host to the client.

OLLAMA_ENDPOINT="http://localhost:11434/v1/"
OLLAMA_MODEL="llama3.2"

Create Ollama Agents

Native Ollama
OpenAI Compatible

OllamaChatClient provides native Ollama integration with full support for function tools and streaming.

import asyncio
from agent_framework.ollama import OllamaChatClient

async def main():
    agent = OllamaChatClient().as_agent(
        name="HelpfulAssistant",
        instructions="You are a helpful assistant running locally via Ollama.",
    )
    result = await agent.run("What is the largest city in France?")
    print(result)

asyncio.run(main())

You can also use OpenAIChatClient with a custom base URL pointing to your Ollama instance.

import asyncio
import os
from agent_framework.openai import OpenAIChatClient

async def main():
    agent = OpenAIChatClient(
        api_key="ollama",  # Placeholder, Ollama doesn't require an API key
        base_url=os.environ["OLLAMA_ENDPOINT"],
        model=os.environ["OLLAMA_MODEL"],
    ).as_agent(
        name="HelpfulAssistant",
        instructions="You are a helpful assistant running locally via Ollama.",
    )
    result = await agent.run("What is the largest city in France?")
    print(result)

asyncio.run(main())

Tools

The Python Ollama clients (OllamaChatClient and OpenAIChatClient pointed at an Ollama-compatible endpoint) support locally invoked tools. Hosted tool types do not exist because Ollama is a local model runtime.

Tool	Status	Notes
Function Tools	✅	Standard Python callables or `@ai_function`. Whether the selected model can actually call them depends on the model itself.
Tool Approval	✅	Provided by the framework's function-invoking chat client; works with any function-tool call.
Code Interpreter	❌	No hosted code interpreter.
File Search	❌	No hosted file search.
Web Search	❌	No hosted web search.
Hosted MCP Tools	❌	Ollama does not expose hosted MCP.
Local MCP Tools	✅	Runs in your process and works with any chat client.

Function Tools

Native Ollama
OpenAI Compatible

import asyncio
from datetime import datetime
from agent_framework.ollama import OllamaChatClient

def get_time(location: str) -> str:
    """Get the current time."""
    return f"The current time in {location} is {datetime.now().strftime('%I:%M %p')}."

async def main():
    agent = OllamaChatClient().as_agent(
        name="TimeAgent",
        instructions="You are a helpful time agent.",
        tools=get_time,
    )
    result = await agent.run("What time is it in Seattle?")
    print(result)

asyncio.run(main())

import asyncio
import os
from datetime import datetime
from agent_framework.openai import OpenAIChatClient

def get_time(location: str) -> str:
    """Get the current time."""
    return f"The current time in {location} is {datetime.now().strftime('%I:%M %p')}."

async def main():
    agent = OpenAIChatClient(
        api_key="ollama",
        base_url=os.environ["OLLAMA_ENDPOINT"],
        model=os.environ["OLLAMA_MODEL"],
    ).as_agent(
        name="TimeAgent",
        instructions="You are a helpful time agent.",
        tools=get_time,
    )
    result = await agent.run("What time is it in Seattle?")
    print(result)

asyncio.run(main())

Streaming

async def streaming_example():
    agent = OllamaChatClient().as_agent(
        instructions="You are a helpful assistant.",
    )
    print("Agent: ", end="", flush=True)
    async for chunk in agent.run("Tell me about Python.", stream=True):
        if chunk.text:
            print(chunk.text, end="", flush=True)
    print()

Next steps

GitHub Copilot

Feedback

Was this page helpful?

Last updated on 2026-05-26