Tutorial: Build a voice-to-text note taker

Build an application that converts spoken audio into organized notes — entirely on your device. The app first transcribes an audio file using a speech-to-text model, then uses a chat model to summarize and organize the transcription into clean notes.

In this tutorial, you learn how to:

Set up a project and install the Foundry Local SDK
Load a speech-to-text model and transcribe an audio file
Load a chat model and summarize the transcription
Combine transcription and summarization into a complete app
Clean up resources

Prerequisites

A Windows, macOS, or Linux computer with at least 8 GB of RAM.
A .wav audio file to transcribe (the tutorial uses a sample file).

Install packages

Samples repository

The complete sample code for this article is available in the foundry-samples GitHub repository. To clone the repository and navigate to the sample use:

git clone https://github.com/microsoft-foundry/foundry-samples.git
cd foundry-samples/samples/csharp/foundry-local/tutorial-voice-to-text

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime — it provides the same API surface area with a wider breadth of hardware acceleration.

Windows
Cross-Platform

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

dotnet add package Microsoft.AI.Foundry.Local
dotnet add package OpenAI

The C# samples in the GitHub repository are preconfigured projects. If you're building from scratch, you should read the Foundry Local SDK reference for more details on how to set up your C# project with Foundry Local.

Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Open Program.cs and replace its contents with the following code to initialize the SDK, load the speech model, and transcribe an audio file:

// Load the speech-to-text model
var speechModel = await catalog.GetModelAsync("whisper-tiny")
    ?? throw new Exception("Speech model not found");

await speechModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading speech model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await speechModel.LoadAsync();
Console.WriteLine("Speech model loaded.");

// Transcribe the audio file
var audioClient = await speechModel.GetAudioClientAsync();
var transcriptionText = new StringBuilder();

Console.WriteLine("\nTranscription:");
var audioResponse = audioClient
    .TranscribeAudioStreamingAsync("meeting-notes.wav", ct);
await foreach (var chunk in audioResponse)
{
    Console.Write(chunk.Text);
    transcriptionText.Append(chunk.Text);
}
Console.WriteLine();

// Unload the speech model to free memory
await speechModel.UnloadAsync();

The GetAudioClientAsync method returns a client for audio operations. The TranscribeAudioStreamingAsync method streams transcription chunks as they become available. You accumulate the text so you can pass it to the chat model in the next step.

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Summarize the transcription

Now use a chat model to organize the raw transcription into structured notes. Load the qwen2.5-0.5b model and send the transcription as context with a system prompt that instructs the model to produce clean, summarized notes.

Add the following code after the transcription step:

// Load the chat model for summarization
var chatModel = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Chat model not found");

await chatModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading chat model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await chatModel.LoadAsync();
Console.WriteLine("Chat model loaded.");

// Summarize the transcription into organized notes
var chatClient = await chatModel.GetChatClientAsync();
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a note-taking assistant. Summarize " +
                  "the following transcription into organized, " +
                  "concise notes with bullet points."
    },
    new ChatMessage
    {
        Role = "user",
        Content = transcriptionText.ToString()
    }
};

var chatResponse = await chatClient.CompleteChatAsync(messages, ct);
var summary = chatResponse.Choices[0].Message.Content;
Console.WriteLine($"\nSummary:\n{summary}");

// Clean up
await chatModel.UnloadAsync();
Console.WriteLine("\nDone. Models unloaded.");

The system prompt shapes the model's output format. By instructing it to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.

Combine into a complete app

Replace the contents of Program.cs with the following complete code that transcribes an audio file and summarizes the transcription:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;
using System.Text;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(
        Microsoft.Extensions.Logging.LogLevel.Information
    );
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

var catalog = await mgr.GetCatalogAsync();

// Load the speech-to-text model
var speechModel = await catalog.GetModelAsync("whisper-tiny")
    ?? throw new Exception("Speech model not found");

await speechModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading speech model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await speechModel.LoadAsync();
Console.WriteLine("Speech model loaded.");

// Transcribe the audio file
var audioClient = await speechModel.GetAudioClientAsync();
var transcriptionText = new StringBuilder();

Console.WriteLine("\nTranscription:");
var audioResponse = audioClient
    .TranscribeAudioStreamingAsync("meeting-notes.wav", ct);
await foreach (var chunk in audioResponse)
{
    Console.Write(chunk.Text);
    transcriptionText.Append(chunk.Text);
}
Console.WriteLine();

// Unload the speech model to free memory
await speechModel.UnloadAsync();

// Load the chat model for summarization
var chatModel = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Chat model not found");

await chatModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading chat model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await chatModel.LoadAsync();
Console.WriteLine("Chat model loaded.");

// Summarize the transcription into organized notes
var chatClient = await chatModel.GetChatClientAsync();
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a note-taking assistant. Summarize " +
                  "the following transcription into organized, " +
                  "concise notes with bullet points."
    },
    new ChatMessage
    {
        Role = "user",
        Content = transcriptionText.ToString()
    }
};

var chatResponse = await chatClient.CompleteChatAsync(messages, ct);
var summary = chatResponse.Choices[0].Message.Content;
Console.WriteLine($"\nSummary:\n{summary}");

// Clean up
await chatModel.UnloadAsync();
Console.WriteLine("\nDone. Models unloaded.");

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

dotnet run

You see output similar to:

Downloading speech model: 100.00%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content with streaming output, then passes the accumulated text to a chat model that extracts key points and organizes them into structured notes.

Install packages

Samples repository

The complete sample code for this article is available in the foundry-samples GitHub repository. To clone the repository and navigate to the sample use:

git clone https://github.com/microsoft-foundry/foundry-samples.git
cd foundry-samples/samples/javascript/foundry-local/tutorial-voice-to-text

Windows
Cross-Platform

npm install foundry-local-sdk-winml openai

npm install foundry-local-sdk openai

Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Create a file called app.js.

Add the following code to initialize the SDK, load the speech model, and transcribe an audio file:

// Load the speech-to-text model
const speechModel = await manager.catalog.getModel('whisper-tiny');
await speechModel.download((progress) => {
    process.stdout.write(
        `\rDownloading speech model: ${progress.toFixed(2)}%`
    );
});
console.log('\nSpeech model downloaded.');

await speechModel.load();
console.log('Speech model loaded.');

// Transcribe the audio file
const audioClient = speechModel.createAudioClient();
const transcription = await audioClient.transcribe(
    path.join(__dirname, 'meeting-notes.wav')
);
console.log(`\nTranscription:\n${transcription.text}`);

// Unload the speech model to free memory
await speechModel.unload();

The createAudioClient method returns a client for audio operations. The transcribe method accepts a file path and returns an object with a text property containing the transcribed content.

Note

Replace './meeting-notes.wav' with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Summarize the transcription

Add the following code after the transcription step:

// Load the chat model for summarization
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
await chatModel.download((progress) => {
    process.stdout.write(
        `\rDownloading chat model: ${progress.toFixed(2)}%`
    );
});
console.log('\nChat model downloaded.');

await chatModel.load();
console.log('Chat model loaded.');

// Summarize the transcription into organized notes
const chatClient = chatModel.createChatClient();
const messages = [
    {
        role: 'system',
        content: 'You are a note-taking assistant. Summarize ' +
                 'the following transcription into organized, ' +
                 'concise notes with bullet points.'
    },
    {
        role: 'user',
        content: transcription.text
    }
];

const response = await chatClient.completeChat(messages);
const summary = response.choices[0]?.message?.content;
console.log(`\nSummary:\n${summary}`);

// Clean up
await chatModel.unload();
console.log('\nDone. Models unloaded.');

The system prompt shapes the model's output format. By instructing it to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.

Combine into a complete app

Create a file named app.js and add the following complete code that transcribes an audio file and summarizes the transcription:

import { FoundryLocalManager } from 'foundry-local-sdk';
import { fileURLToPath } from 'url';
import path from 'path';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Load the speech-to-text model
const speechModel = await manager.catalog.getModel('whisper-tiny');
await speechModel.download((progress) => {
    process.stdout.write(
        `\rDownloading speech model: ${progress.toFixed(2)}%`
    );
});
console.log('\nSpeech model downloaded.');

await speechModel.load();
console.log('Speech model loaded.');

// Transcribe the audio file
const audioClient = speechModel.createAudioClient();
const transcription = await audioClient.transcribe(
    path.join(__dirname, 'meeting-notes.wav')
);
console.log(`\nTranscription:\n${transcription.text}`);

// Unload the speech model to free memory
await speechModel.unload();

// Load the chat model for summarization
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
await chatModel.download((progress) => {
    process.stdout.write(
        `\rDownloading chat model: ${progress.toFixed(2)}%`
    );
});
console.log('\nChat model downloaded.');

await chatModel.load();
console.log('Chat model loaded.');

// Summarize the transcription into organized notes
const chatClient = chatModel.createChatClient();
const messages = [
    {
        role: 'system',
        content: 'You are a note-taking assistant. Summarize ' +
                 'the following transcription into organized, ' +
                 'concise notes with bullet points.'
    },
    {
        role: 'user',
        content: transcription.text
    }
];

const response = await chatClient.completeChat(messages);
const summary = response.choices[0]?.message?.content;
console.log(`\nSummary:\n${summary}`);

// Clean up
await chatModel.unload();
console.log('\nDone. Models unloaded.');

Note

Replace './meeting-notes.wav' with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

node app.js

You see output similar to:

Downloading speech model: 100.00%
Speech model downloaded.
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model downloaded.
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes that text to a chat model that extracts key points and organizes them into structured notes.

Install packages

Samples repository

The complete sample code for this article is available in the foundry-samples GitHub repository. To clone the repository and navigate to the sample use:

git clone https://github.com/microsoft-foundry/foundry-samples.git
cd foundry-samples/samples/python/foundry-local/tutorial-voice-to-text

Windows
Cross-Platform

pip install foundry-local-sdk-winml openai

pip install foundry-local-sdk openai

Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Create a file called app.py.

Add the following code to initialize the SDK, load the speech model, and transcribe an audio file:

# Load the speech-to-text model
speech_model = manager.catalog.get_model("whisper-tiny")
speech_model.download(
    lambda progress: print(
        f"\rDownloading speech model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print()
speech_model.load()
print("Speech model loaded.")

# Transcribe the audio file
audio_client = speech_model.get_audio_client()
transcription = audio_client.transcribe("meeting-notes.wav")
print(f"\nTranscription:\n{transcription.text}")

# Unload the speech model to free memory
speech_model.unload()

The get_audio_client method returns a client for audio operations. The transcribe method accepts a file path and returns an object with a text property containing the transcribed content.

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Summarize the transcription

Add the following code after the transcription step:

# Load the chat model for summarization
chat_model = manager.catalog.get_model("qwen2.5-0.5b")
chat_model.download(
    lambda progress: print(
        f"\rDownloading chat model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print()
chat_model.load()
print("Chat model loaded.")

# Summarize the transcription into organized notes
client = chat_model.get_chat_client()
messages = [
    {
        "role": "system",
        "content": "You are a note-taking assistant. "
        "Summarize the following transcription "
        "into organized, concise notes with "
        "bullet points.",
    },
    {"role": "user", "content": transcription.text},
]

response = client.complete_chat(messages)
summary = response.choices[0].message.content
print(f"\nSummary:\n{summary}")

# Clean up
chat_model.unload()
print("\nDone. Models unloaded.")

The system prompt shapes the model's output format. By instructing it to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.

Combine into a complete app

Create a file named app.py and add the following complete code that transcribes an audio file and summarizes the transcription:

from foundry_local_sdk import Configuration, FoundryLocalManager



def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""

    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Load the speech-to-text model
    speech_model = manager.catalog.get_model("whisper-tiny")
    speech_model.download(
        lambda progress: print(
            f"\rDownloading speech model: {progress:.2f}%",
            end="",
            flush=True,
        )
    )
    print()
    speech_model.load()
    print("Speech model loaded.")

    # Transcribe the audio file
    audio_client = speech_model.get_audio_client()
    transcription = audio_client.transcribe("meeting-notes.wav")
    print(f"\nTranscription:\n{transcription.text}")

    # Unload the speech model to free memory
    speech_model.unload()

    # Load the chat model for summarization
    chat_model = manager.catalog.get_model("qwen2.5-0.5b")
    chat_model.download(
        lambda progress: print(
            f"\rDownloading chat model: {progress:.2f}%",
            end="",
            flush=True,
        )
    )
    print()
    chat_model.load()
    print("Chat model loaded.")

    # Summarize the transcription into organized notes
    client = chat_model.get_chat_client()
    messages = [
        {
            "role": "system",
            "content": "You are a note-taking assistant. "
            "Summarize the following transcription "
            "into organized, concise notes with "
            "bullet points.",
        },
        {"role": "user", "content": transcription.text},
    ]

    response = client.complete_chat(messages)
    summary = response.choices[0].message.content
    print(f"\nSummary:\n{summary}")

    # Clean up
    chat_model.unload()
    print("\nDone. Models unloaded.")


if __name__ == "__main__":
    main()

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

python app.py

You see output similar to:

Downloading speech model: 100.00%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes that text to a chat model that extracts key points and organizes them into structured notes.

Install packages

Samples repository

The complete sample code for this article is available in the foundry-samples GitHub repository. To clone the repository and navigate to the sample use:

git clone https://github.com/microsoft-foundry/foundry-samples.git
cd foundry-samples/samples/rust/foundry-local/tutorial-voice-to-text

Windows
Cross-Platform

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow

cargo add foundry-local-sdk
cargo add tokio --features full
cargo add tokio-stream anyhow

Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Open src/main.rs and replace its contents with the following code to initialize the SDK, load the speech model, and transcribe an audio file:

// Load the speech-to-text model
let speech_model = manager
    .catalog()
    .get_model("whisper-tiny")
    .await?;

if !speech_model.is_cached().await? {
    println!("Downloading speech model...");
    speech_model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

speech_model.load().await?;
println!("Speech model loaded.");

// Transcribe the audio file
let audio_client = speech_model.create_audio_client();
let transcription = audio_client
    .transcribe("meeting-notes.wav")
    .await?;
println!("\nTranscription:\n{}", transcription.text);

// Unload the speech model to free memory
speech_model.unload().await?;

The create_audio_client method returns a client for audio operations. The transcribe method accepts a file path and returns an object with a text field containing the transcribed content.

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Summarize the transcription

Add the following code after the transcription step, inside the main function:

// Load the chat model for summarization
let chat_model = manager
    .catalog()
    .get_model("qwen2.5-0.5b")
    .await?;

if !chat_model.is_cached().await? {
    println!("Downloading chat model...");
    chat_model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

chat_model.load().await?;
println!("Chat model loaded.");

// Summarize the transcription into organized notes
let client = chat_model
    .create_chat_client()
    .temperature(0.7)
    .max_tokens(512);

let messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a note-taking assistant. Summarize \
         the following transcription into organized, \
         concise notes with bullet points.",
    )
    .into(),
    ChatCompletionRequestUserMessage::from(
        transcription.text.as_str(),
    )
    .into(),
];

let response = client
    .complete_chat(&messages, None)
    .await?;
let summary = response.choices[0]
    .message
    .content
    .as_deref()
    .unwrap_or("");
println!("\nSummary:\n{}", summary);

// Clean up
chat_model.unload().await?;
println!("\nDone. Models unloaded.");

The system prompt shapes the model's output format. By instructing it to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.

Combine into a complete app

Replace the contents of src/main.rs with the following complete code that transcribes an audio file and summarizes the transcription:

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage,
    ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, Write};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(
        FoundryLocalConfig::new("foundry_local_samples"),
    )?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Load the speech-to-text model
    let speech_model = manager
        .catalog()
        .get_model("whisper-tiny")
        .await?;

    if !speech_model.is_cached().await? {
        println!("Downloading speech model...");
        speech_model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    speech_model.load().await?;
    println!("Speech model loaded.");

    // Transcribe the audio file
    let audio_client = speech_model.create_audio_client();
    let transcription = audio_client
        .transcribe("meeting-notes.wav")
        .await?;
    println!("\nTranscription:\n{}", transcription.text);

    // Unload the speech model to free memory
    speech_model.unload().await?;

    // Load the chat model for summarization
    let chat_model = manager
        .catalog()
        .get_model("qwen2.5-0.5b")
        .await?;

    if !chat_model.is_cached().await? {
        println!("Downloading chat model...");
        chat_model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    chat_model.load().await?;
    println!("Chat model loaded.");

    // Summarize the transcription into organized notes
    let client = chat_model
        .create_chat_client()
        .temperature(0.7)
        .max_tokens(512);

    let messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a note-taking assistant. Summarize \
             the following transcription into organized, \
             concise notes with bullet points.",
        )
        .into(),
        ChatCompletionRequestUserMessage::from(
            transcription.text.as_str(),
        )
        .into(),
    ];

    let response = client
        .complete_chat(&messages, None)
        .await?;
    let summary = response.choices[0]
        .message
        .content
        .as_deref()
        .unwrap_or("");
    println!("\nSummary:\n{}", summary);

    // Clean up
    chat_model.unload().await?;
    println!("\nDone. Models unloaded.");

    Ok(())
}

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

cargo run

You see output similar to:

Downloading speech model: 100.00%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes that text to a chat model that extracts key points and organizes them into structured notes.

Clean up resources

The model weights remain in your local cache after you unload a model. This means the next time you run the application, the download step is skipped and the model loads faster. No extra cleanup is needed unless you want to reclaim disk space.

Feedback

Was this page helpful?

Last updated on 2026-06-15

Tutorial: Build a voice-to-text note taker

Prerequisites

Install packages

Samples repository

Transcribe an audio file

Summarize the transcription

Combine into a complete app

Install packages

Samples repository

Transcribe an audio file

Summarize the transcription

Combine into a complete app

Install packages

Samples repository

Transcribe an audio file

Summarize the transcription

Combine into a complete app

Install packages

Samples repository

Transcribe an audio file

Summarize the transcription

Combine into a complete app

Clean up resources

Related content

Feedback

Additional resources