Voice Live API reference (`2025-10-01`)

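The Voice Live API provides real-time, bidirectional communication for voice-enabled applications over WebSocket connections. The API supports advanced capabilities including speech recognition, text-to-speech synthesis, avatar streaming, animation data, and comprehensive audio processing.

The API uses JSON-formatted events sent over the WebSocket connection to manage conversations, audio streams, avatar interactions, and real-time responses. Events are categorized as client events (sent from client to server) and server events (sent from server to client).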

Key Features

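Real-time audio processing: support for PCM16 at various sampling rates and for G.711 codecs
Advanced voice options: OpenAI voices, Azure custom voices, Azure standard voices, and Azure personal voices
Avatar integration: WebRTC-based avatar streaming with video, animation, and blendshapes
Intelligent turn detection: multiple VAD options, including Azure semantic VAD and server-side detection
Audio enhancement: built-in noise reduction and echo cancellation
Function calling: tool integration for enhanced conversational capabilities
Flexible session management: configurable output modalities, instructions, and response parameters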

Client Events

The Voice Live API supports the following client events, which are sent from the client to the server:

Event	Description
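session.update	Update the session configuration, including voice, output modalities, and turn detection
session.avatar.connect	Establish the avatar connection by providing the client SDP for WebRTC negotiation
input_audio_buffer.append	Append audio bytes to the input audio buffer
input_audio_buffer.commit	Commit the input audio buffer for processing
input_audio_buffer.clear	Clear the input audio buffer
conversation.item.create	Add a new item to the conversation context
conversation.item.retrieve	Retrieve a specific item from the conversation
conversation.item.truncate	Truncate an assistant audio message
conversation.item.delete	Remove an item from the conversation
response.create	Instruct the server to create a response via model inference
response.cancel	Cancel an in-progress response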

session.update

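Updates the session configuration. This event can be sent at any time to modify the voice, output modalities, turn detection, tools, and other session parameters. Note that once a session is initialized with a particular model, it can't be changed to another model.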

Event Structure

{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "instructions": "You are a helpful assistant. Be concise and friendly.",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_sampling_rate": 24000,
    "turn_detection": {
      "type": "azure_semantic_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "temperature": 0.8,
    "max_response_output_tokens": "inf"
  }
}

Properties

Field	Type	Description
type	string	Must be `"session.update"`
session	RealtimeRequestSession	Session configuration object containing the fields to update
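
As a minimal sketch, a client might assemble this event in Python before sending it over the WebSocket. The helper name `build_session_update` is hypothetical and not part of any SDK; the field names are taken from the example above, and sending the resulting string is left to whichever WebSocket client you use.

```python
import json


def build_session_update(**session_fields):
    """Return a session.update event as a JSON string.

    Hypothetical helper: only the fields passed in are placed in the
    `session` object.
    """
    event = {"type": "session.update", "session": session_fields}
    return json.dumps(event)


payload = build_session_update(
    voice={"type": "openai", "name": "alloy"},
    turn_detection={"type": "azure_semantic_vad", "silence_duration_ms": 500},
)
print(payload)
```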

Example with Azure custom voice

{
  "type": "session.update",
  "session": {
    "voice": {
      "type": "azure-custom",
      "name": "my-custom-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012",
      "temperature": 0.7,
      "style": "cheerful"
    },
    "input_audio_noise_reduction": {
      "type": "azure_deep_noise_suppression"
    },
    "avatar": {
      "character": "lisa",
      "customized": false,
      "video": {
        "resolution": {
          "width": 1920,
          "height": 1080
        },
        "bitrate": 2000000
      }
    }
  }
}

session.avatar.connect

Establishes the avatar connection by providing the client's SDP (Session Description Protocol) offer for WebRTC media negotiation. This event is required when using avatar features.

Event Structure

{
  "type": "session.avatar.connect",
  "client_sdp": "<client_sdp>"
}

Properties

Field	Type	Description
type	string	Must be `"session.avatar.connect"`
client_sdp	string	The client's SDP offer for establishing the WebRTC connection, Base64-encoded

input_audio_buffer.append

Appends audio bytes to the input audio buffer.

Event Structure

{
  "type": "input_audio_buffer.append",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="
}

Properties

Field	Type	Description
type	string	Must be `"input_audio_buffer.append"`
audio	string	Base64-encoded audio data

input_audio_buffer.commit

Commits the input audio buffer for processing.

Event Structure

{
  "type": "input_audio_buffer.commit"
}

Properties

Field	Type	Description
type	string	Must be `"input_audio_buffer.commit"`

input_audio_buffer.clear

Clears the input audio buffer.

Event Structure

{
  "type": "input_audio_buffer.clear"
}

Properties

Field	Type	Description
type	string	Must be `"input_audio_buffer.clear"`

conversation.item.create

Adds a new item to the conversation context. The item can be a message, a function call, or a function call output. Items can be inserted at a specific position in the conversation history.

Event Structure

{
  "type": "conversation.item.create",
  "previous_item_id": "item_ABC123",
  "item": {
    "id": "item_DEF456",
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.create"`
previous_item_id	string	Optional. The ID of the item after which the new item is inserted. If not provided, the item is appended to the end
item	RealtimeConversationRequestItem	The item to add to the conversation

Example with audio content

{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA=",
        "transcript": "Hello there"
      }
    ]
  }
}

Example with function call output

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_123",
    "output": "{\"location\": \"San Francisco\", \"temperature\": \"70\"}"
  }
}
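
The `output` field in the example above is itself a JSON string, so a tool result has to be serialized before it is placed in the event. The following is a small sketch of that step; `function_call_output_event` is a hypothetical helper name, and the `call_id` value is the placeholder from the example.

```python
import json


def function_call_output_event(call_id, result):
    """Wrap a tool result in a conversation.item.create event.

    `result` is any JSON-serializable value. It is serialized to a string
    because `output` in the example above is a JSON-encoded string.
    """
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }


event = function_call_output_event(
    "call_123", {"location": "San Francisco", "temperature": "70"}
)
print(json.dumps(event, indent=2))
```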

Example with MCP approval response

{
  "type": "conversation.item.create",
  "item": {
    "type": "mcp_approval_response",
    "approval_request_id": "mcp_approval_req_456",
    "approve": true
  }
}

conversation.item.retrieve

Retrieves a specific item from the conversation history. This is useful for inspecting the processed audio after noise cancellation and VAD.

Event Structure

{
  "type": "conversation.item.retrieve",
  "item_id": "item_ABC123"
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.retrieve"`
item_id	string	The ID of the item to retrieve

conversation.item.truncate

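Truncates the audio content of an assistant message. This is useful for stopping playback at a specific point and synchronizing the server's understanding with the client's state.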

Event Structure

{
  "type": "conversation.item.truncate",
  "item_id": "item_ABC123",
  "content_index": 0,
  "audio_end_ms": 5000
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.truncate"`
item_id	string	The ID of the assistant message item to truncate
content_index	integer	The index of the content part to truncate
audio_end_ms	integer	The point, in milliseconds, at which the audio is truncated
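
A client usually knows how many bytes of audio it has played rather than a duration, so `audio_end_ms` has to be computed. The sketch below assumes mono PCM16 output at 24,000 Hz; both values are assumptions for illustration, and the helper names are hypothetical.

```python
def played_ms(bytes_played, sample_rate=24000, bytes_per_sample=2):
    """Convert a count of played mono PCM bytes to whole milliseconds."""
    samples = bytes_played // bytes_per_sample
    return samples * 1000 // sample_rate


def truncate_event(item_id, bytes_played, content_index=0):
    """Build a conversation.item.truncate event for the audio played so far."""
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": played_ms(bytes_played),
    }


# 240,000 bytes of 24 kHz PCM16 is 120,000 samples, which is 5,000 ms.
print(truncate_event("item_ABC123", 240000))
```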

conversation.item.delete

Removes an item from the conversation history.

Event Structure

{
  "type": "conversation.item.delete",
  "item_id": "item_ABC123"
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.delete"`
item_id	string	The ID of the item to delete

response.create

Instructs the server to create a response via model inference. This event can specify response-specific configuration that overrides the session defaults.

Event Structure

{
  "type": "response.create",
  "response": {
    "modalities": ["text", "audio"],
    "instructions": "Be extra helpful and detailed.",
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "output_audio_format": "pcm16",
    "temperature": 0.7,
    "max_response_output_tokens": 1000
  }
}

Properties

Field	Type	Description
type	string	Must be `"response.create"`
response	RealtimeResponseOptions	Optional response configuration that overrides the session defaults

Example with tool choice

{
  "type": "response.create",
  "response": {
    "modalities": ["text"],
    "tools": [
      {
        "type": "function",
        "name": "get_current_time",
        "description": "Get the current time",
        "parameters": {
          "type": "object",
          "properties": {}
        }
      }
    ],
    "tool_choice": "get_current_time",
    "temperature": 0.3
  }
}

Example with animation

{
  "type": "response.create",
  "response": {
    "modalities": ["audio", "animation"],
    "animation": {
      "model_name": "default",
      "outputs": ["blendshapes", "viseme_id"]
    },
    "voice": {
      "type": "azure-custom",
      "name": "my-expressive-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012",
      "style": "excited"
    }
  }
}

response.cancel

Cancels an in-progress response. This immediately stops response generation and the associated audio output.

Event Structure

{
  "type": "response.cancel"
}

Properties

Field	Type	Description
type	string	Must be `"response.cancel"`

input_audio_buffer.append

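The client input_audio_buffer.append event is used to append audio bytes to the input audio buffer. The audio buffer is temporary storage that can be committed later.

In server VAD (voice activity detection) mode, the audio buffer is used to detect speech, and the server decides when to commit. When server VAD is disabled, the client can choose how much audio to place in each event, up to a maximum of 15 MiB. For example, streaming smaller chunks from the client can make the VAD more responsive.

Unlike most other client events, the server doesn't send a confirmation response to the client input_audio_buffer.append event.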

Event structure

{
  "type": "input_audio_buffer.append",
  "audio": "<audio>"
}

Properties

Field	Type	Description
type	string	The event type; must be `input_audio_buffer.append`.
audio	string	Base64-encoded audio bytes. This value must be in the format specified by the `input_audio_format` field in the session configuration.
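
The chunking described above can be sketched as follows. This is an illustration only: the generator name `audio_append_events` is hypothetical, the default of 4,800 bytes per chunk (100 ms of 24 kHz mono PCM16) is an arbitrary choice, and the code assumes the 15 MiB limit applies to the raw bytes before Base64 encoding, which the text doesn't specify.

```python
import base64
import json

MAX_EVENT_AUDIO_BYTES = 15 * 1024 * 1024  # 15 MiB per event, as stated above


def audio_append_events(pcm_bytes, chunk_bytes=4800):
    """Yield input_audio_buffer.append events for consecutive audio chunks.

    chunk_bytes=4800 is an illustrative default: 100 ms of 24 kHz mono PCM16
    (24000 samples/s * 2 bytes/sample * 0.1 s).
    """
    if not 0 < chunk_bytes <= MAX_EVENT_AUDIO_BYTES:
        raise ValueError("chunk_bytes must be between 1 byte and 15 MiB")
    for start in range(0, len(pcm_bytes), chunk_bytes):
        chunk = pcm_bytes[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


for message in audio_append_events(b"\x00" * 10000):
    print(len(message))
```

Because the server sends no acknowledgement for this event, the sketch doesn't wait for a reply between chunks.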

input_audio_buffer.clear

The client input_audio_buffer.clear event is used to clear the audio bytes in the buffer.

The server responds with an input_audio_buffer.cleared event.

Event structure

{
  "type": "input_audio_buffer.clear"
}

Properties

Field	Type	Description
type	string	The event type; must be `input_audio_buffer.clear`.

input_audio_buffer.commit

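The client input_audio_buffer.commit event is used to commit the user input audio buffer, which creates a new user message item in the conversation. The audio is transcribed if input_audio_transcription is configured for the session.

In server VAD mode, the client doesn't need to send this event; the server commits the audio buffer automatically. Without server VAD, the client must commit the audio buffer to create a user message item. This client event produces an error if the input audio buffer is empty.

Committing the input audio buffer doesn't create a response from the model.

The server responds with an input_audio_buffer.committed event.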

Event structure

{
  "type": "input_audio_buffer.commit"
}

Properties

Field	Type	Description
type	string	The event type; must be `input_audio_buffer.commit`.
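
Since committing the buffer doesn't by itself produce a model response, a client without server VAD typically sends three events for one user turn. The sketch below shows that sequence under the assumption that the whole utterance fits in a single append event; `manual_turn_events` is a hypothetical helper name.

```python
import base64
import json


def manual_turn_events(pcm_bytes):
    """Return the client events for one user turn when server VAD is off."""
    if not pcm_bytes:
        # Committing an empty buffer produces an error, so refuse early.
        raise ValueError("no audio to commit")
    audio = base64.b64encode(pcm_bytes).decode("ascii")
    return [
        json.dumps({"type": "input_audio_buffer.append", "audio": audio}),
        json.dumps({"type": "input_audio_buffer.commit"}),
        # Committing alone doesn't trigger the model; request a response.
        json.dumps({"type": "response.create"}),
    ]


for message in manual_turn_events(b"\x01\x02\x03\x04"):
    print(message)
```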

Server Events

The Voice Live API sends the following server events to communicate status, responses, and data to the client:

Event	Description
error	Indicates that an error occurred during processing
warning	Indicates that a warning occurred that doesn't interrupt the conversation flow
session.created	Sent when a new session is successfully established
session.updated	Sent when the session configuration is updated
session.avatar.connecting	Indicates that the avatar WebRTC connection is being established
conversation.item.created	Sent when a new item is added to the conversation
conversation.item.retrieved	Response to a conversation.item.retrieve request
conversation.item.truncated	Confirms item truncation
conversation.item.deleted	Confirms item deletion
conversation.item.input_audio_transcription.completed	Input audio transcription is complete
conversation.item.input_audio_transcription.delta	Streaming input audio transcription
conversation.item.input_audio_transcription.failed	Input audio transcription failed
input_audio_buffer.committed	The input audio buffer was committed for processing
input_audio_buffer.cleared	The input audio buffer was cleared
input_audio_buffer.speech_started	Speech was detected in the input audio buffer (VAD)
input_audio_buffer.speech_stopped	Speech ended in the input audio buffer (VAD)
response.created	New response generation started
response.done	Response generation is complete
response.output_item.added	A new output item was added to the response
response.output_item.done	The output item is complete
response.content_part.added	A new content part was added to the output item
response.content_part.done	The content part is complete
response.text.delta	Streaming text content from the model
response.text.done	Text content is complete
response.audio_transcript.delta	Streaming audio transcript
response.audio_transcript.done	The audio transcript is complete
response.audio.delta	Streaming audio content from the model
response.audio.done	Audio content is complete
response.animation_blendshapes.delta	Streaming animation blendshapes data
response.animation_blendshapes.done	Animation blendshapes data is complete
response.audio_timestamp.delta	Streaming audio timestamp information
response.audio_timestamp.done	Audio timestamp information is complete
response.animation_viseme.delta	Streaming animation viseme data
response.animation_viseme.done	Animation viseme data is complete
response.function_call_arguments.delta	Streaming function call arguments
response.function_call_arguments.done	Function call arguments are complete
mcp_list_tools.in_progress	MCP tool listing is in progress
mcp_list_tools.completed	MCP tool listing is complete
mcp_list_tools.failed	MCP tool listing failed
response.mcp_call_arguments.delta	Streaming MCP call arguments
response.mcp_call_arguments.done	MCP call arguments are complete
response.mcp_call.in_progress	MCP call is in progress
response.mcp_call.completed	MCP call is complete
response.mcp_call.failed	MCP call failed
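
With this many event types, a client usually routes each incoming message on its `type` field. The following is one possible sketch of such routing; the `EventDispatcher` class is hypothetical and not part of any SDK, and it silently ignores event types that have no registered handler.

```python
import json


class EventDispatcher:
    """Route decoded server events to handlers registered by event type."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type):
        """Decorator that registers a handler for one event type."""
        def register(handler):
            self._handlers[event_type] = handler
            return handler
        return register

    def dispatch(self, raw_message):
        """Decode one JSON message and call the matching handler, if any."""
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            return None  # unhandled event types are ignored in this sketch
        return handler(event)


dispatcher = EventDispatcher()


@dispatcher.on("session.created")
def handle_session_created(event):
    return event["session"]["id"]
```

Each raw WebSocket text message would be passed to `dispatcher.dispatch`, which returns whatever the matching handler returns.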

session.created

Sent when a new session is successfully established. This is the first event received after connecting to the API.

Event Structure

{
  "type": "session.created",
  "session": {
    "id": "sess_ABC123DEF456",
    "object": "realtime.session",
    "model": "gpt-realtime",
    "modalities": ["text", "audio"],
    "instructions": "You are a helpful assistant.",
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_sampling_rate": 24000,
    "turn_detection": {
      "type": "azure_semantic_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "temperature": 0.8,
    "max_response_output_tokens": "inf"
  }
}

Properties

Field	Type	Description
type	string	Must be `"session.created"`
session	RealtimeResponseSession	The created session object

session.updated

Sent when the session configuration is successfully updated in response to a session.update client event.

Event Structure

{
  "type": "session.updated",
  "session": {
    "id": "sess_ABC123DEF456",
    "voice": {
      "type": "azure-custom",
      "name": "my-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012"
    },
    "temperature": 0.7,
    "avatar": {
      "character": "lisa",
      "customized": false
    }
  }
}

Properties

Field	Type	Description
type	string	Must be `"session.updated"`
session	RealtimeResponseSession	The updated session object

session.avatar.connecting

Indicates that the avatar WebRTC connection is being established. This event is sent in response to a session.avatar.connect client event.

Event Structure

{
  "type": "session.avatar.connecting",
  "server_sdp": "<server_sdp>"
}

Properties

Field	Type	Description
type	string	Must be `"session.avatar.connecting"`

conversation.item.created

Sent when a new item is added to the conversation, either through a client conversation.item.create event or automatically during response generation.

Event Structure

{
  "type": "conversation.item.created",
  "previous_item_id": "item_ABC123",
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.created"`
previous_item_id	string	The ID of the item after which this item was inserted
item	RealtimeConversationResponseItem	The created conversation item

Example with audio item

{
  "type": "conversation.item.created",
  "item": {
    "id": "item_GHI789",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "audio": null,
        "transcript": "What's the weather like today?"
      }
    ]
  }
}

conversation.item.retrieved

Sent in response to a conversation.item.retrieve client event, providing the requested conversation item.

Event Structure

{
  "type": "conversation.item.retrieved",
  "item": {
    "id": "item_ABC123",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "audio",
        "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA=",
        "transcript": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      }
    ]
  }
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.retrieved"`
item	RealtimeConversationResponseItem	The retrieved conversation item

conversation.item.truncated

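The server conversation.item.truncated event is returned when the client truncates an earlier assistant audio message item with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback.

This event truncates the audio and removes the server-side text transcript so that the context contains no text the user hasn't heard.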

Event structure

{
  "type": "conversation.item.truncated",
  "item_id": "<item_id>",
  "content_index": 0,
  "audio_end_ms": 0
}

Properties

Field	Type	Description
type	string	The event type; must be `conversation.item.truncated`.
item_id	string	The ID of the assistant message item that was truncated.
content_index	integer	The index of the content part that was truncated.
audio_end_ms	integer	The duration up to which the audio was truncated, in milliseconds.

conversation.item.deleted

Sent in response to a conversation.item.delete client event, confirming that the specified item was removed from the conversation.

Event Structure

{
  "type": "conversation.item.deleted",
  "item_id": "item_ABC123"
}

Properties

Field	Type	Description
type	string	Must be `"conversation.item.deleted"`
item_id	string	The ID of the deleted item

response.created

Sent when a new response generation begins. This is the first event in the response sequence.

Event Structure

{
  "type": "response.created",
  "response": {
    "id": "resp_ABC123",
    "object": "realtime.response",
    "status": "in_progress",
    "status_details": null,
    "output": [],
    "usage": {
      "total_tokens": 0,
      "input_tokens": 0,
      "output_tokens": 0
    }
  }
}

Properties

Field	Type	Description
type	string	Must be `"response.created"`
response	RealtimeResponse	The created response object

response.done

Sent when response generation is complete. This event contains the final response with all output items and usage statistics.

Event Structure

{
  "type": "response.done",
  "response": {
    "id": "resp_ABC123",
    "object": "realtime.response",
    "status": "completed",
    "status_details": null,
    "output": [
      {
        "id": "item_DEF456",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 87,
      "input_tokens": 52,
      "output_tokens": 35,
      "input_token_details": {
        "cached_tokens": 0,
        "text_tokens": 45,
        "audio_tokens": 7
      },
      "output_token_details": {
        "text_tokens": 15,
        "audio_tokens": 20
      }
    }
  }
}

Properties

Field	Type	Description
type	string	Must be `"response.done"`
response	RealtimeResponse	The completed response object

response.output_item.added

Sent when a new output item is added to the response during generation.

Event Structure

{
  "type": "response.output_item.added",
  "response_id": "resp_ABC123",
  "output_index": 0,
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

Properties

Field	Type	Description
type	string	Must be `"response.output_item.added"`
response_id	string	The ID of the response this item belongs to
output_index	integer	The index of the item in the response's output array
item	RealtimeConversationResponseItem	The output item that was added

response.output_item.done

Sent when an output item is complete.

Event Structure

{
  "type": "response.output_item.done",
  "response_id": "resp_ABC123",
  "output_index": 0,
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "Hello! I'm doing well, thank you for asking."
      }
    ]
  }
}

Properties

Field	Type	Description
type	string	Must be `"response.output_item.done"`
response_id	string	The ID of the response this item belongs to
output_index	integer	The index of the item in the response's output array
item	RealtimeConversationResponseItem	The completed output item

response.content_part.added

The server response.content_part.added event is returned when a new content part is added to an assistant message item during response generation.

Event Structure

{
  "type": "response.content_part.added",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "text",
    "text": ""
  }
}

Properties

Field	Type	Description
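type	string	Must be `"response.content_part.added"`
response_id	string	The ID of the response
item_id	string	The ID of the item this content part belongs to
output_index	integer	The index of the item in the response
content_index	integer	The index of this content part within the item
part	RealtimeContentPart	The content part that was added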

response.content_part.done

The server response.content_part.done event is returned when a content part finishes streaming in an assistant message item.

This event is also returned when a response is interrupted, incomplete, or canceled.

Event Structure

{
  "type": "response.content_part.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "text",
    "text": "Hello! I'm doing well, thank you for asking."
  }
}

Properties

Field	Type	Description
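type	string	Must be `"response.content_part.done"`
response_id	string	The ID of the response
item_id	string	The ID of the item this content part belongs to
output_index	integer	The index of the item in the response
content_index	integer	The index of this content part within the item
part	RealtimeContentPart	The completed content part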

response.text.delta

Streaming text content from the model. Sent incrementally as the model generates text.

Event Structure

{
  "type": "response.text.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello! I'm"
}

Properties

Field	Type	Description
type	string	Must be `"response.text.delta"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
delta	string	Incremental text content
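
Because each delta carries only a fragment, a client has to join the fragments to show the text as it streams. One way to do that is sketched below, keyed by item and content part; `TextAccumulator` is a hypothetical class, and the sketch assumes deltas for one content part arrive in order.

```python
from collections import defaultdict


class TextAccumulator:
    """Collect response.text.delta fragments per (item_id, content_index)."""

    def __init__(self):
        self._parts = defaultdict(list)

    def handle(self, event):
        """Feed one server event; return the full text on response.text.done."""
        key = (event.get("item_id"), event.get("content_index"))
        if event.get("type") == "response.text.delta":
            self._parts[key].append(event["delta"])
            return None
        if event.get("type") == "response.text.done":
            streamed = "".join(self._parts.pop(key, []))
            # Prefer the server's final text; fall back to what was streamed.
            return event.get("text", streamed)
        return None

    def current(self, item_id, content_index):
        """Return the text streamed so far for one content part."""
        return "".join(self._parts.get((item_id, content_index), []))
```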

response.text.done

Sent when text content generation is complete.

Event Structure

{
  "type": "response.text.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}

Properties

Field	Type	Description
type	string	Must be `"response.text.done"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
text	string	The complete text content

response.audio.delta

Streaming audio content from the model. The audio is provided as Base64-encoded data.

Event Structure

{
  "type": "response.audio.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="
}

Properties

Field	Type	Description
type	string	Must be `"response.audio.delta"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
delta	string	Base64-encoded audio data chunk
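
Before playback, each `delta` has to be Base64-decoded back into raw audio bytes. The following minimal sketch appends decoded chunks to a `bytearray`; the function name `append_audio_delta` is hypothetical, and what the bytes mean (for example PCM16 samples) depends on the session's `output_audio_format`.

```python
import base64


def append_audio_delta(buffer, event):
    """Decode one response.audio.delta event and append it to `buffer`.

    Returns the number of raw audio bytes added.
    """
    if event.get("type") != "response.audio.delta":
        raise ValueError("not a response.audio.delta event")
    chunk = base64.b64decode(event["delta"])
    buffer.extend(chunk)
    return len(chunk)


playback = bytearray()
sample_event = {
    "type": "response.audio.delta",
    "delta": base64.b64encode(b"\x01\x02\x03\x04").decode("ascii"),
}
print(append_audio_delta(playback, sample_event))
```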

response.audio.done

Sent when audio content generation is complete.

Event Structure

{
  "type": "response.audio.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field	Type	Description
type	string	Must be `"response.audio.done"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part

response.audio_transcript.delta

Streaming transcript of the generated audio content.

Event Structure

{
  "type": "response.audio_transcript.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello! I'm doing"
}

Properties

Field	Type	Description
type	string	Must be `"response.audio_transcript.delta"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
delta	string	Incremental transcript text

response.audio_transcript.done

Sent when audio transcript generation is complete.

Event Structure

{
  "type": "response.audio_transcript.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "transcript": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}

Properties

Field	Type	Description
type	string	Must be `"response.audio_transcript.done"`
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
transcript	string	The complete transcript text

conversation.item.input_audio_transcription.completed

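The server conversation.item.input_audio_transcription.completed event is the result of audio transcription for speech written to the audio buffer.

Transcription begins when the input audio buffer is committed by the client or by the server (in server_vad mode). Transcription runs asynchronously with response creation, so this event can arrive before or after the response events.

Realtime API models accept audio natively, so input transcription is a separate process that runs on a separate speech recognition model such as whisper-1. The transcript can therefore diverge somewhat from the model's interpretation and should be treated as a rough guide.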

Event structure

{
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "<item_id>",
  "content_index": 0,
  "transcript": "<transcript>"
}

Properties

Field	Type	Description
type	string	The event type; must be `conversation.item.input_audio_transcription.completed`.
item_id	string	The ID of the user message item containing the audio.
content_index	integer	The index of the content part containing the audio.
transcript	string	The transcribed text.

conversation.item.input_audio_transcription.delta

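The server conversation.item.input_audio_transcription.delta event is returned when input audio transcription is configured and a transcription request for a user message is in progress. This event provides partial transcription results as they become available.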

Event structure

{
  "type": "conversation.item.input_audio_transcription.delta",
  "item_id": "<item_id>",
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type; must be `conversation.item.input_audio_transcription.delta`.
item_id	string	The ID of the user message item.
content_index	integer	The index of the content part containing the audio.
delta	string	The incremental transcription text.

conversation.item.input_audio_transcription.failed

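The server conversation.item.input_audio_transcription.failed event is returned when input audio transcription is configured and a transcription request for a user message fails. This event is separate from other error events so that the client can identify the related item.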

Event structure

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

Properties

Field	Type	Description
type	string	The event type; must be `conversation.item.input_audio_transcription.failed`.
item_id	string	The ID of the user message item.
content_index	integer	The index of the content part containing the audio.
error	object	Details of the transcription error. See the nested properties in the next table.

Error properties

Field	Type	Description
type	string	The type of error.
code	string	The error code, if any.
message	string	A human-readable error message.
param	string	The parameter related to the error, if any.

response.animation_blendshapes.delta

The server response.animation_blendshapes.delta event is returned when the model generates animation blendshapes data as part of a response. This event provides incremental blendshapes data.

Event structure

{
  "type": "response.animation_blendshapes.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "frame_index": 0,
  "frames": [
    [0.0, 0.1, 0.2, ..., 1.0]
    ...
  ]
}

Properties

Field	Type	Description
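type	string	The event type; must be `response.animation_blendshapes.delta`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
frame_index	integer	The index of the first frame in this batch of frames
frames	array of arrays of float	Array of blendshape frames; each frame is an array of blendshape values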
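
Since `frame_index` identifies only the first frame of a batch, the absolute index of each later frame has to be derived. The sketch below does this under the assumption that frames within one event are consecutive; `index_blendshape_frames` is a hypothetical helper name.

```python
def index_blendshape_frames(event):
    """Map absolute frame index to blendshape values for one delta event.

    Assumes the frames in a batch are consecutive, starting at `frame_index`.
    """
    first = event["frame_index"]
    return {first + offset: frame for offset, frame in enumerate(event["frames"])}


batch = {
    "type": "response.animation_blendshapes.delta",
    "frame_index": 10,
    "frames": [[0.0, 0.1], [0.2, 0.3]],
}
print(index_blendshape_frames(batch))
```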

response.animation_blendshapes.done

The server response.animation_blendshapes.done event is returned when the model has finished generating animation blendshapes data as part of a response.

Event structure

{
  "type": "response.animation_blendshapes.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type; must be `response.animation_blendshapes.done`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response

response.audio_timestamp.delta

The server response.audio_timestamp.delta event is returned when the model generates audio timestamp data as part of a response. This event provides incremental timestamp data for aligning output audio with text.

Event structure

{
  "type": "response.audio_timestamp.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "audio_offset_ms": 0,
  "audio_duration_ms": 500,
  "text": "Hello",
  "timestamp_type": "word"
}

Properties

Field	Type	Description
type	string	The event type; must be `response.audio_timestamp.delta`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
audio_offset_ms	integer	Audio offset in milliseconds from the start of the audio
audio_duration_ms	integer	Duration of the audio segment in milliseconds
text	string	The text segment corresponding to this audio timestamp
timestamp_type	string	The type of timestamp; currently only "word" is supported
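
For captions or word highlighting, the offset and duration can be combined into a start and end time per word. A minimal sketch follows, using only the fields listed in the table; `word_timing` is a hypothetical helper name.

```python
def word_timing(event):
    """Turn one response.audio_timestamp.delta event into a timed word."""
    start = event["audio_offset_ms"]
    return {
        "text": event["text"],
        "start_ms": start,
        "end_ms": start + event["audio_duration_ms"],
    }


sample = {
    "type": "response.audio_timestamp.delta",
    "audio_offset_ms": 0,
    "audio_duration_ms": 500,
    "text": "Hello",
    "timestamp_type": "word",
}
print(word_timing(sample))
```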

response.audio_timestamp.done

Sent when audio timestamp generation is complete.

Event Structure

{
  "type": "response.audio_timestamp.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field	Type	Description
type	string	The event type; must be `response.audio_timestamp.done`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part

response.animation_viseme.delta

The server response.animation_viseme.delta event is returned when the model generates animation viseme data as part of a response. This event provides incremental viseme data.

Event Structure

{
  "type": "response.animation_viseme.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "audio_offset_ms": 0,
  "viseme_id": 1
}

Properties

Field	Type	Description
type	string	The event type; must be `response.animation_viseme.delta`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part
audio_offset_ms	integer	Audio offset in milliseconds from the start of the audio
viseme_id	integer	The viseme ID corresponding to the mouth shape for animation

response.animation_viseme.done

The server response.animation_viseme.done event is returned when the model has finished generating animation viseme data as part of a response.

Event Structure

{
  "type": "response.animation_viseme.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field	Type	Description
type	string	The event type; must be `response.animation_viseme.done`.
response_id	string	The ID of the response
item_id	string	The ID of the item
output_index	integer	The index of the item in the response
content_index	integer	The index of the content part

error

The server error event is returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable, and the session stays open.

Event structure

{
  "type": "error",
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>",
    "event_id": "<event_id>"
  }
}

Properties

Field	Type	Description
type	string	The event type; must be `error`.
error	object	Details of the error. See the nested properties in the next table.

Error properties

Field	Type	Description
type	string	The type of error. For example, "invalid_request_error" and "server_error" are error types.
code	string	The error code, if any.
message	string	A human-readable error message.
param	string	The parameter related to the error, if any.
event_id	string	The ID of the client event that caused the error.

warning

The server warning event is returned when a warning occurs that doesn't interrupt the conversation flow. Warnings are informational, and the session continues normally.

Event structure

{
  "type": "warning",
  "warning": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

Properties

Field	Type	Description
type	string	The event type; must be `warning`.
warning	object	Details of the warning. See the nested properties in the next table.

Warning properties

Field	Type	Description
message	string	A human-readable warning message.
code	string	Optional. The warning code, if any.
param	string	Optional. The parameter related to the warning, if any.

input_audio_buffer.cleared

The server input_audio_buffer.cleared event is returned when the client clears the input audio buffer with an input_audio_buffer.clear event.

Event structure

{
  "type": "input_audio_buffer.cleared"
}

Properties

Field	Type	Description
type	string	The event type; must be `input_audio_buffer.cleared`.

input_audio_buffer.committed

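The server input_audio_buffer.committed event is returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that is created. Thus, a conversation.item.created event is also sent to the client.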

Event structure

{
  "type": "input_audio_buffer.committed",
  "previous_item_id": "<previous_item_id>",
  "item_id": "<item_id>"
}

Properties

Field	Type	Description
type	string	The event type; must be `input_audio_buffer.committed`.
previous_item_id	string	The ID of the preceding item after which the new item is inserted.
item_id	string	The ID of the user message item that is created.

input_audio_buffer.speech_started

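The server input_audio_buffer.speech_started event is returned in server_vad mode when speech is detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected).

Note

The client might want to use this event to interrupt audio playback or provide visual feedback to the user.

The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that is created when speech stops. The item_id is also included in the input_audio_buffer.speech_stopped event unless the client manually commits the audio buffer during VAD activation.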

Event structure

{
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 0,
  "item_id": "<item_id>"
}

Properties

Field	Type	Description
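type	string	The event type; must be `input_audio_buffer.speech_started`.
audio_start_ms	integer	Milliseconds from the start of all audio written to the buffer during the session to the point when speech was first detected. This property corresponds to the beginning of the audio sent to the model and therefore includes the `prefix_padding_ms` configured in the session.
item_id	string	The ID of the user message item that is created when speech stops.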
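
One way a client might use this event to interrupt playback ("barge-in") is sketched below. The policy shown, canceling the response and truncating the assistant audio to what was actually played, is an assumption for illustration and not behavior mandated by the API; `barge_in_events` and its parameters are hypothetical, and a real client would also track whether a response is still in progress before sending response.cancel.

```python
def barge_in_events(event, playing_item_id, played_ms):
    """Return client events to send when the user starts speaking.

    Hypothetical policy: cancel the response and truncate the assistant
    audio item to the portion the user actually heard.
    """
    if event.get("type") != "input_audio_buffer.speech_started":
        return []
    if playing_item_id is None:
        return []  # nothing is playing, so there is nothing to interrupt
    return [
        {"type": "response.cancel"},
        {
            "type": "conversation.item.truncate",
            "item_id": playing_item_id,
            "content_index": 0,
            "audio_end_ms": played_ms,
        },
    ]


started = {"type": "input_audio_buffer.speech_started", "audio_start_ms": 0}
print(barge_in_events(started, "item_ABC123", 1200))
```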

input_audio_buffer.speech_stopped

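The server input_audio_buffer.speech_stopped event is returned in server_vad mode when the server detects the end of speech in the audio buffer.

The server also sends a conversation.item.created event with the user message item that is created from the audio buffer.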

Event structure

{
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 0,
  "item_id": "<item_id>"
}

Properties

Field	Type	Description
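type	string	The event type; must be `input_audio_buffer.speech_stopped`.
audio_end_ms	integer	Milliseconds from the start of the session to the point when speech stopped. This property corresponds to the end of the audio sent to the model and therefore includes the `min_silence_duration_ms` configured in the session.
item_id	string	The ID of the user message item that is created.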

rate_limits.updated

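The server rate_limits.updated event is emitted at the beginning of a response to indicate the updated rate limits.

When a response is created, some tokens are reserved for the output tokens. The rate limits shown here reflect that reservation, which is then adjusted accordingly once the response is completed.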

Event structure

{
  "type": "rate_limits.updated",
  "rate_limits": [
    {
      "name": "<name>",
      "limit": 0,
      "remaining": 0,
      "reset_seconds": 0
    }
  ]
}

Properties

Field	Type	Description
type	string	The event type must be `rate_limits.updated`.
rate_limits	RealtimeRateLimitsItem[]	The list of rate limit information.
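
A minimal Python sketch of reading this event, assuming the item fields shown in the event structure above (`name`, `limit`, `remaining`, `reset_seconds`). The helper name `lowest_headroom` and the limit names used in the usage note are hypothetical.

```python
def lowest_headroom(event: dict) -> tuple:
    """Return (name, fraction_remaining) for the tightest limit in a rate_limits.updated event."""
    ratios = [
        (item["name"], item["remaining"] / item["limit"])
        for item in event["rate_limits"]
        if item["limit"] > 0  # skip entries without a usable limit
    ]
    return min(ratios, key=lambda pair: pair[1])
```

A client could call this on each rate_limits.updated event and delay its next response.create when the returned fraction gets small.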

response.audio.delta

The server response.audio.delta event is returned when the model-generated audio is updated.

Event structure

{
  "type": "response.audio.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.audio.delta`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.
delta	string	Base64-encoded audio data delta.
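
Since `delta` is Base64-encoded, a client has to decode each chunk before playing or storing it. The sketch below is one possible way to collect the decoded bytes per item and content part; the `AudioAssembler` class is a hypothetical helper, not something the API provides.

```python
import base64


class AudioAssembler:
    """Collects decoded audio bytes keyed by (item_id, content_index)."""

    def __init__(self) -> None:
        self.chunks = {}

    def on_delta(self, event: dict) -> None:
        key = (event["item_id"], event["content_index"])
        # Decode the Base64 payload and append it to the buffer for this content part.
        self.chunks.setdefault(key, bytearray()).extend(base64.b64decode(event["delta"]))

    def audio(self, item_id: str, content_index: int = 0) -> bytes:
        return bytes(self.chunks.get((item_id, content_index), b""))
```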

response.audio.done

The server response.audio.done event is returned when the model-generated audio is done.

This event is also returned when a response is interrupted, incomplete, or canceled.

Event structure

{
  "type": "response.audio.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.audio.done`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.

response.audio_transcript.delta

The server response.audio_transcript.delta event is returned when the transcription of the model-generated audio output is updated.

Event structure

{
  "type": "response.audio_transcript.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.audio_transcript.delta`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.
delta	string	The transcript delta.

response.audio_transcript.done

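The server response.audio_transcript.done event is returned when the transcription of the model-generated audio output is done streaming.

This event is also returned when a response is interrupted, incomplete, or canceled.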

Event structure

{
  "type": "response.audio_transcript.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "transcript": "<transcript>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.audio_transcript.done`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.
transcript	string	The final transcript of the audio.
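
The delta and done events for transcripts can be combined on the client as sketched below: deltas are appended for live display, and the `transcript` field of the done event replaces the partial text. `TranscriptBuffer` is a hypothetical helper; the field names are taken from the event structures above.

```python
from collections import defaultdict


class TranscriptBuffer:
    """Keeps partial and final transcripts keyed by (item_id, content_index)."""

    def __init__(self) -> None:
        self.partial = defaultdict(str)
        self.final = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind not in ("response.audio_transcript.delta", "response.audio_transcript.done"):
            return
        key = (event["item_id"], event["content_index"])
        if kind.endswith(".delta"):
            self.partial[key] += event["delta"]      # incremental text for live captions
        else:
            self.final[key] = event["transcript"]    # the done event carries the full text
            self.partial.pop(key, None)
```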

response.function_call_arguments.delta

The server response.function_call_arguments.delta event is returned when the model-generated function call arguments are updated.

Event structure

{
  "type": "response.function_call_arguments.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "call_id": "<call_id>",
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.function_call_arguments.delta`.
response_id	string	The ID of the response.
item_id	string	The ID of the function call item.
output_index	integer	The index of the output item in the response.
call_id	string	The ID of the function call.
delta	string	The arguments delta as a JSON string.

response.function_call_arguments.done

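The server response.function_call_arguments.done event is returned when the model-generated function call arguments are done streaming.

This event is also returned when a response is interrupted, incomplete, or canceled.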

Event structure

{
  "type": "response.function_call_arguments.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "call_id": "<call_id>",
  "arguments": "<arguments>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.function_call_arguments.done`.
response_id	string	The ID of the response.
item_id	string	The ID of the function call item.
output_index	integer	The index of the output item in the response.
call_id	string	The ID of the function call.
arguments	string	The final arguments as a JSON string.
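
Because `arguments` arrives as a JSON string, the client typically parses it once the done event is received. The following sketch tracks partial deltas per `call_id` and returns the parsed arguments on completion; `FunctionCallCollector` is a hypothetical name, and the sketch assumes the final `arguments` string is valid JSON.

```python
import json
from typing import Optional


class FunctionCallCollector:
    """Accumulates argument deltas and parses the final JSON string per call_id."""

    def __init__(self) -> None:
        self.partial = {}

    def handle(self, event: dict) -> Optional[dict]:
        kind = event.get("type")
        if kind == "response.function_call_arguments.delta":
            call_id = event["call_id"]
            self.partial[call_id] = self.partial.get(call_id, "") + event["delta"]
            return None
        if kind == "response.function_call_arguments.done":
            self.partial.pop(event["call_id"], None)
            # The done event carries the complete argument string.
            return {"call_id": event["call_id"], "arguments": json.loads(event["arguments"])}
        return None
```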

mcp_list_tools.in_progress

The server mcp_list_tools.in_progress event is returned when the service starts listing the tools available from an MCP server.

Event structure

{
  "type": "mcp_list_tools.in_progress",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field	Type	Description
type	string	The event type must be `mcp_list_tools.in_progress`.
item_id	string	The ID of the MCP list tools item that is being processed.

mcp_list_tools.completed

The server mcp_list_tools.completed event is returned when the service finishes listing the tools available from an MCP server.

Event structure

{
  "type": "mcp_list_tools.completed",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field	Type	Description
type	string	The event type must be `mcp_list_tools.completed`.
item_id	string	The ID of the MCP list tools item that is being processed.

mcp_list_tools.failed

The server mcp_list_tools.failed event is returned when the service fails to list the tools available from an MCP server.

Event structure

{
  "type": "mcp_list_tools.failed",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field	Type	Description
type	string	The event type must be `mcp_list_tools.failed`.
item_id	string	The ID of the MCP list tools item that is being processed.

response.mcp_call_arguments.delta

The server response.mcp_call_arguments.delta event is returned when the model-generated MCP tool call arguments are updated.

Event structure

{
  "type": "response.mcp_call_arguments.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.mcp_call_arguments.delta`.
response_id	string	The ID of the response.
item_id	string	The ID of the MCP tool call item.
output_index	integer	The index of the output item in the response.
delta	string	The arguments delta as a JSON string.

response.mcp_call_arguments.done

The server response.mcp_call_arguments.done event is returned when the model-generated MCP tool call arguments are done streaming.

Event structure

{
  "type": "response.mcp_call_arguments.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "arguments": "<arguments>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.mcp_call_arguments.done`.
response_id	string	The ID of the response.
item_id	string	The ID of the MCP tool call item.
output_index	integer	The index of the output item in the response.
arguments	string	The final arguments as a JSON string.

response.mcp_call.in_progress

The server response.mcp_call.in_progress event is returned when an MCP tool call starts processing.

Event structure

{
  "type": "response.mcp_call.in_progress",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.mcp_call.in_progress`.
item_id	string	The ID of the MCP tool call item.
output_index	integer	The index of the output item in the response.

response.mcp_call.completed

The server response.mcp_call.completed event is returned when an MCP tool call completes successfully.

Event structure

{
  "type": "response.mcp_call.completed",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.mcp_call.completed`.
item_id	string	The ID of the MCP tool call item.
output_index	integer	The index of the output item in the response.

response.mcp_call.failed

The server response.mcp_call.failed event is returned when an MCP tool call fails.

Event structure

{
  "type": "response.mcp_call.failed",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.mcp_call.failed`.
item_id	string	The ID of the MCP tool call item.
output_index	integer	The index of the output item in the response.

response.output_item.added

The server response.output_item.added event is returned when a new item is created during response generation.

Event structure

{
  "type": "response.output_item.added",
  "response_id": "<response_id>",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.output_item.added`.
response_id	string	The ID of the response to which the item belongs.
output_index	integer	The index of the output item in the response.
item	RealtimeConversationResponseItem	The item that was added.

response.output_item.done

The server response.output_item.done event is returned when an item is done streaming.

This event is also returned when a response is interrupted, incomplete, or canceled.

Event structure

{
  "type": "response.output_item.done",
  "response_id": "<response_id>",
  "output_index": 0
}

Properties

Field	Type	Description
type	string	The event type must be `response.output_item.done`.
response_id	string	The ID of the response to which the item belongs.
output_index	integer	The index of the output item in the response.
item	RealtimeConversationResponseItem	The item that is done streaming.

response.text.delta

The server response.text.delta event is returned when the model-generated text is updated. The text corresponds to the text content part of an assistant message item.

Event structure

{
  "type": "response.text.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.text.delta`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.
delta	string	The text delta.

response.text.done

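The server response.text.done event is returned when the model-generated text is done streaming. The text corresponds to the text content part of an assistant message item.

This event is also returned when a response is interrupted, incomplete, or canceled.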

Event structure

{
  "type": "response.text.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "text": "<text>"
}

Properties

Field	Type	Description
type	string	The event type must be `response.text.done`.
response_id	string	The ID of the response.
item_id	string	The ID of the item.
output_index	integer	The index of the output item in the response.
content_index	integer	The index of the content part in the item's content array.
text	string	The final text content.

Components

Audio Formats

RealtimeAudioFormat

Base audio format used for input audio.

Allowed Values:

pcm16 - 16-bit PCM audio format
g711_ulaw - G.711 μ-law audio format
g711_alaw - G.711 A-law audio format

RealtimeOutputAudioFormat

Audio format used for output audio with specific sampling rates.

Allowed Values:

pcm16 - 16-bit PCM audio format at the default sampling rate (24 kHz)
pcm16_8000hz - 16-bit PCM audio format at an 8 kHz sampling rate
pcm16_16000hz - 16-bit PCM audio format at a 16 kHz sampling rate
g711_ulaw - G.711 μ-law (mu-law) audio format at an 8 kHz sampling rate
g711_alaw - G.711 A-law audio format at an 8 kHz sampling rate
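
The sampling rates listed above make it possible to estimate how much playback time a buffer of output audio represents. The sketch below assumes mono audio, 2 bytes per sample for 16-bit PCM, and 1 byte per sample for G.711; the channel count is an assumption that this section does not state, and `OUTPUT_FORMATS` and `duration_ms` are hypothetical names.

```python
# (sampling rate in Hz, bytes per sample); mono audio is assumed.
OUTPUT_FORMATS = {
    "pcm16": (24000, 2),
    "pcm16_8000hz": (8000, 2),
    "pcm16_16000hz": (16000, 2),
    "g711_ulaw": (8000, 1),
    "g711_alaw": (8000, 1),
}


def duration_ms(output_audio_format: str, num_bytes: int) -> float:
    """Estimate the playback duration of num_bytes of decoded output audio."""
    rate, width = OUTPUT_FORMATS[output_audio_format]
    return num_bytes * 1000 / (rate * width)
```

For example, 48,000 bytes of `pcm16` output correspond to one second of audio under these assumptions.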

RealtimeAudioInputTranscriptionSettings

Configuration for input audio transcription.

Field	Type	Description
model	string	The transcription model. Supported with `gpt-realtime` and `gpt-realtime-mini`: `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `gpt-4o-transcribe-diarize`. Supported with all other models and agents: `azure-speech`, `mai-transcribe-1` (preview).
language	string	Optional language code in BCP-47 (for example, `en-US`) or ISO-639-1 (for example, `en`), or multiple languages with automatic detection (for example, `en,zh`). See the Azure speech to text supported languages for the recommended usage of this setting.
custom_speech	object	Optional configuration for custom speech models. Only valid for the `azure-speech` model.
phrase_list	string[]	Optional list of phrase hints to bias recognition. Only valid for the `azure-speech` model.
prompt	string	Optional prompt text to guide transcription. Only valid for the `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe-diarize` models.

RealtimeInputAudioNoiseReductionSettings

This can be one of:

RealtimeOpenAINoiseReduction object
RealtimeAzureDeepNoiseSuppression object

RealtimeOpenAINoiseReduction

OpenAI noise reduction configuration with an explicit type field. Only available for the gpt-realtime and gpt-realtime-mini models.

Field	Type	Description
type	string	`near_field` 또는 `far_field`

RealtimeAzureDeepNoiseSuppression

Configuration for input audio noise reduction.

Field	Type	Description
type	string	Must be `"azure_deep_noise_suppression"`.

RealtimeInputAudioEchoCancellationSettings

Echo cancellation configuration for server-side audio processing.

Field	Type	Description
type	string	Must be `"server_echo_cancellation"`.

Voice Configuration

RealtimeVoice

Union of all supported voice configurations.

This can be one of:

RealtimeOpenAIVoice object
RealtimeAzureVoice object

RealtimeOpenAIVoice

OpenAI voice configuration with an explicit type field.

Field	Type	Description
type	string	Must be `"openai"`.
name	string	OpenAI voice name: `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`

RealtimeAzureVoice

Base for Azure voice configurations. This is a discriminated union with several types:

RealtimeAzureStandardVoice

Azure standard voice configuration.

Field	Type	Description
type	string	Must be `"azure-standard"`.
name	string	Voice name (can't be empty)
temperature	number	Optional. Temperature between 0.0 and 1.0
custom_lexicon_url	string	Optional. URL to a custom lexicon
custom_text_normalization_url	string	Optional. URL to custom text normalization
prefer_locales	string[]	Optional. Preferred locales. Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when TTS speaks English it uses the American English accent, and when it speaks Spanish it uses the Mexican Spanish accent. If prefer_locales is set to `["en-GB", "es-ES"]`, the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale	string	Optional. Locale specification. Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to `en-US`, TTS always uses the American English accent to speak the text content, even if the text content is in another language. TTS outputs silence if the text content is in Chinese.
style	string	Optional. Voice style
pitch	string	Optional. Pitch adjustment
rate	string	Optional. Speaking rate adjustment
volume	string	Optional. Volume adjustment

RealtimeAzureCustomVoice

Azure custom voice configuration (preferred for custom voices).

Field	Type	Description
type	string	Must be `"azure-custom"`.
name	string	Voice name (can't be empty)
endpoint_id	string	Endpoint ID (can't be empty)
temperature	number	Optional. Temperature between 0.0 and 1.0
custom_lexicon_url	string	Optional. URL to a custom lexicon
custom_text_normalization_url	string	Optional. URL to custom text normalization
prefer_locales	string[]	Optional. Preferred locales. Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when TTS speaks English it uses the American English accent, and when it speaks Spanish it uses the Mexican Spanish accent. If prefer_locales is set to `["en-GB", "es-ES"]`, the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale	string	Optional. Locale specification. Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to `en-US`, TTS always uses the American English accent to speak the text content, even if the text content is in another language. TTS outputs silence if the text content is in Chinese.
style	string	Optional. Voice style
pitch	string	Optional. Pitch adjustment
rate	string	Optional. Speaking rate adjustment
volume	string	Optional. Volume adjustment

Example:

{
  "type": "azure-custom",
  "name": "my-custom-voice",
  "endpoint_id": "12345678-1234-1234-1234-123456789012",
  "temperature": 0.7,
  "style": "cheerful",
  "locale": "en-US"
}

RealtimeAzurePersonalVoice

Azure personal voice configuration.

Field	Type	Description
type	string	Must be `"azure-personal"`.
name	string	Voice name (can't be empty)
temperature	number	Optional. Temperature between 0.0 and 1.0
model	string	The base model: `DragonLatestNeural`, `DragonHDOmniLatestNeural`, `MAI-Voice-1`
custom_lexicon_url	string	Optional. URL to a custom lexicon
custom_text_normalization_url	string	Optional. URL to custom text normalization
prefer_locales	string[]	Optional. Preferred locales. Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when TTS speaks English it uses the American English accent, and when it speaks Spanish it uses the Mexican Spanish accent. If prefer_locales is set to `["en-GB", "es-ES"]`, the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale	string	Optional. Locale specification. Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to `en-US`, TTS always uses the American English accent to speak the text content, even if the text content is in another language. TTS outputs silence if the text content is in Chinese.
pitch	string	Optional. Pitch adjustment
rate	string	Optional. Speaking rate adjustment
volume	string	Optional. Volume adjustment

Turn Detection

RealtimeTurnDetection

Configuration for turn detection. This is a discriminated union that supports multiple VAD types.

RealtimeServerVAD

Base VAD-based turn detection.

Field	Type	Description
type	string	Must be `"server_vad"`.
threshold	float	Optional. Activation threshold (0.0-1.0) (default: 0.5)
prefix_padding_ms	integer	Optional. Audio padding before speech starts (default: 300)
silence_duration_ms	integer	Optional. Silence duration used to detect the end of speech (default: 500)
speech_duration_ms	integer	Optional. Minimum speech duration (default: 200)
end_of_utterance_detection	RealtimeEOUDetection	Optional. End-of-utterance detection configuration
create_response	boolean	Optional. Enables or disables whether a response is generated (default: true).
interrupt_response	boolean	Optional. Enables or disables barge-in interruption (default: true).
auto_truncate	boolean	Optional. Auto-truncate on interruption (default: false)

RealtimeOpenAISemanticVAD

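OpenAI semantic VAD configuration, which uses a model to determine when the user has finished speaking. Only available for the gpt-realtime and gpt-realtime-mini models.

Field	Type	Description
type	string	Must be `"semantic_vad"`.
eagerness	string	Optional. Controls how eager the model is to interrupt the user by tuning the maximum wait timeout. In transcription mode, it affects how the audio is chunked even though the model doesn't reply. Allowed values: `auto` (default), which is equivalent to `medium`; `low`, which lets the user take their time to speak; and `high`, which chunks the audio as soon as possible. If you want the model to respond more often in conversation mode, or to return transcription events faster in transcription mode, set eagerness to `high`. Conversely, if you want to let the user speak uninterrupted in conversation mode, or if you want larger transcript chunks in transcription mode, set eagerness to `low`.
create_response	boolean	Optional. Enables or disables whether a response is generated (default: true).
interrupt_response	boolean	Optional. Enables or disables barge-in interruption (default: true).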

RealtimeAzureSemanticVAD

Azure semantic VAD, which uses a semantic speech model to determine when the user starts and stops speaking, providing more robust detection in noisy environments.

Field	Type	Description
type	string	Must be `"azure_semantic_vad"`.
threshold	float	Optional. Activation threshold (default: 0.5)
prefix_padding_ms	integer	Optional. Audio padding before speech (default: 300)
silence_duration_ms	integer	Optional. Silence duration for the end of speech (default: 500)
end_of_utterance_detection	RealtimeEOUDetection	Optional. EOU detection configuration
speech_duration_ms	integer	Optional. Minimum speech duration (default: 80)
remove_filler_words	boolean	Optional. Remove filler words (default: false)
languages	string[]	Optional. Supports English. Other languages are ignored (default: none).
create_response	boolean	Optional. Enables or disables whether a response is generated (default: true).
interrupt_response	boolean	Optional. Enables or disables barge-in interruption (default: true).
auto_truncate	boolean	Optional. Auto-truncate on interruption (default: false)

RealtimeAzureSemanticVADMultilingual

Azure semantic VAD (default variant).

Field	Type	Description
type	string	Must be `"azure_semantic_vad_multilingual"`.
threshold	float	Optional. Activation threshold (default: 0.5)
prefix_padding_ms	integer	Optional. Audio padding before speech (default: 300)
silence_duration_ms	integer	Optional. Silence duration for the end of speech (default: 500)
end_of_utterance_detection	RealtimeEOUDetection	Optional. EOU detection configuration
speech_duration_ms	integer	Optional. Minimum speech duration (default: 80)
remove_filler_words	boolean	Optional. Remove filler words (default: false)
languages	string[]	Optional. Supports English, Spanish, French, Italian, German (DE), Japanese, Portuguese, Chinese, Korean, and Hindi. Other languages are ignored (default: none).
create_response	boolean	Optional. Enables or disables whether a response is generated (default: true).
interrupt_response	boolean	Optional. Enables or disables barge-in interruption (default: true).
auto_truncate	boolean	Optional. Auto-truncate on interruption (default: false)

RealtimeEOUDetection

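Azure end-of-utterance (EOU) detection can indicate when the user has stopped speaking while still allowing for natural pauses. End-of-utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency.

Field	Type	Description
model	string	Either `semantic_detection_v1`, which supports English, or `semantic_detection_v1_multilingual`, which supports English, Spanish, French, Italian, German (DE), Japanese, Portuguese, Chinese, Korean, and Hindi
threshold_level	string	Optional. Detection threshold level (`low`, `medium`, `high`, and `default`). The default is equal to the `medium` setting. With a lower setting, the probability that the sentence is complete is higher.
timeout_ms	number	Optional. Maximum time in milliseconds to wait for more user speech. Defaults to 1000 ms.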
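
To illustrate how the VAD and EOU objects fit together, the following Python sketch builds a `turn_detection` value for a session.update event. The function name and its parameters are hypothetical, and pairing the multilingual VAD type with the multilingual EOU model is an assumption made for this example rather than a documented requirement.

```python
from typing import Optional

THRESHOLD_LEVELS = {"low", "medium", "high", "default"}


def azure_semantic_vad(
    multilingual: bool = False,
    silence_duration_ms: int = 500,
    eou_threshold_level: Optional[str] = None,
    eou_timeout_ms: int = 1000,
) -> dict:
    """Build a turn_detection object; EOU detection is added only when a level is given."""
    config = {
        "type": "azure_semantic_vad_multilingual" if multilingual else "azure_semantic_vad",
        "silence_duration_ms": silence_duration_ms,
    }
    if eou_threshold_level is not None:
        if eou_threshold_level not in THRESHOLD_LEVELS:
            raise ValueError(f"unknown threshold_level: {eou_threshold_level!r}")
        config["end_of_utterance_detection"] = {
            # Assumption: use the EOU model that matches the VAD variant.
            "model": "semantic_detection_v1_multilingual" if multilingual else "semantic_detection_v1",
            "threshold_level": eou_threshold_level,
            "timeout_ms": eou_timeout_ms,
        }
    return config
```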

Avatar Configuration

RealtimeAvatarConfig

Configuration for avatar streaming and behavior.

Field	Type	Description
ice_servers	RealtimeIceServer[]	Optional. ICE servers for WebRTC
character	string	Character name or ID of the avatar
style	string	Optional. Avatar style (emotional tone, speaking style)
customized	boolean	Whether the avatar is customized
video	RealtimeVideoParams	Optional. Video configuration
scene	RealtimeAvatarScene	Optional. Configuration for the avatar's zoom level, position, rotation, and movement amplitude within the video frame
output_protocol	string	Optional. Output protocol for avatar streaming. Default is `webrtc`
output_audit_audio	boolean	Optional. When enabled, audit audio is delivered over the WebSocket for review and debugging, even when the avatar output is delivered over WebRTC. Default is `false`

RealtimeIceServer

ICE server configuration for WebRTC connection negotiation.

Field	Type	Description
urls	string[]	ICE server URLs (TURN or STUN endpoints)
username	string	Optional. Username for authentication
credential	string	Optional. Credential for authentication

RealtimeVideoParams

Video streaming parameters for the avatar.

Field	Type	Description
bitrate	integer	Optional. Bitrate in bits per second (default: 2000000)
codec	string	Optional. Video codec. Currently only `h264` is supported (default: `h264`)
crop	RealtimeVideoCrop	Optional. Cropping settings
resolution	RealtimeVideoResolution	Optional. Resolution settings

RealtimeVideoCrop

Video crop rectangle definition.

Field	Type	Description
top_left	integer[]	Top-left corner [x, y], non-negative integers
bottom_right	integer[]	Bottom-right corner [x, y], non-negative integers

RealtimeVideoResolution

Video resolution specification.

Field	Type	Description
width	integer	Width in pixels (must be > 0)
height	integer	Height in pixels (must be > 0)

RealtimeAvatarScene

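Configuration for the avatar's zoom level, position, rotation, and movement amplitude.

Field	Type	Description
zoom	number	Optional. Zoom level of the avatar. Range is (0, +∞). Values less than 1 zoom out, and values greater than 1 zoom in. Default is 0
position_x	number	Optional. Horizontal position of the avatar. Range is [-1, 1], as a proportion of the frame width. Negative values move left, and positive values move right. Default is 0
position_y	number	Optional. Vertical position of the avatar. Range is [-1, 1], as a proportion of the frame height. Negative values move up, and positive values move down. Default is 0
rotation_x	number	Optional. Rotation around the X-axis (pitch). Range is [-π, π] in radians. Negative values rotate up, and positive values rotate down. Default is 0
rotation_y	number	Optional. Rotation around the Y-axis (yaw). Range is [-π, π] in radians. Negative values rotate left, and positive values rotate right. Default is 0
rotation_z	number	Optional. Rotation around the Z-axis (roll). Range is [-π, π] in radians. Negative values rotate counterclockwise, and positive values rotate clockwise. Default is 0
amplitude	number	Optional. Amplitude of the avatar's movement. Range is (0, 1]. Values in (0, 1) mean reduced amplitude, and 1 means full amplitude. Default is 0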

Animation Configuration

RealtimeAnimation

Configuration for animation outputs, including blendshapes and visemes.

Field	Type	Description
model_name	string	Optional. Animation model name (default: `"default"`)
outputs	RealtimeAnimationOutputType[]	Optional. Output types (default: `["blendshapes"]`)

RealtimeAnimationOutputType

Types of animation data to output.

Allowed Values:

blendshapes - Facial blendshapes data
viseme_id - Viseme identifier data

Session Configuration

RealtimeRequestSession

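The session configuration object used in session.update events.

Field	Type	Description
model	string	Optional. Name of the model to use
modalities	RealtimeModality[]	Optional. The output modalities supported for the session. For example, "modalities": ["text", "audio"] is the default setting that enables both text and audio output modalities. To enable only text output, set "modalities": ["text"]. To enable avatar output, set "modalities": ["text", "audio", "avatar"]. You can't enable only audio.
animation	RealtimeAnimation	Optional. Animation configuration
voice	RealtimeVoice	Optional. Voice configuration
instructions	string	Optional. System instructions for the model. The instructions can guide the output audio when OpenAI voices are used, but might not apply to Azure voices.
input_audio_sampling_rate	integer	Optional. Input audio sampling rate in Hz (default: 24000 for `pcm16`, 8000 for `g711_ulaw` and `g711_alaw`)
input_audio_format	RealtimeAudioFormat	Optional. Input audio format (default: `pcm16`)
output_audio_format	RealtimeOutputAudioFormat	Optional. Output audio format (default: `pcm16`)
input_audio_noise_reduction	RealtimeInputAudioNoiseReductionSettings	Configuration for input audio noise reduction. Set this to null to turn it off. Noise reduction filters audio added to the input audio buffer before it's sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio. This property is nullable.
input_audio_echo_cancellation	RealtimeInputAudioEchoCancellationSettings	Configuration for input audio echo cancellation. Set this to null to turn it off. This service-side echo cancellation can improve the quality of the input audio by reducing the impact of echo and reverberation. This property is nullable.
input_audio_transcription	RealtimeAudioInputTranscriptionSettings	Configuration for input audio transcription. The configuration is null (off) by default. Input audio transcription isn't native to the model, because the model consumes audio directly. Transcription runs asynchronously through the `/audio/transcriptions` endpoint and should be treated as guidance about the input audio content rather than precisely what the model heard. For additional guidance to the transcription service, the client can optionally set the language and a transcription prompt. This property is nullable.
turn_detection	RealtimeTurnDetection	The turn detection settings for the session. Set this to null to turn it off.
tools	RealtimeTool[]	The tools available to the model for the session.
tool_choice	RealtimeToolChoice	The tool choice for the session. Allowed values: `auto`, `none`, and `required`. Otherwise, you can specify the name of the function to use.
temperature	number	The sampling temperature for the model. The allowed temperature values are limited to [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens	integer or "inf"	The maximum number of output tokens per assistant response, inclusive of tool calls. Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens. For example, to limit the output tokens to 1000, set `"max_response_output_tokens": 1000`. To allow the maximum number of tokens, set `"max_response_output_tokens": "inf"`. Defaults to `"inf"`.
reasoning_effort	ReasoningEffort	Optional. Constrains the effort on reasoning for reasoning models. See the Azure Foundry documentation for details. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning.
avatar	RealtimeAvatarConfig	Optional. Avatar configuration
output_audio_timestamp_types	RealtimeAudioTimestampType[]	Optional. Timestamp types for output audio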
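
The constraints in the table above (temperature within [0.6, 1.2], an integer from 1 to 4096 or "inf" for the token limit, and no audio-only output) can be checked on the client before sending session.update. The following is a minimal sketch; the `session_update` function and its parameter defaults are illustrative only, and the server remains the authority on what it accepts.

```python
import json
from typing import Sequence, Union


def session_update(
    temperature: float = 0.8,
    max_response_output_tokens: Union[int, str] = "inf",
    modalities: Sequence[str] = ("text", "audio"),
) -> str:
    """Return a session.update event as a JSON string after basic client-side checks."""
    if not 0.6 <= temperature <= 1.2:
        raise ValueError("temperature must be within [0.6, 1.2]")
    if max_response_output_tokens != "inf":
        is_int = isinstance(max_response_output_tokens, int) and not isinstance(
            max_response_output_tokens, bool
        )
        if not is_int or not 1 <= max_response_output_tokens <= 4096:
            raise ValueError('max_response_output_tokens must be 1..4096 or "inf"')
    if "audio" in modalities and "text" not in modalities:
        raise ValueError("audio-only output can't be enabled")
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": list(modalities),
            "temperature": temperature,
            "max_response_output_tokens": max_response_output_tokens,
        },
    })
```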

RealtimeModality

Supported session output modalities.

Allowed Values:

text - Text output
audio - Audio output
animation - Animation output
avatar - Avatar video output

RealtimeAudioTimestampType

Output timestamp types supported in audio response content.

Allowed Values:

word - Timestamps per word in the output audio

ReasoningEffort

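Constrains the effort on reasoning for reasoning models. Check the model documentation for the values that each model supports. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning.

Allowed Values:

none - No reasoning effort
minimal - Minimal reasoning effort
low - Low reasoning effort: faster responses with less reasoning
medium - Medium reasoning effort: balanced between speed and depth
high - High reasoning effort: more thorough reasoning, which might take longer
xhigh - Extra high reasoning effort: maximum reasoning depth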

Tool Configuration

Two types of tools are supported: function calling, and MCP tools, which let you connect to an MCP server.

RealtimeTool

Tool definition for function calling.

Field	Type	Description
type	string	Must be `"function"`.
name	string	Function name
description	string	Function description and usage guidelines
parameters	object	Function parameters as a JSON schema object

RealtimeToolChoice

Tool selection strategy.

This can be one of:

"auto" - Let the model choose
"none" - Don't use tools
"required" - Must use a tool
{ "type": "function", "name": "function_name" } - Use a specific function

MCPTool

MCP tool configuration.

Field	Type	Description
type	string	Must be `"mcp"`.
server_label	string	Required. The label of the MCP server.
server_url	string	Required. The server URL of the MCP server.
allowed_tools	string[]	Optional. The list of allowed tool names. If not specified, all tools are allowed.
headers	object	Optional. Additional headers to include in MCP requests.
authorization	string	Optional. Authorization token for MCP requests.
require_approval	string or dictionary	Optional. If set to a string, the value must be `always` or `never`. If set to a dictionary, it must be in the format `{"never": ["<tool_name_1>", "<tool_name_2>"], "always": ["<tool_name_3>"]}`. The default value is `always`. When set to `always`, tool execution requires approval: an mcp_approval_request is sent to the client once the MCP arguments are done, and the tool is executed only when an mcp_approval_response with `approve=true` is received. When set to `never`, the tool is executed automatically without approval.
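
The two accepted shapes of `require_approval` can be resolved to a per-tool decision as sketched below. `needs_approval` is a hypothetical client-side helper. How the service treats a tool that appears in neither list of the dictionary form isn't stated here, so the fallback to the documented default (`always`) is an assumption.

```python
from typing import Union


def needs_approval(require_approval: Union[str, dict, None], tool_name: str) -> bool:
    """Return True when a call to tool_name would need an mcp_approval_response."""
    if require_approval is None:
        return True  # the documented default is "always"
    if isinstance(require_approval, str):
        if require_approval not in ("always", "never"):
            raise ValueError('require_approval must be "always" or "never"')
        return require_approval == "always"
    if tool_name in require_approval.get("always", []):
        return True
    if tool_name in require_approval.get("never", []):
        return False
    return True  # assumption: unlisted tools fall back to the default
```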

RealtimeConversationResponseItem

This is a union type that can be one of the following:

RealtimeConversationUserMessageItem

User message item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"message"`.
object	string	Must be `"conversation.item"`.
role	string	Must be `"user"`.
content	RealtimeInputTextContentPart	The content of the message.
status	RealtimeItemStatus	The status of the item.

RealtimeConversationAssistantMessageItem

Assistant message item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"message"`.
object	string	Must be `"conversation.item"`.
role	string	Must be `"assistant"`.
content	RealtimeOutputTextContentPart[] or RealtimeOutputAudioContentPart[]	The content of the message.
status	RealtimeItemStatus	The status of the item.

RealtimeConversationSystemMessageItem

System message item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"message"`.
object	string	Must be `"conversation.item"`.
role	string	Must be `"system"`.
content	RealtimeInputTextContentPart[]	The content of the message.
status	RealtimeItemStatus	The status of the item.

RealtimeConversationFunctionCallItem

Function call request item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"function_call"`.
object	string	Must be `"conversation.item"`.
name	string	The name of the function to call.
arguments	string	The arguments of the function call as a JSON string.
call_id	string	The unique ID of the function call.
status	RealtimeItemStatus	The status of the item.

RealtimeConversationFunctionCallOutputItem

Function call response item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"function_call_output"`.
object	string	Must be `"conversation.item"`.
name	string	The name of the function that was called.
output	string	The output of the function call.
call_id	string	The unique ID of the function call.
status	RealtimeItemStatus	The status of the item.

RealtimeConversationMCPListToolsItem

MCP list tools response item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"mcp_list_tools"`.
server_label	string	The label of the MCP server.

RealtimeConversationMCPCallItem

MCP call response item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"mcp_call"`.
server_label	string	The label of the MCP server.
name	string	The name of the tool to call.
approval_request_id	string	The ID of the approval request for the MCP call.
arguments	string	The arguments for the MCP call.
output	string	The output of the MCP call.
error	object	The error details if the MCP call failed.

RealtimeConversationMCPApprovalRequestItem

MCP approval request item.

Field	Type	Description
id	string	The unique ID of the item.
type	string	Must be `"mcp_approval_request"`.
server_label	string	The label of the MCP server.
name	string	The name of the tool to call.
arguments	string	The arguments for the MCP call.

RealtimeItemStatus

Status of conversation items.

Allowed Values:

in_progress - Currently being processed
completed - Successfully completed
incomplete - Incomplete (interrupted or failed)

RealtimeContentPart

A content part within a message.

RealtimeInputTextContentPart

Text content part.

Field	Type	Description
type	string	Must be `"input_text"`.
text	string	The text content

RealtimeOutputTextContentPart

Text content part.

Field	Type	Description
type	string	Must be `"text"`.
text	string	The text content

RealtimeInputAudioContentPart

Audio content part.

Field	Type	Description
type	string	Must be `"input_audio"`.
audio	string	Optional. Base64-encoded audio data
transcript	string	Optional. Audio transcript

RealtimeOutputAudioContentPart

Audio content part.

Field	Type	Description
type	string	Must be `"audio"`.
audio	string	Base64-encoded audio data
transcript	string	Optional. Audio transcript

Response Objects

RealtimeResponse

The response object that represents a model inference response.

Field	Type	Description
id	string	Optional. Response ID
object	string	Optional. Always `"realtime.response"`
status	RealtimeResponseStatus	Optional. Response status
status_details	RealtimeResponseStatusDetails	Optional. Status details
output	RealtimeConversationResponseItem[]	Optional. Output items
usage	RealtimeUsage	Optional. Token usage statistics
conversation_id	string	Optional. Associated conversation ID
voice	RealtimeVoice	Optional. Voice used for the response
modalities	string[]	Optional. Output modalities used
output_audio_format	RealtimeOutputAudioFormat	Optional. Audio format used
temperature	number	Optional. Temperature used
max_response_output_tokens	integer or "inf"	Optional. Maximum tokens used

RealtimeResponseStatus

Response status values.

Allowed Values:

in_progress - Response is being generated
completed - Response completed
canceled - Response was canceled
incomplete - Response incomplete (interrupted)
failed - Response failed with an error

RealtimeUsage

Token usage statistics.

Field	Type	Description
total_tokens	integer	Total tokens used
input_tokens	integer	Input tokens used
output_tokens	integer	Output tokens generated
input_token_details	TokenDetails	Breakdown of input tokens
output_token_details	TokenDetails	Breakdown of output tokens

TokenDetails

Detailed token usage breakdown.

Field	Type	Description
cached_tokens	integer	Optional. Cached tokens used
text_tokens	integer	Optional. Text tokens used
audio_tokens	integer	Optional. Audio tokens used

Error Handling

RealtimeErrorDetails

Error information object.

Field	Type	Description
type	string	Error type (for example, `"invalid_request_error"`, `"server_error"`)
code	string	Optional. Specific error code
message	string	Human-readable error description
param	string	Optional. Parameter related to the error
event_id	string	Optional. ID of the client event that caused the error

RealtimeConversationRequestItem

You use the RealtimeConversationRequestItem object to create a new item in the conversation through the conversation.item.create event.

This is a union type that can be one of the following:

RealtimeSystemMessageItem

A system message item.

Field	Type	Description
type	string	The type of the item. Allowed value: `message`
role	string	The role of the message. Allowed value: `system`
content	RealtimeInputTextContentPart[]	The content of the message.
id	string	The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeUserMessageItem

A user message item.

Field	Type	Description
type	string	The type of the item. Allowed value: `message`
role	string	The role of the message. Allowed value: `user`
content	array of RealtimeInputTextContentPart or RealtimeInputAudioContentPart	The content of the message.
id	string	The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeAssistantMessageItem

An assistant message item.

Field	Type	Description
type	string	The type of the item. Allowed value: `message`
role	string	The role of the message. Allowed value: `assistant`
content	RealtimeOutputTextContentPart[]	The content of the message.

RealtimeFunctionCallItem

A function call item.

Field	Type	Description
type	string	The type of the item. Allowed value: `function_call`
name	string	The name of the function to call.
arguments	string	The arguments of the function call as a JSON string.
call_id	string	The ID of the function call item.
id	string	The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeFunctionCallOutputItem

A function call output item.

Field	Type	Description
type	string	The type of the item. Allowed value: `function_call_output`
call_id	string	The ID of the function call item.
output	string	The output of the function call. This is a free-form string containing the function result, and it can also be empty.
id	string	The unique ID of the item. If the client doesn't provide an ID, the server generates one.

RealtimeMCPApprovalResponseItem

An MCP approval response item.

Field	Type	Description
type	string	The type of the item. Allowed value: `mcp_approval_response`
approve	boolean	Whether the MCP request is approved.
approval_request_id	string	The ID of the MCP approval request.
id	string	The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.
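
As an illustration of how an approval response might be sent, the sketch below wraps a RealtimeMCPApprovalResponseItem in a conversation.item.create event. The item fields come from the table above; the outer envelope with an `item` key is an assumption, since the conversation.item.create structure isn't shown in this part of the reference, and `approval_response_event` is a hypothetical helper name.

```python
import json
from typing import Optional


def approval_response_event(
    approval_request_id: str, approve: bool, item_id: Optional[str] = None
) -> str:
    """Build a conversation.item.create event carrying an MCP approval response."""
    item = {
        "type": "mcp_approval_response",
        "approval_request_id": approval_request_id,
        "approve": approve,
    }
    if item_id is not None:
        item["id"] = item_id  # optional; the server generates an ID when it's omitted
    # Assumed envelope: the item is nested under an "item" key.
    return json.dumps({"type": "conversation.item.create", "item": item})
```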

RealtimeFunctionTool

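The definition of a function tool as used by the realtime endpoint.

Field	Type	Description
type	string	The type of the tool. Allowed value: `function`
name	string	The name of the function.
description	string	The description of the function, including usage guidelines. For example, "Use this function to get the current time."
parameters	object	The parameters of the function in the form of a JSON object.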

RealtimeItemStatus

Allowed Values:

in_progress
completed
incomplete

RealtimeResponseAudioContentPart

Field	Type	Description
type	string	내용의 종류. 허용 값: `audio`
transcript	string	오디오 대본입니다. 이 속성은 공무 가능하다.

RealtimeResponseFunctionCallItem

Field	Type	Description
type	string	물건의 종류. 허용 값: `function_call`
name	string	함수 호출 항목의 이름입니다.
call_id	string	함수 호출 항목의 ID입니다.
arguments	string	함수의 인자 집합이 항목을 호출합니다.
status	RealtimeItemStatus	아이템의 상태.

RealtimeResponseFunctionCallOutputItem

Field	Type	Description
type	string	The type of the item. Allowed values: `function_call_output`
call_id	string	The ID of the function call item.
output	string	The output of the function call item.

RealtimeResponseOptions

Field	Type	Description
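modalities	array	The output modalities for the response. Allowed values: `text`, `audio`. For example, `"modalities": ["text", "audio"]` is the default setting, which enables both text and audio output. To enable only text output, set `"modalities": ["text"]`. You can't enable only audio.
instructions	string	The instructions (the system message) that guide the model's responses.
voice	RealtimeVoice	The voice used for the model response in the session. Once the voice has been used for a model audio response in the session, it can't be changed.
tools	array of RealtimeTool	The tools available to the model for the session.
tool_choice	RealtimeToolChoice	The tool choice for the session.
temperature	number	The sampling temperature for the model. Allowed values are limited to the range [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens	integer or "inf"	The maximum number of output tokens per assistant response, including tool calls. Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens. For example, to limit the output tokens to 1000, set `"max_response_output_tokens": 1000`. To allow the maximum number of tokens, set `"max_response_output_tokens": "inf"`. Defaults to `"inf"`.
reasoning_effort	ReasoningEffort	Optional. Constrains the reasoning effort for reasoning models. Check the model documentation for the values each model supports. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning.
conversation	string	Controls which conversation the response is added to. The supported values are `auto` and `none`. The `auto` value (or not setting this property) ensures that the contents of the response are added to the session's default conversation. Set this property to `none` to create an out-of-band response whose items aren't added to the default conversation. Defaults to `"auto"`.
metadata	map	A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be at most 64 characters long and values at most 512 characters long. For example: `metadata: { topic: "classification" }`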
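Several of the fields above carry numeric or enumerated constraints, so a client may want to check them before sending a request. The following is a minimal sketch of such a check, covering only the limits stated in the table. The function name `validate_response_options` is hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List


def validate_response_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    problems: List[str] = []

    # Audio-only output is not allowed, so "text" must always be present.
    modalities = options.get("modalities", ["text", "audio"])
    if "text" not in modalities or not set(modalities) <= {"text", "audio"}:
        problems.append("modalities must be ['text'] or ['text', 'audio']")

    temperature = options.get("temperature", 0.8)
    if not 0.6 <= temperature <= 1.2:
        problems.append("temperature must be within [0.6, 1.2]")

    max_tokens = options.get("max_response_output_tokens", "inf")
    if max_tokens != "inf":
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or not 1 <= max_tokens <= 4096:
            problems.append("max_response_output_tokens must be 1..4096 or 'inf'")

    if options.get("conversation", "auto") not in ("auto", "none"):
        problems.append("conversation must be 'auto' or 'none'")

    metadata = options.get("metadata", {})
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(str(key)) > 64 or len(str(value)) > 512:
            problems.append(f"metadata entry {key!r} exceeds the 64/512 character limits")

    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem at once.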

RealtimeResponseSession

The RealtimeResponseSession object represents a session in the Realtime API. It's used in the following server events:

session.created
session.updated

Field	Type	Description
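object	string	The session object. Allowed values: `realtime.session`
id	string	The unique ID of the session.
model	string	The model used for the session.
modalities	array	The output modalities for the session. Allowed values: `text`, `audio`. For example, `"modalities": ["text", "audio"]` is the default setting, which enables both text and audio output. To enable only text output, set `"modalities": ["text"]`. You can't enable only audio.
instructions	string	The instructions (the system message) that guide the model's text and audio responses. Here are some example instructions to help guide the content and format of text and audio responses: `"instructions": "be succinct"`, `"instructions": "act friendly"`, `"instructions": "here are examples of good responses"`. Here are some example instructions to help guide audio behavior: `"instructions": "talk quickly"`, `"instructions": "inject emotion into your voice"`, `"instructions": "laugh frequently"`. The model might not always follow these instructions, but they provide guidance on the desired behavior.
voice	RealtimeVoice	The voice used for the model response in the session. Once the voice has been used for a model audio response in the session, it can't be changed.
input_audio_sampling_rate	integer	The sampling rate of the input audio.
input_audio_format	RealtimeAudioFormat	The format of the input audio.
output_audio_format	RealtimeAudioFormat	The format of the output audio.
input_audio_transcription	RealtimeAudioInputTranscriptionSettings	The settings for audio input transcription. This property is nullable.
turn_detection	RealtimeTurnDetection	The turn detection settings for the session. This property is nullable.
tools	array of RealtimeTool	The tools available to the model for the session.
tool_choice	RealtimeToolChoice	The tool choice for the session.
temperature	number	The sampling temperature for the model. Allowed values are limited to the range [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens	integer or "inf"	The maximum number of output tokens per assistant response, including tool calls. Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens. For example, to limit the output tokens to 1000, set `"max_response_output_tokens": 1000`. To allow the maximum number of tokens, set `"max_response_output_tokens": "inf"`.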
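Two fields of the session object are nullable, which is easy to overlook when reading it on the client. The sketch below shows one way to summarize a session object while guarding against `null` values. The function name `summarize_session` and the keys of the returned summary are hypothetical, and the code assumes the session has already been extracted from a `session.created` or `session.updated` event.

```python
from typing import Any, Dict


def summarize_session(session: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a few fields from a session object, tolerating null values."""
    if session.get("object") != "realtime.session":
        raise ValueError("not a realtime.session object")
    # Both of these properties can be null (None after JSON decoding).
    turn_detection = session.get("turn_detection")
    transcription = session.get("input_audio_transcription")
    return {
        "id": session["id"],
        "model": session["model"],
        "audio_enabled": "audio" in session.get("modalities", []),
        "turn_detection_type": turn_detection["type"] if turn_detection else None,
        "transcription_enabled": transcription is not None,
    }
```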

RealtimeResponseStatusDetails

Field	Type	Description
type	RealtimeResponseStatus	The status of the response.

RealtimeRateLimitsItem

Field	Type	Description
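name	string	The name of the rate limit property that this item describes.
limit	integer	The maximum configured limit for this rate limit property.
remaining	integer	The remaining quota available against the configured limit for this rate limit property.
reset_seconds	number	The time remaining, in seconds, until this rate limit property is reset.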
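A client can use these fields to decide whether to pause before sending more work. The following is a minimal sketch of one possible policy: wait only when some limit is exhausted, and then wait for the longest reset among the exhausted limits. The function name `seconds_until_available` and the policy itself are assumptions, not behavior defined by the API.

```python
from typing import Any, Dict, List


def seconds_until_available(rate_limits: List[Dict[str, Any]]) -> float:
    """Return how many seconds to wait before sending more requests.

    Returns 0.0 when no limit is exhausted; otherwise returns the longest
    `reset_seconds` among the limits whose `remaining` quota is used up.
    """
    exhausted = [
        item["reset_seconds"] for item in rate_limits if item["remaining"] <= 0
    ]
    return float(max(exhausted, default=0.0))
```

The result can be passed to `time.sleep` or `asyncio.sleep` before retrying.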

Try the Voice Live quickstart
Try the Voice Live agents quickstart
Learn more about how to use the Voice Live API

Last updated on 2026-05-08
