Query a chat model

Important

A new Unity AI Gateway experience is available in Beta. The new Unity AI Gateway is an enterprise control plane for managing LLM endpoints and coding agents with enhanced capabilities. See AI governance with Unity AI Gateway.

This article shows how to write query requests for foundation models that are optimized for chat and general-purpose tasks and served through Unity AI Gateway.

Tip

Genie Code (Agent mode) can perform this task for you. Try the following example prompt:

Query the databricks-claude-sonnet-4-5 chat model using the OpenAI client. Send a system prompt and a user question, and print the response.

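The examples in this article apply to querying foundation models that are made available through either of the following:

Foundation Model APIs, referred to as Databricks-hosted foundation models.
External models, referred to as foundation models hosted outside of Databricks.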

Requirements

See Requirements.
Install the appropriate package on your cluster based on the query client option you choose.

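Query examples

Note

The following examples are based on Unity AI Gateway and model services. If you use model serving endpoints instead of model services, replace the model service name with the endpoint name. For the list of available foundation models and their model service and endpoint names, see Databricks-hosted foundation models available in Foundation Model APIs.

The examples in this section show how to query a Foundation Model APIs pay-per-token model service using the different client options.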

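OpenAI chat completions

To use the OpenAI client, specify the model service name as the model input. The following example assumes that you have a Databricks API token and that openai is installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.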


from openai import OpenAI

# Replace the placeholders with your Databricks API token and workspace instance.
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/ai-gateway/mlflow/v1"
)

response = client.chat.completions.create(
    model="system.ai.claude-sonnet-4-5",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
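)

# Print the text of the first choice returned by the model.
print(response.choices[0].message.content)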
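
Instead of hardcoding credentials, the client settings can be assembled from environment variables. The following is a minimal sketch, not an official API: the helper name `gateway_client_kwargs` is hypothetical, and the variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are assumed. The `/ai-gateway/mlflow/v1` path is taken from the example above.

```python
import os


def gateway_client_kwargs(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables.

    Hypothetical helper for illustration. DATABRICKS_HOST and
    DATABRICKS_TOKEN are assumed variable names.
    """
    env = os.environ if env is None else env
    host = env["DATABRICKS_HOST"].rstrip("/")
    # Accept hosts written with or without the https:// scheme.
    if not host.startswith("https://"):
        host = "https://" + host
    return {
        "api_key": env["DATABRICKS_TOKEN"],
        "base_url": host + "/ai-gateway/mlflow/v1",
    }
```

With both variables set, the client can then be created as `OpenAI(**gateway_client_kwargs())`, which keeps the token out of the source code.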

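As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.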

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
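
A request body of this shape can also be assembled and checked in code before it is sent. The following is a minimal sketch: `build_chat_request` is a hypothetical helper, the role check covers only the three roles that appear in this article, and the string-only content check is a simplification for this sketch.

```python
import json


def build_chat_request(messages, max_tokens=100, temperature=0.1):
    """Return a chat request body shaped like the example above."""
    allowed_roles = {"system", "user", "assistant"}
    for message in messages:
        if message.get("role") not in allowed_roles:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        # Simplification: this sketch only handles plain string content.
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs string content")
    return {
        "messages": list(messages),
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


body = build_chat_request(
    [{"role": "user", "content": "What is a mixture of experts model?"}]
)
print(json.dumps(body, indent=2))
```

Validating before sending surfaces malformed messages locally rather than as an error response from the endpoint.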

The following is the expected response format for a request made using the REST API:

{
  "model": "databricks-claude-sonnet-4-5",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
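
The fields of interest in such a response are usually the first choice's message and the token usage. The following is a minimal sketch of reading them from the response parsed into a Python dict; `summarize_chat_response` is a hypothetical helper, and it tolerates an empty `message` object like the one in the sample above.

```python
def summarize_chat_response(response):
    """Extract the first reply and token usage from a chat.completion dict."""
    choices = response.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    usage = response.get("usage") or {}
    return {
        # None when the message carries no content, as in the sample above.
        "content": message.get("content"),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


sample = {
    "model": "databricks-claude-sonnet-4-5",
    "choices": [{"message": {}, "index": 0, "finish_reason": None}],
    "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
    "object": "chat.completion",
}
summary = summarize_chat_response(sample)
```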

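OpenAI Responses

Important

This section describes the OpenAI Responses API, a native passthrough that supports the full set of OpenAI Responses parameters for OpenAI models. To use the Responses request format with Anthropic Claude, Google Gemini, or Databricks-hosted open models, see Query models using the Open Responses API.

To use the OpenAI Responses API, specify the model service name as the model input. The following example assumes that you have an Azure Databricks API token and that openai is installed on your compute. You also need your Azure Databricks workspace instance to connect the OpenAI client to Azure Databricks.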


from openai import OpenAI

# Replace the placeholders with your Azure Databricks API token and workspace instance.
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/ai-gateway/mlflow/v1"
)

response = client.responses.create(
    model="system.ai.gpt-5",
    input=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_output_tokens=256
)

As an example, the following is the expected request format when using the OpenAI Responses API. The URL path for this API is /serving-endpoints/responses.

{
  "model": "databricks-gpt-5",
  "input": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_output_tokens": 100,
  "temperature": 0.1
}

The following is the expected response format for a request made using the Responses API:

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1698824353,
  "model": "databricks-gpt-5",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": []
    }
  ],
  "usage": {
    "input_tokens": 7,
    "output_tokens": 74,
    "total_tokens": 81
  }
}
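
The reply text lives inside the `content` list of each `message` item in `output`. The following is a minimal sketch of collecting it from the response parsed into a Python dict. The sample above shows an empty `content` list, so this sketch assumes the entry shape that the OpenAI Responses API uses for text, `{"type": "output_text", "text": "..."}`; `collect_output_text` is a hypothetical helper.

```python
def collect_output_text(response):
    """Join the text parts of every message item in a Responses-style dict."""
    parts = []
    for item in response.get("output", []):
        # Skip non-message items, such as tool calls or reasoning items.
        if item.get("type") != "message":
            continue
        for block in item.get("content", []):
            # Assumed entry shape: {"type": "output_text", "text": "..."}
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


sample = {
    "id": "resp_abc123",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello."}],
        }
    ],
}
text = collect_output_text(sample)
```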

REST API

Important

The following example uses REST API parameters to query serving endpoints that serve external models. These parameters are in Public Preview and their definitions might change. See POST /serving-endpoints/{name}/invocations.

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/<your-external-model-endpoint>/invocations
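
The same request can be prepared from Python with only the standard library. The following is a minimal sketch that builds the request without sending it: `build_invocation_request` is a hypothetical helper, and the workspace URL, endpoint name, and token passed to it are placeholders. It mirrors the `-u token:$DATABRICKS_TOKEN` option as HTTP Basic authentication with the user name `token`.

```python
import base64
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, messages):
    """Build (but do not send) a POST request equivalent to the curl call."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # curl's `-u token:<value>` is HTTP Basic auth with the user name "token".
    credentials = base64.b64encode(f"token:{token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"messages": messages}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


request = build_invocation_request(
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    "my-external-model-endpoint",  # placeholder endpoint name
    "dapi-your-databricks-token",  # placeholder token
    [{"role": "user", "content": "What is a mixture of experts model?"}],
)
```

Passing `request` to `urllib.request.urlopen` sends it; the response body is JSON in the chat completion format.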

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os

import mlflow.deployments

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="system.ai.claude-sonnet-4-5",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model?"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)
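
The multi-turn `messages` list above can also be built up one turn at a time as a conversation progresses. The following is a minimal sketch: `add_turn` is a hypothetical helper, and the rule that user and assistant turns alternate is a convention this sketch enforces to match the example, not a documented requirement of the API.

```python
def add_turn(history, role, content):
    """Return a new message list with one turn appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role!r}")
    # Convention assumed by this sketch: the same role never speaks twice in a row.
    if history and history[-1]["role"] == role:
        raise ValueError("user and assistant turns must alternate")
    return history + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "Hello!")
history = add_turn(history, "assistant", "Hello! How can I assist you today?")
history = add_turn(history, "user", "What is a mixture of experts model?")
```

The resulting list can be passed as the `messages` value in the `inputs` dictionary of `client.predict()`.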

Databricks Python SDK

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="system.ai.claude-sonnet-4-5",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

Supported models

See Foundation model types for supported chat models.

Last updated on 2026-06-30