Create a Content Understanding analyzer

5 minutes

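Tip

For more details, see the Text and images tab.

In most scenarios, the visual interface in Content Understanding Studio is the recommended way to create and test an analyzer. In some cases, however, you might prefer to create an analyzer by submitting a JSON definition of the schema for your desired content fields to the API.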

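Define an analyzer schema

An analyzer is based on a schema that defines the fields you want to extract or generate from a content file. At its simplest, a schema is a set of fields that you can specify in a JSON document, as shown in this example analyzer definition.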

{
    "description": "Simple business card",
    "baseAnalyzerId": "prebuilt-document",
    "config": {
        "returnDetails": true
    },
    "fieldSchema": {
        "fields": {
            "ContactName": {
                "type": "string",
                "method": "extract",
                "description": "Name on business card"
            },
            "EmailAddress": {
                "type": "string",
                "method": "extract",
                "description": "Email address on business card"
            }
        }
    },
    "models": {
        "completion": "gpt-4.1",
        "embedding": "text-embedding-3-large"
    }
}

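This example of a custom analyzer schema is based on the prebuilt document analyzer and describes two fields you would expect to find on a business card: ContactName and EmailAddress. Both fields are defined as the string data type and must be extracted from the document. In other words, the string values must be present in the document so they can be "read," as opposed to fields that are generated by inferring information about the document. The models object specifies the generative models that the analyzer uses for processing.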
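To make the structure concrete, the following minimal sketch walks a definition shaped like the one above and groups its field names by their method value. The helper name fields_by_method is hypothetical and introduced only for illustration; it is not part of any SDK, and the models object is omitted for brevity.

```python
from collections import defaultdict

# Same shape as the business card definition above (models omitted for brevity)
definition = {
    "description": "Simple business card",
    "baseAnalyzerId": "prebuilt-document",
    "config": {"returnDetails": True},
    "fieldSchema": {
        "fields": {
            "ContactName": {
                "type": "string",
                "method": "extract",
                "description": "Name on business card",
            },
            "EmailAddress": {
                "type": "string",
                "method": "extract",
                "description": "Email address on business card",
            },
        }
    },
}


def fields_by_method(analyzer_definition):
    """Group field names by their "method" value (hypothetical helper)."""
    grouped = defaultdict(list)
    fields = analyzer_definition.get("fieldSchema", {}).get("fields", {})
    for name, spec in fields.items():
        grouped[spec.get("method", "unspecified")].append(name)
    return dict(grouped)


print(fields_by_method(definition))
```

A quick check like this can help confirm, before submitting a definition, that every field is marked for extraction as intended.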

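Note

This example is intentionally simple, with the minimum information needed to create a working analyzer. In practice, the schema is likely to include more fields of different types, and the analyzer definition is likely to include more configuration settings. The JSON might also include a sample document. For more details, see the Azure Content Understanding API documentation.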
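As a rough illustration of how a schema might grow, the sketch below adds fields to the minimal definition and serializes the result to JSON. The add_field helper and the Industry field are hypothetical, and the "generate" method value is an assumption based on the extract/generate distinction described earlier; check the API documentation for the values your API version supports.

```python
import json

# Starting point: a cut-down version of the minimal definition from this unit
definition = {
    "description": "Simple business card",
    "baseAnalyzerId": "prebuilt-document",
    "config": {"returnDetails": True},
    "fieldSchema": {
        "fields": {
            "ContactName": {
                "type": "string",
                "method": "extract",
                "description": "Name on business card",
            }
        }
    },
    "models": {"completion": "gpt-4.1", "embedding": "text-embedding-3-large"},
}


def add_field(analyzer_definition, name, description,
              field_type="string", method="extract"):
    """Hypothetical helper: add one field entry to the schema in place."""
    analyzer_definition["fieldSchema"]["fields"][name] = {
        "type": field_type,
        "method": method,
        "description": description,
    }


add_field(definition, "EmailAddress", "Email address on business card")
# Assumption: "generate" marks a field that is inferred rather than read
add_field(definition, "Industry", "Likely industry of the company",
          method="generate")

# Serialize to JSON text, for example to save as card.json
card_json = json.dumps(definition, indent=4)
print(card_json)
```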

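Create an analyzer by using the Python SDK

With an analyzer definition in hand, you can use the Python SDK to create the analyzer. The ContentUnderstandingClient class provides a begin_create_analyzer method that handles the asynchronous creation process.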

from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client
endpoint = "<YOUR_ENDPOINT>"
credential = AzureKeyCredential("<YOUR_API_KEY>")
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)

# Define the analyzer
analyzer_name = "business_card_analyser"
analyzer_definition = {
    "description": "Simple business card",
    "baseAnalyzerId": "prebuilt-document",
    "config": {"returnDetails": True},
    "fieldSchema": {
        "fields": {
            "ContactName": {
                "type": "string",
                "method": "extract",
                "description": "Name on business card"
            },
            "EmailAddress": {
                "type": "string",
                "method": "extract",
                "description": "Email address on business card"
            }
        }
    },
    "models": {
        "completion": "gpt-4.1",
        "embedding": "text-embedding-3-large"
    }
}

# Create the analyzer and wait for completion
poller = client.begin_create_analyzer(analyzer_name, body=analyzer_definition)
result = poller.result()
print(f"Analyzer created: {result.analyzer_id}")
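If the definition is kept as JSON text rather than written inline, it can be parsed into the same kind of dictionary used above. The short sketch below, which does not call the SDK, shows that the JSON literal true becomes the Python value True when parsed; the card_json string is a cut-down stand-in for the full definition.

```python
import json

# Cut-down stand-in for the JSON analyzer definition shown earlier
card_json = """
{
    "description": "Simple business card",
    "baseAnalyzerId": "prebuilt-document",
    "config": {"returnDetails": true}
}
"""

# json.loads converts JSON true/false/null to Python True/False/None
analyzer_definition = json.loads(card_json)
print(type(analyzer_definition["config"]["returnDetails"]))

# json.dumps converts back, so the round trip preserves the JSON literal
round_trip = json.dumps(analyzer_definition)
print(round_trip)
```

Parsing the file this way avoids hand-translating the JSON into a Python literal and keeps a single source for the definition.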

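Create an analyzer by using the REST API

Alternatively, you can use the REST API directly. The JSON data is submitted to the endpoint as a PUT request, with the API key in the request header, to start the analyzer creation operation.

The response to the PUT request includes an Operation-Location header that provides a callback URL. You can submit a GET request to this URL to check the status of the operation.

The following Python code submits a request to create an analyzer based on the contents of a file named card.json (which is assumed to contain the JSON definition described earlier).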

import json
import time

import requests

# Load the business card schema
with open("card.json", "r") as file:
    schema_json = json.load(file)

# Use a PUT request to submit the schema for a new analyzer
analyzer_name = "business_card_analyser"
endpoint = "<YOUR_ENDPOINT>"

headers = {
    "Ocp-Apim-Subscription-Key": "<YOUR_API_KEY>",
    "Content-Type": "application/json",
}

url = f"{endpoint}/contentunderstanding/analyzers/{analyzer_name}?api-version=2025-11-01"

response = requests.put(url, headers=headers, data=json.dumps(schema_json))
response.raise_for_status()

# The Operation-Location response header holds the callback URL for the operation
callback_url = response.headers["Operation-Location"]

# Use a GET request to check the status of the operation
result_response = requests.get(callback_url, headers=headers)
status = result_response.json().get("status")

# Keep polling, pausing between requests, until the operation is no longer running
while status == "Running":
    time.sleep(1)
    result_response = requests.get(callback_url, headers=headers)
    status = result_response.json().get("status")

print(f"Done! Final status: {status}")
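A polling loop like the one above runs indefinitely if the operation never leaves the Running state. One possible refinement, sketched here under stated assumptions, is to wrap the loop in a helper with a timeout. The wait_for_operation name and its parameters are hypothetical, and the "Succeeded" value in the simulated sequence is only a placeholder for whatever final status the service reports.

```python
import time


def wait_for_operation(get_status, pending=("Running",), interval_seconds=1.0,
                       timeout_seconds=60.0, sleep=time.sleep):
    """Call get_status() until it returns a status not listed in pending.

    Raises TimeoutError if the status is still pending at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    status = get_status()
    while status in pending:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Operation still {status!r} after {timeout_seconds} seconds"
            )
        sleep(interval_seconds)
        status = get_status()
    return status


# Simulated status sequence standing in for repeated GET requests
fake_statuses = iter(["Running", "Running", "Succeeded"])
final_status = wait_for_operation(
    lambda: next(fake_statuses),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(final_status)
```

With the REST example above, get_status could be a function that returns requests.get(callback_url, headers=headers).json().get("status"), leaving the default sleep in place.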
