Develop a vision-based chat app

5 minutes

Tip

For more details, see the Text and image tab.

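To develop a client app that takes part in vision-based chats with a multimodal model, you can use the same basic techniques as for text-based chats. The app needs a connection to the endpoint where the model is deployed, and it uses that endpoint to submit prompts made up of messages to the model and to process the responses.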

The key difference is that a prompt for a vision-based chat includes a multi-part user message that contains both a text content item and an image content item.
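As a rough illustration of the multi-part structure described above, the following sketch builds a user message with one text item and one image item as a plain Python dictionary. The helper name `build_user_message` and the example URL are hypothetical, and the item type names (`input_text`, `input_image`) follow the Responses API format; other APIs name these items differently.

```python
def build_user_message(text, image_url):
    """Return a multi-part user message with a text item and an image item.

    Hypothetical helper for illustration; it only builds the dictionary and
    does not call any service.
    """
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": text},
            {"type": "input_image", "image_url": image_url},
        ],
    }


message = build_user_message(
    "What desserts could I make with this?",
    "https://example.com/dragon-fruit.jpeg",  # placeholder URL, not a real image
)
```

The resulting `message` could then be placed in the list of messages that forms the prompt.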

Diagram of a multi-part prompt being submitted to a model.

Submit an image-based prompt using the Responses API

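To include an image in a prompt using the Responses API, either specify the URL of a web-hosted image file, or load a local image, encode its data in Base64 format, and submit it as a data URL in the form data:image/jpeg;base64,{image_data} (replace "jpeg" with "png" or another format to match the image).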

The following Python example shows how to submit an image in a prompt using the Responses API.

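import base64
from pathlib import Path

# Assumes `client` is an OpenAI client already connected to the model endpoint

# Read the image data from a local file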
image_path = Path("dragon-fruit.jpeg")
image_format = "jpeg"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL

# Send the image data in a prompt to the model
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "developer", "content": "You are an AI assistant for chefs planning recipes."},
        {"role": "user", "content": [  
            { "type": "input_text", "text": "What desserts could I make with this?"},
            { "type": "input_image", "image_url": data_url}
        ] } 
    ]
)
print(response.output_text)
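The example above hard-codes the image format. As one possible variation, the sketch below derives the format from the file extension with the standard `mimetypes` module and returns the finished data URL. The helper name `to_data_url` is hypothetical and is not part of any SDK.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path):
    """Encode a local image file as a data URL (hypothetical helper)."""
    path = Path(image_path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {path.name}")
    # Base64-encode the raw bytes and wrap them in the data URL format
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Given a local file named dragon-fruit.jpeg, `to_data_url("dragon-fruit.jpeg")` would return a string that begins with `data:image/jpeg;base64,` and can be used as the `image_url` value in the prompt.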

Submit an image-based prompt using the ChatCompletions API

When you use an Azure OpenAI endpoint to submit prompts to a model that doesn't support the Responses API, you can use the ChatCompletions API instead, like this:

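import base64
from pathlib import Path

# Assumes `client` is an OpenAI client already connected to the model endpoint

# Read the image data from a local file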
image_path = Path("orange.jpeg")
image_format = "jpeg"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL

# Send the image data in a prompt to the model
response = client.chat.completions.create(
    model="Phi-4-multimodal-instruct",
    messages=[
        {"role": "system", "content": "You are an AI assistant for chefs planning recipes."},
        { "role": "user", "content": [  
            { "type": "text", "text": "What can I make with this fruit?"},
            { "type": "image_url", "image_url": {"url": data_url}}
        ] }
    ]
)
print(response.choices[0].message.content)
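The two examples differ mainly in the names of the content item types and in how the image URL is nested. As a sketch of that difference, the hypothetical helper below converts Responses-style content items into the ChatCompletions shape. It handles only the two item types shown in this unit and does not call either API.

```python
def to_chat_content(items):
    """Convert Responses-style content items to ChatCompletions-style items.

    Hypothetical helper; covers only "input_text" and "input_image".
    """
    converted = []
    for item in items:
        if item["type"] == "input_text":
            converted.append({"type": "text", "text": item["text"]})
        elif item["type"] == "input_image":
            # ChatCompletions nests the URL inside an object under "image_url"
            converted.append(
                {"type": "image_url", "image_url": {"url": item["image_url"]}}
            )
        else:
            raise ValueError(f"Unsupported content item type: {item['type']}")
    return converted


responses_items = [
    {"type": "input_text", "text": "What can I make with this fruit?"},
    {"type": "input_image", "image_url": "data:image/jpeg;base64,AAAA"},  # dummy data
]
chat_items = to_chat_content(responses_items)
```

A mapping like this keeps the prompt-building code in one place when an app has to target models behind both APIs.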
