How to Identify Animals in Images Using GPT-4 Vision

May 12, 2024

GPT-4 Vision is a version of OpenAI’s GPT-4 model that accepts images as well as text as input. Unlike traditional computer vision pipelines that rely on handcrafted features, or classifiers restricted to a fixed set of labels, GPT-4 Vision is a large multimodal model that you query in natural language, so you can ask open-ended questions about an image. Thanks to its large-scale training, it can recognize a wide range of objects in images, including animals.

Before we dive into the code, let’s make sure the necessary tools are set up. First, you’ll need an API key from OpenAI, which you can create on the OpenAI website after signing up. Once you have your API key, install the required third-party Python libraries: requests, openai, and Pillow (which provides the PIL module). The base64, json, and os modules are part of the Python standard library and need no installation.
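Rather than pasting the key into the script, you may prefer to read it from an environment variable. The sketch below is one way to do that; the helper name load_api_key is hypothetical, and OPENAI_API_KEY is assumed to be the variable where you stored the key.

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    # Treat a missing or blank variable the same way
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return key
```

With this helper in place, the script could set api_key = load_api_key() instead of a hard-coded string, which keeps the key out of version control.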

This Python script identifies animals in images using the GPT-4 Vision model and generates insights about them using the GPT-3.5 Turbo Instruct model.

import base64
import requests
import openai
from PIL import Image
from io import BytesIO
import json
import os

# Set your OpenAI API key
api_key = "API_KEY"
openai.api_key = api_key

# Function to encode the image
def encode_image(image_data):
    return base64.b64encode(image_data).decode('utf-8')

# Function to perform animal identification using GPT-4 Vision
def identify_animal(image_data):
    base64_image = encode_image(image_data)

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {
        "model": "gpt-4-vision-preview",
        'messages': [
            {
                'role': 'system',
                'content': 'You have to give concise and short answers'
            },
            {
                'role': 'user',
                'content': [
                    {
                        'type': 'text',
                        'text': 'GPT, your task is to identify the animal in the image with precision. Analyze any animal image I provide and respond strictly with the name of the animal and nothing else: no explanations, no additional text. If the animal is unrecognizable, reply with \'I don\'t know\'. If the image is not animal-related, say \'Please pick another image\'',
                    },
                    {
                        'type': 'image_url',
                        'image_url': {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                        },
                    },
                ],
            },
        ],
        "max_tokens": 50
    }

    # Make the API request for animal identification
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

    # Raise an exception on HTTP errors (invalid key, rate limit, and so on)
    response.raise_for_status()

    # Parse the JSON body of the response
    json_data = response.json()

    # Extract the model's reply from the first choice
    content_value = json_data['choices'][0]['message']['content']

    return content_value.strip()

# Function to generate insights using GPT-3.5 Turbo Instruct
# Note: openai.Completion is the legacy interface of the openai package (versions before 1.0)
def generate_insights(animal_name):
    # Build the prompt as a single string
    prompt = f"""
    \n**Scientific name:** Share the scientific name of {animal_name}.
    \n**Description:** Give a 2 line description of the animal {animal_name}.
    """

    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=800,
        n=1,
        stop=None,
        temperature=0.1,
    )

    # Take the text of the first completion and trim surrounding whitespace
    insights = response.choices[0].text.strip()

    return insights

# Function to save conversation log
def log_conversation(image_data, animal_name, insights):
    log = {
        "image_data": base64.b64encode(image_data).decode('utf-8'),
        "animal_name": animal_name,
        "insights": insights
    }

    # Save log to a file
    filename = "conversation_log.json"
    if os.path.exists(filename):
        with open(filename, "r") as file:
            data = json.load(file)
            data.append(log)
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
    else:
        with open(filename, "w") as file:
            json.dump([log], file, indent=4)

def main():
    # Example usage
    image_path = "images.jpeg"
    with open(image_path, "rb") as file:
        image_data = file.read()

    # Identify animal using GPT-4 Vision
    animal_name = identify_animal(image_data)
    print("Animal Name:", animal_name)

    # Generate insights using GPT-3.5 Turbo Instruct
    insights = generate_insights(animal_name)
    print("Insights and Observations:", insights)

    # Log conversation
    log_conversation(image_data, animal_name, insights)

if __name__ == "__main__":
    main()
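
The prompt in identify_animal asks the model to reply "I don't know" or "Please pick another image" when it cannot name an animal, yet main() passes whatever comes back straight to generate_insights. A minimal sketch of a guard for that case follows; the names SENTINEL_REPLIES and is_animal_name are hypothetical, and the check assumes the model sticks closely to the wording requested in the prompt.

```python
# Fixed replies that the identification prompt asks the model to use
# when it cannot name an animal.
SENTINEL_REPLIES = ("i don't know", "please pick another image")

def is_animal_name(reply: str) -> bool:
    """Return True if the reply looks like an animal name, not a fallback reply."""
    # Normalize whitespace, surrounding quotes/punctuation, curly apostrophes, and case
    normalized = reply.strip().strip(".!\"'").replace("\u2019", "'").lower()
    if not normalized:
        return False
    return not any(normalized.startswith(s) for s in SENTINEL_REPLIES)
```

In main(), the call to generate_insights could then be wrapped in `if is_animal_name(animal_name):`, so that no second API request is made for an image the model could not identify.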

Output after running the above code

The log_conversation function saves a conversation log, including the uploaded image (Base64-encoded), the identified animal name, and the generated insights, to a JSON file named conversation_log.json.
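
To inspect the log later, the entries can be read back and the stored image decoded to its original bytes. This is a small sketch that assumes the log has the structure written by log_conversation (a JSON list of objects with image_data, animal_name, and insights keys); the helper names load_log_entries and decode_logged_image are hypothetical.

```python
import base64
import json

def load_log_entries(filename="conversation_log.json"):
    """Return the list of entries stored in the JSON log file."""
    with open(filename, "r") as file:
        return json.load(file)

def decode_logged_image(entry):
    """Return the original image bytes from an entry's Base64 'image_data' field."""
    return base64.b64decode(entry["image_data"])
```

For example, `decode_logged_image(load_log_entries()[-1])` returns the bytes of the most recently logged image, which can be written back to disk with a file opened in "wb" mode.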


Written by Shankar Sharma
