Harnessing OCR and OpenAI: A Structural Guide to Passport Text Extraction

Shankar Sharma
4 min readOct 4, 2024

--

😁😂😭

Input Image ☝️

In this blog post, I will walk you through a Python function designed to summarize the architecture of a Streamlit application that extracts text from passport images. This application leverages Optical Character Recognition (OCR) and OpenAI’s API for advanced text formatting. Understanding this architecture is crucial for anyone looking to develop similar applications or enhance their current implementations.

Function Overview

The function we’ll be discussing is called explain_code_architecture. Its purpose is to provide a structured overview of the code used for passport text extraction, detailing the components, parameters, returns, and the flow of execution. Below is the complete function definition:

def explain_code_architecture():
"""
Explains the architecture of the passport text extraction and formatting code.

Returns:
str: A detailed explanation of the code structure, including parameters, return values, and flow for each function.
"""

explanation = """
The code is designed to extracts text from an uploaded passport image using Optical Character Recognition (OCR) and formats the extracted text using the OpenAI API.

1. **Function: extract_text_from_image**

Parameters:
- image: An image file (in PIL format) uploaded by the user.

Returns:
- normalized_text: A string containing the normalized text extracted from the image.

Flow:
1. Convert the uploaded image from PIL format to OpenCV format.
2. Convert the image to grayscale to enhance the OCR process.
3. Apply thresholding to create a binary image, improving OCR accuracy.
4. Use pytesseract to extract text from the thresholded image.
5. Normalize specific characters in the extracted text.
6. Concatenate normalized lines into a single string and return.

2. **Function: process_text_with_openai**

Parameters:
- text: The extracted text to be processed.
- prompt_template: A template string used to format the input text for the OpenAI API.

Returns:
- A string containing the formatted output from the OpenAI API.

Flow:
1. Set the OpenAI API key for authorization.
2. Format the provided text into a specific prompt template.
3. Call the OpenAI API with the formatted prompt to obtain a response.
4. Extract and return the relevant formatted information from the response.

3. **Function: main**

Parameters:
- None

Returns:
- None (directly interacts with the Streamlit interface).

Flow:
1. Set the app title in the Streamlit interface.
2. Create a sidebar for file upload, allowing users to upload an image of the passport.
3. Display the uploaded image in the sidebar.
4. Call the extract_text_from_image function to extract text from the uploaded image.
5. Display the extracted text in a text area for user visibility.
6. Define a prompt template for formatting the extracted text.
7. Call the process_text_with_openai function to format the extracted text.
8. Display the formatted output in another text area for user visibility.

Summary:
- The code consists of three main components: two functions for processing (text extraction and text processing) and a main function that ties together the user interface and the processing logic.
- The architecture effectively utilizes Streamlit for the web interface, OpenCV and Pytesseract for image processing, and OpenAI's API for advanced text processing and formatting.
"""

return explanation.strip()

Understanding the Function Components

1. Extracting Text from Images

The extract_text_from_image function is responsible for processing the uploaded image. It converts the image format, applies grayscale transformation, and uses thresholding techniques to enhance OCR accuracy. The extracted text is then normalized to correct any specific character misinterpretations that may occur during the OCR process.

2. Processing Text with OpenAI

The process_text_with_openai function takes the extracted text and formats it for OpenAI’s API. This function sets the necessary API key, constructs a prompt based on the text input, and sends a request to OpenAI. It receives a formatted response that is easy to read and understand, which is critical for displaying passport details correctly.

3. Streamlit Application Structure

The main function serves as the entry point of the application. It sets up the user interface, allowing users to upload passport images and displaying both the extracted and formatted text. This function integrates the other components, ensuring a smooth user experience and proper data flow from image upload to text extraction and formatting.

Extracted Text from Image:

TS AND LIMITAT

oma
PASSPORT OR]
mIERSSEPORT: TypelType _Issuing Country/Pays émetteur Passport No./N* de passeport
© amon CAN 6coo0000

SANTA

Given names/Prénoms

CLAUS

Nationality’Nationaité
LAPAND/CANADIENNE

Date of birth/Date de naissance

25 DEC /DEC O00

Sex/Sexe Place of bith/Lieu de naissance

M/F wwW.EDITABLE-TEMPLATES.cc

Date o ssue/Date de déivrance

25 DEC /DEC 00

Date of expiryDate expiration

25 DEC /DEC 00
Issuing Authoriy/Autorté de délvrance

LAPAND NOVA SCOTIA

44490?

P<CANSANTA<KCLAUS <<<<<<<<<<K KKK KKK KKK KKK KKK
GCOOD000<OCANODD0000M0000000<<<<<<<<<<<<<<00

NS ET RESTRICTIONS,

Formatted Output from Image:

- Type: P
- Country code: CAN
- Passport No: GCOO0000
- Surname: Claus
- Given Name(s): Santa
- Nationality: Lapland/Canadian
- Sex: M
- Date of Birth: 25 Dec 2000
- Place of Birth: Lapland
- Place of Issue: Nova Scotia
- Date of Issue: 25 Dec 2000
- Date of Expiry: 25 Dec 2000
- Machine Readable Zone: GCOOD000<OCANODD0000M0000000

The explain_code_architecture function is a powerful tool for understanding the underlying structure and functionality of the passport text extraction application. By summarizing the code’s architecture, it provides clarity and serves as a valuable reference for developers aiming to build or enhance similar applications.

--

--

No responses yet