Introduction to Optical Character Recognition (OCR) using OpenCV in Python

Optical Character Recognition (OCR) is a technology that converts documents such as scanned images or PDF files into editable, searchable text. In this tutorial, we use OpenCV in Python to prepare images and the Tesseract OCR engine to extract text from them.

Table of Contents

  1. Prerequisites
  2. Installing OpenCV and Tesseract
  3. Image Preprocessing
  4. Text Recognition with Tesseract
  5. Improving Accuracy
  6. Conclusion

Prerequisites

To follow along with this tutorial, you should have a basic understanding of Python and OpenCV. It's also helpful to have some experience with image processing techniques.

Installing OpenCV and Tesseract

First, we need to install two Python packages: opencv-python, which provides OpenCV, and pytesseract, a Python wrapper around the Tesseract OCR engine. You can install them using the following commands:

pip install opencv-python
pip install pytesseract

Additionally, you'll need to install the Tesseract binary for your operating system. You can find the installation instructions for different platforms in the Tesseract GitHub repository.
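Before moving on, it can help to confirm that all three pieces are visible from your Python interpreter. The following is a minimal sketch using only the standard library; the helper name check_ocr_environment is hypothetical, and it assumes the Tesseract executable is named "tesseract" and is on your PATH.

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which pieces of the OCR toolchain are visible from this interpreter."""
    return {
        # The pip packages install importable modules named cv2 and pytesseract.
        "cv2": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # By default pytesseract runs an executable named "tesseract" found on PATH.
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


for name, found in check_ocr_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If the binary is installed somewhere that is not on your PATH, you can point pytesseract at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.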

Image Preprocessing

To improve the accuracy of our OCR system, we should preprocess the input images. Some common preprocessing techniques include resizing, grayscaling, thresholding, and noise removal.

Let's start by loading an image and converting it to grayscale:

import cv2

image = cv2.imread('input_image.jpg')
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

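Next, we'll apply thresholding to turn the grayscale image into a pure black-and-white one. The THRESH_OTSU flag tells OpenCV to choose the threshold automatically from the image histogram, and THRESH_BINARY_INV inverts the result, so dark text on a light page becomes white text on a black background: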

_, thresholded_image = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

Now, let's clean up the thresholded image with a morphological operation. Closing (a dilation followed by an erosion) fills small gaps and holes in the white text strokes:

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
denoised_image = cv2.morphologyEx(thresholded_image, cv2.MORPH_CLOSE, kernel)

Text Recognition with Tesseract

After preprocessing the image, we can now use Tesseract to extract text from it:

import pytesseract

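# Tesseract 4.x and later generally work best with dark text on a light
# background, so invert the white-on-black image before recognition.
text = pytesseract.image_to_string(cv2.bitwise_not(denoised_image))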
print(text)

This prints the text that Tesseract recognized in the input image.
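Besides plain text, pytesseract can return per-word confidence scores through image_to_data, which makes it possible to discard unreliable words. The following is a minimal sketch; the helper names confident_words and ocr_with_confidence and the cutoff of 60 are assumptions for illustration, and the sample dictionary is hand-written rather than real Tesseract output.

```python
def confident_words(data, min_conf=60.0):
    """Keep (word, confidence) pairs whose confidence is at least `min_conf`.

    `data` is expected to look like the dictionary returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    which has parallel "text" and "conf" lists.
    """
    words = []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some pytesseract versions return confidences as strings
        if word.strip() and conf >= min_conf:
            words.append((word, conf))
    return words


def ocr_with_confidence(image, min_conf=60.0):
    """Run Tesseract on `image` and return only the confidently recognized words."""
    import pytesseract  # imported here so the filter above works without Tesseract

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)


# Hand-written stand-in for Tesseract output; -1 marks rows that are not words.
fake = {"text": ["", "Hello", "w0rld"], "conf": ["-1", "96.5", "41.2"]}
print(confident_words(fake))  # [('Hello', 96.5)]
```

With a real image you would call ocr_with_confidence on the preprocessed image instead of building the dictionary by hand.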

Improving Accuracy

To further improve the accuracy of our OCR system, you can experiment with different preprocessing techniques or fine-tune Tesseract's parameters. For example, you can set the OCR engine mode (OEM) and the page segmentation mode (PSM) as follows:

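# --oem 3: use the default engine for the installed Tesseract build
# --psm 6: assume the image contains a single uniform block of text
custom_config = r'--oem 3 --psm 6'
# Invert the white-on-black image so Tesseract sees dark text on a light background.
text = pytesseract.image_to_string(cv2.bitwise_not(denoised_image), config=custom_config)
print(text)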

You can find more information on Tesseract's parameters in the official documentation.
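When experimenting with several modes, it can be convenient to build the config string from numbers instead of editing it by hand. The helper below is a hypothetical sketch, not part of pytesseract; the mode descriptions paraphrase the list printed by tesseract --help-psm.

```python
# A few commonly used page segmentation modes.
PSM_MODES = {
    3: "fully automatic page segmentation, no orientation detection (default)",
    6: "a single uniform block of text",
    7: "a single text line",
    8: "a single word",
    11: "sparse text, in no particular order",
}


def build_config(psm=6, oem=3):
    """Build a Tesseract config string, rejecting values outside the documented ranges."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")
    return f"--oem {oem} --psm {psm}"


print(build_config(psm=7))  # --oem 3 --psm 7
```

The returned string can be passed as the config argument of pytesseract.image_to_string, for example to compare the output of several PSM values on the same image.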

Conclusion

In this tutorial, we introduced OCR using OpenCV in Python, and showed you how to extract text from images. We also demonstrated some basic image preprocessing techniques and how to improve the accuracy of the extracted text by tweaking Tesseract's parameters. With this knowledge, you can now build your own OCR applications and experiment with different techniques to achieve even better results.
