Part 5: Character Recognition



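Hello. This is the AI Project team at 東京システム技研.

This installment covers character recognition. Character recognition is a technology that lets a computer read handwritten or printed characters, and it draws on techniques from image recognition and natural language processing. For an overview of character recognition, please watch the explanatory video.

The source code for this implementation can be downloaded here.

Table of contents

Explanatory video

Overview

  • What we implement
    • We run OCR through the Azure Form Recognizer API, extract the text, and draw bounding boxes.
  • Environment
    • We use Google Colaboratory. For an explanation of Google Colaboratory, see here.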

Azure Form Recognizer

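The cloud API used this time is Microsoft's Azure Form Recognizer.
Form Recognizer is an API for OCR that can extract text data and table data from images and PDFs.

To use it, you need to create an Azure endpoint. The endpoint can be created from here. (You need to create a free Azure account beforehand.)

API reference: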
https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeWithCustomForm

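Processing flow

  1. Upload the image file to Google Colaboratory
  2. Send the authentication key and image data to Microsoft Azure, run the API, and retrieve the recognition result
  3. Format the output and draw bounding boxes (rectangles enclosing the recognized regions)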

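Implementation

Preparation

Downloading the data

Download the sample data for the hands-on exercise. Please download it from here.

  • Data used
    • 請求書サンプル.pdf
      • An invoice-style PDF file
    • 画像サンプル2.png
      • A receipt from a Swiss hotel (from Wikipedia)
      • Used in the exercise

Uploading and extracting the data

Upload the downloaded ai_seminar_05_sampledata.zip to Google Colaboratory. See here for the procedure.

Once the upload has finished, extract the archive with the following command.

# Extract the zip file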
!unzip -Ocp932 ai_seminar_05_sampledata.zip
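
The -O cp932 option tells unzip to interpret the file names stored in the archive as CP932 (Shift_JIS), which matters here because the sample files have Japanese names. As a rough sketch of the same idea in Python, assuming the archive stores its names in CP932 without the UTF-8 flag, the standard zipfile module reports such names decoded as CP437, and the original name can be recovered by reversing that step. The helper name recover_cp932_name is hypothetical and is not part of the original notebook.

```python
def recover_cp932_name(garbled_name: str) -> str:
    """Undo a CP437 decoding and decode the raw bytes as CP932 instead."""
    return garbled_name.encode("cp437").decode("cp932")


# Simulate what a CP932-encoded file name looks like after being decoded as CP437.
original = "請求書サンプル.pdf"
garbled = original.encode("cp932").decode("cp437")

# The recovered name should match the original Japanese file name.
print(recover_cp932_name(garbled))
```

If the names extracted by unzip already look correct, this step is unnecessary.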

Installing and importing packages

Form Recognizer is used through a library called azure-ai-formrecognizer.
https://docs.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python
https://pypi.org/project/azure-ai-formrecognizer/

In addition, install pdf2image and poppler-utils to convert PDFs into images.

# Install the required libraries
!pip install azure-ai-formrecognizer
!pip install pdf2image
!apt-get install -y poppler-utils
%matplotlib inline

# Import the required libraries
import os
import copy

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw

from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential
from pdf2image import convert_from_path

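Connecting to Azure

Create an instance of the FormRecognizerClient class and connect to the endpoint.
In the cell below, set "key" and "endpoint" to the key and URL of the endpoint you created.

# Enter the credentials (the values of the endpoint you created)
key = 'XXX'
endpoint = 'https://YYY'

# Connect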
form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))
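
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. One possible alternative, shown here only as a sketch, is to read both values from environment variables. The variable names FORM_RECOGNIZER_KEY and FORM_RECOGNIZER_ENDPOINT and the helper load_credentials are assumptions made for this example, not names defined by Azure or by the original notebook.

```python
import os


def load_credentials(env=os.environ):
    """Return (key, endpoint) from a mapping of environment variables."""
    # Hypothetical variable names chosen for this sketch.
    key = env.get("FORM_RECOGNIZER_KEY")
    endpoint = env.get("FORM_RECOGNIZER_ENDPOINT")
    if not key or not endpoint:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            "Set FORM_RECOGNIZER_KEY and FORM_RECOGNIZER_ENDPOINT first"
        )
    return key, endpoint
```

The returned pair can then be passed to FormRecognizerClient and AzureKeyCredential in the same way as the key and endpoint variables above.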

Recognizing an image file

Display the uploaded image in Colaboratory to check it.

# Set the file name
filename = '/content/画像サンプル1.png'

# Open the image with the Pillow library and display it
img = Image.open(filename).convert('RGB')
plt.figure(figsize=(15, 15))
plt.imshow(img)
plt.show()

If an image like the one above is displayed, you are all set.

Next, read the image file and pass it to Form Recognizer. When reading the image file with the open function, the mode argument must include 'b' (for example 'rb') so that the file is read in binary mode.

OCR is run with the FormRecognizerClient.begin_recognize_content method. When the input is a png file, pass content_type='image/png'.

# Read the image
with open(filename, 'rb') as image_file:
    content = image_file.read()

# Run OCR
poller = form_recognizer_client.begin_recognize_content(content, content_type='image/png')

# Retrieve the result
pages = poller.result()

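We received the result in a variable named pages, so let's look at its contents. pages is a list whose elements are FormPage objects; when the input has multiple pages, a result is returned for each page. Since the input here is a single image, pages contains only one result. We use the vars function to display the instance variables of the FormPage and their values as a dictionary.

# Check the result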
vars(pages[0])
{'height': 338.0,
 'lines': [FormLine(text=日本語, bounding_box=[Point(x=51.0, y=39.0), Point(x=154.0, y=39.0), Point(x=154.0, y=71.0), Point(x=51.0, y=71.0)], words=[FormWord(text=日, bounding_box=[Point(x=52.0, y=40.0), Point(x=72.0, y=40.0), Point(x=72.0, y=72.0), Point(x=52.0, y=71.0)], confidence=0.996, page_number=1, kind=word), FormWord(text=本, bounding_box=[Point(x=84.0, y=39.0), Point(x=108.0, y=39.0), Point(x=107.0, y=72.0), Point(x=84.0, y=72.0)], confidence=0.998, page_number=1, kind=word), FormWord(text=語, bounding_box=[Point(x=120.0, y=39.0), Point(x=143.0, y=39.0), Point(x=142.0, y=72.0), Point(x=119.0, y=72.0)], confidence=0.994, page_number=1, kind=word)], page_number=1, kind=line, appearance=None),
  FormLine(text=87年前、われわれの父祖たちは、自由の精神にはぐくまれ、, bounding_box=[Point(x=223.0, y=111.0), Point(x=1077.0, y=111.0), Point(x=1077.0, y=145.0), Point(x=223.0, y=145.0)], words=[FormWord(text=87, bounding_box=[Point(x=223.0, y=113.0), Point(x=260.0, y=113.0), Point(x=260.0, y=146.0), Point(x=224.0, y=146.0)], confidence=0.994, page_number=1, kind=word), FormWord(text=年, bounding_box=[Point(x=261.0, y=113.0), Point(x=285.0, y=113.0), Point(x=285.0, y=146.0), Point(x=261.0, y=146.0)], confidence=0.996, page_number=1, kind=word), FormWord(text=前, bounding_box=[Point(x=293.0, y=113.0), Point(x=317.0, y=113.0), Point(x=317.0, y=146.0), Point(x=293.0, y=146.0)], confidence=0.997, page_number=1, kind=word), FormWord(text=、, bounding_box=[Point(x=323.0, y=113.0), Point(x=346.0, y=112.0), Point(x=347.0, y=146.0), Point(x=323.0, y=146.0)], confidence=0.996, page_number=1, kind=word), FormWord(text=わ, bounding_box=[Point(x=350.0, y=112.0), Point(x=374.0, y=112.0), Point(x=374.0, y=146.0), Point(x=350.0, y=146.0)], confidence=0.99,
  FormLine(text=人はみな平等に創られているという信条にささげられた新しい国家を、, bounding_box=[Point(x=152.0, y=149.0), Point(x=1148.0, y=149.0), Point(x=1148.0, y=183.0), Point(x=152.0, y=183.0)], words=[FormWord(text=人, bounding_box=[Point(x=156.0, y=152.0), Point(x=178.0, y=151.0), Point(x=178.0, y=182.0), Point(x=155.0, y=182.0)], confidence=0.997, page_number=1, kind=word), FormWord(text=は, bounding_box=[Point(x=190.0, y=151.0), Point(x=215.0, y=151.0), Point(x=214.0, y=183.0), Point(x=189.0, y=182.0)], confidence=0.995, page_number=1, kind=word), FormWord(text=み, bounding_box=[Point(x=222.0, y=151.0), Point(x=247.0, y=151.0), Point(x=246.0, y=183.0), Point(x=222.0, y=183.0)], confidence=0.995, page_number=1, kind=word), FormWord(text=な, bounding_box=[Point(x=257.0, y=151.0), Point(x=281.0, y=151.0), Point(x=281.0, y=183.0), Point(x=256.0, y=183.0)], confidence=0.995, page_number=1, kind=word), FormWord(text=平, bounding_box=[Point(x=291.0, y=151.0), Point(x=313.0, y=151.0), Point(x=313.0, y=183.0), Point(x=290.0, y=183.0)], confidence=0,
  FormLine(text=この大陸に誕生させた。, bounding_box=[Point(x=480.0, y=188.0), Point(x=823.0, y=189.0), Point(x=822.0, y=224.0), Point(x=480.0, y=222.0)], words=[FormWord(text=こ, bounding_box=[Point(x=481.0, y=190.0), Point(x=503.0, y=190.0), Point(x=503.0, y=222.0), Point(x=481.0, y=221.0)], confidence=0.995, page_number=1, kind=word), FormWord(text=の, bounding_box=[Point(x=508.0, y=190.0), Point(x=531.0, y=189.0), Point(x=531.0, y=222.0), Point(x=509.0, y=222.0)], confidence=0.997, page_number=1, kind=word), FormWord(text=大, bounding_box=[Point(x=545.0, y=189.0), Point(x=568.0, y=189.0), Point(x=568.0, y=222.0), Point(x=545.0, y=222.0)], confidence=0.998, page_number=1, kind=word), FormWord(text=陸, bounding_box=[Point(x=583.0, y=189.0), Point(x=606.0, y=189.0), Point(x=606.0, y=222.0), Point(x=583.0, y=222.0)], confidence=0.996, page_number=1, kind=word), FormWord(text=に, bounding_box=[Point(x=614.0, y=189.0), Point(x=637.0, y=189.0), Point(x=636.0, y=222.0), Point(x=614.0, y=222.0)], confidence=0.996, page_number=1, ki],
 'page_number': 1,
 'selection_marks': [FormSelectionMark(text=None, bounding_box=[Point(x=156.0, y=153.0), Point(x=185.0, y=153.0), Point(x=185.0, y=184.0), Point(x=156.0, y=184.0)], confidence=0.41, page_number=1, state=selected, kind=selectionMark)],
 'tables': [],
 'text_angle': 0.0,
 'unit': 'pixel',
 'width': 1309.0}

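The variables in the recognition result are described below.

Variable         Type                     Description
height           float                    Height of the image/PDF, in pixels or inches
lines            list[FormLine]           List of text recognized line by line
page_number      int                      Page number
selection_marks  list[FormSelectionMark]  Recognized checkboxes, radio buttons, and similar marks
tables           list[FormTable]          List of tables extracted from the page
text_angle       float                    Angle of the text
unit             str                      Unit: "pixel" for images, "inch" for PDFs
width            float                    Width of the image/PDF, in pixels or inches

Looking at the value of lines in the recognition result, it holds four FormLine objects. A FormLine carries the extracted text of one line, the position of its bounding box, and related information. In other words, four lines of text were extracted from the input image.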
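
Each FormLine in the output above also holds a words list, and each word has its own text and confidence. As a small sketch of how that structure could be walked, the function below collects the words whose confidence falls below a threshold. The function name low_confidence_words and the 0.995 threshold are arbitrary choices for illustration, and the sample uses stand-in objects shaped like FormLine and FormWord so that it runs without calling Azure.

```python
from types import SimpleNamespace


def low_confidence_words(lines, threshold=0.995):
    """Return (text, confidence) for every word below the threshold."""
    flagged = []
    for line in lines:
        for word in line.words:
            if word.confidence < threshold:
                flagged.append((word.text, word.confidence))
    return flagged


# Stand-ins for FormLine / FormWord, with values taken from the first line above.
sample_lines = [
    SimpleNamespace(text="日本語", words=[
        SimpleNamespace(text="日", confidence=0.996),
        SimpleNamespace(text="本", confidence=0.998),
        SimpleNamespace(text="語", confidence=0.994),
    ]),
]
print(low_confidence_words(sample_lines))
```

With a real result, the same function could be called as low_confidence_words(pages[0].lines).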

Displaying the bounding boxes

We take the bounding box position of each FormLine object, draw it onto the original image, and display the result.

# Create a copy of the image
copy_img = copy.deepcopy(img)
draw = ImageDraw.Draw(copy_img)

# Draw the bounding boxes
lines = pages[0].lines
for line in lines:
    # Indices 0, 1, 2, 3 are top-left, top-right, bottom-right, bottom-left
    left = line.bounding_box[0].x
    top = line.bounding_box[0].y
    right = line.bounding_box[2].x
    bottom = line.bounding_box[2].y

    # Draw one bounding box
    grid = (left, top, right, bottom)
    draw.rectangle(grid, outline='red', fill=None, width=2)

# Show on screen
plt.figure(figsize=(15, 15))
plt.imshow(copy_img)
plt.show()

The bounding boxes are now displayed.
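
The loop above builds each rectangle from the top-left and bottom-right corners only, which works when the box is axis-aligned. Some boxes in the output are slightly skewed (for example, the fourth line has corners at y=188 and y=189 along its top edge). A possible variation, sketched below, takes the minimum and maximum over all four corners so the rectangle encloses the whole box. The function name bounding_rect is hypothetical, and Point here is a local stand-in for the SDK's Point class.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # stand-in for the SDK's Point


def bounding_rect(points):
    """Return (left, top, right, bottom) enclosing all corner points."""
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Corner points of the fourth line in the output above.
box = [
    Point(480.0, 188.0),
    Point(823.0, 189.0),
    Point(822.0, 224.0),
    Point(480.0, 222.0),
]
print(bounding_rect(box))
```

The returned tuple has the same layout as grid in the loop above, so it could be passed to draw.rectangle in the same way.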

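Recognizing a PDF file

Next, we run OCR on a PDF file and display the table and bounding boxes.

As with the image, let's display the PDF file in Colaboratory. So that bounding boxes can also be drawn for the PDF, we first convert the PDF to an image.

# Path to the file
pdf_file = '/content/請求書サンプル.pdf'

# Convert the PDF to an image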
img_file = convert_from_path(pdf_file, fmt='png', output_file='請求書サンプル', output_folder='/content')[0]

Display the converted image file.

# Display the image
img = Image.open(img_file.filename).convert('RGB')
plt.figure(figsize=(15, 15))
plt.imshow(img)
plt.show()

The image was displayed.

Next, run OCR on the PDF. As with the image, pass the file read as binary to the FormRecognizerClient.begin_recognize_content method. For a PDF, use content_type='application/pdf'.

# Read the PDF
with open(pdf_file, 'rb') as f:
    content = f.read()

# Run OCR
poller = form_recognizer_client.begin_recognize_content(content, content_type='application/pdf')

# Retrieve the result
pages = poller.result()
vars(pages[0])
{'height': 11.6806,
 'lines': [FormLine(text=株式会社東京システム技研, bounding_box=[Point(x=0.6752, y=1.2373), Point(x=4.1709, y=1.2373), Point(x=4.1709, y=1.5164), Point(x=0.6752, y=1.5164)], words=[FormWord(text=株, bounding_box=[Point(x=0.6752, y=1.241), Point(x=0.9774, y=1.241), Point(x=0.9774, y=1.5164), Point(x=0.6752, y=1.5164)], confidence=1.0, page_number=1, kind=word), FormWord(text=式, bounding_box=[Point(x=0.9897, y=1.2407), Point(x=1.2712, y=1.2407), Point(x=1.2712, y=1.5142), Point(x=0.9897, y=1.5142)], confidence=1.0, page_number=1, kind=word), FormWord(text=会, bounding_box=[Point(x=1.2861, y=1.2407), Point(x=1.5878, y=1.2407), Point(x=1.5878, y=1.5164), Point(x=1.2861, y=1.5164)], confidence=1.0, page_number=1, kind=word), FormWord(text=社, bounding_box=[Point(x=1.5942, y=1.2404), Point(x=1.8883, y=1.2404), Point(x=1.8883, y=1.5152), Point(x=1.5942, y=1.5152)], confidence=1.0, page_number=1, kind=word), FormWord(text=東, bounding_box=[Point(x=2.0036, y=1.2373), Point(x=2.3007, y=1.2373), Point(x=2.3007, y=1.5161), Point(x=2.0036, y=1.5161,
  FormLine(text=〒160-0023東京都新宿区西新宿1丁目21番明宝ビル3F, bounding_box=[Point(x=0.7139, y=1.7138), Point(x=4.499, y=1.7138), Point(x=4.499, y=1.843), Point(x=0.7139, y=1.843)], words=[FormWord(text=〒160-0023, bounding_box=[Point(x=0.7139, y=1.7303), Point(x=1.4885, y=1.7303), Point(x=1.4885, y=1.8361), Point(x=0.7139, y=1.8361)], confidence=1.0, page_number=1, kind=word), FormWord(text=東, bounding_box=[Point(x=1.5918, y=1.7167), Point(x=1.7264, y=1.7167), Point(x=1.7264, y=1.843), Point(x=1.5918, y=1.843)], confidence=1.0, page_number=1, kind=word), FormWord(text=京, bounding_box=[Point(x=1.7318, y=1.7173), Point(x=1.8637, y=1.7173), Point(x=1.8637, y=1.8419), Point(x=1.7318, y=1.8419)], confidence=1.0, page_number=1, kind=word), FormWord(text=都, bounding_box=[Point(x=1.8684, y=1.7177), Point(x=1.9989, y=1.7177), Point(x=1.9989, y=1.8418), Point(x=1.8684, y=1.8418)], confidence=1.0, page_number=1, kind=word), FormWord(text=新, bounding_box=[Point(x=2.0074, y=1.7184), Point(x=2.1416, y=1.7184), Point(x=2.1416, y=1.8429), Po,
  FormLine(text=TEL 03-3342-2651, bounding_box=[Point(x=0.6952, y=1.9603), Point(x=1.964, y=1.9603), Point(x=1.964, y=2.0661), Point(x=0.6952, y=2.0661)], words=[FormWord(text=TEL, bounding_box=[Point(x=0.6952, y=1.9624), Point(x=0.9431, y=1.9624), Point(x=0.9431, y=2.0642), Point(x=0.6952, y=2.0642)], confidence=1.0, page_number=1, kind=word), FormWord(text=03-3342-2651, bounding_box=[Point(x=0.9999, y=1.9603), Point(x=1.964, y=1.9603), Point(x=1.964, y=2.0661), Point(x=0.9999, y=2.0661)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=FAX 03-3348-4634, bounding_box=[Point(x=2.1748, y=1.9601), Point(x=3.4559, y=1.9601), Point(x=3.4559, y=2.0664), Point(x=2.1748, y=2.0664)], words=[FormWord(text=FAX, bounding_box=[Point(x=2.1748, y=1.9624), Point(x=2.4233, y=1.9624), Point(x=2.4233, y=2.0642), Point(x=2.1748, y=2.0642)], confidence=1.0, page_number=1, kind=word), FormWord(text=03-3348-4634, bounding_box=[Point(x=2.4836, y=1.9601), Point(x=3.4559, y=1.9601), Point(x=3.4559, y=2.0664), Point(x=2.4836, y=2.0664)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=請求書№ 999, bounding_box=[Point(x=0.6325, y=2.5265), Point(x=2.4258, y=2.5265), Point(x=2.4258, y=2.7809), Point(x=0.6325, y=2.7809)], words=[FormWord(text=請, bounding_box=[Point(x=0.6325, y=2.529), Point(x=0.885, y=2.529), Point(x=0.885, y=2.7809), Point(x=0.6325, y=2.7809)], confidence=1.0, page_number=1, kind=word), FormWord(text=求, bounding_box=[Point(x=0.9053, y=2.5265), Point(x=1.1686, y=2.5265), Point(x=1.1686, y=2.7806), Point(x=0.9053, y=2.7806)], confidence=1.0, page_number=1, kind=word), FormWord(text=書, bounding_box=[Point(x=1.1859, y=2.5301), Point(x=1.4447, y=2.5301), Point(x=1.4447, y=2.7803), Point(x=1.1859, y=2.7803)], confidence=1.0, page_number=1, kind=word), FormWord(text=№, bounding_box=[Point(x=1.5636, y=2.5549), Point(x=1.8224, y=2.5549), Point(x=1.8224, y=2.7675), Point(x=1.5636, y=2.7675)], confidence=1.0, page_number=1, kind=word), FormWord(text=999, bounding_box=[Point(x=1.936, y=2.5524), Point(x=2.4258, y=2.5524), Point(x=2.4258, y=2.7678), Point(x=1.936, y=2.7678)], co,
  FormLine(text=2022/2/7, bounding_box=[Point(x=4.6152, y=2.9196), Point(x=5.3026, y=2.9196), Point(x=5.3026, y=3.06), Point(x=4.6152, y=3.06)], words=[FormWord(text=2022/2/7, bounding_box=[Point(x=4.6152, y=2.9196), Point(x=5.3026, y=2.9196), Point(x=5.3026, y=3.06), Point(x=4.6152, y=3.06)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=請求先, bounding_box=[Point(x=0.6325, y=3.5869), Point(x=1.4397, y=3.5869), Point(x=1.4397, y=3.8413), Point(x=0.6325, y=3.8413)], words=[FormWord(text=請, bounding_box=[Point(x=0.6325, y=3.5894), Point(x=0.885, y=3.5894), Point(x=0.885, y=3.8413), Point(x=0.6325, y=3.8413)], confidence=1.0, page_number=1, kind=word), FormWord(text=求, bounding_box=[Point(x=0.9053, y=3.5869), Point(x=1.1686, y=3.5869), Point(x=1.1686, y=3.841), Point(x=0.9053, y=3.841)], confidence=1.0, page_number=1, kind=word), FormWord(text=先, bounding_box=[Point(x=1.1886, y=3.5913), Point(x=1.4397, y=3.5913), Point(x=1.4397, y=3.841), Point(x=1.1886, y=3.841)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=件名, bounding_box=[Point(x=5.3946, y=3.5883), Point(x=5.9212, y=3.5883), Point(x=5.9212, y=3.8413), Point(x=5.3946, y=3.8413)], words=[FormWord(text=件, bounding_box=[Point(x=5.3946, y=3.5902), Point(x=5.6662, y=3.5902), Point(x=5.6662, y=3.8413), Point(x=5.3946, y=3.8413)], confidence=1.0, page_number=1, kind=word), FormWord(text=名, bounding_box=[Point(x=5.6793, y=3.5883), Point(x=5.9212, y=3.5883), Point(x=5.9212, y=3.8396), Point(x=5.6793, y=3.8396)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=株式会社○○物産, bounding_box=[Point(x=0.6026, y=3.9582), Point(x=1.8699, y=3.9582), Point(x=1.8699, y=4.0979), Point(x=0.6026, y=4.0979)], words=[FormWord(text=株, bounding_box=[Point(x=0.6026, y=3.9596), Point(x=0.7544, y=3.9596), Point(x=0.7544, y=4.0979), Point(x=0.6026, y=4.0979)], confidence=1.0, page_number=1, kind=word), FormWord(text=式, bounding_box=[Point(x=0.7605, y=3.9594), Point(x=0.9019, y=3.9594), Point(x=0.9019, y=4.0968), Point(x=0.7605, y=4.0968)], confidence=1.0, page_number=1, kind=word), FormWord(text=会, bounding_box=[Point(x=0.9094, y=3.9594), Point(x=1.0609, y=3.9594), Point(x=1.0609, y=4.0979), Point(x=0.9094, y=4.0979)], confidence=1.0, page_number=1, kind=word), FormWord(text=社, bounding_box=[Point(x=1.0641, y=3.9593), Point(x=1.2118, y=3.9593), Point(x=1.2118, y=4.0973), Point(x=1.0641, y=4.0973)], confidence=1.0, page_number=1, kind=word), FormWord(text=○○, bounding_box=[Point(x=1.2817, y=3.9694), Point(x=1.5583, y=3.9694), Point(x=1.5583, y=4.0944), Point(x=1.2817, y=4.0944),
  FormLine(text=果物, bounding_box=[Point(x=5.3757, y=3.9606), Point(x=5.6693, y=3.9606), Point(x=5.6693, y=4.0994), Point(x=5.3757, y=4.0994)], words=[FormWord(text=果, bounding_box=[Point(x=5.3757, y=3.9665), Point(x=5.525, y=3.9665), Point(x=5.525, y=4.0991), Point(x=5.3757, y=4.0991)], confidence=1.0, page_number=1, kind=word), FormWord(text=物, bounding_box=[Point(x=5.5304, y=3.9606), Point(x=5.6693, y=3.9606), Point(x=5.6693, y=4.0994), Point(x=5.5304, y=4.0994)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=〒999-9999 ○○県△△市××1-1-1, bounding_box=[Point(x=0.6241, y=4.1773), Point(x=3.1084, y=4.1773), Point(x=3.1084, y=4.3161), Point(x=0.6241, y=4.3161)], words=[FormWord(text=〒999-9999, bounding_box=[Point(x=0.6241, y=4.1896), Point(x=1.4781, y=4.1896), Point(x=1.4781, y=4.3082), Point(x=0.6241, y=4.3082)], confidence=1.0, page_number=1, kind=word), FormWord(text=○○, bounding_box=[Point(x=1.5536, y=4.1877), Point(x=1.8302, y=4.1877), Point(x=1.8302, y=4.3127), Point(x=1.5536, y=4.3127)], confidence=1.0, page_number=1, kind=word), FormWord(text=県, bounding_box=[Point(x=1.8495, y=4.1825), Point(x=1.9922, y=4.1825), Point(x=1.9922, y=4.3161), Point(x=1.8495, y=4.3161)], confidence=1.0, page_number=1, kind=word), FormWord(text=△△, bounding_box=[Point(x=2.0105, y=4.1914), Point(x=2.2932, y=4.1914), Point(x=2.2932, y=4.3056), Point(x=2.0105, y=4.3056)], confidence=1.0, page_number=1, kind=word), FormWord(text=市, bounding_box=[Point(x=2.3124, y=4.1773), Point(x=2.4519, y=4.1773), Point(x=2.4519, y=4.3156), P,
  FormLine(text=TEL 123-456-7890, bounding_box=[Point(x=0.6034, y=4.4109), Point(x=2.0048, y=4.4109), Point(x=2.0048, y=4.5287), Point(x=0.6034, y=4.5287)], words=[FormWord(text=TEL, bounding_box=[Point(x=0.6034, y=4.4134), Point(x=0.8772, y=4.4134), Point(x=0.8772, y=4.5262), Point(x=0.6034, y=4.5262)], confidence=1.0, page_number=1, kind=word), FormWord(text=123-456-7890, bounding_box=[Point(x=0.9499, y=4.4109), Point(x=2.0048, y=4.4109), Point(x=2.0048, y=4.5287), Point(x=0.9499, y=4.5287)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=品目, bounding_box=[Point(x=2.8057, y=5.1724), Point(x=3.1104, y=5.1724), Point(x=3.1104, y=5.3162), Point(x=2.8057, y=5.3162)], words=[FormWord(text=品, bounding_box=[Point(x=2.8057, y=5.1724), Point(x=2.9522, y=5.1724), Point(x=2.9522, y=5.3162), Point(x=2.8057, y=5.3162)], confidence=1.0, page_number=1, kind=word), FormWord(text=目, bounding_box=[Point(x=2.9794, y=5.1741), Point(x=3.1104, y=5.1741), Point(x=3.1104, y=5.3159), Point(x=2.9794, y=5.3159)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=金額, bounding_box=[Point(x=6.3564, y=5.1676), Point(x=6.6872, y=5.1676), Point(x=6.6872, y=5.3172), Point(x=6.3564, y=5.3172)], words=[FormWord(text=金, bounding_box=[Point(x=6.3564, y=5.1678), Point(x=6.5202, y=5.1678), Point(x=6.5202, y=5.3136), Point(x=6.3564, y=5.3136)], confidence=1.0, page_number=1, kind=word), FormWord(text=額, bounding_box=[Point(x=6.5226, y=5.1676), Point(x=6.6872, y=5.1676), Point(x=6.6872, y=5.3172), Point(x=6.5226, y=5.3172)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=りんご 5kg, bounding_box=[Point(x=0.9207, y=5.5566), Point(x=1.5635, y=5.5566), Point(x=1.5635, y=5.7215), Point(x=0.9207, y=5.7215)], words=[FormWord(text=りんご, bounding_box=[Point(x=0.9207, y=5.5566), Point(x=1.2626, y=5.5566), Point(x=1.2626, y=5.698), Point(x=0.9207, y=5.698)], confidence=1.0, page_number=1, kind=word), FormWord(text=5kg, bounding_box=[Point(x=1.3163, y=5.5698), Point(x=1.5635, y=5.5698), Point(x=1.5635, y=5.7215), Point(x=1.3163, y=5.7215)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥2,000.00, bounding_box=[Point(x=6.1416, y=5.5727), Point(x=6.8976, y=5.5727), Point(x=6.8976, y=5.7172), Point(x=6.1416, y=5.7172)], words=[FormWord(text=¥2,000.00, bounding_box=[Point(x=6.1416, y=5.5727), Point(x=6.8976, y=5.5727), Point(x=6.8976, y=5.7172), Point(x=6.1416, y=5.7172)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=みかん 1kg, bounding_box=[Point(x=0.9184, y=5.9533), Point(x=1.6098, y=5.9533), Point(x=1.6098, y=6.1081), Point(x=0.9184, y=6.1081)], words=[FormWord(text=みかん, bounding_box=[Point(x=0.9184, y=5.9533), Point(x=1.2939, y=5.9533), Point(x=1.2939, y=6.0842), Point(x=0.9184, y=6.0842)], confidence=1.0, page_number=1, kind=word), FormWord(text=1kg, bounding_box=[Point(x=1.3697, y=5.9565), Point(x=1.6098, y=5.9565), Point(x=1.6098, y=6.1081), Point(x=1.3697, y=6.1081)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥500.00, bounding_box=[Point(x=6.2166, y=5.9596), Point(x=6.8243, y=5.9596), Point(x=6.8243, y=6.0784), Point(x=6.2166, y=6.0784)], words=[FormWord(text=¥500.00, bounding_box=[Point(x=6.2166, y=5.9596), Point(x=6.8243, y=5.9596), Point(x=6.8243, y=6.0784), Point(x=6.2166, y=6.0784)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=バナナ 10kg, bounding_box=[Point(x=0.9127, y=6.3322), Point(x=1.6949, y=6.3322), Point(x=1.6949, y=6.4951), Point(x=0.9127, y=6.4951)], words=[FormWord(text=バナナ, bounding_box=[Point(x=0.9127, y=6.3322), Point(x=1.2824, y=6.3322), Point(x=1.2824, y=6.4704), Point(x=0.9127, y=6.4704)], confidence=1.0, page_number=1, kind=word), FormWord(text=10kg, bounding_box=[Point(x=1.3596, y=6.3434), Point(x=1.6949, y=6.3434), Point(x=1.6949, y=6.4951), Point(x=1.3596, y=6.4951)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥3,000.00, bounding_box=[Point(x=6.1416, y=6.3464), Point(x=6.8976, y=6.3464), Point(x=6.8976, y=6.4908), Point(x=6.1416, y=6.4908)], words=[FormWord(text=¥3,000.00, bounding_box=[Point(x=6.1416, y=6.3464), Point(x=6.8976, y=6.3464), Point(x=6.8976, y=6.4908), Point(x=6.1416, y=6.4908)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=小計, bounding_box=[Point(x=4.6958, y=7.4945), Point(x=4.9948, y=7.4945), Point(x=4.9948, y=7.6316), Point(x=4.6958, y=7.6316)], words=[FormWord(text=小, bounding_box=[Point(x=4.6958, y=7.4959), Point(x=4.8423, y=7.4959), Point(x=4.8423, y=7.631), Point(x=4.6958, y=7.631)], confidence=1.0, page_number=1, kind=word), FormWord(text=計, bounding_box=[Point(x=4.8533, y=7.4945), Point(x=4.9948, y=7.4945), Point(x=4.9948, y=7.6316), Point(x=4.8533, y=7.6316)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥5,500.00, bounding_box=[Point(x=6.1416, y=7.5065), Point(x=6.8976, y=7.5065), Point(x=6.8976, y=7.6508), Point(x=6.1416, y=7.6508)], words=[FormWord(text=¥5,500.00, bounding_box=[Point(x=6.1416, y=7.5065), Point(x=6.8976, y=7.5065), Point(x=6.8976, y=7.6508), Point(x=6.1416, y=7.6508)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=消費税率, bounding_box=[Point(x=4.3913, y=7.8789), Point(x=4.9939, y=7.8789), Point(x=4.9939, y=8.0197), Point(x=4.3913, y=8.0197)], words=[FormWord(text=消, bounding_box=[Point(x=4.3913, y=7.8789), Point(x=4.5276, y=7.8789), Point(x=4.5276, y=8.0192), Point(x=4.3913, y=8.0192)], confidence=1.0, page_number=1, kind=word), FormWord(text=費, bounding_box=[Point(x=4.5463, y=7.8808), Point(x=4.6882, y=7.8808), Point(x=4.6882, y=8.0195), Point(x=4.5463, y=8.0195)], confidence=1.0, page_number=1, kind=word), FormWord(text=税, bounding_box=[Point(x=4.6958, y=7.8808), Point(x=4.8423, y=7.8808), Point(x=4.8423, y=8.0192), Point(x=4.6958, y=8.0192)], confidence=1.0, page_number=1, kind=word), FormWord(text=率, bounding_box=[Point(x=4.8538, y=7.8801), Point(x=4.9939, y=7.8801), Point(x=4.9939, y=8.0197), Point(x=4.8538, y=8.0197)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=10.00%, bounding_box=[Point(x=6.2447, y=7.8949), Point(x=6.8063, y=7.8949), Point(x=6.8063, y=8.012), Point(x=6.2447, y=8.012)], words=[FormWord(text=10.00%, bounding_box=[Point(x=6.2447, y=7.8949), Point(x=6.8063, y=7.8949), Point(x=6.8063, y=8.012), Point(x=6.2447, y=8.012)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=その他費用, bounding_box=[Point(x=4.3048, y=8.2674), Point(x=4.9858, y=8.2674), Point(x=4.9858, y=8.4062), Point(x=4.3048, y=8.4062)], words=[FormWord(text=その, bounding_box=[Point(x=4.3048, y=8.2791), Point(x=4.5342, y=8.2791), Point(x=4.5342, y=8.402), Point(x=4.3048, y=8.402)], confidence=1.0, page_number=1, kind=word), FormWord(text=他, bounding_box=[Point(x=4.5419, y=8.2677), Point(x=4.6877, y=8.2677), Point(x=4.6877, y=8.4057), Point(x=4.5419, y=8.4057)], confidence=1.0, page_number=1, kind=word), FormWord(text=費, bounding_box=[Point(x=4.6994, y=8.2674), Point(x=4.8412, y=8.2674), Point(x=4.8412, y=8.4062), Point(x=4.6994, y=8.4062)], confidence=1.0, page_number=1, kind=word), FormWord(text=用, bounding_box=[Point(x=4.8504, y=8.2745), Point(x=4.9858, y=8.2745), Point(x=4.9858, y=8.4062), Point(x=4.8504, y=8.4062)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥1,000.00, bounding_box=[Point(x=6.1416, y=8.2798), Point(x=6.8976, y=8.2798), Point(x=6.8976, y=8.4241), Point(x=6.1416, y=8.4241)], words=[FormWord(text=¥1,000.00, bounding_box=[Point(x=6.1416, y=8.2798), Point(x=6.8976, y=8.2798), Point(x=6.8976, y=8.4241), Point(x=6.1416, y=8.4241)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=合計, bounding_box=[Point(x=4.4882, y=8.6122), Point(x=4.976, y=8.6122), Point(x=4.976, y=8.8357), Point(x=4.4882, y=8.8357)], words=[FormWord(text=合, bounding_box=[Point(x=4.4882, y=8.6134), Point(x=4.7342, y=8.6134), Point(x=4.7342, y=8.8352), Point(x=4.4882, y=8.8352)], confidence=1.0, page_number=1, kind=word), FormWord(text=計, bounding_box=[Point(x=4.7452, y=8.6122), Point(x=4.976, y=8.6122), Point(x=4.976, y=8.8357), Point(x=4.7452, y=8.8357)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=¥7,050.00, bounding_box=[Point(x=6.1416, y=8.6669), Point(x=6.8976, y=8.6669), Point(x=6.8976, y=8.8112), Point(x=6.1416, y=8.8112)], words=[FormWord(text=¥7,050.00, bounding_box=[Point(x=6.1416, y=8.6669), Point(x=6.8976, y=8.6669), Point(x=6.8976, y=8.8112), Point(x=6.1416, y=8.8112)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0)),
  FormLine(text=この請求書に関してご, bounding_box=[Point(x=0.609, y=9.669), Point(x=1.9263, y=9.669), Point(x=1.9263, y=9.8115), Point(x=0.609, y=9.8115)], words=[FormWord(text=この, bounding_box=[Point(x=0.609, y=9.6897), Point(x=0.8269, y=9.6897), Point(x=0.8269, y=9.8047), Point(x=0.609, y=9.8047)], confidence=1.0, page_number=1, kind=word), FormWord(text=請, bounding_box=[Point(x=0.8403, y=9.6724), Point(x=0.9793, y=9.6724), Point(x=0.9793, y=9.8111), Point(x=0.8403, y=9.8111)], confidence=1.0, page_number=1, kind=word), FormWord(text=求, bounding_box=[Point(x=0.9905, y=9.671), Point(x=1.1356, y=9.671), Point(x=1.1356, y=9.811), Point(x=0.9905, y=9.811)], confidence=1.0, page_number=1, kind=word), FormWord(text=書, bounding_box=[Point(x=1.1451, y=9.673), Point(x=1.2877, y=9.673), Point(x=1.2877, y=9.8108), Point(x=1.1451, y=9.8108)], confidence=1.0, page_number=1, kind=word), FormWord(text=に, bounding_box=[Point(x=1.303, y=9.683), Point(x=1.4057, y=9.683), Point(x=1.4057, y=9.8042), Point(x=1.303, y=9.8042)], confidence=,
  FormLine(text=不明な点がございましたら、下記までお問い合わせください。, bounding_box=[Point(x=1.9183, y=9.669), Point(x=5.3912, y=9.669), Point(x=5.3912, y=9.8118), Point(x=1.9183, y=9.8118)], words=[FormWord(text=不, bounding_box=[Point(x=1.9183, y=9.6828), Point(x=2.0648, y=9.6828), Point(x=2.0648, y=9.8113), Point(x=1.9183, y=9.8113)], confidence=1.0, page_number=1, kind=word), FormWord(text=明, bounding_box=[Point(x=2.0813, y=9.679), Point(x=2.2069, y=9.679), Point(x=2.2069, y=9.811), Point(x=2.0813, y=9.811)], confidence=1.0, page_number=1, kind=word), FormWord(text=な, bounding_box=[Point(x=2.2273, y=9.6808), Point(x=2.3471, y=9.6808), Point(x=2.3471, y=9.8092), Point(x=2.2273, y=9.8092)], confidence=1.0, page_number=1, kind=word), FormWord(text=点, bounding_box=[Point(x=2.3572, y=9.6734), Point(x=2.5015, y=9.6734), Point(x=2.5015, y=9.8116), Point(x=2.3572, y=9.8116)], confidence=1.0, page_number=1, kind=word), FormWord(text=がございましたら、, bounding_box=[Point(x=2.5104, y=9.669), Point(x=3.4902, y=9.669), Point(x=3.4902, y=9.8099), Point(,
  FormLine(text=担当者鈴木太郎, bounding_box=[Point(x=0.6043, y=9.8906), Point(x=1.7184, y=9.8906), Point(x=1.7184, y=10.0293), Point(x=0.6043, y=10.0293)], words=[FormWord(text=担, bounding_box=[Point(x=0.6043, y=9.8916), Point(x=0.7513, y=9.8916), Point(x=0.7513, y=10.0289), Point(x=0.6043, y=10.0289)], confidence=1.0, page_number=1, kind=word), FormWord(text=当, bounding_box=[Point(x=0.7679, y=9.8918), Point(x=0.8925, y=9.8918), Point(x=0.8925, y=10.0282), Point(x=0.7679, y=10.0282)], confidence=1.0, page_number=1, kind=word), FormWord(text=者, bounding_box=[Point(x=0.9125, y=9.8906), Point(x=1.0554, y=9.8906), Point(x=1.0554, y=10.029), Point(x=0.9125, y=10.029)], confidence=1.0, page_number=1, kind=word), FormWord(text=鈴, bounding_box=[Point(x=1.1146, y=9.8906), Point(x=1.2656, y=9.8906), Point(x=1.2656, y=10.0287), Point(x=1.1146, y=10.0287)], confidence=1.0, page_number=1, kind=word), FormWord(text=木, bounding_box=[Point(x=1.2673, y=9.893), Point(x=1.4195, y=9.893), Point(x=1.4195, y=10.0293), Point(x=1.2673, y=10,
  FormLine(text=よろしくお願いいたします。, bounding_box=[Point(x=0.6083, y=10.1138), Point(x=2.0766, y=10.1138), Point(x=2.0766, y=10.2477), Point(x=0.6083, y=10.2477)], words=[FormWord(text=よろしくお, bounding_box=[Point(x=0.6083, y=10.1138), Point(x=1.1406, y=10.1138), Point(x=1.1406, y=10.2458), Point(x=0.6083, y=10.2458)], confidence=1.0, page_number=1, kind=word), FormWord(text=願, bounding_box=[Point(x=1.1458, y=10.1156), Point(x=1.2957, y=10.1156), Point(x=1.2957, y=10.2477), Point(x=1.1458, y=10.2477)], confidence=1.0, page_number=1, kind=word), FormWord(text=いいたします。, bounding_box=[Point(x=1.309, y=10.1146), Point(x=2.0766, y=10.1146), Point(x=2.0766, y=10.2449), Point(x=1.309, y=10.2449)], confidence=1.0, page_number=1, kind=word)], page_number=1, kind=line, appearance=TextAppearance(style_name=other, style_confidence=1.0))],
 'page_number': 1,
 'selection_marks': [FormSelectionMark(text=None, bounding_box=[Point(x=5.7473, y=3.7342), Point(x=5.9176, y=3.7342), Point(x=5.9176, y=3.847), Point(x=5.7473, y=3.847)], confidence=0.251, page_number=1, state=unselected, kind=selectionMark),
  FormSelectionMark(text=None, bounding_box=[Point(x=2.473, y=4.2273), Point(x=2.5626, y=4.2273), Point(x=2.5626, y=4.3111), Point(x=2.473, y=4.3111)], confidence=0.223, page_number=1, state=selected, kind=selectionMark),
  FormSelectionMark(text=None, bounding_box=[Point(x=2.5935, y=4.2268), Point(x=2.6863, y=4.2268), Point(x=2.6863, y=4.3158), Point(x=2.5935, y=4.3158)], confidence=0.223, page_number=1, state=selected, kind=selectionMark)],
 'tables': [FormTable(page_number=1, cells=[FormTableCell(text=, row_index=0, column_index=0, row_span=1, column_span=1, bounding_box=[Point(x=0.5628, y=5.0483), Point(x=2.2571, y=5.0483), Point(x=2.2498, y=5.4446), Point(x=0.5628, y=5.4446)], confidence=1.0, is_header=True, is_footer=False, page_number=1, field_elements=None), FormTableCell(text=品目, row_index=0, column_index=1, row_span=1, column_span=1, bounding_box=[Point(x=2.2571, y=5.0483), Point(x=5.3388, y=5.0483), Point(x=5.3388, y=5.4446), Point(x=2.2498, y=5.4446)], confidence=1.0, is_header=True, is_footer=False, page_number=1, field_elements=[FormWord(text=品, bounding_box=[Point(x=2.8057, y=5.1724), Point(x=2.9522, y=5.1724), Point(x=2.9522, y=5.3162), Point(x=2.8057, y=5.3162)], confidence=1.0, page_number=1, kind=word), FormWord(text=目, bounding_box=[Point(x=2.9794, y=5.1741), Point(x=3.1104, y=5.1741), Point(x=3.1104, y=5.3159), Point(x=2.9794, y=5.3159)], confidence=1.0, page_number=1, kind=word)]), FormTableCell(text=金額, row_index=0, column_index=2, row_],
 'text_angle': 0.0,
 'unit': 'inch',
 'width': 8.2639}

We now have the recognition result. Next, let's display the recognized text.

# Display the extracted text
for line in pages[0].lines:
    print(line.text)
株式会社東京システム技研
〒160-0023東京都新宿区西新宿1丁目21番明宝ビル3F
TEL 03-3342-2651
FAX 03-3348-4634
請求書№ 999
2022/2/7
請求先
件名
株式会社○○物産
果物
〒999-9999 ○○県△△市××1-1-1
TEL 123-456-7890
品目
金額
りんご 5kg
¥2,000.00
みかん 1kg
¥500.00
バナナ 10kg
¥3,000.00
小計
¥5,500.00
消費税率
10.00%
その他費用
¥1,000.00
合計
¥7,050.00
この請求書に関してご
不明な点がございましたら、下記までお問い合わせください。
担当者鈴木太郎
よろしくお願いいたします。

The text was extracted mostly correctly.
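
The dump above shows that each FormLine carries a words list and that each FormWord has a confidence value. As a minimal sketch, the helper below lists the words whose confidence falls below a threshold, which can help when checking where "mostly correct" might break down. The function name low_confidence_words, the 0.9 threshold, and the SimpleNamespace stand-ins with their sample values are assumptions for illustration only; in the notebook you would pass pages[0].lines instead.

```python
from types import SimpleNamespace


def low_confidence_words(lines, threshold=0.9):
    """Return (text, confidence) pairs for words below the threshold."""
    flagged = []
    for line in lines:
        for word in line.words:
            # Defensive check: skip words that carry no confidence value.
            if word.confidence is not None and word.confidence < threshold:
                flagged.append((word.text, word.confidence))
    return flagged


# Stand-in objects that mimic only the attributes used above (hypothetical data).
sample_lines = [
    SimpleNamespace(words=[
        SimpleNamespace(text="請求書", confidence=1.0),
        SimpleNamespace(text="№", confidence=0.62),
    ]),
    SimpleNamespace(words=[SimpleNamespace(text="999", confidence=0.85)]),
]

flagged_words = low_confidence_words(sample_lines)
print(flagged_words)
```

For the sample PDF in this article every word was reported with confidence 1.0, so the helper would likely return an empty list there; it is more useful on noisy scans or handwriting.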

Displaying the table recognition result

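The sample file contains a section laid out as a table. Table recognition results are stored in tables, so let's check that value. First, we output the number of tables that were recognized.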

len(pages[0].tables)
1

One table was found. Let's look at its contents.

# Inspect the table data
table = pages[0].tables[0]
vars(table)
{'bounding_box': [Point(x=0.5606, y=5.033),
  Point(x=7.6812, y=5.0331),
  Point(x=7.6817, y=6.8562),
  Point(x=0.5585, y=6.8552)],
 'cells': [FormTableCell(text=, row_index=0, column_index=0, row_span=1, column_span=1, bounding_box=[Point(x=0.5628, y=5.0483), Point(x=2.2571, y=5.0483), Point(x=2.2498, y=5.4446), Point(x=0.5628, y=5.4446)], confidence=1.0, is_header=True, is_footer=False, page_number=1, field_elements=None),
  FormTableCell(text=品目, row_index=0, column_index=1, row_span=1, column_span=1, bounding_box=[Point(x=2.2571, y=5.0483), Point(x=5.3388, y=5.0483), Point(x=5.3388, y=5.4446), Point(x=2.2498, y=5.4446)], confidence=1.0, is_header=True, is_footer=False, page_number=1, field_elements=[FormWord(text=品, bounding_box=[Point(x=2.8057, y=5.1724), Point(x=2.9522, y=5.1724), Point(x=2.9522, y=5.3162), Point(x=2.8057, y=5.3162)], confidence=1.0, page_number=1, kind=word), FormWord(text=目, bounding_box=[Point(x=2.9794, y=5.1741), Point(x=3.1104, y=5.1741), Point(x=3.1104, y=5.3159), Point(x=2.9794, y=5.3159)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=金額, row_index=0, column_index=2, row_span=1, column_span=1, bounding_box=[Point(x=5.3388, y=5.0483), Point(x=7.683, y=5.0483), Point(x=7.683, y=5.4446), Point(x=5.3388, y=5.4446)], confidence=1.0, is_header=True, is_footer=False, page_number=1, field_elements=[FormWord(text=金, bounding_box=[Point(x=6.3564, y=5.1678), Point(x=6.5202, y=5.1678), Point(x=6.5202, y=5.3136), Point(x=6.3564, y=5.3136)], confidence=1.0, page_number=1, kind=word), FormWord(text=額, bounding_box=[Point(x=6.5226, y=5.1676), Point(x=6.6872, y=5.1676), Point(x=6.6872, y=5.3172), Point(x=6.5226, y=5.3172)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=りんご 5kg, row_index=1, column_index=0, row_span=1, column_span=1, bounding_box=[Point(x=0.5628, y=5.4446), Point(x=2.2498, y=5.4446), Point(x=2.2498, y=5.8208), Point(x=0.5628, y=5.8208)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=りんご, bounding_box=[Point(x=0.9207, y=5.5566), Point(x=1.2626, y=5.5566), Point(x=1.2626, y=5.698), Point(x=0.9207, y=5.698)], confidence=1.0, page_number=1, kind=word), FormWord(text=5kg, bounding_box=[Point(x=1.3163, y=5.5698), Point(x=1.5635, y=5.5698), Point(x=1.5635, y=5.7215), Point(x=1.3163, y=5.7215)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=, row_index=1, column_index=1, row_span=1, column_span=1, bounding_box=[Point(x=2.2498, y=5.4446), Point(x=5.3388, y=5.4446), Point(x=5.3388, y=5.8208), Point(x=2.2498, y=5.8208)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=None),
  FormTableCell(text=¥2,000.00, row_index=1, column_index=2, row_span=1, column_span=1, bounding_box=[Point(x=5.3388, y=5.4446), Point(x=7.683, y=5.4446), Point(x=7.683, y=5.8208), Point(x=5.3388, y=5.8208)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=¥2,000.00, bounding_box=[Point(x=6.1416, y=5.5727), Point(x=6.8976, y=5.5727), Point(x=6.8976, y=5.7172), Point(x=6.1416, y=5.7172)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=みかん 1kg, row_index=2, column_index=0, row_span=1, column_span=1, bounding_box=[Point(x=0.5628, y=5.8208), Point(x=2.2498, y=5.8208), Point(x=2.2498, y=6.2104), Point(x=0.5628, y=6.2104)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=みかん, bounding_box=[Point(x=0.9184, y=5.9533), Point(x=1.2939, y=5.9533), Point(x=1.2939, y=6.0842), Point(x=0.9184, y=6.0842)], confidence=1.0, page_number=1, kind=word), FormWord(text=1kg, bounding_box=[Point(x=1.3697, y=5.9565), Point(x=1.6098, y=5.9565), Point(x=1.6098, y=6.1081), Point(x=1.3697, y=6.1081)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=, row_index=2, column_index=1, row_span=1, column_span=1, bounding_box=[Point(x=2.2498, y=5.8208), Point(x=5.3388, y=5.8208), Point(x=5.3388, y=6.2104), Point(x=2.2498, y=6.2104)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=None),
  FormTableCell(text=¥500.00, row_index=2, column_index=2, row_span=1, column_span=1, bounding_box=[Point(x=5.3388, y=5.8208), Point(x=7.683, y=5.8208), Point(x=7.683, y=6.2104), Point(x=5.3388, y=6.2104)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=¥500.00, bounding_box=[Point(x=6.2166, y=5.9596), Point(x=6.8243, y=5.9596), Point(x=6.8243, y=6.0784), Point(x=6.2166, y=6.0784)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=バナナ 10kg, row_index=3, column_index=0, row_span=1, column_span=2, bounding_box=[Point(x=0.5628, y=6.2104), Point(x=5.3388, y=6.2104), Point(x=5.3388, y=6.6), Point(x=0.5628, y=6.6)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=バナナ, bounding_box=[Point(x=0.9127, y=6.3322), Point(x=1.2824, y=6.3322), Point(x=1.2824, y=6.4704), Point(x=0.9127, y=6.4704)], confidence=1.0, page_number=1, kind=word), FormWord(text=10kg, bounding_box=[Point(x=1.3596, y=6.3434), Point(x=1.6949, y=6.3434), Point(x=1.6949, y=6.4951), Point(x=1.3596, y=6.4951)], confidence=1.0, page_number=1, kind=word)]),
  FormTableCell(text=¥3,000.00, row_index=3, column_index=2, row_span=1, column_span=1, bounding_box=[Point(x=5.3388, y=6.2104), Point(x=7.683, y=6.2104), Point(x=7.683, y=6.6), Point(x=5.3388, y=6.6)], confidence=1.0, is_header=False, is_footer=False, page_number=1, field_elements=[FormWord(text=¥3,000.00, bounding_box=[Point(x=6.1416, y=6.3464), Point(x=6.8976, y=6.3464), Point(x=6.8976, y=6.4908), Point(x=6.1416, y=6.4908)], confidence=1.0, page_number=1, kind=word)])],
 'column_count': 3,
 'page_number': 1,
 'row_count': 4}

Each element of tables is a FormTable object that holds the information about one recognized table. Its variables are described below.

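Variable       Type                  Description
page_number    int                   Number of the page on which the recognized table appears
cells          list[FormTableCell]   List of the cells contained in the table
row_count      int                   Number of rows
column_count   int                   Number of columns
bounding_box   list[Point]           List of points for the four corners of the bounding box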

We build a table from the values in the FormTable object. First, we create an empty table from the row and column counts.

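# Create an empty table (every value is missing, i.e. NaN).
# dtype=object lets us assign strings to the cells later without dtype warnings.
df = pd.DataFrame(np.full((table.row_count, table.column_count), np.nan), dtype=object)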
df
     0    1    2
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN

Next, we fill the empty table with the extracted values one at a time.

# Assign the extracted values to the table one cell at a time
for cell in table.cells:
    r = cell.row_index
    c = cell.column_index
    df.loc[r, c] = cell.text
df
             0    1          2
0                 品目         金額
1     りんご 5kg        ¥2,000.00
2     みかん 1kg          ¥500.00
3    バナナ 10kg  NaN   ¥3,000.00

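The result is the table above. The item column was split into two columns: the 品目 header landed in column 1 while the item names landed in column 0. The NaN in row 3 remains because the バナナ 10kg cell was recognized with column_span=2, so no separate cell exists for column 1 in that row. Even so, we can confirm that the table itself was recognized.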
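
The loop above writes each cell only at its (row_index, column_index) position, so a cell that spans several columns leaves its neighbours empty. One possible way to handle this, shown here as a sketch rather than as the article's method, is to copy the text across the whole span using the row_span and column_span values visible in the dump. The Cell namedtuple is a stand-in for FormTableCell, and cells_to_grid is a hypothetical helper name; the code uses only the standard library so it can run without an Azure connection.

```python
from collections import namedtuple

# Stand-in for FormTableCell with only the attributes used here (assumption).
Cell = namedtuple("Cell", "text row_index column_index row_span column_span")


def cells_to_grid(cells, row_count, column_count):
    """Build a row-major grid, copying each cell's text across its span."""
    grid = [[None] * column_count for _ in range(row_count)]
    for cell in cells:
        for r in range(cell.row_index, cell.row_index + cell.row_span):
            for c in range(cell.column_index, cell.column_index + cell.column_span):
                grid[r][c] = cell.text
    return grid


# Mimics the last row of the sample table: the first cell spans columns 0 and 1.
sample_cells = [
    Cell("バナナ 10kg", 0, 0, 1, 2),
    Cell("¥3,000.00", 0, 2, 1, 1),
]
grid = cells_to_grid(sample_cells, row_count=1, column_count=3)
print(grid)
```

In the notebook, passing cells_to_grid(table.cells, table.row_count, table.column_count) to pd.DataFrame should give a frame in which row 3 repeats the item name instead of showing NaN. Whether repeating the text or leaving the spanned cell empty is preferable depends on how the table will be used afterwards.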

Displaying the bounding boxes

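Finally, we display the bounding boxes. The procedure is the same as for the image. For a PDF, however, the recognition result is expressed in inches, so overlaying the coordinates directly on an image measured in pixels puts the boxes in the wrong place because the scales differ. We therefore need to convert the coordinates to the image's scale first.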

# Make a copy of the image so the original stays untouched
copy_img = copy.deepcopy(img)
draw = ImageDraw.Draw(copy_img)

# Scale factor for the bounding boxes: image height in pixels / page height in inches
scale = img.size[1] / pages[0].height

# Draw the bounding boxes
lines = pages[0].lines
for line in lines:
    # Indices 0, 1, 2, 3 are top-left, top-right, bottom-right, bottom-left
    left = line.bounding_box[0].x * scale
    top = line.bounding_box[0].y * scale
    right = line.bounding_box[2].x * scale
    bottom = line.bounding_box[2].y * scale

    # Draw one bounding box
    grid = (left, top, right, bottom)
    draw.rectangle(grid, outline='red', fill=None, width=3)

# Show the result
plt.figure(figsize=(15, 15))
plt.imshow(copy_img)
plt.show()

The bounding boxes are displayed in the correct positions.
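
The unit conversion can also be isolated in a small function, which makes the arithmetic easier to check by hand. The sketch below is an illustration under stated assumptions: Point is a namedtuple stand-in for the SDK's Point, box_to_pixels is a hypothetical helper name, and the page and image sizes are made-up values. Unlike the loop above, it takes the minimum and maximum over all four corners, so the rectangle stays valid even when the corners are slightly skewed.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")  # stand-in for the SDK's Point (assumption)


def box_to_pixels(bounding_box, scale):
    """Convert a 4-point box in page units to (left, top, right, bottom) pixels."""
    xs = [p.x * scale for p in bounding_box]
    ys = [p.y * scale for p in bounding_box]
    # min/max over all corners keeps left <= right and top <= bottom.
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical values: a page 10 inches tall rendered as a 2000-pixel-tall image.
scale = 2000 / 10  # pixels per inch
box = [Point(0.5, 1.0), Point(2.0, 1.0), Point(2.0, 1.25), Point(0.5, 1.25)]
rect = box_to_pixels(box, scale)
print(rect)
```

The returned tuple has the same (left, top, right, bottom) layout that the drawing loop passes to draw.rectangle, so in the notebook it could be used as draw.rectangle(box_to_pixels(line.bounding_box, scale), outline='red', width=3).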

Exercise

Run OCR on 画像サンプル2.png and display the bounding boxes.


That is all for this installment. The next topic is planned to be speech recognition.

Thank you for reading to the end. We look forward to seeing you again next time.

