
Tesseract OCR fails on TIFF files

pytesseract tesseract ocr python

Atinesh · 6 years ago

I have multi-page .tif files and I am trying to extract text from them with Tesseract OCR, but I get this error.

Code

from PIL import Image
import pytesseract

img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0))  # OCR on 1st Page
text = ' '.join(text.split())
print(text)

Error

Any idea why?

2 answers | up to 6 years ago

Blender · 6 years ago

Image.seek has no return value, so you are effectively running:

pytesseract.image_to_string(None)

Instead, do the following:

img.seek(0)
text = pytesseract.image_to_string(img)
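
The same pattern extends to every page of a multi-page TIFF. The following is a minimal sketch, not part of the original answer: the helper name ocr_all_pages and its ocr parameter are hypothetical, introduced only for illustration. It relies on Pillow's ImageSequence.Iterator, which seeks to each frame in turn, and falls back to pytesseract.image_to_string when no OCR function is passed in.

```python
from PIL import Image, ImageSequence


def ocr_all_pages(path, ocr=None):
    """Return one OCR string per page of a multi-page TIFF.

    `ocr` is a hypothetical hook: any callable that takes a PIL image
    and returns text. It defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be tried with a stand-in OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string

    texts = []
    with Image.open(path) as img:
        # The iterator seeks to each frame and yields the image object,
        # so the image (never the result of seek) is what gets passed on.
        for page in ImageSequence.Iterator(img):
            # Collapse runs of whitespace, as in the question's code.
            texts.append(' '.join(ocr(page).split()))
    return texts
```

With Tesseract installed, ocr_all_pages('Group 1/1_CHE_MDC_1.tif') should return a list with one string per page, and the first element corresponds to the page selected by img.seek(0).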

Jitesh Vacheta · 6 years ago

I had the same problem. I tried the code below and it worked for me:

import glob
import os

import pytesseract

# Set the Tesseract OCR .exe file path here,
# e.g. by assigning pytesseract.pytesseract.tesseract_cmd

b = ''
# Use '*.jpg' instead of '*.tif' for JPEG images
for i in glob.glob('Fullpath of your image directory/*.tif'):
    # image_to_string also accepts a file path
    b = b + pytesseract.image_to_string(i)
print(b)

Happy learning!
