
Improve an image to detect characters within an area

  •  2
  • Yohan D  · Tech Community  · 6 years ago

    Input Image

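    • Rotate the image so that the blue rectangle is horizontal [need help]
    • Crop the image to the blue rectangle [need help]
    • Apply a threshold filter and a Gaussian blur
    • Use Tesseract to detect the characters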

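      import cv2
      import numpy as np
      import pytesseract
      from PIL import Image

      # Load the image and flip PIL's RGB channel order to OpenCV's BGR
      img = Image.open('grid.jpg')
      image = np.array(img.convert("RGB"))[:, :, ::-1].copy()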
      
      
      # Need to rotate the image here and fill the blanks
      # Need to crop the image here
      
      # Convert the image to grayscale
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      
      # Otsu's thresholding
      ret3, th3 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
      
      # Gaussian Blur
      blur = cv2.GaussianBlur(th3, (5, 5), 0)
      
      # Save the image
      cv2.imwrite("preprocessed.jpg", blur)
      
      # Apply the OCR
      pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
      tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'
      
      preprocessed = Image.open('preprocessed.jpg')
      boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)
      

    output

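    OCR problems:

    • Sometimes Tesseract recognizes the characters in a row as a single word (gcvdrteuqcebursidec), and sometimes as individual letters. I would like it to always be one word.
    • The small pyramid in the bottom-right corner is recognized as a character.

    Any other suggestions for improving the recognition are welcome.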
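
One way to get a single word per row, however Tesseract splits it, is to join the tokens afterwards. The sketch below assumes the result was requested as a dictionary, i.e. `pytesseract.image_to_data(preprocessed, config=tessdata_dir_config, output_type=pytesseract.Output.DICT)`, and groups tokens by their `block_num`, `par_num` and `line_num` fields. The helper name `join_line_tokens` is hypothetical, and the sketch assumes a single-page image.

```python
def join_line_tokens(data):
    """Concatenate all tokens that Tesseract reports on the same text line.

    data: dict as returned by image_to_data(..., output_type=Output.DICT).
    Returns one string per detected line, in reading order.
    """
    lines = {}  # dicts keep insertion order, so lines stay in reading order
    for i, text in enumerate(data["text"]):
        text = text.strip()
        if not text:
            continue  # page/block/paragraph/line rows carry no text
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(text)
    return ["".join(tokens) for tokens in lines.values()]
```

This does not change what Tesseract recognizes; it only normalizes the output so that a row split into several tokens and a row read as one token look the same.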

    4 answers  |  up to 6 years ago
        1
  •  2
  •   Mark Setchell    6 years ago

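    Here is one way to proceed...

    Convert to HSV, then start at each corner and work towards the middle of the picture, looking for the pixel nearest each corner that is reasonably saturated and whose hue matches the surrounding blue rectangle. That gives you the 4 points marked in red: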

    enter image description here
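
A minimal sketch of the corner search, not the answerer's code. It assumes you already have a boolean mask that is True where a pixel is saturated and blue-ish (for example from `cv2.cvtColor(image, cv2.COLOR_BGR2HSV)` followed by `cv2.inRange` with hue and saturation bounds you would have to tune), and simply picks the mask pixel closest to each image corner. The name `corners_from_mask` is hypothetical.

```python
import numpy as np

def corners_from_mask(mask):
    """Return the mask pixels closest to the four image corners.

    mask: 2-D boolean array, True where a pixel looks like the blue border.
    Returns (x, y) tuples ordered top-left, top-right, bottom-left, bottom-right.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no candidate pixels")
    h, w = mask.shape
    found = []
    for cx, cy in [(0, 0), (w - 1, 0), (0, h - 1), (w - 1, h - 1)]:
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2  # squared distance to this corner
        i = int(np.argmin(d2))
        found.append((int(xs[i]), int(ys[i])))
    return found
```

The search is only as good as the mask: stray blue pixels outside the rectangle would be picked up as corners, so some cleanup of the mask may be needed first.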

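    Now use a perspective transform to move each of those points to a corner of the output, which straightens the image. I used ImageMagick, but you should be able to see that I map the top-left red dot at coordinates (210,51) to the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) is moved to (2064,0). The ImageMagick command in the terminal is: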

    convert wordsearch.jpg \
      -distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg
    

    enter image description here
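
For readers staying in Python, the same mapping can be sketched with NumPy by solving for the 3x3 perspective matrix from the four point pairs in the command above. The helper names `perspective_matrix` and `apply_perspective` are hypothetical, and this is an illustration of the technique, not a port of ImageMagick's implementation.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix H that maps four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=np.float64),
                        np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, point):
    """Map a single (x, y) point through H."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Point pairs taken from the ImageMagick command: source corner -> output corner
src = [(210, 51), (1754, 19), (238, 1137), (1776, 1107)]
dst = [(0, 0), (2064, 0), (0, 1161), (2064, 1161)]
H = perspective_matrix(src, dst)
```

With OpenCV available, `cv2.getPerspectiveTransform` computes the same kind of matrix from `float32` point arrays, and `cv2.warpPerspective(image, H, (2064, 1161))` would resample the image with it.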

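    The next problem is uneven lighting: the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove the high frequencies (a simple box blur or box average is fine), so that it represents the slowly varying illumination. I then subtract the image from this, which effectively removes the background variation and leaves only the high-frequency detail, such as your letters. Finally I normalize the result so that whites are white and blacks are black, and threshold at 50%.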

    convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
       -compose difference -composite  -negate -normalize -threshold 50% final.jpg
    

    enter image description here
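
The same steps can be sketched in NumPy. This is an approximation under stated assumptions rather than an exact equivalent of the command: ImageMagick's `-blur` is Gaussian, while the sketch uses a box average (which the answer says is fine), and `-normalize` is replaced by a plain min/max stretch. The names `box_blur` and `flatten_lighting` and the default radius are hypothetical.

```python
import numpy as np

def box_blur(gray, radius):
    """Box average with a (2*radius+1) square window, using edge padding."""
    padded = np.pad(gray.astype(np.float64), radius, mode="edge")
    # Integral image with a leading row and column of zeros
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    k = 2 * radius + 1
    total = (integral[k:, k:] - integral[:-k, k:]
             - integral[k:, :-k] + integral[:-k, :-k])
    return total / (k * k)

def flatten_lighting(gray, radius=25):
    """Remove slowly varying illumination and return a 0/255 binary image."""
    gray = gray.astype(np.float64)
    background = box_blur(gray, radius)   # slowly varying illumination
    diff = np.abs(background - gray)      # large only at fine detail (letters)
    inverted = 255.0 - diff               # letters dark, background light
    lo, hi = inverted.min(), inverted.max()
    if hi == lo:
        return np.full(gray.shape, 255, dtype=np.uint8)
    stretched = (inverted - lo) / (hi - lo)   # min/max stretch to 0..1
    return np.where(stretched > 0.5, 255, 0).astype(np.uint8)
```

The blur radius should be noticeably larger than a letter stroke and smaller than the scale of the lighting change, so it needs tuning per image.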

        2
  •  1
  •   Kinght 金    6 years ago

    Here are my steps to recognize the characters:

    (1) detect the blue in HSV space, approximate the inner blue contour, and sort the corner points
    (2) find the perspective transform matrix and apply the perspective transform
    (3) threshold it (and find the characters)
    (4) use `mnist` algorithms to recognize the chars
    

    step (1) find the corners of the blue rect
    

    Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)

    enter image description here enter image description here
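
The answer does not show how the corner points are sorted. One common trick, sketched here as an assumption and not as the answerer's code, orders four points by their coordinate sums and differences. The name `order_corners` is hypothetical, and the example points are the coordinates quoted in the first answer.

```python
import numpy as np

def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Assumes the rectangle is rotated by well under 45 degrees.
    """
    pts = np.asarray(points, dtype=np.float64)
    s = pts.sum(axis=1)          # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]    # y - x: smallest at top-right, largest at bottom-left
    tl, br = pts[np.argmin(s)], pts[np.argmax(s)]
    tr, bl = pts[np.argmin(d)], pts[np.argmax(d)]
    return np.array([tl, tr, br, bl])

# Corner coordinates in arbitrary order, as a contour approximation might return them
ordered = order_corners([(1776, 1107), (210, 51), (238, 1137), (1754, 19)])
```

A consistent order matters because the perspective transform pairs each source corner with a specific destination corner.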

    step (2) crop
    

    enter image description here

    step (3) threshold (and find the chars)
    

    enter image description here enter image description here

    step (4) work in progress...
    
        3
  •  1
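  •   jcupitt    6 years ago

    Here is an approach using pyvips.

    If the image is just rotated (i.e. little or no perspective), you can use an FFT to find the rotation angle. The nice, regular grid of characters will produce a clear set of lines in the transform. It should be very robust. This runs the FFT on the whole image, but you could shrink it a bit first if you want more speed.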

    import sys
    import pyvips
    
    image = pyvips.Image.new_from_file(sys.argv[1])
    
    # to monochrome, take the fft, wrap the origin to the centre, get magnitude
    fft = image.colourspace('b-w').fwfft().wrap().abs()
    

    enter image description here

    def to_rectangular(image):
        xy = pyvips.Image.xyz(image.width, image.height)
        xy *= [1, 360.0 / image.height]
        index = xy.rect()
        scale = min(image.width, image.height) / float(image.width)
        index *= scale / 2.0
        index += [image.width / 2.0, image.height / 2.0]
        return image.mapim(index)
    
    # sum of columns, sum of rows
    cols, rows = to_rectangular(fft).project()
    

    This produces:

    enter image description here

    The projection is:

    enter image description here

    Then look for the peak and rotate:

    # blur the rows projection a bit, then get the maxpos
    v, x, y = rows.gaussblur(10).maxpos()
    
    # and turn to an angle in degrees we should counter-rotate by
    angle = 270 - 360 * y / rows.height
    
    image = image.rotate(angle)
    

    enter image description here
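
If pyvips is not available, the same idea can be sketched in NumPy. This is not the answerer's code: instead of remapping the spectrum from polar to rectangular coordinates, it samples the FFT magnitude along rays from the centre and picks the strongest direction. The name `estimate_grid_angle` and the 1-degree step are assumptions, and the sign of the correction to apply depends on the rotation routine you use.

```python
import numpy as np

def estimate_grid_angle(gray, step_deg=1.0):
    """Estimate the rotation of a regular grid, in degrees within [-45, 45)."""
    gray = np.asarray(gray, dtype=np.float64)
    # Magnitude spectrum with the zero frequency moved to the centre
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray - gray.mean())))
    cy, cx = gray.shape[0] // 2, gray.shape[1] // 2
    radii = np.arange(2, min(cx, cy))        # skip the bins next to DC
    angles = np.arange(0.0, 180.0, step_deg)
    energy = []
    for theta in np.deg2rad(angles):
        # Nearest spectrum bins along a ray at this angle
        xs = np.rint(cx + radii * np.cos(theta)).astype(int)
        ys = np.rint(cy + radii * np.sin(theta)).astype(int)
        energy.append(spectrum[ys, xs].sum())
    best = angles[int(np.argmax(energy))]
    # A grid looks the same every 90 degrees, so fold into [-45, 45)
    return (best + 45.0) % 90.0 - 45.0
```

Nearest-bin sampling limits the precision to roughly a degree or two on small images; interpolating the spectrum or using a larger image would tighten that.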

    cols, rows = image.project() 
    
    h = (cols[2] - cols[1]) > 10000
    v = (rows[2] - rows[1]) > 10000
    
    # search in from the edges for the first non-zero value
    cols, rows = h.profile()
    left = rows.avg()
    
    cols, rows = h.fliphor().profile()
    right = h.width - rows.avg()
    width = right - left
    
    cols, rows = v.profile()
    top = cols.avg()
    
    cols, rows = v.flipver().profile()
    bottom = v.height - cols.avg()
    height = bottom - top
    
    # move the crop in by a margin
    margin = 10
    left += margin
    top += margin
    width -= 2 * margin
    height -= 2 * margin
    
    # and crop!
    image = image.crop(left, top, width, height)
    

    This produces:

    enter image description here

    image = image.colourspace('b-w').gaussblur(70) - image
    

    This produces:

    enter image description here