
Image preprocessing for MNIST OCR

  •  5
  • Casper  · Tech Community  · 6 years ago

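    I'm working on an OCR application in Python to read digits. I use OpenCV to find the contours in an image, crop it, and then preprocess the crop to 28x28 to match the MNIST dataset. My images aren't square, so I seem to lose a lot of quality when I resize them. Any suggestions for what I could try?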

    This is the original image

    This is after editing it

    And this is the quality it should be

    I have tried the operations from http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html , such as dilation and opening. But they don't improve the result; they only make it blurry...

    This is the code I use (find the contour, crop, resize, threshold, then center):

    import numpy as np
    import cv2
    import imutils
    import scipy
    from imutils.perspective import four_point_transform
    from scipy import ndimage
    
    images = np.zeros((4, 784))
    correct_vals = np.zeros((4, 10))
    
    i = 0
    
    
    def getBestShift(img):
        # ndimage.measurements is a deprecated namespace; call the function directly
        cy, cx = ndimage.center_of_mass(img)
    
        rows, cols = img.shape
        shiftx = np.round(cols / 2.0 - cx).astype(int)
        shifty = np.round(rows / 2.0 - cy).astype(int)
    
        return shiftx, shifty
    
    
    def shift(img, sx, sy):
        rows, cols = img.shape
        M = np.float32([[1, 0, sx], [0, 1, sy]])
        shifted = cv2.warpAffine(img, M, (cols, rows))
        return shifted
    
    
    for no in [1, 3, 4, 5]:
        image = cv2.imread("images/" + str(no) + ".jpg")
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        edged = cv2.Canny(blurred, 50, 200, 255)
    
        cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)
        # grab_contours handles the different return shapes of OpenCV 2, 3 and 4
        cnts = imutils.grab_contours(cnts)
        cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
        displayCnt = None
    
        for c in cnts:
            # approximate the contour
            peri = cv2.arcLength(c, True)
            approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    
            # if the contour has four vertices, then we have found
            # the thermostat display
            if len(approx) == 4:
                displayCnt = approx
                break
    
        warped = four_point_transform(gray, displayCnt.reshape(4, 2))
        gray = cv2.resize(255 - warped, (28, 28))
        (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    
    
        while np.sum(gray[0]) == 0:
            gray = gray[1:]
    
        while np.sum(gray[:, 0]) == 0:
            gray = np.delete(gray, 0, 1)
    
        while np.sum(gray[-1]) == 0:
            gray = gray[:-1]
    
        while np.sum(gray[:, -1]) == 0:
            gray = np.delete(gray, -1, 1)
    
        rows, cols = gray.shape
    
        if rows > cols:
            factor = 20.0 / rows
            rows = 20
            cols = int(round(cols * factor))
            gray = cv2.resize(gray, (cols, rows))
    
        else:
            factor = 20.0 / cols
            cols = 20
            rows = int(round(rows * factor))
            gray = cv2.resize(gray, (cols, rows))
    
        # np.math and np.lib.pad are gone in recent NumPy; use np.ceil/np.floor and np.pad
        colsPadding = (int(np.ceil((28 - cols) / 2.0)), int(np.floor((28 - cols) / 2.0)))
        rowsPadding = (int(np.ceil((28 - rows) / 2.0)), int(np.floor((28 - rows) / 2.0)))
        gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')
    
        shiftx, shifty = getBestShift(gray)
        shifted = shift(gray, shiftx, shifty)
        gray = shifted
    
        cv2.imwrite("processed/" + str(no) + ".png", gray)
        cv2.imshow("imgs", gray)
        cv2.waitKey(0)
    
    1 answer  |  as of 6 years ago
        1
  •  4
  •   Zev    6 years ago

    When you resize an image, make sure you choose the interpolation that best suits your needs. For this case, I would suggest:

    gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)
    

    This is the result after the rest of the processing.

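    You can see a comparison of the methods here: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/ but since there are only a few, you can try them all and see which gives the best result. The default appears to be INTER_LINEAR.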