Fractionner des lignes de texte dans un document numérisé

Question

J'essaie de trouver un moyen de rompre le fractionnement des lignes de texte dans un document numérisé qui a été adapté par seuils. En ce moment, je stocke les valeurs de pixels du document sous forme d'entiers non signés de 0 à 255, et je prends la moyenne des pixels de chaque ligne, et je divise les lignes en plages selon que la moyenne des valeurs de pixels est supérieur à 250, puis je prends la médiane de chaque plage de lignes pour laquelle cela vaut. Cependant, cette méthode échoue parfois, car il peut y avoir des taches noires sur l'image.

Existe-t-il un moyen plus résistant au bruit pour effectuer cette tâche?

EDIT: Voici un peu de code. "déformé" est le nom de l'image d'origine, "coupes" est l'endroit où je veux diviser l'image.

warped = threshold_adaptive(warped, 250, offset = 10) warped = warped.astype("uint8") * 255 # get areas where we can split image on whitespace to make OCR more accurate color_level = np.array([np.sum(line) / len(line) for line in warped]) cuts = [] i = 0 while(i < len(color_level)): if color_level[i] > 250: begin = i while(color_level[i] > 250): i += 1 cuts.append((i + begin)/2) # middle of the whitespace region else: i += 1

EDIT 2: Exemple d'image ajoutée

Miki · Accepted Answer

À partir de votre image d'entrée, vous devez rendre le texte en blanc et l'arrière-plan en noir

Vous devez ensuite calculer l'angle de rotation de votre facture. Une approche simple consiste à trouver le minAreaRect de tous les points blancs (findNonZero), et vous obtenez:

Ensuite, vous pouvez faire pivoter votre facture, de sorte que le texte soit horizontal:

Vous pouvez maintenant calculer la projection horizontale (reduce). Vous pouvez prendre la valeur moyenne de chaque ligne. Appliquez un seuil th sur l'histogramme pour tenir compte du bruit dans l'image (ici j'ai utilisé 0, c'est-à-dire pas de bruit). Les lignes avec uniquement un arrière-plan auront une valeur >0, les lignes de texte auront la valeur 0 dans l'histogramme. Ensuite, prenez la coordonnée moyenne des cases de chaque séquence continue de cases blanches dans l'histogramme. Ce sera la coordonnée y de vos lignes:

Voici le code. C'est en C++, mais comme la plupart du travail concerne les fonctions OpenCV, il devrait être facilement convertible en Python. Au moins, vous pouvez l'utiliser comme référence:

#include <opencv2/opencv.hpp> using namespace cv; using namespace std; int main() { // Read image Mat3b img = imread("path_to_image"); // Binarize image. Text is white, background is black Mat1b bin; cvtColor(img, bin, COLOR_BGR2GRAY); bin = bin < 200; // Find all white pixels vector<Point> pts; findNonZero(bin, pts); // Get rotated rect of white pixels RotatedRect box = minAreaRect(pts); if (box.size.width > box.size.height) { swap(box.size.width, box.size.height); box.angle += 90.f; } Point2f vertices[4]; box.points(vertices); for (int i = 0; i < 4; ++i) { line(img, vertices[i], vertices[(i + 1) % 4], Scalar(0, 255, 0)); } // Rotate the image according to the found angle Mat1b rotated; Mat M = getRotationMatrix2D(box.center, box.angle, 1.0); warpAffine(bin, rotated, M, bin.size()); // Compute horizontal projections Mat1f horProj; reduce(rotated, horProj, 1, CV_REDUCE_AVG); // Remove noise in histogram. White bins identify space lines, black bins identify text lines float th = 0; Mat1b hist = horProj <= th; // Get mean coordinate of white white pixels groups vector<int> ycoords; int y = 0; int count = 0; bool isSpace = false; for (int i = 0; i < rotated.rows; ++i) { if (!isSpace) { if (hist(i)) { isSpace = true; count = 1; y = i; } } else { if (!hist(i)) { isSpace = false; ycoords.Push_back(y / count); } else { y += i; count++; } } } // Draw line as final result Mat3b result; cvtColor(rotated, result, COLOR_GRAY2BGR); for (int i = 0; i < ycoords.size(); ++i) { line(result, Point(0, ycoords[i]), Point(result.cols, ycoords[i]), Scalar(0, 255, 0)); } return 0; }

Kinght 金 · Answer

Étapes de base comme @Miki,

lire la source

battu

trouver minAreaRect

chaîne par la matrice tournée

trouver et dessiner des limites supérieures et inférieures

Alors que code en Python :

#!/usr/bin/python3 # 2018.01.16 01:11:49 CST # 2018.01.16 01:55:01 CST import cv2 import numpy as np ## (1) read img = cv2.imread("img02.jpg") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) ## (2) threshold th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU) ## (3) minAreaRect on the nozeros pts = cv2.findNonZero(threshed) ret = cv2.minAreaRect(pts) (cx,cy), (w,h), ang = ret if w>h: w,h = h,w ang += 90 ## (4) Find rotated matrix, do rotation M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0) rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0])) ## (5) find and draw the upper and lower boundary of each lines hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1) th = 2 H,W = img.shape[:2] uppers = [y for y in range(H-1) if hist[y]<=th and hist[y+1]>th] lowers = [y for y in range(H-1) if hist[y]>th and hist[y+1]<=th] rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR) for y in uppers: cv2.line(rotated, (0,y), (W, y), (255,0,0), 1) for y in lowers: cv2.line(rotated, (0,y), (W, y), (0,255,0), 1) cv2.imwrite("result.png", rotated)

Enfin résultat :