Image text extraction software (miniocr) official version
Image text extraction software (miniocr) is a tool for recognizing text in pictures. It uses an optimized optical character recognition (OCR) system that matches characters by their features to quickly find the corresponding font. It also identifies the text in images accurately, and its two standout features are font recognition and paragraph segmentation. Huajun Software Park provides downloads of image text extraction software (miniocr). Everyone is welcome to download and use it!
Image text extraction software (miniocr) features
1. Classifies characters using composite features.
2. The character set covers the 3,755 Level-1 (most common) Chinese characters of GB 2312.
3. Targets the most commonly used typeface, Songti (Song typeface).
4. Supports Chinese font sizes from Small Five to Size One, mainly targeting small text under 20 points.
5. When English and Chinese text are mixed, Chinese takes priority.
6. When Chinese characters touch each other, they are separated using dynamic optimization.
Image text extraction software (miniocr) recognition principle
OCR software mainly consists of the following parts.
Image input and preprocessing:
Image input: Different image formats use different storage layouts and compression methods. Open-source projects such as OpenCV and CxImage can be used to read them.
Preprocessing: mainly includes binarization, noise removal, and tilt correction.
Binarization:
Most pictures taken by cameras are color images, which carry a great deal of information. For recognition purposes, we can simply divide the picture's content into foreground (the text) and background. To make the computer recognize text faster and more reliably, we first process the color image so that it contains only foreground and background information. If we define the foreground as black and the background as white, the result is a binary image.
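The page does not say how miniocr chooses the black/white cut-off. A common choice, assumed here purely for illustration, is a single global threshold picked with Otsu's method, which tries every gray level and keeps the one that best separates the pixels into two groups. This minimal sketch assumes the image has already been converted to grayscale and is stored as a list of rows of integers from 0 to 255, with dark text on light paper.

```python
# Minimal sketch: global binarization with Otsu's method.
# Assumption: this is NOT confirmed to be miniocr's method.
# Input: grayscale image as a list of rows of ints in 0..255.

def otsu_threshold(gray):
    """Return the gray level t that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Map dark pixels (<= threshold) to 1 (text, black), the rest to 0 (background, white)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

For example, `binarize([[250, 240, 30], [20, 245, 25]])` marks the three dark pixels as text and the three light ones as background. A single global threshold is simple but struggles with uneven lighting in phone photos; a local (adaptive) threshold is the usual alternative in that case.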
Noise removal:
What counts as noise depends on the document. Removing noise according to its characteristics is called denoising.
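One common kind of noise in photographed or scanned text is tiny isolated specks. A minimal sketch, assuming speck-type noise and a binary image where 1 is ink: find every 8-connected blob of ink pixels and erase blobs smaller than a hypothetical `min_size` threshold. This is only one possible definition of noise; if `min_size` is set too high it will also erase genuine small marks such as dots and short strokes.

```python
from collections import deque

# Minimal sketch: speckle removal by connected-component size.
# Assumption: noise = 8-connected ink blobs with fewer than min_size pixels.
# min_size is a hypothetical tuning parameter.

def remove_small_components(binary, min_size=3):
    """Return a copy of binary (1 = ink) with small blobs set to 0."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected component with breadth-first search.
            comp = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the whole blob if it is too small to be part of a character.
            if len(comp) < min_size:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out
```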
Tilt correction:
Ordinary users tend to photograph documents casually, so the pictures are almost always somewhat tilted, and the text recognition software has to correct this.
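Before the image can be straightened, the tilt angle has to be estimated. A minimal sketch of one standard approach, the projection-profile search, assumed here for illustration (the page does not describe miniocr's method): for each candidate angle, rotate the coordinates of the ink pixels and count ink per row. When the candidate matches the real tilt, the text lines collapse into a few dense rows, so the row counts are most concentrated. The search range and step are hypothetical defaults, and the actual rotation of the image is left out.

```python
import math

# Minimal sketch: skew estimation by projection-profile search.
# Assumptions: binary image (1 = ink), tilt within +/- max_angle degrees.

def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Estimate text tilt in degrees.

    Positive angles mean text lines slope down to the right (image y
    grows downward); rotating the image by -angle would straighten them.
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    if not points:
        return 0.0
    n = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    # Try small angles first so that ties resolve toward "no correction".
    for i in sorted(range(-n, n + 1), key=abs):
        angle = i * step
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        rows = {}
        for x, y in points:
            # Row index of the point after rotating coordinates by -angle.
            r = round(y * c - x * s)
            rows[r] = rows.get(r, 0) + 1
        # Aligned lines pile ink into few rows, which gives a larger
        # sum of squared row counts.
        score = sum(k * k for k in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A perfectly horizontal line of ink yields `0.0`, while a line that steps down one row every ten pixels yields an angle of roughly 5 degrees.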
Layout analysis:
The process of dividing a document image into paragraphs and lines is called layout analysis. Because real documents are diverse and complex, there is currently no fixed, universally optimal segmentation model.
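For simple single-column pages, a common baseline (assumed here, not confirmed as miniocr's method) is the horizontal projection profile: count the ink in every row, treat blank rows as gaps between text lines, and treat unusually tall gaps as paragraph breaks. The `para_gap` threshold below is a hypothetical parameter; multi-column pages, tables, and images need far more elaborate analysis.

```python
# Minimal sketch: line and paragraph splitting by horizontal projection.
# Assumptions: deskewed binary image (1 = ink), single-column layout.

def split_lines(binary):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive."""
    profile = [sum(row) for row in binary]  # ink count per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                 # a text line begins
        elif not ink and start is not None:
            lines.append((start, y))  # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def group_paragraphs(lines, para_gap=2):
    """Group consecutive lines; a blank gap of >= para_gap rows starts a new paragraph."""
    paragraphs = []
    for top, bottom in lines:
        if paragraphs and top - paragraphs[-1][-1][1] < para_gap:
            paragraphs[-1].append((top, bottom))
        else:
            paragraphs.append([(top, bottom)])
    return paragraphs
```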
Character cutting:
Because of poor shooting conditions, characters often touch each other or have broken strokes, which greatly limits the performance of the recognition system. Text recognition software therefore needs a character cutting (segmentation) function.
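A minimal sketch of character cutting within one text line, using the vertical projection profile: blank columns separate characters, and any segment wider than a hypothetical `max_width` (for example, the expected width of one Chinese character) is assumed to contain touching characters and is cut at its least-inked column. This greedy cut is a simple stand-in, not the dynamic optimization the feature list mentions, and it does nothing to rejoin broken strokes.

```python
# Minimal sketch: character segmentation by vertical projection,
# with a greedy min-ink cut for touching characters.
# Assumptions: line_img is one binary text line (1 = ink);
# max_width is a hypothetical character-width limit in pixels.

def split_chars(line_img, max_width):
    """Return (left, right) column ranges of characters; right is exclusive."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    profile = [sum(col) for col in zip(*line_img)]  # ink count per column

    # Step 1: blank columns separate characters.
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))

    # Step 2: split over-wide segments at the interior column with the
    # least ink (leftmost on ties), repeating until every piece fits.
    result = []
    stack = list(reversed(segments))
    while stack:
        left, right = stack.pop()
        if right - left <= max_width:
            result.append((left, right))
            continue
        cut = min(range(left + 1, right), key=lambda x: profile[x])
        stack.append((cut, right))  # pushed first, processed second
        stack.append((left, cut))
    return result
```

For instance, in a line whose first two characters touch through a thin one-pixel bridge, the bridge column has the lowest ink count, so the wide segment is cut there.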
Character recognition:
Research on character recognition began very early. Template matching was introduced first, and later work focused mainly on feature extraction. Text displacement, stroke thickness, broken strokes, touching characters, rotation, and other factors strongly affect the extracted features, which makes feature extraction difficult.
Layout restoration:
Users want the recognized text to be arranged like the original document picture, with the same paragraphs, positions, and order, and to be exportable to Word documents, PDF documents, and so on. This process is called layout restoration.
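Full layout restoration (fonts, positions, Word or PDF export) is beyond a short example. The sketch below covers only one piece of it, putting recognized text boxes back into reading order, and rests on several assumptions: a hypothetical `Box` record (x, y, width, height, text) produced by earlier stages, and a simple single-column page read top to bottom, left to right. A box joins the current output line if its top edge lies above that line's lowest bottom edge.

```python
from typing import List, NamedTuple

# Minimal sketch: reading-order restoration for recognized text boxes.
# Assumptions: Box is a hypothetical record; single-column layout.

class Box(NamedTuple):
    x: int      # left edge
    y: int      # top edge
    w: int      # width
    h: int      # height
    text: str   # recognized text


def restore_reading_order(boxes: List[Box], sep: str = " ") -> str:
    """Group boxes into lines by vertical overlap, then order each line left to right."""
    rows: List[List[Box]] = []
    row_bottom = 0
    for b in sorted(boxes, key=lambda b: b.y):
        if rows and b.y < row_bottom:
            # Overlaps the current line vertically: same line.
            rows[-1].append(b)
            row_bottom = max(row_bottom, b.y + b.h)
        else:
            rows.append([b])
            row_bottom = b.y + b.h
    return "\n".join(
        sep.join(b.text for b in sorted(row, key=lambda b: b.x)) for row in rows
    )
```

For Chinese text, passing `sep=""` avoids inserting spaces between characters.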
Post-processing and proofreading:
Post-processing corrects the recognition results using the context of the specific language.
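One simple form of language-based post-processing, assumed here for illustration, is lexicon correction: if a recognized word is not in a dictionary, replace it with the closest dictionary word by Levenshtein edit distance, provided that word is close enough. The lexicon and the `max_dist` limit are hypothetical; a fuller approach might also weigh word frequency and which characters the recognizer tends to confuse.

```python
# Minimal sketch: dictionary-based correction with Levenshtein distance.
# Assumptions: lexicon is a hypothetical set of valid words; max_dist is
# a hypothetical limit on how many edits a correction may make.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def correct_word(word: str, lexicon: set, max_dist: int = 1) -> str:
    """Return the nearest lexicon word if within max_dist, else the word unchanged."""
    if not lexicon or word in lexicon:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    dist, best = min((edit_distance(word, w), w) for w in lexicon)
    return best if dist <= max_dist else word
```

For example, a misread "文字识剐" is one substitution away from "文字识别" and would be corrected, while a word with no close dictionary match is left unchanged.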
How to use image text extraction software (miniocr)
1. Download the compressed package, extract it, and click "MiniOcr.exe" to run it. Mini Ocr is free, portable ("green") software and requires no installation.
2. Click the "Open Image File" button to add the target picture.
3. Mini Ocr will display the image in the window on the right. If the file is a long image, you can also click "Paragraph Split" to cut it!
4. Click the "Text Recognition" button, and Mini Ocr will automatically recognize the text and font information in the image file. The results can be copied and pasted.
5. When recognition is complete, click "Save" to export the recognized text to a file!
Comparison of similar software
Fanyan Text Recognition OCR (official version) is a versatile text recognition tool. The latest version supports image text extraction, text recognition, image-to-text conversion, and other functions, so it can handle a variety of recognition and extraction tasks. It can also recognize general text, office documents, handwritten documents, QR codes, and more.
Magical OCR Text Recognition Software is a professional, easy-to-use, and efficient OCR tool developed by Beijing Magic Pixel Technology Co., Ltd. It can recognize text in pictures, pictures on mobile phones, scanner input, camera shots, clipboard content, and screenshots. Its very simple operation helps you quickly extract text from pictures and saves you valuable time.
Huajun editor recommends:
Image text extraction software (miniocr) uses an optimized optical character recognition system that matches character features to quickly find the corresponding font. There is plenty of similar software on the Internet, but we hope you like this one!