Wednesday, February 08, 2006

Excerpts from the article "Optical Character Recognition," published in 1992

Optical character recognition (OCR) is a process by which printed text is detected and transformed into a computer text file. OCR consists of two basic processes: scanning and recognition. Scanning, performed with a device called a scanner, digitizes the printed page, creating a coded graphics version of the text that may be stored on disk. That coded version represents the scanned image as pixels and is readable by graphics programs.
The separate recognition process translates the picture of an "A" into the letter "A." A new file is created in a format determined by user instructions. That file is readable by word processor, statistics, and/or database software supported by the OCR program used.
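
The recognition step described above can be pictured with a toy sketch. This is only an illustration under stated assumptions, not the method used by any OCR package the article discusses: the 3x5 glyph templates and the names TEMPLATES, mismatches, and recognize are all hypothetical. The sketch compares a scanned pixel grid against stored letter shapes and picks the closest one.

```python
# Toy illustration only: the 3x5 glyph templates below are hypothetical,
# not taken from any real OCR package.
TEMPLATES = {
    "A": ("010",
          "101",
          "111",
          "101",
          "101"),
    "L": ("100",
          "100",
          "100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010",
          "010",
          "010"),
}


def mismatches(glyph, template):
    """Count pixels that differ between two equal-sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template letter with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda letter: mismatches(glyph, TEMPLATES[letter]))


# A scanned "A" whose top pixel was lost during scanning.
scanned = ("000",
           "101",
           "111",
           "101",
           "101")

print(recognize(scanned))
```

Real recognition software has to cope with many fonts, sizes, and skewed pages, so nearest-template matching of this kind is best read as a picture of the idea rather than a working recognizer.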
OCR is a technique that can be useful to political scientists. For example, research notes taken from printed sources, rather than being laboriously typed, could be scanned, processed, and saved as a file readable by a word processing package.
Content analysis might be almost completely mechanized. Numerical data from government reports could be scanned rather than entered by hand and then made readable by a spreadsheet, database management program, or statistics package.

Guidelines for Best Results
The success of an OCR operation depends more on the quality of the scanned image than on any other factor. Even the best scanner and software, working with a flawed image that appears readable to the human eye, may not produce satisfactory results. An image might be too light, with parts of letters slightly malformed or separated, or it might be too dark, with letters touching and loops (such as the inside of a small "e") nearly filled in.
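
One rough way to picture the "too light" and "too dark" cases is to measure how much of a binarized scan is ink. The sketch below is an assumption-laden illustration, not a procedure from the article: the function names and the low and high thresholds are hypothetical, and real pages vary widely in how much ink they carry.

```python
def ink_fraction(bitmap):
    """Fraction of pixels that are dark ('1') in a binarized image."""
    total = sum(len(row) for row in bitmap)
    dark = sum(row.count("1") for row in bitmap)
    return dark / total if total else 0.0


def scan_quality(bitmap, low=0.05, high=0.40):
    """Classify a scan by ink coverage; low and high are hypothetical cutoffs."""
    fraction = ink_fraction(bitmap)
    if fraction < low:
        return "too light"   # strokes may be broken or missing
    if fraction > high:
        return "too dark"    # letters may touch and loops may fill in
    return "ok"


page = ("1000",
        "0100",
        "0000",
        "0010")

print(scan_quality(page))
```

A check like this only flags extreme cases. A scan can pass it and still contain the malformed or touching letters described above.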

Anne Permaloff, Auburn University at Montgomery
Carl Grafton, Auburn University at Montgomery

Political Science and Politics, Vol. 25, No. 3 (Sep., 1992), 523-531.
