can recognize Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified
and Traditional), Danish, German (standard and Fraktur script), Greek,
Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian,
Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese,
Romanian, Russian, Slovak (standard and Fraktur script), Slovenian, Spanish,
Serbian, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian and Vietnamese.
Tesseract can also be trained to work in other languages.
Tesseract includes the English training data. If you want to use another
language, download the appropriate training data, unpack it using 7-zip, and
copy the .traineddata file into the 'tessdata' directory, probably C:\Program
Tesseract's output will be of very poor quality if the input images are not
preprocessed to suit it. Images (especially screenshots) must be scaled up so
that the text x-height is at least 20 pixels; any rotation or skew must be
corrected, or no text will be recognized; low-frequency changes in brightness
must be high-pass filtered, or Tesseract's binarization stage will destroy much
of the page; and dark borders must be manually removed, or they will be
misinterpreted as characters.
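Three of the preprocessing requirements above can be illustrated with a minimal
sketch. It is not part of Tesseract and uses only the Python standard library.
The image is assumed to be a grayscale list of rows with values 0 to 255, and
the function names, the box-blur high-pass filter, and the brightness threshold
of 64 are illustrative assumptions rather than values taken from Tesseract.
Skew correction is omitted because it needs an image library.

```python
from statistics import mean

Image = list[list[int]]  # assumed format: grayscale rows, values 0..255


def upscale_factor(x_height_px: float, min_x_height: float = 20.0) -> float:
    """Factor by which to enlarge an image so the x-height reaches the minimum."""
    if x_height_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, min_x_height / x_height_px)


def box_blur(img: Image, radius: int) -> list[list[float]]:
    """Local mean over a (2*radius+1)^2 window, clipped at the image edges."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(mean(window))
        out.append(row)
    return out


def high_pass(img: Image, radius: int = 2) -> Image:
    """Remove slow brightness changes: subtract the local mean, re-centre on 128."""
    low = box_blur(img, radius)
    return [[min(255, max(0, round(p - l + 128))) for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]


def crop_dark_border(img: Image, threshold: int = 64) -> Image:
    """Strip outer rows and columns whose mean brightness is below threshold."""
    rows = [r for r in range(len(img)) if mean(img[r]) >= threshold]
    cols = [c for c in range(len(img[0]))
            if mean(row[c] for row in img) >= threshold]
    if not rows or not cols:
        return img  # nothing bright enough to keep; leave the image untouched
    # Only the outermost dark rows/columns are cut; interior ones are kept.
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


if __name__ == "__main__":
    # A screenshot with a 10-pixel x-height needs to be doubled in size.
    print(upscale_factor(10))
```

In practice an image library would do the resizing and filtering; the point of
the sketch is only to show what each step computes.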
Tesseract does not come with a GUI and is instead run from the command-line
interface. Tesseract version 3.03 has been released and is available for use.
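As an illustration of command-line use, the following minimal sketch wraps the
basic invocation form `tesseract imagename outputbase -l lang` in Python. The
helper names and the file names in the example are hypothetical. The sketch
assumes that the `tesseract` executable is on the PATH and that the training
data for the requested language is already installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "eng") -> list[str]:
    """Assemble the argument list: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Tesseract appends ".txt" to the output base name for plain-text output.
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names, shown without running the OCR engine.
    print(" ".join(build_tesseract_command("page.png", "page", "deu")))
```

Keeping the command construction separate from execution makes it possible to
inspect or log the exact command before any OCR is run.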
Desktop OCR Software using Tesseract Tool
The Tesseract tool is used as a backend for desktop OCR software. The
researcher has studied the following desktop OCR software built on Tesseract.
PDF OCR X