| | | | | | | | | | | | | | | | | | | | | |
. | 5/25/2008 18:13:44 | Baird Group | http://www.cse.lehigh.edu/~baird/research.html | academic OCR research group | Latin script, predominantly English; mathematics | The group is at Lehigh University. Research areas: 1. Document image content extraction (DICE); 2. Human Interactive Proofs and CAPTCHAs; 3. Full integration of document images into digital libraries; and 4. High-performance image understanding systems. | | | | | | | | | | | | | | | |
. | 5/25/2008 18:16:09 | Govindaraju Group | http://cubs.buffalo.edu/govind/ | academic OCR research group | Latin, Arabic, Devanagari script; handwriting; English, Arabic, Sanskrit, Hindi | Dr. Govindaraju is currently working on extending his expertise in the automated recognition of both machine-printed and hand-written text in Latin script to Arabic and Indic scripts for indexing and searching documents. Dr. Govindaraju's foray into the field of biometrics began with his dissertation on the automated recognition of faces over two decades ago. His recent multi-disciplinary efforts include novel applications such as use of facial expressions for remote biometric authentication. | | | | | | | | | | | | | | | |
. | 5/25/2008 19:05:28 | PRImA | http://www.cse.salford.ac.uk/prima/ | academic OCR research group | | PRImA is a group of researchers aiming to develop world-class Pattern Recognition and Image Analysis techniques for real-world problems. Techniques developed by PRImA members and their associates have gained international academic standing and are currently in use in industry. | | | | | | | | | | | | | | | |
. | 5/26/2008 0:44:54 | IUPR | http://www.iupr.org/ | academic OCR research group | | Our research group conducts basic and applied research in pattern recognition, machine learning, image understanding, and artificial intelligence, with practical applications to digital libraries, network security, bioinformatics, historical document analysis, and scientific data analysis. | | tmbdev | | | | | | | | | | | | | |
. | 5/25/2008 18:11:18 | ABBYY FineReader | http://finereader.abbyy.com/ | commercial OCR system | Latin script, 39 languages supported "with dictionaries" | A widely used commercial OCR system. | | | | | | | | | | | | | | | |
. | 5/25/2008 18:12:09 | OmniPage | http://www.nuance.com/omnipage/ | commercial OCR system | Latin script | A widely used commercial OCR system. | | | | | | | | | | | | | | | |
. | 5/25/2008 18:56:11 | Kadmos | http://www.rerecognition.com/ | commercial OCR system | KADMOS recognizes hand print, machine print, Fraktur, standardized fonts (e.g. OCRA, OCRB, F7B, CMC7, E13B, ...), and marks. | KADMOS is an easily integrated character recognition software component for professional use and is incorporated in complete packages for recognition solutions of every kind. It uses proprietary mathematical algorithms. KADMOS is available in three versions: REC for isolated single characters, REL for separated text lines, and REP for multiline recognition. | | | | | | | | | | | | | | | |
. | 5/25/2008 19:18:06 | TOPOCR | http://www.topocr.com/index.html | commercial OCR system | camera or smartphone | TopOCR is designed to be simple and user-friendly for use with your digital camera or smartphone. | | | | | | | | | | | | | | | |
. | 5/25/2008 19:22:19 | SimpleOCR | http://www.simpleocr.com/ | commercial OCR system | | SimpleOCR is the popular freeware OCR software with hundreds of thousands of users worldwide. SimpleOCR is also a royalty-free OCR SDK for developers to use in their custom applications. | | | | | | | | | | | | | | | |
. | 5/26/2008 0:34:01 | Automatic Reader (Arabic) | http://www.amazon.com/Automatic-Reader-Multilingual-OCR-Gold/dp/B0002A5D5U | commercial OCR system | Arabic | A trainable OCR system that recognizes Arabic text; includes Arabic natural language processing (NLP); supports both OMNI and Learning technologies; works with any type of scanner; handles bilingual documents (Arabic/English and other Latin-based languages). Optionally available: features within Sakhr's Automatic Reader package for accuracy enhancement using NLP tools, PDF support, support for common image formats, and support for other languages written in Arabic-like scripts, such as Farsi, Urdu, Pashto, and Jawi. | | tmbdev | | | | | | | | | | | | | |
. | 5/26/2008 0:35:13 | ReadIris Middle East | http://www.irislink.com/c2-561-189/Readiris-Pro-11-Middle-East---Arabic-OCR-Software.aspx | commercial OCR system | Arabic, Farsi, English, and Hebrew | A mature commercial OCR system for desktop usage. | | tmbdev | | | | | | | | | | | | | |
. | 5/26/2008 0:36:01 | Verus | http://www.novodynamics.com/verus_stand.htm | commercial OCR system | Arabic | Sales blurb: "An extraordinarily advanced OCR solution, VERUS™ Standard provides the most accurate Middle Eastern language optical character recognition in the world. It recognizes Arabic, Farsi (Persian), Dari, and Pashto languages, including embedded English and French. It automatically detects and cleans degraded and skewed documents, automatically identifies a page's primary language, and recognizes a page's fonts without manual intervention. VERUS'™ intuitive user interface allows users to quickly review and edit recognized tex" | | tmbdev | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | MARG | http://marg.nlm.nih.gov/index2.asp | data set for OCR training or testing | English | directly downloadable from the site, contains > 1000 pages of scanned document images, dataset designed for OCR training or evaluation, flatbed scanned, contains a lot of alphabetic scripts (Latin, Greek, Hebrew, Russian, ...), contains many academic journals; Scanned images of biomedical journals and their ground truth data. | | | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | UW3 | http://documents.cfar.umd.edu/resources/database/3UWCdRom.html | data set for OCR training or testing | English | This dataset has a number of problems: it is hard to obtain, and skew correction was carried out after ground truthing, making the bounding boxes for page elements somewhat inaccurate. | | | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | ETL | | data set for OCR training or testing | Japanese | contains > 1000 pages of scanned document images, contains a lot of CJK (Chinese, Japanese, Korean), contains many academic journals; Electrotechnical Laboratory database of printed and handwritten documents. Mostly Japanese. It is unclear where to obtain it now. IUPR has a copy on disk. Documentation is in Japanese. | | | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | Tobacco Corpus | | data set for OCR training or testing | English | flatbed scanned, many scans are bitonal; Scanned legal documents from the US tobacco lawsuit. | | | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | Google 1000 | | data set for OCR training or testing | English, some other languages | dataset designed for OCR training or evaluation, contains a lot of historical (pre-1930) documents; A release of 1000 books from Google for the purpose of training and testing OCR systems. The distribution is about 120 GB in size and is shipped on disk from Google. | | | | | | | | | | | | | | | |
. | 5/26/2008 0:37:45 | | http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/8850/27985/01249846.pdf?arnumber=1249846 | data set for OCR training or testing | Devanagari, Hindi | Setlur, S.; Kompalli, S.; Ramanaprasad, V.; Govindaraju, V. "Creation of data resources and design of an evaluation test bed for Devanagari script recognition." Proceedings of the 13th International Workshop on Research Issues in Data Engineering: Multi-lingual Information Management (RIDE-MLIM 2003), 10-11 March 2003, pp. 55-61. DOI: 10.1109/RIDE.2003.1249846. Summary: The Indian subcontinent has a large number of languages, dialects, and scripts with the Devanagari script being the primary and most widely used of all the scripts. To date, much of the Devanagari optical character recognition (OCR) research has been restricted to a handful of groups. So, techniques have not yet been widely disseminated or evaluated independently and automated evaluation tools are currently not available for lack of a standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources. Ground truthed data for words and characters, on-line dictionaries, corpora of text documents and reliable, standardized statistical analyses and evaluation tools are currently lacking. So, the creation of such data resources will undoubtedly provide a much needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation sponsored project under the International Digital Libraries program to create data resources that will facilitate development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition. | | tmbdev | | | | | | | | | | | | | |
. | 5/26/2008 0:38:58 | IFN/ENIT Arabic Database | http://www.ifnenit.com/ | data set for OCR training or testing | Arabic handwriting | The IFN/ENIT-database contains material for training and testing of Arabic handwriting recognition software. There are more than 2200 binary images of handwriting sample forms from 411 writers; about 26,000 binary word images have been isolated from the forms and saved individually for ease of access. A ground truth file for each word in the database has been compiled. This file contains information about the word, such as the position of the word's baseline and information on the individual characters used in the word. | | tmbdev | | | | | | | | | | | | | |
. | 5/25/2008 15:51:58 | Internet Archive | http://www.archive.org | digital library site | mostly English | not-for-profit book scanning and archiving effort; raw scans, FineReader output, PDFs, etc. | | | | | | | | | | | | | | | |