Optical character recognition (OCR)
TextOCR
- Classes: 903,069 annotated scene-text words (32 words per image on average)
- 28,134 natural images from TextVQA
![](https://production-media.paperswithcode.com/datasets/Screenshot_2021-05-13_at_10.17.05.jpg)
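The per-image average quoted for this dataset can be checked from the two counts given above. The snippet below is only a quick sanity check; the variable names are illustrative and not part of any dataset API.

```python
# Figures quoted in the entry above.
annotated_words = 903_069
images = 28_134

# Mean number of annotated words per image.
words_per_image = annotated_words / images
print(f"{words_per_image:.1f} words per image")  # prints "32.1 words per image"
```

The result rounds to the 32 words per image stated in the entry.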
NIST Database
- The US National Institute of Standards and Technology (NIST) publishes handwriting samples from 3,600 writers, including more than 800,000 character images.
FUNSD
- Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms.
ICDAR 2003
- ICDAR 2003 is a scene text recognition dataset. It contains 507 natural scene images in total: 258 for training and 249 for testing.
ST-VQA
- ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.
Devanagari Characters
- A dataset of handwritten Devanagari characters, composed of 1,800 samples from 36 character classes obtained from 25 native writers.
Mathematics Expressions
- Contains more than 10,000 expressions, covering more than 101 mathematical symbols.
Chinese Characters
- A dataset of handwritten Chinese characters containing 909,818 images that correspond to about 10 news articles.
Arabic Printed Text
- Contains a lexicon of 113,284 words, and uses 10 Arabic fonts.
Document database
- Contains 941 online handwritten documents by 189 writers, and covers lists, tables, formulas, diagrams and drawings.
IAM On-Line Handwriting
- Contains forms of handwritten English text acquired on a whiteboard, and includes more than 1700 entries.
Street View Text
- The Street View Text dataset was harvested from Google Street View and mostly covers outdoor street-level signs and boards.
Street View House Numbers
- Contains 73,257 digits from street house numbers, taken from Google Street View.
Natural Environment OCR
- A dataset that contains 659 real-world images with 5,238 text annotations.
Scene Text
- Contains 3,000 images captured in different environments, including outdoor and indoor scenes under different lighting conditions (clear day, night, strong artificial lights, etc.).
Text Detection
- Contains 500 natural images taken with a pocket camera. The indoor images are mainly signs, doorplates and caution plates, while the outdoor images are mostly guide boards and billboards.
Stanford OCR
- Contains a handwritten words dataset collected by the MIT Spoken Language Systems Group and published by Stanford.
Chars74K Data
- Contains 74K images of characters, including digits, in both English and Kannada.