Bank Cheque Dataset (Document AI)
Use Case: OCR
Format: .jpg
Count: 2023
Annotation: No
Description: The Bank Cheque Dataset (Document AI) consists of synthetic, artificially generated cheque images designed to replicate the appearance and content of real cheques. The images include elements such as payee names, amounts, dates, signatures, and cheque numbers. The dataset is used for training and evaluating Document AI systems on tasks such as optical character recognition (OCR), cheque processing, and automated data extraction, and it provides a controlled environment for model development without the privacy concerns of real cheques.
Recording Condition: - Clicked Images - Scanned - Web scraper
Bank Statement Dataset (Document AI)
Use Case: OCR
Format: .jpg, .png
Count: 5366
Annotation: No
Description: The Bank Statement Dataset (Document AI) consists of synthetic, artificially generated bank statements designed to simulate real financial documents. The statements contain transaction records, dates, amounts, and account details, structured to mirror real-world formats and content. The dataset is used for training and evaluating Document AI systems on tasks such as optical character recognition (OCR), data extraction, and document analysis, and it offers a controlled environment without the privacy issues of actual financial data.
Recording Condition: - Scanned - Bank_Statement - Web scraper
Chinese Bills Dataset
Use Case: OCR
Format: Image
Count: 6k
Annotation: Yes
Description: The Chinese Bills Dataset includes images or text samples of various types of bills, such as invoices, receipts, and statements, written in Chinese. It features diverse formats and content, including item descriptions, amounts, and dates. This dataset is used for tasks like optical character recognition (OCR), financial document processing, and automated data extraction.
Documents / OCR – Arabic & English OCR Content Dataset
Use Case: Documents / OCR
Format: Images
Count: 1,321
Annotation: No
Description: Arabic and English content image collection: Image + annotation for OCR
Documents / OCR – Barcode Videos Dataset

Use Case: Documents / OCR
Format: Videos
Count: 2,767
Annotation: No
Description: Barcode videos (Code128, UPC/EAN, PDF417, Aztec, Multi-code)
Documents / OCR – Curved Printed Text Dataset

Use Case: Documents / OCR
Format: Images
Count: 18,986
Annotation: No
Description: Curved printed text: images containing curved text or text with a nonlinear baseline
Documents / OCR – Financial Documents (Bank, Payslip, Tax, US)

Use Case: Documents / OCR
Format: Images
Count: 26,446
Annotation: No
Description: Financial documents: Bank statement, cheque, payslip, tax, mortgage, insurance claims (US)
Documents / OCR – Financial Documents (Phase 1 – Mortgage)
Use Case: Documents / OCR
Format: Images
Count: 9,192
Annotation: No
Description: Financial Documents (Phase 1) Mortgage dataset – print, scan, photograph
Documents / OCR – Financial Documents (Phase 2 – Insurance)

Use Case: Documents / OCR
Format: Images
Count: 7,636
Annotation: No
Description: Financial Documents (Phase 2) Insurance dataset – print, scan, photograph
Documents / OCR – Handwritten Text Dataset (JP/KR/RU)

Use Case: Documents / OCR
Format: Images
Count: 106,313
Annotation: No
Description: Handwritten Text: LivePhotos with handwritten text (Japanese, Korean, Russian)
Documents / OCR – Invoice Dataset with Bounding Box Annotation

Use Case: Documents / OCR
Format: Images
Count: 87
Annotation: Yes
Description: The invoice dataset with bounding box annotations includes scanned and digital invoices in which key fields, such as invoice number, date, vendor details, line items, and total amounts, are labeled with bounding boxes. These labels let AI models learn to detect and extract structured information from unstructured documents.
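Illustrative sketch: the catalog does not state the annotation file format, so the record layout below is an assumption (pixel coordinates as left, top, width, height, plus a field label). The names `FieldBox`, `iou`, and `matches` are hypothetical and not part of the dataset. The sketch shows one common way bounding box labels like these can be used to score a model's predicted field locations, using intersection over union (IoU).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldBox:
    """One labeled field on an invoice image (hypothetical schema, not the dataset's)."""
    label: str    # field name, e.g. "invoice_number" or "total_amount"
    x: float      # left edge, in pixels
    y: float      # top edge, in pixels
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def iou(a: FieldBox, b: FieldBox) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    return intersection / (a.area + b.area - intersection)


def matches(predicted: FieldBox, labeled: FieldBox, threshold: float = 0.5) -> bool:
    """Count a prediction as correct if the labels agree and IoU meets the threshold."""
    return predicted.label == labeled.label and iou(predicted, labeled) >= threshold
```

The 0.5 threshold is a conventional default for detection scoring, not a value given by the catalog; a stricter threshold may suit small fields such as dates or invoice numbers.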
Documents / OCR – LivePhotos with Printed Text (JP/KR/RU)
Use Case: Documents / OCR
Format: Images
Count: 4,944
Annotation: No
Description: LivePhotos with printed text (Japanese, Korean, Russian)
Documents / OCR – Multilingual Receipts/Invoices Dataset

Use Case: Documents / OCR
Format: Images
Count: 8,961
Annotation: No
Description: Multilingual receipts and invoices dataset comprises diverse financial documents in multiple languages, enabling AI models to train for cross-lingual text recognition, key field extraction, and document understanding.
Documents / OCR – Synthetic Bank Statements (40 Templates)

Use Case: Documents / OCR
Format: Images
Count: 1,290
Annotation: No
Description: Bank Statements - Unique Template - 40: Synthetic bank statements
Documents / OCR – Synthetic Pay Slips (60 Templates)

Use Case: Documents / OCR
Format: Images
Count: 2,010
Annotation: No
Description: Cheque - Unique Template - 60: Synthetic Pay Slips
Documents / OCR – Synthetic Payslips (130 Templates)

Use Case: Documents / OCR
Format: Images
Count: 2,023
Annotation: No
Description: Payslips - Unique Template - 130: Synthetic bank cheque
Pay Slips Dataset (Document AI)

Use Case: OCR
Format: .jpg
Count: 2010
Annotation: No
Description: The Pay Slips Dataset (Document AI) consists of images of synthetic, artificially generated pay slips without any annotations. It covers a range of pay slip formats and details such as employee names, salaries, and dates, and is used for training and testing Document AI systems on tasks such as OCR and document processing.
Recording Condition: - Scanned - Web scraper

