Bank Cheque Dataset (Document AI)

Use Case: OCR

Format: .jpg

Count: 2,023

Annotation: No

Description: The Bank Cheque Dataset (Document AI) consists of synthetic, artificially generated cheque images designed to replicate the appearance and content of real cheques. It includes elements such as payee names, amounts, dates, signatures, and cheque numbers. The dataset is used for training and evaluating Document AI systems on tasks such as optical character recognition (OCR), cheque processing, and automated data extraction, providing a controlled environment for model development without the privacy concerns of real cheques.

Recording Condition: Clicked images, scanned, web scraper

Bank Statement Dataset (Document AI)

Use Case: OCR

Format: .jpg, .png

Count: 5,366

Annotation: No

Description: The Bank Statement Dataset (Document AI) consists of synthetic, artificially generated bank statements designed to simulate real financial documents. It features transaction records, dates, amounts, and account details, structured to mirror real-world formats and content. The dataset is used for training and evaluating Document AI systems on tasks such as optical character recognition (OCR), data extraction, and document analysis, offering a controlled environment without the privacy issues of actual financial data.

Recording Condition: Scanned, Bank_Statement, web scraper

Chinese Bills Dataset

Use Case: OCR

Format: Image

Count: 6k

Annotation: Yes

Description: The Chinese Bills Dataset includes images or text samples of various types of bills, such as invoices, receipts, and statements, written in Chinese. It features diverse formats and content, including item descriptions, amounts, and dates. This dataset is used for tasks like optical character recognition (OCR), financial document processing, and automated data extraction.

Documents / OCR – Arabic & English OCR Content Dataset

Use Case: Documents / OCR

Format: Images

Count: 1,321

Annotation: No

Description: Arabic and English content image collection: Image + annotation for OCR

Documents / OCR – Barcode Videos Dataset

Use Case: Documents / OCR

Format: Videos

Count: 2,767

Annotation: No

Description: Barcode videos (Code128, UPC/EAN, PDF417, Aztec, Multi-code)

Documents / OCR – Curved Printed Text Dataset

Use Case: Documents / OCR

Format: Images

Count: 18,986

Annotation: No

Description: Curved printed text: images containing curved text or text on a nonlinear baseline

Documents / OCR – Financial Documents (Bank, Payslip, Tax, US)

Use Case: Documents / OCR

Format: Images

Count: 26,446

Annotation: No

Description: Financial documents: Bank statement, cheque, payslip, tax, mortgage, insurance claims (US)

Documents / OCR – Financial Documents (Phase 1 – Mortgage)

Use Case: Documents / OCR

Format: Images

Count: 9,192

Annotation: No

Description: Financial Documents (Phase 1) Mortgage dataset – print, scan, photograph

Documents / OCR – Financial Documents (Phase 2 – Insurance)

Use Case: Documents / OCR

Format: Images

Count: 7,636

Annotation: No

Description: Financial Documents (Phase 2) Insurance dataset – print, scan, photograph

Documents / OCR – Handwritten Text Dataset (JP/KR/RU)

Use Case: Documents / OCR

Format: Images

Count: 106,313

Annotation: No

Description: Handwritten Text: LivePhotos with handwritten text (Japanese, Korean, Russian)

Documents / OCR – Invoice Dataset with Bounding Box Annotation

Use Case: Documents / OCR

Format: Images

Count: 87

Annotation: Yes

Description: The invoice dataset with bounding box annotations includes scanned or digital invoices in which key fields such as invoice number, date, vendor details, line items, and total amounts are labeled with bounding boxes, so that AI models can learn to detect and extract structured information from unstructured documents.
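
As an illustration of how bounding-box labels on invoice fields are typically used, the sketch below represents one labeled field and scores a predicted box against it with intersection over union (IoU). The `FieldBox` schema, the field names, and the pixel coordinates are assumptions made for this example; the listing does not specify the dataset's actual annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldBox:
    """One labeled invoice field (hypothetical schema, not the dataset's real format)."""
    label: str    # e.g. "invoice_number" or "total_amount" (illustrative names)
    x_min: float  # left edge, in pixels
    y_min: float  # top edge, in pixels
    x_max: float  # right edge, in pixels
    y_max: float  # bottom edge, in pixels

    def area(self) -> float:
        # Clamp at zero so a malformed box never yields a negative area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: FieldBox, b: FieldBox) -> float:
    """Intersection over union of two axis-aligned boxes, in the range [0, 1]."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Made-up coordinates: a ground-truth box and a slightly shifted prediction.
truth = FieldBox("total_amount", 100, 200, 300, 240)
predicted = FieldBox("total_amount", 110, 205, 310, 245)
print(f"IoU = {iou(truth, predicted):.3f}")
```

A detection is commonly counted as correct when its IoU with the labeled box exceeds a chosen threshold; the threshold is a project decision and is not defined by this listing.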

Documents / OCR – LivePhotos with Printed Text (JP/KR/RU)

Use Case: Documents / OCR

Format: Images

Count: 4,944

Annotation: No

Description: LivePhotos with printed text (Japanese, Korean, Russian)

Documents / OCR – Multilingual Receipts/Invoices Dataset

Use Case: Documents / OCR

Format: Images

Count: 8,961

Annotation: No

Description: The multilingual receipts and invoices dataset comprises diverse financial documents in multiple languages, for training AI models in cross-lingual text recognition, key field extraction, and document understanding.

Documents / OCR – Synthetic Bank Statements (40 Templates)

Use Case: Documents / OCR

Format: Images

Count: 1,290

Annotation: No

Description: Bank Statements - Unique Template - 40: Synthetic bank statements

Documents / OCR – Synthetic Pay Slips (60 Templates)

Use Case: Documents / OCR

Format: Images

Count: 2,010

Annotation: No

Description: Cheque - Unique Template - 60: Synthetic Pay Slips

Documents / OCR – Synthetic Payslips (130 Templates)

Use Case: Documents / OCR

Format: Images

Count: 2,023

Annotation: No

Description: Payslips - Unique Template - 130: Synthetic bank cheque

Pay Slips Dataset (Document AI)

Use Case: OCR

Format: .jpg

Count: 2,010

Annotation: No

Description: The Pay Slips Dataset (Document AI) consists of images of synthetic, artificially generated pay slips without any annotations. It features various pay slip formats and details such as employee names, salaries, and dates, and is used for training and testing Document AI systems on tasks such as OCR and document processing.

Recording Condition: Scanned, web scraper
