Unstructured Data

Definition

Unstructured data is information that does not follow a predefined schema, such as free text, images, video, or audio.

Purpose

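Unstructured data captures complex, real-world information, such as the wording of a conversation or the contents of an image, that cannot be represented naturally in the fixed rows and columns of structured tables.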

Importance

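  • Makes up the majority of data generated today, according to widely cited industry estimates.
  • Enables advanced AI applications in computer vision, speech recognition, and natural language processing (NLP).
  • Difficult to query and analyze with traditional schema-based tools; extracting meaning usually requires machine learning or other AI techniques.
  • Raises storage and governance challenges: volumes are large, and sensitive content (such as personal data in free text or images) is hard to locate and classify.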

How It Works

  1. Collect unstructured data from sources (social media, cameras, sensors).
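  2. Store it in native or raw formats (text documents, image, audio, and video files, raw logs), typically in object storage or a data lake; semi-structured formats such as JSON are often used to attach metadata.
  3. Apply AI models (e.g., NLP for text, computer vision for images, speech recognition for audio) to extract meaning.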
  4. Convert into structured representations when possible.
  5. Use in downstream analytics and decision-making.
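The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production pipeline: the messages, categories, keywords, and order-ID format are all hypothetical, and a simple keyword matcher stands in for the AI model in step 3 (a real system would more likely use a trained text classifier). Storage is an in-memory dictionary standing in for object storage or a data lake.

```python
import re
from collections import Counter

# Step 1 (collect): hypothetical support-chat messages. In practice these would
# be exported from a chat platform, email system, or similar source.
RAW_MESSAGES = [
    "Hi, my order #A1234 arrived damaged, can I get a refund?",
    "Where is my package? Order #B5678 still says in transit.",
    "I was charged twice for order #C9012, please refund one charge.",
    "How do I reset my password?",
]

# Step 3 (extract meaning): keyword rules as a stand-in for an AI model.
# Categories and keywords are illustrative assumptions; the first matching
# category wins, in dictionary order.
CATEGORY_KEYWORDS = {
    "refund": ("refund", "charged twice"),
    "shipping": ("package", "in transit", "delivery"),
    "account": ("password", "login"),
}

# Hypothetical order-ID format: '#' followed by one letter and four digits.
ORDER_ID = re.compile(r"#([A-Z]\d{4})")


def classify(text: str) -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def to_record(message_id: int, text: str) -> dict:
    """Step 4 (convert): turn one free-text message into a fixed-field record."""
    match = ORDER_ID.search(text)
    return {
        "message_id": message_id,
        "category": classify(text),
        "order_id": match.group(1) if match else None,
    }


def run_pipeline(messages):
    # Step 2 (store): keep the raw text unchanged, keyed by ID. This dict is an
    # in-memory stand-in for object storage or a data lake.
    raw_store = dict(enumerate(messages, start=1))
    # Steps 3-4: extract meaning and convert each message to a structured record.
    records = [to_record(mid, text) for mid, text in raw_store.items()]
    # Step 5 (use): a simple downstream aggregate, message counts per category.
    category_counts = Counter(record["category"] for record in records)
    return records, category_counts


records, counts = run_pipeline(RAW_MESSAGES)
for record in records:
    print(record)
print(counts.most_common())
```

Each free-text message ends up as a record with fixed fields (`message_id`, `category`, `order_id`), which can then be loaded into a table and aggregated like any other structured data. The raw text is kept alongside, so the extraction step can be rerun later with a better model.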

Examples (Real World)

  • Social media posts used for trend analysis.
  • Medical imaging for diagnosis.
  • Customer support chat logs mined for common issues and customer sentiment.
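As a minimal sketch of the first example, counting hashtags is one simple way to surface popular topics in social media text. The posts below are made up, and frequency counting is only a rough proxy for trend analysis: real systems usually compare counts across time windows and account for spam, bots, and language variation.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real analysis would collect posts from a
# platform's API over a defined time window.
POSTS = [
    "Loving the new phone! #tech #gadgets",
    "Big game tonight #sports",
    "Battery life on this phone is great #tech",
    "Who else is watching? #Sports #finals",
]

# A hashtag here is '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(posts, top_n=3):
    """Count hashtags case-insensitively and return the top_n (tag, count) pairs."""
    counts = Counter(
        tag.lower() for post in posts for tag in HASHTAG.findall(post)
    )
    return counts.most_common(top_n)


print(top_hashtags(POSTS))
```

Lower-casing merges variants such as `#Sports` and `#sports`, a small instance of the normalization that unstructured text usually needs before it can be counted reliably.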

References / Further Reading

  • NIST Big Data Interoperability Framework.
  • ISO/IEC TR 20547 Big Data Standards.
  • EMC/IDC Digital Universe Report.