Definition
Unstructured data is information that does not follow a predefined schema, such as free text, images, video, or audio.
Purpose
Unstructured data captures complex, real-world information that cannot be represented cleanly in the rows and columns of structured tables.
Importance
- By most industry estimates, makes up the majority of data generated today.
- Enables advanced AI applications in vision, speech, and NLP.
- Is difficult to process and analyze at scale without AI.
- Raises storage and governance challenges.
How It Works
- Collect unstructured data from sources (social media, cameras, sensors).
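- Store it in its native form, such as text documents, multimedia files, or raw logs (JSON is often used alongside these, although it is usually classed as semi-structured).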
- Apply AI models to extract meaning.
- Convert into structured representations when possible.
- Use the results in downstream analytics and decision-making.
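The extract-and-convert steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sample messages, the Ticket fields, and the keyword list are all hypothetical, and simple regular expressions stand in for the AI model that a real pipeline would use to extract meaning.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical free-text messages, standing in for collected unstructured data.
RAW_MESSAGES = [
    "Hi, my order #48213 arrived damaged. Please email me at dana@example.com.",
    "Where is order #77120? It was supposed to ship last week!",
    "Just wanted to say thanks, the support team was great.",
]

# Rule-based patterns used here in place of a trained model (an assumption).
ORDER_RE = re.compile(r"#(\d+)")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
COMPLAINT_WORDS = {"damaged", "broken", "late", "refund", "where"}


@dataclass
class Ticket:
    """Structured representation of one free-text message."""
    order_id: Optional[str]
    email: Optional[str]
    is_complaint: bool
    text: str


def extract_ticket(text: str) -> Ticket:
    """Pull the structured fields out of a single message."""
    order = ORDER_RE.search(text)
    email = EMAIL_RE.search(text)
    words = set(re.findall(r"[a-z]+", text.lower()))
    return Ticket(
        order_id=order.group(1) if order else None,
        email=email.group(0) if email else None,
        is_complaint=bool(words & COMPLAINT_WORDS),
        text=text,
    )


tickets = [extract_ticket(message) for message in RAW_MESSAGES]

# Each ticket is now a row-like record that downstream analytics can query.
for ticket in tickets:
    print(json.dumps(asdict(ticket)))
```

Keeping the original text in each record is a deliberate choice in this sketch: the structured fields support filtering and aggregation, while the raw message remains available when the extraction misses something.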
Examples (Real World)
- Social media posts used for trend analysis.
- Medical imaging for diagnosis.
- Customer support chat logs reviewed for recurring issues.
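As a rough illustration of the first example, the sketch below counts hashtags across a handful of posts. The posts are invented, and hashtag counting is only one simple proxy for trend analysis; production systems typically rely on far richer language models.

```python
import re
from collections import Counter

# Hypothetical posts, standing in for a social media feed.
POSTS = [
    "Loving the new phone camera! #photography #tech",
    "Battery life could be better... #Tech",
    "Sunset shots from last night #photography",
    "No hashtags here, just a thought.",
]

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(posts):
    """Count how many posts mention each hashtag, ignoring case."""
    counts = Counter()
    for post in posts:
        # A set ensures a tag repeated within one post is counted once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(post)})
    return counts


trends = hashtag_counts(POSTS)
print(trends.most_common(2))
```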
References / Further Reading
- NIST Big Data Interoperability Framework (NIST Special Publication 1500 series).
- ISO/IEC 20547, Information technology: Big data reference architecture.
- EMC/IDC Digital Universe Report.