Named Entity Recognition (NER)

Definition

Named Entity Recognition (NER) is a natural language processing (NLP) task that locates mentions of named entities in text and assigns each one a category, such as person, organization, location, date, or product.

Purpose

NER turns unstructured text into structured data by extracting the entities it mentions. The extracted entities support search, information extraction, and knowledge graph construction.
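
As a rough illustration of what "structured" means here, the sketch below takes a sentence together with a hand-written list of entity spans and groups the mentions by category. The spans stand in for the output of an NER system; the sentence, the character offsets, the label names (ORG, PERSON, LOC, DATE), and the helper group_by_label are all assumptions made for this example, not the output of any particular tool.

```python
from collections import defaultdict

text = "Apple hired Jane Doe in London in 2021."

# Hypothetical NER output: (start_char, end_char, label) for each entity mention.
# The label names are illustrative; real tag sets differ between tools and datasets.
entities = [
    (0, 5, "ORG"),
    (12, 20, "PERSON"),
    (24, 30, "LOC"),
    (34, 38, "DATE"),
]

def group_by_label(text, entities):
    """Collect the surface text of each entity mention under its label."""
    grouped = defaultdict(list)
    for start, end, label in entities:
        grouped[label].append(text[start:end])
    return dict(grouped)

record = group_by_label(text, entities)
print(record)
```

A record like this is the kind of structure that a search index or knowledge graph can consume directly, which free text is not.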

Importance

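  • A core component of information retrieval and NLP pipelines.
  • NER errors propagate to downstream applications: a missed or mislabeled entity cannot be linked, related, or indexed correctly later in the pipeline.
  • Domain-specific NER (e.g., medical, legal) usually requires custom annotated datasets.
  • Closely related to entity linking and relation extraction, which typically take NER output as their input.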

How It Works

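  1. Collect and preprocess text (e.g., tokenization).
  2. Annotate the dataset with entity categories.
  3. Train a model on the labeled examples (e.g., conditional random fields (CRFs) or transformer-based models).
  4. Predict entities in unseen text.
  5. Evaluate predictions on held-out test data, typically with entity-level precision, recall, and F1 rather than raw token accuracy.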
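
The annotation, prediction, and evaluation steps can be sketched without any trained model. The example below assumes the common BIO tagging scheme (B- marks the first token of an entity, I- a continuation, O a non-entity token), which the text above does not prescribe. It decodes tag sequences into entity spans and compares predicted spans with gold spans by exact match. The function names bio_to_spans and entity_prf and the sample tags are hypothetical, and the hand-written pred list stands in for a model's output.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, label)

def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        # "B-X" opens a new span; a stray "I-X" is leniently treated the same way.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans

def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 using exact span and label match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Tokens: Jane Doe flew to London in 2021
gold = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-DATE"]
pred = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-DATE"]  # London mislabeled

precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, f1)
```

Here two of the three predicted entities match the gold annotation exactly, so precision, recall, and F1 all come out at two thirds, even though six of the seven token tags are correct. That gap is one reason entity-level scores are usually preferred over token accuracy. The sketch scores a single sentence; a corpus-level evaluation would pool the span counts over all sentences before computing the ratios.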

Examples (Real World)

  • spaCy: open-source Python NLP library with built-in NER.
  • Stanford CoreNLP: Java-based NLP toolkit that includes a named entity recognizer.
  • Financial NLP: NER is used to extract company names from financial reports.
