Definition
Named Entity Recognition (NER) is an NLP task that locates mentions of named entities in text and classifies them into predefined categories, such as people, organizations, locations, dates, or products.
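The definition has two parts: finding an entity mention in the text, then assigning it a category. The sketch below makes both concrete with a tiny hand-written gazetteer (a lookup table of known names). It is a toy rule-based recognizer, not how trained NER models work, and the Entity class, the GAZETTEER contents, and the recognize function are hypothetical names chosen for illustration. Entities are represented as labeled character-offset spans, which is one common output format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int   # character offset where the mention begins
    end: int     # character offset one past the mention's last character
    label: str   # entity category, e.g. "PERSON", "ORG", "LOC"
    text: str    # surface string, equal to source_text[start:end]


# Hypothetical gazetteer: known surface strings mapped to their categories.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "London": "LOC",
    "Acme Corp": "ORG",
}


def recognize(text: str, gazetteer: dict[str, str]) -> list[Entity]:
    """Find gazetteer names in text (detection) and attach their category (classification)."""
    found: list[Entity] = []
    # Try longer names first so a longer name wins over any shorter name it contains.
    for name in sorted(gazetteer, key=len, reverse=True):
        # \b restricts matches to word boundaries ("London" does not match inside "Londoner").
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            # Skip a match that overlaps an entity already found.
            if any(m.start() < e.end and e.start < m.end() for e in found):
                continue
            found.append(Entity(m.start(), m.end(), gazetteer[name], m.group()))
    return sorted(found, key=lambda e: e.start)


sentence = "Ada Lovelace met an engineer from Acme Corp in London."
for ent in recognize(sentence, GAZETTEER):
    print(ent.start, ent.end, ent.label, ent.text)
# 0 12 PERSON Ada Lovelace
# 34 43 ORG Acme Corp
# 47 53 LOC London
```

A lookup table cannot recognize names it has never seen, which is one reason practical systems train statistical or neural models on annotated data instead.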
Purpose
NER adds structure to unstructured text by extracting key entities and labeling their types. The resulting structured data supports search, information extraction, and knowledge graph construction.
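As a minimal sketch of how extracted entities can support search, the code below builds an inverted index from typed entities to the documents that mention them. It assumes NER has already been run on each document; EXTRACTED, build_entity_index, and search are hypothetical names, and lowercasing the entity text is an illustrative simplification of normalization.

```python
from collections import defaultdict

# Hypothetical NER output: document id -> list of (entity text, label) pairs.
EXTRACTED = {
    "doc1": [("Acme Corp", "ORG"), ("London", "LOC")],
    "doc2": [("Ada Lovelace", "PERSON"), ("London", "LOC")],
    "doc3": [("Acme Corp", "ORG")],
}


def build_entity_index(extracted):
    """Map each (label, normalized entity text) pair to the set of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in extracted.items():
        for text, label in entities:
            # Lowercasing is a crude normalization; real systems might use entity linking instead.
            index[(label, text.lower())].add(doc_id)
    return dict(index)


def search(index, label, text):
    """Return the sorted ids of documents that mention the given typed entity."""
    return sorted(index.get((label, text.lower()), set()))


index = build_entity_index(EXTRACTED)
print(search(index, "LOC", "London"))  # ['doc1', 'doc2']
```

Keying the index on the label as well as the text lets a query for the location "Paris" skip documents that mention only a person named Paris.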
Importance
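- A fundamental component of information retrieval systems and NLP pipelines.
- Recognition errors propagate to the downstream applications that rely on the extracted entities.
- Domain-specific NER (e.g., medical, legal) typically requires custom annotated datasets.
- Closely related to entity linking and relation extraction, which build on recognized entities.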
How It Works
- Collect and preprocess text.
- Annotate datasets by labeling entity spans with their categories.
- Train models on the labeled examples, e.g., conditional random fields (CRFs) or transformer-based models.
- Predict entities in unseen text.
- Evaluate performance on held-out test data, typically using precision, recall, and F1.
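Annotation (step 2) is often done with the BIO scheme: each token is tagged B-TYPE at the beginning of an entity, I-TYPE inside it, or O outside any entity. Token-level models such as CRFs and transformer token classifiers typically predict one such tag per token, so prediction (step 4) ends by decoding the tags back into spans. A minimal decoder, assuming spans are reported as token indices with space-joined text and that a stray I- tag is treated leniently as the start of an entity (the function name bio_to_spans is hypothetical):

```python
def bio_to_spans(tokens, tags):
    """Decode per-token BIO tags into (start, end, label, text) spans.

    start/end are token indices (end is exclusive); text joins the span's tokens with spaces.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    start, label = None, None
    # A trailing sentinel "O" closes any entity that runs to the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        # An open span ends at "O", at a new "B-" tag, or at an "I-" tag of a different type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.append((start, i, label, " ".join(tokens[start:i])))
            start, label = None, None
        # "B-" starts an entity; a stray "I-" with no open span is leniently treated the same way.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = ["Marie", "Curie", "worked", "in", "Paris", "in", "1903", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-DATE", "O"]
print(bio_to_spans(tokens, tags))
# [(0, 2, 'PER', 'Marie Curie'), (4, 5, 'LOC', 'Paris'), (6, 7, 'DATE', '1903')]
```

The B- tag is what lets two adjacent entities of the same type stay separate: ["B-LOC", "B-LOC"] decodes to two one-token spans, while ["B-LOC", "I-LOC"] decodes to one two-token span.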
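Steps 3 and 4 can be illustrated with a deliberately simple baseline rather than a CRF or transformer: it memorizes, for each token, the BIO tag it carried most often in the labeled examples, then tags unseen text by lookup. This is a sketch for illustration only; the function names and the training data are hypothetical.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(sentences):
    """Learn, for each lowercased token, the tag it carries most often in the training data.

    sentences: list of (tokens, tags) pairs with one BIO tag per token.
    """
    counts = defaultdict(Counter)
    for tokens, tags in sentences:
        for token, tag in zip(tokens, tags):
            counts[token.lower()][tag] += 1
    # most_common(1) returns [(tag, count)]; ties go to the tag seen first.
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def predict(model, tokens):
    """Tag unseen text; tokens never seen in training default to "O" (outside any entity)."""
    return [model.get(token.lower(), "O") for token in tokens]


TRAIN = [
    (["Alice", "works", "at", "Acme", "."], ["B-PER", "O", "O", "B-ORG", "O"]),
    (["Acme", "hired", "Bob", "."], ["B-ORG", "O", "B-PER", "O"]),
]
model = train_most_frequent_tag(TRAIN)
print(predict(model, ["Bob", "joined", "Acme", "."]))  # ['B-PER', 'O', 'B-ORG', 'O']
```

The baseline ignores context and defaults every unseen token to O, so it cannot recognize new names. Models such as CRFs and transformers address this by using information from the surrounding context rather than the token alone.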
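For evaluation (step 5), a common convention is exact-match, entity-level scoring: a predicted entity counts as correct only if both its boundaries and its type match a gold entity. A sketch, assuming entities are given as (start, end, label) tuples; entity_prf is a hypothetical helper, not a library function.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall, and F1.

    gold and predicted are iterables of (start, end, label) tuples; a prediction is
    correct only if its boundaries AND its label both match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 2, "PER"), (4, 5, "LOC"), (6, 7, "DATE")}
pred = {(0, 2, "PER"), (4, 5, "ORG"), (6, 7, "DATE"), (3, 4, "LOC")}
p, r, f = entity_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Here two of four predictions are correct (precision 1/2) and two of three gold entities are found (recall 2/3), so F1 is 4/7. The mislabeled (4, 5, "ORG") earns no partial credit under this scheme even though its boundaries are right.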
Examples (Real World)
- spaCy: open-source NLP library with built-in NER.
- Stanford CoreNLP: Java NLP toolkit that includes a named entity recognizer.
- Financial NLP: NER systems extract company names from financial reports.