Definition
Pre-training is the initial training of a machine learning model on large general-purpose datasets before fine-tuning on specific tasks.
Purpose
Pre-training equips models with broad representations that transfer to many tasks, reducing the data and compute required for downstream adaptation.
Importance
- Foundation for modern LLMs and vision models.
- Improves performance across diverse tasks.
- Costly in terms of data and computation.
- Requires careful dataset curation to avoid bias.
How It Works
- Collect massive general-purpose datasets (e.g., web text, images).
- Define an unsupervised or self-supervised objective (e.g., masked-token or next-token prediction).
- Train the model on this objective so it learns broadly useful features.
- Save the pre-trained weights for reuse.
- Fine-tune on smaller task-specific datasets (see the sketch after this list).
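The steps above can be made concrete with a minimal, illustrative PyTorch sketch, not a production recipe: a tiny encoder is pre-trained with a next-token prediction objective on stand-in unlabeled data, its weights are saved, and the same encoder is then reloaded and fine-tuned with a new classification head on a much smaller labeled set. All model sizes, data, and file names here are assumptions chosen for brevity.

```python
# Minimal sketch: self-supervised pre-training (next-token prediction)
# followed by fine-tuning on a small labeled task. Data and sizes are toy.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 100, 32

class Encoder(nn.Module):
    """Tiny 'body' whose weights are pre-trained and later reused."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)

    def forward(self, tokens):               # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq, DIM)
        return h

# --- Steps 1-3: pre-train on unlabeled data with next-token prediction ---
encoder = Encoder()
lm_head = nn.Linear(DIM, VOCAB)  # predicts the next token id
opt = torch.optim.Adam(list(encoder.parameters()) + list(lm_head.parameters()), lr=1e-3)

unlabeled = torch.randint(0, VOCAB, (256, 16))  # stand-in for a large corpus
for step in range(100):
    batch = unlabeled[torch.randint(0, 256, (32,))]
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = lm_head(encoder(inputs))
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# --- Step 4: save the pre-trained weights for reuse ---
torch.save(encoder.state_dict(), "pretrained_encoder.pt")

# --- Step 5: reload the body, attach a task head, fine-tune on a small labeled set ---
encoder = Encoder()
encoder.load_state_dict(torch.load("pretrained_encoder.pt"))
clf_head = nn.Linear(DIM, 2)  # e.g., binary sentiment labels
opt = torch.optim.Adam(list(encoder.parameters()) + list(clf_head.parameters()), lr=1e-4)

labeled_x = torch.randint(0, VOCAB, (64, 16))  # much smaller labeled dataset
labeled_y = torch.randint(0, 2, (64,))
for step in range(50):
    logits = clf_head(encoder(labeled_x).mean(dim=1))  # pool over the sequence
    loss = nn.functional.cross_entropy(logits, labeled_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```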
Examples (Real World)
- BERT, pre-trained on English Wikipedia and BooksCorpus (loading example below).
- CLIP, pre-trained on large-scale image–text pairs.
- GPT models, pre-trained on large-scale internet text.
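In practice, downstream users rarely repeat pre-training themselves; they download released weights and fine-tune them. As an illustration, the snippet below loads the publicly released bert-base-uncased checkpoint via the Hugging Face transformers library and attaches a fresh two-class head; the example sentences and the two-label sentiment task are hypothetical.

```python
# Reusing publicly released pre-trained weights (here BERT) as a starting point
# for a downstream classifier, via the Hugging Face transformers library.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # pre-trained on Wikipedia and BooksCorpus
    num_labels=2,         # new, randomly initialized classification head
)

# The pre-trained encoder already yields useful features; fine-tuning
# (e.g., a standard PyTorch training loop) adapts it to the labeled task.
batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
logits = model(**batch).logits  # shape (2, num_labels); head is still untrained
```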
References / Further Reading
- Devlin et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” NAACL 2019.
- Brown et al. “Language Models are Few-Shot Learners.” NeurIPS 2020.
- OpenAI. “GPT-4 Technical Report.” 2023.