Definition
Text-to-image is a generative AI task in which a model creates images from natural language prompts.
Purpose
The task lets users produce designs, artwork, and visualizations directly from written descriptions.
Importance
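- Expands human creativity and productivity.
- Raises copyright and misinformation concerns, for example over training data and realistic fabricated images.
- Requires safeguards against harmful prompts and outputs.
- Builds on diffusion models and GANs, two generative techniques used for image synthesis.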
How It Works
- Train a model on paired text-image datasets (images with captions).
- Encode the text prompt into embeddings.
- Map the text embeddings to image representations.
- Generate the image using diffusion or GAN techniques.
- Refine the result with revised prompts or additional constraints.
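The inference-time steps above can be sketched as a toy pipeline. This is an illustration under heavy assumptions, not how DALL·E or Stable Diffusion is implemented: nothing is trained, the text encoder is a hash-based stand-in, the embedding-to-image mapping is a fixed random projection, and the "denoiser" simply returns that projection where a real diffusion model would use a learned network. All function names (encode_text, make_target, generate) are hypothetical.

```python
import hashlib
import numpy as np


def encode_text(prompt, dim=16):
    """Toy text encoder (assumption): hash each token to a fixed
    pseudo-random vector and average them into one embedding."""
    tokens = prompt.lower().split()
    if not tokens:
        return np.zeros(dim)
    vectors = []
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:4], "little")
        vectors.append(np.random.default_rng(seed).standard_normal(dim))
    return np.mean(vectors, axis=0)


def make_target(embedding, size=8):
    """Toy mapping from text embedding to an image representation.
    A fixed random projection stands in for learned weights."""
    projection = np.random.default_rng(0).standard_normal(
        (size * size, embedding.shape[0])
    )
    return np.tanh(projection @ embedding).reshape(size, size)


def generate(prompt, steps=50, size=8, seed=0):
    """Start from noise and refine step by step, conditioned on the prompt."""
    rng = np.random.default_rng(seed)
    embedding = encode_text(prompt)
    x = rng.standard_normal((size, size))  # pure noise "image"
    for t in range(steps):
        # Stand-in denoiser: a trained network would predict the clean
        # image from (x, t, embedding); here it is a fixed function.
        predicted_clean = make_target(embedding, size)
        # Move a fraction of the remaining distance toward the prediction;
        # the fraction reaches 1 on the final step.
        x = x + (predicted_clean - x) / (steps - t)
    return x


if __name__ == "__main__":
    image = generate("a red cube on a table")
    print(image.shape)
```

The loop mirrors the shape of iterative denoising only: an image that starts as noise is pulled toward a prediction that depends on the text embedding. Because the stand-in denoiser is deterministic, the result here does not depend on the noise seed, which is not true of real samplers.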
Examples (Real World)
- DALL·E (OpenAI): generates creative images from text.
- Stable Diffusion: open-source image generation model.
- Midjourney: AI-powered art generation.
References / Further Reading
- Ramesh et al. “Zero-Shot Text-to-Image Generation.” OpenAI, 2021.
- Stable Diffusion Model Card — Stability AI.
- IEEE Computer Graphics and Applications: Generative AI in Imaging.