Embedding Model
An embedding model maps inputs (text, images, or other modalities) to dense numerical vectors (embeddings) in a shared vector space, where semantic similarity between inputs corresponds to proximity between their vectors.
Details
Embedding models are distinct from generative LLMs: rather than producing text token by token, they produce a fixed-dimensional embedding that captures the meaning of the input. They are typically encoder-based models (often derived from the transformer architecture) trained or fine-tuned with contrastive or similarity objectives so that related inputs cluster together in the vector space.
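The notion of "related inputs clustering together" is usually made concrete with cosine similarity between embedding vectors. The sketch below uses small hand-made vectors as stand-ins for model outputs (real embeddings are typically hundreds to thousands of dimensions); the vectors and names are illustrative, not from any particular model.

```python
import numpy as np

def cosine_similarity(a, b):
    # Proximity in the vector space: 1.0 for vectors pointing the same
    # direction, 0.0 for orthogonal vectors.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative 4-dimensional vectors standing in for embedding-model
# outputs of semantically related and unrelated inputs.
v_king = [0.9, 0.1, 0.8, 0.2]
v_queen = [0.85, 0.15, 0.82, 0.18]
v_car = [0.1, 0.9, 0.05, 0.95]

print(cosine_similarity(v_king, v_queen))  # high: related concepts
print(cosine_similarity(v_king, v_car))    # low: unrelated concepts
```

A contrastive training objective pushes pairs like the first toward high similarity and pairs like the second toward low similarity.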
Embedding models are foundational to RAG pipelines, where they power the retrieval step: documents are split into chunks by a chunking strategy, each chunk is embedded and stored in a vector database, and at inference time the query is embedded so the most similar chunks can be found via nearest-neighbor search.
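The retrieval step described above can be sketched end to end in a few lines. This is a toy illustration, not a real pipeline: `embed` here is a hypothetical bag-of-words stand-in for a learned embedding model, the "vector database" is a plain Python list, and nearest-neighbor search is an exhaustive cosine-similarity scan.

```python
import numpy as np

VOCAB = ["cat", "dog", "pet", "car", "engine", "vehicle"]

def embed(text):
    # Toy stand-in for an embedding model: word counts over a tiny
    # fixed vocabulary. A real model would produce a dense learned
    # vector (e.g., 1024-dimensional) capturing semantics.
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# "Vector database": chunks stored alongside precomputed embeddings.
chunks = ["the cat is a pet", "the dog is a pet", "the car has an engine"]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    # Embed the query, then rank stored chunks by similarity
    # (exhaustive nearest-neighbor search).
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("engine of a vehicle"))
```

Production systems replace the list with an approximate nearest-neighbor index so retrieval stays fast over millions of chunks.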
Examples
- A text embedding model that encodes documents and queries into 1024-dimensional vectors for similarity search in a RAG pipeline.
- A multimodal embedding model that maps both images and text descriptions into a shared vector space for cross-modal retrieval.
- An embedding model fine-tuned on domain-specific data (e.g., legal or medical text) to improve retrieval quality in a specialized application.
Synonyms
text embedding model, vector embedding model