Embedding Model

An embedding model maps inputs (text, images, or other modalities) to dense numerical vectors (embeddings) in a shared vector space, where semantic similarity between inputs corresponds to proximity between their vectors.

Details

Embedding models are distinct from generative LLMs: rather than producing text token by token, they produce a fixed-dimensional embedding that captures the meaning of the input. They are typically encoder-based models (often derived from the transformer architecture) trained or fine-tuned with contrastive or similarity objectives so that related inputs cluster together in the vector space.
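The "proximity in vector space" idea can be made concrete with cosine similarity. The sketch below uses tiny hand-written 4-dimensional vectors standing in for model outputs (a real model would produce hundreds or thousands of dimensions); the specific numbers are illustrative, not from any actual model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for model outputs.
cat    = np.array([0.90, 0.80, 0.10, 0.00])
kitten = np.array([0.85, 0.75, 0.20, 0.05])
car    = np.array([0.10, 0.00, 0.90, 0.80])

# Semantically related inputs land closer together in the vector space.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```

Cosine similarity is the most common proximity measure for embeddings because it ignores vector magnitude and compares direction only; dot product and Euclidean distance are also used, and coincide with cosine similarity when vectors are unit-normalized.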

Embedding models are foundational to RAG pipelines, where they power the retrieval step: documents are split by a chunking strategy, each chunk is converted to an embedding stored in a vector database, and queries are embedded at inference time to find the most similar chunks via nearest-neighbor search.
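The retrieval step above can be sketched end to end. The `embed` function here is a hypothetical stand-in for a real embedding model (it just sums a fixed random vector per word, so it only captures word overlap, not meaning), and a NumPy matrix stands in for the vector database; every name and the chunking strategy are illustrative:

```python
import zlib
import numpy as np

DIM = 64  # toy dimensionality; real models commonly use 384, 768, or 1024

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: sums a fixed random
    vector per word, then L2-normalizes. Real models capture far richer
    semantics than this word-overlap toy."""
    vec = np.zeros(DIM)
    for word in text.lower().split():
        seed = zlib.crc32(word.strip(".,!?").encode())  # deterministic per word
        vec += np.random.default_rng(seed).standard_normal(DIM)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Chunk documents (one sentence per chunk, a deliberately trivial strategy).
chunks = [
    "The transformer architecture relies on self-attention.",
    "Vector databases index embeddings for nearest-neighbor search.",
    "Chunking splits long documents into retrievable pieces.",
]

# 2. Embed each chunk; the stacked matrix stands in for a vector database.
index = np.stack([embed(c) for c in chunks])

# 3. At query time, embed the query and return the top-k chunks by cosine
#    similarity (vectors are unit-normalized, so a dot product suffices).
def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Should surface the vector-database chunk, since it shares the most terms.
print(retrieve("how do vector databases support nearest-neighbor search?"))
```

A production system would replace the brute-force dot product with an approximate nearest-neighbor index, since exact search over millions of vectors is too slow at query time.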

Examples

  • A text embedding model that encodes documents and queries into 1024-dimensional vectors for similarity search in a RAG pipeline.
  • A multimodal embedding model that maps both images and text descriptions into a shared vector space for cross-modal retrieval.
  • An embedding model fine-tuned on domain-specific data (e.g., legal or medical text) to improve retrieval quality in a specialized application.
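The contrastive training mentioned under Details, including the domain-specific fine-tuning in the last example, is often driven by an InfoNCE-style loss: given a query embedding, a matching ("positive") embedding, and non-matching ("negative") embeddings, the loss rewards the positive for scoring higher than the negatives. A minimal NumPy sketch, assuming unit-normalized inputs (all names and the temperature value are illustrative):

```python
import numpy as np

def info_nce_loss(query_emb: np.ndarray,
                  pos_emb: np.ndarray,
                  neg_embs: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Contrastive loss: pull the query toward its positive and push it
    away from negatives. Inputs are assumed L2-normalized, so dot products
    are cosine similarities."""
    sims = np.concatenate([[query_emb @ pos_emb], neg_embs @ query_emb])
    sims = sims / temperature
    sims -= sims.max()  # numerical stability before exponentiation
    probs = np.exp(sims) / np.exp(sims).sum()
    # Cross-entropy with the positive (index 0) as the correct "class".
    return float(-np.log(probs[0]))

q = np.array([1.0, 0.0])
# Loss is near zero when the positive matches the query...
print(info_nce_loss(q, pos_emb=np.array([1.0, 0.0]),
                    neg_embs=np.array([[0.0, 1.0]])))
# ...and large when a negative matches instead.
print(info_nce_loss(q, pos_emb=np.array([0.0, 1.0]),
                    neg_embs=np.array([[1.0, 0.0]])))
```

Minimizing this loss over many (query, positive, negatives) triples is what makes related inputs cluster together in the learned vector space.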

Synonyms

text embedding model, vector embedding model