Text/word embedding

Official Definition

A numerical representation of text (words, phrases, or documents) in a continuous vector space, where semantically similar text is mapped to nearby points.

Source: AIEOG AI Lexicon (Feb 2026), adapted from NIST AI 100-1 and arXiv:1301.3781

What text/word embedding means in plain language

Text or word embeddings are a way of converting words and text into numbers that computers can process. But unlike simple numbering (assigning each word an arbitrary ID), embeddings capture meaning. Words with similar meanings end up with similar numerical representations, and the geometric relationships between vectors can mirror the semantic relationships between the words they represent.

For example, in a well-trained embedding space, the vectors for “bank” and “financial institution” would be close together, while “bank” and “airplane” would be far apart. Embeddings can even capture analogical relationships through vector arithmetic: the vector for “king” minus “man” plus “woman” lands near the vector for “queen.”
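“Close together” and “far apart” are usually measured with cosine similarity. The sketch below illustrates the idea with hand-picked 3-dimensional toy vectors (real embedding models produce vectors with hundreds or thousands of dimensions, and the values here are purely illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 for similar directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen by hand for illustration only.
bank = [0.9, 0.1, 0.0]
financial_institution = [0.85, 0.15, 0.05]
airplane = [0.0, 0.1, 0.95]

print(cosine_similarity(bank, financial_institution))  # close to 1.0
print(cosine_similarity(bank, airplane))               # close to 0.0
```

The absolute values matter less than the comparison: “bank” scores far higher against “financial institution” than against “airplane,” which is exactly the property downstream applications exploit.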

Embeddings are foundational to modern NLP and are used in search, document similarity, classification, clustering, and as the input layer for more complex models including large language models. They are also the core technology behind RAG systems, where documents are embedded and retrieved based on semantic similarity to a query.

Why it matters in financial services

Embeddings power many AI applications in financial services, often invisibly:

  • Semantic search. Embedding-based search finds relevant documents based on meaning rather than keyword matching, improving regulatory research and internal knowledge retrieval.
  • Document classification. Compliance documents, customer complaints, and regulatory filings can be classified based on their embedding similarity.
  • Anomaly detection. Unusual transaction descriptions or communications can be identified by measuring their distance from normal patterns in embedding space.
  • RAG systems. Retrieval-augmented generation relies on embeddings to match queries to relevant source documents.
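The semantic search and RAG retrieval steps above share the same core mechanism: embed the query, then rank documents by similarity to it. A minimal sketch, assuming the document vectors have already been produced by some embedding model (the 3-dimensional vectors and document names below are invented for illustration):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical precomputed document embeddings. In practice these come
# from an embedding model and are stored in a vector database.
docs = {
    "AML policy update": [0.8, 0.1, 0.1],
    "Office lunch menu": [0.1, 0.1, 0.9],
    "Sanctions screening procedure": [0.7, 0.2, 0.1],
}

def search(query_vec, docs, top_k=2):
    """Return the top_k document names ranked by similarity to the query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical query vector, e.g. for "anti-money-laundering rules".
query = [0.75, 0.15, 0.1]
print(search(query, docs))
```

The compliance-related documents outrank the unrelated one because their vectors point in a similar direction to the query, even though no keywords are compared. Anomaly detection inverts the same idea: items with unusually *low* similarity to everything in the normal population are flagged.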

Governance considerations include bias in embeddings (embeddings can encode stereotypes and biases from their training data), stability (embedding representations can change when models are updated), and interpretability (the meaning of individual embedding dimensions is typically not human-interpretable).

Key considerations for compliance teams

  1. Assess for bias. Evaluate whether embeddings encode protected characteristics or stereotypes that could affect downstream decisions.
  2. Track embedding model versions. When embedding models are updated, downstream applications may behave differently. Manage these changes.
  3. Validate for your domain. General-purpose embeddings may not accurately represent financial and regulatory terminology. Test domain-specific performance.
  4. Document embedding choices. Record which embedding model is used, why it was selected, and what limitations are known.
  5. Monitor for drift. If the embedding model or the data being embedded changes, downstream performance may be affected.
  6. Include in AI governance. Embedding models are a component of the AI system and should be documented in the AI inventory.
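One way to act on points 2 and 5 is a probe-set check run at every embedding-model update. Because vectors from different model versions live in incompatible spaces, the sketch compares similarity *structure* within each version rather than raw vectors across versions. The probe phrases, 2-dimensional toy vectors, and the 0.3 tolerance below are all illustrative assumptions, not a prescribed method:

```python
from math import sqrt
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical embeddings of the same probe phrases under two model versions.
v1 = {"suspicious activity report": [1.0, 0.0],
      "fraud alert": [0.9, 0.1],
      "office lunch menu": [0.0, 1.0]}
v2 = {"suspicious activity report": [1.0, 0.0],
      "fraud alert": [0.0, 1.0],  # this phrase's representation shifted
      "office lunch menu": [0.0, 1.0]}

def drifted_pairs(old, new, tol=0.3):
    """Flag probe pairs whose within-version similarity changed by more
    than `tol` after the model update."""
    flagged = []
    for p, q in combinations(old, 2):
        delta = abs(cosine(old[p], old[q]) - cosine(new[p], new[q]))
        if delta > tol:
            flagged.append((p, q))
    return flagged

print(drifted_pairs(v1, v2))
```

Here every flagged pair involves “fraud alert,” signalling that downstream applications relying on that phrase's neighborhood should be revalidated before the new model version is promoted.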
