What is an embedding in AI? (Explained for kids)

Updated May 8, 2026 · 270 words

An embedding is a way of turning words, images, or other data into a list of numbers that an AI can compare. Words with similar meanings get similar numbers. "King" and "queen" end up close together in number-space. "Pizza" and "guitar" end up far apart.

How to explain it to a 7-year-old

🧒 "AI doesn't understand words. It understands numbers. So we turn each word into a list of numbers. Words that mean similar things get similar number lists. 'Happy' and 'joyful' get close numbers; 'happy' and 'refrigerator' don't."

How to explain it to a 14-year-old

🎒 "An embedding is a vector, typically 100 to 4,000 numbers, that represents an item in a way that captures meaning. Embeddings are how AI 'compares' things: take the dot product of two vectors and you get a similarity score. Often the vectors are scaled to length 1 first, which turns the dot product into cosine similarity."
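Here is a small sketch of that comparison in Python. The three-number vectors are made up for illustration (real embeddings come from a trained model and are much longer), and the function names are our own.

```python
import math

def dot(a, b):
    """Multiply matching numbers and add the results."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both lengths: near 1 = similar, near 0 = unrelated."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up toy embeddings (assumption: not taken from any real model).
happy = [0.9, 0.8, 0.1]
joyful = [0.8, 0.9, 0.2]
refrigerator = [0.1, 0.0, 0.9]

print(cosine_similarity(happy, joyful))        # high score: similar meaning
print(cosine_similarity(happy, refrigerator))  # low score: unrelated
```

Dividing by the lengths keeps long vectors from winning just because their numbers are bigger.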

A famous example

The vector for "king" minus the vector for "man" plus the vector for "woman" approximately equals the vector for "queen." That's embeddings capturing real semantic relationships, not just word matching.
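The arithmetic can be tried on a toy example. In this sketch each word gets just two hand-picked numbers, which we pretend mean "how royal" and "how feminine". That labeling is an assumption for illustration; real embeddings learn hundreds of dimensions that have no neat names.

```python
import math

# Hand-made toy embeddings: [royalty, femininity] (hypothetical values).
toy = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
    "pizza": [0.0, 0.5],
}

def analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, skipping the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

print(analogy("king", "man", "woman", toy))  # queen
```

The input words are skipped in the search because the result often lands closest to one of them. With real embeddings the match is only approximate, so "queen" is the nearest neighbor rather than an exact hit.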

Real-world uses

🔎 Semantic search ("find me articles like this one")
🎵 Music recommendations
🏷️ Photo tagging by content
🤖 Powering RAG (retrieval-augmented generation) systems
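
Semantic search, the first use above, mostly comes down to sorting items by how similar their embeddings are to the embedding of your question. A minimal sketch, assuming made-up three-number vectors and hypothetical article titles (a real system would get the vectors from an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical library: article title -> toy embedding.
articles = {
    "How to bake bread":  [0.9, 0.1, 0.0],
    "Pizza dough basics": [0.8, 0.2, 0.1],
    "Tuning a guitar":    [0.0, 0.1, 0.9],
}

def search(query_vector, library, top_k=2):
    """Return the top_k titles whose embeddings are most similar to the query."""
    ranked = sorted(
        library,
        key=lambda title: cosine(query_vector, library[title]),
        reverse=True,
    )
    return ranked[:top_k]

# Pretend this is the embedding of "easy baking recipes".
query = [0.85, 0.15, 0.05]
print(search(query, articles))  # the two cooking articles, not the guitar one
```

Notice that the query shares no words with the titles. The match comes from the numbers being close, which is what makes the search "semantic".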

Where this comes up in Chippu

Band D (d2-2) gets technical with embeddings; younger bands skip them.

Related terms

Token — what gets embedded in language models
Large language model
AI model

Frequently asked questions

What is the difference between an embedding and a token?

A token is a chunk of text (a word or a piece of a word). An embedding is the list of numbers that token gets converted to before the AI processes it. The token is the input format; the embedding is the internal representation.
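
The two steps can be sketched in a few lines of Python. Everything here is a simplified assumption: the vocabulary and the numbers are invented, and splitting on spaces stands in for a real tokenizer, which usually breaks text into sub-word pieces.

```python
# Hypothetical tiny vocabulary: token text -> token ID.
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

# Hypothetical embedding table: one row of numbers per token ID.
embedding_table = [
    [0.1, 0.3],  # "the"
    [0.7, 0.2],  # "cat"
    [0.4, 0.9],  # "sat"
    [0.0, 0.0],  # "<unk>" (any word not in the vocabulary)
]

def tokenize(text):
    """Step 1: text -> token IDs (splitting on spaces is a simplification)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    """Step 2: token IDs -> embeddings, by looking up rows in the table."""
    return [embedding_table[i] for i in token_ids]

ids = tokenize("The cat sat")
print(ids)         # [0, 1, 2]
print(embed(ids))  # [[0.1, 0.3], [0.7, 0.2], [0.4, 0.9]]
```

In a real model the table is learned during training, so the rows end up carrying meaning instead of arbitrary numbers.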

How does ChatGPT use embeddings?

Every token in your message gets converted to an embedding, a vector of thousands of numbers. The largest GPT-3 model used 12,288 numbers per token; OpenAI has not published the figure for GPT-4. The neural network operates on those vectors to predict the next token. Embeddings are the language of the model's internal world.