What is training data? (Explained for kids and parents)
Updated May 8, 2026 · 360 words
Training data is the collection of examples you show a computer to teach it something. To teach a computer what a cat looks like, the training data is photos of cats, usually thousands of them, each labeled "cat," along with photos of other things so it can learn the difference. In general, more examples and more accurate labels help the AI make better guesses.
How to explain it to a 7-year-old
🧒 "It's the homework you give the computer. You show it lots of examples — like 'this is a cat, this is a dog, this is a fish' — until it learns the difference."
How to explain it to a 14-year-old
🎒 "Training data is the dataset used to teach a model. Each example has an input (a photo) and a label (what's in it). The model guesses, gets corrected, and adjusts, over and over. The quality and diversity of the training data are among the biggest factors in how good the model becomes."
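For older kids who like to code, here is a minimal Python sketch of that "guess, get corrected, adjust" loop. Everything in it is made up for illustration: the animal measurements are invented, and the names `training_data`, `guess`, and `train` are hypothetical. It uses a very simple learning rule (a perceptron), not what any particular real AI system uses.

```python
# Toy training data: each example is (input, label).
# Input = (weight in kg, ear length in cm). These numbers are invented.
# Label = 1 means "dog", 0 means "cat".
training_data = [
    ((4.0, 6.0), 0),
    ((3.5, 5.5), 0),
    ((5.0, 6.5), 0),
    ((20.0, 10.0), 1),
    ((30.0, 12.0), 1),
    ((25.0, 11.0), 1),
]


def guess(weights, bias, features):
    """Return 1 ("dog") if the weighted sum is above zero, else 0 ("cat")."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(data, rounds=20, step=0.1):
    """Go through the examples several times, nudging the numbers after each mistake."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(rounds):
        for features, label in data:
            # error is 0 when the guess was right, +1 or -1 when it was wrong
            error = label - guess(weights, bias, features)
            # Adjust: move each weight a little in the direction that fixes the mistake
            weights = [w + step * error * x for w, x in zip(weights, features)]
            bias += step * error
    return weights, bias


weights, bias = train(training_data)
# Try an animal the model has never seen: 28 kg with 11.5 cm ears
print("dog" if guess(weights, bias, (28.0, 11.5)) == 1 else "cat")
```

With only six examples this is a toy. The point is the shape of the loop: the labels in the training data are what tell the model when a guess was wrong.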
A real-world example
When you tag your friend's face in a photo app such as Google Photos, you're giving the app a labeled example, which is the same idea as training data. That tag tells the system "this face = this name," and that helps it recognize the same person in other photos.
Why training data quality matters
- 🟢 Good training data is diverse, balanced, and labeled correctly
- 🔴 Bad training data is biased, has missing examples, or has wrong labels, and it produces AI that's biased or wrong
- ⚠️ Famous example: early face-recognition tools were trained mostly on lighter-skinned faces and made more mistakes on people with darker skin. More diverse training data was a big part of the fix.
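One simple way to check whether training data is balanced is to count how many examples each label has. The sketch below is only an illustration: the function name `underrepresented` and the 20% cutoff are assumptions chosen for this example, not a standard rule, and the label list is made up.

```python
from collections import Counter


def underrepresented(labels, min_share=0.2):
    """Return the labels that make up less than min_share of all examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(label for label, n in counts.items() if n / total < min_share)


# Made-up dataset: lots of cats, very few dogs and fish
labels = ["cat"] * 90 + ["dog"] * 8 + ["fish"] * 2

print(Counter(labels))           # how many examples each label has
print(underrepresented(labels))  # ['dog', 'fish']
```

A model trained on this data would see cats 90 times out of 100, so it would get far less practice on dogs and fish. Counting labels does not catch every kind of bias (for example, wrong labels), but it is a quick first check.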
Where this comes up in Chippu
Band A explores it through play (a2-1, "How Does AI Learn"). Band C goes into how to spot bias in training data (c3-1).
Related terms
- Machine learning — the process that uses training data
- Neural network — what training data trains
- Bias — what happens when training data isn't diverse enough