Lesson 4 — How AI Models Are Trained
The performance of any AI model is directly related to the quality and quantity of its training data.
- Text-Based LLMs: To be effective, LLMs require a huge amount of training data, often scraped from large portions of the internet. The training process consumes a massive amount of energy and relies on powerful Graphics Processing Units (GPUs), which handle the parallel computation these workloads demand far better than standard Central Processing Units (CPUs) — see the first sketch after this list.
- Image-Based Models: These models are “trained on text-image pairs that are manually tagged.” In other words, humans have described millions of images with text labels, allowing the model to learn associations between words (like “a dog on a skateboard”) and visual concepts — see the second sketch after this list.
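
Below is a minimal sketch, assuming PyTorch is installed and (optionally) a CUDA GPU is present, of why GPUs dominate model training: the same large matrix multiplication that makes up most of a neural network's work is timed on the CPU and, if available, on the GPU. The matrix size of 4096 is arbitrary and chosen only for illustration.

```python
# Sketch: the kind of parallel math that dominates AI training, timed on CPU vs. GPU.
import time

import torch


def time_matmul(device: torch.device, size: int = 4096) -> float:
    """Time one large matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure setup work on the GPU has finished
    start = time.perf_counter()
    _ = a @ b  # millions of multiply-adds, executed in parallel on a GPU
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the GPU kernel to actually complete
    return time.perf_counter() - start


print(f"CPU time: {time_matmul(torch.device('cpu')):.3f} s")
if torch.cuda.is_available():
    print(f"GPU time: {time_matmul(torch.device('cuda')):.3f} s")
else:
    print("No GPU available; skipping the GPU timing.")
```

On typical hardware the GPU timing comes out far lower, which is the practical reason GPUs, not CPUs, are used for large-scale training.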
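
And here is a minimal sketch of how manually tagged text-image pairs are organized before training. The file name `captions.csv` and its column names are hypothetical; the point is simply that each image file is paired with a human-written caption, and the model later learns to associate the two.

```python
# Sketch: loading (image, caption) pairs from a hypothetical captions.csv file.
import csv
from dataclasses import dataclass


@dataclass
class TextImagePair:
    image_path: str  # e.g. "images/00042.jpg"
    caption: str     # e.g. "a dog on a skateboard"


def load_pairs(csv_path: str) -> list[TextImagePair]:
    """Read rows from a CSV with columns: image_path, caption."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            TextImagePair(row["image_path"], row["caption"])
            for row in csv.DictReader(f)
        ]


pairs = load_pairs("captions.csv")  # hypothetical dataset file
for pair in pairs[:3]:
    print(pair.image_path, "->", pair.caption)
```

Real datasets of this kind contain millions of such pairs, which is why the manual tagging described above is such a large undertaking.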