Lesson 4 — How AI Models Are Trained

The performance of any AI model is directly related to the quality and quantity of its training data.

  1. Text-Based LLMs: To be effective, LLMs require an enormous volume of training data, often scraped from large portions of the internet. The training process consumes a massive amount of energy and relies on powerful Graphics Processing Units (GPUs), which handle the highly parallel arithmetic of training far more efficiently than standard Central Processing Units (CPUs); see the first sketch after this list.
  2. Image-Based Models: These models are “trained on text-image pairs that are manually tagged.” In other words, humans have described millions of images with short text labels, allowing the model to learn associations between words (like “a dog on a skateboard”) and visual concepts; the second sketch after this list shows how such pairs are typically organized.
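
To make the GPU point in item 1 concrete, here is a minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU may or may not be present, that times the same large matrix multiplication on both kinds of processor. The matrix size and the use of `time_matmul` are illustrative only; real training runs billions of such operations.

```python
import time
import torch

def time_matmul(device: str, size: int = 4096) -> float:
    """Multiply two size x size random matrices on `device` and return elapsed seconds."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # finish setup before starting the clock
    start = time.perf_counter()
    _ = a @ b                      # millions of multiply-adds executed in parallel
    if device == "cuda":
        torch.cuda.synchronize()   # GPU kernels run asynchronously; wait for completion
    return time.perf_counter() - start

print(f"CPU time: {time_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU time: {time_matmul('cuda'):.3f} s")
else:
    print("No CUDA GPU available; skipping the GPU measurement.")
```

On typical hardware the GPU finishes this kind of workload many times faster, which is why model training is done on GPUs rather than CPUs.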
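
To illustrate the text-image pairs in item 2, here is a minimal sketch of how such data is commonly organized before training, following PyTorch's Dataset convention. The `captions.csv` file, its column names, and the `TextImagePairs` class are hypothetical examples, not any specific model's actual pipeline.

```python
import csv
from PIL import Image
from torch.utils.data import Dataset

class TextImagePairs(Dataset):
    """Each sample is one (image, caption) pair, with the caption written by a human labeler."""

    def __init__(self, csv_path: str):
        # Hypothetical CSV layout, one row per pair:
        # image_path,caption
        # images/00001.jpg,"a dog on a skateboard"
        with open(csv_path, newline="") as f:
            self.rows = list(csv.DictReader(f))

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int):
        row = self.rows[idx]
        image = Image.open(row["image_path"]).convert("RGB")
        return image, row["caption"]   # the model learns to associate these two

# Usage (file name is illustrative):
# pairs = TextImagePairs("captions.csv")
# image, caption = pairs[0]
```

During training, the model sees each image alongside its human-written caption and gradually learns which visual patterns go with which words.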