What is a Validation Set?
When you train a machine learning model, you need a way to check whether it is actually learning useful patterns or just memorizing the training data. The validation set is your model's practice exam -- a portion of your data that the model never trains on but that you use to evaluate its performance during development. Think of it as a dress rehearsal before the final performance. The model practices on the training data, performs its dress rehearsal on the validation data, and only gets one shot at the real test data at the very end.
Without a validation set, you are flying blind. You have no reliable way to know if your model will work on data it has never seen before, and you risk building something that looks impressive during development but falls apart in production. This concept is one of the most fundamental ideas in applied machine learning, and getting it wrong can invalidate entire projects.
The Train / Validation / Test Split
Before any training begins, machine learning practitioners divide their dataset into three distinct subsets. The training set is the largest portion, typically 60 to 80 percent of the total data. This is the data the model actually learns from. During training, the model sees these examples repeatedly, adjusting its internal parameters to minimize prediction errors on this subset.
The validation set usually comprises 10 to 20 percent of the data. The model never trains on this data, but you evaluate the model on it frequently during training. After every few training steps or epochs, you check how well the model predicts on the validation set. This gives you an honest signal about whether the model is genuinely learning or just memorizing. If training accuracy keeps climbing but validation accuracy plateaus or drops, that is a clear sign of overfitting.
The test set is the remaining 10 to 20 percent. It is the final, untouched exam. You only use it once, at the very end, to report the model's true performance. It must remain sealed until you are done with all experimentation.
Common Split Ratios
A typical split is 70/15/15 or 80/10/10 (train/validation/test). For very large datasets with millions of examples, the validation and test sets can be as small as 1% each because even 1% of a million is still 10,000 examples -- more than enough for a reliable estimate.
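A 70/15/15 split can be sketched in a few lines of plain Python. This is a minimal illustration; in practice you would typically reach for a library helper such as scikit-learn's train_test_split (applied twice), which also supports stratified splitting. The function name and the toy dataset here are illustrative, not from any particular library.

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a dataset and carve it into train/validation/test subsets.

    A minimal sketch of a 70/15/15 split. Shuffling first matters: if the
    data is ordered (e.g. by date or class), an unshuffled split would give
    the three subsets different distributions.
    """
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = [data[i] for i in indices[:n_test]]
    val = [data[i] for i in indices[n_test:n_test + n_val]]
    train = [data[i] for i in indices[n_test + n_val:]]
    return train, val, test

train, val, test = train_val_test_split(list(range(1000)))
print(len(train), len(val), len(test))  # 700 150 150
```

Note that every example lands in exactly one subset; any overlap between the three would leak training data into your evaluation.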
Why Not Just Use the Test Set?
This is one of the most common mistakes beginners make: using the test set to make decisions during training. It seems harmless -- "I'll just peek at the test accuracy to see how I'm doing" -- but this subtle error can invalidate your entire evaluation. Here is why.
Every time you check your model's performance on a dataset and then make a decision based on that result -- whether it is changing a hyperparameter, modifying the architecture, or deciding when to stop training -- you are indirectly fitting your model to that data. The more you use the test set to guide decisions, the more your model becomes tuned to the test set, even though it never explicitly trained on it. Your reported test accuracy becomes an overestimate of real-world performance.
The validation set exists precisely to absorb this "decision contamination." You can look at validation performance as often as you like, tune your model based on it, and try hundreds of different configurations. The validation set will become slightly overfit to your decisions, but that is acceptable because the test set remains clean and unbiased.
Think of it this way: the validation set is your sparring partner in the gym. You learn from every round. The test set is the championship match. You only get one shot, and your sparring sessions should have prepared you well enough that the match is just a formality to confirm your skill.
Hyperparameter Tuning with the Validation Set
The validation set is the cornerstone of hyperparameter tuning, the process of finding the best settings for your model. Hyperparameters are the configuration choices you make before training begins -- things like the learning rate, the number of layers, the batch size, the regularization strength, and when to stop training. Unlike model weights, which are learned during training, hyperparameters are set by you, the practitioner.
Here is how the process works in practice. You train your model with one set of hyperparameters on the training set. You evaluate the result on the validation set. Then you adjust the hyperparameters, train again, and evaluate again. You repeat this cycle, searching for the combination of hyperparameters that gives the best validation performance. This is why the validation set is sometimes called the "dev set" or "development set" -- it is the dataset you use during active development.
Common hyperparameter search strategies include grid search, where you try every combination of values from a predefined list; random search, where you sample hyperparameter values randomly; and Bayesian optimization, where you use the results of previous trials to intelligently choose the next combination to try. In all cases, the validation set is the judge that tells you which combination is best.
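The tune/train/evaluate cycle can be made concrete with a tiny grid search. The sketch below fits a one-weight ridge regression (which has a closed-form solution) on synthetic training data for each candidate regularization strength, then lets the validation set pick the winner. The data, the grid values, and the function names are all illustrative assumptions, not a prescribed recipe.

```python
import random

random.seed(0)

def make_data(n):
    """Synthetic toy data: y = 2x + Gaussian noise (purely illustrative)."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 0.3)) for x in xs]

train, val = make_data(200), make_data(50)

def fit(data, lam):
    # Closed-form ridge solution for a single weight:
    # w = sum(x*y) / (sum(x^2) + lam)
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def mse(data, w):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

# Grid search: train on the training set, let the validation set judge.
results = {lam: mse(val, fit(train, lam)) for lam in [0.0, 0.1, 1.0, 10.0]}
best_lam = min(results, key=results.get)
print("best lambda:", best_lam)
```

The same loop structure generalizes: swap the grid for random samples and you have random search; swap the weight-fitting step for a real training run and the pattern is unchanged. Crucially, `val` never appears inside `fit` — it only judges.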
Early stopping is another critical use of the validation set. During training, you monitor validation loss after each epoch. When validation loss stops improving for a set number of consecutive epochs (called the "patience"), you stop training and roll back to the model weights from the best epoch. This prevents overfitting and saves compute time. Without a validation set, you would have no reliable signal for when to stop.
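The patience logic described above can be sketched as a short function. Here the list of per-epoch validation losses stands in for a real training loop, and the function name is illustrative; a real implementation would also checkpoint model weights at each new best epoch so it can roll back to them.

```python
def early_stopping(val_losses, patience=3):
    """Early-stopping sketch: stop once validation loss has failed to
    improve for `patience` consecutive epochs; report the best epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0  # new best: reset the patience counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # give up; roll back to the checkpoint from best_epoch
    return best_epoch, best_loss

losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.40]
print(early_stopping(losses, patience=3))  # (3, 0.55)
```

Note the trade-off patience encodes: with patience=3, training stops at epoch 6 and never sees the later improvement at epoch 7. Larger patience tolerates noisy plateaus at the cost of extra compute.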
Key Takeaway
The validation set is one of the simplest yet most critical concepts in machine learning. It serves as a feedback mechanism that keeps you honest during model development. Without it, you are making blind decisions about hyperparameters, architecture, and training duration. With it, you have a reliable, repeatable signal that guides your development process.
Remember the three roles: the training set teaches, the validation set coaches, and the test set examines. Respecting these boundaries is what separates rigorous machine learning from guesswork. Every major ML framework, from scikit-learn to PyTorch to TensorFlow, has built-in support for train/validation/test splits, making it easy to follow this best practice from day one.
The next time you train a model, resist the urge to peek at the test set. Let the validation set do its job, and save the test set for when it truly matters.