
Model Validations and Performance Evaluators

A practical guide to validation splits, cross-validation, classification and regression metrics, calibration, slice evaluation, and model comparison.

2019.05.29 · 2 min read · by Zhenlin Wang

Introduction

Model validation estimates how well a model will perform on data it has not seen. It is one of the most important parts of machine learning because training metrics only tell you how well the model fits the data it was trained on, not how well it will generalize.

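Good validation answers a few questions:

- How well will the model perform on data it has not seen?
- Is it better than a baseline or the model it would replace?
- Where does it fail, and for whom?
- Can its predicted probabilities be trusted?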

Splitting Data

The split should match production reality.

Random Split

Use random splits when examples are independent and identically distributed.
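A minimal sketch of a reproducible random split using only the Python standard library. The function name `random_split` and the 20 percent default test fraction are illustrative choices, not a prescribed API. It shuffles example indices with a fixed seed so the same split can be regenerated later.

```python
import random


def random_split(n_examples, test_fraction=0.2, seed=0):
    """Shuffle example indices and return (train_indices, test_indices)."""
    indices = list(range(n_examples))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(indices)
    n_test = int(round(n_examples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = random_split(10, test_fraction=0.3, seed=42)
print(len(train_idx), len(test_idx))  # 7 3
```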

Time-Based Split

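Use time-based splits when the model predicts future behavior from past data. Train on records before a cutoff and test on records after it, so evaluation mirrors how the model will be used after deployment.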
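The sketch below assumes each record is a dict with a hypothetical `date` field. Everything before the cutoff goes to training and everything on or after it goes to testing, so no test record is older than any training record.

```python
from datetime import date


def time_split(records, cutoff):
    """Train on records dated before `cutoff`; test on records dated on or after it."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical records: each has a `date` and a `label`.
records = [
    {"date": date(2019, 1, 15), "label": 0},
    {"date": date(2019, 2, 10), "label": 1},
    {"date": date(2019, 3, 5), "label": 0},
    {"date": date(2019, 4, 20), "label": 1},
]
train, test = time_split(records, cutoff=date(2019, 3, 1))
```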

Group Split

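Use group splits when records from the same user, account, patient, or entity could otherwise end up in both train and test. Each group should land entirely on one side; if it does not, the model can memorize entity-specific patterns and the test score will be optimistic.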
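One way to build a group split, assuming each row carries a group identifier such as a user ID: shuffle the unique groups, assign whole groups to the test set, and derive row indices from that assignment. The helper `group_split` and the example IDs are hypothetical; scikit-learn's `GroupShuffleSplit` and `GroupKFold` serve the same purpose.

```python
import random


def group_split(groups, test_fraction=0.2, seed=0):
    """Split row indices so every group lands entirely in train or entirely in test.

    `groups[i]` is the group id (for example a user id) of row i.
    """
    unique_groups = sorted(set(groups))  # sort first so shuffling is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    n_test_groups = max(1, int(round(len(unique_groups) * test_fraction)))
    test_groups = set(unique_groups[:n_test_groups])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Hypothetical user ids, one per row; several rows share a user.
user_ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
train_idx, test_idx = group_split(user_ids, test_fraction=0.4, seed=1)
```

Note that the test fraction applies to groups, not rows, so the row-level split can be uneven when groups differ in size.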

Cross-Validation

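Use cross-validation when the dataset is small and a single split would give a noisy estimate: the data is divided into k folds, each fold serves once as the validation set, and the k scores are averaged. Be careful with time-dependent data. Ordinary k-fold cross-validation can train on examples from after the validation period, leaking future information; forward-chaining (expanding-window) splits avoid this.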
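The two index generators below contrast ordinary k-fold cross-validation with a forward-chaining scheme for time-ordered data. Both are illustrative sketches: they assume rows are already shuffled for k-fold and already sorted by time for forward chaining. In the forward-chaining version every training index precedes every validation index, which is what prevents future information from leaking.

```python
def kfold_indices(n_examples, k):
    """Yield (train, validation) index lists for ordinary k-fold cross-validation.

    Assumes rows were shuffled beforehand; each index is validated exactly once.
    """
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_examples))
        yield train, val
        start += size


def forward_chaining_indices(n_examples, k):
    """Yield k expanding-window splits for rows sorted by time.

    Each split trains on all rows before a block and validates on that block,
    so no training row comes after a validation row.
    """
    block = n_examples // (k + 1)
    if block == 0:
        raise ValueError("need at least k + 1 rows")
    for i in range(1, k + 1):
        # The last block absorbs any leftover rows.
        val_end = n_examples if i == k else (i + 1) * block
        yield list(range(0, i * block)), list(range(i * block, val_end))


for train, val in forward_chaining_indices(10, 4):
    print(f"train={train[0]}..{train[-1]} validate={val[0]}..{val[-1]}")
```

In scikit-learn, `KFold` and `TimeSeriesSplit` in `sklearn.model_selection` provide ready-made versions of these two schemes.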

Classification Metrics

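Common metrics:

- Accuracy: the fraction of correct predictions. It can be misleading when one class is rare.
- Precision: of the examples predicted positive, the fraction that are actually positive.
- Recall: of the actual positives, the fraction the model catches.
- F1 score: the harmonic mean of precision and recall.
- ROC AUC: how well the model ranks positives above negatives, across all thresholds.
- PR AUC: the area under the precision-recall curve, usually more informative than ROC AUC when positives are rare.
- Log loss: penalizes confident but wrong probabilities.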

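Choose metrics based on the cost of each kind of error. A fraud model, a medical screening model, and a spam filter may each need a different precision-recall tradeoff: missing a disease is usually worse than a false alarm, while sending a legitimate email to the spam folder can be worse than letting some spam through.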
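To make the tradeoff concrete, the sketch below computes precision, recall, and F1 from scratch at several decision thresholds on a small made-up set of labels and scores. The function names are illustrative. On this data, lowering the threshold raises recall and lowers precision, which is the knob a fraud or medical model would tune according to error cost.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when an example is predicted positive iff score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, scores, threshold=0.5):
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    # Convention used here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels and model scores.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.7, 0.6, 0.45, 0.4, 0.3, 0.1]
for threshold in (0.8, 0.5, 0.25):
    p, r, f = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```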

Regression Metrics

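Common metrics:

- MAE (mean absolute error): the average size of the errors, in the target's units.
- RMSE (root mean squared error): penalizes large errors more heavily than MAE.
- MAPE (mean absolute percentage error): error relative to the true value; unstable when true values are near zero.
- R²: the fraction of the target's variance the model explains, relative to always predicting the mean.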

Plot residuals. A single aggregate metric can hide systematic underprediction or overprediction.
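A minimal sketch of why residuals matter: two hypothetical sets of predictions with identical MAE (mean absolute error) and RMSE (root mean squared error), where only the mean residual reveals that the second one underpredicts every example. Residuals are defined here as `y_true - y_pred`, so a positive mean signals systematic underprediction.

```python
import math


def regression_report(y_true, y_pred):
    """MAE, RMSE, and mean residual, with residual = y_true - y_pred."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        # Positive mean residual = systematic underprediction; negative = overprediction.
        "mean_residual": sum(residuals) / n,
    }


y_true = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 18.0, 32.0, 38.0]  # errors in both directions
pred_b = [8.0, 18.0, 28.0, 38.0]   # always 2 below the truth
report_a = regression_report(y_true, pred_a)
report_b = regression_report(y_true, pred_b)
```

Both reports show MAE and RMSE of 2.0, but the mean residual is 0.0 for `pred_a` and 2.0 for `pred_b`. A residual plot would show the same bias at a glance.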

Calibration

Calibration asks whether predicted probabilities match observed frequencies.

If a well-calibrated model predicts a probability of 0.8 for many cases, roughly 80 percent of those cases should be positive.
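A reliability diagram is built from a table like the one this sketch produces: bucket predictions into equal-width probability bins, then compare each bin's mean predicted probability with the fraction of positives it actually contains. The function name, the five-bin default, and the toy data are illustrative assumptions.

```python
def reliability_table(y_true, probs, n_bins=5):
    """Return (bin index, count, mean predicted prob, observed positive rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    table = []
    for idx, members in enumerate(bins):
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((idx, len(members), mean_pred, observed))
    return table


# Toy data: 10 predictions at 0.8 (8 positive) and 10 at 0.1 (1 positive).
y_true = [1] * 8 + [0] * 2 + [1] + [0] * 9
probs = [0.8] * 10 + [0.1] * 10
for idx, count, mean_pred, observed in reliability_table(y_true, probs):
    print(f"bin {idx}: n={count} predicted={mean_pred:.2f} observed={observed:.2f}")
```

This toy data is perfectly calibrated. A miscalibrated model would show gaps between the predicted and observed columns, which a reliability diagram plots against the diagonal. scikit-learn's `sklearn.calibration.calibration_curve` computes the same two columns.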

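Calibration matters when probabilities drive decisions:

- thresholds chosen from error costs, which only behave as intended if the probabilities are accurate
- expected-value calculations, such as expected loss or expected revenue
- risk scores shown to people or passed to downstream systems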

Check calibration with reliability diagrams (binned predicted probability versus observed frequency), the Brier score, and calibration broken down by slice.
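The sketch below computes the Brier score (the mean squared difference between the predicted probability and the 0/1 outcome; lower is better) and a simple per-slice check comparing mean predicted probability with the observed positive rate. Both helper names and the slice labels are hypothetical. The per-slice check only tests calibration in aggregate for each slice; a fuller check would build a binned reliability table per slice. The Brier score also mixes calibration with how well the model separates the classes, so it is not a pure calibration measure.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_by_slice(y_true, probs, slices):
    """Per slice: (count, mean predicted probability, observed positive rate)."""
    grouped = {}
    for y, p, s in zip(y_true, probs, slices):
        grouped.setdefault(s, []).append((y, p))
    return {
        s: (len(m), sum(p for _, p in m) / len(m), sum(y for y, _ in m) / len(m))
        for s, m in grouped.items()
    }


# Toy data with a hypothetical slice label per example.
y_true = [1, 0, 1, 0, 0, 0]
probs = [0.9, 0.1, 0.6, 0.4, 0.5, 0.5]
slices = ["a", "a", "b", "b", "b", "b"]
print(brier_score(y_true, probs))
print(calibration_by_slice(y_true, probs, slices))
```

In this toy data slice "a" predicts 0.5 on average and half its examples are positive, while slice "b" also predicts 0.5 on average but only a quarter are positive, so the model is overconfident on that slice.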

Slice Evaluation

Aggregate performance is not enough.

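Evaluate by:

- user segment or customer tier
- region, language, or device
- time period
- class, especially rare or high-cost classes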

A model can improve overall while hurting the group that matters most.
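A toy illustration of that point, with made-up predictions from an old and a new model on a large `majority` slice and a small `minority` slice. The slice names and helper functions are hypothetical. Overall accuracy rises from 0.8 to 0.9, while accuracy on the minority slice drops from 1.0 to 0.5.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    return sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_by_slice(y_true, y_pred, slices):
    """Accuracy computed separately for each slice value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, s in zip(y_true, y_pred, slices):
        total[s] += 1
        correct[s] += int(y == p)
    return {s: correct[s] / total[s] for s in total}


# Made-up labels and predictions from two model versions.
slices = ["majority"] * 8 + ["minority"] * 2
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
old_pred = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
new_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

print("overall:", accuracy(y_true, old_pred), "->", accuracy(y_true, new_pred))
print("old by slice:", accuracy_by_slice(y_true, old_pred, slices))
print("new by slice:", accuracy_by_slice(y_true, new_pred, slices))
```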

Model Comparison

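Compare models under the same conditions:

- the same train, validation, and test splits
- the same preprocessing and available features
- the same metrics and threshold-selection procedure
- against the same simple baseline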

For noisy results, use repeated runs or confidence intervals. For product launches, combine offline validation with online testing when possible.
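One common way to put a confidence interval on the difference between two models is a paired bootstrap over test examples. This sketch assumes both models were scored on the same test set and that each example has a 0/1 correctness indicator per model; it resamples examples with replacement and reports a percentile interval for the accuracy difference. The function name, the 2,000 resamples, and the toy data are illustrative.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(B) - accuracy(A) on a shared test set.

    correct_a[i] and correct_b[i] are 1 if model A / model B got example i right, else 0.
    Each resample draws example indices once and scores both models on them,
    which keeps the comparison paired.
    """
    n = len(correct_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_b[i] - correct_a[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    observed = (sum(correct_b) - sum(correct_a)) / n
    return observed, lower, upper


# Toy data: B is right on every example A gets right, plus 30 more, out of 200.
correct_a = [1] * 140 + [0] * 60
correct_b = [1] * 170 + [0] * 30
print(paired_bootstrap_ci(correct_a, correct_b))
```

If the interval excludes zero, the gain is unlikely to be an artifact of this particular test sample. It does not capture variance from retraining with different random seeds, which repeated runs address.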

Validation Checklist

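Before trusting a model:

- The split matches how the model will be used in production.
- Nothing leaks across time or across groups such as users or patients.
- The metrics reflect the real cost of each kind of error.
- Residuals and error patterns have been inspected, not just aggregate scores.
- Probabilities are calibrated if decisions depend on them.
- Performance holds on the slices that matter most.
- The model beats a baseline under identical conditions, with uncertainty accounted for.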

Validation is not a formality. It is the evidence behind model trust.