A typical software testing suite will include:
- unit tests, which operate on atomic pieces of the codebase and can be run quickly during development (a minimal example follows this list),
- regression tests, which replicate bugs that we've previously encountered and fixed,
- integration tests, which are typically longer-running tests that observe higher-level behaviors leveraging multiple components in the codebase.
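For concreteness, a minimal pytest-style unit test; the `slugify` helper is hypothetical, not from the source article:

```python
# A minimal unit test: fast, isolated, run constantly during development.
# `slugify` is a hypothetical helper used only for illustration.
def slugify(text: str) -> str:
    return text.strip().lower().replace(" ", "-")

def test_slugify_normalizes_whitespace_and_case():
    assert slugify("  Hello World ") == "hello-world"
```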
For machine learning systems, we should run model evaluation and model tests in parallel:
- Model evaluation covers metrics and plots that summarize performance on a validation or test dataset.
- Model testing involves explicit checks for behaviors we expect our model to follow (the two are contrasted in the sketch below).
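A minimal sketch of the distinction, assuming a scikit-learn-style classifier with a `.predict()` method and a sentiment model returning string labels; both are illustrative assumptions, not the article's code:

```python
from sklearn.metrics import accuracy_score

def evaluate(model, X_val, y_val):
    """Model evaluation: aggregate metrics over a validation set."""
    return {"accuracy": accuracy_score(y_val, model.predict(X_val))}

def test_obvious_positive(model):
    """Model test: one explicit behavior the model must exhibit."""
    assert model.predict(["I absolutely loved this movie!"])[0] == "positive"
```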
How do you write model tests?
Pre-train tests
- Early bug discovery and training short-circuiting (saves training cost)
- Things to check (sketched in code after this list):
- output distribution
- gradient-related information (training loss curve)
- data quality
- label leakage
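A minimal sketch of pre-train checks, assuming a scikit-learn-style `predict_proba`, hypothetical `loss`/`fit` methods, and per-split ID lists; all of these interfaces are illustrative assumptions, not the article's code:

```python
import numpy as np

def test_output_distribution(model, X_sample):
    """Predicted class probabilities should form a valid distribution."""
    probs = model.predict_proba(X_sample)
    assert np.all((probs >= 0) & (probs <= 1))
    assert np.allclose(probs.sum(axis=1), 1.0)

def test_loss_decreases_on_one_batch(model, X_batch, y_batch):
    """A brief fit on a single batch should reduce the loss; if not,
    short-circuit before paying for a full training run."""
    before = model.loss(X_batch, y_batch)  # hypothetical interface
    model.fit(X_batch, y_batch)            # brief fit on one batch
    assert model.loss(X_batch, y_batch) < before

def test_no_label_leakage(train_ids, val_ids):
    """Training and validation splits must not share examples."""
    assert set(train_ids).isdisjoint(val_ids)
```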
Post-train tests
- Post-mortem issue discovery and model behavior analysis
- Things to check (sketched in code after this list):
- Invariance Test (apply a set of perturbations to the input that should not affect the model's output)
- Directional Expectation Test (apply perturbations to the input that should have a predictable effect on the model's output)
- Data Unit Test (similar to a regression test: pin scenarios the model previously failed)
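A sketch of the three post-train checks, assuming pytest fixtures `model` (a sentiment classifier returning string labels) and `price_model` (a housing-price regressor); the fixture names, methods, and examples are illustrative, not from the article's code:

```python
def test_invariance_to_names(model):
    """Invariance: swapping a person's name should not change sentiment."""
    a = model.predict(["Mark was a great instructor."])[0]
    b = model.predict(["Samantha was a great instructor."])[0]
    assert a == b

def test_directional_expectation(price_model):
    """Directional expectation: adding a bathroom (all else equal)
    should not decrease the predicted house price."""
    base = price_model.predict_price(bedrooms=3, bathrooms=1, sqft=1500)
    more = price_model.predict_price(bedrooms=3, bathrooms=2, sqft=1500)
    assert more >= base

def test_data_unit_negation(model):
    """Data unit test: a scenario the model previously failed, pinned
    so the failure cannot silently reappear."""
    assert model.predict(["The movie was not bad at all."])[0] == "positive"
```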
Organizing tests
- structure tests around the "skills" we expect the model to acquire while learning to perform a given task (see the sketch below).
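For example, skills for a sentiment model might include vocabulary, negation handling, and robustness to typos; a hypothetical pytest layout with one class per skill (`model` fixture assumed):

```python
# Grouping behavioral tests by model "skill" for a sentiment classifier.
class TestVocabulary:
    def test_positive_adjective(self, model):
        assert model.predict(["The service was fantastic."])[0] == "positive"

class TestNegationHandling:
    def test_negated_negative(self, model):
        assert model.predict(["The food was not terrible."])[0] == "positive"

class TestTypoRobustness:
    def test_misspelled_positive(self, model):
        assert model.predict(["The service was fantsatic."])[0] == "positive"
```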
Model Dev Pipeline
- Pre-train tests gate the training step; post-train tests, alongside evaluation, gate promotion to deployment (see the pipeline diagram in the source).
{source: https://www.jeremyjordan.me/testing-ml/}