A typical software testing suite will include:
- unit tests, which operate on atomic pieces of the codebase and can be run quickly during development (a minimal example follows this list),
- regression tests, which replicate bugs that we've previously encountered and fixed,
- integration tests, which are typically longer-running tests that observe higher-level behaviors leveraging multiple components in the codebase.
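For concreteness, a minimal pytest-style unit test; the `slugify` helper is hypothetical, not from the source article:

```python
# A minimal unit test: fast, isolated, run constantly during development.
# `slugify` is a hypothetical helper used only for illustration.
def slugify(text: str) -> str:
    return text.strip().lower().replace(" ", "-")

def test_slugify_normalizes_whitespace_and_case():
    assert slugify("  Hello World ") == "hello-world"
```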
For machine learning systems, we should run model evaluation and model tests in parallel:
- Model evaluation covers metrics and plots that summarize performance on a validation or test dataset.
- Model testing involves explicit checks for behaviors we expect our model to follow (the two are contrasted in the sketch below).
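A minimal sketch of the distinction, assuming a scikit-learn-style classifier with a `.predict()` method and a sentiment model returning string labels; both are illustrative assumptions, not the article's code:

```python
from sklearn.metrics import accuracy_score

def evaluate(model, X_val, y_val):
    """Model evaluation: aggregate metrics over a validation set."""
    return {"accuracy": accuracy_score(y_val, model.predict(X_val))}

def test_obvious_positive(model):
    """Model test: one explicit behavior the model must exhibit."""
    assert model.predict(["I absolutely loved this movie!"])[0] == "positive"
```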
How do you write model tests?
Pre-train tests
- Early bug discovery and training short-circuiting (saves training cost)
- Things to check (sketched in code after this list):
- output distribution
- gradient-related information (training loss curve)
- data quality
- label leakage
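A minimal sketch of pre-train checks, assuming a scikit-learn-style `predict_proba`, hypothetical `loss`/`fit` methods, and per-split ID lists; all of these interfaces are illustrative assumptions, not the article's code:

```python
import numpy as np

def test_output_distribution(model, X_sample):
    """Predicted class probabilities should form a valid distribution."""
    probs = model.predict_proba(X_sample)
    assert np.all((probs >= 0) & (probs <= 1))
    assert np.allclose(probs.sum(axis=1), 1.0)

def test_loss_decreases_on_one_batch(model, X_batch, y_batch):
    """A brief fit on a single batch should reduce the loss; if not,
    short-circuit before paying for a full training run."""
    before = model.loss(X_batch, y_batch)  # hypothetical interface
    model.fit(X_batch, y_batch)            # brief fit on one batch
    assert model.loss(X_batch, y_batch) < before

def test_no_label_leakage(train_ids, val_ids):
    """Training and validation splits must not share examples."""
    assert set(train_ids).isdisjoint(val_ids)
```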
Post-train tests
- Post-mortem issue discovery and model behavior analysis
- Things to check (sketched in code after this list):
- Invariance Test (apply a set of perturbations to the input that should not affect the model's output)
- Directional Expectation Test (apply perturbations to the input that should have a predictable effect on the model's output)
- Data Unit Test (similar to a regression test: pin scenarios the model previously failed)
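A sketch of the three post-train checks, assuming pytest fixtures `model` (a sentiment classifier returning string labels) and `price_model` (a housing-price regressor); the fixture names, methods, and examples are illustrative, not from the article's code:

```python
def test_invariance_to_names(model):
    """Invariance: swapping a person's name should not change sentiment."""
    a = model.predict(["Mark was a great instructor."])[0]
    b = model.predict(["Samantha was a great instructor."])[0]
    assert a == b

def test_directional_expectation(price_model):
    """Directional expectation: adding a bathroom (all else equal)
    should not decrease the predicted house price."""
    base = price_model.predict_price(bedrooms=3, bathrooms=1, sqft=1500)
    more = price_model.predict_price(bedrooms=3, bathrooms=2, sqft=1500)
    assert more >= base

def test_data_unit_negation(model):
    """Data unit test: a scenario the model previously failed, pinned
    so the failure cannot silently reappear."""
    assert model.predict(["The movie was not bad at all."])[0] == "positive"
```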
Organizing tests
- structure tests around the "skills" we expect the model to acquire while learning to perform a given task (see the sketch below).
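For example, skills for a sentiment model might include vocabulary, negation handling, and robustness to typos; a hypothetical pytest layout with one class per skill (`model` fixture assumed):

```python
# Grouping behavioral tests by model "skill" for a sentiment classifier.
class TestVocabulary:
    def test_positive_adjective(self, model):
        assert model.predict(["The service was fantastic."])[0] == "positive"

class TestNegationHandling:
    def test_negated_negative(self, model):
        assert model.predict(["The food was not terrible."])[0] == "positive"

class TestTypoRobustness:
    def test_misspelled_positive(self, model):
        assert model.predict(["The service was fantsatic."])[0] == "positive"
```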
Model Dev Pipeline
- Pre-train tests gate the training step; post-train tests, alongside evaluation, gate promotion to deployment (see the pipeline diagram in the source).
{source: https://www.jeremyjordan.me/testing-ml/}