
Feature Selection and Model Selection

A practical guide to selecting features, choosing models, avoiding leakage, comparing validation results, and balancing accuracy with complexity.

2019.06.04 · 2 min read · by Zhenlin Wang

Introduction

Feature selection and model selection are both attempts to answer the same question:

Which representation and model should we trust for this task?

Feature selection chooses the inputs. Model selection chooses the learning algorithm and configuration. Both affect accuracy, interpretability, cost, and reliability.

Feature Selection Goals

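Feature selection can help reduce overfitting, make models easier to interpret, and lower training and inference cost by dropping noisy or redundant inputs.

But removing features can also remove signal. Always compare the reduced feature set against a full-feature baseline under the same validation scheme.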

Feature Selection Methods

Filter Methods

Filter methods score features without training the final model.

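Examples include variance thresholds, correlation with the target, mutual information, and chi-squared tests.

They are fast and useful for cleanup, but because each feature is usually scored on its own, they can miss interactions between features.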
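A minimal filter sketch follows, in plain Python so it runs without extra libraries. It drops constant columns (a variance threshold), scores each remaining feature by its absolute Pearson correlation with the target, and keeps the top `k`. The names `pearson` and `filter_select` and the toy data are hypothetical, chosen for illustration.

```python
from statistics import mean, pvariance
from typing import List, Sequence

# Hypothetical helper names and toy data, for illustration only.

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def filter_select(X: List[List[float]], y: List[float], k: int,
                  min_variance: float = 1e-8) -> List[int]:
    """Indices of the k features with the largest |correlation| with y,
    after dropping (near-)constant columns. X is a list of rows."""
    scores = {}
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if pvariance(column) <= min_variance:
            continue  # variance threshold: a constant column carries no signal
        scores[j] = abs(pearson(column, y))
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Toy data: column 0 tracks y, column 1 is constant, columns 2-3 are weaker.
X = [[1, 5, 0, 2], [2, 5, 1, 1], [3, 5, 0, 4], [4, 5, 1, 3]]
y = [1.1, 1.9, 3.2, 3.9]
print(filter_select(X, y, k=2))  # [0, 3]
```

Each column is scored in isolation. A feature that only helps in combination with another would rank low here, which is exactly the blind spot described above.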

Wrapper Methods

Wrapper methods evaluate feature subsets by training models.

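Examples include forward selection, backward elimination, and recursive feature elimination.

They can capture interactions that filters miss, but they are computationally expensive, and repeatedly searching subsets against the same validation data can overfit it.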
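Greedy forward selection is one common wrapper. This sketch assumes a caller-supplied `score_fn` that trains a model on a feature subset and returns a validation score, where higher is better. The names `forward_selection` and `toy_score` are hypothetical, and `toy_score` is a made-up stand-in rather than a real model. Each step costs one training run per remaining feature, which is where the expense comes from.

```python
from typing import Callable, FrozenSet, List, Tuple

def forward_selection(
    n_features: int,
    score_fn: Callable[[FrozenSet[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Greedy forward selection: repeatedly add the feature that most improves
    score_fn, stopping when nothing improves or max_features is reached."""
    selected: List[int] = []
    best_score = score_fn(frozenset())  # score of the empty (baseline) model
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        # One train/validate run per candidate: this is the expensive part.
        trial = {j: score_fn(frozenset(selected + [j])) for j in candidates}
        best_j = max(candidates, key=lambda j: trial[j])
        if trial[best_j] <= best_score:
            break  # no remaining feature improves the validation score
        selected.append(best_j)
        best_score = trial[best_j]
    return selected, best_score

def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for "train a model, return validation accuracy":
    each feature adds a fixed gain, minus a small per-feature penalty."""
    gains = {0: 0.30, 1: 0.10, 2: 0.02, 3: 0.20}
    return 0.5 + sum(gains[j] for j in subset) - 0.05 * len(subset)

selected, score = forward_selection(4, toy_score, max_features=4)
print(selected, round(score, 2))  # [0, 3, 1] 0.95
```

Every candidate is judged on the same validation data, so a long search can overfit it. Keeping a final test set that the search never touches guards against this.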

Embedded Methods

Embedded methods select features as part of model training.

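Examples include L1 (lasso) regularization, which drives some coefficients exactly to zero, and feature importances from tree ensembles.

They are practical, but importance scores can be biased. Impurity-based tree importances tend to favor high-cardinality categorical variables, and correlated features can split or mask each other's importance, so both need extra care.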
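L1 regularization shows how an embedded method selects features during training. This sketch minimizes the lasso objective `(1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1` by cyclic coordinate descent. It assumes centered columns, so no intercept is fitted, and features whose weights end at exactly zero are dropped. The names `lasso_cd` and `soft_threshold` and the toy data are hypothetical; in practice a library implementation such as scikit-learn's `Lasso` would be the usual choice.

```python
from typing import List

def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X: List[List[float]], y: List[float], alpha: float,
             n_iters: int = 100) -> List[float]:
    """Hypothetical helper: minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1
    by cyclic coordinate descent. Assumes the columns of X and y are centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * w[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Toy data: centered, orthogonal columns; y = 2*x0 + 0.1*x1, x2 is unrelated.
X = [[-3, 1, -1], [-1, -1, 3], [1, -1, -3], [3, 1, 1]]
y = [-5.9, -2.1, 1.9, 6.1]
w = lasso_cd(X, y, alpha=0.5)
kept = [j for j, wj in enumerate(w) if wj != 0.0]
print(kept)  # [0]
```

With `alpha=0.5`, both the weak `x1` effect and the unrelated `x2` are zeroed. A smaller `alpha` such as 0.05 keeps `x1` as well, so the penalty strength is itself a choice that needs validation.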

Avoid Leakage

Feature selection must happen inside the training pipeline, not before the train/test split.

Bad pattern:

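Use all data to select features, then split into train and test. The selector has already seen the test rows and their labels, so the test score is optimistically biased.

Better pattern:

Split the data first, fit feature selection on the training folds only, and evaluate on the held-out folds.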

Leakage often creates beautiful validation numbers and painful production failures.
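The better pattern can be sketched as a cross-validation loop that re-fits the selector inside every fold. `select_features` and `fit_and_score` are hypothetical callables supplied by the caller, for example a filter or wrapper method and a model. For simplicity the folds are contiguous and unshuffled, which assumes the rows are already in random order.

```python
from typing import Callable, List, Sequence

Row = List[float]
# Hypothetical callable types: a selector returns feature indices, a scorer a number.
Selector = Callable[[List[Row], List[float]], List[int]]
Scorer = Callable[[List[Row], List[float], List[Row], List[float], List[int]], float]

def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds: List[List[int]] = []
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate_with_selection(X: Sequence[Row], y: Sequence[float], k: int,
                                  select_features: Selector,
                                  fit_and_score: Scorer) -> List[float]:
    """Leakage-safe CV: feature selection is re-fit inside every fold and
    only ever sees that fold's training rows."""
    scores: List[float] = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
        # Leaky version (don't): select_features(X, y) once, before this loop.
        features = select_features(X_train, y_train)
        scores.append(fit_and_score(X_train, y_train, X_test, y_test, features))
    return scores

# Demo: a hypothetical selector that records which rows it was shown.
X = [[float(i), float(i % 3)] for i in range(10)]
y = [float(i) for i in range(10)]
seen_by_selector = []

def recording_selector(X_train, y_train):
    seen_by_selector.append({row[0] for row in X_train})
    return [0]

def held_out_count(X_train, y_train, X_test, y_test, features):
    return float(len(X_test))  # placeholder "score" for the demo

scores = cross_validate_with_selection(X, y, 5, recording_selector, held_out_count)
print(scores)  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

Each fold's score now comes from a selector that never saw those rows, which matches what the model will face in production.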

Model Selection

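Start with simple baselines, such as predicting the majority class or the mean target, and a linear or logistic regression.

Then compare more complex models, such as random forests, gradient-boosted trees, or neural networks.

Choose based on the full system, not only the validation score.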
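A fair comparison scores every candidate on the same folds. This sketch compares a mean-prediction baseline with one-feature least squares on synthetic data, using mean squared error (lower is better) as the assumed metric. `compare_models`, both fitters, and the data are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

Model = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Model]

def fit_mean_baseline(xs: List[float], ys: List[float]) -> Model:
    """Baseline: ignore x and always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m

def fit_simple_linear(xs: List[float], ys: List[float]) -> Model:
    """One-feature ordinary least squares: y approximately a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def compare_models(fitters: Dict[str, Fitter], xs: List[float],
                   ys: List[float], k: int = 5) -> Dict[str, float]:
    """Hypothetical helper: cross-validated MSE per model, using the SAME
    folds for every model so the comparison is fair."""
    n = len(xs)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds
    results: Dict[str, float] = {}
    for name, fit in fitters.items():
        fold_errors = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            fold_errors.append(mean((ys[i] - model(xs[i])) ** 2 for i in test_idx))
        results[name] = mean(fold_errors)
    return results

# Synthetic data: a linear trend plus alternating +/-0.5 noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
results = compare_models({"mean_baseline": fit_mean_baseline,
                          "linear": fit_simple_linear}, xs, ys)
print(results)
```

If a more complex model cannot clearly beat a baseline like this on the same folds, its extra complexity is hard to justify.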

Comparison Criteria

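Compare models by validation score and its variation across folds, training and inference cost, interpretability, and how reliably they behave on new data.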

A model with slightly lower accuracy may be the better engineering choice if it is faster, easier to explain, and more reliable.
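One way to encode this trade-off is the one-standard-error rule. Find the best mean validation score, then pick the simplest model whose mean lies within one standard error of it. This sketch assumes per-fold scores where higher is better and a single `complexity` number per model, such as latency or parameter count. `Candidate`, `select_one_se`, and the scores below are illustrative, not measured results.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    name: str
    fold_scores: List[float]  # validation score per fold, higher is better
    complexity: float         # e.g. latency in ms or parameter count; lower is simpler

def mean_and_se(scores: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (needs at least two folds)."""
    n = len(scores)
    m = sum(scores) / n
    var = sum((s - m) ** 2 for s in scores) / (n - 1)
    return m, (var / n) ** 0.5

def select_one_se(candidates: List[Candidate]) -> Candidate:
    """Hypothetical helper: simplest candidate whose mean score is within
    one standard error of the best mean."""
    stats = {c.name: mean_and_se(c.fold_scores) for c in candidates}
    best = max(candidates, key=lambda c: stats[c.name][0])
    best_mean, best_se = stats[best.name]
    eligible = [c for c in candidates if stats[c.name][0] >= best_mean - best_se]
    return min(eligible, key=lambda c: c.complexity)

# Illustrative numbers only, not real measurements.
candidates = [
    Candidate("logistic_regression", [0.81, 0.82, 0.81, 0.82, 0.82], complexity=1),
    Candidate("gradient_boosting", [0.83, 0.80, 0.84, 0.81, 0.82], complexity=50),
]
print(select_one_se(candidates).name)  # logistic_regression
```

Here the boosted model has the higher mean, but the gap is smaller than one standard error of its score, so the simpler model is chosen.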

Validation Strategy

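Use validation that matches the real task: time-based splits for forecasting, grouped splits when several rows come from the same user or entity, and stratified folds for imbalanced classification.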

The validation method is part of the model selection decision.
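Two task-matched splitters are sketched below. The first is an expanding time window for forecasting. The second builds group-based folds so that rows from one user never sit on both sides of a split. Both are hypothetical plain-Python helpers. scikit-learn ships comparable splitters as `TimeSeriesSplit` and `GroupKFold`. Unlike a size-balanced splitter, this group version assigns groups round-robin.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)

def expanding_window_splits(n: int, n_splits: int, test_size: int) -> Iterator[Split]:
    """Hypothetical helper: train on every row before the validation window,
    never after it. Assumes rows are sorted by time."""
    first_start = n - n_splits * test_size
    if first_start <= 0:
        raise ValueError("not enough rows for the requested splits")
    for s in range(n_splits):
        start = first_start + s * test_size
        yield list(range(start)), list(range(start, start + test_size))

def group_k_fold(groups: Sequence[Hashable], k: int) -> Iterator[Split]:
    """Hypothetical helper: all rows of a group land in the same fold.
    Groups are assigned round-robin in sorted order, so fold sizes may differ."""
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups), key=str))}
    for f in range(k):
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        test = [i for i, g in enumerate(groups) if fold_of[g] == f]
        yield train, test

time_splits = list(expanding_window_splits(10, n_splits=3, test_size=2))
print(time_splits[0])  # ([0, 1, 2, 3], [4, 5])

users = ["a", "a", "b", "c", "c", "c", "d"]
group_splits = list(group_k_fold(users, k=2))
print(group_splits[0])  # ([2, 6], [0, 1, 3, 4, 5])
```

A random split would let the model train on the future or on the same user it is validated on, which inflates the score in the same way leakage does.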

Closing

Feature selection and model selection are not one-time rituals. They are controlled experiments. Start simple, avoid leakage, compare fairly, and choose the model that best satisfies the product and engineering constraints.