
Some Tricks in Real-World Machine Learning Engineering

Practical notes on moving from notebooks to pipelines, handling missing values, scaling features, encoding categories, and keeping ML code production-friendly.

2024.03.02 · 3 min read · by Zhenlin Wang

Introduction

Real-world machine learning engineering is often less about clever algorithms and more about small habits that prevent messy systems.

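This post collects practical tricks I keep reaching for: converting notebooks into scripts early, handling missing values by cause, scaling features consistently, encoding categories with future values in mind, versioning feature pipelines, and watching for data leakage.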

Convert Notebooks Into Scripts Early

Notebooks are excellent for exploration. They are poor as the long-term source of truth for a training pipeline.

A useful pattern is:

  1. Explore in a notebook.
  2. Move reusable logic into Python modules.
  3. Keep the notebook as a report or scratchpad.
  4. Run training through a script or CLI.

For a quick conversion:

jupyter nbconvert --to script train_model.ipynb

Then clean the generated script into functions:

def load_data(config):
    ...


def build_features(data, config):
    ...


def train_model(features, labels, config):
    ...


def main(config):
    data = load_data(config)
    features, labels = build_features(data, config)
    model = train_model(features, labels, config)
    return model

The goal is not to ban notebooks. The goal is to keep production behavior in code that can be tested, reviewed, and rerun.
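
Step 4 of the pattern above can be sketched with a small command-line wrapper. This is a minimal sketch, not a definitive layout: `run_training` is a hypothetical stand-in for the `main(config)` function above, and the `--config` and `--seed` flags are illustrative names I picked for the example.

```python
import argparse
import json
from pathlib import Path


def run_training(config):
    """Hypothetical stand-in for main(config) above; it only echoes the config."""
    # A real script would call load_data, build_features, and train_model here.
    return {"trained_with": config}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a model from a config file.")
    # Both flags are illustrative assumptions, not part of any existing tool.
    parser.add_argument("--config", type=Path, default=None, help="Path to a JSON config file.")
    parser.add_argument("--seed", type=int, default=0, help="Random seed stored in the config.")
    return parser.parse_args(argv)


def cli(argv=None):
    args = parse_args(argv)
    # Fall back to an empty config so the script still runs without arguments.
    config = json.loads(args.config.read_text()) if args.config else {}
    config["seed"] = args.seed
    return run_training(config)


if __name__ == "__main__":
    cli()
```

Saved as a file such as `train.py` (a hypothetical name), this could be run as `python train.py --config baseline.json --seed 42`. Because `cli` accepts an explicit argument list, the same entry point can be called from a test without touching the real command line.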

Handle Missing Values by Cause

Missing values are not all the same.

Ask why the value is missing:

Different causes need different treatment:

The last case is important. Not every missing value should be “handled”; some should page the owner.
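
One way to make cause-specific handling explicit is a small policy table. This is a minimal sketch assuming pandas; the column names, the assumed causes in the comments, and the `handle_missing` helper are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical policy table: each column gets a treatment that matches
# the assumed reason its values go missing.
POLICIES = {
    "discount_code": "flag",    # assumed cause: the field does not apply to this row
    "age": "median",            # assumed cause: missing at random, safe to impute
    "sensor_reading": "fail",   # assumed cause: an upstream failure
}


def handle_missing(df, policies, train_medians):
    """Apply one missing-value policy per column and return a new DataFrame."""
    out = df.copy()
    for col, policy in policies.items():
        missing = out[col].isna()
        if policy == "flag":
            # Keep the signal: missingness itself becomes a feature.
            out[col + "_missing"] = missing.astype(int)
        elif policy == "median":
            # Impute with a statistic computed on training data only.
            out[col] = out[col].fillna(train_medians[col])
        elif policy == "fail" and missing.any():
            # Do not paper over it; surface the problem to the data owner.
            raise ValueError(f"{missing.sum()} unexpected missing values in {col!r}")
    return out


df = pd.DataFrame(
    {"discount_code": ["A", None], "age": [30.0, None], "sensor_reading": [1.0, 2.0]}
)
clean = handle_missing(df, POLICIES, train_medians={"age": 35.0})
```

The `fail` branch is the code version of paging the owner: the pipeline stops instead of silently imputing a value that should never have been missing.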

Scale Features Consistently

Scaling parameters should be fit on the training data only and then reused for validation, test, and inference.

Bad pattern:

# Anti-pattern: test is standardized with its own mean and std, not the training ones.
train["x"] = (train["x"] - train["x"].mean()) / train["x"].std()
test["x"] = (test["x"] - test["x"].mean()) / test["x"].std()

Better pattern:

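from sklearn.preprocessing import StandardScaler


# train, valid, and test are DataFrames; feature_cols lists the numeric columns.
scaler = StandardScaler()
train_x = scaler.fit_transform(train[feature_cols])  # learn mean and std from training data only
valid_x = scaler.transform(valid[feature_cols])      # reuse the training statistics
test_x = scaler.transform(test[feature_cols])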

The same fitted scaler must be available at inference time. Otherwise the model will see differently scaled inputs in serving than it saw in training.
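
A common way to keep the fitted scaler around is to save it as an artifact next to the model. The sketch below assumes scikit-learn and joblib are installed; the toy data and the `scaler.joblib` file name are illustrative, and the temporary directory stands in for wherever model artifacts are really stored.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.preprocessing import StandardScaler

# Toy training data; in practice this would be train[feature_cols].
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# Fit once, on training data only.
scaler = StandardScaler().fit(train_rows)

# Save the fitted scaler alongside the model artifact (path is illustrative).
artifact_path = Path(tempfile.mkdtemp()) / "scaler.joblib"
joblib.dump(scaler, artifact_path)

# At inference time, load the same object instead of refitting.
serving_scaler = joblib.load(artifact_path)
scaled = serving_scaler.transform([[2.0, 20.0]])
```

Wrapping the scaler and the model in a single scikit-learn `Pipeline` and saving that one object is another option, and it removes the chance of shipping a model with the wrong scaler.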

Encode Categories With Future Values in Mind

Categorical features often cause production issues because new categories appear after deployment, and an encoder fit only on training data has no slot for them.

Options:

Hashing is a strong industrial trick:

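from sklearn.feature_extraction import FeatureHasher


# With input_type="string", each sample must be an iterable of strings, such as a list of category tokens.
hasher = FeatureHasher(n_features=2**18, input_type="string")
# FeatureHasher is stateless, so transform needs no prior fit and accepts unseen categories.
features = hasher.transform(user_category_lists)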

Collisions happen, but a fixed-size hashed representation avoids the “unknown category broke inference” problem.
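
To see why unseen categories cannot break a hashed encoding, here is a stripped-down illustration of the idea using only the standard library. It is a simplified sketch, not how `FeatureHasher` is implemented; `hash_bucket` and `hash_encode` are hypothetical helpers, and the tiny bucket count is chosen for readability.

```python
import hashlib


def hash_bucket(value, n_buckets):
    """Map a category string to a stable bucket index in [0, n_buckets)."""
    # md5 gives a hash that is stable across processes; Python's built-in
    # hash() is salted per process for strings, so it is unsuitable here.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(categories, n_buckets=16):
    """Encode a list of category strings as a fixed-length count vector."""
    vector = [0] * n_buckets
    for category in categories:
        vector[hash_bucket(category, n_buckets)] += 1
    return vector


seen = hash_encode(["country=SG", "device=ios"])
# A category that never appeared in training still maps to a valid bucket.
unseen = hash_encode(["country=XX", "device=ios"])
```

Both vectors have the same length, so the model input shape never changes. The cost is that two different categories can land in the same bucket, which is why the bucket count is usually set much larger than the expected number of categories.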

Keep Feature Pipelines Versioned

Feature logic should be treated like model code.

Track:

When a model changes behavior, a silent change in feature logic or inputs is often the culprit. Versioning makes the investigation possible.
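
One lightweight way to version feature logic is to fingerprint the feature configuration and store the fingerprint with every trained model. This is a minimal sketch under my own assumptions: the config keys, the `feature_fingerprint` helper, and the metadata layout are all hypothetical.

```python
import hashlib
import json


def feature_fingerprint(feature_config):
    """Return a short, deterministic hash of a feature configuration."""
    # sort_keys makes the hash independent of dict insertion order.
    canonical = json.dumps(feature_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical feature configs; the keys are illustrative.
config_v1 = {"features": ["age", "country"], "scaler": "standard", "hash_buckets": 2**18}
config_v2 = {**config_v1, "hash_buckets": 2**20}

# Store the fingerprint next to the model so the two can be matched later.
model_metadata = {
    "model_name": "example-model",
    "feature_version": feature_fingerprint(config_v1),
}
```

If two models disagree, comparing their `feature_version` values is a quick first check before digging into the data. A config hash does not capture changes in the transformation code itself, so it works best alongside the code's commit hash.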

Do Not Hide Data Leakage

Leakage often looks like great performance.

Watch for:

Use time-based splits when the production task is time-based. Random splits can be misleading because they let information from the future leak into training.
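
A time-based split can be as simple as a cutoff date. The sketch below uses only the standard library; the `event_date` field, the toy rows, and the `time_based_split` helper are hypothetical.

```python
from datetime import date


def time_based_split(rows, cutoff, time_key="event_date"):
    """Split rows so that everything in train happened strictly before cutoff."""
    train = [row for row in rows if row[time_key] < cutoff]
    test = [row for row in rows if row[time_key] >= cutoff]
    return train, test


# Toy rows, deliberately out of order to show that ordering does not matter.
rows = [
    {"event_date": date(2024, 2, 10), "label": 1},
    {"event_date": date(2024, 1, 5), "label": 0},
    {"event_date": date(2024, 1, 20), "label": 1},
    {"event_date": date(2024, 2, 1), "label": 0},
]

train, test = time_based_split(rows, cutoff=date(2024, 2, 1))
```

Every training row is older than every test row, which mirrors what the model faces in production: it only ever predicts on data that arrives after it was trained.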

Closing

Small ML engineering habits compound. Move code out of notebooks, version preprocessing, treat missing values by cause, make feature transforms reusable, and assume production data will surprise you.

These tricks are not glamorous, but they are the difference between a model that works once and a system that keeps working.