More on Model Deployment

A practical overview of model deployment patterns, artifact promotion, online and batch serving, rollout strategies, rollback, and production monitoring.

2021.11.01 · 2 min read · by Zhenlin Wang

Introduction

Model deployment is the step where a trained model becomes part of a product or workflow. It is not only “put the model behind an API.” A deployed model needs an artifact, runtime, interface, rollout plan, rollback path, logs, and monitoring.

This post is a compact deployment checklist.

What Gets Deployed

Deploy more than weights.

A deployable model package should include:

If any of these are missing, the deployment is fragile.
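
One way to make a missing piece detectable before release is to describe the package in a manifest and check it for gaps. The sketch below is only an illustration: the field names (`weights_uri`, `preprocessing_version`, `runtime_image`, and so on) are assumptions picked for the example, not a required or complete list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ModelPackage:
    # All field names are hypothetical; adapt them to your own registry.
    model_version: str = ""
    weights_uri: str = ""
    preprocessing_version: str = ""
    input_schema: str = ""
    output_schema: str = ""
    runtime_image: str = ""


def missing_fields(package: ModelPackage) -> List[str]:
    """Return the names of manifest fields that were left empty."""
    return [f.name for f in fields(package) if not getattr(package, f.name)]


# Example: a package that only records a version and a weights location.
package = ModelPackage(
    model_version="2021-11-01.1",
    weights_uri="s3://example-bucket/model.pt",  # placeholder URI
)
print(missing_fields(package))
```

A check like this can run in CI so an incomplete package fails before it reaches a serving environment.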

Serving Patterns

Online Serving

Online serving is request-response inference. Use it when users or downstream services need predictions immediately.

Examples:

Watch latency, availability, error rate, cost per request, and fallback behavior.
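
Fallback behavior is easier to reason about when it is explicit in code. The following is a minimal sketch, assuming the model is a plain Python callable and that a fixed default value is an acceptable fallback; `predict_with_fallback` and its parameters are hypothetical names, not part of any serving framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for running model calls with a deadline.
_executor = ThreadPoolExecutor(max_workers=4)


def predict_with_fallback(model_fn, features, timeout_s, fallback):
    """Return (prediction, source). Falls back on timeout or model error."""
    future = _executor.submit(model_fn, features)
    try:
        return future.result(timeout=timeout_s), "model"
    except FutureTimeout:
        # The worker thread keeps running; only the caller stops waiting.
        return fallback, "fallback_timeout"
    except Exception:
        return fallback, "fallback_error"


def slow_model(features):
    time.sleep(0.2)  # simulate a model that misses the latency budget
    return 0.9


print(predict_with_fallback(slow_model, {"x": 1}, timeout_s=0.05, fallback=0.0))
```

Returning the source alongside the prediction lets the fallback rate be tracked as a metric rather than hidden inside the service.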

Batch Serving

Batch serving runs predictions over many records on a schedule.

Examples:

Batch serving needs monitoring too. A failed batch job can silently poison downstream dashboards or products.
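
A simple guard against silent failures is to validate batch output before publishing it. This sketch assumes scores arrive as a list of floats, with `None` or NaN marking a failed prediction; the function name and the 1% threshold are illustrative assumptions.

```python
import math
from typing import List, Optional


def validate_batch(
    scores: List[Optional[float]],
    expected_rows: int,
    max_null_rate: float = 0.01,  # assumed threshold; tune per job
) -> List[str]:
    """Return a list of problems; an empty list means the batch may be published."""
    problems = []
    if len(scores) != expected_rows:
        problems.append(f"row count {len(scores)} != expected {expected_rows}")
    nulls = sum(1 for s in scores if s is None or math.isnan(s))
    if scores and nulls / len(scores) > max_null_rate:
        problems.append(f"null rate {nulls / len(scores):.2%} exceeds {max_null_rate:.2%}")
    return problems


print(validate_batch([0.1, 0.7, float("nan")], expected_rows=3))
```

If the list is non-empty, the job can stop and keep the previous output in place instead of overwriting it.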

Streaming Serving

Streaming serving processes events as they arrive. It is useful when the system reacts to event streams but does not need a direct user response for each event.

Examples:

Rollout Strategies

Choose a rollout strategy based on risk.

Shadow and canary deployments are especially useful when offline metrics do not fully predict production behavior.
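
A canary split can be sketched as deterministic hashing of a request or user ID, so the same caller always sees the same variant. This is a minimal illustration; `route` and the percentage parameter are hypothetical names, and real traffic splitting is often done at the load balancer or service mesh instead.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Map an ID to 'canary' or 'stable' using a stable hash bucket in [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


# Roughly canary_percent of IDs should land on the canary.
ids = [f"user-{i}" for i in range(10_000)]
share = sum(route(i, 10) == "canary" for i in ids) / len(ids)
print(f"canary share: {share:.3f}")
```

Raising `canary_percent` step by step keeps earlier canary users on the canary, because their bucket does not change.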

Rollback

Rollback should be designed before launch.

You need:

Rollback fails when the new model changes schemas or downstream assumptions without compatibility planning.
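
One common shape for a rollback path is a version alias that remembers what it pointed to before. The class below is a toy in-memory sketch under that assumption; `ModelAlias` is a hypothetical name, and a real registry would persist this history and check schema compatibility before switching.

```python
class ModelAlias:
    """Tracks which model version is live and allows stepping back."""

    def __init__(self, initial_version: str):
        self._history = [initial_version]

    @property
    def current(self) -> str:
        return self._history[-1]

    def promote(self, version: str) -> None:
        self._history.append(version)

    def rollback(self) -> str:
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._history.pop()
        return self.current


alias = ModelAlias("v1")
alias.promote("v2")
print(alias.rollback())
```

The serving layer resolves the alias at load time, so rolling back is a pointer change rather than a rebuild.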

Monitoring

Monitor both service health and model health.

Service metrics:

Model metrics:

Monitoring should include model version. Otherwise you cannot connect a production issue to the artifact that caused it.
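
In practice this can be as simple as attaching the version to every prediction log record. The sketch below assumes JSON-formatted logs; the field names are illustrative, not a standard schema.

```python
import json
import time


def prediction_log_record(model_version, request_id, latency_ms, score):
    """Build one JSON log line that ties a prediction to its model version."""
    return json.dumps(
        {
            "ts": time.time(),
            "model_version": model_version,  # lets metrics be grouped per artifact
            "request_id": request_id,
            "latency_ms": latency_ms,
            "score": score,
        },
        sort_keys=True,
    )


print(prediction_log_record("v2", "req-123", 41.5, 0.87))
```

With the version in each record, latency and score distributions can be compared across versions during a rollout.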

Security and Privacy

Deployment often exposes data paths that training did not.

Check:

For LLM systems, also consider prompt injection, data exfiltration, tool-call permissions, and retrieval source trust.
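
Tool-call permissions, for example, can start from an explicit allowlist checked before any tool runs. This is a minimal sketch; the tool names and argument sets are made up for illustration, and a real system would also validate argument values and caller identity.

```python
# Hypothetical allowlist: tool name -> argument names the model may supply.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "get_order_status": {"order_id"},
}


def authorize_tool_call(name: str, arguments: dict) -> bool:
    """Allow a call only if the tool is listed and no unexpected arguments appear."""
    allowed_args = ALLOWED_TOOLS.get(name)
    if allowed_args is None:
        return False
    return set(arguments) <= allowed_args


print(authorize_tool_call("search_docs", {"query": "refund policy"}))
print(authorize_tool_call("delete_user", {"user_id": "42"}))
```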

Closing

Good deployment makes model behavior observable and reversible. A model that cannot be rolled back, monitored, or explained operationally is not production-ready, even if its validation score is strong.

For a broader post-training view, see MLOps Post-Training Considerations.