Clustering: DBSCAN

A practical guide to DBSCAN, density-based clustering, epsilon, minimum samples, noise points, and when density clustering is useful.

2020.02.06 · 1 min read · by Zhenlin Wang

Introduction

DBSCAN is a density-based clustering algorithm. It groups points that are packed closely together and marks isolated points as noise.

Unlike K-means, DBSCAN does not require choosing the number of clusters ahead of time.
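Here is a minimal sketch that makes the "no number of clusters" point concrete. It uses scikit-learn's `DBSCAN` class, which takes `eps` and `min_samples` rather than a cluster count and labels noise points `-1`. The toy data and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups of points plus one isolated point.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],   # group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # group B
    [9.0, 0.0],                                       # isolated point
])

# No cluster count is passed, only a neighborhood radius and a density threshold.
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(X)

# Noise points get the label -1, so they are excluded from the cluster count.
n_clusters = len(set(labels) - {-1})
print(labels)      # expected: group A -> 0, group B -> 1, isolated point -> -1
print(n_clusters)  # expected: 2
```

The number of clusters comes out of the data and the two density parameters. Changing `eps` or `min_samples` can change that number.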

Core Ideas

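DBSCAN uses two main parameters:

  - eps: the radius of the neighborhood searched around each point.
  - min_samples: the minimum number of points, including the point itself, that must lie within eps of a point for it to count as a core point.

Point types:

  - Core point: has at least min_samples points within distance eps.
  - Border point: is not a core point but lies within eps of a core point.
  - Noise point: is neither a core point nor a border point.

A cluster consists of core points connected through chains of neighbors within eps, plus the border points attached to them.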
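The parameters and point types can be illustrated with a small from-scratch implementation in Python with NumPy. This is an illustrative O(n²) version, not an optimized or reference implementation. It assumes Euclidean distance and counts the point itself toward `min_samples`. The function name `dbscan` and the toy data are hypothetical.

```python
import numpy as np

NOISE = -1
UNASSIGNED = -2

def dbscan(X, eps, min_samples):
    """Minimal DBSCAN sketch (hypothetical helper, Euclidean distance).

    Returns (labels, types): labels[i] is a cluster id or -1 for noise,
    types[i] is "core", "border", or "noise". min_samples counts the point itself.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # All pairwise distances: O(n^2) memory, acceptable for a small illustration.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eps-neighborhood of each point (each point is its own neighbor).
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, UNASSIGNED)
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNASSIGNED or not is_core[i]:
            continue  # already in a cluster, or cannot seed one
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster_id
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] != UNASSIGNED:
                continue
            labels[j] = cluster_id
            if is_core[j]:
                # Only core points pass the cluster on to their own neighbors.
                stack.extend(neighbors[j])
        cluster_id += 1

    # Anything never reached from a core point is noise.
    labels[labels == UNASSIGNED] = NOISE
    types = np.where(is_core, "core", np.where(labels == NOISE, "noise", "border"))
    return labels, types

X = [[0, 0], [0, 0.5], [0.5, 0], [0.5, 0.5], [1.2, 0.5], [5, 5]]
labels, types = dbscan(X, eps=0.75, min_samples=4)
print(labels)  # cluster 0 for the first five points, -1 for (5, 5)
print(types)   # four core points, one border point (1.2, 0.5), one noise point
```

A border point may lie within eps of core points from two different clusters. In that case this sketch assigns it to whichever cluster reaches it first, so border assignments can depend on point order. Standard DBSCAN shares this order dependence for border points.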

When DBSCAN Helps

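Use DBSCAN when:

  - the number of clusters is not known in advance
  - clusters may have irregular, non-convex shapes
  - the data contains noise or outliers that should not be forced into a cluster

Examples:

  - spatial data, such as grouping geographic coordinates into dense locations
  - density-shaped clusters, such as curved or elongated groups that K-means would split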
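The next sketch illustrates the irregular-shape case. It builds two concentric rings (hypothetical synthetic data) and compares scikit-learn's `DBSCAN` with `KMeans`, scoring each with the adjusted Rand index against the true ring membership. The values `eps=0.3` and `min_samples=5` were picked for this particular data and are not general defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

def ring(radius, n):
    # Evenly spaced points on a circle, with a little Gaussian jitter.
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    pts = np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])
    return pts + rng.normal(scale=0.05, size=pts.shape)

# Hypothetical data: an inner ring of radius 1 and an outer ring of radius 3.
X = np.vstack([ring(1.0, 200), ring(3.0, 400)])
true_ring = np.array([0] * 200 + [1] * 400)

db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 1.0 means perfect agreement with the true rings; values near 0 mean chance level.
print("DBSCAN ARI:", adjusted_rand_score(true_ring, db_labels))
print("K-means ARI:", adjusted_rand_score(true_ring, km_labels))
```

K-means assigns each point to its nearest centroid, so with two clusters it splits the plane along a straight line and cannot separate one ring from the other. DBSCAN instead follows the chains of dense points along each ring.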

Parameter Choice

The hardest part is choosing eps.

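Common approach (the k-distance plot):

  1. For each point, compute the distance to its k-th nearest neighbor, with k tied to min_samples.
  2. Sort those distances in ascending order and plot them.
  3. Look for an elbow in the curve, where the distances start rising sharply, and use the distance at the elbow as eps.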
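The k-distance steps can be sketched with NumPy as follows. Two choices here are assumptions rather than part of the original recipe. First, `k = min_samples - 1`, because `min_samples` is taken to include the point itself, as in scikit-learn. Second, an automatic elbow rule that picks the point lying farthest below the straight line joining the two ends of the sorted curve. Reading the elbow off the plot by eye is just as valid. The helper names are hypothetical.

```python
import numpy as np

def sorted_k_distances(X, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # After sorting each row, column 0 is the point's zero distance to itself,
    # so column k is the distance to its k-th nearest other point.
    kth = np.sort(dist, axis=1)[:, k]
    return np.sort(kth)

def elbow_value(curve):
    """Heuristic elbow: the value lying farthest below the chord joining the curve's ends."""
    y = np.asarray(curve, dtype=float)
    if y[-1] == y[0]:
        return y[0]  # flat curve: no elbow to find
    # Rescale both axes to [0, 1] so the result does not depend on units.
    xn = np.linspace(0.0, 1.0, len(y))
    yn = (y - y[0]) / (y[-1] - y[0])
    # The chord is yn = xn; for a curve that stays flat and then rises sharply,
    # the elbow is where the curve dips furthest below it.
    return y[np.argmax(xn - yn)]

# Hypothetical data: one dense blob plus scattered background points.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.2, size=(200, 2))
sparse = rng.uniform(low=-4.0, high=4.0, size=(20, 2))
X = np.vstack([dense, sparse])

min_samples = 4
# Assumption: min_samples includes the point itself, so a point is core exactly
# when its (min_samples - 1)-th nearest other point is within eps.
curve = sorted_k_distances(X, k=min_samples - 1)
eps = elbow_value(curve)
print(f"suggested eps: {eps:.3f}")
```

Treat the suggested value as a starting point, and check the resulting clusters and noise fraction before settling on it.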

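Feature scaling matters. DBSCAN applies a single eps to the distance computed across all features, so if one feature has a much larger numeric range, it dominates the distance and DBSCAN will cluster mostly by that feature. Standardizing the features first is a common remedy.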
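The following small example shows how scale distorts neighborhoods. In raw units, a point's nearest neighbor is decided almost entirely by the income column. After z-score standardization, both features contribute. The data and helper functions are hypothetical. Z-scoring is only one common scaling choice; scikit-learn's `StandardScaler` applies the same column-wise transformation.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled
    return (X - X.mean(axis=0)) / std

def nearest_neighbor(X, i):
    """Index of the point closest to X[i] in Euclidean distance, excluding X[i] itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return int(np.argmin(d))

# Hypothetical data: column 0 is age in years, column 1 is annual income in dollars.
X = np.array([
    [25, 40_000],
    [27, 41_000],
    [60, 40_500],
    [62, 41_500],
], dtype=float)

print(nearest_neighbor(X, 0))               # 2: the 60-year-old, because income dominates
print(nearest_neighbor(standardize(X), 0))  # 1: the 27-year-old once both features are comparable
```

Because eps-neighborhoods are built from these same distances, a change in nearest neighbors like this one carries directly into which points DBSCAN treats as dense together.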

Limitations

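DBSCAN struggles when:

  - clusters have very different densities, since a single eps cannot suit both dense and sparse clusters
  - the data is high-dimensional, since distances between points become similar and less informative
  - distances are not meaningful, for example when features are unscaled or the distance metric does not fit the data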

In high dimensions, consider dimensionality reduction or another method.
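High dimensions hurt DBSCAN partly because distances concentrate. As the dimension grows, the nearest and farthest neighbors of a point end up at similar distances, so no single eps separates "near" from "far" well. The sketch below measures this on uniform random data, a hypothetical setup chosen only for illustration, using the ratio (farthest - nearest) / nearest.

```python
import numpy as np

def relative_contrast(n_points, dim, rng):
    """(farthest - nearest) / nearest distance from one random query to uniform random points."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    # Smaller values mean the nearest and farthest points are harder to tell apart.
    print(dim, round(relative_contrast(500, dim, rng), 3))
```

Expect the ratio to shrink markedly as `dim` increases. The exact numbers depend on the random seed.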

Closing

DBSCAN is valuable because it finds dense regions and labels noise. It is a strong choice for spatial or density-shaped clustering problems, but it depends heavily on meaningful distances and good parameter choices.