Customer Segmentation with Machine Learning Using Three Clustering Approaches
Segmenting customers into actionable groups is fundamental to personalized marketing and retention strategies. This Google Colab notebook provides three progressive clustering methodologies (K-Means with RFM analysis, hierarchical clustering with dendrograms, and DBSCAN with multi-feature analysis and outlier detection) that automatically identify customer segments from behavioral patterns. Created for the MLforSEO Introduction to Clustering module, it helps marketing teams turn transactional data into strategic customer segments, with flexible data loading from UCI datasets, Google Analytics 4 exports, or custom CSV files containing purchase histories.
Section 1: K-Means Clustering with RFM Analysis (Beginner)
The foundational approach calculates three RFM metrics for each customer: Recency (days since last purchase), Frequency (number of unique orders), and Monetary (total revenue). It then standardizes them with StandardScaler (zero mean, unit variance) so that no metric dominates the distance calculation merely because of its scale, and uses K-Means clustering to identify segments.
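As a rough illustration of the RFM step, the sketch below computes the three metrics with pandas, standardizes them, and fits K-Means. It assumes UCI Online Retail-style column names (CustomerID, InvoiceNo, InvoiceDate, Quantity, UnitPrice), drops non-positive quantities as returns, and uses a snapshot date one day after the last transaction. The notebook's own column handling and reference date may differ, and the tiny transaction table is made up for demonstration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def compute_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """Per-customer Recency, Frequency, Monetary (assumes UCI-style columns)."""
    df = df.dropna(subset=["CustomerID"])
    # Assumption: returns/cancellations show up as non-positive quantities; drop them
    df = df[df["Quantity"] > 0].copy()
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    # Assumed reference date: one day after the latest transaction in the data
    snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )


# Tiny made-up transaction table for demonstration only
transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "InvoiceDate": pd.to_datetime(["2011-01-01", "2011-03-01", "2011-02-15",
                                   "2011-03-10", "2011-03-20", "2011-03-30"]),
    "Quantity": [2, 1, 10, 1, 3, 2],
    "UnitPrice": [5.0, 20.0, 2.5, 100.0, 10.0, 15.0],
})

rfm = compute_rfm(transactions)
# Standardize so each metric has zero mean and unit variance before K-Means
X = StandardScaler().fit_transform(rfm)
rfm["Cluster"] = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
print(rfm)
```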
The implementation includes elbow-method optimization that tests K values from 2 to 10 and evaluates each by inertia and silhouette score, along with outlier capping at the 99th percentile so a few extreme customers do not skew the clusters. Automated naming logic assigns intuitive labels (Champions, Loyal Customers, Big Spenders, Promising, At Risk, Need Attention, Lost/Hibernating) based on RFM thresholds. Visualizations include Recency vs. Monetary and Frequency vs. Monetary scatter plots, plus charts of cluster sizes and each cluster's revenue contribution.
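A minimal sketch of the outlier capping and the K sweep, assuming the RFM values sit in a pandas DataFrame. The helper names cap_outliers and evaluate_k are hypothetical, and the random RFM table only stands in for real data. Picking the K with the highest silhouette is one common rule; the elbow in the inertia column is usually judged visually alongside it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cap_outliers(rfm: pd.DataFrame, q: float = 0.99) -> pd.DataFrame:
    """Clip each column at its own q-th quantile (upper tail only)."""
    rfm = rfm.astype(float)
    return rfm.clip(upper=rfm.quantile(q), axis=1)


def evaluate_k(X: np.ndarray, k_values=range(2, 11), seed: int = 42) -> pd.DataFrame:
    """Fit K-Means for each K; record inertia (elbow plot) and silhouette score."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        rows.append({"k": k, "inertia": km.inertia_,
                     "silhouette": silhouette_score(X, km.labels_)})
    return pd.DataFrame(rows)


# Random stand-in for a real RFM table
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, 300),
    "Frequency": rng.poisson(3, 300) + 1,
    "Monetary": rng.lognormal(5, 1, 300),
})

capped = cap_outliers(rfm)
X = StandardScaler().fit_transform(capped)
scores = evaluate_k(X)
best_k = int(scores.loc[scores["silhouette"].idxmax(), "k"])
print(scores)
print("K with highest silhouette:", best_k)
```

In a notebook, plotting scores["inertia"] against scores["k"] gives the elbow curve.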
Section 2: Hierarchical Clustering (Intermediate)
The intermediate approach builds a hierarchy of customer clusters by applying the Ward linkage method to the scaled RFM features. It visualizes the clustering structure through a truncated dendrogram showing the last 30 merges with suggested cut heights, then applies AgglomerativeClustering to assign the final segment labels.
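The Ward linkage and truncated dendrogram can be sketched with SciPy and scikit-learn as below. The synthetic blobs stand in for scaled RFM features, and the largest-gap rule for picking a cut height is an assumption for illustration, not necessarily how the notebook suggests cut heights. Passing no_plot=True returns the dendrogram structure without drawing it; drop that argument in a notebook to get the plot.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# Stand-in for scaled RFM features: three well-separated groups of 50 customers
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 3)) for loc in (0.0, 5.0, 10.0)])

# Ward linkage: each row of Z is one merge; column 2 holds the merge height
Z = linkage(X, method="ward")

# Truncated view showing only the last 30 merges (no_plot=True skips drawing)
tree = dendrogram(Z, truncate_mode="lastp", p=30, no_plot=True)

# Hypothetical cut rule: cut in the middle of the largest gap between merge heights
heights = Z[:, 2]
i = int(np.argmax(np.diff(heights)))
cut_height = (heights[i] + heights[i + 1]) / 2
n_clusters = int((heights > cut_height).sum()) + 1

labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(X)
print(f"cut at {cut_height:.2f} -> {n_clusters} clusters")
```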
This method lets you explore how segments nest within one another and produces interpretable dendrograms in which the height of each merge reflects how dissimilar the joined clusters are (for Ward linkage, how much within-cluster variance the merge adds), so a tall vertical link marks a distinct segment. It also includes a side-by-side comparison with the K-Means results, showing silhouette scores and visual clustering patterns, and can optionally sample the data when there are more than 5,000 customers to keep computation tractable.
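A hedged sketch of the sampling and side-by-side comparison: when there are more rows than max_rows, a random subset is drawn before fitting both models and scoring each with silhouette_score. The function name compare_methods is hypothetical, and the demo lowers max_rows from the 5,000 mentioned above so it runs quickly. Sampling matters here because Ward clustering works from all pairwise distances, so memory grows quadratically with the number of customers.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score


def compare_methods(X: np.ndarray, k: int, max_rows: int = 5000, seed: int = 42):
    """Optionally subsample X, then score K-Means and Ward clustering on the same rows."""
    rng = np.random.default_rng(seed)
    if len(X) > max_rows:
        idx = rng.choice(len(X), size=max_rows, replace=False)
        X = X[idx]
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    hc_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores = {
        "kmeans": silhouette_score(X, km_labels),
        "hierarchical": silhouette_score(X, hc_labels),
    }
    return X, scores


# Synthetic stand-in for scaled RFM features (1,200 rows, three groups)
rng = np.random.default_rng(3)
X_full = np.vstack([rng.normal(loc, 0.5, size=(400, 3)) for loc in (0.0, 5.0, 10.0)])

sample, scores = compare_methods(X_full, k=3, max_rows=500)
print(len(sample), scores)
```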
Section 3: DBSCAN with Multi-Feature Analysis (Advanced)
The advanced approach engineers 8 additional behavioral features beyond RFM: NumOrders, ProductVariety (unique products purchased), AvgItemsPerOrder, TotalItems, AvgOrderValue, OrderValueStd (spending consistency), CustomerLifetime (days between first and last purchase), and OrderFrequency (orders per day active).
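One way to derive these eight features with pandas is sketched below. It assumes the same UCI-style columns plus StockCode, aggregates line items into orders before computing per-order statistics, fills OrderValueStd with 0 for single-order customers, and divides by a lifetime of at least one day in OrderFrequency to avoid division by zero. These definitions are assumptions wherever the description above is not specific, and the function name engineer_features is hypothetical.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Eight per-customer behavioral features (assumed definitions)."""
    df = df.dropna(subset=["CustomerID"]).copy()
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    # Collapse line items into one row per (customer, order)
    orders = df.groupby(["CustomerID", "InvoiceNo"]).agg(
        OrderValue=("Revenue", "sum"),
        OrderDate=("InvoiceDate", "min"),
    ).reset_index()
    per_order = orders.groupby("CustomerID").agg(
        NumOrders=("InvoiceNo", "nunique"),
        AvgOrderValue=("OrderValue", "mean"),
        OrderValueStd=("OrderValue", "std"),  # NaN when a customer has one order
        FirstPurchase=("OrderDate", "min"),
        LastPurchase=("OrderDate", "max"),
    )
    per_item = df.groupby("CustomerID").agg(
        ProductVariety=("StockCode", "nunique"),
        TotalItems=("Quantity", "sum"),
    )
    feats = per_order.join(per_item)
    feats["OrderValueStd"] = feats["OrderValueStd"].fillna(0.0)
    feats["AvgItemsPerOrder"] = feats["TotalItems"] / feats["NumOrders"]
    feats["CustomerLifetime"] = (feats["LastPurchase"] - feats["FirstPurchase"]).dt.days
    # Assumption: treat a lifetime of 0 days as 1 day to avoid dividing by zero
    feats["OrderFrequency"] = feats["NumOrders"] / feats["CustomerLifetime"].clip(lower=1)
    return feats.drop(columns=["FirstPurchase", "LastPurchase"])


# Made-up transactions: customer 1 has two orders, customer 2 has one
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "StockCode": ["X", "Y", "X", "Z"],
    "InvoiceDate": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-01-11", "2011-02-01"]),
    "Quantity": [2, 1, 8, 3],
    "UnitPrice": [5.0, 10.0, 5.0, 4.0],
})
feats = engineer_features(tx)
print(feats)
```

Note that under this definition a one-time buyer gets an OrderFrequency of 1.0, so the metric mainly distinguishes repeat customers from one another.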
DBSCAN, with configurable eps and min_samples parameters, identifies dense regions of customers and flags points in sparse regions as outliers (label -1). Because it does not assume roughly spherical clusters the way K-Means does, it can discover irregularly shaped segments. The section includes a dedicated outlier analysis that profiles anomalous customers, such as unusually high spenders or customers with unusual purchasing patterns, and warns you to retune the parameters when more than 50% of customers are labeled as outliers.
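The DBSCAN step and the outlier-percentage warning might look like the following sketch. The function run_dbscan, its default eps and min_samples, and the synthetic data (two dense groups plus three distant points) are illustrative assumptions; DBSCAN, fit_predict, and the -1 noise label are standard scikit-learn behavior.

```python
import warnings

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def run_dbscan(features, eps=0.5, min_samples=5, max_noise_frac=0.5):
    """Scale features, run DBSCAN, and warn when too many points are noise."""
    X = StandardScaler().fit_transform(features)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    noise_frac = float(np.mean(labels == -1))
    if noise_frac > max_noise_frac:
        warnings.warn(f"{noise_frac:.0%} of customers labeled as noise (-1); "
                      "consider increasing eps or lowering min_samples")
    # Count real clusters, excluding the -1 noise label
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return labels, n_clusters, noise_frac


# Synthetic data: two dense customer groups plus three isolated points
rng = np.random.default_rng(7)
dense = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(5, 0.2, (100, 2))])
outliers = np.array([[20.0, 20.0], [-15.0, 25.0], [30.0, -10.0]])
X = np.vstack([dense, outliers])

labels, n_clusters, noise_frac = run_dbscan(X, eps=0.3, min_samples=5)
print(n_clusters, f"{noise_frac:.1%} noise")
```

To profile the outliers, filter the original feature table with labels == -1 and call .describe() on the result.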
Use this for:
- Marketing campaign targeting by assigning personalized messaging and offers to Champions versus At Risk segments based on RFM profiles
- Customer retention strategy by identifying Lost/Hibernating segments requiring win-back campaigns versus Loyal Customers needing nurture
- Revenue optimization by calculating segment-level contribution percentages to prioritize high-value customer groups for VIP programs
- Churn prediction by monitoring customer migration patterns from active segments into At Risk or Lost categories over time
- Budget allocation by understanding segment sizes and lifetime values to distribute marketing spend proportionally across customer tiers
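Tracking migration between segments, as in the churn use case above, can be as simple as cross-tabulating each customer's segment in two periods. The two snapshots below are hypothetical; in practice each would come from rerunning the segmentation on a different time window.

```python
import pandas as pd

# Hypothetical segment labels for the same customers in two periods
q1 = pd.Series({101: "Champions", 102: "Loyal Customers",
                103: "At Risk", 104: "Champions"}, name="Q1")
q2 = pd.Series({101: "Champions", 102: "At Risk",
                103: "Lost/Hibernating", 104: "Loyal Customers"}, name="Q2")

# Keep only customers present in both periods
both = pd.concat([q1, q2], axis=1).dropna()

# Rows: segment in Q1; columns: segment in Q2; cells: customer counts
migration = pd.crosstab(both["Q1"], both["Q2"])

# Customers who changed segment and landed in a churn-risk group
at_risk = ["At Risk", "Lost/Hibernating"]
moved_to_risk = both[(both["Q1"] != both["Q2"]) & both["Q2"].isin(at_risk)]
print(migration)
print(moved_to_risk)
```

Passing normalize="index" to pd.crosstab turns the counts into transition rates for each starting segment.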
This is well suited to marketing analysts, CRM managers, and e-commerce strategists working with transactional customer data (1,000+ customers recommended). It is particularly valuable when personalizing email campaigns, designing loyalty programs with tier-based rewards, optimizing customer acquisition costs by modeling target segment profiles, or building predictive models that forecast segment transitions and lifetime value trajectories for strategic planning.
What’s Included
- Three progressive methodologies, from beginner-friendly K-Means (RFM only) through hierarchical clustering with dendrograms to advanced DBSCAN with 8+ engineered features and outlier detection
- Flexible data loading supporting the UCI Online Retail dataset (about 541K transaction rows), Google Analytics 4 exports, Screaming Frog data, WordPress media libraries, or custom CSV uploads
- Comprehensive optimization toolkit including elbow method for K selection, silhouette score evaluation, dendrogram visualization for hierarchical cut-point determination, and DBSCAN parameter tuning
- Export-ready outputs: customer_segments.csv with each customer's segment assignment, plus segment_summary.csv with aggregate statistics, revenue contributions, and customer distribution percentages for stakeholder reporting
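A sketch of how a segment summary with customer and revenue shares could be built and exported. The column names (Recency, Frequency, Monetary, Segment), the helper build_segment_summary, and the four-customer table are assumptions; the output file names match those listed above, but the notebook's exact summary columns may differ.

```python
import pandas as pd


def build_segment_summary(segments: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-customer RFM rows into per-segment statistics and shares."""
    summary = segments.groupby("Segment").agg(
        Customers=("Recency", "count"),
        AvgRecency=("Recency", "mean"),
        AvgFrequency=("Frequency", "mean"),
        AvgMonetary=("Monetary", "mean"),
        TotalRevenue=("Monetary", "sum"),
    )
    summary["CustomerPct"] = 100 * summary["Customers"] / summary["Customers"].sum()
    summary["RevenuePct"] = 100 * summary["TotalRevenue"] / summary["TotalRevenue"].sum()
    # Highest-revenue segments first, for stakeholder reporting
    return summary.sort_values("TotalRevenue", ascending=False)


# Made-up per-customer assignments
segments = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Recency": [5, 200, 10, 300],
    "Frequency": [12, 1, 8, 1],
    "Monetary": [600.0, 50.0, 300.0, 50.0],
    "Segment": ["Champions", "Lost/Hibernating", "Champions", "Lost/Hibernating"],
}).set_index("CustomerID")

summary = build_segment_summary(segments)
segments.to_csv("customer_segments.csv")
summary.to_csv("segment_summary.csv")
print(summary)
```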
Created by
Introduction to Machine Learning for SEOs
This resource is part of a comprehensive course and is available to academy members.