
Image Clustering by Color with Three Machine Learning Approaches for Visual Content Organization

Organizing large image collections by visual similarity supports e-commerce product categorization, content management, and visual search optimization. This Google Colab notebook demonstrates three progressively more advanced clustering techniques (color histograms, dominant color extraction, and pre-trained CNN features) that automatically group images by their color characteristics.

Created for the Introduction to Clustering module, this implementation enables SEO professionals and content managers to segment visual assets at scale, with each approach offering different trade-offs between simplicity, interpretability, and accuracy for datasets ranging from small collections to 1000+ images.

Section 1: Simple Color Histogram Clustering (Beginner)

The basic approach resizes every image to 128×128 pixels and extracts a joint RGB color histogram with 8 bins per channel (8 × 8 × 8 = 512 bins), producing one 512-dimensional feature vector per image. K-Means clustering then groups these vectors into visually similar categories.
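The histogram step can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes the 512 dimensions come from a joint 8 × 8 × 8 RGB histogram, and it uses synthetic 128×128 arrays in place of real images. The names `color_histogram` and `noisy_block` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def color_histogram(image, bins=8):
    """Joint RGB histogram, flattened to bins**3 values (512 for bins=8)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    hist = hist.ravel()
    # Normalize so images of any size are comparable.
    return hist / hist.sum()


# Assumption: synthetic 128x128 images stand in for a real, resized dataset.
rng = np.random.default_rng(0)


def noisy_block(base_rgb):
    """A mostly single-colored image with a little per-pixel noise."""
    img = np.array(base_rgb) + rng.integers(-20, 21, size=(128, 128, 3))
    return np.clip(img, 0, 255).astype(np.uint8)


images = [noisy_block((220, 30, 30)) for _ in range(5)]   # reddish
images += [noisy_block((30, 30, 220)) for _ in range(5)]  # bluish

features = np.stack([color_histogram(img) for img in images])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(features.shape, labels)
```

With real data, the synthetic arrays would be replaced by images loaded and resized to 128×128, and the number of clusters would be chosen from the data rather than fixed at 2.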

Section 2: Dominant Color Extraction (Intermediate)

The intermediate approach identifies the 3 to 5 most prominent colors in each image by running K-Means on its pixel values. The resulting feature vectors combine the RGB values of those colors with the proportion of the image each one covers, which makes the clustering more intuitive and easier for people to interpret. This method also includes visual palette generation showing dominant color distributions, StandardScaler normalization for improved clustering performance, and an average color palette visualization per cluster.
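A rough sketch of the dominant color idea is shown below. It is an illustration under stated assumptions rather than the notebook's implementation: the helper names (`dominant_colors`, `dominant_feature_vector`, `striped_image`) are hypothetical, three colors per image are used, and synthetic striped images stand in for real photos.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def dominant_colors(image, n_colors=3, seed=0):
    """Cluster the pixels; return (colors, proportions), largest share first."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors)
    proportions = counts / counts.sum()
    order = np.argsort(proportions)[::-1]
    return km.cluster_centers_[order], proportions[order]


def dominant_feature_vector(image, n_colors=3):
    """Concatenate RGB values and proportions: 4 * n_colors values per image."""
    colors, proportions = dominant_colors(image, n_colors)
    return np.concatenate([colors.ravel(), proportions])


def striped_image(colors, widths, height=40):
    """Synthetic image of vertical stripes (demo data only)."""
    stripes = [
        np.tile(np.array(c, dtype=np.uint8), (height, w, 1))
        for c, w in zip(colors, widths)
    ]
    return np.concatenate(stripes, axis=1)


warm = [(255, 0, 0), (255, 128, 0), (255, 255, 0)]
cool = [(0, 0, 255), (0, 255, 255), (0, 0, 128)]
images = [
    striped_image(warm, (50, 30, 20)),
    striped_image(warm, (60, 25, 15)),
    striped_image(cool, (50, 30, 20)),
    striped_image(cool, (60, 25, 15)),
]

features = np.stack([dominant_feature_vector(img) for img in images])
# Scale so 0-255 color values do not swamp the 0-1 proportions.
scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

Sorting colors by their share of the image keeps the feature columns in a consistent order across images, which is one simple way to make the vectors comparable.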

Section 3: Pre-trained CNN Features (Advanced)

The advanced approach uses a ResNet50 model pre-trained on ImageNet to extract 2048-dimensional semantic feature vectors, then applies PCA to reduce them to 50 components (retaining 75.34% of the variance) before K-Means clustering. This method processes images in batches of 32, supports GPU acceleration for large datasets, and captures high-level visual patterns beyond color alone, including textures, shapes, and object categories.
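The batching, PCA, and clustering steps can be sketched as below. Note the main assumption: to keep the example self-contained, random 2048-dimensional vectors stand in for real ResNet50 outputs, so no model is loaded and the retained variance will not match the 75.34% quoted above. `stand_in_features` and `batches` are hypothetical helpers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

FEATURE_DIM = 2048  # size of the pooled ResNet50 feature vector
BATCH_SIZE = 32

rng = np.random.default_rng(0)
# Assumption: four artificial "visual groups" so there is structure to find.
group_centers = rng.normal(scale=3.0, size=(4, FEATURE_DIM))


def batches(items, batch_size=BATCH_SIZE):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def stand_in_features(image_ids):
    """Placeholder for a CNN forward pass: returns fake 2048-d vectors."""
    ids = np.asarray(image_ids)
    noise = rng.normal(size=(len(ids), FEATURE_DIM))
    return group_centers[ids % 4] + noise


image_ids = list(range(200))
features = np.vstack([stand_in_features(b) for b in batches(image_ids)])

# Reduce 2048 dimensions to 50 before clustering.
pca = PCA(n_components=50, random_state=0)
reduced = pca.fit_transform(features)
retained = pca.explained_variance_ratio_.sum()
print(f"variance retained by 50 components: {retained:.2%}")

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(reduced)
```

In a real pipeline, `stand_in_features` would be replaced by a function that loads each batch of images, preprocesses them for ResNet50, and returns the pooled activations.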

Use this for:

  • Product image organization by grouping e-commerce visuals into color-based categories for improved site navigation and filtering
  • Fashion catalog segmentation to automatically create collections based on seasonal color palettes or style trends
  • Visual content audit to identify underrepresented color schemes in existing image libraries for better diversity
  • Automated tagging pipelines that assign color-based metadata to images for enhanced searchability
  • A/B testing image variants by clustering similar visuals to understand which color combinations perform best

This notebook suits SEO specialists managing visual content at scale, e-commerce managers organizing product catalogs (500-5000+ items), and content teams building image recommendation systems. It is particularly valuable when manual categorization becomes impractical, when implementing color-based filtering features for improved user experience, or when analyzing competitor visual strategies to identify color positioning opportunities in your market segment.

What’s Included

  • Three progressive approaches from beginner-friendly histograms through intermediate dominant color extraction to advanced deep learning features with ResNet50
  • Multiple dataset loading options supporting WordPress media libraries, Screaming Frog exports, Kaggle fashion datasets, or custom CSV files with image URLs
  • Comprehensive evaluation metrics including silhouette scores, elbow method optimization, and cluster distribution analysis for determining optimal grouping strategies
  • Visual debugging tools displaying sample images per cluster with average color palettes, making results immediately interpretable for non-technical stakeholders
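
The evaluation metrics listed above (silhouette scores, the elbow method, and cluster distribution analysis) could be computed along these lines. This is a sketch on synthetic 10-dimensional feature vectors with three well-separated groups, not the notebook's code. The candidate range of 2 to 6 clusters is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Assumption: synthetic features with three clear groups of 50 samples each.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[0, 0] = centers[1, 1] = centers[2, 2] = 20.0
features = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])

inertias, silhouettes, cluster_sizes = {}, {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = km.inertia_  # plotted against k for the elbow method
    silhouettes[k] = silhouette_score(features, km.labels_)
    cluster_sizes[k] = np.bincount(km.labels_).tolist()  # distribution check

best_k = max(silhouettes, key=silhouettes.get)
for k in sorted(silhouettes):
    print(k, round(inertias[k], 1), round(silhouettes[k], 3), cluster_sizes[k])
print("best k by silhouette:", best_k)
```

The inertia values are what an elbow plot would show. The silhouette score gives a single number per candidate k, and the cluster sizes reveal whether one cluster is absorbing most of the images.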
Academy Resource

Available in Academy

Introduction to Machine Learning for SEOs

This resource is available to academy members.
