Centroid-based Clustering - MLforSEO


Centroid-based Clustering

Organizes data into non-hierarchical clusters based on the arithmetic mean (centroid) of the points. Efficient but sensitive to initial conditions and outliers.

Centroid-based clustering organizes data into non-hierarchical clusters. Each cluster is represented by its centroid, which is the arithmetic mean of all data points assigned to that cluster.
K-Means is an example of this category. Centroid-based algorithms are considered efficient, but they are generally sensitive to initial conditions and to outliers in the data. In marketing, the approach is used for customer segmentation based on numeric data such as purchase frequency, and for clustering product images.
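
To make the idea concrete, here is a minimal sketch of a K-Means-style loop in plain Python. It is an illustration under stated assumptions, not a production implementation: the function names (mean_point, squared_distance, k_means), the seed parameter, and the tiny customer dataset (purchases per month, average order value) are all hypothetical and chosen only for this example.

```python
import random


def mean_point(points):
    """Arithmetic mean (centroid) of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Assign each point to its nearest centroid, then recompute centroids.

    The starting centroids are k points sampled at random, so a different
    seed can lead to a different result (sensitivity to initial conditions).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, labels


# Hypothetical customers: (purchases per month, average order value).
customers = [
    (1.0, 20.0), (2.0, 25.0), (1.5, 22.0),    # occasional buyers
    (9.0, 80.0), (10.0, 85.0), (11.0, 90.0),  # frequent buyers
]
centroids, labels = k_means(customers, k=2)
print(labels)

# Outlier sensitivity: one extreme value drags the mean a long way.
without_outlier = mean_point([(1.0,), (2.0,), (3.0,)])        # (2.0,)
with_outlier = mean_point([(1.0,), (2.0,), (3.0,), (100.0,)])  # (26.5,)
print(without_outlier, with_outlier)
```

Because each centroid is a plain average, a single extreme point shifts it noticeably, as the last two lines show. In practice a library implementation would normally be used instead of a hand-written loop, and features on different scales would usually be normalized first.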

Sources & References

Introduction to Machine Learning for SEOs Course

academy.mlforseo.com

Explore other Task Types terms

Binary Classification

Classification task with two possible outcomes (e.g., positive or negative sentiment).

Clustering (ML Task)

Grouping data points into clusters based on similarity; an unsupervised learning task.

Density-based Clustering

Groups data points based on density and proximity. Does not require pre-defining the number of…

Distribution-based Clustering

Assumes data is composed of probabilistic distributions (e.g., Gaussian Mixture Model).

Hard Clustering

A type of clustering where data points are assigned exclusively to a single cluster.

Hierarchical Clustering

A clustering approach where data points are recursively merged or split to create a tree-like…

Multi-Class Classification

Classification where data is assigned exclusively to one of three or more options (e.g., categorizing…

Multi-Label Classification

Classification where an input can belong to multiple categories simultaneously (e.g., tagging a blog post…

Soft/Fuzzy Clustering

A type of clustering where data points can belong to multiple topics/clusters with varying probabilities…