Your cart is currently empty!
HDBSCAN is a data-driven, density-based clustering algorithm that identifies dense regions or topics in embedding space. It is categorized as a density-based clustering method. In the architectural stack of BERTopic, HDBSCAN is used after dimensionality reduction via UMAP to identify dense clusters, where each dense cluster corresponds to a potential topic.
A significant application of HDBSCAN is in outlier detection. Algorithms like HDBSCAN are used to find anomalies or data points that do not belong to any cluster. This capability allows for the detection of unusual traffic spikes, spammy backlinks, or irregular customer behavior.
Beyond outlier detection, HDBSCAN is also used for grouping data points based on proximity in low-dimensional space. It can be employed in projects such as grouping competitors based on shared web performance metrics (like domain authority or organic traffic) or for classifying backlink quality.
