Class-based Term Frequency-Inverse Document Frequency; used by BERTopic for clearer topic representation and selection of important terms per cluster.
c-TF-IDF stands for Class-based Term Frequency-Inverse Document Frequency and is a key component of the BERTopic algorithm. It treats all documents in a cluster as one combined document, so term weights are computed per cluster rather than per individual document.
This weighting scheme is used to describe the clusters created by HDBSCAN. Unlike models such as Top2Vec, which rely mainly on semantic embeddings, BERTopic uses c-TF-IDF to highlight the most representative words for each topic. This yields clear topic words and easily interpretable topics, which gives BERTopic richer modeling capabilities.
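To make the weighting concrete, here is a minimal sketch in plain Python. It assumes the commonly documented form of the score, where a term's frequency within a class (normalized by the class length) is multiplied by log(1 + A / f), with A the average number of words per class and f the term's frequency across all classes. The function names `c_tf_idf` and `top_terms` and the toy clusters are hypothetical, chosen only for illustration; BERTopic's own implementation differs in detail.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_class):
    """Score terms per class. docs_by_class maps a label to a list of tokenized documents."""
    # Merge every document in a class into one bag of words.
    class_counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    # f: total frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            term: (tf / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
    return scores


def top_terms(scores, label, k=3):
    """Return the k highest-scoring terms for a class (ties broken alphabetically)."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy clusters (assumed data): two classes that share the filler word "the".
clusters = {
    "sports": [["match", "goal", "team"], ["team", "coach", "goal", "the"]],
    "finance": [["stock", "market", "the"], ["market", "bank", "stock", "the"]],
}
scores = c_tf_idf(clusters)
print(top_terms(scores, "sports", 2))
print(top_terms(scores, "finance", 2))
```

Terms that are frequent inside one cluster but rare elsewhere rise to the top, while a word such as "the" that appears in every cluster is pushed down by the inverse-frequency factor.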
