c-TF-IDF

Class-based Term Frequency-Inverse Document Frequency; used by BERTopic for clearer topic representation and selection of important terms per cluster.

c-TF-IDF stands for Class-based Term Frequency-Inverse Document Frequency. It is a key component of the BERTopic algorithm.
This weighting scheme is used to describe the clusters created by HDBSCAN rather than to change them: all documents in a cluster are treated as a single combined document, and each term is weighted by how frequent it is in that cluster relative to its frequency across all clusters. Unlike models such as Top2Vec, which select topic words mainly by their proximity in the semantic embedding space, BERTopic uses c-TF-IDF to highlight the most representative words for each topic. The result is a set of clear, easily interpretable topic words for each cluster.
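
To make the weighting concrete, here is a minimal sketch in plain Python. It assumes the formulation commonly given for BERTopic, W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the normalized frequency of term t in class c, f(t) is the frequency of t across all classes, and A is the average number of words per class. The function names `c_tf_idf` and `top_terms`, the whitespace tokenizer, and the toy clusters are illustrative assumptions, not BERTopic's actual implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_class):
    """Score terms per class. docs_by_class maps a label to a list of strings."""
    # Treat every class as one combined document (naive whitespace tokenizer).
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_by_class.items()
    }

    # f(t): frequency of each term across all classes.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)

    # A: average number of words per class.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for label, class_counts in counts.items():
        n_words = sum(class_counts.values())
        scores[label] = {
            # Normalized term frequency times the class-based IDF term.
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in class_counts.items()
        }
    return scores


def top_terms(scores, n=2):
    """Return the n highest-scoring terms per class (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Hypothetical clusters, standing in for the output of a clustering step.
clusters = {
    "pets": ["the cat chased the dog", "the dog barked at the cat"],
    "finance": ["the bank raised the interest rate", "the interest rate hurt the bank"],
}
scores = c_tf_idf(clusters)
print(top_terms(scores))
```

In this toy example, "the" is the most frequent word in both clusters, but because it appears in every cluster its weight is pushed below that of cluster-specific words such as "cat" or "bank".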
