A topic modeling algorithm often compared to LDA and BERTopic that requires text pre-processing and a number of topics defined beforehand.
NMF (Non-negative Matrix Factorization) is a topic modeling algorithm often compared to LDA and BERTopic. It is a classic matrix factorization technique commonly used for text clustering.
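For reference, the factorization behind the name is usually written as below. The symbols are notation assumed here, not taken from this entry: V is the non-negative document-term matrix (n documents, m terms), W holds document-topic weights, H holds topic-term weights, and k is the number of topics.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m},\quad
W \in \mathbb{R}_{\ge 0}^{n \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times m}

\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
```

The squared Frobenius norm is one common objective; other divergences are also used.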
Like LDA, NMF requires the user to specify the number of topics as a hyperparameter. It is suited to analyzing large or more structured text collections and can offer strong quantitative performance, although it often requires more advanced tuning than newer embedding-based models.
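To make the role of the topic-count hyperparameter concrete, here is a minimal pure-Python sketch of NMF using the classic multiplicative update rules. The toy corpus, the function names (`nmf`, `matmul`, `frobenius_error`), and the settings (`n_topics=2`, `n_iter=200`) are illustrative assumptions, not part of this entry. Real projects would normally use a library implementation and TF-IDF weighting rather than raw counts.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Random positive initialisation; n_topics must be chosen by the user.
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Toy, already pre-processed corpus (hypothetical data for illustration).
docs = [
    "cat dog pet animal",
    "dog pet vet animal",
    "stock market trade price",
    "market price stock invest",
]
vocab = sorted({term for doc in docs for term in doc.split()})
# Document-term count matrix V.
v = [[doc.split().count(term) for term in vocab] for doc in docs]

w, h = nmf(v, n_topics=2)

# Show the three highest-weighted terms for each topic.
for k, topic in enumerate(h):
    top = sorted(zip(topic, vocab), reverse=True)[:3]
    print(f"topic {k}:", [term for _, term in top])
print("reconstruction error:", round(frobenius_error(v, w, h), 3))
```

Changing `n_topics` changes the shapes of both factors, which is why the value has to be fixed before fitting and is usually tuned by comparing several candidates.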
