Accuracy Score
Evaluation & MetricsA metric used in systems that predict query responses; the decision to display a short answer is contingent on comparing this score against a predetermined threshold.
Active Learning
Learning ParadigmsA technique used during custom training to iteratively select the most informative instances for labeling, thus reducing the overall labeling effort for entity extraction.
AI Overview
Core Concepts (AI/ML)A SERP feature that shows AI-generated summaries for highly informational, low-intent queries, offering users quick answers.
Amazon Comprehend
APIsAn NLP API mentioned for various text analysis tasks including entity extraction, sentiment analysis, and keyword/key phrase extraction.
Apps Script
APIsAn integration option that allows for the incorporation of automation with Google Cloud's APIs, useful for regular monitoring of the Knowledge Graph.
Artificial Intelligence (AI)
Core Concepts (AI/ML)The overarching concept related to the design and study of intelligent systems. Early systems relied on symbolic logic and rule-based systems.
Augmented Search Queries
Core Concepts (AI/ML)Queries that expand or modify the original user query to improve search accuracy and relevance by including additional terms or entities related to the initial query.
Bag of Words
Core Concepts (AI/ML)A type of semantic representation of data, which can be extracted from page contents.
BERT (Bidirectional Encoder Representations from Transformers)
ML Models & AlgorithmsThe foundational language model used for transformer-based embeddings in BERTopic.
BERTopic
ML Models & AlgorithmsAn unsupervised, embedding-based topic modeling algorithm that uses BERT embeddings, UMAP for dimensionality reduction, and HDBSCAN for clustering. It generates interpretable topics, performs dynamic clustering, and suits large unstructured datasets. It excels at semantic coherence, needs minimal preprocessing, detects the number of topics automatically, and is effective for short text.
Bigram
Core Concepts (AI/ML)A sequence of two adjacent words.
Binary Classification
Task TypesClassification task with two possible outcomes (e.g., positive or negative sentiment).
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
ML Models & AlgorithmsA hierarchical clustering method efficient for large datasets and time series.
Bounce Rate
Evaluation & MetricsA GA4 user engagement metric used to monitor patterns in user interaction.
Boyer-Moore
ML Models & AlgorithmsAn exact string-matching algorithm and one of the best-known pattern-matching algorithms.
c-TF-IDF
ML Models & AlgorithmsClass-based Term Frequency-Inverse Document Frequency; used by BERTopic for clearer topic representation and selection of important terms per cluster.
Centroid-based Clustering
Task TypesOrganizes data into non-hierarchical clusters based on the arithmetic mean (centroid) of the points. Efficient but sensitive to initial conditions and outliers.
Generative AI models/APIs used for tasks like content transformation and comparison in entity extraction, but noted for potential unreliability in NLP tasks.
Clustering (ML Task)
Task TypesGrouping data points into clusters based on similarity; an unsupervised learning task.
Coherence Score
Evaluation & MetricsUsed for evaluating the quality of topics produced by algorithms like LDA; high coherence suggests good topic quality.
Confidence Score (Generative AI)
Evaluation & MetricsA measure of certainty provided by a generative AI model regarding its classification.
Content Moderation
NLP (Concepts & Pipeline)Automatically flags or categorizes potentially unsafe or sensitive text (e.g., explicit or hateful content), ensuring brand standards are met.
CPC (Cost Per Click)
Evaluation & MetricsA metric used in keyword analysis and visualizations.
CTR (Click-Through Rate)
Evaluation & MetricsA metric related to user interaction with search results.
DataForSEO API
APIsA set of APIs for keyword research and SERP analysis, including SERP API, Keywords Data API, Traffic Analytics API, Review API, Merchant API, and Labs API.
DBSCAN
ML Models & AlgorithmsDensity-Based Spatial Clustering of Applications with Noise; groups data points based on density. Useful for anomaly detection.
Decision Tree
ML Models & AlgorithmsAn early, simple model for classification or regression.
Deep Learning
Core Concepts (AI/ML)A part of machine learning; Generative AI models like ChatGPT and LLM-based chatbots fall within this category.
Deepseek R1
APIsA newer generative AI chatbot used in entity extraction comparisons.
Density-based Clustering
Task TypesGroups data points based on density and proximity. Does not require pre-defining the number of clusters and is good for finding arbitrarily shaped clusters and outliers.
Dimensionality Reduction
Core Concepts (AI/ML)A process that reduces data, such as high-dimensional vectors, for visualization while preserving semantic structure (e.g., using PCA).
Distance-based matching
ML Models & AlgorithmsFuzzy matching methods focusing on "edit distance" rather than exact spelling.
DistilBERT (Refined Query Semantic Class Classifier)
ML Models & AlgorithmsA fine-tuned DistilBERT model (a smaller, distilled variant of BERT) used for semantic class classification based on queries.
Distribution-based Clustering
Task TypesAssumes data is composed of probabilistic distributions (e.g., Gaussian Mixture Model).
Elasticsearch
APIsMentioned as a tool/API example for fuzzy matching and product name standardization.
Embedding
Core Concepts (AI/ML)A numerical representation capturing the meaning of a document or data. Also referred to as a semantic feature vector.
Emotion detection/analysis
NLP (Concepts & Pipeline)A specialized NLP task that detects emotions like sadness, joy, fear, disgust, and anger.
Emotion Scores
Evaluation & MetricsSpecific numerical scores returned by emotion analysis (e.g., sadness score, joy score, fear score, disgust score, anger score).
Encoder Model
ML Models & AlgorithmsA machine learning model used in Google's two-step process for building and maintaining the Knowledge Graph when answering questions.
Entity
Core Concepts (AI/ML)A representation of real-world objects (people, products, places, concepts) that hold value from an SEO perspective.
Entity Attribute (EAV Model)
Core Concepts (AI/ML)Defining properties or characteristics of an entity (e.g., location, niche) used in the EAV model for semantic keyword research.
Entity Attribute Variable (EAV Model)
Core Concepts (AI/ML)The concept encompassing entities, their attributes, and the specific values (variables) associated with those attributes.
Entity Extraction (NER)
NLP (Concepts & Pipeline)A core NLP technique aimed at extraction and classification of key information (named entities) within text data. It falls under supervised ML.
Entity Salience Score
Evaluation & MetricsA score assigned by the Google Natural Language API to each extracted entity, indicating its relative importance or prominence within the analyzed text.
Entity Sentiment Analysis
NLP (Concepts & Pipeline)Combines entity analysis and sentiment analysis to determine the sentiment (positive or negative) expressed about specific entities within the text.
Entity Variables (EAV)
Core Concepts (AI/ML)Specific values an entity attribute can take (e.g., London, Paris for the Location attribute).
Feature Extraction
Core Concepts (AI/ML)The process of converting entities into numerical representations based on term importance (e.g., using TF-IDF).
Fuzzy Matching / Fuzzy String Matching
ML Models & AlgorithmsA string similarity assessment approach, typically relying on character distance rather than semantics, used to identify similar but non-exact matches.
Python libraries/algorithms specifically used for fuzzy string matching.
Gaussian Mixture Models (GMM)
ML Models & AlgorithmsA distribution-based model that summarizes a multivariate probability density function with a mixture of Gaussian distributions. More flexible than k-means.
Gemini (Google)
APIsA generative AI model (LLM) used for tasks like content transformation and extraction of insights/summaries.
Gensim
APIsA library associated with topic modeling algorithms like LDA.
Offer easy access to real-time keyword suggestions across various Google platforms (Search, YouTube, Maps, Merchant).
Google Cloud
APIsThe system hosting various Google APIs, including the Natural Language API and Knowledge Graph API.
Google Cloud AutoML
APIsA tool used to fine-tune pre-trained models on specialized domains/data (e.g., specializing Google's classification for niche medical or legal fields).
A versatile NLP API provided by Google Cloud with modules for entity identification, sentiment analysis, entity sentiment, content moderation, text classification, and syntax analysis.
Allows programmatic access to and leveraging of the Knowledge Graph for applications like entity exploration and popularity measurement.
A Google Cloud API drawing from a library of language structure, grammar, sentiment, and real-world entities, used to extract and analyze entities and entity sentiment from queries and text.
Hard Clustering
Task TypesA type of clustering where data points are assigned exclusively to a single cluster.
HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)
ML Models & AlgorithmsA hierarchical version of DBSCAN; used by BERTopic to identify dense clusters.
Hierarchical Clustering
Task TypesA clustering approach where data points are recursively merged or split to create a tree-like structure (dendrogram).
A platform providing transformer models that can be used for topic modeling and zero-shot classification.
Hyperparameter Tuning
Core Concepts (AI/ML)The process of adjusting configuration settings (like the number of topics or iterations) of an ML algorithm to improve performance and quality of the model's output.
An NLP API with modules for entity identification, sentiment analysis, relations, key phrases, concepts extraction, emotion detection, and metadata extraction.
Information Gain (IG)
Core Concepts (AI/ML)A measure used to evaluate how much new, meaningful information a feature, document, or phrase provides beyond what is already known; it quantifies the reduction in uncertainty/entropy when additional data is introduced.
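The "reduction in uncertainty/entropy" part of this definition can be illustrated with the classical decision-tree form of information gain. This is a minimal sketch only: the function names, intent labels, and features are hypothetical, and it is not the scoring used by any search system.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical example: query intent labels and two binary features.
intents = ["informational", "informational", "transactional", "transactional"]
contains_buy = [False, False, True, True]    # separates the intents perfectly
contains_best = [True, False, True, False]   # tells us nothing about intent

gain_buy = information_gain(intents, contains_buy)
gain_best = information_gain(intents, contains_best)
```

A feature that splits the labels cleanly removes all uncertainty (a gain of one bit here), while a feature unrelated to the labels yields no gain.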
Jaccard Similarity
ML Models & AlgorithmsA fuzzy matching algorithm that measures similarity as the ratio of shared n-grams or characters to the total distinct n-grams or characters across both strings (intersection over union).
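A minimal sketch of Jaccard similarity over character bigrams, assuming lowercase comparison; the helper names `char_ngrams` and `jaccard` are illustrative and do not come from any particular library.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b, n=2):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    set_a, set_b = char_ngrams(a, n), char_ngrams(b, n)
    if not set_a and not set_b:
        return 1.0  # two strings with no n-grams are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

score = jaccard("night", "nacht")  # the two strings share only the bigram "ht"
```

The score is 1.0 for identical n-gram sets and 0.0 when the strings share no n-grams at all.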
K-Means
ML Models & AlgorithmsThe most widely used centroid clustering algorithm. Efficient and scales well for large datasets.
K-Means Clustering
ML Models & AlgorithmsAn unsupervised, vector-based learning algorithm used for clustering entities based on semantic similarity; generally scalable and computationally efficient for moderately large datasets with predefined cluster counts.
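A minimal pure-Python sketch of the k-means loop (assign each point to its nearest centroid, then recompute centroids as arithmetic means). The evenly spaced initialization is a simplification chosen here for reproducibility; production libraries usually use randomized schemes such as k-means++. The 2D points are made-up stand-ins for embedding vectors.

```python
def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (assignments, centroids)."""
    # Simplified, deterministic initialization (an assumption of this sketch):
    # take k evenly spaced points from the input as starting centroids.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
```

Note that `k` must be chosen up front, which is the "predefined cluster counts" constraint mentioned in the definition.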
KeyBERT
ML Models & AlgorithmsUsed to extract the most important, semantically relevant n-grams (such as unigrams and bigrams) from a keyword.
KeyBERT
ML Models & AlgorithmsAn algorithm that groups keywords based on semantic similarity, often providing unigram, bigram, or trigram clusters.
Keyword Difficulty (KD)
Evaluation & MetricsA metric associated with keywords, used in prioritization and visualization.
Knowledge Graph
Core Concepts (AI/ML)A structured database of facts about people, places, and things that Google and other systems use to understand entities and their relationships.
Knowledge Graph (KG)
Core Concepts (AI/ML)A network of interconnected entities (nodes) and relationships (edges) representing real-world data, often structured databases, used to enhance search results.
LDA (Latent Dirichlet Allocation)
ML Models & AlgorithmsA Bayesian, generative probabilistic model for soft/fuzzy topic modeling. Documents can belong to multiple topics, but the number of topics typically has to be defined manually in advance.
Levenshtein Distance
ML Models & AlgorithmsA distance-based algorithm measuring the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.
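A minimal sketch of the standard dynamic-programming computation of Levenshtein distance, keeping only two rows of the table in memory. The function name is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")
```

Here "kitten" becomes "sitting" with two substitutions and one insertion, so the distance is 3.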
Lexical or morphological analysis
NLP (Concepts & Pipeline)Phase 1 of the five phases of NLP (compiler design).
LinkBERT
ML Models & AlgorithmsAn ML model used for identifying internal link opportunities based on the semantic similarity of content, complementing entity-based approaches.
Logistic Regression
ML Models & AlgorithmsAn early, simple classification model.
Looker Studio
APIsA visualization tool used for displaying dashboards, often connecting to data outputs from ML analysis.
LSA (Latent Semantic Analysis)
ML Models & AlgorithmsA decomposition-based algorithm for topic modeling that relies on Singular Value Decomposition (SVD).
Machine Learning (ML)
Core Concepts (AI/ML)Technology used for tasks like generating information gain scores, predicting text, and enabling advanced keyword analysis.
Machine Learning (ML)
Core Concepts (AI/ML)A subset of AI that gained popularity around the 1990s and early 2000s, driven by big data analytics and increased computing power (like GPUs).
Macro-Context
Core Concepts (AI/ML)The broader categorization of a piece of content into general domains (e.g., medicine, sports), establishing an overarching understanding of the general topic.
Mean Shift
ML Models & AlgorithmsA density-based clustering algorithm that shifts data iteratively towards the highest density region.
Metaphone
ML Models & AlgorithmsA phonetic matching technique (phonetic string matching) that excels at handling misspellings and letter absences. It was designed around English pronunciation; variants such as Double Metaphone extend it to names of other origins.
Micro Intent
Core Concepts (AI/ML)Highly specific, often brand-subjective query intent classifications, capturing nuanced user needs beyond general intent categories.
Micro-Context
Core Concepts (AI/ML)A detailed level, focusing on specific terms or phrases relevant within a domain to pinpoint the exact content within the broader macro-context.
Multi-Class Classification
Task TypesClassification where data is assigned exclusively to one of three or more options (e.g., categorizing page type: blog post, FAQ page, landing page).
Multi-Label Classification
Task TypesClassification where an input can belong to multiple categories simultaneously (e.g., tagging a blog post with multiple topics like "analytics" and "SEO").
N-gram
Core Concepts (AI/ML)A contiguous sequence of n items from a sequence of text, used for analysis.
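A minimal sketch of word-level n-gram generation, assuming whitespace tokenization; the function name and the sample keyword are illustrative.

```python
def ngrams(text, n):
    """Return the contiguous word n-grams of a text as a list of tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

keyword = "best running shoes for beginners"
bigrams = ngrams(keyword, 2)   # 2-word sequences
trigrams = ngrams(keyword, 3)  # 3-word sequences
```

A text with fewer than n words simply produces an empty list.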
N-gram Matching
ML Models & AlgorithmsFuzzy matching methods based on overlapping substrings (n-grams); efficient for large datasets.
Natural Language Processing (NLP)
NLP (Concepts & Pipeline)A field dealing with processing text; includes tasks like Entity Extraction/NER.
Neural Network
ML Models & AlgorithmsA type of machine learning model used to calculate information gain scores based on the semantic vectors of documents.
An open-source NLP solution/library.
NMF (Non-negative Matrix Factorization)
ML Models & AlgorithmsA topic modeling algorithm often compared to LDA and BERTopic, requiring text pre-processing and defining the number of topics beforehand.
Ontology
Core Concepts (AI/ML)A formal framework that defines concepts, categories, and relationships within a specific domain; serves as a blueprint for organizing and interpreting data.
PCA (Principal Component Analysis)
ML Models & AlgorithmsA dimensionality reduction technique that projects high-dimensional data (like TF-IDF vectors) onto a small number of components, often two (2D) for visualization, while preserving as much of the data's variance as possible.
Phonetic matching
ML Models & AlgorithmsFuzzy matching methods focusing on pronunciation rather than exact spelling (e.g., Metaphone, Soundex).
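A minimal sketch of American Soundex, one of the phonetic methods named above. It encodes a name as its first letter plus three digits, so that similar-sounding names share a code. The function name is illustrative, and non-ASCII letters are simply treated like vowels, which is a simplification.

```python
# Consonant groups that sound alike share a digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so repeated codes across it count
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters

same_code = soundex("Robert") == soundex("Rupert")
```

Matching on the code rather than the spelling lets "Robert" and "Rupert" be treated as candidates for the same name.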
Places API
APIsUsed specifically for accessing Query Autocomplete and Place Autocomplete models on the Google Maps platform.
Predictive Text Models
ML Models & AlgorithmsMachine learning technologies responsible for predicting incomplete words as the user is typing (e.g., Autocomplete).
Programmer Model
ML Models & AlgorithmsA machine learning model used in Google's Q&A process via the Knowledge Graph, translating a natural language question into an executable program.
Query Augmentation
Core Concepts (AI/ML)The expansion and enrichment of keyword data with synonyms and related terms using augmentation techniques, improving content coverage.
A model from the Google Maps Platform used for getting real-time keyword suggestions, specifically for geographical search queries.
Query Context
Core Concepts (AI/ML)Considers surrounding factors (like location, device, or preceding queries) to interpret the user's intent more accurately.
Query Distance
Evaluation & MetricsA measure of how similar queries are to one another, often calculated using fuzzy matching.
Query Path
Core Concepts (AI/ML)The logical progression or chain of queries in a session, showing movement from broader to more specific topics (or vice versa).
Query Sequence
Core Concepts (AI/ML)Examines the order in which a user conducts multiple queries, revealing how they refine or expand their search over time.
A generative AI model used in entity extraction comparisons.
Reinforcement Learning
Learning ParadigmsAn ML category involving learning through trial and error to reach an objective.
Salience (Importance)
Evaluation & MetricsA metric indicating the importance or prominence of an entity in the context of the document.
Search Intent
Core Concepts (AI/ML)Determines the underlying motivation of a search query (e.g., informational, navigational, transactional, commercial investigation).
Search Volume (Volume)
Evaluation & MetricsA traditional keyword metric, referring to the monthly search volume, used in analysis and prioritization.
Semantic Analysis
NLP (Concepts & Pipeline)Phase 3 of NLP (compiler design) aimed at understanding the meaning in a statement. Includes entity analysis, sentiment analysis, and topic modeling.
Semantic Representation
Core Concepts (AI/ML)A form of data extracted from page contents, which could be an embedding, a bag of words, or a histogram.
Sentence-BERT (sBERT)
ML Models & AlgorithmsA supervised, embedding-based approach used for detailed classification and contextual keyword analysis, offering high semantic precision when mapping keywords to topics.
Sentiment Analysis
NLP (Concepts & Pipeline)Analyzes text to identify the dominant emotional opinion (positive, negative, or neutral).
Sentiment Magnitude
Evaluation & MetricsA metric returned by the Google Natural Language API alongside Sentiment Score, used in entity sentiment analysis.
Sentiment Magnitude
Evaluation & MetricsThe measure of the strength of the opinion or sentiment expressed.
Sentiment Score
Evaluation & MetricsA metric returned by the Google Natural Language API, used in entity sentiment analysis.
Sentiment Score (Polarity)
Evaluation & MetricsThe emotional polarity (positive/negative/neutral) expressed in the text, used to assess overall sentiment.
SerpApi
APIsA tool for scraping SERP data, offering a free tier of 100 searches per month.
Session Context
Core Concepts (AI/ML)Captures the broader context of a user's entire search session, including all queries made and pages visited.
Session Duration
Evaluation & MetricsA user engagement metric (part of GA4 metrics) used in semantic analysis.
Similarity Score
Evaluation & MetricsA score quantifying the likeness between two strings in fuzzy matching (e.g., derived from Levenshtein distance), typically ranging from 0 to 1.
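A minimal sketch of a 0-to-1 similarity score using Python's standard-library `difflib.SequenceMatcher`. Note the swap: its `ratio()` is based on matching blocks of characters rather than Levenshtein distance, so the numbers differ from an edit-distance score while covering the same range. The `similarity` helper and the product names are illustrative.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

exact = similarity("iPhone 15", "iphone 15")
close = similarity("iphone 15 pro", "iphone 15 pro max")
unrelated = similarity("abc", "xyz")
```

In practice, a threshold on this score (chosen per dataset) decides whether two strings are treated as the same item.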
Soft/Fuzzy Clustering
Task TypesA type of clustering where data points can belong to multiple topics/clusters with varying probabilities (e.g., LDA).
spaCy
APIsAn open-source NLP library used for custom training and deeply fine-tuning NLP models; used in Keyword Clustering.
Stop Words
Core Concepts (AI/ML)Words (like articles or prepositions) that are removed from text analysis to focus on more meaningful terms, often customized for specific content.
String Fuzzy Matching
ML Models & AlgorithmsA supervised, heuristic, string-based method suitable for quick, lightweight, and approximate matching tasks.
Supervised Learning
Learning ParadigmsAn ML approach used when labeled data is available. Entity extraction (NER) falls under this category.
Support Vector Machine (SVM)
ML Models & AlgorithmsAn early model often used for classification tasks.
Syntax analysis (parsing)
NLP (Concepts & Pipeline)Phase 2 of NLP (compiler design) that analyzes grammatical structure.
TF-IDF (Term Frequency-Inverse Document Frequency)
ML Models & AlgorithmsA widely used technique for text vectorization; it converts text data (entities) into numerical vectors, emphasizing the importance of unique terms in the text.
TF-IDF with Cosine Similarity
ML Models & AlgorithmsA vector-based approach that weighs rarer terms higher to calculate similarity; used in fuzzy matching for better context-sensitive results.
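A minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. The smoothed IDF formula used here is one common variant and is an assumption of this sketch, as are the function names and the sample product strings.

```python
from collections import Counter
from math import log, sqrt

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumed variant): rarer terms receive higher weights.
    idf = {term: log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = ["apple iphone 15 pro", "iphone 15 pro apple", "samsung galaxy s24"]
vecs = tfidf_vectors(docs)
reordered = cosine(vecs[0], vecs[1])  # same words, different order
different = cosine(vecs[0], vecs[2])  # no words in common
```

Because the vectors ignore word order, reordered product names score as near-identical, which plain edit distance would penalize.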
Tokenization
Core Concepts (AI/ML)The process of splitting text into tokens (words or phrases) during pre-processing.
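A minimal sketch of tokenization followed by stop word removal, assuming a simple regular-expression tokenizer. The stop word set is a short illustrative list, not a standard one, and would normally be customized for the content at hand.

```python
import re

# Illustrative stop word list (an assumption of this sketch).
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and"}

def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in stop_words]

tokens = tokenize("The best CRM for startups in 2024!")
content_tokens = remove_stop_words(tokens)
```

Punctuation is discarded by the tokenizer, and the remaining content tokens are what later steps such as TF-IDF or clustering operate on.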
Top2Vec
ML Models & AlgorithmsAn embedding-based topic model noted for scaling efficiently to very large datasets, sometimes preferred over BERTopic when speed on large data is crucial.
Topic Modeling
NLP (Concepts & Pipeline)An unsupervised task (clustering) for identifying themes/topics from large sets of unstructured text, often applied to long-form or short-form content.
Trends
Evaluation & MetricsData showing search trends and popularity, useful for identifying emerging keywords.
Trigram
Core Concepts (AI/ML)A sequence of three adjacent words.
Tuples
Core Concepts (AI/ML)The relationship between an entity (subject) and a fact about that entity (predicate/object pair), representing real-world facts within a data graph.
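A minimal sketch of how such tuples might be held in code, using a named tuple of subject, predicate, and object. The `Fact` type, the predicate names, and the helper function are hypothetical, chosen only to illustrate the structure.

```python
from collections import namedtuple

# One fact in a data graph: an entity (subject) plus a predicate/object pair.
Fact = namedtuple("Fact", ["subject", "predicate", "obj"])

facts = [
    Fact("Eiffel Tower", "located_in", "Paris"),
    Fact("Eiffel Tower", "opened_in", 1889),
    Fact("Paris", "capital_of", "France"),
]

def facts_about(entity, graph):
    """Collect every predicate/object pair stored for one entity."""
    return {f.predicate: f.obj for f in graph if f.subject == entity}

tower = facts_about("Eiffel Tower", facts)
```

Because "Paris" appears both as an object and as a subject, the tuples chain together into a graph of connected entities.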
UMAP
ML Models & AlgorithmsUsed in BERTopic for efficient dimensionality reduction of embeddings.
Unigram/Bigram/Trigram/N-gram
Core Concepts (AI/ML)Terms used to describe keyword clusters or patterns (1-word, 2-word, 3-word clusters/phrases) identified during analysis or search intent reverse-engineering.
Unsupervised Learning
Learning ParadigmsAn ML approach used when the model is not told what to look for (no labeled data); the goal is to uncover patterns and unveil data structures. Tasks include Clustering and Dimensionality Reduction.
User Search Behavior
Core Concepts (AI/ML)Analysis of patterns in user interaction (type, click, abandon queries) to understand engagement and interest levels.
Vertex AI
APIsGoogle Cloud's unified, fully managed machine learning platform that provides tools to build, train, and deploy AI models. It is also the platform on which the autocomplete APIs for Google Merchant, among others, operate.

