Glossary of AI/ML Terms for Marketers and SEOs

Accuracy Score

Evaluation & Metrics

A metric used in systems that predict query responses; the decision to display a short answer is contingent on comparing this score against a predetermined threshold.

Active Learning

Learning Paradigms

A technique used during custom training to iteratively select the most informative instances for labeling, thus reducing the overall labeling effort for entity extraction.

AI Overview

Core Concepts (AI/ML)

A SERP feature that shows AI-generated summaries, typically for highly informational, low-intent queries, to give users quick answers.

Amazon Comprehend

APIs

An NLP API mentioned for various text analysis tasks including entity extraction, sentiment analysis, and keyword/key phrase extraction.

Apps Script

APIs

Google's scripting platform, used here as an integration option for automating calls to Google Cloud's APIs; useful for regular monitoring of the Knowledge Graph.

Artificial Intelligence (AI)

Core Concepts (AI/ML)

The overarching concept related to the design and study of intelligent systems. Early systems relied on symbolic logic and rule-based systems.

Augmented Search Queries

Core Concepts (AI/ML)

Queries that expand or modify the original user query to improve search accuracy and relevance by including additional terms or entities related to the initial query.

Bag of Words

Core Concepts (AI/ML)

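A representation of text as an unordered collection of its words and their counts, ignoring grammar and word order; one type of semantic representation that can be extracted from page contents.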

BERT (Bidirectional Encoder Representations from Transformers)

ML Models & Algorithms

The foundational language model used for transformer-based embeddings in BERTopic.

BERTopic

ML Models & Algorithms

An unsupervised, embedding-based topic modeling algorithm that uses BERT embeddings, UMAP for dimensionality reduction, and HDBSCAN for clustering. It generates interpretable topics through dynamic clustering and suits large unstructured datasets. It excels at semantic coherence, needs minimal preprocessing, detects the number of topics automatically, and is effective for short text.

Bigram

Core Concepts (AI/ML)

A sequence of two adjacent words.

Binary Classification

Task Types

Classification task with two possible outcomes (e.g., positive or negative sentiment).

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

ML Models & Algorithms

A hierarchical clustering method efficient for large datasets and time series.

Bounce Rate

Evaluation & Metrics

A GA4 user engagement metric (the percentage of sessions that were not engaged), used to monitor patterns in user interaction.

Boyer-Moore

ML Models & Algorithms

An exact string-matching algorithm and one of the best-known pattern-matching algorithms.

c-TF-IDF

ML Models & Algorithms

Class-based Term Frequency-Inverse Document Frequency; used by BERTopic for clearer topic representation and selection of important terms per cluster.

Centroid-based Clustering

Task Types

Organizes data into non-hierarchical clusters based on the arithmetic mean (centroid) of the points. Efficient but sensitive to initial conditions and outliers.

ChatGPT / GPT-4 (OpenAI)

APIs

Generative AI models/APIs used for tasks like content transformation and comparison in entity extraction, but noted for potential unreliability in NLP tasks.

Clustering (ML Task)

Task Types

Grouping data points into clusters based on similarity; an unsupervised learning task.

Coherence Score

Evaluation & Metrics

Used for evaluating the quality of topics produced by algorithms like LDA; high coherence suggests good topic quality.

Confidence Score (Generative AI)

Evaluation & Metrics

A measure of certainty provided by a generative AI model regarding its classification.

Content Moderation

NLP (Concepts & Pipeline)

Automatically flags or categorizes potentially unsafe or sensitive text (e.g., explicit or hateful content), ensuring brand standards are met.

CPC (Cost Per Click)

Evaluation & Metrics

A metric used in keyword analysis and visualizations.

CTR (Click-Through Rate)

Evaluation & Metrics

A metric related to user interaction with search results.

DataForSEO API

APIs

A set of APIs for keyword research and SERP analysis, including SERP API, Keywords Data API, Traffic Analytics API, Review API, Merchant API, and Labs API.

DBSCAN

ML Models & Algorithms

Density-Based Spatial Clustering of Applications with Noise; groups data points based on density. Useful for anomaly detection.

Decision Tree

ML Models & Algorithms

An early, simple model for classification or regression.

Deep Learning

Core Concepts (AI/ML)

A part of machine learning; Generative AI models like ChatGPT and LLM-based chatbots fall within this category.

DeepSeek R1

APIs

A newer generative AI chatbot used in entity extraction comparisons.

Density-based Clustering

Task Types

Groups data points based on density and proximity. Does not require pre-defining the number of clusters and is good for finding arbitrarily shaped clusters and outliers.

Dimensionality Reduction

Core Concepts (AI/ML)

A process that reduces the number of dimensions in data, such as high-dimensional vectors, often for visualization, while preserving as much of the semantic structure as possible (e.g., using PCA).

Distance-based matching

ML Models & Algorithms

Fuzzy matching methods focusing on "edit distance" rather than exact spelling.

DistilBERT (Refined Query Semantic Class Classifier)

ML Models & Algorithms

A fine-tuned DistilBERT model (a smaller, distilled version of BERT) used for semantic class classification based on queries.

Distribution-based Clustering

Task Types

Assumes data is composed of probabilistic distributions (e.g., Gaussian Mixture Model).

Elasticsearch

APIs

Mentioned as a tool/API example for fuzzy matching and product name standardization.

Embedding

Core Concepts (AI/ML)

A numerical representation capturing the meaning of a document or data. Also referred to as a semantic feature vector.

Emotion detection/analysis

NLP (Concepts & Pipeline)

A specialized NLP task that detects emotions like sadness, joy, fear, disgust, and anger.

Emotion Scores

Evaluation & Metrics

Specific numerical scores returned by emotion analysis (e.g., sadness score, joy score, fear score, disgust score, anger score).

Encoder Model

ML Models & Algorithms

A machine learning model used in Google's two-step process for building and maintaining the Knowledge Graph when answering questions.

Entity

Core Concepts (AI/ML)

A representation of real-world objects (people, products, places, concepts) that hold value from an SEO perspective.

Entity Attribute (EAV Model)

Core Concepts (AI/ML)

Defining properties or characteristics of an entity (e.g., location, niche) used in the EAV model for semantic keyword research.

Entity Attribute Variable (EAV Model)

Core Concepts (AI/ML)

The concept encompassing entities, their attributes, and the specific values (variables) associated with those attributes.

Entity Extraction (NER)

NLP (Concepts & Pipeline)

A core NLP technique aimed at extraction and classification of key information (named entities) within text data. It falls under supervised ML.

Entity Salience Score

Evaluation & Metrics

A score assigned by the Google Natural Language API to each extracted entity, indicating its relative importance or prominence within the analyzed text.

Entity Sentiment Analysis

NLP (Concepts & Pipeline)

Combines entity analysis and sentiment analysis to determine the sentiment (positive or negative) expressed about specific entities within the text.

Entity Variables (EAV)

Core Concepts (AI/ML)

Specific values an entity attribute can take (e.g., London, Paris for the Location attribute).

Feature Extraction

Core Concepts (AI/ML)

The process of converting entities into numerical representations based on term importance (e.g., using TF-IDF).

Fuzzy Matching / Fuzzy String Matching

ML Models & Algorithms

A string similarity assessment approach, typically relying on character distance rather than semantics, used to identify similar but non-exact matches.

FuzzyWuzzy/RapidFuzz/PolyFuzz

APIs

Python libraries/algorithms specifically used for fuzzy string matching.

Gaussian Mixture Models (GMM)

ML Models & Algorithms

A distribution-based model that summarizes a multivariate probability density function with a mixture of Gaussian distributions. More flexible than k-means.

Gemini (Google)

APIs

A generative AI model (LLM) used for tasks like content transformation and extraction of insights/summaries.

Gensim

APIs

A library associated with topic modeling algorithms like LDA.

Google Autocomplete APIs

APIs

Offer easy access to real-time keyword suggestions across various Google platforms (Search, YouTube, Maps, Merchant).

Google Cloud

APIs

The system hosting various Google APIs, including the Natural Language API and Knowledge Graph API.

Google Cloud AutoML

APIs

A tool used to fine-tune pre-trained models on specialized domains/data (e.g., specializing Google's classification for niche medical or legal fields).

Google Cloud Natural Language API

APIs

A versatile NLP API provided by Google Cloud with modules for entity identification, sentiment analysis, entity sentiment, content moderation, text classification, and syntax analysis.

Google Knowledge Graph Search API

APIs

Provides programmatic access to the Knowledge Graph for applications like entity exploration and popularity measurement.

Google Natural Language API (Google NLP API)

APIs

A Google Cloud API drawing from a library of language structure, grammar, sentiment, and real-world entities, used to extract and analyze entities and entity sentiment from queries and text.

Hard Clustering

Task Types

A type of clustering where data points are assigned exclusively to a single cluster.

HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)

ML Models & Algorithms

A hierarchical version of DBSCAN; used by BERTopic to identify dense clusters.

Hierarchical Clustering

Task Types

A clustering approach where data points are recursively merged or split to create a tree-like structure (dendrogram).

Hugging Face Transformers

APIs

A library and model hub providing pre-trained transformer models that can be used for topic modeling and zero-shot classification.

Hyperparameter Tuning

Core Concepts (AI/ML)

The process of adjusting configuration settings (like the number of topics or iterations) of an ML algorithm to improve performance and quality of the model's output.

IBM Watson NLU (Natural Language Understanding)

APIs

An NLP API with modules for entity identification, sentiment analysis, relations, key phrases, concepts extraction, emotion detection, and metadata extraction.

Information Gain (IG)

Core Concepts (AI/ML)

A measure used to evaluate how much new, meaningful information a feature, document, or phrase provides beyond what is already known; it quantifies the reduction in uncertainty/entropy when additional data is introduced.
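
As a rough illustration of "reduction in uncertainty/entropy", the sketch below computes the classic decision-tree form of information gain: the entropy of a labeled set minus the weighted entropy of its subsets after a split. The function names and the toy labels are assumptions made for this example; this is not the scoring used by any particular search system.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent set minus the size-weighted entropy of its splits."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in splits)
    return entropy(parent) - weighted

# Hypothetical labels: two "guide" pages and two "product" pages.
pages = ["guide", "guide", "product", "product"]
perfect_split = [["guide", "guide"], ["product", "product"]]
useless_split = [["guide", "product"], ["guide", "product"]]

gain_perfect = information_gain(pages, perfect_split)
gain_useless = information_gain(pages, useless_split)
```

A split that separates the labels cleanly removes all uncertainty, while a split that leaves each subset as mixed as the original removes none.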

Jaccard Similarity

ML Models & Algorithms

A fuzzy matching measure of similarity based on overlapping n-grams or characters: the size of the intersection of the two sets divided by the size of their union.
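
A minimal sketch of Jaccard similarity over character bigrams, assuming lowercased input and a default n-gram size of 2 (both are choices made for this example, not requirements of the measure). The helper names are hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of overlapping character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard_similarity(a, b, n=2):
    """Size of the shared n-gram set divided by the size of the combined set."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to produce n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)

score = jaccard_similarity("night", "nacht")  # only the bigram "ht" is shared
```

Because the measure works on sets, repeated n-grams count once and word order has no effect beyond which n-grams appear.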

K-Means

ML Models & Algorithms

The most widely used centroid clustering algorithm. Efficient and scales well for large datasets.
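
The assign-then-update loop behind centroid clustering can be sketched in a few lines of plain Python. This is a simplified illustration that assumes the caller supplies the starting centroids and a fixed number of iterations; production libraries add smarter initialization and convergence checks. The function name and sample points are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared Euclidean distance to p.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # New centroid = arithmetic mean of each cluster; empty clusters keep their old centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
final_centroids, final_clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Changing the starting centroids can change the result, which is the sensitivity to initial conditions noted for centroid-based clustering.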

K-Means Clustering

ML Models & Algorithms

An unsupervised, vector-based learning algorithm used for clustering entities based on semantic similarity; generally scalable and computationally efficient for moderately large datasets with predefined cluster counts.

KeyBERT

ML Models & Algorithms

Used to extract the most important, semantically relevant n-grams (such as bigrams) from a keyword.

KeyBERT

ML Models & Algorithms

An algorithm that groups keywords based on semantic similarity, often providing unigram, bigram, or trigram clusters.

Keyword Difficulty (KD)

Evaluation & Metrics

A metric associated with keywords, used in prioritization and visualization.

Knowledge Graph

Core Concepts (AI/ML)

A structured database of facts about people, places, and things that Google and other systems use to understand entities and their relationships.

Knowledge Graph (KG)

Core Concepts (AI/ML)

A network of interconnected entities (nodes) and relationships (edges) representing real-world data, often structured databases, used to enhance search results.

LDA (Latent Dirichlet Allocation)

ML Models & Algorithms

A Bayesian, conditional probabilistic model for soft/fuzzy topic modeling. Documents can belong to multiple topics, but the number of topics typically has to be predefined manually.

Levenshtein Distance

ML Models & Algorithms

A distance-based algorithm measuring the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.
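
The edit-distance idea can be sketched with a standard dynamic-programming table, kept one row at a time. This is an illustrative implementation with a hypothetical function name; libraries such as RapidFuzz provide optimized versions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # prev[j] is the distance between the prefix of a processed so far and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i characters of a into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if they match)
            ))
        prev = curr
    return prev[-1]

distance = levenshtein("kitten", "sitting")
```

Here "kitten" becomes "sitting" with two substitutions and one insertion, so the distance is 3.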

Lexical or morphological analysis

NLP (Concepts & Pipeline)

Phase 1 of the five phases of NLP (compiler design), concerned with the structure of words.

LinkBERT

ML Models & Algorithms

An ML model used for identifying internal link opportunities based on the semantic similarity of content, complementing entity-based approaches.

Logistic Regression

ML Models & Algorithms

An early, simple classification model.

Looker Studio

APIs

A visualization tool used for displaying dashboards, often connecting to data outputs from ML analysis.

LSA (Latent Semantic Analysis)

ML Models & Algorithms

A decomposition-based algorithm for topic modeling that relies on Singular Value Decomposition (SVD).

Machine Learning (ML)

Core Concepts (AI/ML)

Technology used for tasks like generating information gain scores, predicting text, and enabling advanced keyword analysis.

Machine Learning (ML)

Core Concepts (AI/ML)

A subset of AI that gained popularity around the 1990s and early 2000s, driven by big data analytics and increased computing power (like GPUs).

Macro-Context

Core Concepts (AI/ML)

The broader categorization of a piece of content into general domains (e.g., medicine, sports), establishing an overarching understanding of the general topic.

Mean Shift

ML Models & Algorithms

A density-based clustering algorithm that shifts data iteratively towards the highest density region.

Metaphone

ML Models & Algorithms

A phonetic matching technique (phonetic string matching) that excels at handling misspellings and letter absences, especially in languages other than English.

Micro Intent

Core Concepts (AI/ML)

Highly specific, often brand-subjective query intent classifications, capturing nuanced user needs beyond general intent categories.

Micro-Context

Core Concepts (AI/ML)

A detailed level, focusing on specific terms or phrases relevant within a domain to pinpoint the exact content within the broader macro-context.

Multi-Class Classification

Task Types

Classification where data is assigned exclusively to one of three or more options (e.g., categorizing page type: blog post, FAQ page, landing page).

Multi-Label Classification

Task Types

Classification where an input can belong to multiple categories simultaneously (e.g., tagging a blog post with multiple topics like "analytics" and "SEO").

N-gram

Core Concepts (AI/ML)

A contiguous sequence of n items from a sequence of text, used for analysis.
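
A small sketch of word-level n-gram extraction, assuming whitespace tokenization and lowercasing (simplifications chosen for this example). The function name and sample keyword are hypothetical.

```python
def word_ngrams(text, n):
    """Return every run of n adjacent words in text, in order of appearance."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

keyword = "best running shoes for women"
bigrams = word_ngrams(keyword, 2)
trigrams = word_ngrams(keyword, 3)
```

With n set to 2 the function yields bigrams and with n set to 3 it yields trigrams; a text shorter than n words yields an empty list.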

N-gram Matching

ML Models & Algorithms

Fuzzy matching methods based on overlapping substrings (n-grams); efficient for large datasets.

Natural Language Processing (NLP)

NLP (Concepts & Pipeline)

A field dealing with processing text; includes tasks like Entity Extraction/NER.

Neural Network

ML Models & Algorithms

A type of machine learning model used to calculate information gain scores based on the semantic vectors of documents.

NLTK (Natural Language Toolkit)

APIs

An open-source NLP solution/library.

NMF (Non-negative Matrix Factorization)

ML Models & Algorithms

A topic modeling algorithm often compared to LDA and BERTopic, requiring text pre-processing and defining the number of topics beforehand.

Ontology

Core Concepts (AI/ML)

A formal framework that defines concepts, categories, and relationships within a specific domain; serves as a blueprint for organizing and interpreting data.

PCA (Principal Component Analysis)

ML Models & Algorithms

A dimensionality reduction technique that projects high-dimensional data (like TF-IDF vectors) onto fewer dimensions, commonly two (2D) for visualization, while retaining as much of the data's variance, and with it the semantic structure, as possible.

Phonetic matching

ML Models & Algorithms

Fuzzy matching methods focusing on pronunciation rather than exact spelling (e.g., Metaphone, Soundex).

Places API

APIs

Used specifically for accessing Query Autocomplete and Place Autocomplete models on the Google Maps platform.

Predictive Text Models

ML Models & Algorithms

Machine learning technologies responsible for predicting the rest of a word or query while the user is typing (e.g., Autocomplete).

Programmer Model

ML Models & Algorithms

A machine learning model used in Google's Q&A process via the Knowledge Graph, translating a natural language question into an executable program.

Query Augmentation

Core Concepts (AI/ML)

The expansion and enrichment of keyword data with synonyms and related terms using augmentation techniques, improving content coverage.

Query Autocomplete (via Places API)

APIs

A model from the Google Maps Platform used for getting real-time keyword suggestions, specifically for geographical search queries.

Query Context

Core Concepts (AI/ML)

Considers surrounding factors (like location, device, or preceding queries) to interpret the user's intent more accurately.

Query Distance

Evaluation & Metrics

A measure of how similar queries are to one another, often calculated using fuzzy matching.

Query Path

Core Concepts (AI/ML)

The logical progression or chain of queries in a session, showing movement from broader to more specific topics (or vice versa).

Query Sequence

Core Concepts (AI/ML)

Examines the order in which a user conducts multiple queries, revealing how they refine or expand their search over time.

Qwen 2.5 Plus (Alibaba)

APIs

A generative AI model used in entity extraction comparisons.

Reinforcement Learning

Learning Paradigms

An ML category involving learning through trial and error to reach an objective.

Salience (Importance)

Evaluation & Metrics

A metric indicating the importance or prominence of an entity in the context of the document.

Search Intent

Core Concepts (AI/ML)

Determines the underlying motivation of a search query (e.g., informational, navigational, transactional, commercial investigation).

Search Volume (Volume)

Evaluation & Metrics

A traditional keyword metric, referring to the monthly search volume, used in analysis and prioritization.

Semantic Analysis

NLP (Concepts & Pipeline)

Phase 3 of NLP (compiler design) aimed at understanding the meaning in a statement. Includes entity analysis, sentiment analysis, and topic modeling.

Semantic Representation

Core Concepts (AI/ML)

A form of data extracted from page contents, which could be an embedding, a bag of words, or a histogram.

Sentence-BERT (sBERT)

ML Models & Algorithms

A supervised, embedding-based approach used for detailed classification and contextual keyword analysis, offering high semantic precision when mapping keywords to topics.

Sentiment Analysis

NLP (Concepts & Pipeline)

Analyzes text to identify the dominant emotional opinion (positive, negative, or neutral).

Sentiment Magnitude

Evaluation & Metrics

A metric returned by the Google Natural Language API alongside Sentiment Score, used in entity sentiment analysis.

Sentiment Magnitude

Evaluation & Metrics

The measure of the strength of the opinion or sentiment expressed.

Sentiment Score

Evaluation & Metrics

A metric returned by the Google Natural Language API, used in entity sentiment analysis.

Sentiment Score (Polarity)

Evaluation & Metrics

The emotional polarity (positive/negative/neutral) expressed in the text, used to assess overall sentiment.

SerpApi

APIs

A tool for scraping SERP data, with a free tier of 100 searches per month.

Session Context

Core Concepts (AI/ML)

Captures the broader context of a user's entire search session, including all queries made and pages visited.

Session Duration

Evaluation & Metrics

A user engagement metric (part of GA4 metrics) used in semantic analysis.

Similarity Score

Evaluation & Metrics

A score quantifying the likeness between two strings in fuzzy matching (e.g., one derived from Levenshtein distance), typically ranging from 0 to 1.

Soft/Fuzzy Clustering

Task Types

A type of clustering where data points can belong to multiple topics/clusters with varying probabilities (e.g., LDA).

spaCy

APIs

An open-source NLP library used for custom training and deeply fine-tuning NLP models; used in Keyword Clustering.

Stop Words

Core Concepts (AI/ML)

Words (like articles or prepositions) that are removed from text analysis to focus on more meaningful terms, often customized for specific content.

String Fuzzy Matching

ML Models & Algorithms

A supervised, heuristic, string-based method suitable for quick, lightweight, and approximate matching tasks.

Supervised Learning

Learning Paradigms

An ML approach used when labeled data is available. Entity extraction (NER) falls under this category.

Support Vector Machine (SVM)

ML Models & Algorithms

An early model often used for classification tasks.

Syntax analysis (parsing)

NLP (Concepts & Pipeline)

Phase 2 of NLP (compiler design) that analyzes grammatical structure.

TF-IDF (Term Frequency-Inverse Document Frequency)

ML Models & Algorithms

A widely used technique for text vectorization; it converts text data (entities) into numerical vectors, emphasizing the importance of unique terms in the text.

TF-IDF with Cosine Similarity

ML Models & Algorithms

A vector-based approach that weighs rarer terms higher to calculate similarity; used in fuzzy matching for better context-sensitive results.
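
The sketch below shows the general idea using only the standard library: build TF-IDF weights per document, then compare two documents by the cosine of the angle between their vectors. The smoothed IDF formula, the whitespace tokenization, and the sample product names are assumptions made for this example; libraries such as scikit-learn offer tuned implementations.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with rarer terms weighted higher."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency (assumed variant).
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = ["red running shoes", "blue running shoes", "seo audit checklist"]
vecs = tfidf_vectors(docs)
similar = cosine_similarity(vecs[0], vecs[1])
unrelated = cosine_similarity(vecs[0], vecs[2])
```

The two shoe names share most of their terms and score well above zero, while the unrelated phrase shares none and scores exactly zero.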

Tokenization

Core Concepts (AI/ML)

The process of splitting text into tokens (words or phrases) during pre-processing.
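
A minimal tokenization sketch with optional stop-word removal, assuming lowercase ASCII word and digit tokens (a simplification; NLP libraries such as spaCy or NLTK handle punctuation, contractions, and other languages more carefully). The stop-word list is illustrative and deliberately short.

```python
import re

# Illustrative stop-word list, not exhaustive; real projects often customize this.
STOP_WORDS = {"a", "an", "the", "for", "of", "in", "to", "and"}

def tokenize(text, stop_words=frozenset()):
    """Lowercase the text, split it into alphanumeric tokens, and drop any stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]

all_tokens = tokenize("The best shoes for running!")
content_tokens = tokenize("The best shoes for running!", STOP_WORDS)
```

Passing a stop-word set keeps only the more meaningful terms, which is usually what later steps such as TF-IDF or clustering work on.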

Top2Vec

ML Models & Algorithms

An embedding-based topic model noted for scaling efficiently to very large datasets, sometimes preferred over BERTopic when speed on large data is crucial.

Topic Modeling

NLP (Concepts & Pipeline)

An unsupervised task (clustering) for identifying themes/topics from large sets of unstructured text, often applied to long-form or short-form content.

Trends

Evaluation & Metrics

Data showing search trends and popularity, useful for identifying emerging keywords.

Trigram

Core Concepts (AI/ML)

A sequence of three adjacent words.

Tuples

Core Concepts (AI/ML)

The relationship between an entity (subject) and a fact about that entity (predicate/object pair), representing real-world facts within a data graph.

UMAP

ML Models & Algorithms

Used in BERTopic for efficient dimensionality reduction of embeddings.

Unigram/Bigram/Trigram/N-gram

Core Concepts (AI/ML)

Terms used to describe keyword clusters or patterns (1-word, 2-word, 3-word clusters/phrases) identified during analysis or search intent reverse-engineering.

Unsupervised Learning

Learning Paradigms

An ML approach used when the model is not told what to look for (no labeled data); the goal is to uncover patterns and unveil data structures. Tasks include Clustering and Dimensionality Reduction.

User Search Behavior

Core Concepts (AI/ML)

Analysis of patterns in user interaction (typing, clicking, abandoning queries) to understand engagement and interest levels.

Vertex AI

APIs

Google Cloud's unified and fully-managed machine learning platform that provides tools to build, train, and deploy AI models. It is also the platform through which APIs such as autocomplete for Google Merchant operate.