A metric used in systems that predict query responses; the decision to display a short answer is contingent on comparing this score against a predetermined threshold.
A technique used during custom training to iteratively select the most informative instances for labeling, thus reducing the overall labeling effort for entity extraction.
AI-generated summaries shown as a SERP feature for highly informational, low-intent queries, offering users quick answers.
An NLP API mentioned for various text analysis tasks including entity extraction, sentiment analysis, and keyword/key phrase extraction.
An integration option that allows for the incorporation of automation with Google Cloud's APIs, useful for regular monitoring of the Knowledge Graph.
The overarching concept related to the design and study of intelligent systems. Early systems relied on symbolic logic and rule-based systems.
Queries that expand or modify the original user query to improve search accuracy and relevance by including additional terms or entities related to the initial query.
A type of semantic representation of data, which can be extracted from page contents.
The foundational language model used for transformer-based embeddings in BERTopic.
An unsupervised machine learning approach for topic modeling that generates interpretable topics and performs dynamic clustering, suitable for large unstructured datasets. An embedding-based topic modeling algorithm that uses BERT embeddings, UMAP for dimensionality reduction, and HDBSCAN for clustering. Excels at semantic coherence, minimal preprocessing, and automatically detecting the number of topics. Effective for short text.
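For illustration, a minimal BERTopic sketch using the library's standard fit_transform workflow; the toy corpus is an assumption for demonstration, and real use typically needs hundreds of genuinely distinct documents:

```python
# Minimal BERTopic sketch: embed, reduce, cluster, and label topics in one call.
from bertopic import BERTopic

docs = [
    "how to improve page speed for seo",
    "page speed optimization techniques",
    "best running shoes for marathon training",
    "marathon training plan for beginners",
] * 25  # repeat the toy examples so UMAP/HDBSCAN have enough points to work with

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)  # BERT embeddings -> UMAP -> HDBSCAN
print(topic_model.get_topic_info())  # one row per topic, including outlier topic -1
```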
A sequence of two adjacent words.
Classification task with two possible outcomes (e.g., positive or negative sentiment).
A hierarchical clustering method efficient for large datasets and time series.
A GA4 user engagement metric used to monitor patterns in user interaction.
An exact string-matching algorithm and one of the best-known pattern recognition algorithms.
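As a sketch of the idea, here is the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table (an illustrative implementation, not the full algorithm with the good-suffix rule):

```python
# Boyer-Moore-Horspool: scan the pattern right-to-left and, on a mismatch,
# shift by how far the text character sits from the pattern's end.
def horspool_search(text: str, pattern: str) -> int:
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    # shift distance for each character that appears in pattern[:-1]
    shift = {ch: m - i - 1 for i, ch in enumerate(pattern[:-1])}
    i = m - 1
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1  # index where the match starts
        i += shift.get(text[i], m)  # unseen characters allow a full-length jump
    return -1

print(horspool_search("the quick brown fox", "brown"))  # 10
```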
Class-based Term Frequency-Inverse Document Frequency; used by BERTopic for clearer topic representation and selection of important terms per cluster.
Organizes data into non-hierarchical clusters based on the arithmetic mean (centroid) of the points. Efficient but sensitive to initial conditions and outliers.
Generative AI models/APIs used for tasks like content transformation and comparison in entity extraction, but noted for potential unreliability in NLP tasks.
Grouping data points into clusters based on similarity; an unsupervised learning task.
Used for evaluating the quality of topics produced by algorithms like LDA; high coherence suggests good topic quality.
A measure of certainty provided by a generative AI model regarding its classification.
Automatically flags or categorizes potentially unsafe or sensitive text (e.g., explicit or hateful content), ensuring brand standards are met.
A metric used in keyword analysis and visualizations.
A metric related to user interaction with search results.
A set of APIs for keyword research and SERP analysis, including SERP API, Keywords Data API, Traffic Analytics API, Review API, Merchant API, and Labs API.
Density-Based Spatial Clustering of Applications with Noise; groups data points based on density. Useful for anomaly detection.
An early, simple model for classification or regression.
A part of machine learning; generative AI models like ChatGPT and other LLM-based chatbots fall within this category.
A newer generative AI chatbot used in entity extraction comparisons.
Groups data points based on density and proximity. Does not require pre-defining the number of clusters and is good for finding arbitrarily shaped clusters and outliers.
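A minimal scikit-learn sketch of this behavior on toy 2D points (the eps and min_samples values are illustrative assumptions):

```python
# DBSCAN: density-based clustering that labels low-density points as noise (-1)
# without a preset cluster count.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # dense group A
              [8.0, 8.1], [8.2, 7.9], [7.9, 8.0],   # dense group B
              [4.5, 0.0]])                           # isolated outlier

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1 -1]; -1 marks the noise point
```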
A process that reduces data, such as high-dimensional vectors, for visualization while preserving semantic structure (e.g., using PCA).
Fuzzy matching methods focusing on "edit distance" rather than exact spelling.
A fine-tuned BERT model used for semantic class classification based on queries.
Assumes data is composed of probabilistic distributions (e.g., Gaussian Mixture Model).
Mentioned as a tool/API example for fuzzy matching and product name standardization.
A numerical representation capturing the meaning of a document or data. Also referred to as a semantic feature vector.
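For example, a short sketch using the sentence-transformers library (the model name is one common choice, not one prescribed by this glossary):

```python
# Turn texts into dense semantic feature vectors (embeddings).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["running shoes", "trainers for jogging", "tax advice"])
print(vectors.shape)  # (3, 384): one 384-dimensional vector per text
```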
A specialized NLP task that detects emotions like sadness, joy, fear, disgust, and anger.
Specific numerical scores returned by emotion analysis (e.g., sadness score, joy score, fear score, disgust score, anger score).
A machine learning model used in Google's two-step process for building and maintaining the Knowledge Graph when answering questions.
A representation of real-world objects (people, products, places, concepts) that hold value from an SEO perspective.
Defining properties or characteristics of an entity (e.g., location, niche) used in the EAV model for semantic keyword research.
The concept encompassing entities, their attributes, and the specific values (variables) associated with those attributes.
A core NLP technique aimed at extraction and classification of key information (named entities) within text data. It falls under supervised ML.
A score assigned by the Google Natural Language API to each extracted entity, indicating its relative importance or prominence within the analyzed text.
Combines entity analysis and sentiment analysis to determine the sentiment (positive or negative) expressed about specific entities within the text.
Specific values an entity attribute can take (e.g., London, Paris for the Location attribute).
The process of converting entities into numerical representations based on term importance (e.g., using TF-IDF).
A string similarity assessment approach, typically relying on character distance rather than semantics, used to identify similar but non-exact matches.
Python libraries/algorithms specifically used for fuzzy string matching.
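A brief sketch of the fuzzywuzzy scorers (the example strings are illustrative):

```python
# Levenshtein-based similarity ratios on a 0-100 scale.
from fuzzywuzzy import fuzz

print(fuzz.ratio("iphone 13 pro max", "iPhone13 ProMax"))            # full-string similarity
print(fuzz.partial_ratio("nike air max", "nike air max 90 shoes"))   # best matching substring
print(fuzz.token_sort_ratio("max air nike", "nike air max"))         # word-order-insensitive
```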
A distribution-based model that summarizes a multivariate probability density function with a mixture of Gaussian distributions. More flexible than k-means.
A generative AI model (LLM) used for tasks like content transformation and extraction of insights/summaries.
A library associated with topic modeling algorithms like LDA.
Offer easy access to real-time keyword suggestions across various Google platforms (Search, YouTube, Maps, Merchant).
The system hosting various Google APIs, including the Natural Language API and Knowledge Graph API.
A tool used to fine-tune pre-trained models on specialized domains/data (e.g., specializing Google's classification for niche medical or legal fields).
A versatile NLP API provided by Google Cloud with modules for entity identification, sentiment analysis, entity sentiment, content moderation, text classification, and syntax analysis.
Allows programmatic access to and leveraging of the Knowledge Graph for applications like entity exploration and popularity measurement.
A Google Cloud API drawing from a library of language structure, grammar, sentiment, and real-world entities, used to extract and analyze entities and entity sentiment from queries and text.
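A hedged sketch of entity analysis with this API (assumes the google-cloud-language package is installed and GCP credentials are configured):

```python
# Extract entities and their salience scores from a piece of text.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="Google opened a new office in London.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
response = client.analyze_entities(request={"document": document})
for entity in response.entities:
    # salience (0-1) reflects how central the entity is to the text
    print(entity.name, entity.type_.name, round(entity.salience, 3))
```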
A type of clustering where data points are assigned exclusively to a single cluster.
A hierarchical version of DBSCAN; used by BERTopic to identify dense clusters.
A clustering approach where data points are recursively merged or split to create a tree-like structure (dendrogram).
A platform providing transformer models that can be used for topic modeling and zero-shot classification.
The process of adjusting configuration settings (like the number of topics or iterations) of an ML algorithm to improve performance and quality of the model's output.
An NLP API with modules for entity identification, sentiment analysis, relations, key phrases, concepts extraction, emotion detection, and metadata extraction.
A measure used to evaluate how much new, meaningful information a feature, document, or phrase provides beyond what is already known; it quantifies the reduction in uncertainty/entropy when additional data is introduced.
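As a worked example, information gain can be computed as the reduction in entropy, H(Y) - H(Y|X); the probabilities below are made up for illustration:

```python
# Information gain = entropy before a feature is observed minus entropy after.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

h_prior = entropy([0.5, 0.5])  # 1.0 bit of uncertainty about the label Y
# Suppose feature X splits the data in half, and each half is 90/10 predictable:
h_posterior = 0.5 * entropy([0.9, 0.1]) + 0.5 * entropy([0.1, 0.9])
print(round(h_prior - h_posterior, 3))  # ~0.531 bits of uncertainty removed
```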
A distance-based algorithm measuring similarity based on overlapping n-grams or characters.
The most widely used centroid clustering algorithm. Efficient and scales well for large datasets.
An unsupervised, vector-based learning algorithm used for clustering entities based on semantic similarity; generally scalable and computationally efficient for moderately large datasets with predefined cluster counts.
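A minimal sketch of this pattern with scikit-learn, clustering TF-IDF keyword vectors into a predefined number of clusters (the keywords and k=2 are illustrative):

```python
# K-means over TF-IDF vectors: the cluster count k must be chosen up front.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

keywords = ["buy running shoes", "running shoes sale", "python tutorial",
            "learn python basics", "marathon shoes", "python for beginners"]
X = TfidfVectorizer().fit_transform(keywords)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(list(zip(keywords, labels)))  # shoe keywords vs. python keywords
```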
Used to extract the most important, semantically relevant n-grams and bigrams from a keyword.
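For instance, a short KeyBERT sketch extracting unigram and bigram keyphrases (the document text is illustrative):

```python
# Rank candidate phrases by embedding similarity to the whole document.
from keybert import KeyBERT

doc = "Semantic keyword research groups queries by meaning rather than exact wording."
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 2), top_n=5)
print(keywords)  # list of (phrase, similarity score) pairs
```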
An algorithm that groups keywords based on semantic similarity, often providing unigram, bigram, or trigram clusters.
A metric associated with keywords, used in prioritization and visualization.
A structured database of facts about people, places, and things that Google and other systems use to understand entities and their relationships.
A network of interconnected entities (nodes) and relationships (edges) representing real-world data, often structured databases, used to enhance search results.
A Bayesian, conditional probabilistic model for soft/fuzzy topic modeling. Documents can belong to multiple topics, but the number of topics typically must be predefined manually.
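A minimal gensim sketch (the tokenized documents are toy examples; num_topics must be set up front, as noted above):

```python
# LDA with gensim: build a dictionary, convert docs to bag-of-words, fit topics.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [["seo", "keyword", "ranking"], ["shoe", "running", "marathon"],
         ["keyword", "search", "seo"], ["marathon", "training", "shoe"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words per document
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)
print(lda.print_topics())  # each topic as a weighted mix of terms
```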
A distance-based algorithm measuring the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.
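A self-contained dynamic-programming sketch of this distance:

```python
# Levenshtein distance via the classic two-row DP table.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```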
Phase 1 of the five phases of NLP (compiler design); it breaks raw text into tokens.
An ML model used for identifying internal link opportunities based on the semantic similarity of content, complementing entity-based approaches.
An early, simple classification model.
A visualization tool used for displaying dashboards, often connecting to data outputs from ML analysis.
A decomposition-based clustering algorithm for topic modeling that relies on Singular Value Decomposition (SVD).
Technology used for tasks like generating information gain scores, predicting text, and enabling advanced keyword analysis.
A subset of AI that gained popularity around the 1990s and early 2000s, driven by big data analytics and increased computing power (like GPUs).
The broader categorization of a piece of content into general domains (e.g., medicine, sports), establishing an overarching understanding of the general topic.
A density-based clustering algorithm that shifts data iteratively towards the highest density region.
A phonetic string-matching technique that excels at handling misspellings and missing letters, especially in languages other than English.
Highly specific, often brand-subjective query intent classifications, capturing nuanced user needs beyond general intent categories.
The detailed level of context, focusing on specific terms or phrases within a domain to pinpoint the exact content inside the broader macro-context.
Classification where data is assigned exclusively to one of three or more options (e.g., categorizing page type: blog post, FAQ page, landing page).
Classification where an input can belong to multiple categories simultaneously (e.g., tagging a blog post with multiple topics like "analytics" and "SEO").
A contiguous sequence of n items from a sequence of text, used for analysis.
Fuzzy matching methods based on overlapping substrings (n-grams); efficient for large datasets.
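A small sketch of this idea, scoring Jaccard similarity over character trigrams (the example strings are illustrative):

```python
# N-gram fuzzy matching: compare sets of overlapping character trigrams.
def char_ngrams(s: str, n: int = 3) -> set:
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a: str, b: str, n: int = 3) -> float:
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

print(round(jaccard("iphone 13 pro", "iphone13 pro"), 2))  # high despite spacing
print(round(jaccard("iphone 13 pro", "galaxy s23"), 2))    # near zero
```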
A field dealing with processing text; includes tasks like Entity Extraction/NER.
A type of machine learning model used to calculate information gain scores based on the semantic vectors of documents.
An open-source NLP solution/library.
A topic modeling algorithm often compared to LDA and BERTopic, requiring text pre-processing and defining the number of topics beforehand.
A formal framework that defines concepts, categories, and relationships within a specific domain; serves as a blueprint for organizing and interpreting data.
A dimensionality reduction technique that reduces high-dimensional data (like TF-IDF vectors) to two dimensions (2D) for visualization while preserving semantic structure.
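For example, a scikit-learn sketch reducing TF-IDF vectors to 2D (the documents are toy examples):

```python
# PCA: project high-dimensional TF-IDF vectors down to 2D for plotting.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["seo audit checklist", "technical seo audit", "chocolate cake recipe"]
X = TfidfVectorizer().fit_transform(docs).toarray()  # PCA expects dense input
coords = PCA(n_components=2).fit_transform(X)
print(coords)  # one (x, y) point per document, ready for a scatter plot
```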
Fuzzy matching methods focusing on pronunciation rather than exact spelling (e.g., Metaphone, Soundex).
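A brief sketch using the third-party jellyfish library (an illustrative package choice; the glossary names the algorithms, not this implementation):

```python
# Phonetic matching: words that sound alike map to the same or similar codes.
import jellyfish

print(jellyfish.soundex("Robert"), jellyfish.soundex("Rupert"))    # both R163
print(jellyfish.metaphone("Smith"), jellyfish.metaphone("Smyth"))  # matching codes despite spelling
```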
Used specifically for accessing Query Autocomplete and Place Autocomplete models on the Google Maps platform.
Machine learning technologies responsible for predicting incomplete words as the user is typing (e.g., Autocomplete).
A machine learning model used in Google's Q&A process via the Knowledge Graph, translating a natural language question into an executable program.
The expansion and enrichment of keyword data with synonyms and related terms using augmentation techniques, improving content coverage.
A model from the Google Maps Platform used for getting real-time keyword suggestions, specifically for geographical search queries.
Considers surrounding factors (like location, device, or preceding queries) to interpret the user's intent more accurately.
A measure of how similar queries are to one another, often calculated using fuzzy matching.
The logical progression or chain of queries in a session, showing movement from broader to more specific topics (or vice versa).
Examines the order in which a user conducts multiple queries, revealing how they refine or expand their search over time.
A generative AI model used in entity extraction comparisons.
An ML category involving learning through trial and error to reach an objective.
A metric indicating the importance or prominence of an entity in the context of the document.
Determines the underlying motivation of a search query (e.g., informational, navigational, transactional, commercial investigation).
A traditional keyword metric, referring to the monthly search volume, used in analysis and prioritization.
Phase 3 of NLP (compiler design) aimed at understanding the meaning in a statement. Includes entity analysis, sentiment analysis, and topic modeling.
A form of data extracted from page contents, which could be an embedding, a bag of words, or a histogram.
A supervised, embedding-based approach used for detailed classification and contextual keyword analysis, offering high semantic precision when mapping keywords to topics.
Analyzes text to identify the dominant emotional opinion (positive, negative, or neutral).
A metric returned by the Google Natural Language API alongside Sentiment Score, used in entity sentiment analysis.
The measure of the strength of the opinion or sentiment expressed.
A metric returned by the Google Natural Language API, used in entity sentiment analysis.
The emotional polarity (positive/negative/neutral) expressed in the text, used to assess overall sentiment.
A tool for scraping SERP data, offering a free tier of 100 free searches per month.
Captures the broader context of a user's entire search session, including all queries made and pages visited.
A user engagement metric (part of GA4 metrics) used in semantic analysis.
A score quantifying the likeness between two strings in fuzzy matching (e.g., in Levenshtein distance), typically ranging from 0 to 1.
A type of clustering where data points can belong to multiple topics/clusters with varying probabilities (e.g., LDA).
An open-source NLP library used for custom training and deeply fine-tuning NLP models; used in Keyword Clustering.
Words (like articles or prepositions) that are removed from text analysis to focus on more meaningful terms, often customized for specific content.
A supervised, heuristic, string-based method suitable for quick, lightweight, and approximate matching tasks.
An ML approach used when labeled data is available. Entity extraction (NER) falls under this category.
An early model often used for classification tasks.
Phase 2 of NLP (compiler design) that analyzes grammatical structure.
A widely used technique for text vectorization; it converts text data (entities) into numerical vectors, emphasizing the importance of unique terms in the text.
A vector-based approach that weighs rarer terms higher to calculate similarity; used in fuzzy matching for better context-sensitive results.
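A minimal sketch of this approach, combining character n-gram TF-IDF with cosine similarity (the product names are illustrative):

```python
# TF-IDF fuzzy matching: rare n-grams weigh more, so shared distinctive
# fragments dominate the similarity score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

names = ["apple iphone 13 pro 128gb",
         "iphone 13 pro (128 gb) by apple",
         "samsung galaxy s23"]
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(names)
sim = cosine_similarity(X)
print(sim.round(2))  # rows 0 and 1 score high; row 2 stays low
```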
The process of splitting text into tokens (words or phrases) during pre-processing.
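A deliberately simple regex-based sketch; production pipelines typically use a library tokenizer such as NLTK or spaCy:

```python
# Minimal tokenizer: lowercase, then keep runs of letters/digits/apostrophes.
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Entity extraction falls under supervised ML."))
# ['entity', 'extraction', 'falls', 'under', 'supervised', 'ml']
```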
An embedding-based topic model noted for scaling efficiently to very large datasets, sometimes preferred over BERTopic when speed on large data is crucial.
An unsupervised task (clustering) for identifying themes/topics from large sets of unstructured text, often applied to long-form or short-form content.
Data showing search trends and popularity, useful for identifying emerging keywords.
A sequence of three adjacent words.
The relationship between an entity (subject) and a fact about that entity (predicate/object pair), representing real-world facts within a data graph.
Used in BERTopic for efficient dimensionality reduction of embeddings.
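A short sketch with the umap-learn package (the random matrix stands in for real sentence embeddings):

```python
# UMAP: shrink high-dimensional embeddings while preserving local structure.
import numpy as np
import umap

X = np.random.rand(100, 384)  # stand-in for 100 sentence embeddings
reducer = umap.UMAP(n_components=5, metric="cosine", random_state=42)
X_reduced = reducer.fit_transform(X)
print(X_reduced.shape)  # (100, 5): same rows, far fewer dimensions
```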
Terms used to describe keyword clusters or patterns (1-word, 2-word, 3-word clusters/phrases) identified during analysis or search intent reverse-engineering.
An ML approach used when the model is not told what to look for (no labeled data); the goal is to uncover patterns and unveil data structures. Tasks include Clustering and Dimensionality Reduction.
Analysis of patterns in user interaction (type, click, abandon queries) to understand engagement and interest levels.
Google Cloud's unified, fully managed machine learning platform that provides tools to build, train, and deploy AI models; it also hosts APIs such as autocomplete for Google Merchant.