Coding Script or Notebook
Google Colab
Academy (Access via Course)

Fuzzy Matching: Map Keywords to Seed Terms or Topics (Notebook)

Manually categorizing thousands of keywords into topics, or matching them to predefined seed terms, is time-consuming and subjective. This Google Colab notebook automates the process: it matches every keyword in your list to the most similar term in a seed keyword list (up to 100 seed terms) using FuzzyWuzzy’s partial_ratio algorithm for substring-based fuzzy matching. Created by Lazarina Stoy for MLforSEO, the workflow is designed for SEO professionals and content strategists who need to organize large keyword exports into predefined topic categories, product lines, or content buckets by finding the seed term each keyword most closely resembles at the character level. The partial_ratio approach suits hierarchical matching: it detects when one string is contained within another (for example, matching “running shoes” to the seed term “shoes”) and tolerates modifier variations. That makes it well suited to mapping long-tail keywords to broader topic clusters, or matching keyword variants to canonical seed terms, without requiring exact matches.
The notebook implements a two-file comparison workflow with minimal configuration. You upload two CSV files: a seed keywords file (seed.csv) containing your predefined topic terms, category labels, or canonical keywords (the list you want to match to), and a match keywords file (match.csv) containing the keywords you want to categorize (for example, exports from keyword research tools, PPC campaigns, or content audits). Both files require a single “Keywords” column. For each keyword in match.csv, the notebook compares that keyword against every seed keyword in seed.csv using FuzzyWuzzy’s partial_ratio scoring, which calculates similarity from the best-matching substring on a 0 to 100 scale, and assigns the seed keyword with the highest score as the best match. Results are written to a three-column CSV (fuzzy_matching_result.csv) containing the original match keyword, the seed keyword it was assigned to, and the match score, so you can filter results by a confidence threshold or review low-scoring matches manually. Unlike tools that compare keywords pairwise within a single list, this notebook maps keywords to a predefined taxonomy or seed list, which makes it a classification tool rather than a clustering tool.
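The matching step described above can be sketched roughly as follows. This is not the notebook's code: it uses only the Python standard library, and `partial_ratio_sketch` is a hypothetical stand-in that approximates a partial-ratio score with `difflib.SequenceMatcher` over sliding windows, so its scores may differ from FuzzyWuzzy's. The function names, the sample keywords, and the output column names are all assumptions made for illustration.

```python
import csv
import io
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Approximate partial ratio (0-100): best similarity between the shorter
    string and any same-length window of the longer string.
    Stand-in for illustration only, not FuzzyWuzzy's implementation."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


def best_seed_match(keyword, seeds):
    """Return (seed, score) for the highest-scoring seed; ties keep the first seed."""
    scored = [(seed, partial_ratio_sketch(keyword, seed)) for seed in seeds]
    return max(scored, key=lambda pair: pair[1])


def match_keywords(keywords, seeds):
    """Assign every keyword to its best seed and collect three-column rows."""
    rows = []
    for keyword in keywords:
        seed, score = best_seed_match(keyword, seeds)
        # Column names below are assumed; the notebook's headers may differ.
        rows.append({"Keyword": keyword, "Best Seed Match": seed, "Score": score})
    return rows


# Hypothetical inputs standing in for the "Keywords" columns of seed.csv and match.csv.
seeds = ["shoes", "jackets", "backpacks"]
to_match = ["running shoes", "waterproof jackets for men", "hiking backpack"]
rows = match_keywords(to_match, seeds)

# Write the rows as CSV text (in memory here; a real run would write a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Keyword", "Best Seed Match", "Score"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

In this sketch, “running shoes” scores 100 against “shoes” because the seed appears verbatim inside the keyword, while “hiking backpack” still lands on “backpacks” with a score below 100 because no window matches the plural exactly.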
Use this for:
‧ Topic categorization by automatically assigning hundreds or thousands of keywords from research exports to predefined topic buckets, content themes, or site sections based on your seed list
‧ Product mapping in e-commerce by matching long-tail search queries or user queries to canonical product category names for site navigation or internal search optimization
‧ Content tagging automation by matching blog post keywords or meta keywords to a master taxonomy of content topics for CMS organization
‧ PPC campaign organization by assigning keyword exports from broad research to specific ad groups or campaign themes defined in your seed list
‧ Keyword consolidation by matching variations and long-tail terms to their canonical seed keywords for reporting or tracking purposes
‧ Quality control validation by reviewing match scores to identify keywords that don’t fit well into your existing seed term taxonomy (low scores indicate gaps in your category structure)
‧ Hierarchical keyword mapping by using broader seed terms (“shoes,” “clothing,” “accessories”) to automatically categorize detailed product or search keywords into primary site categories
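As a rough illustration of the score-based review mentioned in the list, the sketch below splits result rows into confident assignments and rows to check by hand. The `triage` helper, the threshold of 80, and the sample rows (including their scores) are assumptions chosen for the example, not values taken from the notebook.

```python
from collections import Counter

# Hypothetical result rows: (match keyword, assigned seed, score).
# Scores are illustrative values, not real notebook output.
results = [
    ("running shoes", "shoes", 100),
    ("hiking backpack", "backpacks", 89),
    ("car insurance quotes", "shoes", 40),
]


def triage(rows, threshold=80):
    """Split rows into confident assignments and rows needing manual review."""
    accepted = [row for row in rows if row[2] >= threshold]
    review = [row for row in rows if row[2] < threshold]
    return accepted, review


accepted, review = triage(results)

# Count confident assignments per seed term, e.g. for reporting.
per_seed = Counter(seed for _, seed, _ in accepted)

print("Accepted:", accepted)
print("Needs review:", review)
print("Keywords per seed:", dict(per_seed))
```

Rows that fall below the threshold are candidates for new seed terms, since a low best score suggests that no existing category describes the keyword well. The right cutoff depends on your data and is worth tuning against a manually checked sample.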
This is a good fit for SEO specialists, content managers, and e-commerce teams managing large keyword sets (500+ terms) who need to categorize keywords against a predefined taxonomy, topic list, or product catalog. It is particularly valuable when sorting keyword research exports into existing content buckets, mapping user search queries to site sections for analytics or internal search optimization, or assigning keywords to specific campaigns or content themes based on substring similarity to seed terms, without manually reviewing every keyword or requiring exact matches.

What’s Included

  • Two-file matching workflow maps keywords from a match list to best-fit terms in a predefined seed list (up to 100 seed terms), enabling automatic categorization into existing taxonomies
  • FuzzyWuzzy’s partial_ratio algorithm detects substring matches and modifier variations, making it effective for matching long-tail keywords to broader seed terms
  • Three-column output CSV shows each match keyword, its best seed match assignment, and similarity score for confidence-based filtering and manual review of edge cases
Academy Resource

Available in Academy

Semantic ML-enabled Keyword Research

This resource is available to academy members.
