Coding Script or Notebook
Google Colab
Free (Access via Email)

Automated Redirect Mapping with Triple Fuzzy Matching for Site Migrations (Notebook)

Site migrations often require mapping hundreds or thousands of old URLs to their new equivalents. This Google Colab notebook automates that redirect mapping by comparing pre-migration and post-migration Screaming Frog crawls with three fuzzy matching libraries (PolyFuzz with TF-IDF, RapidFuzz, and FuzzyWuzzy) to find the best new destination for each old URL. Created by Lazarina Stoy for MLforSEO, the workflow goes beyond simple URL slug matching: it compares multiple content signals (page titles, H1 tags, H2 tags, and meta descriptions) across both datasets, so matches are based on how similar the page content is rather than on URL structure alone. The triple-algorithm approach also works as a confidence check: when all three methods agree on a match, you can implement the redirect with high confidence; when their results diverge, the URL needs manual review.
The notebook implements a multi-signal matching pipeline designed specifically for Screaming Frog exports. You upload two CSV files (the pre-migration and post-migration crawls), each containing six columns: Address (URL), Title 1, H1-1, H2-1, H2-2, and Meta Description 1. For each column, the notebook runs three independent fuzzy matching processes. PolyFuzz uses TF-IDF vectorization and cosine similarity (good for capturing document-level similarity between longer strings), RapidFuzz uses the Levenshtein ratio (fast, and good for character-level differences such as URL slug variations), and FuzzyWuzzy provides a third opinion using the token set ratio (which handles word order differences well). Each match receives a similarity score (0-100) and is classified into one of three categories: “Exact Match” (100% identical; likely already implemented, or a very confident redirect), “Partial Match” (50-99% similar; a strong redirect candidate requiring minimal review), or “No Suitable Match Found” (below 50%; requires manual investigation or content creation). The output is three separate CSV files, one per algorithm, each showing the pre-migration value, the post-migration match, the similarity score, the match type, and the column that generated the match, so results can be compared side by side and prioritized.
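The scoring and three-tier classification described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function names (`similarity`, `classify`, `best_match`) are hypothetical, and the scorer is a stand-in built on Python's standard-library `difflib` so the example runs without extra installs. In practice the `scorer` argument would be swapped for a library function that also returns 0-100, such as `rapidfuzz.fuzz.ratio` or `fuzzywuzzy.fuzz.token_set_ratio`.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, not one of the three libraries)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(score: float) -> str:
    """Map a 0-100 score to the three match types described in the text."""
    if score >= 100:
        return "Exact Match"
    if score >= 50:
        return "Partial Match"
    return "No Suitable Match Found"


def best_match(value: str, candidates: list, column: str, scorer=similarity) -> dict:
    """Find the highest-scoring post-migration candidate for one pre-migration value."""
    if not candidates:
        # Nothing to compare against, so report the lowest tier.
        return {"pre": value, "post": None, "score": 0.0,
                "match_type": classify(0.0), "column": column}
    scored = [(scorer(value, candidate), candidate) for candidate in candidates]
    score, post = max(scored, key=lambda pair: pair[0])
    return {"pre": value, "post": post, "score": round(score, 1),
            "match_type": classify(score), "column": column}


# Hypothetical page titles from the two crawls.
post_titles = ["Red Running Shoes", "Contact Us", "Trail Shoes for Women"]
row = best_match("Red Running Shoes", post_titles, column="Title 1")
```

Running the same loop once per column and once per scorer would produce one result table per algorithm, which mirrors the three CSV outputs described above.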
Use this for:

  • Site migration redirect planning by automatically identifying the best new URL for each old URL based on content similarity across multiple signals
  • Reducing manual redirect mapping time from days to hours by algorithmically matching hundreds or thousands of URLs instead of reviewing each one by hand
  • Confidence scoring for redirect decisions: matches where all three algorithms agree with high scores (80%+) can be implemented automatically
  • Identifying problem cases where algorithms disagree or all score low (<50%), flagging these URLs for manual review or new content creation
  • Multi-signal matching that considers URL, title, headings, and meta descriptions rather than just URL slug comparison, finding matches even when site structure completely changes
  • Quality control by comparing three different algorithms' results to catch edge cases or false positives that a single method might miss
  • Post-migration validation by running the same process on post-migration crawls to verify redirects are implemented correctly

This is a good fit for SEO professionals and migration specialists managing website migrations, rebrands, or CMS changes where URL structures change significantly. It is particularly valuable for large-scale migrations (500+ URLs), where manual redirect mapping is impractical and URL structures differ too much for simple pattern matching, but content similarity across multiple on-page elements can still reveal the correct redirect targets.
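The agreement check across the three algorithms could be expressed along these lines. This is a sketch under assumptions: the `triage` function, its return labels, and the input shape (one `(target, score)` pair per algorithm) are hypothetical and not taken from the notebook; only the 80 and 50 thresholds come from the description above.

```python
def triage(results: dict, auto_threshold: float = 80.0, low_threshold: float = 50.0) -> str:
    """Decide how to handle one old URL from three (target, score) results.

    `results` maps an algorithm name to the (best target, 0-100 score) it found.
    """
    targets = {target for target, _ in results.values()}
    scores = [score for _, score in results.values()]

    # All algorithms picked the same target and every score is high.
    if len(targets) == 1 and min(scores) >= auto_threshold:
        return "implement"
    # Every algorithm scored low: probably no equivalent page exists.
    if max(scores) < low_threshold:
        return "investigate"
    # Disagreement or middling scores: a person should look at it.
    return "review"


# Hypothetical results for a single pre-migration URL.
agreed = {
    "polyfuzz": ("/shop/running-shoes", 92.0),
    "rapidfuzz": ("/shop/running-shoes", 88.0),
    "fuzzywuzzy": ("/shop/running-shoes", 95.0),
}
decision = triage(agreed)
```

Keeping the thresholds as parameters makes it easy to tighten the automatic tier for migrations where a wrong redirect is costly.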

What’s Included

  • Triple-algorithm validation using PolyFuzz (TF-IDF/cosine similarity), RapidFuzz (Levenshtein ratio), and FuzzyWuzzy (token set ratio) for comprehensive match confidence
  • Multi-signal matching across six critical elements (URL, title, H1, two H2s, meta description) rather than URL-only comparison, finding matches even when URL structure completely changes
  • Three-tier match classification (Exact Match 100%, Partial Match 50-99%, No Suitable Match <50%) enables prioritized implementation: automating high-confidence redirects while flagging edge cases
  • Screaming Frog-ready workflow specifically designed for standard SEO crawl exports with automatic column detection and validation
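
As an illustration of what column validation might look like, the sketch below checks a CSV export for the six Screaming Frog columns named earlier. The helper name `missing_columns` is hypothetical, and the notebook's own detection logic may work differently; this version uses only the standard library and takes the file contents as a string.

```python
import csv
import io

# The six columns the description says each crawl export must contain.
REQUIRED_COLUMNS = ["Address", "Title 1", "H1-1", "H2-1", "H2-2", "Meta Description 1"]


def missing_columns(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Drop a leading byte-order mark and surrounding whitespace, if present.
    present = {name.lstrip("\ufeff").strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Hypothetical export that lacks the H2 and meta description columns.
sample = "Address,Title 1,H1-1\nhttps://example.com/,Home,Welcome\n"
problems = missing_columns(sample)
```

An empty list means the export has everything the matching step expects; anything else can be reported back before any matching runs.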
