Coding Script or Notebook
Google Colab
Academy (Access via Course)

Advanced Entity Analysis with Multiple ML Techniques for Semantic Keyword Research

Entity extraction is just the starting point. This comprehensive Google Colab notebook (Part II of Lazarina Stoy’s Semantic Keyword Research course on MLforSEO) demonstrates how to analyze extracted entities with multiple machine learning techniques to uncover semantic patterns, relationships, and clusters in keyword data. Rather than treating entities as isolated mentions, the workflow enriches entity data with search metrics, queries Google’s Knowledge Graph for additional context, applies K-Means clustering to group semantically related entities, and builds network graphs of entity relationships. The result turns raw entity lists into actionable semantic intelligence for content strategy and SEO.
The notebook provides production-ready code for five distinct analytical approaches:

- **Search metrics.** Utility functions merge entity data with keyword search volume metrics.
- **N-gram extraction.** N-gram patterns show how entities commonly appear in queries.
- **Knowledge Graph enrichment.** Google Knowledge Graph API integration, with both simplified and comprehensive extraction options, adds structured knowledge about entities (descriptions, types, Wikipedia links, images) to your analysis. This is useful for identifying which entities are well established and which are emerging.
- **Clustering.** K-Means clustering with TF-IDF vectorization automatically groups semantically similar entities. The example produces 20 clusters, such as “crowdfunding platforms,” “business loans,” and “invoice financing.” A PCA-based 2D visualization shows the cluster boundaries.
- **Relationship graphs.** Semantic relationship graphs use a configurable cosine similarity threshold (default 0.3) to identify which entities co-occur or share semantic context. They are visualized with NetworkX, with node sizes reflecting mention frequency. The example extracts 3,542 relationships showing how entities such as “crowdfunding,” “equity investment,” and “business loan” interconnect within the keyword corpus.
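The search-metrics step can be sketched with pandas. This sketch assumes two hypothetical input tables:

- one row per entity mention, with columns `keyword` and `entity`;
- one row per keyword, with columns `keyword` and `search_volume`.

The column names and the `entity_search_metrics` helper are illustrative assumptions, not the notebook's own code.

```python
import pandas as pd


def entity_search_metrics(entities: pd.DataFrame, keywords: pd.DataFrame) -> pd.DataFrame:
    """Aggregate total search volume and keyword count per entity.

    Hypothetical helper. Assumes `entities` has columns `keyword`, `entity`
    and `keywords` has columns `keyword`, `search_volume`.
    """
    # Attach each keyword's search volume to every entity found in that keyword
    merged = entities.merge(keywords, on="keyword", how="left")
    merged["search_volume"] = merged["search_volume"].fillna(0)
    # An entity mentioned twice in one keyword should count that keyword's volume once
    merged = merged.drop_duplicates(subset=["entity", "keyword"])
    summary = (
        merged.groupby("entity")
        .agg(
            keyword_count=("keyword", "nunique"),
            total_search_volume=("search_volume", "sum"),
        )
        .reset_index()
        # Highest-traffic entities first, for prioritization
        .sort_values("total_search_volume", ascending=False, ignore_index=True)
    )
    return summary
```

Keywords with no volume data contribute 0 rather than being dropped, so they still count towards `keyword_count`.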
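The n-gram step can be sketched with the standard library alone. The hypothetical `entity_ngrams` helper counts every n-word window in the keywords that contains the entity, which surfaces patterns such as “crowdfunding platform” and “crowdfunding sites.” The tokenization regex and the default `n=2` are assumptions, not the notebook's settings.

```python
import re
from collections import Counter


def entity_ngrams(keywords, entity, n=2, top_k=10):
    """Return the most common n-grams (n-word windows) that contain `entity`.

    Hypothetical helper: `keywords` is an iterable of query strings.
    `n` must be at least the number of words in `entity`.
    """
    entity_tokens = entity.lower().split()
    k = len(entity_tokens)
    counts = Counter()
    for kw in keywords:
        # Simple lowercase word tokenization (an assumption, not the notebook's)
        tokens = re.findall(r"[a-z0-9']+", kw.lower())
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Keep the window only if the entity's tokens appear contiguously in it
            if any(gram[j:j + k] == entity_tokens for j in range(n - k + 1)):
                counts[" ".join(gram)] += 1
    return counts.most_common(top_k)
```

Raising `n` to 3 captures longer modifiers, such as “best crowdfunding platform,” at the cost of sparser counts.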
Use this for:
‧ Discovering semantic entity clusters in your keyword research to identify content themes and topical groupings automatically
‧ Enriching entity data with structured information from Google’s Knowledge Graph (descriptions, entity types, Wikipedia links) for better context
‧ Building entity relationship networks that reveal which concepts are semantically connected in your keyword universe for topic modeling
‧ Calculating total search volume and keyword counts per entity to prioritize which entities drive the most search traffic
‧ Identifying n-gram patterns showing how entities typically appear in queries (e.g., “crowdfunding platform,” “crowdfunding sites,” “crowdfunding equity”)
‧ Creating data-driven content strategies based on entity clustering rather than manual keyword grouping for more sophisticated semantic SEO
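Knowledge Graph enrichment can be sketched with only the Python standard library. The sketch calls the public Knowledge Graph Search API endpoint (`entities:search`, which requires your own API key). It then flattens the documented response fields: `@id`, `name`, `@type`, `description`, `detailedDescription`, `image` and `resultScore`.

The helper names `fetch_kg` and `parse_kg`, and the choice of fields, are illustrative assumptions. This is closer to a simplified extraction than to the notebook's comprehensive option.

```python
import json
import urllib.parse
import urllib.request

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def fetch_kg(entity: str, api_key: str, limit: int = 1) -> dict:
    """Query the Knowledge Graph Search API for one entity name (network call)."""
    params = urllib.parse.urlencode({"query": entity, "key": api_key, "limit": limit})
    with urllib.request.urlopen(f"{KG_ENDPOINT}?{params}", timeout=10) as resp:
        return json.load(resp)


def parse_kg(response: dict) -> list:
    """Flatten the fields most useful for entity enrichment (hypothetical selection)."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        detailed = result.get("detailedDescription", {})
        rows.append({
            "kg_id": result.get("@id"),
            "name": result.get("name"),
            "types": result.get("@type", []),
            "description": result.get("description"),
            # detailedDescription.url usually points to a Wikipedia article
            "wikipedia_url": detailed.get("url"),
            "article": detailed.get("articleBody"),
            "image_url": result.get("image", {}).get("contentUrl"),
            "score": item.get("resultScore"),
        })
    return rows
```

A typical call is `parse_kg(fetch_kg("crowdfunding", API_KEY))`. An empty result is only a rough hint that an entity may be emerging or not yet well represented in the Knowledge Graph.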
This is perfect for advanced SEO strategists and content marketers working on semantic keyword research who need to move beyond simple keyword lists. It helps them understand the underlying entity relationships, topical clusters, and semantic structure within their keyword corpus. It is particularly valuable for large keyword sets (1,000+ keywords), where manual analysis is impractical.

What’s Included

  • Five integrated ML techniques for entity analysis: search metrics mapping, Knowledge Graph enrichment, K-Means clustering, n-gram extraction, and relationship network graphs
  • Complete clustering workflow with TF-IDF vectorization, K-Means algorithm, and PCA visualization showing 2D projections of semantic entity clusters with configurable cluster counts
  • Semantic relationship graph builder using cosine similarity and NetworkX that maps entity co-occurrence patterns, with node sizing based on mention frequency; the example run extracts 3,542 relationships
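The relationship graph can be sketched under one stated assumption: each entity is represented by the TF-IDF vector of all the keywords it appears in. Two entities are linked when their keyword contexts overlap enough, meaning their cosine similarity is at or above the threshold (0.3 by default, as in the description). The `build_entity_graph` name and its input format are hypothetical; the sketch uses scikit-learn and NetworkX.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_entity_graph(entity_keywords, threshold=0.3):
    """Build an entity graph from keyword-context similarity (hypothetical helper).

    `entity_keywords` maps each entity to the list of keywords it appears in.
    Each node stores `mentions` (its keyword count) for sizing when drawing.
    """
    entities = list(entity_keywords)
    # One "document" per entity: all of its keywords joined together
    docs = [" ".join(entity_keywords[e]) for e in entities]
    sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))
    G = nx.Graph()
    for e in entities:
        G.add_node(e, mentions=len(entity_keywords[e]))
    # Add an edge for every entity pair whose similarity reaches the threshold
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if sim[i, j] >= threshold:
                G.add_edge(entities[i], entities[j], weight=float(sim[i, j]))
    return G
```

To size nodes by mention frequency, pass `node_size=[100 * G.nodes[n]["mentions"] for n in G]` to `nx.draw`. The pairwise loop is quadratic in the number of entities, which is manageable for a few thousand nodes.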
Academy Resource

Available in Academy

Introduction to Machine Learning for SEOs

This resource is available to academy members.

Community support
Regular updates