How to do Text Classification with Google’s Natural Language API in Google Sheets (Apps Script)


Overwhelmed by mountains of text data? In digital marketing, and especially in organic search and SEO, we are often faced with an overwhelming amount of information. Imagine a machine learning classification algorithm that can automatically categorise documents (or other texts) in seconds!

With Google’s Natural Language API and Google Sheets, you can classify content into more than 1,300 categories in seconds. In the following guide, I will show you how, even without any coding experience, you can unlock the power of the text classification module of this API.

About the method: How does text classification work?

Text classification in machine learning deals with sorting text into predefined categories, or classes. The aim of the task is to automatically learn patterns in labeled examples (like emails marked spam or not spam) and then categorize new unseen text based on those patterns.

It is a supervised machine learning task, meaning that the model’s job is to predict the correct label for a given input after training. During training, the model is provided with a dataset of pre-labeled examples, which enables it to then classify new, unseen text into those categories.

The underlying algorithm is often either a traditional model, like Naive Bayes, decision trees, or support vector machines, or a deep learning neural network. Word embeddings are also often used to uncover semantic relationships between words.
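To make the training-and-prediction loop concrete, here is a toy Naive Bayes classifier in plain JavaScript. This is a hypothetical, illustrative sketch of supervised text classification, not the model the Natural Language API uses:

```javascript
// Toy multinomial Naive Bayes for text classification.
// Train on labeled examples, then predict the label of new text.
function trainNaiveBayes(examples) {
  // examples: [{ text, label }]
  const model = { labelCounts: {}, wordCounts: {}, vocab: new Set(), total: 0 };
  for (const { text, label } of examples) {
    model.labelCounts[label] = (model.labelCounts[label] || 0) + 1;
    model.total += 1;
    model.wordCounts[label] = model.wordCounts[label] || {};
    for (const w of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      model.wordCounts[label][w] = (model.wordCounts[label][w] || 0) + 1;
      model.vocab.add(w);
    }
  }
  return model;
}

function classify(model, text) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.labelCounts)) {
    // log prior + log likelihood, with Laplace (add-one) smoothing
    let score = Math.log(model.labelCounts[label] / model.total);
    const counts = model.wordCounts[label];
    const totalWords = Object.values(counts).reduce((a, b) => a + b, 0);
    for (const w of words) {
      score += Math.log(((counts[w] || 0) + 1) / (totalWords + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}
```

Trained on a handful of spam/not-spam examples, such a model will label new text by whichever class makes its words most probable; the API's pre-trained model applies the same idea at a vastly larger scale.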

Text classification models help organize information, filter content, and gain insights from large amounts of text data.

About the model: Google Cloud’s Natural Language API

Imagine a vast library filled with knowledge about language structure, grammar, sentiment, and real-world entities. That’s essentially what the Natural Language API draws from. It’s trained on massive amounts of text data, allowing it to:

  • Identify key syntax elements and structures: It recognises nouns, verbs, phrases, and relationships between words, just like we do when we read, and can create more complex parsing structures like syntax trees.
  • Understand context and entities: It goes beyond individual words, considering the surrounding text and even real-world knowledge to grasp the meaning of text via entity analysis.
  • Detect sentiment: It can sense emotions like joy, anger, or sadness expressed in the text, not only at a document level (the entire text) but also at the entity level (sentiment associated with a specific entity mentioned in the text).
  • Classify text: It is pre-trained to identify which of more than 1,300 categories the text you analyze aligns with.

How it works

Google Cloud’s Natural Language API’s Content Classification module analyses a document and returns a list of content categories that apply to the text found in the document. To classify the content in a document, call the classifyText method.

A complete list of the content categories returned by the classifyText method is available in Google Cloud’s documentation, depending on the model version: Version 1 content categories and Version 2 content categories.

Which version to choose

The V2 model is newer, performs better, and supports both V1 and V2 categories, making it more comprehensive; that is why I recommend using it.

The model requires you to provide a text for it to analyze and classify into one of the categories on its list. Each classification the model provides comes with an associated confidence score, which reflects how confident the model is that the text aligns with the indicated category.
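In practice, choosing the V2 model and reading the confidence score comes down to a small request body and a little response handling. The sketch below shows the shape of both in JavaScript; the field names follow my reading of the public classifyText REST reference, so verify them against the current documentation:

```javascript
// Build a classifyText request body that opts into the V2 category taxonomy
// (field names per the public Natural Language REST reference).
function buildClassifyRequest(text) {
  return {
    document: { type: "PLAIN_TEXT", content: text },
    classificationModelOptions: {
      v2Model: { contentCategoriesVersion: "V2" },
    },
  };
}

// Pick the highest-confidence category from a classifyText response.
function topCategory(response) {
  const cats = response.categories || [];
  return cats.reduce(
    (best, c) => (!best || c.confidence > best.confidence ? c : best),
    null
  );
}
```

The response is a list of `{ name, confidence }` pairs, so a helper like `topCategory` simply keeps the entry with the highest confidence.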

Additional resources

Check out the additional resources by Google Cloud to practice working with this API, and the text classification module specifically:

Step-by-step guide on using the Natural Language API Text Classification Module in Google Sheets via Apps Script

Prerequisites

Get your API key

Having selected your Google Cloud project, navigate to the APIs and Services menu > Credentials.


Then, click on the Create Credentials button from the navigation next to the page title, then select API Key from the drop-down menu.


This is the easiest to use, but least secure method of authentication – you might consider alternatives for more complex projects.

What is the difference between API, OAuth client ID and Service account authentication?

In short, API key authentication is like a public key for basic access (like a library card); OAuth client ID allows for more user-specific access requiring authorization (like a bank card with a PIN); while service account authentication is the most secure access for applications without users (like a company credit card).

Once you select API key, a pop-up will indicate that the API key is being created, after which it will appear on the screen for you to copy.


You can always navigate back to this section of your project, and reveal the API key at a later stage, using the Show Key button. If you ever need to edit or delete the API key, you can do so from the drop-down menu.


Extract and organise the text content you want to classify

The next step is to decide on and organise the content you want to classify into Google Sheets.

What content can you classify with the Natural Language API?

Broadly speaking, this API works best on documents (or otherwise long-form content). This can be first-party textual data such as documents, emails, articles, or social media posts, or it can be the content of web pages, provided as the source HTML. You can use the type field to indicate the document type; otherwise, it will be detected automatically.
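The type field mentioned above is just a property on the document object in the request. A minimal helper might look like this, with the two field values taken from the public REST reference:

```javascript
// Build the `document` part of a classifyText request.
// `type` is either "PLAIN_TEXT" or "HTML", depending on what you send.
function makeDocument(content, isHtml) {
  return { type: isHtml ? "HTML" : "PLAIN_TEXT", content: content };
}
```

So scraped page source would be sent as `makeDocument(html, true)`, while clean extracted copy would be sent as plain text.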

For a no-code content scraping approach, I recommend using Screaming Frog’s custom extraction function. The approach works in three simple steps:

  • Finding the content selector: Navigate to the page/website section from which you want to scrape content, and identify the selector that contains the content in the HTML
  • Configuring the crawl: From the Copy menu, copy the setting you will use (e.g. CSS selector, XPath, etc.) and paste it into the custom extraction module in Screaming Frog before starting your crawl.
  • Content extraction: Run the crawl as usual and find the data in the specified column, as per your extraction settings. Export your data in your desired format, e.g. Google Sheets, CSV, or other.

With this approach, you can quickly get a dataset of scraped content from web pages, or the HTML, depending on the extraction method you select.

What language should the content be in for the Natural Language API to work?

The API automatically detects the language of the content, unless one is specifically provided in the source code. Dozens of languages are supported by the Natural Language API (see Language Support); unsupported languages will return an error in the JSON response.

You can also scrape content via alternative methods, using Python or third-party tools.

Once you have your content extracted, you can move on to the next step.

Make a copy of the Google Sheets Template and paste your content and API key

To prepare the data for analysis, we need to do two things – organize the content for analysis, and paste the API key in the script.

Paste your API key

In Google Sheets, open the Extensions menu, and click on Apps Script.

Open the attached classify.gs script and select the text that says enterAPIkey. Replace it with your Google Cloud project API key. Then click the disk icon to save, and return to the Google Sheets file.
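The template's actual classify.gs is not reproduced here, but a minimal sketch of what such a custom function might look like is below. The function name `analyzeTextClassification`, the endpoint URL, and the `formatCategories` helper follow my reading of the v2 REST reference and may differ from the template; note that UrlFetchApp, Apps Script's HTTP client, only exists inside Google Sheets:

```javascript
// Placeholder you replace with your own Google Cloud API key.
const API_KEY = "enterAPIkey";

// Custom function callable from a Sheets cell, e.g. =analyzeTextClassification(A2).
function analyzeTextClassification(text) {
  const url =
    "https://language.googleapis.com/v2/documents:classifyText?key=" + API_KEY;
  const payload = {
    document: { type: "PLAIN_TEXT", content: text },
  };
  // UrlFetchApp is only available inside Apps Script, so this line
  // runs only when the function is called from Google Sheets.
  const response = UrlFetchApp.fetch(url, {
    method: "post",
    contentType: "application/json",
    payload: JSON.stringify(payload),
  });
  return formatCategories(JSON.parse(response.getContentText()));
}

// Pure helper: turn the JSON response into a [label, confidence] pair
// that spills across the two result columns in the sheet.
function formatCategories(json) {
  const cats = json.categories || [];
  if (cats.length === 0) return ["No category", 0];
  return [cats[0].name, cats[0].confidence];
}
```

Returning a two-element array from a custom function makes the label and confidence fill two adjacent cells, which matches the Classification Label and Confidence columns in the template.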


Paste and prepare your content for analysis

Paste your content for analysis into the Working Sheet, keeping the URL and content columns.


You can add any columns that you want to this file or perform any data cleaning or organisation operations you wish.

Make sure to keep the columns Classification Label and Confidence, where we will be pasting the results of the analysis.

Run the analysis to classify the provided text

To run the analysis, enter the formula below in the Classification Label column, replacing “text” with the cell containing the content you want to classify.

=analyzeTextClassification(text)

Press Enter, then drag the formula down to fill the remaining rows.


You can now review the output of how Google Cloud Natural Language API has classified your content, including the classification label and confidence score.


Tip: Paste as values to avoid making unnecessary calls to the API

To avoid the formula automatically re-running the call to the API before you start editing the file for better organisation or for visualisation purposes, copy the output and paste it with the Paste values only setting. This ensures the values are kept and any formulas in the two columns are removed.


Visualise the text classification data (optional)

Although this step is optional, it is highly recommended that you visualize this data. For this purpose, I’ve created a handy Text Classification Looker Studio Dashboard Template, which allows you to:

  • View all of your content classification labels, organised by primary, secondary, tertiary level, and so on
  • View classification labels at a glance with a summary of entries
  • Deep dive into the structure of classification labels and understand the make-up of different primary and secondary labels
  • View and filter individual page URLs, content, and classification labels with advanced filters (including regex)
  • Filter out classification labels with low confidence scores
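Filtering out low-confidence labels, the last point above, is also easy to do outside the dashboard once you export the classified rows. A hypothetical helper, where `rows` and `filterByConfidence` are illustrative names:

```javascript
// Keep only rows whose classification confidence meets a threshold.
// rows: [{ url, label, confidence }] as exported from the Working Sheet.
function filterByConfidence(rows, threshold) {
  return rows.filter((row) => row.confidence >= threshold);
}
```

A threshold of around 0.5 is a reasonable starting point, but inspect a sample of your own labels before settling on one.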

Tip: Use the template to blend the classification data with GA4/GSC

I highly recommend you don’t stop here, but instead blend this template with data from GA4 or Google Search Console to extract performance insights for specific content categories. You can also blend the classification data with Screaming Frog crawl data to compare and contrast the NLP API’s classification labels with your own tagging and categorisation structures, as a means to enhance or enrich your content categories.

Why use Google’s Natural Language API text classification in Google Sheets

Here are just some of the benefits of using the text classification module of Google’s Natural Language API in Google Sheets:

  • Quickly understand top content areas: Effortlessly organise, analyse, and gain insights from your text data using Google’s Natural Language API and Google Sheets.
  • No coding required: The pre-built template and step-by-step guide make it easy for anyone to get started, regardless of technical expertise.
  • Quickly merge insights with pre-existing tags or categories: Improve or replace your existing tags or site categories by comparing and contrasting them with the model’s output.
  • Boost efficiency by harnessing automation: The process allows you to automate repetitive tasks, uncover hidden patterns, and make data-driven decisions with ease.

Learn how to implement the generated insights into your Organic Search strategy

Getting the data is one thing; learning how to analyse it and use it as part of an Organic Search strategy is another. See the follow-up resources to learn how to harness this data to improve your strategy:

See what else you can do with this API

As mentioned at the start, the Natural Language API has several additional capabilities, including entity extraction, entity sentiment analysis, document sentiment analysis, and syntax analysis. Explore other step-by-step guides on this topic by visiting the resources linked below:

Author


    Lazarina Stoy is a Digital Marketing Consultant with expertise in SEO, Machine Learning, and Data Science, and the founder of MLforSEO. Lazarina’s expertise lies in integrating marketing and technology to improve organic visibility strategies and implement process automation. A University of Strathclyde alumna, her work spans across sectors like B2B, SaaS, and big tech, with notable projects for AWS, Extreme Networks, neo4j, Skyscanner, and other enterprises. Lazarina champions marketing automation, by creating resources for SEO professionals and speaking at industry events globally on the significance of automation and machine learning in digital marketing. Her contributions to the field are recognized in publications like Search Engine Land, Wix, and Moz, to name a few. As a mentor on GrowthMentor and a guest lecturer at the University of Strathclyde, Lazarina dedicates her efforts to education and empowerment within the industry.

