Nadav Harari, Head of SEO at VentureKite, created something remarkable. With a little help from a popular AI chatbot (and a healthy dose of Python programming), he built a tool that uses Hugging Face to semantically analyze topics for search engine optimization. It’s a real-world example of machine learning for SEO.
Best of all, he’s sharing it with the world. We interviewed Harari about the tool, and an abridged transcript of our conversation appears below, where he explains what it is and how it works.
If you prefer written step-by-step instructions on how to set up and use our SEO tool, we’ve included those below as well.
Our Free AI SEO Tool: Abridged Transcript
Jim Markus: I’m Jim Markus. I’m the website portfolio manager at VentureKite, and I have with me today…
Nadav Harari: I’m Nadav Harari, the head of SEO at VentureKite.
Jim Markus: So, essentially, our audience will be able to view a tool that you created from scratch with the help of ChatGPT and some of your own programming knowledge.
Nadav Harari: Thank you, Jim. Mainly, my focus was to come up with a process that will allow me to create a content gap analysis, not at the keyword level as we are used to doing with a bunch of well-known SEO tools like Ahrefs and Semrush, but at the semantic level.
So, you can get, let’s say, a few titles from a competitor’s website. And you can easily use a machine learning model that was pre-trained for that purpose to assess whether you already covered the topic that you just tested or not. It also provides a similarity score between zero and one. That way, you can better assess whether you already covered the topic or not.
This machine learning model was trained, as you can see here, on more than 1 billion training pairs, so it can understand context, maybe intent, synonyms, plurals, singulars, other nouns, gender. Everything. This allows you to perform a content gap analysis at another level. And this should replace the manual process that many of us use now, which is to type in Google, let’s say, the site: operator with your domain, and then some topic. And you can hover over the results to see if this topic actually exists on your website.
Jim Markus: Perfect. Yeah, and as you mentioned, the tools that are kind of industry standard right now use keyword analysis. It’s pretty common for you to be able to say, “My competitor uses this keyword. Do I use this keyword? Do we compete in this keyword?”
What makes this interesting is that search is getting much more complex and advanced, and, you know, Google’s algorithms are getting a lot better at saying, well, this article isn’t just about this keyword, it’s about this topic.
That’s how we need to be thinking, and what you created is a tool to help evaluate that.
Nadav Harari: Soon, I’m going to show the process and how you can use it for your benefit. I just want to add that those SEO tools compare keywords, if you are ranked for a keyword or not, right?
So, let’s say you published an article yesterday, and Google didn’t even have the chance to crawl your site, or not to mention index it or rank it, right? So the SEO tool will tell you that you are not ranked for that keyword.
But using the process I’m going to show you, you can just add a bunch of URLs, all your URLs from your site map and match them. Match their titles, and their topics, against a competitor’s topic.
Jim Markus: Perfect, so you’re not waiting for ranking, so you don’t need to wait for Google or any third party. You, you know, obviously, your site map updates when you publish, so you’ll have that information immediately available.
Nadav Harari: Yeah, exactly. I can start to show you how it works. So we have mainly two parts. We have this spreadsheet that you can make your own copy of. And we also have a Python script that uses the machine learning model that I just demonstrated.
Jim Markus: Fantastic. And to, you know, repeat back for clarity in case anyone’s not familiar. Column F. And again, just after column G to the right as well. These scores are rated on a scale of zero to one, and what Nadav was saying is anything under 0.45 doesn’t show up, so it’ll be blank for you. It’s a good way to see, hey, there’s a blank spot.
We likely don’t cover the same topic on our site, or at least within the URLs that you’ve uploaded, as the topic that’s in your competitor’s title. Anything closer to one, that 0.94, you know, 0.85, is much more likely to be very similar to topics that are covered elsewhere.
I mean, this is unbelievable.
Nadav Harari: Yeah, really, really helpful and I want to show you another use case that may interest you. Up until that point, we discussed content gap analysis with competitors, but what if you can create a content gap analysis internally within your own URLs against the same topics, right?
Let’s go to row seven, right? We have the 10 best web development frameworks, the best PHP frameworks for web development, right?
It’s very similar, right? You can use that as a content cluster. Let’s say this one, best PHP frameworks, top PHP alternatives, right? It’s the same topic. Best certifications, right? It’s already a relatively high score.
PHP interview questions. So we have a content cluster related to PHP, go, let’s see “how to become a data scientist,” “how to learn data science”, right? It’s very similar to “become a data analyst.” Let’s see the score here. 0.7, right?
Jim Markus: Perfect. Like you mentioned, if you’re looking for content clusters, this is a good way to identify those, but you’re also looking for areas you might be cannibalizing your own traffic, which is, I mean, if you’re seeing things in the nineties, that’s probably a good thing to consider.
Awesome. Any final thoughts on this before we wrap up this call?
Nadav Harari: Yeah, I can share some more examples. But, I mean, everyone should try this. Just make a copy of the spreadsheet. Create an account in Google Colab, copy the script, and paste it there. Create your own credentials using Google Cloud the way I showed you. Just run it on your website or for a few competitors.
Just, you know, test the waters, see how it works for you.
Jim Markus: Great. And you also have the Apps Script as well for the sheet. Is that connected to the sheet when they make their copy? Great. Yeah. Okay. So we’ll also put this into an article or a blog post that you can find, with detailed instructions, in case you weren’t taking avid notes during the call.
But I think this is a wonderful introduction, and you created this again with, you asked ChatGPT for ideas and then you kind of built it around that. Is that right?
Nadav Harari: I started with ChatGPT 4 to get some ideas, but the ideas I got weren’t really good. So I started digging in Google to find other solutions. And then I encountered the Hugging Face machine learning transformer, which is like, really, that’s the job.
Jim Markus: Yeah, seems like the heart of what you created. Perfect. Thank you again for showing off everything here, and I will connect you with all the readers, where they can find detailed instructions on how to use this for themselves.
So thank you so much for making time to chat.
Nadav Harari: Thank you, Jim.
How to Use Our Free SEO Tool: Step by Step
Here are step-by-step instructions for how to set up and use our SEO tool. For a more complete explanation, be sure to watch our video, which provides a visual demonstration of installations and setup.
What is it?
My semantic content gap analysis process is based on Hugging Face’s pre-trained machine learning model, which was trained on a large and diverse dataset of over 1 billion sentence pairs. This ML technology lets you compare each competitor article title against ALL article titles on your website and retrieve, for any given competitor title, the top 50 most semantically similar titles on your site in descending order, each with a score from 0 to 1.
Screenshot showing the model in action on Hugging Face’s website.
The model understands that “table tennis table” is a synonym for “ping pong table” and “ping pong tables” and thus gives those terms the highest scores.
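To make the 0-to-1 score more concrete: models of this kind turn each phrase into a vector of numbers (an embedding), and similarity is typically measured as the cosine of the angle between two vectors. The sketch below assumes that setup. The three-number vectors in `toy_embeddings` are made up for illustration and are not output from the real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, NOT real model output).
toy_embeddings = {
    "table tennis table": [0.9, 0.1, 0.2],
    "ping pong table": [0.85, 0.15, 0.25],
    "golf club": [0.1, 0.9, 0.3],
}

query = toy_embeddings["table tennis table"]
scores = {
    phrase: cosine_similarity(query, vector)
    for phrase, vector in toy_embeddings.items()
}

# Print the phrases from most to least similar to the query.
for phrase, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.2f}  {phrase}")
```

Vectors that point in nearly the same direction score close to 1, which is how a synonym like “ping pong table” can land near the top even though it shares only one word with the query.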
Why use it:
Currently, if you want to check whether you already cover your competitors’ topics on your website, you need to search Google using the [site:yourdomain.com “a topic”] operator and browse through the results manually. Alternatively, you may use the content gap feature in Ahrefs/Semrush.
Why the methods above are not good enough:
- Using the site: operator may not show URLs of articles that were published recently and have not yet been indexed or ranked.
- Using Google’s [site:] operator is manual work that is not suitable for checking hundreds of your competitors’ topics.
- The content gap feature in Ahrefs/Semrush finds the keywords your competitors rank for but you don’t. This disregards the topic as a whole. For example, the SEO tool may flag that the keyword “Best table tennis paddle” is missing even though you already published an article about “Best ping pong paddle”.
Prerequisites and Example of the Final Result
Prerequisites:
- Make a copy of my Google Sheet.
- Create an account in Google Colab, and copy and paste the Python script I generated with ChatGPT.
- Create a project in Google Cloud and enable the Google Sheets API so the Python script can read and write to your Google Sheet.
Column A contains the competitor’s titles, while columns B and C (and the other columns to the right) contain the most similar titles on our site in descending order.
Example: Cells B2 and C2 in the screenshot above show the most similar titles on golfspan.com against cell A2, a competitor title. Cells B3 and C3 show the most similar titles against cell A3, and so on.
Setting up the Google Sheet and How to Use
Sheet1 tab:
- Place URLs of a competitor in column A and ALL your website’s URLs in column F. (You can copy and paste them from your XML sitemap.)
- Run the Apps Script I generated with ChatGPT to retrieve the status code and title of each URL into columns B and C (competitor) and columns G and H (your website). I added features to retrieve the title even if the URL returns a 3XX redirect. If the URL returns a 404, the code transforms the URL slug into a title.
- Column D and column I contain a Google Sheets formula that cleans each title: it strips any brand name after the vertical bar (|), cleans up ugly HTML entities like &quot; (") and &amp; (&), and can even remove a specific brand name if you replace the part “Add yourdomain.com here” in the formula with the term you want to remove. Why? Because we want to compare raw topics (titles) without any influence such as brand names or other entities.
- Now that we have a list of clean titles in column D and column I in Sheet1, they are automatically populated in column A and column B in Sheet2, respectively.
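The Apps Script and the cleaning formula themselves are not reproduced in this article. As a rough illustration of the two transformations described above (turning a URL slug into a title for 404 pages, and cleaning a title), here is a minimal Python sketch. The function names `slug_to_title` and `clean_title`, the example URL, and the exact cleaning rules are assumptions for illustration, not the actual code in the sheet.

```python
import html
import re
from urllib.parse import unquote, urlparse

def slug_to_title(url):
    """Turn the last path segment of a URL into a readable title."""
    path = urlparse(url).path.rstrip("/")
    slug = unquote(path.rsplit("/", 1)[-1])
    slug = re.sub(r"\.(html?|php)$", "", slug)  # drop a common file extension
    words = re.split(r"[-_]+", slug)
    return " ".join(word for word in words if word).title()

def clean_title(title, brand=None):
    """Decode HTML entities, drop text after a vertical bar, remove a brand."""
    text = html.unescape(title)       # "&amp;" becomes "&", "&quot;" becomes '"'
    text = text.split("|", 1)[0]      # keep only the part before the first "|"
    if brand:
        text = re.sub(re.escape(brand), "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

# Hypothetical inputs, for illustration only.
fallback_title = slug_to_title("https://example.com/blog/best-ping-pong-paddle/")
cleaned_title = clean_title("Best Ping Pong Paddle &amp; Table | Example Brand")
```

Cleaning both lists the same way matters because the comparison step sees only the text it is given: a brand name repeated in every title would make unrelated titles look more alike.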
Sheet2 tab:
This is where my ChatGPT-generated Python script is used to compare each title in column A (competitor) against ALL titles in column B (your titles) and write the results to column D and the columns to its right (up to 50 similar titles on your site).
Column A contains the competitor’s titles and column B contains ALL titles on our site. After you run the Python script, the Sheet2 tab is populated with the competitor’s titles in column D and, for each competitor title, the matching titles from your site in column E and to the right, from the most similar (highest score) to the least similar.
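The row layout described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual script: `build_rows` is a hypothetical helper, and `word_overlap_score` is a crude word-overlap stand-in for the machine learning model so that the example runs without any downloads. In the real tool, the score comes from the semantic model.

```python
def word_overlap_score(a, b):
    """Stand-in scorer (NOT semantic): shared words divided by all words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def build_rows(competitor_titles, own_titles, score_fn):
    """One row per competitor title: the title, then own titles by score."""
    rows = []
    for competitor in competitor_titles:
        scored = [(score_fn(competitor, own), own) for own in own_titles]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # highest first
        rows.append([competitor] + [own for _, own in scored])
    return rows

# Hypothetical titles, for illustration only.
competitor_titles = ["best ping pong paddle"]
own_titles = ["how to learn golf", "best ping pong table", "ping pong paddle"]

rows = build_rows(competitor_titles, own_titles, word_overlap_score)
```

Each inner list corresponds to one sheet row: the first item goes to column D and the rest fill column E and the columns to its right.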
Setting up Credentials to use Google Sheets API in Google Cloud
To run the Python script that reads and writes to our Google Sheet, you need to create a project, enable the Google Sheets API, and generate a service account key in the form of a JSON file.
To use the Google Sheets API with a Python script, you need to follow these steps:
- Create a project in the Google Cloud Platform Console:
a. Go to the Google Cloud Platform Console.
b. Click on the project drop-down and select “New Project” from the top right corner.
c. In the “New Project” window, enter a project name, select an organization (if applicable) and a billing account, then click on “Create”.
- Enable the Google Sheets API:
a. Once the project is created, click on the hamburger menu (three horizontal lines) in the top left corner and select “APIs & Services” > “Dashboard”.
b. Click on “+ ENABLE APIS AND SERVICES” at the top.
c. In the API Library, search for “Google Sheets API” and select it.
d. Click on the “Enable” button.
- Create service account credentials:
a. In the Google Sheets API page, click on “Create credentials”.
b. In the “Add credentials to your project” page, select “Google Sheets API” for “Which API are you using?” and “Other non-UI (e.g. cron job, daemon)” for “Where will you be calling the API from?”. Choose “Application data” for “What data will you be accessing?” and click “Next”.
c. Select “Service account” and click “Next”.
d. Enter a name for the service account, choose a role (e.g., “Editor” to have read and write access), and click “Done”.
- Generate a JSON key file:
a. In the “APIs & Services” > “Credentials” page, you will see the newly created service account. Click on the pencil icon to edit it.
b. In the “Service Account details” page, click on “Add Key” and select “JSON”.
c. A JSON key file will be generated and downloaded to your computer. Keep this file safe, as it contains sensitive information.
- Share the Google Sheet with the service account:
a. Open the JSON key file using a text editor and find the “client_email” field.
b. Share the Google Sheet with the email address in the “client_email” field, granting the desired permissions (e.g., “Editor” to allow read and write access).
- Use the JSON key file in your Python script
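If you would rather not dig through the key file by hand to find the address to share the sheet with, a few lines of Python can read it for you. This is a minimal sketch that assumes the standard service account key layout (a JSON object with "type", "client_email", and "private_key" fields). The helper name `read_client_email` and the file path in the usage note are hypothetical.

```python
import json

# Fields this sketch expects to find in a service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")

def read_client_email(key_path):
    """Return the service account email stored in a JSON key file."""
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError("expected a service account key file")
    return key["client_email"]
```

For example, `read_client_email("my-key.json")` returns the address to paste into the sheet’s Share dialog. The function only reads the file and never prints the private key.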
Python Script That Reads & Writes to Your Google Sheet
Click here to access the script.
I generated this script with ChatGPT. I went back and forth to fix the code and also improved it to suit my needs. To run the script, you need to:
1. Copy and paste your own sheet ID into the variable sheet_id = 'YOUR_SHEET_ID'.
2. Upload the JSON file you generated in the previous section when the script prompts you for it.
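The sheet ID is the long string between /d/ and the next slash in your Google Sheet’s URL. If you prefer to extract it programmatically, the following sketch shows one way. The helper name `extract_sheet_id` and the example URL are made up for illustration.

```python
import re

def extract_sheet_id(sheet_url):
    """Pull the spreadsheet ID out of a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError("no spreadsheet ID found in URL")
    return match.group(1)

# Placeholder URL (hypothetical ID), for illustration only.
sheet_id = extract_sheet_id(
    "https://docs.google.com/spreadsheets/d/abc123_EXAMPLE-id/edit#gid=0"
)
```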
The code:
- Lets you upload the JSON file with your Google Sheets API credentials.
- Matches each competitor title against ALL the titles on your site, then writes the competitor’s titles to column D and up to 50 titles with a score above 0.45 to column E and to the right in Sheet2. (I don’t need titles with lower scores, as they are not similar.)
- Includes a time.sleep(1) call, which adds a 1-second delay to prevent Google Sheets API blocks.
- Skips any title that is identical to the given title (similarity score = 1). Why? If you run the content gap analysis on your own site to find content clusters and consolidation opportunities, you don’t need identical titles retrieved.
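The selection rules above (keep scores above 0.45, skip identical titles, cap the list at 50, pause between writes) can be sketched as follows. This is an illustration under assumptions rather than the actual script: `select_matches` and `write_rows_with_delay` are hypothetical helpers, `write_row` stands in for whatever call writes a row to the sheet, and placing the pause after every row is a guess at where the delay sits.

```python
import time

def select_matches(scored_titles, min_score=0.45, max_matches=50, tolerance=1e-6):
    """Filter and rank (title, score) pairs according to the rules above."""
    kept = [
        (title, score)
        for title, score in scored_titles
        # Keep scores above the threshold; treat ~1.0 as an identical title.
        if min_score < score < 1.0 - tolerance
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return kept[:max_matches]

def write_rows_with_delay(rows, write_row, delay_seconds=1.0):
    """Call write_row(index, row) for each row, pausing after every write."""
    for index, row in enumerate(rows):
        write_row(index, row)
        time.sleep(delay_seconds)  # assumed placement of the delay

# Hypothetical scores, for illustration only.
example_scores = [("a", 0.94), ("b", 0.30), ("c", 1.0), ("d", 0.70), ("e", 0.45)]
selected = select_matches(example_scores)
```

The small tolerance is there because floating-point similarity scores for identical titles may come out slightly below or above exactly 1.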
Wrapping Up
We hope this free AI-powered SEO tool brings value to you and helps with your website’s content gap analysis.
If you have any questions regarding this tool and how to use it, you can reach out directly to Nadav Harari, the creator: nadav@venturekite.com
Zoë is the senior editor for Productivity Spot. She is also Head of Content for VentureKite. Her goal is to ensure all content published on Productivity Spot is clear, understandable, and informative for our loyal readers.
Zoe Biehl (https://productivityspot.com/author/zoebiehl/)