Scaling Keyword Research to
Find Content Gaps
Hamlet Batista
#TTTLive
@hamletbatista
Hamlet Batista
Founder/CEO @ RankSense
Hamlet Batista is CEO and founder of
RankSense, an agile SEO platform for
online retailers and manufacturers.
He holds US patents on innovative
SEO technologies, started doing SEO
as a successful affiliate marketer
back in 2002, and believes great SEO
results should not take 6 months.
How Low Can #1 Go?
Moz’s Feb 2020 report finds ten
organic blue links pushed further
down the page.
This is a refresh from a 2013
research study.
https://moz.com/blog/how-low-can-number-one-go-2020
What is an Organic Listing in 2020?
In his response to the article, Google’s
Danny Sullivan contends organic
listings are no longer just the ten plain
blue links.
Users' expectations of Google have
changed over time and Google has
adapted to them.
https://twitter.com/dannysullivan/status/1232745667119865856
Keyword Research in 2013
01 Research keywords/topics
● Low competition
● Relevant
● High search volume
02 Build content-rich web pages to match the keywords
● Content word length
● Social media promotion
● Compelling headlines
03 Track the keyword rankings
● Position tracking
● Share of voice
● SERP pixel tracking
Keyword Research in 2020
As the ten blue web links get pushed down the SERP, our
research should focus on the features replacing them.
https://moz.com/learn/seo/serp-features
Agenda
1. What are content formats?
2. Mapping content formats to SERP features
3. Using SERP features to research content format gaps
4. Automating the process with Python
What are content formats?
Content templates:
1. Article
2. Forum post
3. Product page
4. Tool/calculator
5. Directory listing
6. Etc.
Content formats:
1. Video
2. Image
3. List (ordered, unordered)
4. Table
5. Answers
6. Reviews
7. Etc.
How to detect content formats in
web pages?
We can find missed content format opportunities using structured
data:
1. If there is relevant content and no structured data, there is
opportunity to add it
2. If there is structured data and no relevant content, there is
opportunity to add the content
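The two rules above can be sketched as a small decision function. This is only an illustration: the name `classify_opportunity` and its two boolean inputs are assumptions made for the example, not code from the talk.

```python
def classify_opportunity(has_relevant_content: bool, has_structured_data: bool) -> str:
    """Apply the two rules from the slide to one page and one content format."""
    if has_relevant_content and not has_structured_data:
        # Rule 1: the content exists but is not marked up.
        return "add structured data"
    if has_structured_data and not has_relevant_content:
        # Rule 2: the markup exists but the content behind it does not.
        return "add content"
    return "no gap detected"


# Hypothetical page: a video is embedded but there is no video markup.
print(classify_opportunity(has_relevant_content=True, has_structured_data=False))
```

In practice the two booleans would come from inspecting the page itself, which is what the later steps of the deck automate.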
Mapping SEMrush SERP features to
content formats
Checking for EmbedURL
JSONPath to detect Video
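To make the idea concrete, here is a minimal sketch of what a recursive-descent selector such as `$..embedUrl` does: it looks for an `embedUrl` key at any depth of the structured data. The talk relies on a JSONPath library for this; the helper `find_key` below is a standard-library stand-in written for illustration, and the sample JSON-LD is made up.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` at any depth, like JSONPath `$..key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Made-up JSON-LD, shaped like the list a structured-data extractor might return.
sample = json.loads("""
[{"@type": "VideoObject",
  "name": "Example clip",
  "embedUrl": "https://www.example.com/embed/123"}]
""")

matches = list(find_key(sample, "embedUrl"))
has_video = bool(matches)  # any match means the page declares a video
print(has_video, matches)
```

A page with no `embedUrl` anywhere in its structured data yields an empty list, which is the signal that the video format is missing.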
Let’s automate this!
Here is our technical plan:
1. Extract keywords (and pages) with high impressions and no clicks
2. Extract SERP features for those keywords
3. Use our Feature->Format (JSONPaths) map to identify content
format expected
4. Check if page includes format
5. Report content formats missing
Extracting underperforming
keywords and pages from
Google Search Console
Extract keywords with high impressions
and no clicks
Using code from TTT webinar
https://trafficthinktank.com/courses/automation-for-seo/
1. !git clone https://github.com/hamletbatista/google-searchconsole
2. !pip3 install google-searchconsole/
Extract keywords with high impressions
and no clicks
Configure Search Console API
https://developers.google.com/webmaster-tools
1. Activate Search Console API in Compute Engine
https://console.cloud.google.com/apis/api/webmasters.googleapis.com/overview?project=&folder=&organizationId=
2. Create New Credentials / Help me choose (Search Console API, Other UI, User data)
https://console.cloud.google.com/apis/credentials/wizard?api=iamcredentials.googleapis.com&project=
3. Download client_id.json
Extract keywords with high
impressions and no clicks
Upload client_id.json
from google.colab import files
files.upload()
# run once
import searchconsole
account = searchconsole.authenticate(client_config="client_id.json", serialize='credentials.json', from_colab=True)
Extract keywords with high
impressions and no clicks
Get keywords and pages
webproperty = account['https://www.domain.com/']
#Last 7 days of GSC data
query = webproperty.query.range(start='today', days=-7).dimension('page', 'query')#.limit(100)
r = query.get()
import pandas as pd
df = pd.DataFrame(r.rows)
df.head()
Extract keywords with high
impressions and no clicks
Filter by high impressions and no clicks
high_potential = df.query("clicks == 0.0 & impressions > 10 & position < 20")
high_potential
Extracting SERP features from
SEMrush
Extract keywords with high
impressions and no clicks
Using code from SEMrush webinar
https://www.semrush.com/blog/weekly-wisdom-hamlet-batista-python-javascript-marketers/
1. Extracting data from SEMrush
2. You can find SEMrush API reference here
https://www.semrush.com/api-analytics/
3. You can find your API key here
https://www.semrush.com/api-use/
4. Fk > All SERP Features triggered by a keyword. List of
available SERP Features
5. Ph > Keyword bringing users to the website via Google's
top 20 organic search results.
Extract keywords with high
impressions and no clicks
Get SERP features Gist
https://gist.github.com/hamletbatista/ed5e810b56acf0f8490e29050caa4351
Extract keywords with high
impressions and no clicks
Get SERP features names (from indices) Gist
https://gist.github.com/hamletbatista/74730874b7e0540cd51d3ab749f18ffd
Extract keywords with high
impressions and no clicks
Get SERP features names by keywords from SEMrush
df["SERP Feature by Keyword Names"] = df["SERP Features by Keyword"].apply(lambda x: ",".join(get_feature_names(x)) )
Extract keywords with high
impressions and no clicks
Let’s merge SEMrush features with our Google Search Console data!
We merge on query and Keyword columns.
new_df = pd.merge(high_potential, df, how="right", left_on="query", right_on="Keyword")
Checking if pages include expected
content formats
Checking if pages include
expected content formats
Using third-party libraries: requests, extruct and jsonpath-ng
1. Extract all structured data from the page
2. Map expected formats to JSONPaths
Install pinned versions:
1. !pip install extruct==0.7.3
2. !pip install rdflib==4.2.2
I needed to revert from the latest version due to an error.
Checking if pages include
expected content formats
Extract structured data
import extruct
import requests
import pprint
from w3lib.html import get_base_url
pp = pprint.PrettyPrinter(indent=2)
r = requests.get('https://www.cnn.com/videos/health/2020/04/25/elmo-sesame-street-people-wearing-masks-gupta-sot-town-hall-vpx.cnn')
base_url = get_base_url(r.text, r.url)
data = extruct.extract(r.text, base_url=base_url)
pp.pprint(data)
Checking if pages include
expected content formats
JSONPath selectors
1. $..acceptedAnswer
2. $..address
3. $..review
4. $..embedUrl
5. $..employmentType
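As a rough sketch of how these selectors could be turned into a per-page check, the example below pairs each selector with a content format and reports which formats a page's structured data contains. The pairing in `FORMAT_KEYS` is an assumption inferred from the selector names, the helper names are hypothetical, and the recursive search is a standard-library stand-in for a JSONPath library. The sample data is made up.

```python
# Assumed pairing of content formats to the key each selector targets.
FORMAT_KEYS = {
    "faq": "acceptedAnswer",       # $..acceptedAnswer
    "local_business": "address",   # $..address
    "review": "review",            # $..review
    "video": "embedUrl",           # $..embedUrl
    "job": "employmentType",       # $..employmentType
}


def has_key(node, key):
    """Return True if `key` appears at any depth (what a `$..key` match implies)."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def detect_formats(structured_data):
    """Map each content format to True/False for one page's structured data."""
    return {fmt: has_key(structured_data, key) for fmt, key in FORMAT_KEYS.items()}


# Made-up structured data for a page that has a video and a review.
page_data = {
    "json-ld": [
        {"@type": "VideoObject", "embedUrl": "https://www.example.com/embed/1"},
        {"@type": "Product", "review": {"reviewBody": "Great"}},
    ]
}
formats = detect_formats(page_data)
print(formats)
```

Keeping the format-to-selector pairing in one dictionary means a new content format can be supported by adding a single entry.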
Checking if pages include
expected content formats
Does the page include our content
formats?
https://gist.github.com/hamletbatista/f77d6cd6343b240f6451116a5a7c08b6
Checking if pages include
expected content formats
Does the page include our content
formats?
This function uses the content formats and expected SERP
features to calculate the opportunity gaps.
https://gist.github.com/hamletbatista/157e7cad373113e9764e280f106bdac5
We consider it an opportunity if there is a SERP feature
requested (for example, a video carousel) and there is no
corresponding content format in the page (no video in
the structured data).
We count opportunities as 1.
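The rule described above can be sketched as follows. This is not the function from the Gist: `content_gaps` and its inputs are hypothetical names, and the sample values are made up. Only the column names are taken from the heatmap code later in the deck.

```python
COLUMNS = ["image", "video", "local_business", "review", "top_story", "faq", "job"]


def content_gaps(expected_formats, page_formats):
    """Return a 0/1 row: 1 where the SERP expects a format the page does not have."""
    return {
        fmt: int(fmt in expected_formats and not page_formats.get(fmt, False))
        for fmt in COLUMNS
    }


# Hypothetical keyword whose SERP shows a video carousel and reviews,
# matched against a page that only has review markup.
expected = {"video", "review"}
found = {"review": True, "video": False}
row = content_gaps(expected, found)
print(row)
```

One such row per page, stacked into a table, gives the binary matrix that a heatmap can display.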
Report missing
content formats
Checking if pages include
expected content formats
Yellow colored spots
represent opportunity
Checking if pages include
expected content formats
Visualize our content gap matrix
import plotly.graph_objects as go
columns = ["image", "video", "local_business",
"review", "top_story", "faq", "job"]
data=go.Heatmap(z=gap_df[columns], x=columns, y=gap_df.url)
Checking if pages include
expected content formats
Visualize our content gaps as a binary heatmap
fig = go.Figure(data)
fig.update_xaxes(side="top")
Resources to learn more
Python Introduction for SEOs
https://www.searchenginejournal.com/introduction-to-python-seo-spreadsheets/342779/
Search-driven Content Strategy
https://www.slideshare.net/stephaniebeadell/searchdriven-content-strategy-mozcon-2018-105014924
Query Syntax
http://www.blindfiveyearold.com/query-syntax
SEO Automation Course
https://trafficthinktank.com/courses/automation-for-seo/
About RankSense
Automate tedious SEO tasks in Google Sheets.
Import the sheets and deploy them as
experiments to Cloudflare.
Learn which changes are effective.
https://www.ranksense.com



Editor's Notes

  • #13: https://docs.google.com/spreadsheets/d/1t8ddNjqi6jRo-a0oK1VXrIyAvJIloV5GpZgwiwXFWUc/edit#gid=0
  • #14: https://jsonpath.com/
  • #15: https://jsonpath.com/