The New Content SEO
FLOQ - Amanda King
Sydney SEO Conference
14 April 2023
What we’ll talk about
1. A quick refresher
2. Have keywords ever actually
been a thing Google used?
3. How Google reads content
may not be what you think
4. So what do we do about all
this?
5. Who tf am I?
A quick refresher
A brief refresher on how Google crawls the Internet
It’s three separate stages (crawl, index, serve),
with sub-processes for scoring and ranking.
Content analysis is included in the
indexing engine, content relevancy
is in the serving engine.
While this is an old patent (2011), the
fundamentals still apply for the purposes of this refresher.
Source: https://patents.google.com/patent/US8572075B1/, retrieved 22 Mar 2023
https://developers.google.com/search/docs/fundamentals/how-search-works
Google’s ranking engine works in systems
● Query Deserves Freshness is a system
● Helpful Content is a system
● MUM & BERT are systems
○ “Bidirectional Encoder Representations from
Transformers (BERT) is an AI system Google uses
that allows us to understand how combinations of
words express different meanings and intent.”
https://developers.google.com/search/docs/appearance/ranking-systems-guide
Have keywords ever actually been
a thing Google used?
While Google is a
machine, it’s moved
fundamentally beyond
keywords…and has been since
at least 2015.
Why hasn’t SEO?
Queries very quickly
become entities
“[...]identifying queries in query data;
determining, in each of the queries,
(i) an entity-descriptive portion that
refers to an entity and (ii) a suffix;
determining a count of a number of
times the one or more queries were
submitted”
- patent granted in 2015, filed in 2012
Source: https://patents.google.com/patent/US9047278B1/en; https://patents.google.com/patent/US20150161127A1/
Google acknowledges that matching on query terms
alone is pretty terrible.
“Direct “Boolean” matching of query terms has well known limitations,
and in particular does not identify documents that do not have the query
terms, but have related words [...]The problem here is that conventional
systems index documents based on individual terms, rather than on
concepts. Concepts are often expressed in phrases [...] Accordingly,
there is a need for an information retrieval system and methodology that
can comprehensively identify phrases in a large scale corpus, index
documents according to phrases, search and rank documents in
accordance with their phrases, and provide additional clustering and
descriptive information about the documents. [...]”
- Information retrieval system for archiving multiple document
versions, granted 2017 (link)
So it decided to make its search engine
concept- and phrase-based.
“The system is adapted to identify phrases that have
sufficiently frequent and/or distinguished usage in the
document collection to indicate that they are “valid” or “good”
phrases [...]The system is further adapted to identify phrases
that are related to each other, based on a phrase's ability to
predict the presence of other phrases in a document.”
- Information retrieval system for archiving multiple
document versions, granted 2017 (link)
“Rather than simply
searching for content that
matches individual words,
BERT comprehends how a
combination of words
expresses a complex idea.”
Source: https://blog.google/products/search/how-ai-powers-great-search-results/
MUM takes this a step further
● About 1,000 times more powerful than BERT
● Trained across 75 languages for greater context
● Multimodal: understands information across different types of media
(text, images, video, etc.)
https://blog.google/products/search/introducing-mum/
How Google reads content may
not be what you think
Step 1
Indexing
Indexing is the stage where content
is analysed, so how does Google
do it?
BERT is a technique for
natural language processing
(NLP) pre-training. So
how does natural language
processing work, once it
has a corpus of data?
Source: https://blog.google/products/search/search-language-understanding-bert/
Is there anything in this process that even looks like “keywords”?
So, in broad strokes, the steps in the indexation process are:
1. Parsing: tokenisation, parts of speech, stemming
(for Google, lemmatisation)
2. Topic modelling: entity detection, relation detection
3. Understanding
4. On to the next engine: ranking
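To make the parsing step concrete, here is a minimal Python sketch of tokenisation followed by either a naive suffix-stripping stemmer or a dictionary lemmatiser. The lemma table, the suffix rules and the function names are illustrative assumptions, not how Google implements either step; the point is that stemming misses an irregular form like “ran”, while lemmatisation maps it to “run”.

```python
import re

# Toy lemma table (an assumption): real lemmatisers use a full dictionary
# plus part-of-speech information.
LEMMAS = {"running": "run", "runs": "run", "ran": "run"}


def tokenise(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token: str) -> str:
    """Crude suffix stripping, loosely in the style of rule-based stemmers."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undo a doubled final consonant: "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def lemmatise(token: str) -> str:
    """Map a token to its dictionary base form when the table knows it."""
    return LEMMAS.get(token, token)


tokens = tokenise("Running, runs, ran!")
print([naive_stem(t) for t in tokens])  # ['run', 'run', 'ran']
print([lemmatise(t) for t in tokens])   # ['run', 'run', 'run']
```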
Parsing is intrinsically
categorisation
● Semantic distance
● Keyword-seed affinity
● Category-seed affinity
● Category-seed affinity to
threshold
https://patents.google.com/patent/US11106712B2; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
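Semantic distance, listed above, can be estimated by counting the steps that connect two words. A minimal sketch, assuming related terms are stored as a small graph (the graph and its terms are hypothetical), is a breadth-first search that counts the edges on the shortest path between two words:

```python
from collections import deque

# Hypothetical term graph: each edge links two closely related terms.
GRAPH = {
    "tea": ["green tea", "caffeine"],
    "green tea": ["tea", "matcha"],
    "caffeine": ["tea", "coffee"],
    "matcha": ["green tea"],
    "coffee": ["caffeine", "espresso"],
    "espresso": ["coffee"],
}


def semantic_distance(graph, start, goal):
    """Number of edges on the shortest path from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two terms are not connected


print(semantic_distance(GRAPH, "tea", "coffee"))       # 2
print(semantic_distance(GRAPH, "matcha", "espresso"))  # 5
```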
How natural language processing usually works: tokenization and subwords
Source: https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html
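WordPiece, the subword scheme BERT uses, splits each word into the longest pieces it can find in a fixed vocabulary. The sketch below is the classic greedy longest-match-first version; Google’s Fast WordPiece work is described as producing the same output more quickly. The tiny vocabulary is a made-up assumption, so real tokenisations will differ.

```python
# Toy vocabulary (an assumption); "##" marks a piece that continues a word.
VOCAB = {"un", "##aff", "##able", "search", "##ing", "run", "##ning"}


def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenisation of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


print(wordpiece("unaffable"))  # ['un', '##aff', '##able']
print(wordpiece("searching"))  # ['search', '##ing']
print(wordpiece("xyz"))        # ['[UNK]']
```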
This gets broken down even
further
● N-grams: important for finding the
primary concepts of the
sentence by identifying and
excluding stop words
● “Running”, “runs” and “ran” share the same
base: “run”
https://patents.google.com/patent/US8423350B1/
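A minimal sketch of stop-word-aware n-gram extraction follows. It assumes a small hand-picked stop-word list and only builds n-grams inside runs of content words, so no n-gram spans a removed stop word; both choices are illustrative, not a description of Google’s pipeline.

```python
# Small illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "how", "in", "is", "of", "on", "the", "to"}


def content_ngrams(text, n=2, stop_words=STOP_WORDS):
    """Return n-grams built only from runs of consecutive non-stop words."""
    runs, current = [], []
    for token in text.lower().split():
        if token in stop_words:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        runs.append(current)
    return [" ".join(run[i:i + n]) for run in runs for i in range(len(run) - n + 1)]


print(content_ngrams("How to brew the perfect cup of green tea"))
# ['perfect cup', 'green tea']
```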
Google does a lot of things when detecting
entities and relationships
● Identifying aspects to define entities based on popularity
and diversity, granted in 2011 (link)
● Finding the entity associated with a query before returning
a result, using input from human quality raters to confirm
objective fact associated with an entity, granted in 2015
(link)
● Understanding the context of the query, entity and related
answer you’re searching for, granted in 2019 (link)
● Understanding user-generated content signals in
relation to a webpage, granted in 2022 (link)
● Understanding the best way to present an entity in a
results page, granted in 2016 (link)
● Managing and identifying disambiguation in entities,
granted in 2016 (link)
● Building entities through co-occurrence (a “methodology
based on phrases”) and storing lower-information-gain
documents in a secondary index, granted in 2020 (link)
● Understanding context from previous query results and
behaviour, granted in 2016 (link)
Step 2
Scoring
In their own description of their
ranking & scoring engine, Google
offers 5 buckets:
● Meaning
● Relevance
● Quality
● Usability
● Context
Scoring is all those 200+ factors we talk
about…
Google has cited everything from internal links, external links, pogo sticking, “user
behaviour”, proximity of the query terms to each other, context, attributes, and more
Just a few of the patents related to scoring:
● Evaluating quality based on neighbor features (link)
● Entity confidence (link)
● Search operation adjustment and re-scoring (link)
● Evaluating website properties by partitioning user feedback (link)
● Providing result-based query suggestions (link)
● Multi-process scoring (link)
● Block spam blog posts with “low link-based score” (link)
It actually looks like
they have a
classification engine
for entities as well
This patent was filed in 2010,
granted in 2014. Likely a basis
for the Knowledge Graph.
(US8838587B1)
https://patents.google.com/patent/US8838587B1/en
“...link structure may be
unavailable, unreliable, or
limited in scope, thus,
limiting the value of using
PageRank in ascertaining
the relative quality of some
documents.” (circa 2005)
https://patents.google.com/patent/US7962462B1/en
There has been more than one document scoring function, each weighted, since the beginning
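As an illustration of several weighted scoring functions feeding one ranking, here is a hedged Python sketch. The three functions, their inputs and the weights are all hypothetical; the only idea taken from the slide is that independent scores are weighted and combined.

```python
# Three hypothetical scoring functions, each returning a value in [0, 1].
def link_score(doc):
    return min(doc["inbound_links"] / 100, 1.0)


def relevance_score(doc):
    return doc["phrase_match"]


def freshness_score(doc):
    return 1.0 if doc["age_days"] < 30 else 0.5


# Hypothetical weights; Google's real weights are not public.
WEIGHTED_FUNCTIONS = [(0.3, link_score), (0.5, relevance_score), (0.2, freshness_score)]


def combined_score(doc, weighted_functions=WEIGHTED_FUNCTIONS):
    """Weighted sum of several independent scoring functions."""
    return sum(weight * fn(doc) for weight, fn in weighted_functions)


docs = {
    "fresh, relevant page": {"inbound_links": 50, "phrase_match": 0.9, "age_days": 10},
    "old, well-linked page": {"inbound_links": 200, "phrase_match": 0.4, "age_days": 400},
}
ranking = sorted(docs, key=lambda name: combined_score(docs[name]), reverse=True)
print(ranking)  # ['fresh, relevant page', 'old, well-linked page']
```

Changing a single weight can reorder the results, which is one reason chasing any individual “factor” in isolation is unreliable.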
How Google ranks content
● Based on historical behaviour from similar searches in
aggregate (application)
● Based on external links (link)
● Based on your own previous searches (link)
● Based on whether or not it should directly provide the answer via
the Knowledge Graph (link)
● Phrase- and entity-based co-occurrence threshold
scores (link)
● Understanding intent based on contextual information
(link)
Helpful Content Update & Information
Gain Score (granted Jun 2022)
● The information gain score might be personal to you
and the results you’ve already seen
● Featured snippets may be different from one search to
another based on the information gain score of your
second search
● Pre-training an ML model on a first set of data shown to
users in aggregate, getting an information gain score,
and using that to generate new results in SERPs.
https://patents.google.com/patent/US20200349181A1/en
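The patent describes a trained ML model; as a much simpler stand-in, the sketch below scores a page by the share of its topics that the documents a user has already seen do not cover. The topic sets, the page names and the set-overlap scoring are assumptions for illustration only.

```python
def novelty_score(candidate, seen_docs):
    """Share of a candidate's topics not covered by documents already seen."""
    if not candidate:
        return 0.0
    covered = set().union(*seen_docs)  # empty set if nothing has been seen
    return len(candidate - covered) / len(candidate)


# Hypothetical topic sets, e.g. entities extracted from each page.
seen = [{"green tea", "caffeine", "brewing temperature"}]
candidates = {
    "page A": {"green tea", "caffeine", "brewing temperature"},
    "page B": {"green tea", "matcha grades", "ceremonial matcha", "whisk"},
}
ranked = sorted(candidates, key=lambda name: novelty_score(candidates[name], seen), reverse=True)
print(ranked)  # ['page B', 'page A']
```

Page A repeats what the user has already read, so it scores 0; page B adds three new topics out of four.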
What is “information gain”?
“Information gain, as the ratio of actual co-occurrence rate to
expected co-occurrence rate, is one such prediction
measure. Two phrases are related where the prediction
measure exceeds a predetermined threshold. In that case,
the second phrase has significant information gain with
respect to the first phrase.”
- Phrase-based searching in an information retrieval
system, granted 2009 (link)
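The quoted definition translates directly into arithmetic. In the sketch below, the actual co-occurrence rate is the share of documents containing both phrases, and the expected rate is what you would get if the two phrases appeared independently (the product of their individual document shares). The toy corpus and the 1.5 threshold are assumptions; the quoted patent text only says the threshold is predetermined.

```python
def information_gain(docs, phrase_j, phrase_k):
    """Ratio of actual to expected co-occurrence rate of two phrases.

    docs is a list of sets, each holding the phrases found in one document.
    """
    total = len(docs)
    has_j = sum(phrase_j in doc for doc in docs)
    has_k = sum(phrase_k in doc for doc in docs)
    if has_j == 0 or has_k == 0:
        return 0.0
    both = sum(phrase_j in doc and phrase_k in doc for doc in docs)
    actual = both / total                         # actual co-occurrence rate
    expected = (has_j / total) * (has_k / total)  # rate if independent
    return actual / expected


RELATED_THRESHOLD = 1.5  # assumed value; the patent text says "predetermined"


def are_related(docs, phrase_j, phrase_k, threshold=RELATED_THRESHOLD):
    return information_gain(docs, phrase_j, phrase_k) > threshold


# Hypothetical corpus: the phrases found in each of eight documents.
corpus = [
    {"green tea", "matcha"},
    {"green tea", "matcha", "caffeine"},
    {"green tea", "antioxidants"},
    {"coffee", "caffeine"},
    {"coffee", "espresso"},
    {"coffee", "caffeine"},
    {"matcha", "green tea"},
    {"espresso", "caffeine"},
]
print(information_gain(corpus, "green tea", "matcha"))  # 2.0
print(information_gain(corpus, "coffee", "caffeine"))   # roughly 1.33
```

“Green tea” and “matcha” co-occur twice as often as chance predicts, so they clear the threshold; “coffee” and “caffeine” co-occur more than chance but not enough to count as related under this assumed cut-off.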
So, basically, it’s
quantifying to what
degree you talk about all
the topics Google sees as
related to your main
subject.
If information gain plays such a
strong role in which content
Google chooses to show, why
do so few folks talk about it?
https://patents.google.com/patent/US7962462B1/en
So what do we do about all this?
When is the last time
you’ve done a full
content inventory?
What I mean when I say content inventory
https://www.portent.com/onetrick
Redo keyword research and overlay
entities
● Pull content for at least the top 10 search results
ranking for your target keyword
● Dump them into Diffbot (https://demo.nl.diffbot.com/) or
the Natural Language AI demo
(https://cloud.google.com/natural-language)
● Note the entities and salience
● Run your target page
● Understand the differences
● Update your content accordingly
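Once you have noted the entities and salience scores for each page, comparing them is simple bookkeeping. A minimal sketch, assuming each page’s results are typed up as a dict of entity to salience (the values below are invented) and that an entity counts as a gap when at least two competitors mention it but your page does not:

```python
from collections import defaultdict


def entity_gaps(competitor_pages, target_page, min_pages=2):
    """Entities mentioned by at least min_pages competitors but absent from the target.

    Each page is a dict of entity name -> salience (0 to 1). Returns
    (entity, average salience across competitors) pairs, highest first.
    """
    saliences = defaultdict(list)
    for page in competitor_pages:
        for entity, salience in page.items():
            saliences[entity].append(salience)
    gaps = [
        (entity, sum(values) / len(values))
        for entity, values in saliences.items()
        if len(values) >= min_pages and entity not in target_page
    ]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Invented salience values, typed up from an entity-analysis tool.
competitors = [
    {"green tea": 0.6, "matcha": 0.2, "Japan": 0.1},
    {"green tea": 0.5, "matcha": 0.3, "caffeine": 0.1},
    {"green tea": 0.7, "Japan": 0.2, "caffeine": 0.05},
]
target = {"green tea": 0.8, "caffeine": 0.1}
for entity, salience in entity_gaps(competitors, target):
    print(f"{entity}: {salience:.2f}")
```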
Start with keyword research, find co-occurring terms
● Pull content for at least the top 10 search results
ranking for your target keyword
● Look at TF-IDF calculators to reverse engineer the topic
correlation (Ryte has a paid one)
● Note the terms included
● Run your target page
● Understand the differences
● Update your content accordingly
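TF-IDF calculators differ in their details, so the following is only a sketch of the general idea. It assumes IDF is computed against a separate background set of pages (so that terms shared by every top-ranking page are not zeroed out) using the smoothed formula ln((1 + n) / (1 + df)) + 1, and it lists terms the competing pages weight highly that the target page never uses. All documents are hypothetical token lists.

```python
import math
from collections import Counter


def smooth_idf(background_docs):
    """Return idf(term) = ln((1 + n) / (1 + df)) + 1 over a background corpus."""
    n = len(background_docs)
    doc_freq = Counter(term for doc in background_docs for term in set(doc))
    return lambda term: math.log((1 + n) / (1 + doc_freq[term])) + 1


def term_gaps(competitor_docs, target_doc, idf):
    """Average TF-IDF of competitor terms that never appear in the target page."""
    totals = Counter()
    for doc in competitor_docs:
        for term, count in Counter(doc).items():
            totals[term] += (count / len(doc)) * idf(term)
    target_terms = set(target_doc)
    gaps = {t: w / len(competitor_docs) for t, w in totals.items() if t not in target_terms}
    return sorted(gaps.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-tokenised pages.
background = [
    ["tea", "shop", "price"],
    ["coffee", "shop", "price"],
    ["tea", "history"],
    ["garden", "tools"],
]
top_pages = [
    ["matcha", "whisk", "tea", "matcha"],
    ["matcha", "tea", "ceremonial"],
    ["tea", "whisk", "bowl"],
]
my_page = ["tea", "price", "shop"]

for term, weight in term_gaps(top_pages, my_page, smooth_idf(background)):
    print(f"{term}: {weight:.3f}")
```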
Break old content habits
● FAQ on product pages
● Consolidate super-granularly targeted blog articles
● Think outside of the blog folder: the semantic
relationship can carry through to the directory structure of
the website as well
● Internal linking can be a secret weapon
● Fit content to purpose: not everything needs a 3,000-word
in-depth article
Measure what really
matters to the business
— traffic and revenue
from organic.
Who tf am I?
Amanda King is a human
● Over a decade in the
SEO industry
● Traveled to 40+
countries
● Business- and
product-focussed
● Knows CRO, Data,
UX
● Always open to
learning something
new
● Slightly obsessed
with tea
Thank you
Amanda King
t. @amandaecking
i. @floq.co / @amandaecking
w. floq.co

Editor's Notes

  • #5: This is a lot of information and I don’t have all the answers. I’ve done a lot of patent diving, so if things get dry, I apologise. You can do a shot every time I say “system” or “entity”.
  • #7: https://status.search.google.com/ Crawling, indexing, ranking, serving
  • #11: I may
  • #12: Google is vector based: If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x.
  • #13: Originally filed in 2005, granted in 2010: https://patents.google.com/patent/US7702618B1/en (Google really started to become popular in 2000). Essentially, it discusses how they would build their knowledge graph. Indexing system: 1) identification of phrases and related phrases, 2) indexing of documents with respect to phrases, 3) generation and maintenance of a phrase-based taxonomy. A co-occurrence matrix for the good phrases is maintained.
  • #14: If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x. The third stage of the indexing operation is to prune the good phrase list using a predictive measure derived from the co-occurrence matrix. Unlike existing systems which use predetermined or hand-selected phrases, the good phrase list reflects phrases that actually are being used in the corpus. Further, since the above process of crawling and indexing is repeated periodically as new documents are added to the document collection, the indexing system automatically detects new phrases as they enter the lexicon. The next step is to determine which related phrases together form a cluster of related phrases. A cluster is a set of related phrases in which each phrase has high information gain with respect to at least one other phrase. In one embodiment, clusters are identified as follows. “First, rather than a strictly—and often arbitrarily—defined hierarchy of topics and concepts, this approach recognizes that topics, as indicated by related phrases, form a complex graph of relationships, where some phrases are related to many other phrases, and some phrases have a more limited scope, and where the relationships can be mutual (each phrase predicts the other phrase) or one-directional (one phrase predicts the other, but not vice versa). The result is that clusters can be characterized “local” to each good phrase, and some clusters will then overlap by having one or more common related phrases.” “The indexing of documents by phrases and use of the clustering information provides yet another advantage of the indexing system, which is the ability to determine the topics that a document is about based on the related phrase information.”
  • #16: There’s also PaLM, CALM and LaMDA (one Google engineer even claimed LaMDA was sentient).
  • #18: This is where content analysis is included
  • #19: BERT comes in during the topic-modelling phase; it’s not the entirety of the indexing process. Define corpus: the documents on the internet that Google can crawl.
  • #20: Remember, natural language processing is not unique to Google. It’s an entire branch of AI and computational linguistics, with whole fields of research dedicated to it.
  • #22: The semantic distance between two words can be estimated as the length of the shortest path (the number of edges) connecting them in a word graph.
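Graph-based semantic distance can be made concrete with a breadth-first search over a tiny, hand-made word graph. The sketch counts path length in edges (hops), so directly linked words are at distance 1; counting only the intermediate vertices would give one less. The graph and the `semantic_distance` name are hypothetical. Real systems use resources such as WordNet or learned embeddings.

```python
from collections import deque
from typing import Optional


def semantic_distance(graph: dict[str, list[str]], start: str, goal: str) -> Optional[int]:
    """Number of edges on the shortest path between two words, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbour in graph.get(word, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical undirected toy graph: each edge is listed in both directions
graph = {
    "dog": ["canine", "pet"],
    "canine": ["dog", "wolf"],
    "pet": ["dog", "cat"],
    "wolf": ["canine"],
    "cat": ["pet"],
}
print(semantic_distance(graph, "dog", "wolf"))  # 2
print(semantic_distance(graph, "cat", "wolf"))  # 4
```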
  • #23: Tokenisation is essentially converting a sentence into “tokens”, turning an unstructured string into elements that a machine-learning model can process. Google has found shortcuts in the tokenisation used for BERT through predictive modelling, matching and skipping, making tokenisation about 5x faster than previous approaches.
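BERT's WordPiece tokenisation splits each word into the longest vocabulary pieces it can find and marks word-internal pieces with `##`. The sketch shows that classic greedy longest-match-first approach on a tiny, made-up vocabulary. It is not Google's faster implementation. It only illustrates the baseline process that the shortcuts speed up. The vocabulary and the `wordpiece` name are hypothetical.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece tokenisation of a single word."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary
vocab = {"token", "##isation", "##is", "##ation", "un", "##related"}
print(wordpiece("tokenisation", vocab))  # ['token', '##isation']
print(wordpiece("unrelated", vocab))     # ['un', '##related']
print(wordpiece("xyz", vocab))           # ['[UNK]']
```

The inner loop re-checks many substrings for every position. That repeated matching is the kind of work a faster tokeniser tries to skip.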
  • #25: Popularity score: search-history frequency, click-through rate, dwell time. Diversity score: based on how similar the unranked document is to the documents that have already been ranked.
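One well-known way to trade a popularity score off against diversity is a greedy, MMR-style (maximal marginal relevance) re-ranking. Each step picks the document with the best popularity minus a penalty for its similarity to documents already ranked. This is a swapped-in textbook technique used for illustration, not the patent's formula. The popularity values, the Jaccard similarity and the `lam` weight are all assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two documents represented as term sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(docs: dict[str, set[str]], popularity: dict[str, float],
           lam: float = 0.7) -> list[str]:
    """Greedy MMR-style ordering: popularity minus max similarity to ranked docs."""
    ranked: list[str] = []
    remaining = set(docs)
    while remaining:
        def score(d: str) -> float:
            diversity_penalty = max(
                (jaccard(docs[d], docs[r]) for r in ranked), default=0.0
            )
            return lam * popularity[d] - (1 - lam) * diversity_penalty
        # Iterating in sorted order makes ties go to the alphabetically first id
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical documents and popularity scores
docs = {
    "a": {"seo", "content", "google"},
    "b": {"seo", "content", "google", "bert"},  # near-duplicate of "a"
    "c": {"entities", "salience"},
}
popularity = {"a": 0.9, "b": 0.85, "c": 0.6}
print(rerank(docs, popularity))  # ['a', 'c', 'b']
```

Even though "b" is more popular than "c", it drops below "c" because it adds little that "a" has not already covered.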
  • #32: Based on historical behaviour from similar searches in aggregate (application): “The system may also comprise a profile database that stores profiles associated with specific remote devices for use by the results ranker in ordering the categories. In addition, the system may comprise a relevance filter that stores data about other search queries received from other remote devices, the data including distributions of previously determined correlations between the other search queries and one or more different categories of information.”
    ○ Based on your own previous searches (link)
    ○ How quickly you went from choosing one result to another
    ○ Whether or not you go back to the same source multiple times over time
    ○ Whether you choose a particular result more than the general population does
    ○ Your declared demographics
    ○ Your declared location (link)
    ○ If you’ve made a bunch of the same type of search (weather in Britain, weather in Spain): “sibling scores” (link)
    ○ Whether or not it should directly provide the answer via the Knowledge Graph (link)
    ○ Whether or not it should return a zero result with a quick fact (link)
    ○ Whether text or another presentation of the information makes sense (link)
    ○ Whether or not to return a “card”, like for movies showing at a particular theatre (link)
  • #35: Raising the information-gain threshold above 1.0 reduces the chance that two otherwise unrelated phrases are treated as related merely because they co-occur slightly more often than random chance predicts.
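The information gain being thresholded can be sketched as a ratio. It compares how often two phrases actually co-occur with how often they would co-occur by chance. This follows the patent's general idea as we read it. The document-level counting, the example numbers and the 1.5 default threshold are assumptions for illustration.

```python
def information_gain(n_j: int, n_k: int, n_jk: int, total_docs: int) -> float:
    """Actual vs expected co-occurrence of phrases j and k (document counts).

    actual rate   A = n_jk / T
    expected rate E = (n_j / T) * (n_k / T)   (if the phrases were independent)
    A / E simplifies to n_jk * T / (n_j * n_k), which avoids float rounding.
    """
    return (n_jk * total_docs) / (n_j * n_k)


def related(n_j: int, n_k: int, n_jk: int, total_docs: int,
            threshold: float = 1.5) -> bool:
    """Treat two phrases as related only if they co-occur well above chance.

    The 1.5 default is an illustrative assumption, not a confirmed setting.
    """
    return information_gain(n_j, n_k, n_jk, total_docs) > threshold


# Hypothetical corpus: 1,000 documents, each phrase appears in 100 of them,
# so chance alone predicts about 10 documents containing both.
print(information_gain(100, 100, 10, 1000))        # 1.0
print(information_gain(100, 100, 11, 1000))        # 1.1
print(related(100, 100, 11, 1000, threshold=1.0))  # True: a little noise counts
print(related(100, 100, 11, 1000))                 # False
print(related(100, 100, 40, 1000))                 # True
```

With a threshold of exactly 1.0, one extra shared document out of a thousand is enough to link two phrases. A higher threshold requires a clear margin above chance.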
  • #36: I don’t have the answer for you there; I just like posing rhetorical questions.
  • #40: This process is manual, but hopefully before the end of the financial year I’ll have a more automated process you can steal. What is entity salience? Entity salience refers to the prominence of an entity within the content. Entity research and entity salience tell you what the people who are ranking are talking about; co-occurring terms tell you what Google expects folks to talk about. Sometimes there’s a gap.
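Until that automated process exists, the gap check could be scripted roughly as follows. The sketch assumes you already have per-page entity salience scores between 0 and 1, for example exported from an entity-analysis tool, plus a list of co-occurring terms. The data, function names and the 0.05 cut-off are hypothetical.

```python
from collections import defaultdict


def average_salience(pages: list[dict[str, float]]) -> dict[str, float]:
    """Average each entity's salience across ranking pages (absent counts as 0)."""
    totals: dict[str, float] = defaultdict(float)
    for page in pages:
        for entity, salience in page.items():
            totals[entity] += salience
    return {entity: total / len(pages) for entity, total in totals.items()}


def salience_gap(pages: list[dict[str, float]], cooccurring_terms: list[str],
                 min_salience: float = 0.05) -> list[str]:
    """Co-occurring terms that the ranking pages barely talk about."""
    avg = average_salience(pages)
    return sorted(t for t in cooccurring_terms if avg.get(t, 0.0) < min_salience)


# Hypothetical salience scores for two ranking pages
ranking_pages = [
    {"running shoes": 0.62, "marathon": 0.21, "cushioning": 0.10},
    {"running shoes": 0.70, "cushioning": 0.18},
]
cooccurring = ["cushioning", "heel drop", "marathon", "pronation"]
print(salience_gap(ranking_pages, cooccurring))  # ['heel drop', 'pronation']
```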
  • #41: Google uses TF-IDF to assign terms to an entity, amongst many other things (https://patents.google.com/patent/US8589399B1/en). So why don’t we use TF-IDF to reverse-engineer that? This isn’t about keyword density.
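A minimal TF-IDF sketch over a toy corpus shows the kind of "distinctive term" signal being reverse-engineered. It uses term frequency normalised by document length and a plain log(N / df) inverse document frequency. Real tools, and whatever Google actually does, use different weighting and smoothing, so treat the numbers as illustrative. The corpus and the `tf_idf` name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each tokenised document by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


# Hypothetical corpus of three short, pre-tokenised documents
corpus = [
    "trail running shoes for trail runners".split(),
    "running shoes for the road".split(),
    "road cycling shoes".split(),
]
top = max(tf_idf(corpus)[0].items(), key=lambda kv: kv[1])
print(top[0])  # trail
```

A term that appears in every document (here, "shoes") scores zero however often it is repeated. That is why TF-IDF points at distinctive terms rather than keyword density.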
  • #42:
    ○ Adding FAQs (ongoing): leading indicators are strong, with product pages that have FAQs losing 83% less organic traffic YoY than the overall product category (-1.7% vs -10% YoY)
    ○ Blog consolidation: redirected about 60% of blog content while maintaining traffic parity with overall organic traffic to the website, a win for the business (less overhead)
    ○ Thinking outside the blog folder: Optus, 24% uplift in conversion when content was part of the user journey
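The 83% figure appears to come from comparing the size of the two year-on-year declines. This is a worked check, assuming both percentages are YoY changes in organic traffic:

```latex
\frac{\lvert -10\% \rvert - \lvert -1.7\% \rvert}{\lvert -10\% \rvert}
  = \frac{10 - 1.7}{10} = 0.83
```

In other words, the pages with FAQs declined by 83% less than the product category as a whole.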