How to automate all
your SEO projects
@VincentTerrasi
OVH
Planning
• Each Day :
• Advanced Reporting
• Anomalies Detection
• Log Analysis
• Webperf with SiteSpeed.io
• Each Week :
• Ranking monitoring
• Opportunities Detection
• Hot Topic Detection
• Each Quarter :
• Semantic Analysis
Time is precious
Automate
everything
1. RStudio Server
2. Shiny Server
3. Jupyter Notebook
4. Dataiku
5. OpenSource
[Diagram] Components: Docker, RStudio Server, Shiny Server, Dataiku, DataLake. R packages: searchConsoleR, ATinternetR, oncrawlR. Outputs: scheduled email, notebook, data API, Shiny apps, dataviz, reports.
1. RStudio Server
Automate all your SEO projects
Why R ?
Scriptable
Big Community
Mac / PC / Unix
Open Source
Free
 10 000 packages
Rgui
WheRe ? How ?
Rstudio
https://www.cran.r-project.org
RStudio Server
OVH – Instance Cloud
• Docker on Ubuntu 16.04 Server
• From the docker window, run:
• sudo docker run -d -p 8787:8787 rocker/rstudio
• Visit your server's address in a browser, e.g. http://yourIP:8787, and you should be greeted by the RStudio
welcome screen.
Log in using:
• username: rstudio
• password: rstudio
RStudio Server - Install
• install.packages("httr")
• install.packages("RCurl")
• install.packages("stringr")
• install.packages("stringi")
• install.packages("openssl")
• install.packages("Rmpi")
• install.packages("doMPI")
R – Scraper – Packages
R – Scraper – RCurl
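library(RCurl)
seocrawler <- function(url) {
  # mobile user agent, so the server returns the page it serves to smartphones
  useragent <- paste(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko)",
    "Version/6.0 Mobile/10A5376e Safari/8536.25")
  # h collects the response headers while the body downloads;
  # code that reads h$value(NULL) has to run inside this function, where h is in scope
  h <- basicTextGatherer()
  html <- getURL(url,
    followlocation = TRUE,
    ssl.verifypeer = FALSE,
    httpheader = c('User-Agent' = useragent),
    headerfunction = h$update)
  return(html)
}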
R – Scraper – Header
ind0 <- grep("HTTP/",h$value(NULL))
df$StatusCode <- tail(h$value(NULL)[ind0],1)
ind1 <- grep("^Content-Type",h$value(NULL))
df$ContentType <- gsub("Content-Type:","",tail(h$value(NULL)[ind1],1))
ind2 <- grep("Last-Modified",h$value(NULL))
df$LastModified <- gsub("Last-Modified:","",tail(h$value(NULL)[ind2],1))
ind3 <- grep("Content-Language",h$value(NULL))
df$ContentLanguage <- gsub("Content-Language:","",tail(h$value(NULL)[ind3],1))
ind4 <- grep("Location",h$value(NULL))
df$Location <- gsub("Location:","",tail(h$value(NULL)[ind4],1))
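The header extraction above can be tried without any network call. The following is a minimal sketch in base R, assuming that `h$value(NULL)` yields one string per header line; the sample `headers` vector and the helper `last_header()` are hypothetical and only illustrate the "keep the last matching line" logic after a redirect.

```r
# Hypothetical sample of what h$value(NULL) might contain after one redirect.
headers <- c(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Location: https://www.example.com/\r\n",
  "\r\n",
  "HTTP/1.1 200 OK\r\n",
  "Content-Type: text/html; charset=UTF-8\r\n",
  "Content-Language: fr\r\n",
  "\r\n"
)

# Return the value of the last header line with the given name, or NA if absent.
last_header <- function(lines, name) {
  pattern <- paste0("^", name, ":")
  hits <- grep(pattern, lines, ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) return(NA_character_)
  trimws(sub(pattern, "", tail(hits, 1), ignore.case = TRUE))
}

# The last status line is the one of the final response in the redirect chain.
status_line <- trimws(tail(grep("^HTTP/", headers, value = TRUE), 1))

df <- data.frame(
  StatusCode = status_line,
  ContentType = last_header(headers, "Content-Type"),
  ContentLanguage = last_header(headers, "Content-Language"),
  Location = last_header(headers, "Location"),
  stringsAsFactors = FALSE
)
```

Anchoring the pattern at the start of the line keeps `Location` from also matching a `Content-Location` header, and `trimws()` drops the leading space and trailing line break.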
R – Scraper – Xpath
library(XML)
doc <- htmlParse(html, asText=TRUE, encoding="UTF-8")
# keep the first match for each element
H1 <- head(xpathSApply(doc, "//h1", xmlValue),1)
H2 <- head(xpathSApply(doc, "//h2", xmlValue),1)
robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'),1)
canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'),1)
# href of every link on the page
DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')
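Once the `href` values are extracted, a common next step is to separate internal from external links. The sketch below is one possible way to do it in base R; `classify_links()` is a hypothetical helper, not part of the slides, and it assumes that only an exact host match counts as internal (subdomains are treated as external). If the `//a` extraction returns a list, pass `unlist()` of it.

```r
# Hypothetical helper: label each href as internal, external, or other.
classify_links <- function(hrefs, site_host) {
  hrefs <- hrefs[!is.na(hrefs)]
  # absolute URLs, including protocol-relative ones such as //cdn.example.org/x.js
  is_absolute <- grepl("^(https?:)?//", hrefs, ignore.case = TRUE)
  host <- ifelse(
    is_absolute,
    tolower(sub("^(https?:)?//([^/?#:]+).*$", "\\2", hrefs, ignore.case = TRUE)),
    NA_character_
  )
  # links that do not lead to a crawlable page
  is_other <- grepl("^(mailto:|tel:|javascript:|#)", hrefs, ignore.case = TRUE)
  type <- ifelse(
    is_other, "other",
    ifelse(is_absolute & host != tolower(site_host), "external", "internal")
  )
  data.frame(href = hrefs, type = type, stringsAsFactors = FALSE)
}

links <- classify_links(
  c("/cruises", "https://www.example.com/a", "https://other.org/",
    "mailto:x@example.com", "//cdn.other.org/x.js"),
  site_host = "www.example.com"
)
```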
How-to go parallel in R
R – Scraper – OpenMpi
• MPI : Message Passing Interface is a specification for an API for passing
messages between different computers.
• Programming with MPI
• Difficult, because the Rmpi package defines about 110 R functions
• Needs a parallel programming system to do the actual work in parallel
• The doMPI package acts as an adaptor to the Rmpi package, which in
turn is an R interface to an implementation of MPI
• Very easy to install Open MPI, and Rmpi on Debian / Ubuntu
• You can test with one computer
R – Scraper – Install OpenMPI
# shell, on a yum-based distribution such as CentOS
sudo yum install openmpi openmpi-devel openmpi-libs
sudo ldconfig /usr/lib64/openmpi/lib/
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/"
# R
install.packages("Rmpi",
configure.args =
c("--with-Rmpi-include=/usr/include/openmpi-x86_64/",
"--with-Rmpi-libpath=/usr/lib64/openmpi/lib/",
"--with-Rmpi-type=OPENMPI"))
install.packages("doMPI")
R – Scraper – Test doMpi
library(doMPI)
#start your cluster
cl <- startMPIcluster(count=20)
registerDoMPI(cl)
#
max <- dim(mydataset)[1]
x <- foreach(i=1:max, .combine="rbind") %dopar% seocrawlerThread(mydataset,i)
#close your cluster
closeCluster(cl)
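The `foreach` call expects a worker that takes the dataset and a row index and returns one row, which `.combine="rbind"` then stacks. A minimal sequential sketch of that contract in base R follows; the body of `seocrawlerThread()` here is a hypothetical stand-in (the real one is not shown in the slides) and `mydataset` is sample data.

```r
# Hypothetical stand-in for seocrawlerThread(): one input row in, one result row out.
seocrawlerThread <- function(dataset, i) {
  url <- dataset$url[i]
  data.frame(url = url, url_length = nchar(url), stringsAsFactors = FALSE)
}

mydataset <- data.frame(
  url = c("https://www.example.com/", "https://www.example.com/cruises"),
  stringsAsFactors = FALSE
)

# Sequential equivalent of foreach(...) %dopar% with .combine = "rbind":
# run the worker on every row, then stack the one-row results.
rows <- lapply(seq_len(nrow(mydataset)), function(i) seocrawlerThread(mydataset, i))
x <- do.call(rbind, rows)
```

Checking the worker this way on a few rows is a cheap test before starting an MPI cluster, since every worker has to return a data frame with the same columns for `rbind` to succeed.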
• Venn Matrix :
http://blog.mrbioinfo.com/
R – Semantic Analysis – Intro
R – Semantic Analysis – Data
R – Semantic Analysis – eVenn
evenn(pathRes="./eVenn/", matLists=all.the.data, annot=FALSE, CompName="croisiere")
R – Semantic Analysis – Filter
fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt"
#read csv
DF <- read.csv(fichierVenn, sep = "\t", encoding="CP1252", stringsAsFactors=FALSE)
#find
DF_PotentialKeywords <- subset(DF, DF$Total_lists >= 4 & DF$planete.croisiere.com==0 )
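To see what the filter keeps, here is a small sketch on made-up data. It assumes the binary Venn matrix has one 0/1 column per site and that `Total_lists` counts the sites ranking for the keyword; the competitor column names and the keywords are hypothetical.

```r
# Hypothetical extract of a binary Venn matrix: 1 = the site ranks for the keyword.
DF <- data.frame(
  Keywords = c("croisiere caraibes", "croisiere pas cher", "croisiere nil"),
  planete.croisiere.com = c(0, 1, 0),
  competitor.a = c(1, 1, 1),
  competitor.b = c(1, 1, 0),
  competitor.c = c(1, 1, 1),
  competitor.d = c(1, 1, 0),
  stringsAsFactors = FALSE
)
# Assumption: Total_lists is the number of sites that rank for the keyword.
DF$Total_lists <- rowSums(DF[, -1])

# Keywords where at least 4 sites rank but planete.croisiere.com does not.
DF_PotentialKeywords <- subset(DF, Total_lists >= 4 & planete.croisiere.com == 0)
```

Only the first keyword survives: four competitors rank for it and the audited site does not, which makes it a candidate opportunity.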
R – Semantic Analysis – nGram
library(text2vec)
library(dplyr)
it <- itoken(DF_PotentialKeywords[['Keywords']],
preprocess_function = tolower,
tokenizer = word_tokenizer,
progressbar = FALSE)
# 2 and 3 grams
vocab <- create_vocabulary(it, ngram = c(2L, 3L))
# vocab$vocab, terms and terms_counts are the names used by the text2vec release of the time; newer releases may differ
DF_SEO_vocab <- data.frame(vocab$vocab)
DF_SEO_select <- data.frame(word=DF_SEO_vocab$terms,
freq=DF_SEO_vocab$terms_counts) %>%
arrange(-freq) %>%
top_n(30, freq)
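For readers who want to see what counting 2-grams and 3-grams does, here is a base R sketch that needs no package. `count_ngrams()` is a hypothetical helper, not the text2vec implementation; it splits on whitespace only, whereas `word_tokenizer` is more thorough, and it joins words with `_`.

```r
# Count word n-grams of the given sizes across a character vector of keywords.
count_ngrams <- function(keywords, sizes = c(2L, 3L)) {
  grams <- unlist(lapply(tolower(keywords), function(keyword) {
    words <- strsplit(trimws(keyword), "\\s+")[[1]]
    words <- words[nzchar(words)]
    unlist(lapply(sizes, function(n) {
      if (length(words) < n) return(character(0))
      starts <- seq_len(length(words) - n + 1)
      # slide a window of n words over the keyword
      vapply(starts, function(s) paste(words[s:(s + n - 1)], collapse = "_"),
             character(1))
    }))
  }))
  if (length(grams) == 0) {
    return(data.frame(word = character(0), freq = integer(0), stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts), stringsAsFactors = FALSE)
}

ngrams <- count_ngrams(c("croisiere pas cher", "croisiere pas cher mediterranee"))
```

The most frequent n-grams across the competitor keywords point to topics the site does not yet cover.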
• dplyr
• readxl
• searchConsoleR
• googleAuthR
• googleAnalyticsR
R – Packages SEO
Thanks to Mark Edmondson
R – SearchConsoleR
library(googleAuthR)
library(searchConsoleR)
# get your client ID and client secret from the Google API console
options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com")
options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX")
## change this to the website you want to download data for. Include http
website <- "https://data-seo.fr"
## data is reliably available in Search Console after 3 days, so we download from then
## today - 3 days
start <- Sys.Date() - 3
## one day's data, but change it as needed
end <- Sys.Date() - 3
R – SearchConsoleR
## what to download, choose between date, query, page, device, country
download_dimensions <- c('date','query')
## what type of Google search, choose between 'web', 'video' or 'image'
type <- c('web')
## Authorize script with Search Console.
## First time you will need to login to Google but should auto-refresh after that so can be put in
## Authorize script with an account that has access to website.
googleAuthR::gar_auth()
## first time stop here and wait for authorisation
## get the search analytics data
data <- search_analytics(siteURL = website, startDate = start, endDate = end, dimensions =
download_dimensions, searchType = type)
• Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax)
• MIN Minute field 0 to 59
• HOUR Hour field 0 to 23
• DOM Day of Month 1-31
• MON Month field 1-12
• DOW Day Of Week 0-6
• CMD Command Any command to be executed.
• $ crontab -e
• Run the R script filePath.R at 23:15 for every day of the year :
15 23 * * * Rscript filePath.R
R – CronTab – Method 1
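The field order of a crontab entry is easy to get wrong, since the minute comes before the hour. As a small illustration, the hypothetical helper below builds the daily entry from an hour, a minute and a script path; it is only a sketch and does not install anything in the crontab.

```r
# Build a crontab entry that runs an R script every day at a given time.
daily_cron_line <- function(hour, minute, script) {
  stopifnot(hour %in% 0:23, minute %in% 0:59)
  # field order: MIN HOUR DOM MON DOW CMD
  sprintf("%d %d * * * Rscript %s", as.integer(minute), as.integer(hour), script)
}

entry <- daily_cron_line(hour = 23, minute = 15, script = "filePath.R")
```

The resulting string can be pasted into the editor opened by `crontab -e`.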
• R Package : https://github.com/bnosac/cronR
R – Cron – Method 2
library(cronR)
# cmd is the command to schedule, built here from the script path used in Method 1
cmd <- cron_rscript("filePath.R")
cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20',
days_of_week = c(1, 2))
cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20')
cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20',
days_of_week = c(0, 3, 5))
OR
Automated Reports
2. Shiny Server
Creating webapps with R
Shiny Server - Why
Shiny Server – Where and How
• ShinyApps.io
• A local server
• Hosted on your server
• docker run --rm -p 3838:3838 \
-v /srv/shinyapps/:/srv/shiny-server/ \
-v /srv/shinylog/:/var/log/ \
rocker/shiny
• If you have an app in /srv/shinyapps/appdir, you can run the app
by visiting http://yourIP:3838/appdir/.
Shiny Server - Install
Shiny – ui.R
fluidPage(
titlePanel("Compute your internal pagerank"),
sidebarLayout(
sidebarPanel(
a("data-seo.com", href="https://data-seo.com"),
tags$hr(),
p('Step 1 : Export your outlinks data from ScreamingFrog'),
fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)',
accept = c('text/csv'), multiple = FALSE
),
tags$hr(),
downloadButton('downloadData', 'Download CSV')
),
mainPanel(
h3(textOutput("caption")),
tags$hr(),
tableOutput('contents')
)
)
)
Shiny – server.R
function(input, output, session) {
....
output$contents <- renderTable({
if (!is.null(input$file1)) {
inFile <- input$file1
logsSummary <- importLogs(inFile$datapath)
logsSummary
}
})
output$downloadData <- downloadHandler(
filename = "extract.csv",
content = function(file) {
if (!is.null(input$file1)) {
inFile <- input$file1
logsSummary <- importLogs(inFile$datapath)
write.csv2(logsSummary,file, row.names = FALSE)
}
}
)
}
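The app's title refers to internal PageRank computed from an outlinks export, but the helper behind `importLogs()` is not shown. As an illustration of what such a computation could look like, here is a minimal power-iteration sketch in base R. The function name `internal_pagerank()`, the column names `Source` and `Destination`, the damping factor of 0.85 and the iteration count are all assumptions, not the author's implementation.

```r
# Hypothetical helper: PageRank by power iteration over an edge list of links.
internal_pagerank <- function(links, damping = 0.85, iterations = 50) {
  source <- as.character(links$Source)
  destination <- as.character(links$Destination)
  pages <- sort(unique(c(source, destination)))
  n <- length(pages)
  from <- match(source, pages)
  to <- match(destination, pages)
  outdegree <- tabulate(from, nbins = n)
  rank <- rep(1 / n, n)
  for (step in seq_len(iterations)) {
    # each page shares its rank equally among its outlinks
    share <- rank[from] / outdegree[from]
    received <- as.numeric(tapply(share, factor(to, levels = seq_len(n)), sum))
    received[is.na(received)] <- 0
    # rank held by pages without outlinks is spread over all pages
    dangling <- sum(rank[outdegree == 0])
    rank <- (1 - damping) / n + damping * (received + dangling / n)
  }
  data.frame(Page = pages, PageRank = rank, stringsAsFactors = FALSE)
}

links <- data.frame(
  Source = c("/a", "/b", "/c", "/"),
  Destination = c("/", "/", "/", "/a"),
  stringsAsFactors = FALSE
)
pr <- internal_pagerank(links)
```

In this toy graph every page links to the home page, so `/` ends up with the highest score, and the scores sum to 1.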
https://mark.shinyapps.io/GA-dashboard-demo
Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo
• Interactive trend graphs.
• Auto-updating Google Analytics data.
• Zoomable day-of-week heatmaps.
• Top Level Trends via Year on Year, Month on Month
and Last Month vs Month Last Year data modules.
• A MySQL connection for data blending your own data with GA data.
• An easy upload option to update a MySQL database.
• Analysis of the impact of marketing events via Google's CausalImpact.
• Detection of unusual time-points using Twitter's Anomaly Detection.
Shiny – Use case
Automated KPI reporting
3. Jupyter Notebook
Sharing source code with your SEO team
Jupyter Notebook Example
• Reproducibility
• Quality
• Discoverability
• Learning
Jupyter Notebook – Why ?
Step 1 — Installing Python 2.7 and Pip
$ sudo apt-get update
$ sudo apt-get -y install python2.7 python-pip python-dev
Step 2 — Installing Ipython and Jupyter Notebook
$ sudo apt-get -y install ipython ipython-notebook
$ sudo -H pip install jupyter
Step 3 — Running Jupyter Notebook
$ jupyter notebook
Jupyter Notebook Install
Notebook Example
• https://github.com/voltek62/RNotebook-SEO
• Semantic Analysis for SEO
• Scraper for SEO
Jupyter Notebook Examples
Process Validation
Documentation
4. Dataiku
Use AML to find the best algorithm
Automated Machine Learning
• Benchmarking
• Detecting Target Leakage
• Diagnostics
• Automation
$ adduser vincent sudo
$ sudo apt-get install default-jre
$ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz
$ tar xzf dataiku-dss-4.0.1.tar.gz
$ cd dataiku-dss-4.0.1
>> install all prerequisites
$ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java
>> install dataiku
$ ./installer.sh -d DATA_DIR -p 11000
$ DATA_DIR/bin/dss start
http://<your server address>:11000.
Dataiku- Install on Instance Cloud
Go to the DSS data dir
$ cd DATA_DIR
Stop DSS
$ ./bin/dss stop
Run the installation script
$ ./bin/dssadmin install-R-integration
$ ./bin/dss start
Dataiku- Install R
Install R Package
Use-Case :
Detect Featured
Snippet
• Get all your featured snippets with Ranxplorer
• Get the SERP for each keyword with Ranxplorer
• Use a homemade scraper to enrich the data :
• 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location'
• 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion'
• 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp'
• 'Wordcount' 'Size' 'ResponseTime'
• 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion'
• 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion'
• Use AML to find the important features
Dataiku : Featured Snippet
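The slides list the Length, Dist and IsQuestion features without defining them. The sketch below shows one plausible way to derive them in base R for a title, H1 or H2. `text_features()` is a hypothetical helper, and two choices are assumptions rather than the author's definitions: "Dist" is taken to be the edit distance between the keyword and the text, and "IsQuestion" to mean that the text ends with a question mark.

```r
# Hypothetical feature builder for one text field (Title, H1 or H2) against a keyword.
text_features <- function(keyword, text, prefix) {
  features <- data.frame(
    Length = nchar(text),
    # assumption: Dist is the edit distance between keyword and text, case-insensitive
    Dist = mapply(function(k, t) adist(k, t)[1, 1],
                  tolower(keyword), tolower(text), USE.NAMES = FALSE),
    # assumption: a text is a question when it ends with a question mark
    IsQuestion = grepl("\\?\\s*$", text),
    stringsAsFactors = FALSE
  )
  names(features) <- paste0(prefix, names(features))
  features
}

title_features <- text_features("croisiere", "Croisiere ?", prefix = "Title")
```

Calling it once per field with the prefixes `Title`, `H1` and `H2` and binding the columns together would give one row of model input per keyword and URL.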
Dataiku : Flow
Dataiku : Input / Output
Dataiku : Code Recipe
Dataiku : Visual Recipes
Dataiku : Plugin recipes
Dataiku : My Plugins
• SEMrush
• SearchConsole
• Majestic
• Visiblis [ongoing]
A DSS plugin is a zip file.
Inside DSS, click the top right gear → Administration → Plugins → Store.
https://github.com/voltek62/Dataiku-SEO-Plugins
Dataiku : AML
Dataiku : Import a project
• Learn from the success of others with AML
• Use all methods at your disposal to show Google you are the
answer to the question. ( Title, H1, H2, … )
Dataiku : Results
Automated Machine Learning
• Yes, you can because :
• Great advertising
• Get customers for specific features and trainings
Open Source & SEO ?
• Showing your work
• Attract talent
• Teaching the next generation
• Automated Reports with Rstudio Server
• Automated KPI reporting with Shiny Server
• Process Validation Documentation with Jupyter Notebook
• Automated Machine Learning with Dataiku
Take away
Now, machines can learn and adapt,
it is time to take advantage of the
opportunity to create new jobs.
Data-SEO, Data-Doctor, Data-Journalist …
Thank you!
Vincent Terrasi
@vincentterrasi
Get all my last discoveries and updates
Ad

More Related Content

What's hot (20)

Real-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet ElasticsearchReal-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet Elasticsearch
Alexei Gorobets
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
Itamar
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easy
Itamar
 
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
javier ramirez
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch plugins
Itamar
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Scrapy-101
Scrapy-101Scrapy-101
Scrapy-101
Snehil Verma
 
Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)
Federico Panini
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic search
Ivan Wallarm
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
Rahul K Chauhan
 
Data Exploration with Elasticsearch
Data Exploration with ElasticsearchData Exploration with Elasticsearch
Data Exploration with Elasticsearch
Aleksander Stensby
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
markstory
 
JSON REST API for WordPress
JSON REST API for WordPressJSON REST API for WordPress
JSON REST API for WordPress
Taylor Lovett
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
Robert Lujo
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
Jimmy Lai
 
A Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageA Survey of Elasticsearch Usage
A Survey of Elasticsearch Usage
Greg Brown
 
The JSON REST API for WordPress
The JSON REST API for WordPressThe JSON REST API for WordPress
The JSON REST API for WordPress
Taylor Lovett
 
Real-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet ElasticsearchReal-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet Elasticsearch
Alexei Gorobets
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
Itamar
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easy
Itamar
 
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
javier ramirez
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch plugins
Itamar
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)
Federico Panini
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic search
Ivan Wallarm
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
Rahul K Chauhan
 
Data Exploration with Elasticsearch
Data Exploration with ElasticsearchData Exploration with Elasticsearch
Data Exploration with Elasticsearch
Aleksander Stensby
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
markstory
 
JSON REST API for WordPress
JSON REST API for WordPressJSON REST API for WordPress
JSON REST API for WordPress
Taylor Lovett
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
Robert Lujo
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
Jimmy Lai
 
A Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageA Survey of Elasticsearch Usage
A Survey of Elasticsearch Usage
Greg Brown
 
The JSON REST API for WordPress
The JSON REST API for WordPressThe JSON REST API for WordPress
The JSON REST API for WordPress
Taylor Lovett
 

Similar to How to automate all your SEO projects (20)

[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
Leo Lorieri
 
Logstash
LogstashLogstash
Logstash
琛琳 饶
 
Make BDD great again
Make BDD great againMake BDD great again
Make BDD great again
Yana Gusti
 
Toolbox of a Ruby Team
Toolbox of a Ruby TeamToolbox of a Ruby Team
Toolbox of a Ruby Team
Arto Artnik
 
Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014
biicode
 
From development environments to production deployments with Docker, Compose,...
From development environments to production deployments with Docker, Compose,...From development environments to production deployments with Docker, Compose,...
From development environments to production deployments with Docker, Compose,...
Jérôme Petazzoni
 
TopicMapReduceComet log analysis by using splunk
TopicMapReduceComet log analysis by using splunkTopicMapReduceComet log analysis by using splunk
TopicMapReduceComet log analysis by using splunk
akashkale0756
 
2019 11-bgphp
2019 11-bgphp2019 11-bgphp
2019 11-bgphp
dantleech
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
Diego Freniche Brito
 
Spicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QASpicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QA
Alban Gérôme
 
Exploring Async PHP (SF Live Berlin 2019)
Exploring Async PHP (SF Live Berlin 2019)Exploring Async PHP (SF Live Berlin 2019)
Exploring Async PHP (SF Live Berlin 2019)
dantleech
 
Container (Docker) Orchestration Tools
Container (Docker) Orchestration ToolsContainer (Docker) Orchestration Tools
Container (Docker) Orchestration Tools
Dhilipsiva DS
 
Docker, c'est bonheur !
Docker, c'est bonheur !Docker, c'est bonheur !
Docker, c'est bonheur !
Alexandre Salomé
 
Server(less) Swift at SwiftCloudWorkshop 3
Server(less) Swift at SwiftCloudWorkshop 3Server(less) Swift at SwiftCloudWorkshop 3
Server(less) Swift at SwiftCloudWorkshop 3
kognate
 
Developing and Deploying PHP with Docker
Developing and Deploying PHP with DockerDeveloping and Deploying PHP with Docker
Developing and Deploying PHP with Docker
Patrick Mizer
 
Development Workflow Tools for Open-Source PHP Libraries
Development Workflow Tools for Open-Source PHP LibrariesDevelopment Workflow Tools for Open-Source PHP Libraries
Development Workflow Tools for Open-Source PHP Libraries
Pantheon
 
Puppi. Puppet strings to the shell
Puppi. Puppet strings to the shellPuppi. Puppet strings to the shell
Puppi. Puppet strings to the shell
Alessandro Franceschi
 
Time tested php with libtimemachine
Time tested php with libtimemachineTime tested php with libtimemachine
Time tested php with libtimemachine
Nick Galbreath
 
Parse cloud code
Parse cloud codeParse cloud code
Parse cloud code
維佋 唐
 
PSGI and Plack from first principles
PSGI and Plack from first principlesPSGI and Plack from first principles
PSGI and Plack from first principles
Perl Careers
 
[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
[EXTENDED] Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
Leo Lorieri
 
Make BDD great again
Make BDD great againMake BDD great again
Make BDD great again
Yana Gusti
 
Toolbox of a Ruby Team
Toolbox of a Ruby TeamToolbox of a Ruby Team
Toolbox of a Ruby Team
Arto Artnik
 
Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014
biicode
 
From development environments to production deployments with Docker, Compose,...
From development environments to production deployments with Docker, Compose,...From development environments to production deployments with Docker, Compose,...
From development environments to production deployments with Docker, Compose,...
Jérôme Petazzoni
 
TopicMapReduceComet log analysis by using splunk
TopicMapReduceComet log analysis by using splunkTopicMapReduceComet log analysis by using splunk
TopicMapReduceComet log analysis by using splunk
akashkale0756
 
2019 11-bgphp
2019 11-bgphp2019 11-bgphp
2019 11-bgphp
dantleech
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
Diego Freniche Brito
 
Spicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QASpicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QA
Alban Gérôme
 
Exploring Async PHP (SF Live Berlin 2019)
Exploring Async PHP (SF Live Berlin 2019)Exploring Async PHP (SF Live Berlin 2019)
Exploring Async PHP (SF Live Berlin 2019)
dantleech
 
Container (Docker) Orchestration Tools
Container (Docker) Orchestration ToolsContainer (Docker) Orchestration Tools
Container (Docker) Orchestration Tools
Dhilipsiva DS
 
Server(less) Swift at SwiftCloudWorkshop 3
Server(less) Swift at SwiftCloudWorkshop 3Server(less) Swift at SwiftCloudWorkshop 3
Server(less) Swift at SwiftCloudWorkshop 3
kognate
 
Developing and Deploying PHP with Docker
Developing and Deploying PHP with DockerDeveloping and Deploying PHP with Docker
Developing and Deploying PHP with Docker
Patrick Mizer
 
Development Workflow Tools for Open-Source PHP Libraries
Development Workflow Tools for Open-Source PHP LibrariesDevelopment Workflow Tools for Open-Source PHP Libraries
Development Workflow Tools for Open-Source PHP Libraries
Pantheon
 
Time tested php with libtimemachine
Time tested php with libtimemachineTime tested php with libtimemachine
Time tested php with libtimemachine
Nick Galbreath
 
Parse cloud code
Parse cloud codeParse cloud code
Parse cloud code
維佋 唐
 
PSGI and Plack from first principles
PSGI and Plack from first principlesPSGI and Plack from first principles
PSGI and Plack from first principles
Perl Careers
 
Ad

More from Vincent Terrasi (14)

SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...
SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...
SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...
Vincent Terrasi
 
IA générative : Menace ou Opportunité pour le SEO
IA générative : Menace ou Opportunité pour le SEOIA générative : Menace ou Opportunité pour le SEO
IA générative : Menace ou Opportunité pour le SEO
Vincent Terrasi
 
slides SEO CAMP'us Paris 2022 - Google et tools SEO On vous a menti
slides SEO CAMP'us Paris 2022 - Google et tools SEO  On vous a mentislides SEO CAMP'us Paris 2022 - Google et tools SEO  On vous a menti
slides SEO CAMP'us Paris 2022 - Google et tools SEO On vous a menti
Vincent Terrasi
 
Une IA pour votre SEO, une méthode inédite pour accélérer vos projets Data SEO
Une IA pour votre SEO, une méthode inédite pour accélérer vos projets Data SEOUne IA pour votre SEO, une méthode inédite pour accélérer vos projets Data SEO
Une IA pour votre SEO, une méthode inédite pour accélérer vos projets Data SEO
Vincent Terrasi
 
SEO AnswerBox, une méthode inédite pour interroger vos données et créer vos d...
SEO AnswerBox, une méthode inédite pour interroger vos données et créer vos d...SEO AnswerBox, une méthode inédite pour interroger vos données et créer vos d...
SEO AnswerBox, une méthode inédite pour interroger vos données et créer vos d...
Vincent Terrasi
 
Génération de contenu pour le SEO
Génération de contenu pour le SEOGénération de contenu pour le SEO
Génération de contenu pour le SEO
Vincent Terrasi
 
Comment faire du Data SEO sans savoir programmer ?
Comment faire du Data SEO sans savoir programmer ?Comment faire du Data SEO sans savoir programmer ?
Comment faire du Data SEO sans savoir programmer ?
Vincent Terrasi
 
Explainable Machine Learning for Ranking Factors
Explainable Machine Learning for Ranking FactorsExplainable Machine Learning for Ranking Factors
Explainable Machine Learning for Ranking Factors
Vincent Terrasi
 
Fausses données et Bad Data : restez vigilant !
Fausses données et Bad Data : restez vigilant !Fausses données et Bad Data : restez vigilant !
Fausses données et Bad Data : restez vigilant !
Vincent Terrasi
 
Comment les plateformes de Data Science métamorphosent le SEO ?
Comment les plateformes de Data Science métamorphosent le SEO ?Comment les plateformes de Data Science métamorphosent le SEO ?
Comment les plateformes de Data Science métamorphosent le SEO ?
Vincent Terrasi
 
Find out how DataScience has revolutionized SEO for OVH
Find out how DataScience has revolutionized SEO for OVHFind out how DataScience has revolutionized SEO for OVH
Find out how DataScience has revolutionized SEO for OVH
Vincent Terrasi
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
How Data Science can boost your SEO ?
How Data Science can boost your SEO ?How Data Science can boost your SEO ?
How Data Science can boost your SEO ?
Vincent Terrasi
 
Meetup Data-science OVH
Meetup Data-science OVHMeetup Data-science OVH
Meetup Data-science OVH
Vincent Terrasi
 
SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...
SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...SEO CAMP'us Paris 2024 - Déploiement de l'IA générative privée dans les organ...

How to automate all your SEO projects

  • 1. How to automate all your SEO projects @VincentTerrasi OVH
  • 2. Planning • Each Day : • Advanced Reporting • Anomalies Detection • Log Analysis • Webperf with SiteSpeed.io • Each Week : • Ranking monitoring • Opportunities Detection • Hot Topic Detection • Each Quarter : • Semantic Analysis Time is precious Automate everything
  • 3. 1. RStudio Server 2. Shiny Server 3. Jupyter Notebook 4. Dataiku 5. OpenSource
  • 4. searchConsoleR Docker ATinternetR oncrawlR RStudio Server Shiny Server Dataiku DataLake Scheduled Email Notebook Data API Shiny Apps DataViz Reports
  • 5. 1. RStudio Server Automate all your SEO projects
  • 6. Why R ? Scriptable Big Community Mac / PC / Unix Open Source Free  10 000 packages
  • 7. Rgui WheRe ? How ? RStudio https://www.cran.r-project.org
  • 10. • Docker on Ubuntu 16.04 Server • From the docker window, run: • sudo docker run -d -p 8787:8787 rocker/rstudio • Open your server in a browser, e.g. http://yourIP:8787, and you should be greeted by the RStudio welcome screen. Log in using: • username: rstudio • password: rstudio RStudio Server - Install
  • 11. • install.packages("httr") • install.packages("RCurl") • install.packages("stringr") • install.packages("stringi") • install.packages("openssl") • install.packages("Rmpi") • install.packages("doMPI") R – Scraper – Packages
  • 12. R – Scraper – RCurl seocrawler <- function( url ) { useragent <- "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25" h <- basicTextGatherer() html <- getURL(url ,followlocation = TRUE ,ssl.verifypeer = FALSE ,httpheader = c('User-Agent' = useragent) ,headerfunction = h$update ) return(html) }
  • 13. R – Scraper – Header ind0 <- grep("HTTP/",h$value(NULL)) df$StatusCode <- tail(h$value(NULL)[ind0],1) ind1 <- grep("^Content-Type",h$value(NULL)) df$ContentType <- gsub("Content-Type:","",tail(h$value(NULL)[ind1],1)) ind2 <- grep("Last-Modified",h$value(NULL)) df$LastModified <- gsub("Last-Modified:","",tail(h$value(NULL)[ind2],1)) ind3 <- grep("Content-Language",h$value(NULL)) df$ContentLanguage <- gsub("Content-Language:","",tail(h$value(NULL)[ind3],1)) ind4 <- grep("Location",h$value(NULL)) df$Location <- gsub("Location:","",tail(h$value(NULL)[ind4],1))
  • 14. R – Scraper – Xpath doc <- htmlParse(html, asText=TRUE,encoding="UTF-8") • H1 <- head(xpathSApply(doc, "//h1", xmlValue),1) • H2 <- head(xpathSApply(doc, "//h2", xmlValue),1) • robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'),1) • canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'),1) • DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')
  • 17. R – Scraper – OpenMpi • MPI : Message Passing Interface is a specification for an API for passing messages between different computers. • Programming with MPI • Difficult because the Rmpi package defines about 110 R functions • Needs a parallel programming system to do the actual work in parallel • The doMPI package acts as an adaptor to the Rmpi package, which in turn is an R interface to an implementation of MPI • Very easy to install Open MPI and Rmpi on Debian / Ubuntu • You can test with one computer
  • 18. R – Scraper – Install OpenMPI sudo yum install openmpi openmpi-devel openmpi-libs sudo ldconfig /usr/lib64/openmpi/lib/ export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/" install.packages("Rmpi", configure.args = c("--with-Rmpi-include=/usr/include/openmpi-x86_64/", "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/", "--with-Rmpi-type=OPENMPI")) install.packages("doMPI")
  • 19. R – Scraper – Test doMpi library(doMPI) #start your cluster cl <- startMPIcluster(count=20) registerDoMPI(cl) # max <- dim(mydataset)[1] x <- foreach(i=1:max, .combine="rbind") %dopar% seocrawlerThread(mydataset,i) #close your cluster closeCluster(cl)
  • 20. • Venn Matrix : http://blog.mrbioinfo.com/ R – Semantic Analysis – Intro
  • 21. R – Semantic Analysis – Data
  • 23. R – Semantic Analysis – eVenn evenn(pathRes="./eVenn/", matLists=all.the.data, annot=FALSE, CompName="croisiere")
  • 24. R – Semantic Analysis – Filter fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt" #read the tab-separated Venn matrix DF <- read.csv(fichierVenn, sep = "\t", encoding="CP1252", stringsAsFactors=FALSE) #find keywords used by at least 4 lists but not by our own site DF_PotentialKeywords <- subset(DF, DF$Total_lists >= 4 & DF$planete.croisiere.com==0 )
  • 25. R – Semantic Analysis – nGram library(text2vec) library(dplyr) it <- itoken( DF_PotentialKeywords[['Keywords']], preprocess_function = tolower, tokenizer = word_tokenizer, progressbar = FALSE ) # 2 and 3 grams vocab <- create_vocabulary(it, ngram = c(2L, 3L)) DF_SEO_vocab <- data.frame(vocab$vocab) DF_SEO_select <- data.frame(word=DF_SEO_vocab$terms, freq=DF_SEO_vocab$terms_counts) %>% arrange(-freq) %>% top_n(30)
  • 27. • dplyr • readxl • searchConsoleR • googleAuthR • googleAnalyticsR R – Packages SEO Thanks to Mark Edmondson
  • 28. R – SearchConsoleR library(googleAuthR) library(searchConsoleR) # get your client ID and secret from the Google API console options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com") options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX") ## change this to the website you want to download data for. Include http website <- "https://data-seo.fr" ## data is in search console reliably 3 days ago, so we download from then ## today - 3 days start <- Sys.Date() - 3 ## one day's data, but change it as needed end <- Sys.Date() - 3
  • 29. R – SearchConsoleR ## what to download, choose between date, query, page, device, country download_dimensions <- c('date','query') ## what type of Google search, choose between 'web', 'video' or 'image' type <- c('web') ## Authorize script with Search Console. ## First time you will need to login to Google but should auto-refresh after that so can be put in ## Authorize script with an account that has access to website. googleAuthR::gar_auth() ## first time stop here and wait for authorisation ## get the search analytics data data <- search_analytics(siteURL = website, startDate = start, endDate = end, dimensions = download_dimensions, searchType = type)
  • 31. • Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax) • MIN Minute field 0 to 59 • HOUR Hour field 0 to 23 • DOM Day of Month 1-31 • MON Month field 1-12 • DOW Day Of Week 0-6 • CMD Command Any command to be executed. • $ crontab -e • Run the R script filePath.R at 23:15 every day of the year : 15 23 * * * Rscript filePath.R R – CronTab – Method 1
  • 32. • R Package : https://github.com/bnosac/cronR R – Cron – Method 2 library(cronR) cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20', days_of_week = c(1, 2)) cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20') cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20', days_of_week = c(0, 3, 5)) OR
  • 34. 2. Shiny Server Creating webapps with R
  • 36. Shiny Server – Where and How • ShinyApps.io • A local server • Hosted on your server
  • 37. • docker run --rm -p 3838:3838 -v /srv/shinyapps/:/srv/shiny-server/ -v /srv/shinylog/:/var/log/ rocker/shiny • If you have an app in /srv/shinyapps/appdir, you can run the app by visiting http://yourIP:3838/appdir/. Shiny Server - Install
  • 38. Shiny – ui.R fluidPage( titlePanel("Compute your internal pagerank"), sidebarLayout( sidebarPanel( a("data-seo.com", href="https://data-seo.com"), tags$hr(), p('Step 1 : Export your outlinks data from ScreamingFrog'), fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)', accept = c('text/csv'), multiple = FALSE ), tags$hr(), downloadButton('downloadData', 'Download CSV') ), mainPanel( h3(textOutput("caption")), tags$hr(), tableOutput('contents') ) ) )
  • 39. Shiny – server.R function(input, output, session) { .... output$contents <- renderTable({ if (!is.null(input$file1)) { inFile <- input$file1 logsSummary <- importLogs(inFile$datapath) logsSummary } }) output$downloadData <- downloadHandler( filename = "extract.csv", content = function(file) { if (!is.null(input$file1)) { inFile <- input$file1 logsSummary <- importLogs(inFile$datapath) write.csv2(logsSummary,file, row.names = FALSE) } } ) }
  • 40. https://mark.shinyapps.io/GA-dashboard-demo Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo • Interactive trend graphs. • Auto-updating Google Analytics data. • Zoomable day-of-week heatmaps. • Top Level Trends via Year on Year, Month on Month and Last Month vs Month Last Year data modules. • A MySQL connection for data blending your own data with GA data. • An easy upload option to update a MySQL database. • Analysis of the impact of marketing events via Google's CausalImpact. • Detection of unusual time-points using Twitter's Anomaly Detection. Shiny – Use case
  • 43. 3. Jupyter Notebook Sharing source code with your SEO team
  • 45. • Reproducibility • Quality • Discoverability • Learning Jupyter Notebook – Why ?
  • 46. Step 1 — Installing Python 2.7 and Pip $ sudo apt-get update $ sudo apt-get -y install python2.7 python-pip python-dev Step 2 — Installing Ipython and Jupyter Notebook $ sudo apt-get -y install ipython ipython-notebook $ sudo -H pip install jupyter Step 3 — Running Jupyter Notebook $ jupyter notebook Jupyter Notebook Install
  • 48. • https://github.com/voltek62/RNotebook-SEO • Semantic Analysis for SEO • Scraper for SEO Jupyter Notebook Examples
  • 50. 4. Dataiku Use AML to find the best algorithm
  • 51. Automated Machine Learning • Benchmarking • Detecting Target Leakage • Diagnostics • Automation
  • 52. $ adduser vincent sudo $ sudo apt-get install default-jre $ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz $ tar xzf dataiku-dss-4.0.1.tar.gz $ cd dataiku-dss-4.0.1 >> install all prerequisites $ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java >> install dataiku $ ./installer.sh -d DATA_DIR -p 11000 $ DATA_DIR/bin/dss start http://<your server address>:11000. Dataiku- Install on Instance Cloud
  • 53. Go to the DSS data dir $ cd DATADIR Stop DSS $ ./bin/dss stop Run the installation script $ ./bin/dssadmin install-R-integration $ ./bin/dss start Dataiku- Install R
  • 56. • Get all your featured snippets with Ranxplorer • Get the SERP for each keyword with Ranxplorer • Use a homemade scraper to enrich the data : • 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location' • 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion' • 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp' • 'Wordcount' 'Size' 'ResponseTime' • 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion' • 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion' • Use AML to find the most important features Dataiku : Featured Snippet
  • 58. Dataiku : Input / Output
  • 59. Dataiku : Code Recipe
  • 62. Dataiku : Visual Recipes
  • 63. Dataiku : Plugin recipes
  • 64. Dataiku : My Plugins • SEMrush • SearchConsole • Majestic • Visiblis [ongoing] A DSS plugin is a zip file. Inside DSS, click the top right gear → Administration → Plugins → Store. https://github.com/voltek62/Dataiku-SEO-Plugins
  • 66. Dataiku : Import a project
  • 67. • Learn from the success of others with AML • Use all methods at your disposal to show Google you are the answer to the question. ( Title, H1, H2, … ) Dataiku : Results
  • 70. • Yes, you can because : • Great advertising • Get customers for specific features and trainings Open Source & SEO ? • Showing your work • Attract talent • Teaching the next generation
  • 71. • Automated Reports with RStudio Server • Automated KPI reporting with Shiny Server • Process Validation Documentation with Jupyter Notebook • Automated Machine Learning with Dataiku Take away
  • 72. Now that machines can learn and adapt, it is time to take advantage of the opportunity to create new jobs: Data-SEO, Data-Doctor, Data-Journalist …
  • 74. Vincent Terrasi @vincentterrasi Get all my latest discoveries and updates
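The header-parsing step from slide 13 can be tried offline. The sketch below is only an illustration in base R: it assumes the gathered header lines arrive one per element (as `h$value(NULL)` appears to provide on the slide), and the helper `get_header` and the sample headers are made up for this example, not taken from the deck. Anchoring the pattern at the start of the line avoids matching `Content-Location` when looking for `Location`.

```r
# Hypothetical helper (not from the slides): extract one header value from
# a character vector of raw header lines. The last match is kept so that
# the final response after any redirects wins.
get_header <- function(lines, name) {
  pattern <- paste0("^", name, ":\\s*")
  hits <- grep(pattern, lines, ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) return(NA_character_)
  trimws(sub(pattern, "", tail(hits, 1), ignore.case = TRUE))
}

# Made-up sample: a 301 redirect followed by the final 200 response.
raw_headers <- c(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Location: https://www.example.com/\r\n",
  "\r\n",
  "HTTP/1.1 200 OK\r\n",
  "Content-Type: text/html; charset=UTF-8\r\n",
  "Content-Language: fr\r\n",
  "\r\n"
)

# Status line of the last response in the redirect chain.
status_lines <- grep("^HTTP/", raw_headers, value = TRUE)

page <- data.frame(
  StatusCode      = trimws(tail(status_lines, 1)),
  ContentType     = get_header(raw_headers, "Content-Type"),
  ContentLanguage = get_header(raw_headers, "Content-Language"),
  Location        = get_header(raw_headers, "Location"),
  LastModified    = get_header(raw_headers, "Last-Modified"),
  stringsAsFactors = FALSE
)
print(page)
```

A missing header comes back as `NA` rather than an empty result, which keeps the data frame at one row per crawled URL.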

Editor's Notes

  • #4: HOW?
  • #7: R is a programming language dedicated to statistics and data science. The best-known implementation of the R language is the GNU R software.
  • #13: HTTP response header: collects the contents of the header of an HTTP response.
  • #26: itoken: creates iterators over input objects for building vocabularies, corpora, or DTM and TCM matrices. The iterator is usually passed to create_vocabulary, create_corpus, create_dtm, vectorizers and create_tcm; see their documentation for details. create_vocabulary: collects unique terms and their corresponding statistics.
  • #34: Email ,…..
  • #36: Shiny is a toolkit from RStudio that makes creating web applications (HTML, CSS, Java, JavaScript and jQuery) much easier. Shiny is licensed under GPLv3, and the source is available on GitHub.
  • #37: Shiny is a toolkit from RStudio that makes creating web applications (HTML, CSS, Java, JavaScript and jQuery) much easier. Shiny is licensed under GPLv3, and the source is available on GitHub.
  • #38: One-line install
  • #39: Two files: ui.R and server.R
  • #48: Replace "crawler" with "scraper"
  • #52: Benchmarking: AML can quickly present a lot of models using the same training set. Detecting Target Leakage: AML builds candidate models extremely fast in an automated way. Diagnostics: diagnostics such as learning curves and feature importances can be generated automatically. Automation: tasks like exploratory data analysis, data pre-processing, model selection and putting models into production can be automated.
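To make note #26 more concrete, here is a minimal base-R sketch of what collecting 2-gram and 3-gram counts from a keyword list amounts to. It deliberately does not use text2vec, so it is a stand-in for the idea rather than the package's implementation; the helpers `ngrams` and `count_ngrams`, the underscore separator and the sample keywords are all assumptions made for illustration.

```r
# Build all n-grams of size n from a vector of tokens (hypothetical helper).
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = "_"),
         character(1))
}

# Count n-grams of the requested sizes over a set of keywords,
# most frequent first (hypothetical helper).
count_ngrams <- function(keywords, sizes = 2:3) {
  # Lower-case, then split on anything that is not a letter or digit.
  tokens <- strsplit(tolower(keywords), "[^[:alnum:]]+")
  grams <- unlist(lapply(tokens, function(tk) {
    tk <- tk[nzchar(tk)]
    unlist(lapply(sizes, function(n) ngrams(tk, n)))
  }))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up keywords, for illustration only.
keywords <- c("croisiere pas cher",
              "croisiere pas cher mediterranee",
              "croisiere mediterranee")
vocab <- count_ngrams(keywords)
print(head(vocab, 5))
```

Sorting by frequency and keeping the top rows mirrors the `arrange(-freq) %>% top_n(30)` step on slide 25.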