Take back control of your Web Tracking
www.dataiku.com 
@ClementStenac 
CTO, Dataiku
Give me dashboards ! 
Choose one 
Raw data 
Do what you want 
Your money 
Access to raw data is a premium feature
Who cares about raw data ? 
• SAAS analytics are full-featured 
• Custom variables to link with your backend data 
• Did you really join all the data you will need in the future ?
• Do you have access to all the necessary data, and do you want to push it to the JS ?
• What kinds of analysis will you do later on ? 
A real example 
Segmentation and tracking user-satisfaction 
Raw tracking data → User-level stats → User base segmentation → Metrics per segment → Tracking over time
TB 
GB
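The first step of this pipeline (raw tracking data to user-level stats) can be sketched as a simple aggregation. This is a minimal illustration, not the method used in the talk: the event fields (`visitor_id`, `page`, `had_comment`) and the three features are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw tracking events: (visitor_id, page, had_comment).
RAW_EVENTS = [
    ("v1", "/home", False),
    ("v1", "/article/42", True),
    ("v2", "/article/42", False),
    ("v1", "/article/7", True),
]


def user_level_stats(events):
    """Collapse event-level rows into one numeric feature row per visitor."""
    stats = defaultdict(lambda: {"page_views": 0, "comments": 0, "pages": set()})
    for visitor_id, page, had_comment in events:
        row = stats[visitor_id]
        row["page_views"] += 1
        row["comments"] += int(had_comment)
        row["pages"].add(page)
    # Replace the set of pages by its size so every feature is a number.
    return {
        visitor_id: {
            "page_views": row["page_views"],
            "comments": row["comments"],
            "distinct_pages": len(row["pages"]),
        }
        for visitor_id, row in stats.items()
    }


FEATURES = user_level_stats(RAW_EVENTS)
```

Rows of this shape are what a clustering step would then consume; at real volumes the same aggregation would typically run in SQL or Hive rather than in memory.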
User-level data 
Clustering 
Labeling 
Here you need your business intelligence
• Search for a specific Topic
• Newcomer from Google News
• Foreigner Discovering The Site
• Fan who loves to comment
• Home Page Wanderer
• Dark Bot (Competitor?)
Compute metrics per segment 
Here you need to cross with your CRM
• Search for a specific Topic
• Newcomer from Google News
• Foreigner Discovering The Site
• Fan that loves to comment
• Home Page Wanderer
• Dark Bot (Competitor?)
Figures shown on the slide:
• 738k sessions, 0.83€ per session, 0.73€ acquisition costs
• 938k sessions, 0.3€ per session, 0.23€ acquisition costs
• 13k sessions, 1.3€ per session, 0.23€ acquisition costs
• 938k sessions, 0.3€ per session, 0.23€ acquisition costs
• 68k sessions, 0.3€ per session, 1.23€ acquisition costs
• 1k sessions, 0€ per session, 0€ acquisition costs
Track metrics over time 
Using your already-computed segments
• Search for a specific Topic
• Newcomer from Google News
• Fan that loves to comment
• Home Page Wanderer
• Foreigner Discovering The Site
• Dark Bot (Competitor?)
Damn, our latest release has diverging effects on segments
A few other examples 
• Churn prediction and explanation 
• Customer lifetime value prediction 
OK, I WANT TO DO IT
So, I have these Apache logs 
• First level of web tracking 
• "Nothing required" 
Are backend logs a solution ? 
Challenge 1 : Identify a visitor 
• IP ?
– NAT / Proxy
– Not everyone has a public IP address
• IP + user-agent ?
– Big companies !
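Why IP + user-agent breaks down in big companies can be shown with a small sketch. The `fingerprint` helper and the sample IP and user-agent strings are hypothetical; the point is only that two people behind the same proxy with the same standard browser image become one "visitor".

```python
import hashlib


def fingerprint(ip: str, user_agent: str) -> str:
    """Derive a visitor key from the only two fields a backend log gives us."""
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


# Two different employees, same corporate proxy, same standard browser build.
CORPORATE_IP = "203.0.113.10"
STANDARD_UA = "Mozilla/5.0 (Windows NT 6.1) CorpStandardBrowser"

alice = fingerprint(CORPORATE_IP, STANDARD_UA)
bob = fingerprint(CORPORATE_IP, STANDARD_UA)

# The log-based key cannot tell them apart.
SAME_VISITOR = alice == bob
```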
Are backend logs a solution ? 
Challenge 2 : Re-create sessions 
• Using expiration times 
• Advanced SQL / Hive / … makes this easier
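Re-creating sessions with an expiration time can be sketched as follows. The 30-minute timeout is an assumption (a common default, not a value from the talk), and the hit format `(visitor_id, timestamp)` is hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed expiration time


def assign_sessions(hits, timeout=SESSION_TIMEOUT):
    """Label each (visitor_id, timestamp) hit with a per-visitor session index.

    A new session starts when the gap since the visitor's previous hit
    exceeds the timeout.
    """
    last_seen = {}
    session_index = {}
    labelled = []
    for visitor_id, ts in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        previous = last_seen.get(visitor_id)
        if previous is None:
            session_index[visitor_id] = 0
        elif ts - previous > timeout:
            session_index[visitor_id] += 1
        last_seen[visitor_id] = ts
        labelled.append((visitor_id, ts, session_index[visitor_id]))
    return labelled


T0 = datetime(2014, 10, 30, 9, 0)
HITS = [
    ("v1", T0),
    ("v1", T0 + timedelta(minutes=10)),
    ("v1", T0 + timedelta(minutes=55)),  # 45-minute gap: new session
    ("v2", T0 + timedelta(minutes=5)),
]
SESSIONS = assign_sessions(HITS)
```

In SQL or Hive the same logic is usually written with a window function that compares each hit to the previous one for the same visitor.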
Are backend logs a solution ? 
Challenge 3 : single-page webapps 
• Track behaviour within each page 
• Track events, not pages 
Also: getting logs from IT is sometimes another challenge
Client-side tracking 
• visitor_id and session_id handled with cookies 
• Tracking page loads and various events 
• Historically, "tracking" = fetching a 1x1 image 
• AJAX 
www.website.com 
Browser 
tracker.com 
JS tracking code 
Tracking calls
Are cookies good for your (web) health ? 
• Each cookie belongs to a domain (and its subdomains)
• Who can write a cookie ? 
– The HTTP server, who becomes owner 
(via the Set-Cookie HTTP header) 
– JS code running on the "owner" domain 
• Who can read a cookie ? 
– The owner HTTP server (sent by the browser) 
– JS code running on the "owner" domain
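The "domain and its subdomains" rule can be sketched as a suffix check on label boundaries. This is a simplified illustration of cookie domain matching, assuming the cookie was set with an explicit domain; the function name is hypothetical.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Return True if a cookie owned by cookie_domain is visible to request_host.

    The host must be the domain itself or end with "." + domain, so that
    "evilwebsite.com" does not match "website.com".
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)
```

For example, a cookie owned by `website.com` is visible to `www.website.com`, but never to `tracker.com`.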
First-party cookies 
• Set by the originating server (HTTP) or JS code 
• Belong to the originating domain 
• Sent by HTTP to the originating domain only 
• Readable by JS code 
www.website.com 
Browser 
Contents 
Cookies for www.website.com: 
None 
tracker.com 
GET / 
Cookies: none 
Fetch tracking script 
Tracking JS code: read cookies for www.website.com 
Tracking JS code: create visitor id and set cookie
First-party cookies 
• Set by the originating server (HTTP) or JS code 
• Belong to the originating domain 
• Sent by HTTP to the originating domain only 
• Readable by JS code 
www.website.com 
Browser 
tracker.com 
GET /track?visitor_id=d37ecba 
Cookies: None 
JS code: send AJAX request to tracker.com with visitor_id 
Cookies for www.website.com: 
visitor_id=d37ecba
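The first-party flow above can be sketched in a few lines. In a browser this logic lives in the JS tracking code; here a plain dict stands in for the cookies of www.website.com, and the `event` parameter and helper names are hypothetical.

```python
import uuid
from urllib.parse import urlencode


def get_or_create_visitor_id(site_cookies: dict) -> str:
    """Read the visitor id from the site's own cookies, creating it if absent."""
    if "visitor_id" not in site_cookies:
        site_cookies["visitor_id"] = uuid.uuid4().hex[:7]
    return site_cookies["visitor_id"]


def tracking_url(site_cookies: dict, event: str) -> str:
    """Build the tracking call: the id travels in the URL, not in a cookie."""
    params = {"visitor_id": get_or_create_visitor_id(site_cookies), "event": event}
    return "https://tracker.com/track?" + urlencode(params)


cookies_for_website = {}  # "Cookies for www.website.com: None"
first_call = tracking_url(cookies_for_website, "page_load")
second_call = tracking_url(cookies_for_website, "click")
```

Both calls carry the same `visitor_id`, because the cookie written on the first call is read back on the second.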
Third-party cookies 
• Set (in HTTP) by the tracker's domain – Belong to the tracker's domain 
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong)
www.website.com 
Browser 
tracker.com 
GET / 
Cookies: none 
Fetch tracking script 
Contents 
Cookies for www.website.com: 
None 
Cookies for tracker.com: 
None
Third-party cookies 
• Set (in HTTP) by the tracker's domain – Belong to the tracker's domain 
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong)
www.website.com 
Browser 
Cookies for www.website.com: 
None 
Tracker code: assign visitor_id 
tracker.com 
Cookies for tracker.com: 
None 
GET /track 
Cookies: None 
200 OK 
Set-Cookie: visitor_id=33d7
Third-party cookies 
• Set (in HTTP) by the tracker's domain – Belong to the tracker's domain 
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong) 
Tracker code: read visitor_id 
GET /track 
Cookies: visitor_id=33d7 
www.website.com 
Browser 
tracker.com 
Cookies for tracker.com: 
visitor_id=33d7 
200 OK 
Cookies for www.website.com: 
None
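The third-party exchange can be sketched from the tracker's side: read the `Cookie` header, and assign a `visitor_id` through `Set-Cookie` only when none came in. This is a minimal sketch, not WT1's actual code; the `handle_track` function and the dict-based headers are assumptions.

```python
import uuid
from http.cookies import SimpleCookie


def handle_track(request_headers: dict):
    """Handle GET /track on tracker.com; return (visitor_id, response_headers)."""
    incoming = SimpleCookie()
    incoming.load(request_headers.get("Cookie", ""))
    response_headers = {}
    if "visitor_id" in incoming:
        # Returning visitor: the browser sent back the tracker's own cookie.
        visitor_id = incoming["visitor_id"].value
    else:
        # First call: assign an id and hand it to the browser via Set-Cookie.
        visitor_id = uuid.uuid4().hex[:4]
        outgoing = SimpleCookie()
        outgoing["visitor_id"] = visitor_id
        outgoing["visitor_id"]["path"] = "/"
        response_headers["Set-Cookie"] = outgoing["visitor_id"].OutputString()
    return visitor_id, response_headers
```

The cookie belongs to tracker.com, so the same id comes back on every site that embeds this tracker.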
Why each ? 
First party cookie 
• Tracks on a single website 
• Requires JS code for tracking 
• Reduced privacy impact: 
No exchange of information 
between sites 
• Usage: track your users' behaviour
Third party cookie 
• Tracks across all websites 
using the same tracker 
• More frowned upon 
• Usage: generally, ads 
but also multi-website 
First party cookie: rarely blocked (used for logins)
Third party cookie: blocked by up to 40% of visitors
What are your obligations ? 
With ALL cookies 
• You should ask the user whether they accept cookies
• Even non-tracking related cookies 
• Yes, even login-related ones 
What are your obligations ? 
With third party cookies 
• Obey the Do-Not-Track header 
www.website.com 
Browser 
Tracker code: DO NOTHING 
tracker.com 
GET /track 
Cookies: None 
DNT: 1 
200 OK
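Obeying Do-Not-Track comes down to one check before anything is recorded. A minimal sketch, assuming request headers arrive as a dict and `record_event` is whatever callable stores the event (both hypothetical):

```python
def should_track(request_headers: dict) -> bool:
    """Return False when the browser sent DNT: 1."""
    return request_headers.get("DNT", "").strip() != "1"


def handle_track_with_dnt(request_headers: dict, record_event) -> int:
    """Answer 200 OK either way, but record nothing when DNT is set."""
    if should_track(request_headers):
        record_event(request_headers)
    return 200
```

The response is still `200 OK` so that the page behaves identically for the visitor; only the server-side recording is skipped.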
What are your obligations ? 
With third party cookies 
• Provide an opt-out URL 
• Allows the user to /optin , /optout or /status 
See in action : www.youronlinechoices.com 
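The three opt-out endpoints can be sketched as below. Storing the choice in an `optout` cookie on the tracker's domain is an assumption made for illustration, as are the function name and the returned status strings.

```python
def handle_opt_request(path: str, tracker_cookies: dict) -> str:
    """Apply /optin, /optout or /status to the tracker's cookies; return the status."""
    if path == "/optout":
        tracker_cookies["optout"] = "1"
        # Assumed behaviour: drop the identifier once the user opts out.
        tracker_cookies.pop("visitor_id", None)
    elif path == "/optin":
        tracker_cookies.pop("optout", None)
    elif path != "/status":
        raise ValueError(f"unknown opt-out endpoint: {path}")
    return "opted-out" if tracker_cookies.get("optout") == "1" else "opted-in"
```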
What are your obligations ? 
With third party cookies 
• Provide a P3P policy 
• Else, older IE blocks you 
"What are you doing with my data ?" 
Looks like this: 
CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
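The compact policy above is sent as a `P3P` response header alongside the tracker's `Set-Cookie`. A minimal sketch of building that header; the helper name and the `(name, value)` tuple shape are assumptions, and the tokens are the ones shown on the slide.

```python
COMPACT_POLICY = "IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"


def p3p_header(compact_policy: str) -> tuple:
    """Return the (name, value) pair for the P3P response header."""
    tokens = compact_policy.split()  # normalise whitespace between tokens
    return ("P3P", 'CP="' + " ".join(tokens) + '"')


HEADER = p3p_header(COMPACT_POLICY)
```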
Tracking in mobile apps 
• Preserve battery 
– Each network call is costly 
– Do not track everything synchronously 
• Network access is intermittent 
– Queue events and wait for network access
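Queuing and batching on mobile can be sketched as a small in-memory queue. This is an illustration only, not the WT1 iOS library: the class name, `batch_size`, and the `send_batch` callable (which returns True on success) are hypothetical.

```python
from collections import deque


class EventQueue:
    """Buffer events locally and send them in batches when the network is up."""

    def __init__(self, send_batch, batch_size=20):
        self._send_batch = send_batch  # callable(list_of_events) -> bool
        self._batch_size = batch_size
        self._pending = deque()

    def track(self, event: dict) -> None:
        # Never touches the network: tracking is asynchronous by design.
        self._pending.append(event)

    def flush(self, network_available: bool) -> int:
        """Send pending events in batches; return how many were sent."""
        sent = 0
        while network_available and self._pending:
            size = min(self._batch_size, len(self._pending))
            batch = [self._pending[i] for i in range(size)]
            if not self._send_batch(batch):
                break  # keep the events for the next attempt
            for _ in range(size):
                self._pending.popleft()
            sent += size
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Events are removed only after a batch succeeds, so a failed call leaves them queued for the next attempt.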
So, what are my choices ? 
• You might really want to be your own web tracker 
• Most used open source web tracker: Piwik
• Provides both raw data and nice dashboards 
– MySQL backend 
– Raw data via API 
– Slightly less suited for analytics
WT1: YOUR OWN TRACKER IN MINUTES
WT1 
An open source (Apache License) server 
to build your own web tracking 
https://github.com/dataiku/wt1
• Designed to provide you with raw data, 
directly usable for analytics 
• Very high performance and scalability 
Features 
• 1st or 3rd party cookies 
– Handling of DNT and opt-out 
– Helps handling P3P 
• Track events or pages with key-value data 
• Visitor-scope and session-scope variables 
• "Live view" debugging console
Features 
• Dashboards: None
• Events processing and storage 
– Filesystem, S3 
– Event queues: Flume 
– Custom processors 
• JSON API for custom tracking 
• iOS library
Architecture 
Client-side: JS tracker, iOS library
• 1st or 3rd party cookies
• Event-level tracking
• Automatic batching
• Queuing to deal with network interruptions
JSON POST
WT1 Server
• Java
• > 20K events / second
• Handles DNT, P3P, opt-out, …
Raw storage
• Filesystem
• S3
Event processors:
• Real-time aggregations
• Custom code
Event queues
• Flume
• Kafka, RabbitMQ, …
Future work 
• Android library 
• More event queues supported OOTB 
– Kafka 
– RabbitMQ 
• Avro storage
Thank you ! 
Clément Stenac 
clement.stenac@dataiku.com 
@ClementStenac 
www.dataiku.com


OWF 2014 - Take back control of your Web tracking - Dataiku


Editor's Notes

  • #2: Web tracking is important, right ? You must understand how your users behave on your website. It is one of the core points of lean. So, let's not do it anymore and let others do it !
  • #3: A huge number of SaaS solutions provide great dashboards. Chances are good that you should use one of them ! This talk encourages you to do it yourself, but a startup should probably start with a hosted solution.
  • #4: You generally have to choose between "cheap" (or free) solutions and access to your raw data. Free: Google Analytics → entry point to sell ads. Not bad, but you should know what it's about.
  • #5: Examples of data to add: complaints / support calls; history prior to setting up *this* tracking. Analysis: ML is not inaccessible or reserved for elites.
  • #6: Track user satisfaction metrics over time *by behaviour*. Not science fiction. Raw -> User: recreate *features* for users. Time-based aggregations.
  • #8: What Olivier Grisel just said
  • #30: Just a few quick remarks
  • #34: Fairly standard if you are used to web trackers: GA-like API