Web Mining Overview
Data Mining
Lecture 1
http://www.simplyhired.com
Extracting structured data
http://www.fatlens.com
Web Mining topics
Web graph analysis
Power Laws and The Long Tail
Structured data extraction
Web advertising
Systems Issues
Ads vs. search results
Search advertising is the revenue model
Multi-billion-dollar industry
Advertisers pay for clicks on their ads
Interesting problems
What ads to show for a search?
If I’m an advertiser, which search terms should I bid on, and how much should I bid?
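One simple way to think about "what ads to show" is to rank candidate ads by expected revenue per impression, i.e. bid times estimated click-through rate, since advertisers pay only for clicks. The sketch below is an illustration under that assumption, not the method any particular search engine uses; the `Ad` type, the `rank_ads` function, and all bids and click-through rates are hypothetical.

```python
from typing import List, NamedTuple


class Ad(NamedTuple):
    advertiser: str
    bid: float  # hypothetical amount paid per click, in dollars
    ctr: float  # hypothetical estimated click-through rate, in [0, 1]


def rank_ads(ads: List[Ad], slots: int) -> List[Ad]:
    """Return the top `slots` ads by expected revenue per impression (bid * ctr)."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)[:slots]


# Made-up candidates for a single search query.
candidates = [
    Ad("A", bid=1.00, ctr=0.01),   # expected revenue 0.010
    Ad("B", bid=0.50, ctr=0.04),   # expected revenue 0.020
    Ad("C", bid=2.00, ctr=0.002),  # expected revenue 0.004
]

top = rank_ads(candidates, slots=2)
for ad in top:
    print(ad.advertiser, ad.bid * ad.ctr)
```

Note that the highest bidder (C) is not shown here: a high bid on an ad that is rarely clicked earns less than a modest bid on an ad that is clicked often.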
Two Approaches to Analyzing Data
Machine Learning approach
Emphasizes sophisticated algorithms
e.g., Support Vector Machines
Data sets tend to be small, fit in memory
Data Mining approach
Emphasizes big data sets (e.g., in the terabytes)
Data cannot even fit on a single disk!
Necessarily leads to simpler algorithms
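To make "simpler algorithms" concrete: when data does not fit in memory, a common pattern is a single pass over a stream that keeps only a constant amount of state. The sketch below computes a running mean this way. It is an illustrative example, not something taken from the lecture; `streaming_mean` and `readings` are hypothetical names, and the generator stands in for records read from disk.

```python
from typing import Iterable, Iterator, Tuple


def streaming_mean(values: Iterable[float]) -> Tuple[int, float]:
    """Compute the count and mean in one pass, using O(1) memory."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        # Incremental update: move the mean toward x by 1/count of the gap.
        mean += (x - mean) / count
    return count, mean


def readings() -> Iterator[float]:
    """Stand-in for a large data source; yields one value at a time."""
    for i in range(1, 1_000_001):
        yield float(i)


count, mean = streaming_mean(readings())
print(count, mean)
```

The values are never held in a list, so the same function works whether the source yields a thousand records or far more than memory can hold.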
Philosophy
In many cases, adding more data leads to better results than improving algorithms
Netflix
Google search
Google ads
More on my blog:
Datawocky (datawocky.com)
Systems architecture
[Figure: machine architecture with CPU, Memory, and Disk; diagram labels: "Machine Learning, Statistics" and "Very Large-Scale Data Mining"]