Chapter 3 - 大数据管理
Chapter 3 - 大数据管理
The data will be split into smaller parts or blocks and store them in
different machines. Then, we will find the highest temperature in
each part stored in the corresponding machine.
At last, we will combine the results received from each of the
machines to have the final output. Let us look at the challenges
associated with this traditional approach:
Old Way
Dear, Bear, River, Car, Car, River, Deer, Car and Bear
Parallel Processing
In MapReduce, we are dividing the job among multiple
nodes and each node works with a part of the job
simultaneously.
So, MapReduce is based on Divide and Conquer paradigm
which helps us to process the data using different
machines.
As the data is processed by multiple machines instead of a
single machine in parallel, the time taken to process the
data gets reduced by a tremendous amount as shown in
the figure below (2).
Advantages of MapReduce
Parallel Processing
Advantages of MapReduce
Data Locality
Instead of moving data to the processing unit, we are
moving the processing unit to the data in the
MapReduce Framework.
In the traditional system, we used to bring data to the
processing unit and process it.
Advantages of MapReduce
Social Networks
Many of the social network features, such as who visited
your LinkedIn profile, who read your post on Facebook or
Twitter, can be evaluated using the MapReduce,
programming model.
Usage of MapReduce in industry
Entertainment
Netflix uses Hadoop and MapReduce to solve problems
such as discovering the most popular movies, based on
what you watched, what do you like?
Providing suggestions to registered users taken into
account their interests.
MapReduce can determine how users are watching movies,
analyzing their logs and clicks.
Usage of MapReduce in industry
Electronic Commerce
Many e-commerce providers, such as the Amazon, Walmart,
and eBay, use the MapReduce programming model to
identify favorite products based on users’ interests or
buying behavior.
It includes creating product recommendation mechanisms
for e-commerce catalogs, analyzing site records, purchase
history, user interaction logs, and so on.
Usage of MapReduce in industry
Electronic Commerce
It’s used to establish a user’s sentimental profile for a
particular product by reviewing comments or reviews or
analyzing search logs by identifying which items are most
popular based on the search and which products are
missing.
Many Internet service providers use MapReduce to analyze
site records and understand site visits, engagement,
locations, mobile devices, and browsers.
Usage of MapReduce in industry
Fraud Detection
Hadoop and MapReduce are used in the financial industries,
including companies such as banks, insurance providers,
payment locations for fraud detection, trend identification
or business metrics through transaction analysis.
Banks analyze the data of the credit card and the related
expenses, for categorization of these expenses and make
recommendations for different offers, analyzing
anonymous purchasing behavior.
Usage of MapReduce in industry
Data Warehouse
We can utilize MapReduce to analyze large data volumes in
data warehouses while implementing specific business logic
for data insights.