Unit - I Part I
Course Outcomes:
CO 1 - Demonstrate knowledge of Big Data Analytics
concepts and its applications in business.
CO 2 - Demonstrate the functions and components of the MapReduce
framework and HDFS.
CO 3 - Discuss data management concepts in a NoSQL
environment.
CO 4 - Explain the process of developing MapReduce-based
distributed processing applications.
CO 5 - Explain the process of developing applications using
HBase, Hive, Pig, etc.
Unit - I
• Introduction to Big Data
Types of digital data, history of Big Data innovation,
introduction to Big Data platform, drivers for Big Data,
Big Data architecture and characteristics, 5 Vs of Big
Data, Big Data technology components, Big Data
importance and applications
Big Data features – security, compliance, auditing and
protection, Big Data privacy and ethics, Big Data
Analytics, Challenges of conventional systems, intelligent
data analysis, nature of data, analytic processes and tools,
analysis vs reporting, modern data analytic tools.
Definition – Big Data
• Big Data is data that contains greater variety, arriving in
increasing volumes and with more velocity.
• Put another way, Big Data is a collection of data that
is huge in volume, yet growing exponentially
with time.
• It is data of such size and complexity that
traditional data management tools cannot
store or process it efficiently. Big Data is
still data, just at a much larger scale.
Example of Big Data
• Big data architecture refers to the logical and physical structure that dictates how high volumes of data are ingested, processed, stored, managed, and accessed.
Layers in Big Data Architecture
• Big Data Ingestion Layer
This layer of Big Data Architecture is the first step for data coming from various sources to start its journey. During data ingestion, the data is prioritized and categorized so that it flows smoothly into the subsequent layers of the pipeline.
Tools used in this layer include:
Apache Flume - a straightforward and flexible architecture based on streaming data flows.
Apache NiFi - supports robust and scalable directed graphs of data routing, transformation, and system mediation logic.
Elastic Logstash - an open-source data ingestion tool: a server-side data processing pipeline that ingests data from many sources simultaneously, transforms it, and then sends it to your “stash”, such as Elasticsearch.
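The prioritize-and-categorize step described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Flume, NiFi, or Logstash work internally; the source names, the priority table `PRIORITY_BY_SOURCE`, and the rules in `categorize` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical priority table: lower number = handled sooner.
# The notes say ingestion prioritizes data but do not define the rules.
PRIORITY_BY_SOURCE = {"payments": 0, "sensors": 1, "clickstream": 2}


@dataclass
class Record:
    source: str
    payload: dict


def categorize(record: Record) -> str:
    """Assign a coarse category from the payload's fields (illustrative rules)."""
    if "amount" in record.payload:
        return "transaction"
    if "reading" in record.payload:
        return "telemetry"
    return "event"


def ingest(records):
    """Tag each record with a category and order the batch by source priority."""
    tagged = [(PRIORITY_BY_SOURCE.get(r.source, 99), categorize(r), r) for r in records]
    # Sort only on the priority; Python's sort is stable, so arrival order
    # is preserved among records with the same priority.
    tagged.sort(key=lambda t: t[0])
    return [(category, r) for _, category, r in tagged]


batch = [
    Record("clickstream", {"page": "/home"}),
    Record("sensors", {"reading": 21.5}),
    Record("payments", {"amount": 49.99}),
]
for category, rec in ingest(batch):
    print(rec.source, category)
```

Unknown sources fall back to the lowest priority (99) rather than being rejected; a real pipeline might instead route them to a quarantine area.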
• Data Collector Layer
In this layer, the focus is on transporting data from the ingestion layer to the rest of the data pipeline. It is the layer of the data architecture where components are decoupled so that analytic capabilities may begin.
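Decoupling means the component that produces data and the component that consumes it never call each other directly; they only share a buffer. A minimal sketch, assuming an in-process `queue.Queue` as a stand-in for whatever transport a real pipeline would use (the notes do not name one):

```python
import queue
import threading

# Bounded in-memory buffer standing in for a real transport (an assumption).
buffer = queue.Queue(maxsize=100)
SENTINEL = object()  # marks the end of the stream


def producer(records):
    """Ingestion side: pushes records without knowing who will read them."""
    for r in records:
        buffer.put(r)  # blocks while the buffer is full (back-pressure)
    buffer.put(SENTINEL)


def consumer(out):
    """Downstream side: pulls records at its own pace until the sentinel."""
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        out.append(item)


received = []
t_cons = threading.Thread(target=consumer, args=(received,))
t_prod = threading.Thread(target=producer, args=(range(10),))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(received)
```

Because the queue is bounded, a slow consumer makes `put` block, which is one simple form of back-pressure; either side can be replaced without changing the other.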
Data Processing Layer
• In this primary layer of Big Data Architecture, the focus is on the data pipeline's processing system. The data collected in the previous layer is processed here: it is routed to different destinations and the data flow is classified. This is the first point where analytics may occur.
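Routing and classifying a data flow can be illustrated with a toy log-processing step. The record types, the rules in `classify`, and the destination names in `ROUTES` are hypothetical; a real processing layer would run logic like this on a distributed engine rather than in a single Python process.

```python
from collections import defaultdict

# Hypothetical routing table: class of record -> destination system.
ROUTES = {
    "error": "alerting",
    "metric": "timeseries_store",
    "other": "archive",
}


def classify(line: str) -> str:
    """Classify a raw log line (illustrative rules only)."""
    if "ERROR" in line:
        return "error"
    if line.startswith("metric "):
        return "metric"
    return "other"


def route(lines):
    """Group lines by the destination their class is routed to."""
    destinations = defaultdict(list)
    for line in lines:
        destinations[ROUTES[classify(line)]].append(line)
    return dict(destinations)


logs = ["metric cpu=0.93", "ERROR disk full", "user login ok"]
print(route(logs))
```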
Data Storage Layer
Storage becomes a challenge when the size of the data you are dealing with becomes large, so choosing a suitable storage solution is very important at this scale. Several possible solutions, such as Data Ingestion Patterns, can help with such problems. This layer of Big Data Architecture focuses on “where to store such large data efficiently.”
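One common way to store data too large for a single machine is to spread it across several storage nodes by hashing each record's key. The notes do not prescribe this technique; the sketch below, with its hypothetical `NUM_NODES` cluster size and `customer-<n>` keys, only illustrates hash partitioning.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a storage node by hashing the record key.

    A stable hash (MD5) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys, num_nodes: int = NUM_NODES):
    """Assign every key to exactly one node."""
    nodes = {i: [] for i in range(num_nodes)}
    for k in keys:
        nodes[node_for(k, num_nodes)].append(k)
    return nodes


keys = [f"customer-{i}" for i in range(1000)]
layout = partition(keys)
print({node: len(ks) for node, ks in layout.items()})
```

Because the hash is stable, the same key always maps to the same node, so a later read can compute where a record lives without consulting a central index.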
Data Query Layer
This is the architectural layer where active analytic processing of Big Data takes place. Here, the primary focus is to extract value from the data so that it is more useful to the next layer.
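Much of the query layer's work is aggregation. The sketch below computes per-product totals in the style of the map and reduce phases covered under CO 4, but in a single process, with the shuffle step reduced to a dictionary; the `sales` records are made-up sample data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs: here, (product, quantity)."""
    product, quantity = record
    yield product, quantity


def reduce_phase(pairs):
    """Sum values per key, like a GROUP BY ... SUM(...) query."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


sales = [("pen", 3), ("book", 1), ("pen", 2), ("lamp", 1), ("book", 4)]
result = reduce_phase(chain.from_iterable(map_phase(r) for r in sales))
print(result)  # {'pen': 5, 'book': 5, 'lamp': 1}
```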
Data Visualization Layer
The visualization, or presentation, tier is probably the most prestigious tier: it is where users of the data pipeline can see the VALUE of DATA.
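As a toy stand-in for the presentation tier, the hypothetical `bar_chart` helper below renders aggregated totals as a text bar chart. Real deployments would use a dashboard or plotting tool; this only shows the step of turning query results into something a person can read at a glance.

```python
def bar_chart(totals: dict, width: int = 20) -> str:
    """Render a horizontal text bar chart; the largest value gets `width` chars."""
    if not totals:
        return ""
    peak = max(totals.values())
    label_w = max(len(k) for k in totals)
    lines = []
    # Largest values first; ties keep their original order (stable sort).
    for key, value in sorted(totals.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / peak) if peak else ""
        lines.append(f"{key:<{label_w}} | {bar} {value}")
    return "\n".join(lines)


print(bar_chart({"pen": 5, "book": 5, "lamp": 1}))
```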
Importance of Big Data
• Understand where, when, and why customers buy
• Protect the company’s client base with improved loyalty
programs
• Seize cross-selling and upselling opportunities
• Provide targeted promotional information
• Optimize workforce planning and operations
• Reduce inefficiencies in the company’s supply chain
• Predict market trends
• Predict future needs
• Make companies more innovative and competitive
• Discover new sources of revenue
Applications of Big Data
• Transportation: Big Data powers GPS in smartphone
applications, which source data from government agencies and
even satellite images. Airplanes on transatlantic flights also generate
huge volumes of data, which is used to optimize fuel efficiency, balance
cargo and passenger weights, and analyze weather conditions in
order to ensure the maximum level of safety.
• Advertising and Marketing: Big Data is a major part
of marketing and advertising, used to target particular segments of the
consumer base. Advertisers purchase or collect large volumes of
data to identify what consumers like.
• Banking and Financial Services: Big Data plays an important role
in the financial industry because it is used for fraud detection,
managing and mitigating risks, optimizing customer
relationships as well as personalized marketing.
• Media and Entertainment: Big Data is extensively used by the
entertainment industry for gaining insights from reviews sent by
consumers, predicting audience preferences and interests, and targeting
campaigns for marketing purposes.
• Meteorology: Weather sensors and satellites all over the globe help
collect large volumes of data to track climate conditions. Meteorologists
extensively use Big Data to study the patterns of natural disasters,
prepare forecasts of weather, and the like.
• Healthcare: Big Data has significantly impacted the healthcare industry at
large. Healthcare providers and organizations have widely used Big Data
for various purposes, including predicting outbreaks of diseases,
detecting early symptoms of preventable diseases, electronic health records,
real-time alerting, improving patient engagement, predicting and
preventing serious medical conditions, strategic planning, telemedicine
and research.
• Education: Many educational institutions have embraced the use of Big
Data for improving curricula, attracting the best talent, reducing
dropout rates by improving student outcomes, targeting global
recruiting, and optimizing the overall student experience.