Introduction To Big Data and Hadoop

Uploaded by

kshitijseven1
Copyright
© All Rights Reserved

Introduction to Big Data and Hadoop


Lecture 1
Big Data & Data Science

Characterizing Big Data

Big Data Challenges

Big Data Tools

Applications

Data Analysis & Data Analytics


Understanding Big Data
Objectives:
➢ To understand what big data is
➢ To know the various types of data
➢ To study examples of big data
➢ To explore various applications of big data
What is Big Data?

➢ Simply: data of very big size
➢ Can't be processed with the usual tools
➢ Distributed architecture needed
➢ Structured / unstructured
TYPES OF COMPUTING

Distributed
➢ Groups of networked computers
➢ Interact with each other
➢ To achieve a common goal

Parallel
➢ Breaking down larger problems into smaller and independent problems
➢ Often similar parts that can be executed simultaneously by multiple processors
➢ Communicating via shared memory
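
The parallel idea above (break a large problem into independent parts, run them on several workers, combine the results) can be sketched as follows. This is a minimal illustration, not a recommended implementation: the function names `split_into_chunks` and `parallel_sum` are hypothetical, and threads are used only because they share memory within one process. In CPython, threads do not give a real speed-up for CPU-bound work, so treat this as a picture of the decomposition rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Break one large problem into smaller, independent sub-problems."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = split_into_chunks(values, n_workers)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each chunk is a similar, independent task executed by a worker.
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


total = parallel_sum(list(range(1, 1001)))
print(total)
```

In a distributed setting the same split/compute/combine pattern applies, but the chunks would be sent to separate networked computers instead of threads in one process.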
Big Data & Data Science
❖ Big data is a blanket term for any collection of data sets so
large or complex that it becomes difficult to process them
using traditional data management techniques such as
relational database management systems (RDBMS).

❖ The widely adopted RDBMS has long been regarded as a
one-size-fits-all solution, but the demands of handling big data
have shown otherwise.

❖ Data science involves using methods to analyze massive
amounts of data and extract the knowledge it contains.
[Figure: BIG DATA → data analysis from a large data set → knowledge extraction]
How can we characterize big data?

Big data is commonly characterized using a number of V's.
The first three are Volume, Variety, and Velocity.

Volume refers to the vast amounts of data generated
every second, minute, hour, and day in our digitized world.

Variety refers to the ever-increasing number of different
forms that data can come in, such as text, images,
voice, and geospatial data.

Velocity refers to the speed at which data is
being generated and the pace at which data
moves from one point to the next.
Characteristics of Big Data
Veracity refers to the biases, noise, and abnormality in
data. Or, better yet, it refers to the often
immeasurable uncertainties about the truthfulness and
trustworthiness of data.

Value refers to understanding the costs and benefits of collecting
and analyzing the data, to ensure that the data gained can
ultimately be monetized.
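
To make the first three V's concrete, here is a small sketch that computes rough Volume, Variety, and Velocity figures for a batch of records. Everything in it is an assumption made for illustration: the `Record` fields, the `summarize` function, and the sample values are hypothetical and do not come from any particular tool. Veracity and Value are left out on purpose, since they are not directly measurable from record metadata.

```python
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since the start of observation (assumed)
    kind: str         # form of the data, e.g. "text", "image", "voice"
    size_bytes: int   # size of the record's payload


def summarize(records):
    """Return rough Volume, Variety and Velocity figures for a batch."""
    volume = sum(r.size_bytes for r in records)   # how much data
    variety = len({r.kind for r in records})      # how many distinct forms
    velocity = 0.0                                # bytes per second
    if len(records) >= 2:
        times = [r.timestamp for r in records]
        span = max(times) - min(times)
        if span > 0:
            velocity = volume / span
    return {
        "volume_bytes": volume,
        "variety_kinds": variety,
        "velocity_bytes_per_s": velocity,
    }


batch = [
    Record(0.0, "text", 200),
    Record(1.0, "image", 5000),
    Record(2.0, "voice", 3000),
    Record(4.0, "text", 1800),
]
summary = summarize(batch)
print(summary)
```

For the sample batch this gives 10,000 bytes of volume, 3 kinds of data, and 2,500 bytes per second over the 4-second window.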
Challenges
1. Dealing with data growth
• The first challenge is storing and analyzing all that
information.
• Much of that data is unstructured,
meaning that it doesn't reside in a
database.
• Documents, photos, audio, videos and
other unstructured data can be difficult
to search and analyze.
2. Generating insights in a timely manner
• Decreasing expenses through operational cost
efficiencies
• Establishing a data-driven culture
• Creating new avenues for innovation and disruption
• Accelerating the speed with which new capabilities and
services are deployed
• Launching new product and service offerings

3. Recruiting and retaining big data talent
• Job
4. Integrating disparate data sources
• Big data comes from a lot of different places —
enterprise applications, social media streams,
email systems, employee-created documents,
etc.
5. Validating data
• Often organizations are getting similar pieces of
data from different systems, and the data in
those different systems doesn't always agree.
• The process of getting those records to agree, as
well as making sure the records are accurate,
usable and secure, is called data governance
6. Securing big data
• Big data stores are attractive targets for hackers or
advanced persistent threats
7. Organizational resistance
• It is not only the technological aspects of big
data that can be challenging — people can be
an issue too
• Insufficient organizational alignment
• Lack of middle management adoption and
understanding
• Business resistance or lack of understanding
(41.0 percent)
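
Challenge 5 above (validating data) notes that similar records from different systems do not always agree. A first step toward reconciling them is simply finding where they differ. The sketch below assumes each system exposes its records as a dictionary keyed by a shared record ID. The function name `find_disagreements` and the sample CRM and billing data are hypothetical, and real data governance also covers accuracy, usability, and security, which this does not address.

```python
def find_disagreements(system_a, system_b):
    """Compare records keyed by ID and report fields whose values differ.

    Only IDs and fields present in both systems are compared.
    """
    conflicts = {}
    for record_id in sorted(system_a.keys() & system_b.keys()):
        a, b = system_a[record_id], system_b[record_id]
        differing = sorted(f for f in a.keys() & b.keys() if a[f] != b[f])
        if differing:
            conflicts[record_id] = differing
    return conflicts


# Hypothetical customer records held by two different systems.
crm = {
    "c1": {"email": "a@example.com", "city": "Pune"},
    "c2": {"email": "b@example.com", "city": "Delhi"},
}
billing = {
    "c1": {"email": "a@example.com", "city": "Mumbai"},
    "c2": {"email": "b@example.com", "city": "Delhi"},
    "c3": {"email": "c@example.com", "city": "Chennai"},
}

conflicts = find_disagreements(crm, billing)
print(conflicts)
```

Here the two systems disagree only on the `city` field of record `c1`. Record `c3` exists in one system only, so it is not compared.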
Big Data Applications
• Crime Prediction and Prevention
• Manufacturing
• Healthcare
• Cyber security & Intelligence
• Media & Entertainment
• Weather Forecasting
• Internet of Things (IoT)
• Traffic Optimization
• Government
• Pharmaceutical Drug Evaluation
Big Data Applications: Healthcare
• Supports personalized medicine and prescriptive analytics.
• Helps identify patterns related to drug side effects and gain
other important information that can help patients and reduce costs.
• Uses data from wearable technologies along with electronic health
record data, imaging data, patient-generated data, sensor data,
and other forms of data.
Big Data Applications: Manufacturing
• Product quality and defects tracking
• Supply planning
• Manufacturing process defect tracking
• Output forecasting and increasing energy efficiency
• Testing and simulation of new manufacturing processes
• Support for mass-customization of manufacturing
Big Data Applications: Media & Entertainment

• Predicting what the audience wants and scheduling optimization
• Increasing acquisition and retention
• Content monetization and new product development
• Advertisement targeting
Big Data Applications: Internet of Things (IoT)
• Data extracted from IoT devices provides a mapping of device
inter-connectivity.
• IoT is increasingly adopted as a means of gathering sensory
data, and this sensory data is used in medical and
manufacturing contexts.
Big Data Applications: Government

• Offers efficiencies in terms of cost, productivity,
and innovation.
• The same data sets are often applied across multiple
applications, which requires multiple departments to work
in collaboration.
• Government operates across all domains, so it plays an
important role in innovating Big Data applications in each
and every domain.
Cyber security & Intelligence

• The federal government launched a cyber security research and
development plan that relies on the ability to analyze large
data sets in order to improve the security of U.S. computer
networks.
• The National Geospatial-Intelligence Agency is creating a
“Map of the World” that can gather and analyze data from a
wide variety of sources such as satellite and social media data.
Crime Prediction and Prevention
• Uses real-time analytics to provide actionable intelligence
that can be used to understand criminal behaviour, identify
crime/incident patterns, and uncover location-based threats.

Weather Forecasting

• NOAA (the National Oceanic and Atmospheric Administration)
gathers data every minute of every day from land, sea, and
space-based sensors.
• Every day, NOAA uses Big Data to analyze and extract value
from over 20 terabytes of data.
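
To get a feel for what 20 terabytes per day means as a continuous stream, the figure can be converted to an average rate. This is a back-of-the-envelope sketch that assumes decimal terabytes (10^12 bytes) and data arriving evenly over the day. Since the slide says "over 20 terabytes", the result is a lower bound. The function name `daily_volume_to_rate` is hypothetical.

```python
TERABYTE = 10**12               # decimal terabyte, in bytes (assumption)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_to_rate(terabytes_per_day):
    """Convert a daily data volume to an average rate in bytes per second."""
    return terabytes_per_day * TERABYTE / SECONDS_PER_DAY


rate = daily_volume_to_rate(20)
print(f"{rate / 10**6:.1f} MB/s")
```

Under these assumptions the average works out to roughly 231.5 MB of new data every second, which gives a sense of why a single machine with the usual tools is not enough.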
Tax Compliance
• Analyzes both unstructured and structured data from a variety
of sources in order to identify suspicious behavior and multiple
identities. This helps in identifying tax fraud.
Traffic Optimization

• Uses real-time traffic data gathered from road sensors,
GPS devices and video cameras.
• Potential traffic problems in dense areas can be prevented
by adjusting public transportation routes in real time.
Thank You