Big Data Technologie

Uploaded by

sthasumit96
Big Data Technologies - Overview

Topic Outline
• The lesson covers:
• Terminologies of Data Analytics
• State of the Practice in Analytics
• BI Versus Data Science
• Drivers of Big Data
• Emerging Big Data Ecosystem and a New Approach to Analytics
• Key Roles for the New Big Data Ecosystem
• Analyst Perspective on Data Repositories
• Examples of Big Data Analytics
Learning Outcomes

• At the end of this topic, you should be able to:
– Demonstrate the key concepts of big data technologies and their terminology
Concepts and Terminology
• As a starting point, several fundamental concepts and terms
need to be defined and understood:

• Terminologies
– Data Science
– Data Mining
– Datasets
– Machine Learning
– Algorithms
– Business Intelligence (BI)
– NoSQL
– Data Warehouse
– Cloud Computing
• Frameworks
– Hadoop
– Spark
– Storm
– Flume
• Programming languages
– R
– Python
– Java
– Scala
– Julia
Concepts and Terminology

• Data Science: the professional field that deals with turning data into value, such as new insights or predictive models.
• Data Mining: the process of discovering patterns in data.
• Machine Learning: a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
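To make "learning without being explicitly programmed" concrete, here is a minimal sketch in Python. It is not taken from the slides: the data values are made up, and `fit_line` is a hypothetical helper name. The rule linking input to output is never written by hand; it is estimated from examples with ordinary least squares.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept
    from example pairs, using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training examples: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict an unseen case (5 hours).
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

Real machine learning libraries handle many features and noisy data, but the idea is the same: parameters come from the data, not from the programmer.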
Concepts and Terminology- cont’d

• Datasets
– Collections or groups of related data are generally referred to as datasets.
– Each member of a dataset shares the same set of attributes or properties as the others in that dataset.
– Examples:
   – Tweets stored in a flat file
   – A collection of image files in a directory
   – An extract of rows from a database table stored in a CSV-formatted file
   – Historical weather observations stored as XML files
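As an illustration of the CSV example above, the following sketch loads a small dataset with Python's standard `csv` module and checks that every member shares the same attributes. The column names and values are invented for the example, and `load_dataset` is a hypothetical helper.

```python
import csv
import io

# Made-up CSV extract standing in for rows exported from a database table.
CSV_TEXT = """city,date,temp_c
Oslo,2020-01-01,-2.5
Lima,2020-01-01,24.0
Oslo,2020-01-02,-1.0
"""


def load_dataset(text):
    """Parse CSV text into a list of records, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_dataset(CSV_TEXT)

# Every member of the dataset exposes the same set of attributes,
# so collecting the key tuples yields exactly one distinct entry.
attribute_sets = {tuple(record.keys()) for record in records}
print(len(records), attribute_sets)
```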
Concepts and Terminology- cont’d

• Algorithms: a process or set of rules to be followed in calculations or other problem-solving operations, especially by computers.
• Business Intelligence (BI): a technology-driven process for analyzing data and presenting actionable information to help make informed business decisions.
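As a small instance of the definition of an algorithm given above, Euclid's method for the greatest common divisor is a fixed set of rules that a computer repeats until it reaches an answer. This example is an illustration chosen here, not one from the slides.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm.

    Rule: replace (a, b) with (b, a mod b) until b is zero;
    the remaining a is the answer.
    """
    while b != 0:
        a, b = b, a % b
    return abs(a)


print(gcd(48, 18))
```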
Concepts and Terminology- cont’d
• NoSQL database: provides a mechanism for
storage and retrieval of data that is modeled
in means other than the tabular relations used
in relational databases.
• Data Warehouse: is a collection of corporate
information and data derived from
operational systems and external data
sources.
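To show what "modeled in means other than tabular relations" can look like, the sketch below folds two flat, relational-style tables into a single nested document, which is the shape a document-oriented NoSQL store would hold. The data, the `to_document` helper, and the `_id` field name are illustrative assumptions; document stores are only one of several NoSQL models (key-value, column-family, and graph stores are others).

```python
import json

# Relational style: two flat tables linked by a key (made-up data).
customers = [(1, "Asha")]                      # (customer_id, name)
orders = [(101, 1, "laptop"), (102, 1, "mouse")]  # (order_id, customer_id, item)


def to_document(customer, all_orders):
    """Fold a customer row and its order rows into one nested document."""
    customer_id, name = customer
    return {
        "_id": customer_id,
        "name": name,
        "orders": [
            {"order_id": order_id, "item": item}
            for order_id, owner_id, item in all_orders
            if owner_id == customer_id
        ],
    }


document = to_document(customers[0], orders)

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(document, sort_keys=True)
print(serialized)
```

The nested form avoids a join when reading a customer with their orders, at the cost of duplicating data if the same order details are needed elsewhere.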
Concepts and Terminology- cont’d
• Cloud Computing: is an information
technology paradigm that enables ubiquitous
access to shared pools of configurable system
resources and higher-level services that can
be rapidly provisioned with minimal
management effort, often over the Internet.
(Mell & Grance, 2011).
Data Science Applications
– Internet Search
– Digital Advertisements (Targeted Advertising and re-targeting)
– Recommender Systems
– Image Recognition
– Speech Recognition
– Gaming
– Price Comparison Websites
– Airline Route Planning
– Fraud and Risk Detection
– Delivery logistics

Source: http://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
Big Data

History of Big Data

Big Data

“Big Data is any data that is expensive to manage and hard to extract value from.”

Michael Franklin
Director of the Algorithms, Machines and People Lab
University of California, Berkeley

Big Data Analytics

Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.
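One simple way to look for "unknown correlations" is to compute the Pearson correlation coefficient between two measured quantities. The sketch below uses only the standard library; the figures are invented and deliberately perfectly linear so the result is easy to check by hand. Real analyses run on far larger, noisier data and usually rely on dedicated tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up business figures for illustration only.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]      # rises with ad spend
returns = [5, 4, 3, 2, 1]          # falls as ad spend rises

r_sales = pearson(ad_spend, sales)
r_returns = pearson(ad_spend, returns)
print(r_sales, r_returns)
```

A coefficient near +1 or -1 signals a strong linear relationship; it does not by itself show that one quantity causes the other.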

Big Data

Answer
Improve Return on Investment
Weather Prediction
Biodiversity Trends
Customer Habits

Big Data

More than 90% of the world's data was created in the last two years (as reported in May 2013)!

Source: https://www.sciencedaily.com/releases/2013/05/130522085217.htm (May 22, 2013)

Big Data

What is changing in the land of Big Data?

Big Data V’s

From 3Vs, 4Vs, 5Vs, and 6Vs of big data

Analytics stack
Big Data V’s to M’s
• Seeing competitors jumping onto the Big Data bandwagon, many organisations follow suit.
• Some then realise how hard it is to keep up with its maintenance.
• Hidden costs and processes emerge, which either slow organisations down or split them into silos.
• Many forget or violate the four Ms of big data: Make Me More Money.
Emerging Technologies in Big Data

Source: https://www.dataversity.net/big-data-trends-in-2020/

Most Popular Data Science Tools

Source: https://www.linkedin.com/pulse/20140903194459-57656293-the-data-science-skills-network
Useful Books and Resources
• Big Data For Dummies by Judith Hurwitz, Alan Nugent, Fern Halper, and Marcia Kaufman.
• Hadoop: The Definitive Guide [pdf]
• http://bigdata.andreamostosi.name/
• Big Data Fundamentals: Concepts, Drivers & Techniques.
• http://www.sciencedirect.com/science/article/pii/S0306437914001288
Quick Review Questions
• What exactly is Big Data?
• What are the biggest challenges of big data? Creating or collecting the right data? Identifying or blending multiple external data sources?
• What is the difference between BI and Data Science?
• How does big data analysis help businesses increase their revenue?
