1-Big Data Analytics

The document provides an introduction to big data analytics, defining big data and its key characteristics of volume, variety, and velocity. It discusses how data science differs from business intelligence in its use of predictive analytics and ability to analyze large and complex unstructured data sources. The document also outlines four main business drivers that are pushing organizations to adopt more advanced analytics approaches.

Uploaded by

Bill Gideons
Copyright
© All Rights Reserved

-1- Introduction to Big Data Analytics

Lecture 1: Introduction to BDA 1


Lecture 1:
Introduction to Big Data Analytics
Upon completion of this module, you should be able to:
• Define big data: what it is and what its characteristics are
• Identify four business drivers for advanced analytics
• Distinguish the techniques for Business Intelligence from Data
Science
• Describe the role of the Data Scientist within the new big data
ecosystem

Lecture 1: Introduction to BDA 2


Big Data
● What is it?
● What makes data “Big Data”?

Lecture 1: Introduction to BDA 3


Big Data Defined

● “Big Data” is data where scale, distribution, diversity, and/or
timeliness require the use of new technical architectures and
analytics to enable insights that unlock new sources of business value.
○ Requires new data architectures
○ New tools
○ New analytical methods
○ Integrating multiple skills into new role of data scientist

● Organizations are deriving business benefit from analyzing ever
larger and more complex data sets that increasingly require
real-time or near-real-time capabilities.

Lecture 1: Introduction to BDA 4


Big Data Defined: Characteristics or V’s
Big Data is sometimes described as having 3 characteristics or Vs:
Volume, Variety, and Velocity.

● Main characteristics of big data (the V’s):
○ Huge volume of data (Volume): Rather than thousands or millions of
rows, Big Data can be billions of rows and millions of columns.
○ Complexity of data types and structures (Variety): Big Data reflects
the variety of new data sources, formats, and structures, including digital
traces being left on the web and other digital repositories for subsequent
analysis. The volume of unstructured data keeps growing (80-90% of the data
in existence is unstructured).
○ Speed of new data creation and growth (Velocity): Big Data can
describe high velocity data, with rapid data ingestion and near real time
analysis.

Source: McKinsey May 2011 article Big Data: The next frontier for innovation, competition, and productivity

Lecture 1: Introduction to BDA 5


Key Characteristics of Big Data
1. Data Volume
○ 44x increase from 2009 to 2020 (0.8 zettabytes to 35.2 ZB)
○ 1 ZB = 1000^7 bytes = 10^21 bytes

2. Speed or velocity of new data creation


3. Data Structure
○ Greater variety of data structures to mine and analyze
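
The volume figures above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and the numbers are the ones quoted on the slide.

```python
# 1 zettabyte in bytes, using decimal (SI) prefixes as on the slide.
ZB = 1000 ** 7

# Figures quoted on the slide (in zettabytes).
volume_2009_zb = 0.8
volume_2020_zb = 35.2

# Growth factor between the two years.
growth = volume_2020_zb / volume_2009_zb

print(f"1 ZB = {ZB:.0e} bytes")
print(f"Growth factor 2009 to 2020: about {growth:.0f}x")
print(f"2020 volume in bytes: {volume_2020_zb * ZB:.3e}")
```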

Lecture 1: Introduction to BDA 6


Big Data Characteristics: V’s

Lecture 1: Introduction to BDA 7


Big Data Characteristics: Data Structures
Data Growth is Increasingly Unstructured

Listed from more structured to less structured:

• Structured: data containing a defined data type, format, and structure
  Example: transaction data and OLAP

• Semi-structured: textual data files with a discernible pattern, enabling parsing
  Example: XML data files that are self-describing and defined by an XML schema

• Quasi-structured: textual data with erratic data formats that can be formatted
  with effort, tools, and time
  Example: web clickstream data that may contain some inconsistencies in
  data values and formats

• Unstructured: data that has no inherent structure and is usually stored as
  different types of files
  Example: text documents, PDFs, images, and video
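
As a rough illustration of how the effort to use data grows as structure decreases, the sketch below reads one made-up sample of each kind using only the Python standard library. The sample records, field names, and URL are hypothetical and chosen only for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Defined type, format, and structure: a transaction row with known columns.
row = next(csv.DictReader(io.StringIO("order_id,amount\n1001,19.99\n")))
amount = float(row["amount"])

# Text with a discernible pattern: self-describing XML that a parser can walk.
doc = ET.fromstring("<order id='1001'><amount>19.99</amount></order>")
xml_amount = float(doc.findtext("amount"))

# Text with erratic formats: a clickstream URL with an empty parameter
# that has to be cleaned up before analysis.
click = "http://example.com/search?q=ice+cream&page=2&ref="
query = parse_qs(urlparse(click).query, keep_blank_values=True)

# No inherent structure: the raw bytes of a file; nothing to parse directly.
blob = b"%PDF-1.4 ..."

print(amount, xml_amount, query, len(blob))
```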

Lecture 1: Introduction to BDA 8


Four Main Types of Data Structures
Complex & Varied Data Structures

Lecture 1: Introduction to BDA 9


From Data Analytics to Big Data Analytics

Lecture 1: Introduction to BDA 10


Business Drivers for Analytics
Current Business Problems Provide Opportunities for Organizations to
Become More Analytical & Data Driven
Driver: Examples
1. Desire to optimize business operations: sales, pricing, profitability, efficiency
2. Desire to identify business risk: customer churn, fraud
3. Predict new business opportunities: attempt to make a more profitable sale, best new customer prospects

Lecture 1: Introduction to BDA 11


Analytics

• Decision makers may choose to make decisions based on past experiences
or rules of thumb, but unless data is considered, it would not be an
analytical decision-making process.

Lecture 1: Introduction to BDA 12


Data Analytics

• Suppose your street ice cream vendor stopped serving your street but still
serves the next street. Then one day you ask him/her why.
• He/she tells you that the customers on your street continually bargain, so
he/she loses a lot of money and time, whereas the next street has some
great customers whom he/she can serve well in a short time.
• Your ice cream vendor TESTED serving your street and within one month
DECIDED to stop. Even if you ask, he/she will not come back.
• This is because he/she compared expenses against income on your street
and realized that he/she was losing money and time. The vendor used a
kind of data analytics to decide not to serve your street.
• Can you tell us a real story in which you made a decision based on some
kind of data analytics?

Lecture 1: Introduction to BDA 13
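
The vendor's reasoning in the ice cream story can be sketched as a tiny calculation. All figures and names below are hypothetical; the lecture gives no numbers.

```python
# Hypothetical one-month figures for two streets (not from the lecture).
streets = {
    "your street": {"income": 300.0, "expenses": 340.0, "hours": 40.0},
    "next street": {"income": 520.0, "expenses": 310.0, "hours": 30.0},
}


def monthly_profit(figures):
    """Income minus expenses for one street."""
    return figures["income"] - figures["expenses"]


def streets_worth_serving(data):
    """Keep only the streets where the vendor makes money."""
    return [name for name, f in data.items() if monthly_profit(f) > 0]


for name, f in streets.items():
    profit = monthly_profit(f)
    print(f"{name}: profit {profit:+.2f}, per hour {profit / f['hours']:+.2f}")

print("Keep serving:", streets_worth_serving(streets))
```

The decision rule here (serve only streets with positive profit) is an assumption; a real vendor might also weigh time spent or future potential.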
Business Intelligence Versus Data Science

• There are two types of data analytics:
– Descriptive: its purpose is to summarize or describe what happened.
– Predictive: its purpose is to forecast or predict what might happen in the future.
• Business Intelligence (BI) technology is very useful for descriptive analytics
– Useful for closed-ended questions and for explaining current or past behavior,
typically by aggregating historical data and grouping it in some way.
– Provides hindsight and some insight and generally answers questions related to
“when” and “where” events occurred through reports, dashboards, and queries on
business questions for the current period or in the past.
• Data Science (DS) combines statistics, mathematics and computing concepts,
methodologies and tools to undertake predictive analytics on big data
– to see patterns,
– to discover relationships, and
– to make sense of stunningly varied images and information.
• Data Science was developed to undertake analytics on Big Data, just as physics,
chemistry, and biology were developed to study the physical environment,
chemical elements, and living things respectively.
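
The contrast between descriptive and predictive analytics can be shown on a toy series. This is a minimal sketch with made-up quarterly sales; the straight-line trend is just one simple predictive technique chosen for illustration, not the method the lecture prescribes.

```python
from statistics import mean

# Hypothetical quarterly unit sales (illustrative numbers only).
sales = [100, 110, 120, 130]

# Descriptive (BI-style): summarize what already happened.
total = sum(sales)
average = mean(sales)


# Predictive (DS-style): fit a straight-line trend by least squares
# and extrapolate one quarter ahead.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept

print(f"Descriptive: total={total}, average={average}")
print(f"Predictive: next quarter forecast={forecast:.1f}")
```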

Lecture 1: Introduction to BDA 14


Business Intelligence Versus Data Science
Analytical Approaches for Meeting Business Drivers

Predictive Analytics & Data Mining (Data Science)
Typical techniques & data types:
• Optimization, predictive modeling, forecasting, statistical analysis
• Structured/unstructured data, many types of sources, very large data sets
Common questions:
• What if…?
• What’s the optimal scenario for our business?
• What will happen next? What if these trends continue? Why is this happening?

Business Intelligence
Typical techniques & data types:
• Standard and ad hoc reporting, dashboards, alerts, queries, details on demand
• Structured data, traditional sources, manageable data sets
Common questions:
• What happened last quarter?
• How many did we sell?
• Where is the problem? In which situations?

[Figure: business value (low to high) against time (past to future); Business Intelligence sits toward low value and the past, Data Science toward high value and the future]

Lecture 1: Introduction to BDA 15


Example of BI Queries

• Find the courses whose grades increased by 5% compared to the
last two years,
• Identify the schools from which the best students came in
the last three years,
• Find the three most frequent reasons for which students
left university, compared to the last two years,
• Find cities whose purchases grew by more than 20% during the
specified 3-month period, versus the same 3-month period last
year,
• Find the shares in sales for the same period a year ago and then
calculate the change in share between the two years.
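
A BI query such as the city-growth example above is essentially a filter over aggregated historical data. The sketch below assumes the totals per city have already been aggregated; the city names and amounts are hypothetical.

```python
# Hypothetical purchase totals per city for the same 3-month period
# in two consecutive years (illustrative numbers only).
purchases = {
    "City A": {"last_year": 1000.0, "this_year": 1300.0},
    "City B": {"last_year": 800.0, "this_year": 880.0},
    "City C": {"last_year": 500.0, "this_year": 450.0},
}


def growth(row):
    """Relative change versus the same period last year."""
    return (row["this_year"] - row["last_year"]) / row["last_year"]


# Cities whose purchases grew by more than 20%.
fast_growing = [city for city, row in purchases.items() if growth(row) > 0.20]

for city, row in purchases.items():
    print(f"{city}: {growth(row):+.0%}")
print("Grew by more than 20%:", fast_growing)
```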

Lecture 1: Introduction to BDA 16


Example of DS Queries

• How much will student enrollment increase next year?
• How many faculty should we recruit in the next three years?
• How can we reduce the rate of students leaving the university to 4% in
the next two years?
• How can we increase student enrollment in a particular
course/program?
• What happens if we change the lecture starting time to 7am?
• What happens if we reduce car parking slots in the next two
years?
• How can we attract customers to buy our products?

Lecture 1: Introduction to BDA 17


Big Data Analytics

• Today, businesses of all sizes use data analytics to answer complex
queries and make decisions.
• If the ice cream vendor can answer why he stopped serving your
street, how many big businesses with thousands/millions of
customers today could answer questions like:
– Who are their MOST PROFITABLE customers?
– Who are their MOST COST GENERATING customers?
– How should they target their efforts to ACQUIRE the MOST
PROFITABLE customers?
• These questions are very difficult to answer when data is growing
exponentially through today’s internet, social networks, sensors, etc.,
to become what is known as Big Data.
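
On a small data set the first two questions reduce to simple rankings, as the sketch below shows; the difficulty described above comes from doing this at Big Data scale. The customer records and field names are hypothetical.

```python
# Hypothetical per-customer revenue and cost to serve (illustrative only).
customers = [
    {"id": "C1", "revenue": 1200.0, "cost": 300.0},
    {"id": "C2", "revenue": 400.0, "cost": 650.0},
    {"id": "C3", "revenue": 900.0, "cost": 200.0},
]


def profit(customer):
    """Revenue minus cost to serve for one customer."""
    return customer["revenue"] - customer["cost"]


most_profitable = max(customers, key=profit)
most_cost_generating = max(customers, key=lambda c: c["cost"])
ranked = sorted(customers, key=profit, reverse=True)

print("Most profitable:", most_profitable["id"])
print("Most cost generating:", most_cost_generating["id"])
print("Ranking by profit:", [c["id"] for c in ranked])
```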

Lecture 1: Introduction to BDA 18


Considerations for Big Data Analytics
Criteria for Big Data Projects

1. Speed of decision making


2. Throughput
3. Analysis flexibility

Lecture 1: Introduction to BDA 19


Video on the Use of Big Data Analytics

• How Netflix uses Data Analytics to Launch a new TV Series?

Lecture 1: Introduction to BDA 20

The Data in Big Data?

Lecture 1: Introduction to BDA 21


The Big Data?
• Very useful data is created constantly, and at an ever-increasing
rate, through the internet and different electronic devices:
– Mobile phones and social media, e.g. Facebook, Twitter, etc.
– Imaging technologies used to determine a medical diagnosis,
– Devices and sensors that automatically generate diagnostic
information that needs to be stored and processed in real time.
• These huge streams of records, documents, messages, images and
videos, generated in billions per second/minute/hour/day, are Big Data.

Lecture 1: Introduction to BDA 22
Opportunities for a New Approach to Analytics
New Applications/Tools Driving Data Volume

[Figure: volume of information (small to large) by decade]
• 1990s (RDBMS & data warehouse): measured in terabytes (1 TB = 1,000 GB)
• 2000s (content & digital asset management): measured in petabytes (1 PB = 1,000 TB)
• 2010s (NoSQL & key/value): will be measured in exabytes (1 EB = 1,000 PB)

Lecture 1: Introduction to BDA 23


Opportunities for a New Approach to Analytics
New Applications/Tools Driving Data Volume
These data come from multiple sources, including:
• Medical Information, such as genomic sequencing and MRIs
• Increased use of broadband on the Web – including the 2 billion
photos each month that Facebook users currently upload as well as
the innumerable videos uploaded to YouTube and other multimedia
sites
• Video surveillance (airport)
• Increased global use of mobile devices – the torrent of texting is not
likely to cease
• Smart devices – sensor-based collection of information from smart
electric grids, smart buildings and many other public and industry
infrastructure
• Non-traditional IT devices – including the use of RFID readers, GPS
navigation systems

The Big Data trend is generating an enormous amount of information
that requires advanced analytics and new market players to take
advantage of it.

Lecture 1: Introduction to BDA 24


Opportunities for a New Approach to Analytics
New Applications for Big Data Ecosystem

[Figure: the Big Data ecosystem]
1. Data Devices
2. Data Collectors (blue circle): collect data from devices and users
3. Data Aggregators (dark grey circle): make sense of the data collected
4. Data Users/Buyers
Entities labeled in the figure: Individual, Analytic Services, Medical,
Information Brokers, Advertising, Marketers, Employers, Law Enforcement,
Government, Internet, Websites, Catalog Co-Ops, Phone/TV, Retail, Media,
Media Archives, Credit Bureaus, Financial, List Brokers, Delivery Service,
Banks, Private Investigators/Lawyers

Lecture 1: Introduction to BDA 25


Data From Devices… 1

• Data devices and the “Sensornet” gather data from multiple
locations and continuously generate new data about this data.
• For each gigabyte of new data created, an additional petabyte
of data is created about that data.
• Consider someone playing an online video game through a PC,
game console, or smartphone.
• In this case, the video game provider captures data about the skill
and levels attained by the player, and can use it to:
– fine-tune the difficulty of the game,
– suggest other related games that would most likely interest the
user, and
– offer additional equipment and enhancements for the character
based on the user’s age, gender, and interests.

Lecture 1: Introduction to BDA 26


Data From Devices…2
• Smartphones provide another rich source of data.
– In addition to messaging and basic phone usage, they store and transmit
data about Internet usage, SMS usage, and real-time location.
– This metadata can be used for analyzing traffic patterns by scanning the
density of smartphones in locations to track the speed of cars or the
relative traffic congestion on busy roads.
– GPS devices in cars can give drivers real-time updates and offer alternative
routes to avoid traffic delays.
• Retail shopping loyalty cards record not just the amount an individual spends,
but also:
– the locations of stores that person visits, the kinds of products
purchased, and the stores where goods are purchased most often,
– the combinations of products purchased together.
– This gives insights into shopping and travel habits and the likelihood of
successful advertisement targeting for certain types of retail promotions.
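
Finding "combinations of products purchased together" from loyalty card data can be sketched as counting product pairs across baskets. The baskets below are hypothetical, and simple pair counting is only one basic way to do this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets recorded against a loyalty card (illustrative only).
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print("Most frequent pair:", top_pair, "in", top_count, "baskets")
```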

Lecture 1: Introduction to BDA 27


Data Scientist Profile:
Skills Needed In the New Data Ecosystem

• What new skill sets do you need to take advantage of
big data?
• Do most large organizations have people with these
skill sets?
• If so, who are they?

Lecture 1: Introduction to BDA 28


Three Key Roles of the New Data Ecosystem

Role: Role Description

1. Deep Analytical Talent (Data Scientists): people with advanced training in
quantitative disciplines, such as mathematics, statistics, and machine learning.
Projected U.S. talent gap: 140,000 to 190,000

2. Data Savvy Professionals (Analysts & Data Savvy Managers): people with a basic
knowledge of statistics and/or machine learning, who can define key questions
that can be answered using advanced analytics.
Projected U.S. talent gap: 1.5 million

3. Technology & Data Enablers: people providing technical expertise to support
analytical projects. Skill sets include computer programming and database
administration.

Note: Figures above reflect a projected talent gap in the US in 2018, as shown in McKinsey May 2011 article Big Data: The next frontier for innovation,
competition, and productivity

Lecture 1: Introduction to BDA 29


Data Scientist Key Activities

Data Scientists
● Reframe business challenges as analytics challenges
● Design, implement and deploy statistical models and data mining
techniques on big data
● Create insights that lead to actionable recommendations

[Figure labels: Data Engineers, Data Analyst, BI Analyst, Analytic Productivity Platform, Data Platform, Tools & Services, Admin, Infrastructure]

Lecture 1: Introduction to BDA 30


Profile of a Data Scientist
• Quantitative: skills such as mathematics or statistics
• Technical: aptitude such as software engineering, machine learning,
and programming skills
• Skeptical: examine their work critically rather than in a one-sided way
• Curious & Creative: must be passionate about data and finding creative
ways to solve problems and portray information
• Communicative & Collaborative: articulate the business value in a clear
way, and work collaboratively with project sponsors and key stakeholders

Lecture 1: Introduction to BDA 31


Big Data Analytics Case Examples

Lecture 1: Introduction to BDA 32


Big Data Analytics: Industry Examples

1. Health Care
• Reducing Cost of Care
2. Public Services
• Preventing Pandemics
3. IT Infrastructure
• Unstructured Data Analysis
4. Online Services
• Social Media for Professionals

Lecture 1: Introduction to BDA 33


1/2 Big Data Analytics: Health/Public Services

Situation
• Threat of global pandemics has increased exponentially
• Pandemics spread at faster rates

Use of Big Data
• Created a network of viral listening posts
• Combines data from viral discovery in the field, research in
disease hotspots, and social media trends
• Using Big Data to make accurate predictions on the spread of new
pandemics

Key Outcomes
• Identified a fifth form of human malaria, including its origin
• Identified why efforts failed to control swine flu
• Proposing more proactive approaches to preventing outbreaks

Lecture 1: Introduction to BDA 34


4 Big Data Analytics: Online Services

Situation
• Opportunity to create social media space for professionals

Use of Big Data
• Collects and analyzes data from over 100 million users
• Adding 1 million new users per week

Key Outcomes
• LinkedIn Skills, Job Recommendations, Recruiting
• Established a diverse data scientist group, as founder believes
this is the start of Big Data revolution

Lecture 1: Introduction to BDA 35


Check Your Knowledge

Lecture 1: Introduction to BDA 36


Check Your Knowledge: (From the Textbook)
1. What are the most important characteristics of Big Data (slides
5-7), and what are the main considerations in processing Big
Data? (slide 19; textbook page 11)
2. Explain the differences between BI and Data Science. (slide 15;
textbook pages 12-13)
3. Describe the challenges of the current analytical architecture
for data scientists.
4. What are the key skill sets and behavioral characteristics of a
data scientist? (slide 33)

Lecture 1: Introduction to BDA 37
