Wk1_Overview of Data Analytics and Big Data

The document discusses the concept of Big Data, which refers to large and complex datasets that traditional tools struggle to process, characterized by Volume, Velocity, Variety, and Veracity. It outlines the sources of data generation, the distinction between structured and unstructured data, and the four types of data analytics: Descriptive, Diagnostic, Predictive, and Prescriptive. Additionally, it highlights various applications of Big Data, including targeted advertising, personalization, and predictive policing.

Uploaded by

s1711377
Copyright
© All Rights Reserved

Data Analytics and Big Data

Richard Lui

The Big Data Era
• Data: Any piece of information stored and/or processed by a computer or mobile device.
• Companies/Organizations are generating and keeping more and more data
• The term "Big Data" was coined by John Mashey in the 1990s to describe data that is too
vast and complex for traditional tools to handle.

1.44 megabytes (MB)

4,000-year-old clay disk

Video: Facebook Data Center


1 terabyte (TB) = 1,048,576 MB
Over a hundred petabytes of photo and video data (1 petabyte (PB) = 1,024 TB)
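
The unit arithmetic above can be checked with a short script. This is only a sketch: it assumes binary units (each step up is a factor of 1,024), and the function names are made up for illustration.

```python
# Binary storage units: each step up (MB -> GB -> TB -> PB) is a factor of 1,024.
MB_PER_TB = 1024 * 1024   # MB -> GB -> TB
TB_PER_PB = 1024

SMALL_MEDIUM_MB = 1.44    # the 1.44 MB figure quoted on the slide


def tb_to_mb(terabytes):
    """Convert terabytes to megabytes."""
    return terabytes * MB_PER_TB


def pb_to_tb(petabytes):
    """Convert petabytes to terabytes."""
    return petabytes * TB_PER_PB


# How many megabytes are in the "over a hundred petabytes" on the slide?
hundred_pb_in_mb = tb_to_mb(pb_to_tb(100))

# How many 1.44 MB units would be needed to hold that much data?
units_needed = hundred_pb_in_mb / SMALL_MEDIUM_MB

print(f"1 TB   = {tb_to_mb(1):,} MB")
print(f"100 PB = {pb_to_tb(100):,} TB = {hundred_pb_in_mb:,} MB")
print(f"1.44 MB units needed for 100 PB: {units_needed:,.0f}")
```
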
Where are the data coming from?
• Your every interaction with your computer or phone
• Your every interaction on social media
• Every time you walk down the street with a phone in your pocket, it’s
tracking your location through GPS sensors
• Every time you buy something with your credit card or Octopus card
• Every time you read an article online
• Every time you stream a song, movie or podcast
• …

Explosion of data
• Exponential growth of the Internet and World Wide Web
• Transactions and interactions of users with e-commerce and
mobile applications
• Social network activities
• E.g. YouTube, Facebook, Instagram, Twitter
• Companies collect and store a large volume of data from
different types of users
• E.g. Google, Baidu, Netflix, Uber
• Internet of Things (IoT) and wireless sensors
• Smart watch, thermostat, water heaters, smoke detectors, …

A chart which provides an overview of what happens online every minute


https://www.socialmediatoday.com/news/what-happens-on-the-internet-every-minute-2021-version-infographic/607586/
The 4 Vs of Big Data
• Volume
• A huge amount of data
• Velocity
• High speed and continuous flow of data
• Variety
• Different types of structured, semi-structured and unstructured data coming
from heterogeneous sources
• Veracity
• Data may be inconsistent, incomplete and messy
Data Analytics
• Data Analytics refers to the technologies and processes that turn raw data into
insights for decision making and make it easier to draw conclusions from data

{
  "timestamp":"2022-08-12 03:01:58.732726",
  "user_id":"35",
  "click_id":"15cf179b9c9d483a…",
  "event_name":"Search",
  "user_ip":"11.22.33.44",
  "additional_data":{
    "engagement_time":40,
    "product_id":12345
  }
}
Clickstreams in an e-commerce website
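
As a rough illustration of turning such a raw event into something usable for analysis, the sketch below parses the clickstream record with Python's standard library. The field names come from the slide; the truncated click_id is kept as shown, and the helper name parse_event is hypothetical.

```python
import json
from datetime import datetime

# The clickstream event from the slide (click_id is truncated in the original).
raw_event = """
{
  "timestamp": "2022-08-12 03:01:58.732726",
  "user_id": "35",
  "click_id": "15cf179b9c9d483a…",
  "event_name": "Search",
  "user_ip": "11.22.33.44",
  "additional_data": {"engagement_time": 40, "product_id": 12345}
}
"""


def parse_event(text):
    """Parse one JSON clickstream event and convert its timestamp to a datetime."""
    event = json.loads(text)
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%d %H:%M:%S.%f"
    )
    return event


event = parse_event(raw_event)

# Pull out a few fields an analyst might aggregate on.
summary = (
    event["event_name"],
    event["timestamp"].hour,
    event["additional_data"]["engagement_time"],
)
print(summary)
```

From here, many such events could be grouped by event name or hour of day to answer questions like "when do users search the most?".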

Structured vs. Unstructured data
• Structured data
• Data conforms to a data model or schema and is often stored in tabular form.
• Unstructured data
• Data that does not conform to a data model or schema.
• Estimated to make up about 80% of the data within a given enterprise.
• Semi-structured data
• Data that is not tabular but still conforms to some level of structure

Figure: examples of unstructured data, semi-structured data, and structured data
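
The difference between the three kinds of data shows up in how much work is needed before the data can be queried. The sketch below uses small made-up samples (all values are hypothetical) and only the Python standard library.

```python
import csv
import io
import json

# Structured: rows follow a fixed schema, so columns can be used directly.
structured_csv = "order_id,customer,amount\n1001,Alice,250.0\n1002,Bob,99.5\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: no table, but keys and nesting give some structure.
semi_structured = '{"user": "Alice", "tags": ["sale", "mobile"], "device": {"os": "iOS"}}'
record = json.loads(semi_structured)
device_os = record.get("device", {}).get("os")  # fields may be missing, so use .get

# Unstructured: free text with no schema; even a simple question needs text processing.
unstructured = "Loved the phone, but delivery took two weeks and the box was damaged."
mentions_delivery = "delivery" in unstructured.lower()

print(total_amount, device_os, mentions_delivery)
```
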
Are the data structured/unstructured?

It’s estimated that 90% of the big data we generate is unstructured!


Four data analytics capabilities

Source: Gartner's 2017 Planning Guide for Data and Analytics.


Descriptive Analytics
• Answers the question "What has happened?"
• Example
• What was the sales volume over the past 12 months?
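
A minimal sketch of this kind of descriptive question, assuming a made-up table of monthly sales figures (the numbers are hypothetical and not from the slides):

```python
# Hypothetical monthly sales volumes for the past 12 months.
monthly_sales = {
    "Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110, "May": 105, "Jun": 98,
    "Jul": 140, "Aug": 155, "Sep": 160, "Oct": 170, "Nov": 180, "Dec": 210,
}

# Descriptive analytics summarizes what has already happened.
total_sales = sum(monthly_sales.values())
average_sales = total_sales / len(monthly_sales)
best_month = max(monthly_sales, key=monthly_sales.get)
worst_month = min(monthly_sales, key=monthly_sales.get)

print(f"Total: {total_sales}, monthly average: {average_sales:.1f}")
print(f"Best month: {best_month}, worst month: {worst_month}")
```
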

Diagnostic Analytics
• Identifies the cause of a phenomenon that occurred in the past
• Example
• Why were Q2 sales less than Q1 sales?
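
One common way to approach such a question is to break the total down by a dimension and see where the change came from. The sketch below assumes hypothetical sales by region; both the regions and the numbers are invented for illustration.

```python
# Hypothetical quarterly sales by region.
q1_sales = {"North": 150, "South": 130, "East": 125}
q2_sales = {"North": 148, "South": 70, "East": 95}

# Change per region between the two quarters (negative means a drop).
change_by_region = {region: q2_sales[region] - q1_sales[region] for region in q1_sales}
total_change = sum(change_by_region.values())

# The region with the largest drop is the first place to investigate.
main_driver = min(change_by_region, key=change_by_region.get)
share_of_drop = change_by_region[main_driver] / total_change

print(f"Total change Q1 -> Q2: {total_change}")
print(f"Largest drop: {main_driver} ({share_of_drop:.0%} of the total change)")
```

A breakdown like this only narrows down where to look; explaining why that region dropped still needs further data or domain knowledge.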

Predictive Analytics
• Generate future predictions based upon past events.
• Example
• What are the predicted sales for next month?
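
As one very simple illustration, a straight-line trend can be fitted to past sales and extended by one month. This is a sketch under strong assumptions: the data is hypothetical, and a linear trend is only one of many possible predictive models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1..12.
monthly_sales = [120, 135, 150, 110, 105, 98, 140, 155, 160, 170, 180, 210]
months = list(range(1, len(monthly_sales) + 1))

slope, intercept = fit_line(months, monthly_sales)

# Extend the fitted trend to month 13 (next month).
forecast = slope * 13 + intercept
print(f"Forecast for month 13: {forecast:.1f}")
```
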

Visualization
• Creation and study of the visual representation of data
• One of the most important tools for data analytics
• Dashboard: A read-only snapshot of an analysis that you can share with other users for reporting
purposes.

https://www.gapminder.org/fw/world-health-chart

AWS QuickSight: https://aws.amazon.com/quicksight
Applications of Big Data
• Coca-Cola uses data to create new products, like Cherry Sprite, based on consumer preferences.
• Targeted advertising on platforms like Facebook is made possible through categorizing users based
on their data.
• The 2016 U.S. presidential campaign used Big Data to target specific groups of voters with
tailored ads.
• Netflix's algorithm recommends shows and movies based on user preferences.
• Google Maps uses real-time data from users' locations and speeds to predict traffic conditions.
• Alibaba's City Brain initiative in Hangzhou, China, uses data to manage city traffic and
infrastructure.
• Personalized medicine: sequencing a patient’s genome to predict which medicine will have
the fewest side effects.

Video: Intro to Big Data: Crash Course Statistics #38

How does Facebook track your data?
• Facebook had 2.89 billion active users as of the second quarter of 2021 (Source: Statista)
• Collects, stores and analyzes users' data and behavior
• Suggests posts and advertisements that match users’ preferences
• Collected data
• Age, gender, hobbies and recent experiences
• Posts and pages liked by the user
• "People You May Know" feature
• Phone contacts and shared locations
• Users' political activities, such as protests and marches attended
• Facebook partners with data brokers to gather information about users' purchases.
• Even offline transactions, like credit card payments, can be linked to user profiles, leading to targeted ads.

Video: How Facebook Tracks Your Data


Example: Facebook advertising

https://www.facebook.com/help/794535777607370?ref=learn_more_ipl
Artwork Personalization at Netflix
• Artwork selection is crucial to encourage members to engage with unfamiliar titles.
• Netflix personalizes the image used to depict the movie “Good Will Hunting”
• A member who has watched many romantic movies => show the artwork containing Matt Damon and
Minnie Driver
• A member who has watched many comedies => show the artwork containing Robin Williams, a well-known
comedian.
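
The "Good Will Hunting" example can be mimicked with a deliberately simple rule of thumb: count the genres in a member's viewing history and pick the artwork tied to the most-watched one. This is only an illustrative sketch, not Netflix's actual system; the mapping, the default artwork and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from a member's dominant genre to an artwork variant.
ARTWORK_BY_GENRE = {
    "romance": "artwork with Matt Damon and Minnie Driver",
    "comedy": "artwork with Robin Williams",
}
DEFAULT_ARTWORK = "default artwork"


def pick_artwork(viewing_history):
    """Pick the artwork matching the genre the member has watched most often."""
    counts = Counter(genre for genre in viewing_history if genre in ARTWORK_BY_GENRE)
    if not counts:
        return DEFAULT_ARTWORK  # no relevant history, so fall back to a default
    top_genre, _ = counts.most_common(1)[0]
    return ARTWORK_BY_GENRE[top_genre]


print(pick_artwork(["romance", "romance", "comedy"]))
print(pick_artwork(["comedy", "comedy", "drama"]))
print(pick_artwork(["documentary"]))
```

A hand-written rule like this ignores whether members actually respond to the chosen image; a production system would have to learn that from engagement data.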

https://netflixtechblog.com/artwork-personalization-c589f074ad76
Data Analytics in Healthcare
• Metrics: patient falls with injury, average length of stay, patient recommendations, etc.
• Create interactive dashboards
• Allow clinicians to analyze their performance and outcomes.
• Highlight areas of improvement in patient care.
• Deliver better and safer patient care.

The SEPTEE model

Video: What it's like to be a Healthcare Data Analyst


Predictive policing
• Video: How predictive policing software works
• The use of data to anticipate and prevent crime.
• Hotspot analysis
• Utilizing data from past crimes to forecast the likelihood of crime in each grid during the next
shift
• Placing police officers in these hotspots to prevent future crimes.
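
The hotspot idea can be sketched by dividing a map into grid cells and counting past incidents per cell, using the historical count as a crude stand-in for future likelihood. This is an illustrative simplification, not the method of any real predictive policing product; the coordinates and cell size are hypothetical.

```python
from collections import Counter

CELL_SIZE = 0.5  # hypothetical grid cell size, in km


def cell_of(x, y):
    """Map a location (in km from a reference corner) to its grid cell."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


# Hypothetical locations of past incidents.
past_incidents = [
    (0.1, 0.2), (0.3, 0.4), (0.2, 0.1),  # three incidents in the same cell
    (1.2, 0.3), (1.4, 0.1),              # two incidents in another cell
    (2.6, 2.7),                          # one isolated incident
]

# Count incidents per grid cell.
incidents_per_cell = Counter(cell_of(x, y) for x, y in past_incidents)

# Treat the cells with the most past incidents as hotspots for the next shift.
hotspots = [cell for cell, _ in incidents_per_cell.most_common(2)]
print(hotspots)
```

Because the forecast is driven entirely by where incidents were recorded in the past, any bias in that historical data carries straight through to the hotspots.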

Case Study: How Cops Are Using Algorithms to Predict Crimes
• The Los Angeles Police Department (LAPD) is using data-driven algorithms to forecast future crimes.
• The algorithms predict violent crime occurrences and potential perpetrators using historical crime, arrest, and field data.
• PredPol: A predictive policing tool utilized by over 60 departments
• Identifies areas or "hotspots" with a higher likelihood of criminal activity
• Officers are directed to specific hotspots identified by PredPol's algorithm, which analyzes historical crime data
and creates hotspots.
• Drone surveillance and facial recognition-equipped body cameras
• The Stop LAPD Spying Coalition argues that such strategies disproportionately target low-income
communities and communities of color.

Video: How Cops Are Using Algorithms to Predict Crimes


Summary
• Data analytics refers to technologies and processes that turn raw data into insights for decision
making.
• "Big Data" describes large, complex datasets that are difficult for traditional tools to process.
• The 4 Vs: Volume, Velocity, Variety, Veracity.
• Structured vs. unstructured data: unstructured data makes up an estimated 80% of enterprise data.
• 4 types of analytics: Descriptive, Diagnostic, Predictive, Prescriptive.
• Visualization is crucial for exploring and communicating insights from data.
• Applications of big data
• Targeted advertising, personalization, predictive policing, etc.
