Session 2 - Foundations of Data and Information - 2024
BUSINESS MANAGEMENT
Foundations of Data and Information
M.Sc. Thien Nguyen
Email: [email protected]
Phone: 0949088908
Agenda
1. Types of data
2. Data quality and integrity
3. Data collection
4. Data Analytics
5. Problem Solving Using Data
What is Data Analytics?
Review: What is Data Analytics
Data analytics is the process of analysing raw data in order to draw out
meaningful, actionable insights
(Source: https://careerfoundry.com/en/blog/data-analytics/what-is-data-analytics/)
Source: https://medium.com/codex/life-cycle-of-a-data-analytics-project-954d0e6926fe
Review: What is Data Analytics
Source: https://vitalflux.com/what-are-actionable-insights-examples-concepts/
Source: https://www.softwaretestinghelp.com/data-analytics-companies/
1. Understanding different types of data
6 Types of Data in Statistics & Research: Key in Data Science
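Qualitative:
● Nominal (named categories with no order)
● Binary (nominal with two values, e.g. True/False)
● Ordinal (ordered categories)
Quantitative:
● Discrete (separate, countable values)
● Continuous (any value within a range)
● Interval (values on an equally spaced scale)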
Source: https://www.intellspot.com/data-types/
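A minimal Python sketch of how the six types might appear as columns of one table, and which simple summary suits each. The survey records, column names, and values are invented for illustration and do not come from the source article.

```python
from statistics import mean, median_low, mode

# Hypothetical survey records; all names and values are made up.
responses = [
    {"city": "Hanoi", "is_member": True, "satisfaction": "high",
     "visits": 3, "spend": 12.5, "temp_c": 31.0},
    {"city": "Hue", "is_member": False, "satisfaction": "low",
     "visits": 1, "spend": 4.0, "temp_c": 28.5},
    {"city": "Hanoi", "is_member": True, "satisfaction": "medium",
     "visits": 2, "spend": 10.5, "temp_c": 30.0},
]

# Ordinal data needs an explicit order; the labels alone do not carry it.
SATISFACTION_ORDER = ["low", "medium", "high"]


def column(name):
    """Return all values of one variable across the observations."""
    return [row[name] for row in responses]


ranks = [SATISFACTION_ORDER.index(s) for s in column("satisfaction")]

summary = {
    # Nominal: only counting makes sense, so report the most frequent label.
    "most_common_city": mode(column("city")),
    # Binary: the share of True values.
    "member_share": sum(column("is_member")) / len(responses),
    # Ordinal: the median of the ranks, mapped back to its label.
    "median_satisfaction": SATISFACTION_ORDER[median_low(ranks)],
    # Discrete: counts can be averaged.
    "mean_visits": mean(column("visits")),
    # Continuous: measurements can be averaged.
    "mean_spend": mean(column("spend")),
    # Interval: differences are meaningful (a range in degrees Celsius).
    "temp_range_c": max(column("temp_c")) - min(column("temp_c")),
}

for name, value in summary.items():
    print(name, value)
```

The design point is that the data type decides which summary is valid: a mean of city names is meaningless, while a mode of spending amounts is rarely useful.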
Terminology: common data-related terms used in class, for students to read at home
● Data is most straightforward to analyse if it forms a single data table.
● A data table consists of observations and variables.
● Observations are also known as cases.
● Variables are also called features.
● A dataset is a broader concept that includes, potentially, multiple data tables with different
kinds of information to be used in the same analysis
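The terms above can be sketched in plain Python, assuming a data table is stored as a list of records. The table names (`orders`, `products`), the variables, and the values are hypothetical examples, not course data.

```python
# A data table: each dict is one observation (case),
# each key is one variable (feature).
orders = [
    {"OrderID": 1, "ProductID": "P01", "Quantity": 2},
    {"OrderID": 2, "ProductID": "P02", "Quantity": 1},
    {"OrderID": 3, "ProductID": "P01", "Quantity": 5},
]

# A second table holding a different kind of information.
products = [
    {"ProductID": "P01", "Category": "Skincare"},
    {"ProductID": "P02", "Category": "Makeup"},
]

# A dataset: several tables intended for the same analysis.
dataset = {"orders": orders, "products": products}


def describe(table):
    """Return the number of observations and the list of variable names."""
    return len(table), list(table[0].keys())


n_obs, variables = describe(dataset["orders"])
print(n_obs, variables)

# Combine the two tables into a single data table by looking up
# each order's product, since one table is the easiest form to analyse.
category_by_product = {p["ProductID"]: p["Category"] for p in products}
combined = [
    {**order, "Category": category_by_product[order["ProductID"]]}
    for order in orders
]
print(combined[0])
```

After the lookup, every observation in `combined` carries all variables needed for the analysis in one place.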
Observations and variables
Source: https://www.statology.org/observation-in-statistics/
Best practices
2. Naming conventions
a. E.g. 1: SalesOrders, ProductID, CountOfOrders
i. Capitalize each word
ii. No spaces (an underscore "_" may be used instead)
b. E.g. 2: SalesOrders_v1, SalesOrders_v2 → NO (avoid version-number suffixes)
i. SalesOrders_Raw / SalesOrders_Cleaned / SalesOrders_Temp
ii. SalesOrders_Dao / SalesOrders_Minh
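A rough sketch of how these naming rules could be checked automatically. The function `check_name` and its messages are hypothetical, and the sketch assumes that the descriptive suffixes in E.g. 2 (such as `_Raw` or `_Cleaned`) are the preferred alternative to `_v1` and `_v2`. It cannot tell whether every word inside a run-together name like `SalesOrders` is capitalized; it only checks the start of each underscore-separated part.

```python
import re

# A trailing "_v" followed by digits, e.g. "_v1" or "_V2".
VERSION_SUFFIX = re.compile(r"_v\d+$", re.IGNORECASE)


def check_name(name):
    """Return a list of naming problems; an empty list means none found."""
    problems = []
    if re.search(r"\s", name):
        problems.append("contains a space")
    if VERSION_SUFFIX.search(name):
        problems.append("ends with a bare version suffix")
    for part in name.split("_"):
        # Each underscore-separated part should start with a capital letter.
        if part and not part[0].isupper():
            problems.append(f"part '{part}' is not capitalized")
    return problems


for candidate in ["SalesOrders", "ProductID", "SalesOrders_Raw",
                  "SalesOrders_v1", "sales orders"]:
    print(candidate, check_name(candidate))
```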
Best practices
Source: https://www.linkedin.com/pulse/hindsight-insight-foresight-key-ingredients-effective-woods
To succeed, we need all three: hindsight, insight, and foresight!
Source: https://www.linkedin.com/pulse/hindsight-insight-foresight-patrick-mcdonald
5. Problem Solving Using Data
WHY Problem-Solving Using DATA?
WHAT Problems to Solve Using DATA?
2. DETECTING ABNORMALITIES
A fitness tracker company wants to help customers spot something unusual
with their health, e.g., resting heart rates.
3. CATEGORIZING
A retailer wants to offer customized promotions to different groups of customers.
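For the abnormality-detection case (item 2), one simple approach is to compare a new reading with the person's own recent baseline. The sketch below flags a resting heart rate that lies more than two standard deviations from the mean of the previous days. The readings, the function name `is_unusual`, and the threshold of 2.0 are assumptions for illustration, not a method prescribed in the course or a medical rule.

```python
from statistics import mean, stdev


def is_unusual(history, reading, threshold=2.0):
    """Flag a reading that is far from the baseline of past readings.

    history:   past resting heart rates (beats per minute), at least two values
    reading:   the new value to check
    threshold: how many standard deviations count as unusual (assumed 2.0)
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All past readings are identical: any different value stands out.
        return reading != baseline
    return abs(reading - baseline) / spread > threshold


# Hypothetical readings from the last seven days.
last_week = [61, 63, 60, 62, 64, 61, 63]

print(is_unusual(last_week, 63))  # close to the usual range
print(is_unusual(last_week, 75))  # far above the usual range
```

Using the person's own history as the baseline matters because a resting heart rate that is normal for one customer may be unusual for another.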
5. DISCOVERING CONNECTIONS
A beauty company wants to cross-sell its products based on
past customer purchases
6. FINDING PATTERNS
A supermarket wants to learn more about how often its customers shop and
how their purchasing behavior varies throughout the week.
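Both cases above can start from simple counting. The sketch below counts which products are bought together (item 5) and how purchases are spread over the days of the week (item 6). The baskets, product names, and dates are made up for illustration, and counting is only a first step, not a full recommendation or forecasting method.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Item 5: discovering connections. Each set is one hypothetical past order.
baskets = [
    {"cleanser", "toner", "moisturizer"},
    {"cleanser", "toner"},
    {"lipstick", "toner"},
    {"cleanser", "moisturizer"},
    {"cleanser", "toner", "lipstick"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always stored in the same order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)

# Item 6: finding patterns. Hypothetical purchase dates for one store.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
purchase_dates = [
    date(2024, 3, 2), date(2024, 3, 3), date(2024, 3, 4),
    date(2024, 3, 9), date(2024, 3, 10), date(2024, 3, 16),
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
visits_by_day = Counter(DAY_NAMES[d.weekday()] for d in purchase_dates)
print(visits_by_day.most_common())
```

A pair that appears together often is a candidate for cross-selling, and a day with many purchases is a candidate for extra stock or staffing; both would need checking on real data.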
How to SOLVE PROBLEMS Using DATA?
STEP 5 – COMMUNICATION
Group Activities
THANK YOU