Approaches in data analysis [Slides] [Re-brand]

The document provides an overview of data and data analytics, detailing the processes involved in data analysis and data science, including problem statement, data collection, cleaning, exploratory analysis, and model building. It distinguishes between quantitative and qualitative data analysis, emphasizing their importance in understanding complex phenomena. Additionally, it outlines different types of analytics—descriptive, diagnostic, predictive, and prescriptive—based on the goals of the analysis.

Uploaded by

tino.kwabena

An introduction to data and data analytics

Approaches in data analysis


Please do not copy without permission. © ALX 2024.

The data analysis process

| The data analysis process provides a framework for investigating, cleaning, and
transforming data to extract useful information and insights.

1. _Problem statement_: State the problem or hypothesis.
2. _Data collection_: Find the right data sources.
3. _Data cleaning_: Remove, fix, and filter data.
4. _Exploratory data analysis_: Understand the data.
5. _Gather insights_: Gather and report findings.

This approach to data analysis helps us to discover meaningful patterns, relationships, and trends and
helps us make informed and robust decisions.

The data science process

| The data science process is a systematic approach to transforming a data problem into a
data-driven solution.

The first few steps are similar to what we do in data analysis.

1. _Problem statement_: State the problem or hypothesis.
2. _Data collection_: Find the right data sources.
3. _Data cleaning_: Remove, fix, and filter data.
4. _Exploratory data analysis_: Understand the data.
5. _Model building_: Select features, build, train, and validate.
6. _Model deployment_: Deploy, test, and communicate.

Various forms of this process are used across different data disciplines, including data analytics, science, and engineering, under various names, such as OSEMN and CRISP-DM.

Overview
It is important that we are able to make informed decisions and derive appropriate insights from data.

We therefore need a structured framework for working with data and extracting valuable insights from it.

How our data science or data analytics process is applied and interpreted depends on several factors, including whether we are doing a quantitative or qualitative analysis, and whether we need hindsight, insight, foresight, or context.

[Diagram: the data science process shown as a cycle of Problem statement, Data collection, Data cleaning, Exploratory data analysis, Gather insights, Model building, and Model deployment.]


Quantitative and qualitative data analysis

| Quantitative and qualitative data analysis are important because they enable us to gain a
more comprehensive understanding of complex phenomena and make data-driven
decisions.

Quantitative data analysis involves numerical measurement and statistical analysis.

It allows us to measure and analyze numerical data using statistical methods, enabling us to identify patterns, trends, and relationships between variables.

It is useful for making predictions, testing hypotheses, and identifying cause-and-effect relationships.

Qualitative data analysis involves exploring patterns and themes in non-numerical data.

It allows us to explore and interpret non-numerical data, such as text, images, or videos.

It is useful for understanding the context of a problem and people’s attitudes, behaviours, etc.

Both types of analysis are important because they provide different ways of understanding and
interpreting data.

Problem statement

| The problem statement helps us define the scope and objectives of our analysis and ensures
that our insights are relevant.

A problem statement identifies the gap between the current (problem) state and the desired (outcome)
state. It should be specific, brief, concise, clear, unbiased, and measurable.

A problem statement may also take the form of a hypothesis: a proposed cause-and-effect explanation for a
particular phenomenon or problem that has not yet been proven correct.

Examples:

Statement: We need to report on estimated water and electricity income from different customer groups.

Hypothesis: The estimated water and electricity income from domestic customers is 30% lower than from other customers.

Question: How much water and electricity income can we expect from commercial customers per month?
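As a minimal sketch of how the hypothesis above could be checked once data is available, the snippet below computes the relative difference between two group averages. The income figures and the function name `relative_difference` are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical monthly income figures (illustrative only, not real data).
domestic_income = [70.0, 65.0, 75.0]
other_income = [100.0, 95.0, 105.0]


def relative_difference(group, reference):
    """Return how much the mean of `group` differs from the mean of
    `reference`, as a fraction of the reference mean."""
    reference_mean = mean(reference)
    return (mean(group) - reference_mean) / reference_mean


difference = relative_difference(domestic_income, other_income)
print(f"Domestic income differs from other income by {difference:.0%}")
```

A comparison of averages like this only describes the sample. Deciding whether the hypothesis holds would also need to account for sample size and variability.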


Data collection

| Data collection includes identifying and acquiring applicable data sources, both internally and
externally, which can help answer the problem statement.

We can use company data or open-source data, or collect our own data depending on the nature of our
problem and the analysis we would like to do.

_Examples:_

Data acquired from surveys such as market research and customer satisfaction surveys.

Queried data from databases or APIs (Application Programming Interfaces) such as sales data and employee information.

Downloaded data from open sources and cloud repositories such as general census data.
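Querying data from a database can be sketched as follows. Python's built-in `sqlite3` module and an in-memory database stand in for a company database here; the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# An in-memory database stands in for a real company database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],  # hypothetical rows
)

# Query only the data that helps answer the problem statement.
rows = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
connection.close()

for region, total in rows:
    print(region, total)
```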


Data cleaning

| Data cleaning, also known as data wrangling, involves transforming raw data into usable
formats.

We can use several cleaning techniques to ensure that our data are indeed accurate and of the required
quality. If our data are inaccurate, so will our insights be.

_Examples:_

Using spreadsheets or a programming language to remove irrelevant observations, handle missing values, fix structural issues, etc.

Using regular expressions for pattern matching and replacing data.

Using data visualization tools such as PowerBI or spreadsheets for identifying outliers and anomalies.
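The first two techniques above can be sketched in a few lines of Python. The records, field names, and the `clean_record` helper are hypothetical; the sketch assumes that a row named "TEST ENTRY" is irrelevant and that an empty age means the value is missing.

```python
import re

# Hypothetical raw survey records (illustrative only).
raw_records = [
    {"name": " Alice ", "phone": "082-555 0101", "age": "34"},
    {"name": "Bob", "phone": "(082) 555-0102", "age": ""},
    {"name": "TEST ENTRY", "phone": "000", "age": "0"},
]


def clean_record(record):
    """Return a cleaned copy of one record."""
    return {
        # Fix structural issues: strip stray whitespace.
        "name": record["name"].strip(),
        # Pattern matching: keep only the digits of the phone number.
        "phone": re.sub(r"\D", "", record["phone"]),
        # Handle missing values: an empty age becomes None.
        "age": int(record["age"]) if record["age"] else None,
    }


# Remove irrelevant observations, then clean what is left.
cleaned = [
    clean_record(record)
    for record in raw_records
    if record["name"] != "TEST ENTRY"
]
print(cleaned)
```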


Exploratory data analysis

| Exploratory data analysis (EDA) is an approach used to summarize the main characteristics of a
dataset using aggregations, fundamental statistics, and visualization techniques.

Before we can gather insights or build a model, we first need to understand our data. We can use
non-graphical methods, such as descriptive statistics and correlation, or graphical (visualization)
methods to investigate our data.

_Examples:_

Descriptive statistics: standard deviation
Aggregations: count
Measures of central tendency: mean
Measures of distribution: kurtosis
Correlation: Pearson
Visualizations: bar, scatter, density, and violin plots
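The non-graphical measures listed above can be computed with Python's standard library, as in the sketch below. The sample values are hypothetical, and the kurtosis uses the population form of the fourth standardized moment, which is one of several conventions.

```python
from statistics import mean, pstdev

# Hypothetical sample of one numerical variable.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

count = len(values)      # aggregation
average = mean(values)   # measure of central tendency
spread = pstdev(values)  # population standard deviation

# Kurtosis as the fourth standardized moment (population form);
# a normal distribution has a value of 3 under this definition.
kurtosis = sum((v - average) ** 4 for v in values) / (count * spread ** 4)

print(count, average, spread, kurtosis)
```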


Univariate and multivariate analysis

| In EDA, we do either a univariate or a multivariate analysis, depending on what we want to investigate.

Univariate analysis is the exploration of individual variables in a dataset, i.e. we only consider one variable at a time.

Non-graphical: We can use descriptive statistics such as the standard deviation, central tendency, and measures of distribution.

Graphical: We can use visualizations such as histograms, density plots, and box plots to understand the characteristics of a variable.

In a multivariate analysis, we're more interested in the relationship between the different variables of our dataset.

Non-graphical: We use correlation to understand the strength and direction of the relationship between variables.

Graphical: We can use visualizations such as heatmaps, scatter plots, and pair plots to investigate the relationship.
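A non-graphical multivariate analysis can be sketched by computing the Pearson correlation for every pair of variables. The dataset and the `pearson` helper below are hypothetical and written in plain Python for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical dataset with three numerical variables.
dataset = {
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0],
    "errors": [5.0, 4.0, 3.0, 2.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Multivariate view: correlation for every pair of variables.
correlations = {
    (first, second): pearson(dataset[first], dataset[second])
    for first, second in combinations(dataset, 2)
}

for pair, value in correlations.items():
    print(pair, round(value, 2))
```

The sign of each coefficient gives the direction of the relationship and its magnitude gives the strength, with values near 1 or -1 indicating a strong linear relationship.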


Gather insights

| The last step in the data analysis process is to gather and report the insights derived from the
analysis.

Gathering and reporting insights may form part of the data science process as well, and is often known as
data dissemination.

Insights may be gathered in and reported to stakeholders through dashboards and reports that include text
and data visualizations.

Examples:

Using spreadsheets or a programming language to summarize data and construct insights to form a report.

Using data visualization tools such as PowerBI or spreadsheets to visualize and report the insights.


Model building and model deployment

| Model building involves selecting an appropriate algorithm and training the model on the data.

| Model deployment involves integrating the model into a larger system or application.

The model-building and deployment phases are more often applicable to data science than data analysis.
Model building often involves reiteration since a model will rarely give us the results we seek on the first
try. This means that we train and test a model until we’ve found a suitable model before deploying it into a
larger system.
Some common tools and skills required for model building include:

Machine learning libraries such as Scikit-learn and TensorFlow for building models in Python.

Deep learning libraries such as Keras and PyTorch for building neural networks in Python.

The model-building cycle:

A. Select features
B. Build model (regression, classification, or other ML model)
C. Train model
D. Validate the results
E. Deploy the model
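One round of the build, train, and validate steps can be sketched as below. To keep each step visible, the sketch uses plain Python and a simple least-squares line rather than a library such as Scikit-learn; the data and the helper names `fit_line` and `mean_absolute_error` are hypothetical.

```python
# Hypothetical data: y is roughly 2 * x + 1.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_values = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

# Hold back the last two observations for validation.
x_train, x_test = x_values[:6], x_values[6:]
y_train, y_test = y_values[:6], y_values[6:]


def fit_line(x, y):
    """Train a simple linear regression with ordinary least squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, x, y):
    """Validate a fitted model on data it has not seen."""
    slope, intercept = model
    return sum(abs(slope * a + intercept - b) for a, b in zip(x, y)) / len(x)


model = fit_line(x_train, y_train)
test_error = mean_absolute_error(model, x_test, y_test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, error={test_error:.2f}")
```

If the validation error is too high, we would go back to feature selection or try a different model and repeat the round before deploying anything.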


Type of analytics

| The type of analytics we apply depends on our goal and prescribes our approach to the
data analytics or data science process.

_Descriptive_ (hindsight): Used to describe what has happened in the past. It's a summary of historical data that provides insights into patterns, trends, and relationships within the data. Examples: Dashboards and reports.

_Diagnostic_ (insight): Used to determine why something has happened in the past. Helps organizations understand the factors that contributed to a particular outcome. Examples: Data mining and drill-down analysis.

_Predictive_ (foresight): Used to forecast what will happen in the future. Uses statistical models and machine learning algorithms to identify patterns and trends in historical data to predict future outcomes. Examples: Forecasting and risk modelling.

_Prescriptive_ (context): Used to recommend the best course of action for a given situation. Uses advanced algorithms and optimization techniques to suggest the optimal solution based on a variety of factors and constraints. Examples: Optimization.
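The difference between the first two types can be illustrated with a small sketch: a descriptive summary shows what happened, and a diagnostic drill-down shows where the change came from. The sales records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (month, region, amount).
sales = [
    ("Jan", "North", 100.0), ("Jan", "South", 100.0),
    ("Feb", "North", 100.0), ("Feb", "South", 40.0),
]

# Descriptive: what happened? Total sales per month.
monthly_totals = defaultdict(float)
for month, _region, amount in sales:
    monthly_totals[month] += amount

# Diagnostic: why did it happen? Drill down into the change by region
# (February amount minus January amount).
change_by_region = defaultdict(float)
for month, region, amount in sales:
    change_by_region[region] += amount if month == "Feb" else -amount

print(dict(monthly_totals))
print(dict(change_by_region))
```

Here the descriptive view shows that total sales fell from January to February, while the drill-down attributes the whole drop to one region.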