2 Data Analytics

The document provides an introduction to data analytics, covering topics such as the overview of data analytics, types of data analytics including descriptive, diagnostic, predictive, and prescriptive analytics, the data analytics lifecycle, building data analytics models, and machine/deep learning and data visualization. The document contains detailed information on these data analytics concepts across multiple sections and pages.

Uploaded by

melkter3
Copyright
© All Rights Reserved

Introduction to Data Science

Solomon Teferra Abate @ SIS, AAU
Topics of Data Analytics
2.1. Overview of Data Analytics
2.2. Types of Data Analytics
2.3. Data Analytics lifecycle
2.4. Building Data Analytics Models
2.5. Introduction to Machine/Deep Learning
2.6. Data Visualization and Storytelling
Overview of Data Analytics
• Data analytics is the process of examining large data sets to
reveal hidden patterns, previously unknown relationships, and
other useful information that can support better business
decisions.
• Data analytics is the systematic approach of collecting,
processing, and analyzing data sets, regardless of size and
volume, using statistical and other business analysis
methodologies to provide better insights for strategic,
tactical, and operational decision making.
• Hence, data analytics can also be described as the systematic
collection, processing, and analysis of massive data sets for
data-driven decision making.
Overview of Data Analytics

The terms data analysis and data analytics are often used
interchangeably, which can be confusing

Data analysis is a detailed examination of the elements
or structure of something

Data analytics is the systematic computational analysis
of data or statistics
Data Analytics | Data Analysis
Broader | Specific to a problem
Defines the science behind | Applies analytics methods
Looks to the future (predict) | Looks backwards
Identifies inexplicable or novel relationships/trends | Used based on theoretical foundation
Seeks to visualize the data to allow the observation of relationships/trends | Seeks to identify a significance level to address hypotheses or research questions
Types of Data Analytics
• Descriptive Analytics: This is a method for quantitatively
describing the main features of a collection of data

Variables: Categorical, Ratio, Independent,
Dependent

Frequency Distribution: Histogram, Normal
distribution

Measures of Centrality: Mean, Median, Mode

Dispersion of a Distribution: Range, Interquartile
range, Variance, Standard deviation
• Diagnostic Analytics/Causal analysis: Used for
discovery, or to determine why something happened

Correlation: Pearson’s r correlation
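
The descriptive measures and the correlation named on this slide can be computed with Python's standard library. The following is a minimal sketch, not a reference implementation; the `hours` and `scores` values are hypothetical data invented for illustration, and `describe` and `pearson_r` are helper names chosen here.

```python
import math
import statistics


def describe(values):
    """Summarize a list of numbers with common descriptive measures."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson's r: co-variation divided by the product of the spreads."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: hours studied and exam scores for six students.
hours = [2, 4, 4, 5, 7, 9]
scores = [50, 60, 62, 68, 80, 90]

summary = describe(hours)      # descriptive analytics
r = pearson_r(hours, scores)   # diagnostic analytics: how strongly are they related?
```

A value of `r` close to 1 suggests a strong positive linear relationship, but correlation alone does not establish why something happened.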
Types of Data Analytics
• Predictive Analytics: These analytics are about
understanding the future using the data and the trends
we have seen in the past, as well as emerging new
contexts and processes. They are done in stages:

• Prescriptive Analytics: Analyzes potential decisions, the
interactions between decisions, the influences that bear upon
these decisions, and the bearing all of this has on an outcome
to ultimately prescribe an optimal course of action in real
time
Types of Data Analytics
• Exploratory analysis: Is an approach to analyzing datasets
to find previously unknown relationships. Often such
analysis involves using various data visualization
approaches:

• Mechanistic analysis: Involves understanding the exact
changes in variables that lead to changes in other
variables for individual objects

Regression analysis is a process for estimating the
relationships among variables. It shows how one
variable can be predicted from another (Correlation
shows only their relation)
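
As a rough illustration of how regression predicts one variable from another, the sketch below fits a straight line by ordinary least squares using only built-in Python. The data points and the helper names `fit_line` and `predict` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
forecast = predict(5, slope, intercept)
```

Unlike a correlation coefficient, the fitted line gives a concrete prediction for an unseen input (here `x = 5`).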
Data Analytics Life Cycle
• Phase 1: Data Discovery and Formation: In the initial step,
the data is evaluated for its potential uses and requirements,
such as where it comes from, what message you want it to
convey, and how this incoming information benefits your
business.
• The data science team investigates and learns about the
challenge.

Create context and understanding

Learn about the data sources that will be required and
available for the project

The team develops preliminary hypotheses that can later
be tested with data
Data Analytics Life Cycle
• Phase 2: Data Preparation and Processing: This phase
involves gathering, sorting, processing, and cleaning the
collected data so that it can be used in the subsequent
steps of analysis
• The following are methods of data acquisition:

Data Collection: Draw information from external sources.

Data Entry: Within an organization, data entry refers to
creating new points of information using either digital
technologies or manual input procedures.

Signal Reception: Accumulating data from digital devices
like the Internet of Things devices and control systems.
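
The preparation step described in Phase 2 (cleaning, removing unusable records, sorting) might look like the sketch below. The record layout with `name` and `value` fields is hypothetical, as is the `prepare` helper; real projects would adapt the rules to their own data.

```python
def prepare(records):
    """Clean raw records: trim text, drop incomplete rows, remove duplicates, sort."""
    cleaned = []
    seen = set()
    for rec in records:
        name = (rec.get("name") or "").strip()
        value = rec.get("value")
        if not name or value is None:
            continue  # drop rows with a missing name or value
        value = float(value)  # normalize numbers that arrived as text
        key = (name.lower(), value)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "value": value})
    return sorted(cleaned, key=lambda r: r["name"])


# Hypothetical raw input gathered from several sources.
raw = [
    {"name": " beta ", "value": "2"},
    {"name": "alpha", "value": 1},
    {"name": "", "value": 5},
    {"name": "alpha", "value": 1},
    {"name": "gamma", "value": None},
]

ready = prepare(raw)
```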
Data Analytics Life Cycle
• Phase 3: Design a Model: The phase of creating a model
that uses the data to achieve the defined goal. Model
planning is the name given to this stage of the data
analytics process.
• There are numerous methods for loading data into the
system and starting to analyze it:

ETL (Extract, Transform, and Load) converts the data
before loading it into a system using a set of business
rules.

ELT (Extract, Load, and Transform) loads raw data into the
sandbox before transforming it.

ETLT (Extract, Transform, Load, Transform) is a
combination of two layers of transformation.
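
The difference between ETL and ELT is mainly the order of the steps. The sketch below illustrates that ordering with plain Python functions; the `extract` and `transform` rules and the dictionary used as a stand-in for the sandbox are assumptions for illustration only.

```python
def extract():
    """Pretend to pull raw rows from a source system (hypothetical data)."""
    return [
        {"amount": "10.5", "currency": "usd"},
        {"amount": "3", "currency": "USD"},
    ]


def transform(rows):
    """Apply simple business rules: numeric amounts, upper-case currency codes."""
    return [
        {"amount": float(r["amount"]), "currency": r["currency"].upper()}
        for r in rows
    ]


def etl(store):
    """ETL: transform first, so only converted data is loaded."""
    store["clean"] = transform(extract())


def elt(store):
    """ELT: load the raw data first, then transform it inside the store."""
    store["raw"] = extract()
    store["clean"] = transform(store["raw"])


etl_store, elt_store = {}, {}
etl(etl_store)
elt(elt_store)
```

Both stores end up with the same cleaned rows, but only the ELT store keeps the raw data available for later re-processing.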
Data Analytics Life Cycle
• Phase 4: Model Building: The use of tools and methods,
such as decision trees, regression techniques (for example,
logistic regression), and neural networks to create and run
the model.
• Tasks that are done in this stage include:

Creation of datasets for use in testing, training, and
production.

Examining if the present tools will serve for running the
models or if a more robust environment is required for
model execution.

Python, R, Octave, and WEKA are examples of free or
open-source tools.
Data Analytics Life Cycle
• Phase 5: Result Communication and Publication: The
communication process begins with cooperation with key
stakeholders to decide whether the project’s outcomes
are successful or not.
• The project team is responsible for identifying the major
conclusions of the analysis, calculating the business value
associated with the outcome, and creating a narrative to
summarize and communicate the results to stakeholders.
Data Analytics Life Cycle
• Phase 6: Measuring Effectiveness: As the data analytics
life cycle comes to an end, the final stage is to give
stakeholders a complete report that includes key results,
code, briefings, and technical papers or documents.
• Furthermore, to assess the effectiveness of the study, the
data is moved from the sandbox to a live environment and
monitored to see whether the results match the desired
business aim.
• If the findings meet the objectives, the reports and outcomes
are finalized. However, if the outcome differs from the
objective stated in Phase 1, you can go back to any of the
previous phases of the life cycle to adjust your input and
get a different result.
Building Data Analytics Models
• The models are built to extract insights and knowledge
from the data to support business decisions and strategies
• In this phase of the project, the data science team needs to
develop data sets for training, testing, and production purposes
• Model building in data analytics aims not only at high accuracy
on the training data but also at the ability to generalize and
perform well on new, unseen data.
• Dividing the dataset for model building
• Scaling the dataset: makes our model more robust to outliers
• Modeling the data:
  • Is the problem a regression or a classification problem?
  • Should the model be more explainable or of higher accuracy?
• Plotting the decision graph
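
The first two steps above (dividing and scaling the dataset) can be sketched with the standard library as follows. This is one possible approach, assuming a random 75/25 split and standardization (subtract the mean, divide by the standard deviation); the slides do not say which split ratio or scaling method is intended, and the data is hypothetical.

```python
import random
import statistics


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn scaling parameters from the training data only."""
    return statistics.mean(values), statistics.stdev(values)


def scale(values, mean, std):
    """Standardize values using previously learned parameters."""
    return [(v - mean) / std for v in values]


# Hypothetical single-feature dataset.
data = [float(v) for v in range(1, 21)]

train, test = train_test_split(data)
mean, std = fit_scaler(train)
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)  # reuse training parameters on unseen data
```

Fitting the scaler on the training portion only keeps information from the test set out of the model, which matters when the goal is to measure performance on unseen data.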
ML/DL for Data Analytics
What is machine learning?

A branch of artificial intelligence,
concerned with the design and
development of algorithms that allow
computers to evolve behaviors based on
empirical data.

As intelligence requires knowledge, it is
necessary for the computers to acquire
knowledge.
ML/DL for Data Analytics
• Machine Learning: explores the use of algorithms that
can learn from the data and use that knowledge to
make predictions on data they have not seen before
• Such algorithms are designed to overcome strictly
static program instructions by making data-driven
predictions or decisions through building a model from
sample inputs.

• A machine learns from past experience in order to improve
the performance of intelligent programs
• A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P if its
performance at tasks in T, as measured by P,
improves with experience E
Traditional Programming vs Machine Learning

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program/Rules
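
The contrast between the two approaches can be illustrated with a toy example: in traditional programming the rule is written by hand, while in machine learning the rule is derived from data paired with the desired output. The spam scenario, the sample values, and both function names below are hypothetical.

```python
def rule_based_is_spam(num_links):
    """Traditional programming: a person chose the threshold of 3."""
    return num_links > 3


def learn_threshold(samples):
    """Machine learning: pick the threshold t that makes 'value > t' fit the labels best."""
    candidates = sorted({value for value, _ in samples})
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((value > t) != label for value, label in samples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labeled examples: (number of links in a message, is it spam?)
samples = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]

threshold = learn_threshold(samples)


def learned_is_spam(num_links):
    """The learned rule: same form as the hand-written one, threshold from data."""
    return num_links > threshold
```

Here the computer receives data and expected outputs and returns the rule, which is the reverse of the traditional flow.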
ML Related Fields

Fields related to machine learning: data mining, control theory,
statistics, decision theory, information theory, cognitive
science, databases, psychological models, evolutionary models,
neuroscience

Assignment 1: Discuss their relation to ML

Machine learning is primarily concerned with the accuracy
and effectiveness of the computer system in performing
complex tasks.
Machine Learning Domain
Basic Algorithms
Supervised vs Unsupervised Learning Algorithms

• Supervised learning: training data with labels
• Unsupervised learning: training data not labeled
• Semi-supervised learning
Machine learning structure

Supervised learning
Machine learning structure

Unsupervised learning
Learning Techniques

Supervised learning categories and techniques
• Linear classifier (numerical functions)
• Parametric (probabilistic functions): Naïve Bayes, Gaussian
discriminant analysis (GDA), Hidden Markov models (HMM),
probabilistic graphical models
• Non-parametric (instance-based functions): K-nearest neighbors,
kernel regression, kernel density estimation, local regression
• Non-metric (symbolic functions): Classification and regression
tree (CART), decision tree
• Aggregation (ensembling): Bagging (bootstrap + aggregation),
AdaBoost, random forest
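
To make one of the listed techniques concrete, the sketch below implements a basic K-nearest neighbors classifier, an instance-based supervised method. It is a simplified illustration using Euclidean distance and majority voting; the labeled points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k closest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (features, label)
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

label_near_a = knn_predict(train, (2, 2))
label_near_b = knn_predict(train, (6.5, 6.5))
```

No model parameters are fitted in advance; the training data itself is consulted at prediction time, which is why the method is called instance-based.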
Learning Techniques

Unsupervised learning categories and techniques
• Clustering: K-means clustering, spectral clustering
• Density estimation: Gaussian mixture model (GMM), graphical
models
• Dimensionality reduction: Principal component analysis (PCA),
factor analysis
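
As an example of an unsupervised technique, the following sketch runs a basic K-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The starting centroids are fixed here so the result is repeatable; the points and the iteration count are assumptions for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

No labels are supplied; the grouping emerges from the distances between the points alone.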
Deep Learning
• Lots of hidden layers
• Depth = power (usually)
• Weights to learn!
Deep Learning - Loss gradients
• The loss gradient describes how the loss changes as a
function of the weights (it is written in several
different notations)
• We want to change the weights in such a way that
makes the loss decrease as fast as possible
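
The idea of following the loss gradient can be shown on the smallest possible model: a single weight `w` in `y = w * x`, trained with mean squared error. This is a minimal sketch of plain gradient descent, not a deep network; the data, learning rate, and step count are assumptions chosen so the example converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x by repeatedly stepping against the gradient of the loss."""
    w = 0.0
    for _ in range(steps):
        # Loss: mean((w*x - y)^2). Its derivative with respect to w:
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Move the weight in the direction that decreases the loss.
        w -= learning_rate * grad
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated by y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

w = gradient_descent(xs, ys)
loss_before = mse(0.0, xs, ys)
loss_after = mse(w, xs, ys)
```

Deep networks apply the same update to every weight in every layer, with the gradients obtained by backpropagation.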
Evaluation Metrics
A Few Applications

Prediction (weather, agricultural yield, etc.)

Face detection; character recognition

Surveillance and security systems

Object detection and recognition

Natural language processing

Speech/image recognition

Multimedia event detection

Economic and commercial uses

... and many more
Data Analytics for DV and ST
Data Visualization and Story Telling

Data Visualization is the act of placing data into a
visual context, such as a graph.

It makes data easier for the human brain to
understand and detect patterns, trends and outliers
in a group of data.

The goal of data storytelling is to present complex data
in a way that is easy to understand and engaging for
the audience.

By using data visualization, charts, and other tools,
data storytellers can make their data more
accessible and understandable to a wider audience.
Data Visualization vs Storytelling

Data visualization and data storytelling are related
concepts, but they are not interchangeable.

Data visualization refers to the graphical representation of
data, such as charts, graphs, and maps, whereas data
storytelling refers to the process of using data to tell a
compelling story.

Data visualization is an important part of data storytelling, as it
helps to make data more accessible and understandable for
the audience. Data visualization can help highlight trends,
patterns, and insights that might have been missed with
traditional data analysis methods. Data storytelling, on the
other hand, goes beyond data visualization to create a
narrative that connects the data to the audience. Data
storytelling involves using data to tell a story, such as
identifying a problem, presenting evidence to support a
solution, and providing a call to action.
Data Visualization vs Storytelling

Data visualization
• focuses on presenting data in a visual format, such as
charts, graphs, or maps.
• often used to highlight patterns, trends, or
relationships in data.
• aims to help the audience understand the data quickly
and easily.
• requires technical skills to create effective visualizations.
• It may not always tell a story or provide context for the data
being presented.

Data Storytelling
• focuses on presenting data in a way that tells a story and
connects with the audience.
• may use data visualization, but it goes beyond charts and
graphs to include narratives and other storytelling
techniques.
• aims to help the audience understand the data in the
context of a larger story.
• requires both technical and storytelling skills to be effective.
• can inspire action and change by providing insights and
solutions based on the data.
Data Visualization - Types

Bar charts: Bar charts (vertical or horizontal) are used to
compare categorical data.

Line charts: Line charts are used to show trends over time.

Scatter plots: Scatter plots are used to show the relationship
between two variables. They consist of points on a graph that
represent the values of the two variables being compared.

Pie charts: Pie charts are used to show the proportion of
different categories in a data set.

Heat maps: Heat maps are used to show the distribution of
values in a data set. They consist of a grid where each cell is
colored based on the value it represents.

Bubble charts: Bubble charts are used to show the
relationship between three variables. They consist of circles
that vary in size and color to represent the values of three
different variables.

Maps: Maps are used to show geographic data.
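
As a dependency-free illustration of the first chart type, the sketch below renders a horizontal bar chart as text, comparing counts across categories. The function name, the `#` symbol, and the survey data are hypothetical; in practice a plotting library or one of the visualization tools would be used instead.

```python
def render_bar_chart(counts, symbol="#"):
    """Return one text line per category, with a bar proportional to its value."""
    width = max(len(label) for label in counts)
    return [
        f"{label:<{width}} | {symbol * value} {value}"
        for label, value in counts.items()
    ]


# Hypothetical categorical data: preferred chart type in a small survey.
survey = {"Bar": 5, "Line": 3, "Pie": 2}

for line in render_bar_chart(survey):
    print(line)
```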
Data Storytelling- Elements

Audience: Understanding your audience is key to creating an
effective data story.

You should consider their background knowledge, interests,
and motivations when selecting the data and crafting the
narrative.

Narrative: A narrative structure can help to make data more
compelling and understandable.

The story should have a clear beginning, middle, and end,
and should be focused on a central problem or question.

Data: The data should be accurate, relevant, and reliable.

It should support the story and help to provide evidence for
the main argument.

Visuals: Visuals such as charts, graphs, and maps can help to
make the data more accessible and understandable.

The visuals should be chosen based on the story's needs,
and should be clear and easy to interpret.
Data Storytelling- Elements

Emotion: Emotion can help to engage the audience and make
the story more memorable.

Including personal anecdotes, case studies, or testimonials
can help to bring the data to life and connect with the
audience on an emotional level.

Call to Action: A call to action can help to inspire the audience to
take action based on the insights provided by the data.

The call to action should be specific, realistic, and
achievable, and should be aligned with the story's main
message.
Data Visualization and Storytelling tools

The Best Data Visualization tools of 2024

Microsoft Power BI: Best for business intelligence (BI)

Tableau: Best for interactive charts

Qlik Sense: Best for artificial intelligence (AI)

Klipfolio: Best for custom dashboards

Looker: Best for visualization options

Zoho Analytics: Best for Zoho users

Domo: Best for custom apps

Other data visualization and storytelling tools:

Infogram: Allows us to create a variety of charts, graphs, and
infographics.

Canva: A design tool that allows us to create a range of visual
content, including data visualizations, charts, and infographics.

Google Data Studio: Allows us to create interactive dashboards and
reports. It offers a range of visualization options and allows us to
connect to a variety of data sources.

Piktochart: Allows us to create a variety of visual content, including
infographics and presentations.
