
Data Analysis From Theoretical To Implementation Using Excel, Python, Flourish

This document discusses data analysis from a theoretical and implementation perspective using tools like Excel, Python, and Flourish. It provides an overview of a course on data analysis that covers topics ranging from data preprocessing to machine learning. The instructor, Dr. Ghoniem Lawaty, has certifications in data analysis, data science, machine learning and agile methodologies. The course aims to explain data analysis concepts and processes, demonstrate examples, and apply techniques using software tools through a structured approach for each topic.

Uploaded by

mohamed
Copyright © All Rights Reserved

Data Analysis

From theoretical to implementation


Using Excel, Python, Flourish

Dr. Ghoniem Lawaty
[email protected]
MIS, DA, ML, Digitization and Micro-Services, TOGAF, DEVOPS
Certified ATM for CMMI SCAMPI (A) method, DA, DS, ML, ICAgile
https://www.linkedin.com/in/ghoniem-abdel-azim-mostafa-33860691
Lecture Content
01-Introduction to DA
• Methodology
• Data analysis 5 W’s
• Why data analysis?
• Required sciences for DA?
• Difference between DA, DM, DS, ML, and data engineering
• Data analysis tools and skills
• DA lifecycle, brief introduction
02-Data analysis Preprocessing
• Data gathering phase
• Data cleaning phase
• Data integration
• Data reduction
• Data transformation and normalization
03-Data analysis types
• Central tendency measures
• Dispersion measures
• Anomalies detection (Rationale to be here)
• Correlation measures
• Distribution Measures
• Predictive Analysis
• Time-Series
04-Data visualization
• Frequency Tables
• Concurrency tables
• Histograms
• Pie chart
• Trends
• Distribution
• Bar-chart Race Model
05-Data interpretation
• Interpretation preparation
• Meters bases concept
• RGY concept
• Storyfication (Tell the story)
07-DA practices at strategic level
08-Hypothesis testing
• What is Inference statistics
• Normal distribution
• Standard normal distribution
• Estimation periods
• Z-Table
• Sample size
• Hypothesis formulation
• Hypothesis proofing
09-Machine Learning (Moving forward to data science)
• Supervised Learning
• Un-Supervised Learning
• Sentiment Analysis
• Association Rules
Course Approach
• For each topic, we will study the following:
  • What is the topic? (What?)
  • Why do we need it? (Why?)
  • When should we do it? (When?)
  • What is the process to do it? (Process?)
  • What are the advantages and disadvantages? (Advantage / Disadvantage)
  • A sample on the topic (Sample)
  • Comparisons, if applicable
  • A case study using Excel/Python/Flourish
Analyzing the past to improve the future

Lecture #1/8
Introduction to DA

Session Topics
• What is DA?
• Data analysis 5 W’s
• Difference between DA/DS/DE
• Datasets definition
• Data analysis tools
• Data analysis and data warehousing
• Data analysis myths
• Data analysis lifecycle model (practical guide)
Data analysis 5 W’s
• What is data analysis?
  • A non-trivial process of inspecting, cleansing, transforming, and modeling data with the goal of:
    • Discovering useful information and hidden patterns, such as the usage % of resources
    • Supporting decision-making, such as deciding to sell these resources
    • Forecasting the future, such as forecasting future usage before buying new resources
• What is (not) data analysis?
  • A trivial search process in a database, such as finding the transactions of a specific employee
  • A complicated search process is also not considered data analysis
  • Both are data-discovery tools that support data analysis
• Why data analysis?
  • To reduce the cost of inspecting enormous amounts of data
  • To cope with the growing dimensionality of data
  • To discover the meanings carried by the data
  • To discover data patterns
  • To support decision-making
• When?
  • It is a continuous process
  • It is valuable when enough information is available for analysis
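To make the search-versus-analysis distinction concrete, here is a minimal Python sketch on a small, hypothetical set of resource-booking records (the names, columns, and the 40-hour capacity are invented for illustration). The list filter is the kind of trivial search the slide excludes from data analysis; the aggregation that follows turns the same records into a usage percentage per resource, the pattern the slide gives as an example.

```python
from collections import defaultdict

# Hypothetical records: (employee, resource, hours_used, hours_available)
records = [
    ("Ali",  "Meeting room", 6, 40),
    ("Mona", "Meeting room", 4, 40),
    ("Ali",  "Projector",    2, 40),
    ("Sara", "Projector",    1, 40),
]

# 1) Trivial search: find the transactions of a specific employee.
#    Useful for data discovery, but not data analysis by itself.
ali_rows = [r for r in records if r[0] == "Ali"]

# 2) Analysis step: aggregate usage per resource and express it as a %
#    of capacity (assumes all records of a resource share one capacity).
used = defaultdict(int)
capacity = {}
for _, resource, hours, available in records:
    used[resource] += hours
    capacity[resource] = available

usage_pct = {res: 100 * used[res] / capacity[res] for res in used}
print(usage_pct)  # {'Meeting room': 25.0, 'Projector': 7.5}
```

A resource at 7.5% usage is the kind of finding that could feed the slide's example decision of selling an under-used resource.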
In order to analyze, you should measure from different perspectives: snapshots and trends.
Data Analysis Questions (Analysis lab)
• What am I tending toward?
• What is the variance?
• What is my position?
• Do we have a relation?
• What is my evolution over time?
• Can we forecast?
Data analysis 5 W’s
• How?
  • Using domain knowledge (Finance, SCM, …)
  • Different sciences (Math, Statistics, …)
  • Development tools (C#, Python, Java, …)
  • BI tools (Excel, Power BI, SSIS, SSRS, R, Tableau, …)
• Who?
  • The data analyst is the key role
  • Qualified development team members
  • Qualified managers
  • Qualified PMO
  • Qualified strategy management office
“Exploratory analysis can never be the whole story, but nothing else can serve as the foundation stone.” John Tukey

DA: moving from concepts (perceptions) to quantitative measures and indicators
DE, DA, DM, ML, DS
• The key input to DA as a science is data.
• In DA and data science, data goes by a different name: the dataset.
• A dataset is a composition of multiple data tables; it is structured only in data analysis, and it also includes unstructured data in data science.
• Each data table has a set of columns; each column is called an attribute, or feature.
• As a developer, you can be the data engineer (DE), who takes part in building models that achieve business goals.
• These models can contain physical and logical attributes.
• As a data analyst (DA), you are the consumer of the data model, so you can create only logical attributes to support your mission.
• DM, ML, DS: data mining is the use of non-trivial algorithms to find data patterns that support descriptive and predictive needs; in ML and DS, we have a larger scale of tools, algorithms, and problems that can be resolved using these sciences.
[Figure: layers, as listed in the original diagram: Data science; Machine learning; KDD (Knowledge Discovery in Databases); Data mining; Data analysis; Feature selection; Data (Feature) engineering]
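The distinction between physical and logical attributes can be sketched in Python as follows. The rows and the `with_logical_attributes` helper are hypothetical; the point is that the analyst derives new (logical) attributes such as `order_value` from the delivered (physical) ones without modifying the source data model.

```python
# Hypothetical "physical" rows as delivered by the data model (read-only for the analyst)
orders = [
    {"order_id": 1, "quantity": 3, "unit_price": 20.0},
    {"order_id": 2, "quantity": 1, "unit_price": 150.0},
    {"order_id": 3, "quantity": 10, "unit_price": 5.0},
]

def with_logical_attributes(rows, large_threshold=100.0):
    """Return new rows with derived (logical) attributes; source rows stay unchanged."""
    result = []
    for row in rows:
        value = row["quantity"] * row["unit_price"]
        # Copy the physical attributes and add the logical ones alongside them
        result.append({**row, "order_value": value, "is_large_order": value >= large_threshold})
    return result

enriched = with_logical_attributes(orders)
for row in enriched:
    print(row["order_id"], row["order_value"], row["is_large_order"])
```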
Data engineering, and DA
• Feature data types can be:
  • Nominal: [Cairo, Alex], [Yes, No], [Male, Female]
  • Ordinal: [Fair, Good, Excellent], [Poor, Med, Rich]
  • Discrete: counts of items
  • Continuous (numeric): temperature, salaries, rates
• Each attribute can be:
  • Discrete: has a limited (countable) set of values
  • Continuous: has an unlimited set of values within a range
• Why is it important to understand an attribute's nature?
  • To perform discretization
• What is discretization?
  • Discretization is the process of converting continuous variables into discrete ones
  • Purpose: grouping the data to get clear analytics about it, such as class frequencies
[Figure: Data Types. Qualitative (categorical): nominal, ordinal. Quantitative: discrete, continuous.]
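Discretization with class frequencies can be sketched with pandas' `pd.cut` and `value_counts`, assuming pandas is installed. The salary values and bin edges below are hypothetical.

```python
import pandas as pd

# Hypothetical continuous attribute: monthly salaries
salaries = pd.Series([3200, 4500, 5100, 6100, 7400, 8800, 9900, 12500])

# Discretize into labeled classes (bin edges are illustrative assumptions).
# pd.cut uses right-closed intervals by default: (3000, 6000], (6000, 9000], (9000, 15000]
classes = pd.cut(salaries, bins=[3000, 6000, 9000, 15000], labels=["Low", "Medium", "High"])

# Class frequencies: how many salaries fall into each class, in class order
freq = classes.value_counts().sort_index()
print(freq)
```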
Data engineering, and DA (values scale)
• Each attribute can be:
  • Discrete: has limited values
  • Continuous: has unlimited values
• The data scale can be:
  • Nominal
  • Ordinal (1, 2, 3)
  • Interval (from/to)
  • Ratio: numeric values
• Why is it important to understand an attribute's nature?
  • To select a suitable measure
  • To perform data discretization
• What is discretization?
  • Discretization is the process of converting continuous variables into discrete ones
  • Purpose: grouping the data to get clear analytics about it, such as class frequencies
Scales operations
How can we build an intelligent model according to the data type?
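One common way to connect scales to suitable measures is Stevens' levels of measurement: a nominal scale supports only equality checks, so the mode is the meaningful center; an ordinal scale adds ordering, so the median becomes meaningful; interval and ratio scales add meaningful differences, so the mean applies (a ratio scale also has a true zero, which makes statements like "twice as much" valid). The sketch below encodes that rule of thumb; the mapping and the `central_tendency` helper are illustrative assumptions, not the course's own definitions.

```python
import statistics

# Rule of thumb: the central tendency measure that is meaningful at each scale
SUITABLE_MEASURE = {
    "nominal": "mode",     # only equality is meaningful
    "ordinal": "median",   # order is meaningful, distances are not
    "interval": "mean",    # differences are meaningful
    "ratio": "mean",       # differences and ratios are meaningful (true zero)
}

def central_tendency(values, scale, order=None):
    """Return the central value of `values` using the measure suited to `scale`.

    For ordinal data, `order` lists the categories from lowest to highest.
    """
    measure = SUITABLE_MEASURE[scale]
    if measure == "mode":
        return statistics.mode(values)
    if measure == "median":
        # Map labels to ranks; the lower median keeps the result a real category
        ranks = [order.index(v) for v in values]
        return order[statistics.median_low(ranks)]
    return statistics.mean(values)

print(central_tendency(["Cairo", "Alex", "Cairo"], "nominal"))  # Cairo
print(central_tendency(["Fair", "Excellent", "Good", "Good"], "ordinal",
                       order=["Fair", "Good", "Excellent"]))    # Good
print(central_tendency([3000, 4500, 6000], "ratio"))            # 4500
```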
Required Sciences for DA
• Descriptive statistics:
  • Summarizes or describes the characteristics of a data set.
  • Descriptive statistics consists of two basic categories of measures: measures of central tendency and measures of variability (spread).
  • Measures of central tendency describe the center of a data set.
  • Measures of variability (spread) describe the dispersion of data within the set.
• Data engineering is the CORE: no statistics will be valid if you miss features/columns/attributes from the data model.
• Math
• Data engineering
• Machine learning
[Figure: sciences around data analysis: domain knowledge, DBMS, descriptive statistics, analytical statistics, inferential statistics, math, data engineering, machine learning]
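The two categories of descriptive measures can be computed with Python's standard `statistics` module. The weekly sales figures are hypothetical; note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical daily sales for one week
sales = [110, 135, 150, 135, 160, 100, 155]

# Measures of central tendency: where is the center of the data?
central = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
}

# Measures of variability (spread): how widely is the data dispersed?
spread = {
    "range": max(sales) - min(sales),
    "variance": statistics.variance(sales),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(sales),      # sample standard deviation
}

print(central)  # {'mean': 135, 'median': 135, 'mode': 135}
print(spread)
```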
Data analysis Tools
ETL and ELT
• ETL is the abbreviation of Extract, Transform, and Load.
• It is used to merge different data sources into the target DWH that will be used in the DA process.
  • Extract: retrieving raw data from an unstructured data pool and migrating it into a temporary staging data repository; data sources include Access, xls, DB, JSON, and txt files.
  • Transform: structuring, enriching, and converting the raw data to match the target format.
  • Load: loading the structured data into a data warehouse to be analyzed and used by business intelligence (BI) tools.
• It is considered a tool of DA.
• Choosing between ETL and ELT depends on whether you load the data into your warehouse before or after the transformation.
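The three ETL steps can be sketched with only Python's standard library. This is a toy illustration under stated assumptions: an in-memory CSV export and JSON feed stand in for real sources, a Python list stands in for the staging repository, and an in-memory SQLite database (with a hypothetical `fact_sales` table) stands in for the data warehouse. The key point is that cleaning happens before loading.

```python
import csv
import io
import json
import sqlite3

# --- Extract: raw data from two hypothetical sources (a CSV export and a JSON feed) ---
csv_source = io.StringIO("order_id,country,amount\n1,egypt,100.5\n2,EGYPT ,200\n")
json_source = '[{"order_id": 3, "country": "Saudi Arabia", "amount": "75"}]'

# The combined list plays the role of the temporary staging area
staging = list(csv.DictReader(csv_source)) + json.loads(json_source)

# --- Transform: structure and clean each row to match the target schema ---
def transform(row):
    return (
        int(row["order_id"]),
        row["country"].strip().title(),  # unify country spelling
        float(row["amount"]),            # unify numeric type
    )

clean_rows = [transform(r) for r in staging]

# --- Load: write the transformed rows into the target (warehouse) table ---
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_sales (order_id INTEGER, country TEXT, amount REAL)")
dwh.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", clean_rows)

for country, total in dwh.execute(
    "SELECT country, SUM(amount) FROM fact_sales GROUP BY country ORDER BY country"
):
    print(country, total)
```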
Difference between ETL and ELT
• ELT is the modern trend for handling big data.
• ETL:
  • Extract data from the data sources
  • Transform it into your target data format
  • Load the data into your database, enabling visualization
• ELT:
  • Extract is the same
  • Load your data into the data warehouse first
  • Then transform it there, regenerating it in the new data format
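For contrast, here is the ELT order on similar hypothetical data, again using an in-memory SQLite database as a stand-in warehouse: the raw rows are loaded untouched into `raw_sales`, and the transformation runs afterwards, inside the warehouse, as SQL that regenerates a clean `sales` table (both table names are hypothetical).

```python
import sqlite3

# Hypothetical raw rows, loaded as-is (no cleaning yet)
raw_rows = [(1, "egypt", "100.5"), (2, "EGYPT ", "200"), (3, "Saudi Arabia", "75")]

dwh = sqlite3.connect(":memory:")

# --- Extract + Load: land the raw data in the warehouse first ---
dwh.execute("CREATE TABLE raw_sales (order_id INTEGER, country TEXT, amount TEXT)")
dwh.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# --- Transform: regenerate a clean table inside the warehouse using SQL ---
dwh.execute(
    """
    CREATE TABLE sales AS
    SELECT order_id,
           UPPER(TRIM(country)) AS country,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    """
)

print(dwh.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall())
```

Keeping the raw table means the transformation can be re-run or changed later without extracting the data again, which is one reason ELT suits large data volumes.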
Database and Data warehouse
• Depending on the scale of your problem, you may need to deal with different levels of tools and technologies.
• For a small business, you may limit your access to one database, flat files, or xls sheets.
• Depending on the complexity, you may need an additional level of tools, such as data warehousing tools (SSIS, for example).
• You may need to think about ETL (Extract, Transform, Load) concepts, as ETL will be one of the core jobs in DA.
• Why do we need that?
  • For consolidation purposes
  • To prevent negative impact on operational databases
Data warehousing challenges
• Performance: as data increases over time
• Information-driven analysis: spend more time understanding and documenting business needs
• Data structure and system optimization: carefully design your data analysis tools
• Balancing resources
  • Multiple department access
  • Access control
  • Decreased efficiency
  • Accessibility measures can help balance your resources
• Data governance and master data
  • A common mistake is not investing in data governance and master data
  • Data should be consistent and accurate
BI and DWH are key for data analysis
• Consolidate:
  • Operational databases
  • Different systems
  • Different media types
• A data warehouse contains:
  • Metadata
  • Summary data
  • Raw data
• You can build the following on top of a DWH:
  • Analytics
  • Reporting
  • Mining
• Advantages of a DWH:
  • Answering strategic questions
  • Faster and more accurate
• A DWH solution is not a product to be purchased; it is customized to the company's requirements.
Do not analyze and operationalize in the same database.
Data Analysis Myths
S# | Myth | Truth
1 | All data has equal value | Each data element (dataset) has a different weight according to its nature and priority in achieving organization objectives; in addition, data quality plays a role in evaluating data models. We can't measure everything, so we should prioritize according to the RICE model (Reach, Impact, Confidence, Effort), which drives the business to fast growth.
2 | You need to hire a data scientists team | The data science team has different responsibilities than the DA team. In addition, it depends on the currently required level of analytics and the tools in use.
3 | You need a budget for data | Data storage has moved to the cloud, so no hardware is required, and data collection is easy, e.g., from operational databases and IoT platforms.
4 | Data analysis requires a massive volume of data | That can be valid when building ML models, but DA can work with any amount of data.
5 | Data analysis can improve every part of your business | Misleading statement: DA provides different types of analytics to decision makers according to the available data, at a specific level of quality; improving the business is not its responsibility.
6 | DA is too time-intensive | Automation tools have become available everywhere.
7 | Data analytics leads to job loss | Data analysis leads to business improvement and to taking the right decisions for the organization. Improvement may lead to adding team members, reallocating others, and other actions.
8 | Only large organizations with big data need it | Every business, even a startup, requires some level of data analysis; without DA you do not have a clear view of your organization's performance: what happened and what will happen.
9 | Algorithms never go wrong | Algorithm selection depends on the available data and the required accuracy level. It should be validated incrementally, because as data grows, another algorithm may need to replace the existing one.
10 | Prediction is easy with data analytics | Prediction depends on data patterns, which should be re-evaluated periodically to achieve higher accuracy.
DA Lifecycle, Brief intro
• Data understanding: using domain knowledge
• Data preparation
  • Collection
  • Cleaning
  • Integration
  • Reduction
  • Transformation
• DA problem definition and objectives
• Questioning phase
  • Central tendency measures
  • Dispersion measures
  • Correlation measures
  • Anomalies detection
  • Forecasting
• Data visualization
• Results review process
• Data interpretation (storification)
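The data preparation sub-steps can be chained in a short pandas sketch, assuming pandas is installed. Both tables are hypothetical, and the choices made at each step (dropping duplicate and incomplete rows, a left join, min-max normalization) are one reasonable option among several, not the course's prescribed method.

```python
import pandas as pd

# Collection: two hypothetical sources
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer_id": [10, 11, 11, 10, 12],
    "amount": [100.0, 250.0, 250.0, None, 400.0],
    "internal_note": ["a", "b", "b", "c", "d"],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12], "country": ["Egypt", "UAE", "Egypt"]})

# Cleaning: remove duplicate rows and rows with a missing amount
clean = orders.drop_duplicates().dropna(subset=["amount"])

# Integration: enrich orders with customer attributes
integrated = clean.merge(customers, on="customer_id", how="left")

# Reduction: keep only the attributes needed for the analysis question
reduced = integrated[["order_id", "country", "amount"]].copy()

# Transformation: min-max normalization of amount to the [0, 1] range
amin, amax = reduced["amount"].min(), reduced["amount"].max()
reduced["amount_norm"] = (reduced["amount"] - amin) / (amax - amin)

print(reduced)
```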
DA: Output Maturity Model (Maximization Model)
[Figure: AnalyticA Journey using the Ten-Steps Analytics Journey: 0. Listing all your module objects (screens); 1. Composite reports according to metadata levels; 2. Data summarization; 3. Dashboards with RGY approach; 4. Dispersion analytics; 5. Anomalies; 6. Relational models; 7. Trends & race models; 8. Comparative analytics model; 9. Hypothesis testing; 10. ML forecasting; Storytelling]
Prepared by Dr. Ghoniem Lawaty
Ten-Steps Analytics Journey
AnalyticA Journey Sample Application
Step Why How Sample
0. Listing all your module objects (screens)
• Why: The organization should have a clear understanding of all the objects that represent its data models to be used for analysis, and of the relationships between them.
• How: List all solution data models/screens; list all business units' key input data; list all business units' key output information.
• Sample: Competitors/Markets/Employees/BU/Customers/Sales/Orders
1. Composite reports according to metadata levels
• Why: The organization needs to produce more complex reports rather than primitive ones.
• How: Create reports on datasets with multiple columns; create reports on multiple related datasets.
• Sample: Sales orders with a value greater than the average
2. Data summarization
• Why: The organization needs the smallest amount of data that correctly describes the huge amount of existing data (the core is the aggregation functions).
• How: Using tables; using matrixes (two-dimensional tables); using graphs.
• Sample: Sales summary per month; sales matrix summary per country; pie/histogram/…
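Steps 1 and 2 can be sketched in pandas on a handful of hypothetical orders, assuming pandas is installed: the composite report filters orders against the average order value, and the summarization step aggregates the same data into a per-month table and a month-by-country matrix.

```python
import pandas as pd

# Hypothetical sales orders
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15", "2024-02-28"]),
    "country": ["Egypt", "UAE", "Egypt", "Egypt", "UAE"],
    "value": [100, 300, 200, 500, 400],
})

# Step 1, composite report: orders whose value is greater than the average order value
above_avg = orders[orders["value"] > orders["value"].mean()]

# Step 2, summarization with a table: total sales per month
orders["month"] = orders["order_date"].dt.to_period("M")
per_month = orders.groupby("month")["value"].sum()

# Step 2, summarization with a matrix (2-D table): months x countries
matrix = orders.pivot_table(index="month", columns="country", values="value",
                            aggfunc="sum", fill_value=0)

print(above_avg, per_month, matrix, sep="\n\n")
```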
3. Dashboards with RGY approach
• Why: Dashboards are the KPI tool used to monitor the organization's health; to keep them dynamic, we update the model to include an RGY health tracker.
• How: Select KPIs; align the RGY (Red/Green/Yellow) factors with the strategy board; create response plans.
• Sample: MTD sales; MTD active users
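An RGY tracker reduces each KPI to a color by comparing it against its target. The sketch below assumes hypothetical thresholds (Green at or above target, Yellow within 90% of it, Red below that) and hypothetical MTD figures; in practice the thresholds would come from the strategy board, as the table says. It also assumes higher values are better, so a cost KPI would need the comparisons reversed.

```python
def rgy_status(actual, target, yellow_ratio=0.9):
    """Classify a KPI against its target using hypothetical RGY thresholds:
    Green  -> actual >= target
    Yellow -> actual >= yellow_ratio * target
    Red    -> below that
    """
    if actual >= target:
        return "Green"
    if actual >= yellow_ratio * target:
        return "Yellow"
    return "Red"

# Hypothetical month-to-date (MTD) KPIs: name -> (actual, target)
kpis = {
    "MTD Sales": (95_000, 100_000),
    "MTD Active users": (12_500, 12_000),
    "MTD Orders": (700, 1_000),
}

for name, (actual, target) in kpis.items():
    print(f"{name}: {rgy_status(actual, target)}")
```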
4. Dispersion analytics
• Why: Just as the organization measures data tendency using central tendency measures, it also needs to measure the variance of the data.
• How: Use dispersion measures to measure data variance.
• Sample: Sales range within a month; students' grades variance
5. Anomalies
• Why: Anomalies are a key driver for root cause analysis and resolution, and they should be considered in order to correct the organization's direction.
• How: Using anomaly equations; using box plots.
• Sample: Upper-extreme sales orders; lower-extreme sales orders
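The table does not spell out the anomaly equations, so this sketch swaps in a common choice: Tukey's box-plot fences, which flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The order values are hypothetical. Quartile conventions differ between tools (Python's `statistics.quantiles` defaults to the "exclusive" method), so the fences can differ slightly from those computed elsewhere.

```python
import statistics

# Hypothetical sales order values
orders = [120, 135, 128, 140, 132, 125, 138, 900, 20, 130]

# Quartiles (statistics.quantiles uses the "exclusive" method by default)
q1, _, q3 = statistics.quantiles(orders, n=4)
iqr = q3 - q1

# Tukey's box-plot fences: values beyond them are flagged as anomalies
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

upper_extremes = [x for x in orders if x > upper_fence]
lower_extremes = [x for x in orders if x < lower_fence]
print(upper_extremes, lower_extremes)  # [900] [20]
```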
6. Relational models
• Why: The organization needs to find the relationships between different data models using mathematical models.
• How: Using correlation models to find the relationship's strength and direction.
• Sample: Relationship between gender and cosmetics sales
7. Trends
• Why: The organization needs to analyze behaviors that take place over time, so trends are required for analysis.
• How: Two-dimensional datasets that include the key measurement and the date/time, to represent the evolution over time.
• Sample: Monthly sales growth MTD; monthly active users growth MTD
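A trend is the key measure ordered by time; month-over-month growth, (current - previous) / previous × 100, is one common way to express its evolution. The monthly figures below are hypothetical, and only the standard library is used.

```python
# Hypothetical monthly sales totals, in time order
months = ["2024-01", "2024-02", "2024-03", "2024-04"]
sales = [100, 110, 121, 115]

# Month-over-month growth (%) for every month that has a previous month
growth = {
    month: round((current - previous) / previous * 100, 1)
    for month, previous, current in zip(months[1:], sales, sales[1:])
}
print(growth)  # {'2024-02': 10.0, '2024-03': 10.0, '2024-04': -5.0}
```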
8. Comparative analytics
• Why: Since the organization needs detailed behavioral analytics, it needs to compare results to conduct RCA of what actually caused the variance, and how to improve to get the best results.
• How: Trends for the same objects; compare one object's results with another object's, per value; using different charts like pie/bars.
• Sample: Year/year sales comparison; cosmetics/medicines sales comparison
9. Hypothesis testing
• Why: Determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter.
• How: Use the model to accept/reject the hypothesis.
• Sample: Using the existing students sample for K12, the success rate will not exceed 78
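The K12 sample can be framed as a one-sided z-test for a single proportion. This sketch assumes "78" means a 78% success rate and uses a hypothetical sample (200 students, 170 successes); H0 is that the rate does not exceed 78%, and H1 is that it does. The normal approximation is reasonable here because n × p0 and n × (1 - p0) are both well above 10.

```python
import math
from statistics import NormalDist

# Hypothetical sample: 200 K12 students, 170 of them succeeded
n, successes = 200, 170
p0 = 0.78      # claimed ceiling on the success rate (assuming "78" means 78%)
alpha = 0.05   # significance level

p_hat = successes / n

# H0: p <= 0.78 (the success rate does not exceed 78%)
# H1: p >  0.78 -> one-sided (upper-tail) z-test for a single proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p_hat={p_hat:.3f}, z={z:.2f}, p-value={p_value:.4f} -> {decision}")
```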
10. ML forecasting model
• Why: The organization needs to forecast the future to take corresponding responses and be proactive.
• How: Use different machine learning algorithms for better forecasting accuracy, updated incrementally.
• Sample: 1. Sales forecasting 2. Active users forecasting 3. Registration growth
Storytelling
• Why: The organization needs to visualize analytical models as a well-formatted story to meet all audiences' requirements.
• How: Understand the relationships between your models; understand the dependencies.
• Sample: Yearly behavior, with achieved goals
Data warehousing Model
Columns: DA Type | Operational DB | Data Mart | Data warehouse
Reports X X X
Dashboards X X X
Trends X X
Distribution X X X
Comparative X X X
Dispersion X X
Relational Models X X
Anomalies X X
Inference X X
Hypothesis X X
Forecasting X X
Side labels in the original figure: Min Operations Analytics; Minimal reports inputs; Great Outputs
