Data Analysis From Theoretical To Implementation Using Excel, Python, Flourish
Dr.Ghoniem Lawaty
[email protected]
MIS, DA, ML, Digitization and Micro-Services, TOGAF, DEVOPS
Certified ATM for CMMI SCAMPI (A) method, DA,DS,ML, ICAgile
https://www.linkedin.com/in/ghoniem-abdel-azim-mostafa-33860691
Lecture Content
01-Introduction to DA
• Methodology
• Data analysis 5 W's
• Why data analysis?
• Required sciences for DA?
• Difference between DA, DM, DS, ML, and data engineering
• Data analysis tools and skills
• DA lifecycle, brief introduction
02-Data analysis preprocessing
• Data gathering phase
• Data cleaning phase
• Data integration
• Data reduction
• Data transformation and normalization
03-Data analysis types
• Central tendency measures
• Dispersion measures
• Anomalies detection (Rationale to be here)
• Correlation measures
• Distribution measures
• Predictive analysis
• Time series
04-Data visualization
• Frequency tables
• Contingency tables
• Histograms
• Pie chart
• Trends
• Distribution
• Bar-chart race model
05-Data interpretation
• Interpretation preparation
• Meters bases concept
• RGY concept
• Storification (Tell the story)
07-DA practices at strategic level
08-Hypothesis testing
• What is inferential statistics?
• Normal distribution
• Standard normal distribution
• Estimation periods
• Z-Table
• Sample size
• Hypothesis formulation
• Hypothesis proofing
09-Machine Learning (Moving forward to data science)
• Supervised learning
• Unsupervised learning
• Sentiment analysis
• Association rules
Course Approach
For each topic, we will study the following:
• What is the topic? (What?)
• Why do we need it? (Why?)
• When should we do it? (When?)
• What is the process to do it? (Process?)
• What are its advantages and disadvantages?
• A sample on the topic
• Comparisons, if applicable
• A case study using Excel/Python/Flourish
Analyzing the past to improve the future
Lecture #1/8
Introduction to DA
Session Topics
What is DA?
Data analysis 5W’s
Difference between DA/DS/DE
Datasets definition
Data analysis tools
Data analysis and Data warehousing
Data analysis Myths
Data analysis lifecycle model (Practical guide)
Data analysis 5 W’s
What is data analysis?
Non-trivial process of inspecting, cleansing, transforming and modeling data with the goal of:
Discovering useful information and hidden patterns, such as the usage percentage of resources
Supporting decision-making, such as deciding to sell these resources
Forecasting the future, such as forecasting future usage to decide whether to buy new resources
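To make the three goals concrete, here is a minimal Python sketch, assuming a hypothetical resource with a monthly capacity of 200 hours and six months of made-up usage figures. It computes the usage percentage per month (discovering information), fits a simple least-squares linear trend to forecast next month (forecasting), and flags a purchase when the forecast passes an assumed 90% threshold (supporting a decision). The linear trend is a stand-in chosen for illustration, not a method prescribed by the course.

```python
# Minimal sketch: capacity, usage figures, and the 90% threshold are hypothetical.
CAPACITY_HOURS = 200                              # assumed monthly capacity of one resource
monthly_usage = [120, 130, 138, 150, 161, 170]    # assumed hours used per month

def usage_percent(used, capacity):
    """Share of the capacity that was used, as a percentage."""
    return 100 * used / capacity

def linear_trend(values):
    """Ordinary least-squares slope and intercept for values at x = 0, 1, ..., n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# 1) Discover information: usage percentage per month.
percentages = [usage_percent(u, CAPACITY_HOURS) for u in monthly_usage]

# 2) Forecast: extend the linear trend one month ahead.
slope, intercept = linear_trend(monthly_usage)
next_month = intercept + slope * len(monthly_usage)

# 3) Support a decision: plan a purchase if the forecast passes the threshold.
THRESHOLD_PERCENT = 90
buy_new_resource = usage_percent(next_month, CAPACITY_HOURS) > THRESHOLD_PERCENT

print([round(p, 1) for p in percentages])
print(round(next_month, 1), buy_new_resource)
```

With these numbers the forecast is about 180.3 hours, just above 90% of capacity, so the sketch flags a purchase. A real analysis would first check whether a linear trend actually fits the data.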
What is (not) data analysis?
A trivial search in a database, such as finding the transactions of a specific employee
A complicated search process is also not considered data analysis
Both are data discovery tools that support data analysis
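The distinction can be seen in a small sketch using Python's built-in sqlite3 module and a hypothetical transactions table. The first query is a trivial search (data discovery); the second summarizes all rows into a usage share per resource, which is the kind of output data analysis builds on. Table, column, and employee names are illustrative assumptions.

```python
import sqlite3

# Hypothetical transactions table, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (employee TEXT, resource TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("Ali", "Server A", 5), ("Mona", "Server A", 7), ("Ali", "Server B", 2), ("Omar", "Server B", 1)],
)

# Data discovery (not DA by itself): a trivial search for one employee's rows.
ali_rows = conn.execute(
    "SELECT resource, hours FROM transactions WHERE employee = ? ORDER BY resource", ("Ali",)
).fetchall()

# A first analytical step: usage share (%) of each resource across all rows.
usage_share = conn.execute(
    """
    SELECT resource,
           ROUND(100.0 * SUM(hours) / (SELECT SUM(hours) FROM transactions), 1) AS pct
    FROM transactions
    GROUP BY resource
    ORDER BY resource
    """
).fetchall()
conn.close()

print(ali_rows)
print(usage_share)
```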
Why data analysis?
Reduce the cost of inspecting enormous volumes of data
Cope with the growing dimensionality of data
Discover the meanings carried by the data
Discover data patterns
Support decision making
When?
Continuous process
Valuable when enough information is available for analysis
In order to analyze, you should measure from different perspectives.
Snapshots and trends
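As an illustration of measuring from different perspectives, the following sketch groups the same hypothetical measurements (tickets closed per department per month) by department and by month, then derives a snapshot (the latest value) and a trend (month-over-month change). The metric, departments, and figures are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical measurements: (month, department, tickets_closed).
records = [
    ("2024-01", "Sales", 40), ("2024-01", "Support", 55),
    ("2024-02", "Sales", 44), ("2024-02", "Support", 50),
    ("2024-03", "Sales", 47), ("2024-03", "Support", 52),
]

# Perspective 1: each department over time (records sorted by month first).
by_department = defaultdict(list)
for month, dept, value in sorted(records):
    by_department[dept].append(value)

# Snapshot: the latest value per department.
snapshot = {dept: values[-1] for dept, values in by_department.items()}

# Trend: month-over-month change per department.
trend = {dept: [b - a for a, b in zip(values, values[1:])] for dept, values in by_department.items()}

# Perspective 2: each month across all departments.
by_month = defaultdict(int)
for month, _dept, value in records:
    by_month[month] += value

print(snapshot, trend, dict(by_month))
```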
Data Analysis Questions
Data engineering
Machine learning
Data analysis Tools
ETL and ELT
ETL is the abbreviation of Extract, Transform, and Load
It is used to merge different data sources into the target data warehouse (DWH) that will be used in the DA process
Extract: Retrieving raw data from an unstructured pool of data sources, such as Access, XLS, DB, JSON, and TXT files, and migrating it into a temporary staging repository
Transform: Structuring, enriching, and converting the raw data to match the target format
Load: Loading the structured data into a data warehouse to be analyzed and used by business intelligence (BI) tools
It is considered a tool of DA
The choice between ETL and ELT depends on whether you load the data into your warehouse before or after the transformation
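The three ETL steps can be sketched in Python with only the standard library, assuming two hypothetical sources (a CSV export and a JSON feed, held here as in-memory strings) and an in-memory SQLite database standing in for the target DWH. The transformation happens in Python before anything is written to the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source extracts: a CSV export and a JSON feed (in-memory strings here).
CSV_SOURCE = "id,name,amount\n1,  alice ,100\n2,BOB,250\n"
JSON_SOURCE = '[{"id": 3, "name": "carol", "amount": "75.5"}]'

def extract():
    """Extract: read raw rows from each source into a staging list."""
    staging = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    staging.extend(json.loads(JSON_SOURCE))
    return staging

def transform(rows):
    """Transform: unify types and text formats to match the target table."""
    return [(int(r["id"]), str(r["name"]).strip().title(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stands in for the target DWH
load(transform(extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall())
```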
Difference between ETL and ELT
ELT is the modern trend for handling big data
ETL
Extract data from the data sources
Transform it into your target data format
Load the data into your database, ready for visualization
ELT
Extract is the same
Load the raw data first into the data warehouse
Transform it there, regenerating it into the new data format
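In an ELT flow, the raw values are loaded into a staging table first and transformed later inside the warehouse, typically with SQL. A minimal sketch, assuming SQLite as a stand-in for the warehouse and hypothetical raw_sales and sales tables:

```python
import sqlite3

# Hypothetical raw records, loaded as-is (all text) before any transformation.
raw_rows = [("1", "  alice ", "100"), ("2", "BOB", "250"), ("3", "carol", "75.5")]

dwh = sqlite3.connect(":memory:")  # stands in for the target data warehouse

# Extract + Load: land the raw data in a staging table without changing it.
dwh.execute("CREATE TABLE raw_sales (id TEXT, name TEXT, amount TEXT)")
dwh.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform: later, inside the warehouse, derive a new typed table with SQL.
dwh.execute(
    """
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER) AS id,
           UPPER(SUBSTR(TRIM(name), 1, 1)) || LOWER(SUBSTR(TRIM(name), 2)) AS name,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    """
)
print(dwh.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Because the raw table is kept, the same data can be transformed again later into a different format without re-extracting it from the sources.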
Database and Data warehouse
Depending on the scale of your problem, you may need to deal with different levels of tools and technologies
For a small business, you may limit yourself to one database, flat files, or XLS sheets
As complexity grows, you may need an additional level of tools, such as data warehousing tools (SSIS, for example)
You may need to think about ETL (Extract, Transform, Load) concepts, as ETL will be one of the core jobs in DA.
Why do we need that?
You need it for consolidation purposes
You need it to prevent a negative impact on operational databases
Data warehousing challenges
Performance: data volume increases over time
Information-driven analysis: spend more time on understanding and documenting business needs
Data structure and system optimization: carefully design your data analysis tools
Balancing resources
Multiple department access
Access control
Decreased efficiency
Accessibility measures can help balance your resources
Data governance and master data
One common mistake is not investing in data governance and master data
Data should be consistent and accurate
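A minimal sketch of how master data keeps records consistent, assuming a hypothetical master customer list and an alias table maintained through data governance. Incoming names are normalized and mapped to a single master ID; anything that cannot be matched is surfaced for review instead of silently becoming a duplicate customer. All names and IDs are illustrative.

```python
# Hypothetical master data: one authoritative record per customer.
MASTER_CUSTOMERS = {"C001": "Nile Trading Co.", "C002": "Delta Foods"}

# Hypothetical alias table maintained by data governance (normalized name -> master ID).
ALIASES = {
    "nile trading co": "C001",
    "nile trading company": "C001",
    "delta foods": "C002",
}

def normalize(name):
    """Lower-case, drop dots, and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().replace(".", "").split())

def to_master_id(raw_name):
    """Return the master customer ID for a raw name, or None if it is unknown."""
    return ALIASES.get(normalize(raw_name))

incoming = ["Nile Trading Co.", "NILE  trading company", "Delta Foods", "Unknown Ltd"]
resolved = {name: to_master_id(name) for name in incoming}
unmatched = [name for name, cid in resolved.items() if cid is None]

print(resolved)
print("Needs review:", unmatched)
```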
BI and DWH are key for data analysis
Consolidate operational databases
Different systems
Different media types
Advantages of DWH
Answering strategic questions
Faster and more accurate
A DWH solution is not a product to be purchased; it is customized to the company's requirements
Do not run analysis and operations on the same database.
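One simple way to keep analysis off the operational database is to analyze a copy. The sketch below assumes in-memory SQLite databases in place of a real operational system and warehouse; it copies a snapshot with sqlite3's Connection.backup and runs the aggregate query only against the copy. In practice this role is played by an ETL/ELT load into the DWH; the orders table is hypothetical.

```python
import sqlite3

# Hypothetical operational database (in memory here) that serves live transactions.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
operational.executemany(
    "INSERT INTO orders (region, total) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 40.0)],
)
operational.commit()

# Copy a snapshot into a separate analytics store so heavy queries do not load the live system.
analytics = sqlite3.connect(":memory:")
operational.backup(analytics)

# Analysis runs only against the copy.
totals = analytics.execute(
    "SELECT region, SUM(total) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```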
Data Analysis Myths
1. Myth: All data has equal value.
Truth: Each data element (dataset) has a different weight according to its nature and its priority in achieving organizational objectives; data quality also plays a role in evaluating data models. We can't measure everything, so we should prioritize according to the RICE model, which drives the business toward fast growth.
2. Myth: You need to hire a data science team.
Truth: A data science team has different responsibilities than a DA team. In addition, it depends on the current level of required analytics and the tools used.
3. Myth: You need a budget for data.
Truth: Data storage has moved to the cloud, so no hardware is required, and data collection has become easier, for example from operational databases and IoT platforms.
4. Myth: Data analysis requires a massive volume of data.
Truth: That can be valid when building ML models, but DA can work with any amount of data.
5. Myth: Data analysis can improve every part of your business.
Truth: This statement is misleading. DA provides different types of analytics to decision makers according to the available data and its level of quality; improving the business is not its responsibility.
6. Myth: DA is too time-intensive.
Truth: Automation tools have become available everywhere.
7. Myth: Data analytics leads to job loss.
Truth: Data analysis leads to business improvement and to taking the right decisions for the organization. Improvement may lead to adding team members, reallocating others, and other actions.
8. Myth: Only large organizations with big data need it.
Truth: Every business, even a startup, requires some level of data analysis; without DA, you do not have a clear vision of your organization's performance: what happened and what will happen.
9. Myth: Algorithms never go wrong.
Truth: The selection of an algorithm depends on the available data and the required accuracy level. Algorithm selection should be validated incrementally, because as data grows, another algorithm may need to replace the existing one.
10. Myth: Prediction is easy with data analytics.
Truth: Prediction depends on data patterns, which should be re-evaluated periodically to achieve higher accuracy.
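Myth 1 mentions the RICE model. RICE scores an initiative as (Reach × Impact × Confidence) / Effort, and higher scores are prioritized. The sketch below ranks three hypothetical measurement initiatives; the names, the scales (impact from 0.25 to 3, confidence from 0 to 1, effort in person-months), and the values are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A measurement initiative to prioritize (all values hypothetical)."""
    name: str
    reach: float       # people or events affected per period
    impact: float      # e.g. 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # person-months

    @property
    def rice(self) -> float:
        # RICE score = (Reach * Impact * Confidence) / Effort
        return self.reach * self.impact * self.confidence / self.effort

candidates = [
    Candidate("Track checkout drop-off", reach=2000, impact=2, confidence=0.8, effort=2),
    Candidate("Track page scroll depth", reach=5000, impact=0.5, confidence=0.5, effort=1),
    Candidate("Track support ticket causes", reach=300, impact=3, confidence=1.0, effort=0.5),
]

# Measure first what scores highest; low scorers can wait.
ranked = sorted(candidates, key=lambda c: c.rice, reverse=True)
for c in ranked:
    print(f"{c.name}: {c.rice:.0f}")
```

Note how the initiative with the smallest reach ranks first here because its effort is low and confidence is high, which is the trade-off RICE is meant to expose.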
DA Lifecycle, Brief intro
Data understanding: Using domain knowledge
Data preparation
Collection
Cleaning
Integration
Reduction
Transformation
DA problem definition and objectives
Questioning Phase
Central tendency measures
Dispersion measures
Correlation measures
Anomalies detection
Forecasting
Data visualization
Results review process
Data interpretation (Storification)
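The analysis phases of the lifecycle (central tendency, dispersion, correlation, and anomaly detection) can be sketched with Python's statistics module on hypothetical daily ad-spend and sales data. The anomaly rule used here (flag values whose z-score exceeds 2 in absolute value) is one common convention, chosen as an assumption; the course may use a different rule.

```python
import statistics

# Hypothetical daily data: advertising spend and resulting sales.
ad_spend = [10, 12, 11, 13, 12, 40, 11]
sales = [100, 115, 108, 125, 118, 300, 110]

# Central tendency measures.
mean_sales = statistics.mean(sales)
median_sales = statistics.median(sales)

# Dispersion measures (sample standard deviation and range).
stdev_sales = statistics.stdev(sales)
range_sales = max(sales) - min(sales)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Correlation measure.
r = pearson(ad_spend, sales)

# Anomaly detection: flag values more than 2 standard deviations from the mean.
anomalies = [s for s in sales if abs(s - mean_sales) / stdev_sales > 2]

print(round(mean_sales, 1), median_sales, round(stdev_sales, 1), range_sales)
print(round(r, 3), anomalies)
```

The single anomalous day pulls the mean (about 139.4) well above the median (115), which is one reason to run anomaly detection before interpreting the other measures.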
DA: Output Maturity Model (Maximization Model)
Analytics journey using ten steps:
Reports, Dashboards, Trends, Distribution, Comparative, Dispersion, Anomalies, Inference, Hypothesis, Forecasting
Maturity levels: Minimal inputs and reports, Operations, Analytics