Data Science Theory: Analysis and Analytics

This document discusses key concepts in data science theory, including: 1) the distinction between analysis (examining past data) and analytics (predicting future patterns); qualitative analytics relies on intuition, while quantitative analytics relies on formulas and algorithms. 2) Data science can improve predictive accuracy by analyzing data extracted from various activities, while business intelligence analyzes historical data to explain past events. 3) Machine learning uses data to make predictions and analyze patterns without being explicitly programmed, and artificial intelligence simulates human decision making. 4) Approaches for working with different kinds of data, from raw data to processed data to information, and techniques for analyzing big data.

Uploaded by

Nonameforever


🧠

Data science Theory

Class: Data science theory

Completed

Created: Jun 12, 2020 10:19 PM

Materials

Source: Udemy

Type: Lecture

Analysis and analytics

Analysis -

Performed on things that have already happened in the past.

Example: how sales decreased in the summer.

We do analysis to explain what happened, or why something happened.

Analytics -

Exploring patterns to work out what is likely to happen, and what we can do, in the future.

There are two types of analytics:

Qualitative analytics = intuition + analysis

Quantitative analytics = formulas + algorithms

Introduction:

Some business activities are data driven, while others are subjective or experience driven.

Business needs -

Business case studies - real-world experience of how companies succeed and fail. We don't need a data set to understand case studies.

Qualitative analytics - all about intuition and knowledge of the market. This includes working with tools to predict future behavior.

Preliminary data reporting

Reporting with visuals

Creating dashboards

Sales forecasting

👆 In the figure above, the items in pink are data driven.

👆 The items in yellow are experience driven.

Some of these terms refer to activities that aim to explain past behavior (this is called analysis), while others refer to activities used to predict future behavior (this is called analytics).


Here, business case studies are analysis, while qualitative analytics is about predicting the future (analytics).

NOTE Business analytics=business analysis + business analytics.

Data science: can be used to improve the accuracy of predictions based on data extracted from various activities.

Business Intelligence BI :The process of analysing and reporting historical

business data .Aims to explain past events using business data.preliminary
step of predictive analytics

 Analyse past data and extract useful insights

 create appropriate models

Reporting visuals and creating dashboards is all about BI

Machine Learning: the ability of machines to predict outcomes without being explicitly programmed. It is all about creating and implementing algorithms that let machines receive data and use this data to:

• Make predictions

• Analyze patterns

• Give recommendations

Artificial intelligence: simulating human knowledge and decision making with computers.

Data science handbook:

Approaches and techniques for working with traditional data.
From raw data to processed data to information

• Raw facts or raw data

• Cannot be analyzed straight away

• It is untouched data that you have accumulated and stored on the server

• Data collection

• Examples: Surveys: asking people how much they like or dislike the product on a scale of 1 to 10.

• Cookies: they provide companies with detailed information about users' activities on a website.

• Processed data

• Data pre-processing:

• Before data processing we do data pre-processing, right after data collection. It is a group of operations that converts your raw data into a format that is more understandable.

• Example: in an SQL database, a person enters their age as 932, or their name as "United Kingdom".

• Before any analysis, such data should be marked as invalid or corrected.

• Methods in pre-processing:

• Class labeling -

• Labeling each data point with the correct data type, or arranging the data by category.

• This can be:

• Numerical - e.g. the number of units sold per day

• Categorical - values that cannot be manipulated mathematically

• Data cleansing = data cleaning = data scrubbing

• Deals with inconsistent data

• Example: correcting spelling mistakes and dealing with missing values.

• Other examples of data pre-processing:

• Balancing: imagine you have compiled a survey to gather data on the shopping habits of men and women, to find out who spends more money on the weekend. When you collect the data, 80% of the respondents are women and only 20% are men, so any trends you notice will lean toward women rather than men. To counteract this, the best thing to do is apply balancing techniques, such as taking an equal number of respondents from each group so that the ratio is 50/50.

• Data shuffling: shuffling the observations in a dataset is just like shuffling a deck of cards. Shuffling is the process of randomizing the order of the data. It prevents unwanted patterns, improves predictive performance and helps avoid misleading results.

• Information

Databases containing traditional data can be visualized (for example, the structure of a relational database management system, RDBMS).

Entity relationship (ER) diagram, or relational schema: shows how the tables in the database are related. Here each rectangle represents a distinct data table, and the lines represent the relationships between the tables.
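The pre-processing steps listed above (marking invalid values, cleansing, balancing and shuffling) can be chained in a short script. This is a minimal sketch using only the Python standard library; the records, the field names, the 0-120 age limit, the spelling table and the choice to fill a missing value with the median are all hypothetical. Downsampling is only one balancing technique; oversampling the smaller group or weighting are alternatives.

```python
import random
from statistics import median

# Hypothetical raw survey records (all names and values are invented).
raw = [
    {"name": "Alice", "gender": "F", "age": 34, "city": "London", "spend": 50},
    {"name": "Bob", "gender": "M", "age": 932, "city": "Paris", "spend": 40},
    {"name": "Carol", "gender": "F", "age": 29, "city": "londn", "spend": None},
    {"name": "Dana", "gender": "F", "age": 41, "city": " LONDON ", "spend": 70},
    {"name": "Eve", "gender": "F", "age": 52, "city": "Paris", "spend": 65},
    {"name": "Frank", "gender": "M", "age": 38, "city": "London", "spend": 45},
    {"name": "Grace", "gender": "F", "age": 47, "city": "Paris", "spend": 55},
    {"name": "Hank", "gender": "M", "age": 26, "city": "london", "spend": 35},
]

MAX_AGE = 120                         # assumed plausibility limit
SPELLING_FIXES = {"londn": "london"}  # hypothetical correction table

def is_valid(record):
    """Validation: an age such as 932 is impossible, so the record is flagged."""
    age = record["age"]
    return isinstance(age, int) and 0 <= age <= MAX_AGE

def cleanse(rows):
    """Cleansing: normalize/correct city names, fill missing spend with the median."""
    fill = median(r["spend"] for r in rows if r["spend"] is not None)
    cleaned = []
    for r in rows:
        city = r["city"].strip().lower()
        city = SPELLING_FIXES.get(city, city).title()
        spend = fill if r["spend"] is None else r["spend"]
        cleaned.append({**r, "city": city, "spend": spend})
    return cleaned

def balance(rows, key, rng):
    """Balancing: downsample every group to the size of the smallest group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    n = min(len(g) for g in groups.values())
    return [r for g in groups.values() for r in rng.sample(g, n)]

rng = random.Random(0)                          # fixed seed: reproducible run
invalid = [r for r in raw if not is_valid(r)]   # kept aside for review
valid = [r for r in raw if is_valid(r)]
clean = cleanse(valid)
balanced = balance(clean, "gender", rng)        # 5 women, 2 men -> 2 and 2
rng.shuffle(balanced)                           # shuffling: randomize the order

for r in balanced:
    print(r["name"], r["gender"], r["city"], r["spend"])
```

Keeping the invalid records in a separate list, rather than deleting them silently, matches the idea of marking bad data as invalid before analysis; in a real project a library such as pandas would typically be used for the same steps.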

Techniques for working with big data

Big data has much more variety beyond categorical and numerical data. Examples of big data include numbers, text, digital images, digital video and digital audio.


With a wider range of data types comes a wider range of data cleansing methods. For example, there are techniques that verify that a digital image observation is ready for processing.

Text data mining: the process of deriving valuable information from unstructured text.

Data masking: analyzing the information without compromising private details.
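A minimal sketch of data masking with the Python standard library, assuming two illustrative techniques: partially hiding e-mail addresses, and replacing them with salted SHA-256 tokens so that records belonging to the same person can still be grouped. The records and the salt are hypothetical; a real salt should be kept secret, and hashing alone is not a complete anonymization strategy.

```python
import hashlib

def mask_email(email):
    """Hide most of the local part, e.g. 'alice@example.com' -> 'a****@example.com'."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain

def pseudonymize(value, salt="demo-salt"):
    """Replace an identifier with a stable token: same input -> same token.
    The default salt is a hypothetical placeholder."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

# Hypothetical customer records.
customers = [
    {"email": "alice@example.com", "spend": 120},
    {"email": "bob@example.com", "spend": 80},
]

masked = [
    {"customer": pseudonymize(c["email"]),
     "email": mask_email(c["email"]),
     "spend": c["spend"]}           # the analytical field stays intact
    for c in customers
]

for row in masked:
    print(row)
```

The analysis (e.g. total spend per customer) can run on `masked` without ever exposing the full e-mail addresses.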

Business intelligence (BI) analysis:

Data skills + business knowledge and intuition to explain the past performance of the company.

How we measure business performance:

We start by collecting observations.

For example, collecting variables such as sales volume or the number of new customers who enrolled on your website.

Each monthly revenue figure, or each customer, is considered a single observation.

Then we must quantify that information. Quantification is the process of representing observations as numbers.

Measure: a measure is the accumulation of observations to show some information.

For example, if you total the revenue of all three months to obtain a value of $350, that would be the measure of the revenue for the first quarter of that year.

Similarly, add together the number of new customers for the same period (50) and you have another measure.
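The quarterly measures above can be sketched in a few lines of Python. The month-by-month values are hypothetical, chosen only so that the totals match the $350 and 50 in the example.

```python
# Hypothetical monthly observations for the first quarter.
monthly_revenue = {"Jan": 100, "Feb": 120, "Mar": 130}
monthly_new_customers = {"Jan": 15, "Feb": 20, "Mar": 15}

# A measure accumulates observations into one number.
q1_revenue = sum(monthly_revenue.values())
q1_new_customers = sum(monthly_new_customers.values())
print(q1_revenue, q1_new_customers)  # 350 50
```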


Analyze the data
Metrics - a metric is a value derived from the measures you obtain, and it aims at gauging business performance or progress.
NOTE: Metric = measure + business meaning

☝ This is useful for comparison.

Can we keep track of all possible metrics we can extract from a data set? Yes.

Does it make sense to do that? No.

What you need to do is choose the metrics that are tightly aligned with your business objectives. These metrics are called KPIs (Key Performance Indicators).
KPIs = metrics + business objectives

Key - related to your business goals

Performance - how successfully you have performed within a specified time frame

Indicators - their values indicate something about your business performance

Metric vs. KPI, an example:

Metric: the traffic of a page on your website that was visited by any type of user.

KPI: the traffic generated only from users who have clicked on a link provided in your ad campaign.
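A minimal sketch of the metric vs. KPI distinction, assuming a hypothetical visit log in which each visit records its traffic source; the field names and the `"ad_campaign"` label are invented for illustration.

```python
# Hypothetical page-visit log; "source" records how the user arrived.
visits = [
    {"user": "u1", "source": "search"},
    {"user": "u2", "source": "ad_campaign"},
    {"user": "u3", "source": "direct"},
    {"user": "u4", "source": "ad_campaign"},
    {"user": "u5", "source": "ad_campaign"},
]

# Metric: page traffic from any type of user.
total_traffic = len(visits)

# KPI: only the traffic tied to the business objective (the ad campaign).
campaign_traffic = sum(1 for v in visits if v["source"] == "ad_campaign")

# Share of traffic the campaign brings in, handy for comparing periods.
campaign_share = campaign_traffic / total_traffic
print(total_traffic, campaign_traffic, campaign_share)  # 5 3 0.6
```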


The next step: every quantitative result you extract must be visualized.

Traditional methods
At this stage we start applying analytics.

Techniques for working with traditional data

Regression: a model used for quantifying causal relationships among the different variables included in your analysis.

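For example:
Linear regression models

Consider data on house prices and house sizes in square feet. A linear regression model fits a straight line through these points.

Here the red line is the regression line, because the points lie close to it, whereas they are not close to the green line. So the green line is not the regression line. More precisely, the regression line is the line that minimizes the sum of squared vertical distances between the points and the line.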


So this red line can be written as

$$y = b x$$

where y is the house price, b is the coefficient and x is the house size.
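A minimal sketch of fitting the no-intercept model y = bx by least squares in plain Python. For this model the least-squares coefficient is b = Σxy / Σx²; the house data below is hypothetical. The final assert checks the red-line/green-line idea: a different slope gives a larger sum of squared errors.

```python
# Hypothetical data: house sizes in square feet and prices in dollars.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [210_000, 290_000, 400_000, 500_000, 610_000]

def fit_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def sse(b, x, y):
    """Sum of squared vertical distances between the points and the line y = b * x."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

b = fit_through_origin(sizes, prices)
print(f"b = {b:.1f} dollars per square foot")
print(f"predicted price of an 1800 sq ft house: {b * 1800:,.0f}")

# Any other slope (a "green line") has a larger error than the fitted "red line".
assert sse(b, sizes, prices) < sse(b * 1.1, sizes, prices)
```

The general model with an intercept a and several variables needs a slightly different formula, but the least-squares idea is the same.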

Logistic regression

The values on the vertical axis will be 1s or 0s only.

Such models are used in decision-making processes.

For example, companies apply logistic regression algorithms to filter job candidates during their screening process.

If the algorithm estimates that the probability that a prospective candidate will perform well in the company is above 50%, it predicts 1, a successful application. Otherwise, it predicts 0.
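A minimal sketch of how a fitted logistic regression turns inputs into a 0/1 decision. The two features (years of experience, interview score) and the coefficients are hypothetical placeholders, not values from the course; in practice they would be estimated from historical hiring data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical, already-fitted coefficients for two illustrative features.
INTERCEPT, B_EXPERIENCE, B_INTERVIEW = -6.0, 0.5, 0.6

def predict(experience, interview_score, threshold=0.5):
    """Return (probability, label); label 1 = predicted successful application."""
    p = sigmoid(INTERCEPT + B_EXPERIENCE * experience + B_INTERVIEW * interview_score)
    return p, 1 if p > threshold else 0

for exp, score in [(1, 3), (4, 8), (10, 9)]:
    p, label = predict(exp, score)
    print(f"experience={exp}, interview={score}: p={p:.2f} -> {label}")
```

The model itself outputs a probability; the 0/1 label only appears once the 50% threshold is applied, and a company could choose a different threshold.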


Cluster analysis

For example, suppose the points on the house price vs. house size graph form several visible groups.

The red line is still the regression line, but here we can do more: cluster analysis.

This is another technique, which takes into account that certain observations exhibit similar house sizes and prices.

Here the clusters are:

• City center: expensive and small houses

• Far from the city: big but cheaper houses

• Nice neighborhoods: in the city, expensive and big houses
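The notes do not name a specific clustering algorithm; k-means is one common choice and is used here as an assumption. This plain-Python sketch groups hypothetical houses (size, price) into three clusters corresponding to the city-center, far-from-the-city and nice-neighborhood groups above.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centre, then move every
    centre to the mean of its points. The first k points serve as initial
    centres, a simplification (libraries usually use random or k-means++ starts)."""
    centres = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                centres[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centres, clusters

# Hypothetical houses: (size in 1,000 sq ft, price in $100,000).
# Both features are on similar scales, since k-means is distance based.
houses = [
    (0.8, 5.0), (2.8, 2.0), (2.9, 7.0),
    (0.9, 5.5), (3.0, 2.2), (3.1, 7.5),
    (1.0, 5.2), (3.2, 1.8), (3.0, 7.2),
]

centres, clusters = kmeans(houses, k=3)
for centre, members in zip(centres, clusters):
    print([round(c, 2) for c in centre], members)
```

Rescaling matters: if size were in square feet and price in dollars, the price axis would dominate the distances and the clusters would change.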

For this example we only have two variables, house size and house price. But when the data contains many more variables, the mathematical expression for the regression model is:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$$

NOTE: x, the explanatory variable, is also known as a regressor, independent variable or predictor variable.
For example, consider analyzing a survey that consists of 100 questions.

In this case the regression model is:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_{100} x_{100}$$


This is where factor analysis comes in.

In the example:

Question 1: I like animals ⭕⭕⭕⭕⭕
Question 2: I care about animals ⭕⭕⭕⭕⭕
Question 3: I am against animal cruelty ⭕⭕⭕⭕⭕

Whoever marks 5 on the first question is most likely to give 5 to the other two questions as well. In other words, if you strongly agree with one of these questions, you will most likely not disagree with the other two.
With factor analysis, we can combine all three questions into a single variable: general attitude towards animals.

$$
z_1 = \begin{cases} x_1 & \text{1. I like animals} \\ x_2 & \text{2. I care about animals} \\ x_3 & \text{3. I am against animal cruelty} \end{cases}
$$

In this way we can reduce the number of regressors from 100 to 10, which makes the model simpler and can lead to more accurate predictions.

$$y = a + b_1 z_1 + b_2 z_2 + b_3 z_3 + \dots + b_{10} z_{10}$$
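A simplified sketch of the idea, not actual factor analysis: it checks that answers to the three animal questions are strongly correlated (Pearson correlation), then combines them into one variable z1 by averaging. The responses are hypothetical. A proper factor analysis would estimate a loading (weight) for each question instead of weighting them equally.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical answers (1-5) from six respondents.
x1 = [5, 4, 2, 5, 1, 3]   # I like animals
x2 = [5, 4, 1, 4, 1, 3]   # I care about animals
x3 = [4, 5, 2, 5, 2, 3]   # I am against animal cruelty

print(round(pearson(x1, x2), 2), round(pearson(x1, x3), 2))

# Strongly correlated items can be summarized by one variable.
# Here z1 is simply their average (equal weights, an assumption).
z1 = [mean(t) for t in zip(x1, x2, x3)]
print(z1)
```

Three regressors x1, x2, x3 are replaced by the single regressor z1 in the regression model.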

Time series
Plotting values against time. Time is always on the x-axis.

Example for traditional methods

Example: User experience

Imagine you are the head of the user experience (UX) department of a website selling goods on a global scale.
As the head of UX, your goal is to maximize user satisfaction.

Assume you have already designed and implemented a survey that measured the attitude of your customers towards the latest global products you have launched.


When the survey data looks like the graph on the left side, we should do a cluster analysis.
Once we find out there are 4 separate groups, it makes sense to run four separate tests.

Machine learning
Creating an algorithm, which a computer then uses to find a model that fits the data as well as possible and makes very accurate predictions based on it.

Machine learning algorithm - a trial-and-error process. Each consecutive trial is at least as good as the previous one.

There are 4 ingredients:

• Data

• Model

• Objective function - measures how inaccurate the model is

• Optimization algorithm - adjusts the model to improve it
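The four ingredients can be seen together in a tiny example. This sketch uses hypothetical data, a one-parameter model y = w * x, mean squared error as the objective function and plain gradient descent as the optimization algorithm; each is one possible choice, not the only one. With the small step size used here the loss never increases from one trial to the next, matching the "at least as good as the previous one" description; with too large a step size gradient descent can overshoot, so that property is not automatic.

```python
# 1. Data (hypothetical): points that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# 2. Model: y_hat = w * x, with a single parameter w.
def model(w, x):
    return w * x

# 3. Objective function: mean squared error, i.e. how inaccurate the model is.
def loss(w):
    return sum((model(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 4. Optimization algorithm: gradient descent moves w against the slope of the loss.
def gradient(w):
    return sum(2 * (model(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

w, learning_rate = 0.0, 0.01   # starting guess and step size (assumed values)
history = [loss(w)]
for _ in range(100):           # each iteration is one "trial"
    w -= learning_rate * gradient(w)
    history.append(loss(w))

print(f"w = {w:.3f}, loss = {history[-1]:.4f}")
```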

Types of machine learning:

• Supervised learning - uses prior results; here the data is labeled

• Unsupervised learning - here the data is unlabeled

• Reinforcement learning -
