
Data Science Theory: Analysis and Analytics

This document discusses key concepts in data science theory including: 1) It distinguishes between analysis (examining past data) and analytics (predicting future patterns). Qualitative analysis uses intuition while quantitative analysis uses formulas. 2) Data science can improve predictive accuracy by analyzing data extracted from various activities. Business intelligence analyzes historical data to explain past events. 3) Machine learning uses data to make predictions and analyze patterns without explicit programming. Artificial intelligence simulates human decision making. 4) The document outlines approaches for working with different data types from raw to processed data to information and techniques for analyzing big data.

Uploaded by

Nonameforever
Copyright
© All Rights Reserved

🧠

Data science Theory


Class Data science theory

Completed

Created Jun 12, 2020 1019 PM

Materials

Source Udemy

Type Lecture

Analysis and analytics


Analysis -

We perform analysis on things that have already happened in the past.

Example: How the sales decreased in the summer.

We do analysis to explain how or why something happened.

Analytics -

Exploring patterns to work out what we can do in the future.

There are two types of analytics:

Qualitative analytics = intuition + analysis

Quantitative analytics = formulas + algorithms

Introduction:

Some business activities are data driven, while others are subjective or
experience driven.

Business needs -

Business case studies - real-world experience of how companies succeed
and fail. We don't need a data set to understand case studies.

Qualitative analytics - it is all about intuition and knowledge about the market.
This includes working with tools to predict future behavior.

Preliminary data reporting

Reporting with visuals

Creating dashboards

Sales forecasting

👆 Of the activities above, the pink ones are data driven.

👆 The yellow ones are experience driven.

Some of these terms refer to activities that aim to explain past
behavior (this is called analysis), while others refer to activities used for
predicting future behavior (this is called analytics).

Here business case studies are analysis, while qualitative analytics is all
about predicting the future (analytics).

NOTE: Business analytics = business analysis + business analytics.

Data science: Can be used to improve the accuracy of predictions based on
data extracted from various activities.

Business Intelligence (BI): The process of analysing and reporting historical
business data. It aims to explain past events using business data and is a
preliminary step of predictive analytics.

• Analyse past data and extract useful insights

• Create appropriate models

Reporting with visuals and creating dashboards are all about BI.

Machine Learning: The ability of machines to predict outcomes without being
explicitly programmed. It is all about creating and implementing algorithms that
let machines receive data and use this data to

• Make predictions

• Analyse patterns

• Give recommendations

Artificial intelligence: Simulating human knowledge and decision making with
computers.

Data science Handbook:

Approaches and techniques for working with traditional data.
Raw data to processed data and to information

• Raw facts or raw data

  • Cannot be analysed straight away.

  • It is untouched data that you have accumulated and stored on the server.

  • Data collection

    • Example: Surveys. Data can be collected through surveys, e.g. how much
      people like or dislike the product on a scale of 1 to 10.

    • Cookies: They provide companies with detailed information about
      users' activities on a website.

• Processed data

  • Data pre-processing:

    • Before data processing we do data pre-processing. We do this
      after data collection. It is a group of operations that
      convert your raw data into a format that is more understandable.

    • Example: In an SQL database, a person's age is entered as 932
      or their name as "United Kingdom".

    • Before any analysis, such data should be marked as invalid or
      corrected.

  • Methods in pre-processing:

    • Class labeling -

      • This includes labeling each data point with the correct data
        type, or arranging data by category.

      • The data can be

        • Numerical - e.g. the number of units sold in a day

        • Categorical - cannot be manipulated mathematically.

    • Data cleansing = data cleaning = data scrubbing

      • It deals with inconsistent data.

      • Example: Correcting spelling mistakes and dealing with
        missing values.

  • Examples of data pre-processing:

    • Balancing: Imagine you have compiled a survey to gather
      data on the shopping habits of men and women, to find out
      who spends more money on the weekend. When you look at
      the data, 80% of the respondents are women and 20% are men,
      so the trends you notice will reflect women far more than men.
      To counteract this, apply balancing techniques, such as
      taking an equal number of respondents from each group, so
      the ratio is 50/50.

    • Data shuffling: Shuffling the observations in the dataset
      is just like shuffling a deck of cards. Shuffling is the process
      of randomizing data. It prevents unwanted patterns, improves
      predictive performance and helps avoid misleading results.
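The pre-processing steps described above (flagging invalid values, balancing, shuffling) can be sketched roughly as follows. This is only an illustration: the record fields, the 8-to-2 split of respondents, and the maximum plausible age of 120 are assumptions, not part of the notes.

```python
import random
from collections import defaultdict

# Hypothetical survey records; field names and values are made up for illustration.
respondents = (
    [{"gender": "woman", "age": 30 + i, "spend": 100 + i} for i in range(8)]
    + [{"gender": "man", "age": 40 + i, "spend": 90 + i} for i in range(2)]
)
respondents[0]["age"] = 932  # an impossible age, like the example in the notes


def is_valid(record, max_age=120):
    """Data cleansing: accept only records whose age is in a plausible range."""
    return 0 <= record["age"] <= max_age


def balance(records, key, rng):
    """Balancing: sample the same number of records from every group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    size = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, size))
    return balanced


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [r for r in respondents if is_valid(r)]
balanced = balance(clean, "gender", rng)
rng.shuffle(balanced)  # data shuffling: randomize the order of observations

print(len(respondents), len(clean), len(balanced))
```

Undersampling the larger group is only one way to balance; it throws data away, so with small surveys collecting more responses from the smaller group may be preferable.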

 Information

Visualizations that represent databases containing traditional data
(visualization of a relational database management system):

Entity relationship diagram (ER diagram): Shows how the tables in the
database are related.

Relational schema: Here each rectangle represents a distinct data table,
and the lines show which tables are related to which.

Techniques for working with big data

Here there is much more variety beyond categorical and numerical data. Examples of
big data include numbers, text, digital images, digital video data and digital audio
data.

A wider range of data types comes with a wider range of data cleansing
methods.
For example, there are techniques that verify that a digital image observation is ready for
processing.

Text data mining: The process of deriving valuable, unstructured data from a
text.

Data masking: Lets you analyse the information without compromising private details.

Business intelligence (BI) analysis:

Data skills + business knowledge and intuition to explain the past performance
of the company.

How we measure business performance:

We start by collecting observations.

For example, collecting variables such as sales volume or new
customers enrolled on your website.

Each monthly revenue figure and each customer is considered a single
observation.

Then we must quantify that information. Quantification is the process of
representing observations as numbers.

Measure: A measure is the accumulation of observations to
show some information.

For example: If you total the revenue of all three months
and obtain the value of $350, that will be the
measure of the revenue of the first quarter of that year.

Similarly, add together the number of new customers for
the same period, 50, and you have another measure.
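As a small sketch of how observations accumulate into measures: the monthly figures below are hypothetical and were chosen only so that they add up to the totals mentioned in the notes ($350 and 50).

```python
# Hypothetical monthly observations; only the totals ($350 and 50) come from the notes.
monthly_revenue = {"Jan": 100, "Feb": 120, "Mar": 130}
monthly_new_customers = {"Jan": 10, "Feb": 15, "Mar": 25}

# A measure accumulates single observations into one number.
q1_revenue = sum(monthly_revenue.values())
q1_new_customers = sum(monthly_new_customers.values())

print(f"Q1 revenue: ${q1_revenue}")
print(f"Q1 new customers: {q1_new_customers}")
```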

Analyze the data
Metrics - a metric is a value derived from the measures you obtain that
aims at gauging business performance or progress.
NOTE: Metric = measure + business meaning

☝ This is useful for comparison.

Can we keep track of all possible metrics we can extract from a data set? - YES

Does it make sense to do that? - NO

What you need to do is choose the metrics that are tightly aligned with your
business objectives. These metrics are called KPIs (Key Performance Indicators).
KPIs = metrics + business objectives

Key - related to your business goals.

Performance - how successfully you have performed within a specified time
frame.

Indicators - values that indicate something about your business performance.

Metric: The traffic of a page from your website that was visited by any type
of user.

KPI: The traffic generated only from users who have clicked on a link provided
in your ad campaign.
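The difference between the metric and the KPI in this comparison can be sketched with a tiny visit log. The log itself, the `source` field and its values are assumptions made up for illustration.

```python
# Hypothetical page-visit log; the "source" field and its values are assumptions.
visits = [
    {"user": "u1", "source": "ad_campaign"},
    {"user": "u2", "source": "search"},
    {"user": "u3", "source": "ad_campaign"},
    {"user": "u4", "source": "direct"},
    {"user": "u5", "source": "ad_campaign"},
]

# Metric: traffic of the page from any type of user.
page_traffic = len(visits)

# KPI: only the traffic that arrived through the ad campaign link.
campaign_traffic = sum(1 for visit in visits if visit["source"] == "ad_campaign")

print(page_traffic, campaign_traffic)
```

The KPI is the same kind of count as the metric, just narrowed to the part that matters for the business objective (here, judging the ad campaign).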

As the next step, every quantitative result you have extracted must be
visualized.

Traditional methods
At this stage we start applying analytics.

Techniques for working with traditional data

Regression: A model used for quantifying causal relationships among the
different variables included in your analysis.

For example:
Linear regression models

The table below holds data on house prices and house sizes in square feet. This
can be described with a linear regression model.

Here the red line is the regression line,
because all the points are close to the red
line, while they are not close to the green line. So the
green line is not the regression line.

So this red line can be written as

$$y = bx$$

Here, y - house price, b - coefficient and x - house size.
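A minimal sketch of how the coefficient b in y = bx could be estimated with least squares. The house sizes and prices are hypothetical, and the closed-form formula b = Σxy / Σx² is the standard least-squares solution for a line through the origin, not something stated in the notes.

```python
def fit_through_origin(sizes, prices):
    """Least-squares estimate of b in y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(sizes, prices)) / sum(x * x for x in sizes)


# Hypothetical observations: house size in square feet, price in dollars.
sizes = [1000, 1500, 2000, 2500]
prices = [205000, 295000, 410000, 490000]

b = fit_through_origin(sizes, prices)
predicted = b * 1800  # predicted price of a hypothetical 1800 sq ft house

print(f"b = {b:.1f} dollars per square foot")
print(f"predicted price for 1800 sq ft: {predicted:.0f}")
```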

Logistic regression

The values on the vertical axis will be 1s or 0s only.

Such models are used in decision-making processes.

Companies apply logistic regression algorithms to filter job candidates
during their screening process.

If the algorithm estimates that the probability that a prospective candidate will
perform well in the company is above 50%, it predicts 1, or
a successful application. Otherwise it predicts 0.
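The 50% rule above can be sketched as follows. The single input `score`, the intercept and the coefficient are hypothetical values chosen for illustration; in practice they would be fitted from past screening data.

```python
import math


def probability_of_success(score, intercept=-4.0, coefficient=0.08):
    """Logistic function: maps any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * score)))


def screen(score, threshold=0.5):
    """Return 1 (successful application) if the probability is above the threshold, else 0."""
    return 1 if probability_of_success(score) > threshold else 0


# Hypothetical candidate scores on a 0-100 scale.
for score in (30, 60, 70):
    print(score, round(probability_of_success(score), 3), screen(score))
```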

Cluster analysis

For example, suppose the graph of house price vs. house size in square feet looks like the one below.

Here the red line is the regression line. But here we can do more: cluster
analysis.

This is another technique that takes into account that certain observations
exhibit similar house sizes and prices.

Here the clusters are:

City center: high cost and small houses.

Far from the city: big houses that cost less.

Nice neighborhoods: in the city, high cost and big houses.
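The three clusters can be illustrated with a small sketch. The notes do not name a clustering algorithm, so plain k-means is used here as one common choice. The data points, their units and the starting centroids are all hypothetical.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical houses as (size in 100 sq ft, price in $100k).
houses = [
    (5, 9), (6, 10), (5.5, 9.5),      # city center: small and expensive
    (20, 3), (22, 4), (21, 3.5),      # far from the city: big and cheaper
    (20, 10), (22, 11), (21, 10.5),   # nice neighborhoods: big and expensive
]

# Starting centroids are picked by hand (one house per group) to keep the sketch deterministic.
labels, centroids = k_means(houses, [houses[0], houses[3], houses[6]])
print(labels)
```

Size and price are put on similar scales here on purpose: k-means relies on distances, so a variable with much larger numbers would otherwise dominate the grouping.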

For this example we only have the house size and house price,
but when it comes to this table:

Here is the mathematical expression for the regression model.

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$$

NOTE: x, the explanatory variable, is also known as a regressor, independent variable
or predictor variable.
For example, consider analyzing a survey that consists of 100 questions.

In this case the regression model is:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_{100} x_{100}$$
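To make the expression concrete, here is a short sketch that evaluates such a model for one respondent. The intercept, the coefficients and the answers are hypothetical; a real model would estimate them from the survey data.

```python
def predict(intercept, coefficients, answers):
    """Evaluate y = a + b1*x1 + ... + bn*xn for one respondent."""
    if len(coefficients) != len(answers):
        raise ValueError("need exactly one coefficient per explanatory variable")
    return intercept + sum(b * x for b, x in zip(coefficients, answers))


# Hypothetical model with 100 regressors, one per survey question.
coefficients = [0.01] * 100
answers = [3] * 100  # every question answered with 3 on a 1-5 scale
y = predict(2.0, coefficients, answers)
print(round(y, 2))
```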

Here factor analysis comes into play.

In the example:
Question 1: I like animals ⭕⭕⭕⭕⭕
Question 2: I care about animals ⭕⭕⭕⭕⭕
Question 3: I am against animal cruelty ⭕⭕⭕⭕⭕
Whoever marks 5 on the first question is most likely to give 5 for the other two
questions. In other words, if you strongly agree with one of these questions
you will not disagree with the other 2.
With factor analysis we can combine all three questions into one factor: general attitude
towards animals.

$$z_1 = \begin{cases} x_1 & \text{1. I like animals} \\ x_2 & \text{2. I care about animals} \\ x_3 & \text{3. I am against animal cruelty} \end{cases}$$

In this way we can reduce the number of regressors from 100 to 10, which can lead to more accurate
predictions.

$$y = n + n_1 z_1 + n_2 z_2 + n_3 z_3 + \dots + n_{10} z_{10}$$
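The idea of folding related questions into one variable can be sketched as below. Note that this is a deliberate simplification: it averages the three answers into a composite score, whereas a real factor analysis estimates the factors and their loadings from the correlations in the data. The responses are hypothetical.

```python
from statistics import mean

# Hypothetical answers (1-5 scale) of four respondents to the three animal questions.
responses = [
    {"x1": 5, "x2": 5, "x3": 4},
    {"x1": 1, "x2": 2, "x3": 1},
    {"x1": 4, "x2": 4, "x3": 5},
    {"x1": 2, "x2": 1, "x3": 2},
]


def attitude_towards_animals(answer):
    """Collapse the three related questions into one composite value z1 (their mean)."""
    return mean([answer["x1"], answer["x2"], answer["x3"]])


# One value per respondent replaces three regressors.
z1 = [attitude_towards_animals(r) for r in responses]
print(z1)
```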

Time series
Plotting values against time. Time is always on the x-axis.

Example for traditional methods

Example: User experience

Imagine you are the head of the user experience (UX) department of a website
selling goods on a global scale.
As the head of UX, your goal is to maximize user satisfaction.

Assume you have already designed and implemented a survey that measures the
attitude of your customers towards the latest global products you have
launched.

When the survey data looks like the graph on the left, we should do
cluster analysis.
Once we find out there are 4 separate groups, it makes sense to run four
separate tests.

Machine learning
Creating an algorithm, which a computer then uses to find a model that fits the
data as well as possible and makes very accurate predictions based on that.

Machine learning algorithm - A trial-and-error process. Each consecutive trial
is at least as good as the previous one.
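The trial-and-error process can be sketched with a tiny training loop. Everything specific here is an assumption chosen for illustration: the model is y = bx, the inaccuracy is measured as mean squared error, the improvement step is gradient descent, and the data points are made up.

```python
def mean_squared_error(b, xs, ys):
    """Measures how inaccurate the model y = b * x is on the data."""
    return sum((b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.01, trials=50):
    """Gradient descent on the single coefficient b, recording the error of every trial."""
    b = 0.0
    losses = [mean_squared_error(b, xs, ys)]
    for _ in range(trials):
        gradient = sum(2 * (b * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        b -= learning_rate * gradient  # step against the gradient to reduce the error
        losses.append(mean_squared_error(b, xs, ys))
    return b, losses


# Hypothetical observations that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

b, losses = train(xs, ys)
print(round(b, 3), round(losses[0], 3), round(losses[-1], 3))
```

With a small enough learning rate the recorded error never goes up from one trial to the next, which is the "at least as good as the previous one" property. A learning rate that is too large can break it.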

There are 4 ingredients:

• Data

• Model

• Objective function - to measure the inaccuracy

• Optimization algorithm - to improve the model

Types of machine learning:

• Supervised learning - uses prior results; here the data is labeled.

• Unsupervised learning - here the data is unlabeled.

• Reinforcement learning -
