Uploaded by

nitdrjothilakshmi

Basic Statistics Concepts for Data Science

1. Descriptive Statistics

Descriptive statistics summarize the basic features of a data set, which may represent either an entire population or a sample drawn from it.

The most common summaries are measures of central tendency:

• Mean: The arithmetic average, i.e. the sum of all values divided by the number of values.

• Mode: The value that appears most often in the data set.

• Median: The middle value of the ordered data set, which splits it into two halves of equal size. With an even number of values, it is the average of the two middle values.
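As a minimal sketch, the three measures above can be computed with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, used only to illustrate the three measures.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # average of the two middle values (3 and 5)
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```

For this sample the mean is 5, the median is 4 and the mode is 3. The mean and median differ because the large value 10 pulls the mean upward.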

2. Variability

Variability describes how spread out the values are. Common measures include:

• Standard Deviation: It measures the dispersion of a data set relative to its mean and is the square root of the variance.

• Variance: It measures the spread of the numbers in a data set as the average of the squared differences from the mean. A large variance indicates that the numbers are far from the average value. A small variance indicates that the numbers are close to the average value. Zero variance indicates that all values in the set are identical.

• Range: The difference between the largest and smallest values of a data set.

• Percentile: The value below which a given percentage of the observations in the data set falls.

• Quartile: One of the three values that divide the ordered data points into four equal parts.

• Interquartile Range: The difference between the third and first quartiles. It covers the middle 50% of the data set.
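The measures above can be sketched with the standard `statistics` module. The data set is hypothetical, and the population formulas (`pvariance`, `pstdev`) are used on the assumption that the values are the whole population. Quartile conventions differ between tools. The `method="inclusive"` setting here is one common choice, not the only one.

```python
import statistics

# Hypothetical data set, treated here as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance
value_range = max(data) - min(data)    # largest value minus smallest value

# Three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # spread of the middle 50% of the data

print(variance, std_dev, value_range, (q1, q2, q3), iqr)
```

With these values the variance is 4, the standard deviation is 2, the range is 7, and the interquartile range is 1.5 (quartiles 4.0, 4.5 and 5.5). For a sample rather than a population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.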

3. Correlation

Correlation is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient ranges from -1 to 1 and indicates the strength and direction of the linear relationship between the two variables.

• A correlation coefficient greater than zero indicates a positive relationship: as one variable increases, the other tends to increase.

• A correlation coefficient less than zero indicates a negative relationship: as one variable increases, the other tends to decrease.

• A correlation coefficient of zero indicates that there is no linear relationship between the two variables. A nonlinear relationship may still exist.
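The three cases can be illustrated with a small hand-written Pearson correlation function. The helper name `pearson_r` and the data are assumptions made for this sketch. The last example shows why a coefficient of zero rules out only a linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises with x: r = 1
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls as x rises: r = -1
r_curve = pearson_r(xs, [4, 1, 0, 1, 4])  # y = (x - 3)^2: r = 0

print(r_up, r_down, r_curve)
```

In the third case y is completely determined by x, yet the coefficient is 0 because the dependence is symmetric around x = 3 rather than linear.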

4. Probability Distribution

A probability distribution specifies the probabilities of all possible events. In simple terms, an event is a result of an experiment. Events are of two types: dependent and independent.

• Independent event: An event is independent when its probability is not affected by earlier events.

• Dependent event: An event is dependent when its probability depends on earlier events.

The probability that several independent events all occur is calculated by multiplying the probabilities of the individual events. For dependent events, it is calculated using conditional probability: P(A and B) = P(A) × P(B | A).
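A short worked example of both rules, using exact fractions. The coin and card scenarios are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin.
# The first toss does not change the probability of the second.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # P(A and B) = P(A) * P(B)

# Dependent events: drawing two aces from a 52-card deck without replacement.
# After one ace is removed, 3 aces remain among 51 cards.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first  # P(A and B) = P(A) * P(B | A)

print(p_two_heads, p_two_aces)
```

The results are 1/4 for two heads and 1/221 for two aces. Treating the two card draws as independent would give (4/52)² = 1/169, which overstates the probability.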

5. Regression

Regression is a method used to determine the relationship between one or more independent (predictor) variables and a dependent (response) variable. Regression is mainly of two types:

• Linear regression: It fits a model that explains the relationship between a numeric response variable and one or more predictor variables.

• Logistic regression: It fits a model that explains the relationship between a binary response variable and one or more predictor variables.
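As a rough sketch, a linear regression with a single predictor can be fitted by ordinary least squares in a few lines. The function names and the data are hypothetical. The `sigmoid` function is included only to show how logistic regression turns a linear score into a probability between 0 and 1. No logistic model is fitted here.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical data that lies exactly on the line y = 50 + 2x.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
p_at_zero = sigmoid(0)  # a score of 0 corresponds to probability 0.5

print(slope, intercept, p_at_zero)
```

The fit recovers a slope of 2 and an intercept of 50. In logistic regression the model predicts `sigmoid(intercept + slope * x)` as the probability of the positive class, and the coefficients are usually estimated by maximum likelihood rather than least squares.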

6. Normal Distribution

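The normal distribution is defined by a probability density function for a continuous random variable. It has two parameters: the mean and the standard deviation. The standard normal distribution is the special case with mean 0 and standard deviation 1. When the distribution of a random variable is unknown, the normal distribution is often used as an approximation. The central limit theorem justifies this in many cases: the sum or average of a large number of independent random variables with finite variance is approximately normally distributed.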
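The sketch below uses `statistics.NormalDist` from the Python standard library for the standard normal distribution, followed by a small simulation of the central limit theorem. The sample size of 30, the 2000 repetitions and the fixed seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

# Probability of a value within one standard deviation of the mean.
within_one_sd = standard.cdf(1) - standard.cdf(-1)

# Central limit theorem sketch: each entry is the average of 30 draws
# from a uniform distribution on [0, 1), which is clearly not normal.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    statistics.mean([rng.random() for _ in range(30)])
    for _ in range(2000)
]

mean_of_means = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
# Theory: uniform(0, 1) has standard deviation sqrt(1/12),
# so averages of 30 draws have standard deviation sqrt(1/12) / sqrt(30).
expected_spread = math.sqrt(1 / 12) / math.sqrt(30)

print(round(within_one_sd, 4), mean_of_means, spread, expected_spread)
```

About 68% of the probability lies within one standard deviation of the mean. The simulated averages should cluster near 0.5 with a spread close to the theoretical value of roughly 0.053, even though the individual draws are uniform.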

7. Bias

In statistical terms, bias means that a model or sample is not representative of the complete population, so the results deviate systematically from the truth. It needs to be minimized to get reliable results.

The three most common types of bias are:

• Selection bias: It occurs when the group of data selected for statistical analysis is not chosen at random, so the data is unrepresentative of the whole population.

• Confirmation bias: It occurs when the person performing the statistical analysis has a predefined assumption and favors results that support it.

• Time interval bias: It is caused by deliberately specifying a certain time range to favor a particular outcome.
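Selection bias can be illustrated with a small experiment. The population of the integers 1 to 100 and the rule "keep only values above 50" are hypothetical, chosen so that the effect is easy to see.

```python
import random
import statistics

# Hypothetical population: the integers 1 to 100.
population = list(range(1, 101))
population_mean = statistics.mean(population)

# Biased selection: only values above 50 are included in the analysis.
biased_sample = [value for value in population if value > 50]
biased_mean = statistics.mean(biased_sample)

# Random selection: every member has the same chance of being chosen.
rng = random.Random(42)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 20)
random_mean = statistics.mean(random_sample)

print(population_mean, biased_mean, random_mean)
```

The population mean is 50.5, while the biased selection gives 75.5 no matter how many values it contains. The random sample's mean varies from run to run but has no systematic tendency to overshoot or undershoot the population mean.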
