Statistics_FoundationalMathofAI_S24

This document provides an introduction to statistics, covering essential concepts such as descriptive statistics, measures of central tendency, measures of dispersion, data visualization, and measures of correlation. It explains how these statistical methods help in collecting, analyzing, and interpreting data effectively. Mastery of these concepts is crucial for drawing meaningful conclusions from data.

Uploaded by

Tej Grover

Introduction to Statistics

Yashil Sukurdeep
June 26, 2024

1 Introduction
Statistics is a branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data. We provide an overview of basic statistical concepts, including measures of central tendency, measures of dispersion, measures of correlation, and data visualization. These concepts are fundamental to understanding data and drawing meaningful conclusions from it.

2 Descriptive statistics
Descriptive statistics are statistical methods that summarize and organize the characteristics of a dataset. They provide simple numerical summaries of a sample, such as its typical value and how spread out it is. Descriptive statistics are useful for understanding the basic features of the data and for condensing large amounts of data into a few interpretable numbers. Common descriptive statistics include measures of central tendency and measures of dispersion.

2.1 Measures of Central Tendency


Measures of central tendency are used to describe the center (or typical value) of a dataset. Consider a sample of univariate data $x_1, x_2, \ldots, x_n \in \mathbb{R}$. The most common measures of central tendency are the mean, median, and mode:
• Mean: The mean (or average) is the sum of all data points divided by the number of data points. It is denoted by $\bar{x}$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

• Median: The median is the middle value when the data points are arranged in ascending order. If there is an even number of data points, the median is the average of the two middle values.
• Mode: The mode is the value that occurs most frequently in the dataset. A dataset may have one mode, more than one mode, or no mode at all (e.g., when every value occurs exactly once).
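The three measures can be sketched in Python directly from the definitions above. This is a minimal illustration, not a reference implementation: the function names `mean`, `median`, and `modes` and the sample data are hypothetical, and `modes` assumes the convention that a dataset in which every value occurs exactly once has no mode.

```python
from collections import Counter


def mean(xs):
    """Sum of all data points divided by the number of data points."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data, or the average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All most-frequent values; empty list if every value occurs once (assumed convention)."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


data = [2, 3, 3, 5, 7, 10]  # hypothetical sample
print(mean(data))    # 5.0  (30 / 6)
print(median(data))  # 4.0  (average of the middle values 3 and 5)
print(modes(data))   # [3]
```

Python's standard `statistics` module also provides `mean`, `median`, and `multimode`; note that `multimode` returns every value when all values are distinct, rather than an empty list.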

2.2 Measures of Dispersion
Measures of dispersion describe the spread or variability of a dataset. The most common measures are the range, quartiles, inter-quartile range, variance, and standard deviation:

• Range: The range is the difference between the highest and lowest values in a dataset:
$$\text{Range} = x_{\max} - x_{\min}, \quad \text{where } x_{\max} = \max_i x_i \text{ and } x_{\min} = \min_i x_i.$$

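• Quartiles: Quartiles divide a sorted dataset into four parts containing (roughly) equal numbers of data points. The first quartile ($Q_1$) is the median of the lower half, the second quartile ($Q_2$) is the median of the whole dataset, and the third quartile ($Q_3$) is the median of the upper half. When $n$ is odd, conventions differ on whether the middle value is included in each half, so different software may report slightly different quartiles.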
• Inter-Quartile Range (IQR): The inter-quartile range is the difference between the third quartile and the first quartile:
$$\mathrm{IQR} = Q_3 - Q_1$$

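• Variance: The variance measures the spread of the data around the mean through the squared deviations from the mean. The sample variance is denoted by $\sigma^2$:
$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Dividing by $n-1$ rather than $n$ (Bessel's correction) makes $\sigma^2$ an unbiased estimate of the population variance; dividing by $n$ instead gives exactly the average squared deviation.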

• Standard deviation: The standard deviation is the square root of the variance:
$$\sigma = \sqrt{\sigma^2}$$
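The measures of dispersion above can be computed with a short Python sketch. The helper names and sample data are hypothetical, and `quartiles` assumes one particular median-of-halves convention (the middle value is excluded from both halves when $n$ is odd); other conventions, including the default method of `statistics.quantiles`, can give slightly different quartiles. The sample variance divides by $n - 1$, matching the formula above, which is also what `statistics.variance` and `statistics.stdev` do.

```python
import math
import statistics


def median(xs):
    """Middle value of sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """(Q1, Q2, Q3) by the median-of-halves rule, excluding the middle value for odd n."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

data_range = max(data) - min(data)
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
var = sample_variance(data)
sd = math.sqrt(var)

print(data_range)  # 7
print(q1, q2, q3)  # 4.0 4.5 6.0
print(iqr)         # 2.0
print(var, sd)     # 32/7 (about 4.571) and its square root
# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(var, statistics.variance(data)))  # True
print(math.isclose(sd, statistics.stdev(data)))      # True
```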

3 Data Visualization
Data visualization is a powerful tool for understanding and communicating data.
Common types of visualizations include histograms, box plots, and scatter plots.

• Histogram: A histogram is a graphical representation of the distribution of a dataset. It divides the data into bins and shows the frequency of data points in each bin; see Figure 1.
• Box Plot: A box plot, or box-and-whisker plot, displays the distribution of a dataset through its quartiles. It shows the median, the quartiles, and potential outliers; see Figure 2.
• Scatter Plot: A scatter plot displays the relationship between two variables. Each point represents an observation in the dataset, with its position determined by the values of the two variables; see Figure 3.

Figure 1: Example of a Histogram

Figure 2: Example of a Box Plot

Figure 3: Example of a Scatter Plot
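The numbers behind a histogram and a box plot can be computed without any plotting library. The sketch below is a hypothetical illustration: `histogram_counts` uses equal-width bins spanning the data (the largest value goes into the last bin), and `box_plot_summary` assumes the widely used 1.5 × IQR rule for flagging potential outliers, which is a convention rather than part of the definitions above. A plotting library such as matplotlib would then draw the bars, box, and whiskers from values like these.

```python
def median(xs):
    """Middle value of sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def histogram_counts(xs, num_bins):
    """Return (bin_edges, counts) for equal-width bins from min(xs) to max(xs)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in xs:
        # Bin index of x; clamp so the maximum lands in the last bin.
        i = int((x - lo) / width) if width > 0 else 0
        counts[min(i, num_bins - 1)] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


def box_plot_summary(xs):
    """Quartiles, whisker ends, and outliers under the (assumed) 1.5 * IQR rule."""
    s = sorted(xs)
    n = len(s)
    # Median-of-halves quartiles, excluding the middle value for odd n.
    q1, q2, q3 = median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }


edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], num_bins=4)
print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0]
print(counts)  # [1, 2, 3, 3]

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary["outliers"])  # [30]
```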

3.1 Measures of Correlation


Correlation measures the strength and direction of the relationship between two variables. Consider two paired samples $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$, where $x_i$ and $y_i$ are measured on the same observation.

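• Covariance: The covariance $\sigma_{XY}$ between $X$ and $Y$ is a measure of how much the two variables change together. It is defined as:
$$\sigma_{XY} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, respectively. A positive covariance means the variables tend to move in the same direction; a negative covariance means they tend to move in opposite directions. Replacing $\frac{1}{n}$ with $\frac{1}{n-1}$ gives the sample covariance, which matches the normalization used for the variance in Section 2.2.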


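• Pearson’s Correlation Coefficient: This is a measure of the linear relationship between $X$ and $Y$. It is defined as:
$$\rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2 \, \sigma_Y^2}}$$
where $\sigma_{XY}$ is the covariance between $X$ and $Y$, and $\sigma_X^2$ and $\sigma_Y^2$ are the variances of $X$ and $Y$, respectively. The covariance and the variances must use the same normalization (all $\frac{1}{n}$ or all $\frac{1}{n-1}$) so that the factor cancels. With matching normalizations, $\rho_{XY}$ always lies between $-1$ and $1$: values near $\pm 1$ indicate a strong linear relationship, and values near $0$ indicate a weak one.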
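A minimal Python sketch of the covariance and correlation formulas follows; the function names and data are hypothetical. The `ddof` parameter (an assumed name, borrowed from the "delta degrees of freedom" argument of NumPy's `var` and `std`) selects the divisor $n - \mathrm{ddof}$, so `ddof=0` gives the $\frac{1}{n}$ covariance and `ddof=1` gives the $\frac{1}{n-1}$ normalization used for the sample variance. The last lines show why the normalizations must match: mixing them scales $\rho_{XY}$ by $(n-1)/n$.

```python
import math


def covariance(xs, ys, ddof=0):
    """Sum of products of deviations from the means, divided by n - ddof."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - ddof)


def pearson(xs, ys, ddof=0):
    """Covariance over the square root of the product of variances, all with the same ddof."""
    var_x = covariance(xs, xs, ddof)  # a variance is the covariance of a variable with itself
    var_y = covariance(ys, ys, ddof)
    return covariance(xs, ys, ddof) / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]  # hypothetical paired samples
y = [2, 1, 4, 3, 5]

print(covariance(x, y))                # 1.6  (8 / 5)
print(pearson(x, y))                   # 0.8
print(pearson(x, y, ddof=1))           # 0.8  (the n vs. n - 1 factor cancels)
print(pearson(x, [2 * v for v in x]))  # 1.0  (perfect positive linear relationship)

# Mixing normalizations (covariance / n, variances / (n - 1)) scales rho by (n - 1) / n.
mixed = covariance(x, y) / math.sqrt(covariance(x, x, 1) * covariance(y, y, 1))
print(mixed)  # about 0.64 = 0.8 * 4 / 5
```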

Figure 4: Examples of scatter plots with different values of the correlation coefficient $\rho_{XY}$.

4 Conclusion
Understanding these basic statistical measures and visualizations is crucial for
analyzing data effectively. Descriptive statistics, including measures of central
tendency and dispersion, provide insights into the center and spread of the
data. Measures of correlation reveal the relationships between variables, and
data visualizations make it easier to comprehend and communicate the data.
Mastery of these concepts is essential for anyone looking to work with data and
draw meaningful conclusions from it.
