Statistics_FoundationalMathofAI_S24
Yashil Sukurdeep
June 26, 2024
1 Introduction
Statistics is a branch of mathematics that deals with collecting, analyzing,
interpreting, and presenting data. We provide an overview of basic statistical
concepts, including measures of central tendency, measures of dispersion,
measures of correlation, and data visualization. These concepts are fundamental
in understanding data and drawing meaningful conclusions from it.
2 Descriptive statistics
Descriptive statistics are statistical methods that summarize and organize the
characteristics of a dataset. They provide simple summaries of the sample and
of the quantities measured in it. Descriptive statistics are useful for
understanding the basic features of the data and for condensing large amounts
of data in a sensible way. Common descriptive statistics include measures of
central tendency and measures of dispersion.
• Median: The median is the middle value when the data points are
arranged in ascending order. If there is an even number of data points, the
median is the average of the two middle values.
• Mode: The mode is the value that occurs most frequently in the data
set. A dataset may have one mode, more than one mode, or no mode at
all.
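The median and mode definitions above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the sample data and the helper name `find_modes` are hypothetical, and the sketch assumes the convention that a dataset has no mode when no value repeats.

```python
import statistics
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] if no value repeats (assumed convention)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: treat the dataset as having no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical sample with an even number of data points.
data = [2, 3, 3, 5, 7, 8]

# Sorted data is [2, 3, 3, 5, 7, 8]; the two middle values are 3 and 5.
median_value = statistics.median(data)

print(median_value)            # 4.0
print(find_modes(data))        # [3]
print(find_modes([1, 1, 2, 2]))  # [1, 2]  (more than one mode)
print(find_modes([1, 2, 3]))     # []      (no mode)
```

Note that the standard library's `statistics.multimode` follows a different convention: when no value repeats, it returns every value rather than an empty list.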
2.2 Measures of Dispersion
Measures of dispersion describe the spread or variability of a dataset. The most
common measures are the range, quartiles, variance and standard deviation:
• Range: The range is the difference between the highest and lowest values
in a dataset:
\[ \text{Range} = x_{\max} - x_{\min}, \]
where $x_{\max} = \max_i x_i$ and $x_{\min} = \min_i x_i$.
• Quartiles: Quartiles divide a data set into four equal parts. The first
quartile (Q1) is the median of the lower half, the second quartile (Q2) is
the median of the data set, and the third quartile (Q3) is the median of
the upper half.
• Inter-Quartile Range (IQR): The inter-quartile range is the difference
between the third quartile and the first quartile:
\[ \text{IQR} = Q_3 - Q_1. \]
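• Variance: The variance measures the average squared deviation from the
mean. With the $n - 1$ divisor (the sample variance), it is defined as:
\[ \sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \]
where $\bar{x}$ is the mean of the data points $x_1, \dots, x_n$.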
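The measures of dispersion listed above can be computed with a short Python sketch. The sample data and the helper names `quartiles` and `sample_variance` are hypothetical. The quartile helper assumes one particular reading of "median of the lower/upper half": when the number of data points is odd, the middle value is excluded from both halves. Other conventions exist, so library routines based on interpolation may give slightly different quartiles.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (needs >= 2 points)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


def sample_variance(values):
    """Average squared deviation from the mean, with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical sample.
data = [4, 8, 15, 16, 23, 42]

data_range = max(data) - min(data)   # 42 - 4
q1, q2, q3 = quartiles(data)         # halves: [4, 8, 15] and [16, 23, 42]
iqr = q3 - q1
variance = sample_variance(data)     # mean is 18, squared deviations sum to 910

print(data_range)   # 38
print(q1, q2, q3)   # 8 15.5 23
print(iqr)          # 15
print(variance)     # 182.0
```

The hand-written `sample_variance` mirrors the formula term by term; the standard library's `statistics.variance` uses the same $n - 1$ divisor and can serve as a cross-check.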
3 Data Visualization
Data visualization is a powerful tool for understanding and communicating data.
Common types of visualizations include histograms, box plots, and scatter plots.
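To make the idea behind a histogram concrete, the sketch below groups values into equal-width bins and counts how many fall into each, then prints a rough text rendering. It uses only the standard library rather than a plotting package; the function name `histogram_counts`, the sample data, and the choice of four bins are all hypothetical.

```python
def histogram_counts(values, num_bins):
    """Split [min, max] into equal-width bins and count the values in each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # All values are identical: place them in the first bin.
        counts[0] = len(values)
    else:
        for x in values:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            index = min(int((x - lo) / width), num_bins - 1)
            counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical sample.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, num_bins=4)

# One row per bin: the bin's interval followed by one '#' per data point.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * count}")
```

For this sample the bin counts are `[3, 5, 1, 1]`: most values cluster in the first two bins, and the isolated value 9 sits alone in the last one.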
Figure 1: Example of a Histogram
Figure 3: Example of a Scatter Plot
Figure 4: Examples of scatter plots with different values of the correlation
coefficient ρXY .
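As a rough illustration of the quantity shown in Figure 4, the sketch below computes a correlation coefficient for paired data. It assumes that ρXY refers to the Pearson correlation coefficient (covariance divided by the product of the standard deviations); the function name `pearson_correlation` and the sample data are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of paired samples (assumed meaning of rho_XY)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the common 1/(n - 1) factors cancel
    # in the ratio, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired samples.
xs = [1, 2, 3, 4, 5]
increasing = [2, 4, 6, 8, 10]   # exact increasing linear relationship
decreasing = [10, 8, 6, 4, 2]   # exact decreasing linear relationship

print(pearson_correlation(xs, increasing))   # 1.0
print(pearson_correlation(xs, decreasing))   # -1.0
```

Values near 1 or -1 correspond to scatter plots whose points lie close to a straight line, while values near 0 correspond to plots with no clear linear trend.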
4 Conclusion
Understanding these basic statistical measures and visualizations is crucial for
analyzing data effectively. Descriptive statistics, including measures of central
tendency and dispersion, provide insights into the center and spread of the
data. Measures of correlation reveal the relationships between variables, and
data visualizations make it easier to comprehend and communicate the data.
Mastery of these concepts is essential for anyone looking to work with data and
draw meaningful conclusions from it.