Unit 2
In addition to central tendency and dispersion, we can further explore characteristics of a single
data set using:
● Moments: These are measures that describe the distribution of data around the mean.
Common moments include:
○ Skewness: Indicates the asymmetry of the data distribution. A positive skew means the
distribution has a tail extending to the right, while a negative skew indicates a tail to the
left.
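○ Kurtosis: Measures the "tailedness" of the distribution compared to a normal distribution.
It is usually reported as excess kurtosis (kurtosis minus 3, so a normal distribution scores 0).
A positive excess kurtosis (leptokurtic) indicates heavier tails, while a negative excess
kurtosis (platykurtic) shows lighter tails.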
● Boxplots: A visual representation of the distribution of data, displaying the median, quartiles
(Q1 and Q3), and outliers.
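The measures above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the textbook's method: the function names and the sample data are made up for the example, the moments use the population (divide-by-n) formulas, and outliers are flagged with the common 1.5 × IQR convention, which these notes do not specify.

```python
import statistics


def skewness(data):
    """Population skewness: third central moment / (variance ** 1.5)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    """Population excess kurtosis: fourth central moment / variance**2, minus 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0


def boxplot_summary(data):
    """Median, quartiles, and outliers (assumed 1.5 * IQR rule)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical sample with one large value, giving a long right tail.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]

print("skewness:", skewness(sample))
print("excess kurtosis:", excess_kurtosis(sample))
print("boxplot summary:", boxplot_summary(sample))
```

For this sample the skewness comes out positive, matching the right tail created by the value 20, and the same value is the only point reported as an outlier.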
Linear regression is a statistical method used to model the relationship between a continuous
dependent variable (what you want to predict) and one or more independent variables
(predictors). It fits the line that best matches the data points, typically by minimizing the sum
of squared differences between observed and predicted values, allowing you to make predictions
for new data points based on the established relationship.
● Slope: Represents the change in the dependent variable for a unit change in the
independent variable (holding any other independent variables constant).
● Intercept: The y-intercept of the regression line, representing the predicted value of the
dependent variable when all independent variables are zero (if applicable).
● R-squared: A statistical measure that indicates the proportion of variance in the dependent
variable explained by the independent variable(s). Values closer to 1 indicate a better fit.
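As a rough sketch of how slope, intercept, and R-squared fit together, the following Python computes them for a single predictor using the ordinary least-squares formulas. The function names and the hours/scores data are hypothetical and chosen only for illustration. With several predictors you would normally use a statistics library instead.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Proportion of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: hours studied (independent) vs. exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)

# Prediction for a new data point using the fitted relationship.
predicted = intercept + slope * 6

print("slope:", slope)
print("intercept:", intercept)
print("R-squared:", r2)
print("predicted score for 6 hours:", predicted)
```

Here the slope is read as the expected change in score per additional hour, the intercept as the predicted score at zero hours, and an R-squared close to 1 as a sign that the line explains most of the variation in the scores.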
Remember:
● Refer to your textbook (TB2) for detailed explanations, formulas, and examples of these
statistical concepts.
● Consider practicing with real-world datasets to solidify your understanding. Online resources
and tutorials can be helpful for this.
This is a more comprehensive overview of Unit 1 and Unit 2 in Data Science. By understanding
these concepts, you'll have a stronger foundation for further exploration in data analysis,
modeling, and visualization techniques.