Assignment 02
Roll No.: 6107
Subject: 510302 - BDS
ASSIGNMENT: 02
Aim: Take a sample dataset (The lab teacher may provide it). Plot the data using appropriate graphs (e.g.
scatter diagram). Perform normality and symmetry tests on it using at least one graph method and at least
one statistical test. Analyse the results. Then evaluate Spearman’s Rank Correlation for this data.
Requirements:
• Software: PyCharm Professional
• Libraries: Pandas, Scikit-Learn, Seaborn, Matplotlib, and NumPy
• Dataset: StudentsPerformance.csv from Kaggle
Theory: This program analyzes the relationship between two variables (math score and reading score) by
plotting a scatter plot, checking normality using z-scores, and assessing symmetry with skewness (a value
near 0 indicates a symmetric distribution; positive and negative values indicate right and left skew).
Histograms are used to visually examine the data distribution, while Spearman’s Rank Correlation, which
ranges from -1 to +1, measures the strength and direction of the monotonic relationship between the variables.
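The quantities named in the theory can be written out as below. These are standard textbook definitions, not formulas taken from the assignment itself. In particular, Pearson's second coefficient is only assumed as the skewness measure, and the short form of Spearman's formula is exact only when there are no tied ranks (with ties, the coefficient is the Pearson correlation of the average ranks).

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\mathrm{Sk} = \frac{3\,(\bar{x} - \tilde{x})}{s}, \qquad
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```

Here \bar{x} is the mean, \tilde{x} the median, s the standard deviation, d_i the difference between the two ranks of observation i, and n the number of observations.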
Code:
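import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and preview the first five rows
data = pd.read_csv('StudentsPerformance.csv')
print(data.head())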
plt.figure(figsize=(8, 6))
sns.scatterplot(x='math score', y='reading score', data=data)
plt.title('Scatter Plot of Math Score vs Reading Score')
plt.xlabel('Math Score')
plt.ylabel('Reading Score')
plt.grid(True)
plt.show()
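The aim calls for Spearman's Rank Correlation, but the listing does not show that step. The following is a minimal sketch of how it could be computed, assuming tied scores receive average ranks. It uses only the standard library so that it runs without the CSV file. The helper names (`average_ranks`, `pearson`, `spearman_rho`) and the demo values are hypothetical and are not taken from the dataset.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values that starts at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical demo values standing in for the two score columns
math_demo = [72, 69, 90, 47, 76, 71, 88, 40]
reading_demo = [72, 90, 95, 57, 78, 83, 95, 43]
rho = spearman_rho(math_demo, reading_demo)
print(f"Spearman's rho = {rho:.3f}")
```

On the real data, the same coefficient should be obtainable with `data['math score'].corr(data['reading score'], method='spearman')` in pandas, or with `scipy.stats.spearmanr`, which also reports a p-value.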
Name: Dhruv Jayant Tillu Roll No.: 6107
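import numpy as np

# Extract the three score columns as NumPy arrays
math_scores = data['math score'].values
reading_scores = data['reading score'].values
writing_scores = data['writing score'].values

def calculate_skewness(values):
    mean = np.mean(values)
    median = np.median(values)
    std_dev = np.std(values)
    # Assumption: the return line is missing in the source. Pearson's second
    # skewness coefficient is used here because it needs exactly the mean,
    # median and standard deviation computed above.
    return 3 * (mean - median) / std_dev

skew_math = calculate_skewness(math_scores)
skew_reading = calculate_skewness(reading_scores)
skew_writing = calculate_skewness(writing_scores)
print(f'Skewness: math={skew_math:.3f}, reading={skew_reading:.3f}, writing={skew_writing:.3f}')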
plt.figure(figsize=(12, 6))
plt.subplot(1, 3, 1)
sns.histplot(math_scores, kde=True, bins=20)
plt.title('Histogram of Math Score')
plt.subplot(1, 3, 2)
sns.histplot(reading_scores, kde=True, bins=20)
plt.title('Histogram of Reading Score')
plt.subplot(1, 3, 3)
sns.histplot(writing_scores, kde=True, bins=20)
plt.title('Histogram of Writing Score')
plt.tight_layout()  # keep the three subplot titles and axes from overlapping
plt.show()
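The histograms cover the graphical check, but the listing does not show a statistical normality test. The sketch below is one possible way to add it, not the method used in the original program. It combines a z-score check against the 68-95-99.7 rule with the Jarque-Bera test, chosen here only because it is built from skewness and kurtosis and needs nothing beyond the standard library. The function names and the simulated `demo_scores` are hypothetical stand-ins for a real score column.

```python
import math
import random

def z_scores(values):
    """Standardize values using the population standard deviation (as np.std does)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def empirical_rule(values):
    """Fraction of observations with |z| <= 1, 2 and 3.

    For normal data these are roughly 0.683, 0.954 and 0.997.
    """
    z = z_scores(values)
    return {k: sum(abs(v) <= k for v in z) / len(z) for k in (1, 2, 3)}

def jarque_bera(values):
    """Return (JB statistic, p-value, skewness, kurtosis)."""
    n = len(values)
    z = z_scores(values)
    skew = sum(v ** 3 for v in z) / n  # third standardized moment, 0 if symmetric
    kurt = sum(v ** 4 for v in z) / n  # fourth standardized moment, 3 for a normal
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value, skew, kurt

# Simulated scores (assumption: stand-in data, not the Kaggle file)
random.seed(0)
demo_scores = [random.gauss(66, 15) for _ in range(1000)]

print(empirical_rule(demo_scores))
jb, p, skew, kurt = jarque_bera(demo_scores)
print(f"JB = {jb:.3f}, p = {p:.3f}, skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

A small p-value (for example below 0.05) would count as evidence against normality. With SciPy installed, `scipy.stats.jarque_bera` or `scipy.stats.shapiro` could be applied to `math_scores`, `reading_scores` and `writing_scores` instead.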
Conclusion: The scatter plot gives a visual view of the relationship between the two variables, while the
normality test indicates whether the data follow a normal distribution. The skewness measure quantifies any
asymmetry in the data, and the histograms show the shape of each distribution. Finally, Spearman’s Rank
Correlation gives the strength and direction of the monotonic relationship between the variables. Together,
these steps provide a fuller understanding of their interdependence.