Assignment 02

Uploaded by

DHRUV TILLU

Name: Dhruv Jayant Tillu Roll No.: 6107
Subject: 510302 - BDS

ASSIGNMENT: 02
Aim: Take a sample dataset (The lab teacher may provide it). Plot the data using appropriate graphs (e.g.
scatter diagram). Perform normality and symmetry tests on it using at least one graph method and at least
one statistical test. Analyse the results. Then evaluate Spearman’s Rank Correlation for this data.

Requirements:
• Software: PyCharm Professional
• Libraries: Pandas, NumPy, SciPy, Seaborn, and Matplotlib
• Dataset: StudentsPerformance.csv from Kaggle

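Theory: This program analyzes the relationship between pairs of exam scores. A scatter plot shows how two
scores vary together. Normality is checked with z-scores: the share of values within one and two standard
deviations of the mean is compared with the roughly 68% and 95% expected for a normal distribution.
Symmetry is assessed with Pearson's second coefficient of skewness, and histograms give a visual check of
each distribution's shape. Spearman’s Rank Correlation measures the strength and direction of the monotonic
relationship between two variables.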

Code:
import pandas as pd

data = pd.read_csv('StudentsPerformance.csv')
data.head()

   gender race/ethnicity parental level of education         lunch  \
0  female        group B           bachelor's degree      standard
1  female        group C                some college      standard
2  female        group B             master's degree      standard
3    male        group A          associate's degree  free/reduced
4    male        group C                some college      standard

  test preparation course  math score  reading score  writing score
0                    none          72             72             74
1               completed          69             90             88
2                    none          90             95             93
3                    none          47             57             44
4                    none          76             78             75

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.scatterplot(x='math score', y='reading score', data=data)
plt.title('Scatter Plot of Math Score vs Reading Score')
plt.xlabel('Math Score')
plt.ylabel('Reading Score')
plt.grid(True)
plt.show()

# Pearson correlation matrix of the three score columns
corr = data[['math score', 'reading score', 'writing score']].corr()

sns.heatmap(corr, annot=True, cmap='seismic')
plt.title('Correlation Heatmap')
plt.show()

import numpy as np
math_scores = data['math score'].values
reading_scores = data['reading score'].values
writing_scores = data['writing score'].values

# Standardize each score: subtract the mean and divide by the standard deviation
z_scores_math = (math_scores - np.mean(math_scores)) / np.std(math_scores)
z_scores_reading = (reading_scores - np.mean(reading_scores)) / np.std(reading_scores)
z_scores_writing = (writing_scores - np.mean(writing_scores)) / np.std(writing_scores)

# Fraction of values whose z-score lies within 1 and 2 standard deviations
within_one_std_math = np.mean(np.abs(z_scores_math) <= 1)
within_two_std_math = np.mean(np.abs(z_scores_math) <= 2)

within_one_std_reading = np.mean(np.abs(z_scores_reading) <= 1)
within_two_std_reading = np.mean(np.abs(z_scores_reading) <= 2)

within_one_std_writing = np.mean(np.abs(z_scores_writing) <= 1)
within_two_std_writing = np.mean(np.abs(z_scores_writing) <= 2)

# Round to one decimal place so floating-point noise does not appear in the output
print(f'Percentage of Math Scores within 1 std: {within_one_std_math * 100:.1f}%')
print(f'Percentage of Math Scores within 2 std: {within_two_std_math * 100:.1f}%')
print(f'Percentage of Reading Scores within 1 std: {within_one_std_reading * 100:.1f}%')
print(f'Percentage of Reading Scores within 2 std: {within_two_std_reading * 100:.1f}%')
print(f'Percentage of Writing Scores within 1 std: {within_one_std_writing * 100:.1f}%')
print(f'Percentage of Writing Scores within 2 std: {within_two_std_writing * 100:.1f}%')

Percentage of Math Scores within 1 std: 69.6%
Percentage of Math Scores within 2 std: 95.4%
Percentage of Reading Scores within 1 std: 66.4%
Percentage of Reading Scores within 2 std: 95.4%
Percentage of Writing Scores within 1 std: 68.8%
Percentage of Writing Scores within 2 std: 95.8%
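
For reference, the shares that an exactly normal distribution places within one and two standard deviations can be computed from the error function. The sketch below uses only the standard library; the function name `normal_share_within` is illustrative and is not part of the original program.

```python
import math

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f'Expected share within {k} std: {normal_share_within(k) * 100:.1f}%')
```

These reference values are about 68.3% and 95.4%, so the observed shares for all three scores sit close to what a normal distribution would give.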

def calculate_skewness(data):
    mean = np.mean(data)
    median = np.median(data)
    std_dev = np.std(data)

    # Pearson's second coefficient of skewness
    skewness = 3 * (mean - median) / std_dev
    return skewness

skew_math = calculate_skewness(math_scores)
skew_reading = calculate_skewness(reading_scores)
skew_writing = calculate_skewness(writing_scores)

print(f'Skewness for Math Score: {skew_math}')
print(f'Skewness for Reading Score: {skew_reading}')
print(f'Skewness for Writing Score: {skew_writing}')

Skewness for Math Score: 0.01761737051555966
Skewness for Reading Score: -0.1708366195714668
Skewness for Writing Score: -0.18685734108808663
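
Pearson's second coefficient is a quick approximation based on the mean and median. Another common measure is the moment-based skewness, which uses the third central moment. As a hedged comparison, the sketch below computes both on a small made-up sample (not the student data); the helper names are illustrative.

```python
import statistics

def pearson_second_skewness(values):
    """3 * (mean - median) / population standard deviation."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    std_dev = statistics.pstdev(values)
    return 3 * (mean - median) / std_dev

def moment_skewness(values):
    """Third central moment divided by the second central moment to the power 1.5."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

sample = [1, 2, 3, 4, 10]  # toy data with a long right tail
print(f"Pearson's second coefficient: {pearson_second_skewness(sample):.3f}")
print(f'Moment-based skewness: {moment_skewness(sample):.3f}')
```

Both measures are positive for a right-tailed sample and zero for a perfectly symmetric one, but their magnitudes differ, so values from the two formulas should not be compared directly.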

plt.figure(figsize=(12, 6))

plt.subplot(1, 3, 1)
sns.histplot(math_scores, kde=True, bins=20)
plt.title('Histogram of Math Score')

plt.subplot(1, 3, 2)
sns.histplot(reading_scores, kde=True, bins=20)
plt.title('Histogram of Reading Score')

plt.subplot(1, 3, 3)
sns.histplot(writing_scores, kde=True, bins=20)
plt.title('Histogram of Writing Score')

plt.show()
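
The aim also asks for at least one statistical normality test. One option is the Shapiro-Wilk test from `scipy.stats`. The sketch below runs it on synthetic scores generated with NumPy, because the CSV is not bundled here; the seed, mean, spread, and significance level are arbitrary assumptions. With the real data, `math_scores`, `reading_scores`, or `writing_scores` would be passed instead.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-in for a score column (assumed values, not the real dataset)
rng = np.random.default_rng(0)
synthetic_scores = rng.normal(loc=66, scale=15, size=200)

# Null hypothesis: the sample was drawn from a normal distribution
statistic, p_value = shapiro(synthetic_scores)
print(f'Shapiro-Wilk statistic: {statistic:.4f}')
print(f'P-value: {p_value:.4f}')

alpha = 0.05  # conventional significance level, chosen here as an assumption
if p_value < alpha:
    print('Reject normality at the 5% level')
else:
    print('No evidence against normality at the 5% level')
```

With large samples the test tends to flag even small departures from normality, so its result is best read alongside the histograms and the z-score shares.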

from scipy.stats import spearmanr

# Math score vs reading score
spearman_corr, p_value = spearmanr(math_scores, reading_scores)

print(f"Spearman's Rank Correlation: {spearman_corr}")
print(f"P-value: {p_value}")

Spearman's Rank Correlation: 0.8040638885551747
P-value: 1.3538514946746025e-227

# Reading score vs writing score
spearman_corr, p_value = spearmanr(reading_scores, writing_scores)

print(f"Spearman's Rank Correlation: {spearman_corr}")
print(f"P-value: {p_value}")

Spearman's Rank Correlation: 0.9489525187100921
P-value: 0.0
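
Spearman's coefficient is the Pearson correlation computed on the ranks of the values rather than on the values themselves. The sketch below illustrates this on toy data using only the standard library; the helper names `average_ranks`, `pearson`, and `spearman` are illustrative and are not how `scipy.stats.spearmanr` is necessarily implemented. A reported p-value of exactly 0.0 most likely means the true value is too small to represent as a floating-point number, not that it is literally zero.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]  # increases with x, but not linearly
print(f'Pearson on raw values: {pearson(x, y):.3f}')
print(f'Spearman on ranks: {spearman(x, y):.3f}')
```

Because `y` always increases with `x`, the ranks match exactly and the rank correlation is 1, while the Pearson correlation on the raw values is lower. This is why Spearman's measure suits monotonic relationships that need not be linear.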

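Conclusion: The scatter plot visualizes the relationship between math and reading scores. The z-score check
places 66.4% to 69.6% of each score within one standard deviation and 95.4% to 95.8% within two, close to
the roughly 68% and 95% expected for a normal distribution, so all three scores are approximately normal.
Pearson's second coefficient of skewness is near zero for math (0.018) and slightly negative for reading
(-0.171) and writing (-0.187), indicating nearly symmetric distributions with a mild left skew in reading and
writing; the histograms give a visual check of the same shapes. Spearman’s Rank Correlation is 0.804 for
math versus reading and 0.949 for reading versus writing, with p-values that are effectively zero, indicating
strong, statistically significant positive monotonic relationships between the scores.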
