DATA ANALYSIS

The document contains a Python script that loads a CSV file into a pandas DataFrame and performs statistical analysis on average test scores, student enrollment, and books. It calculates mean, median, and standard deviation for these metrics, identifies schools with the highest and lowest test scores, and analyzes student enrollment. Additionally, it computes books per student and categorizes schools based on their performance relative to the average test score.

Uploaded by

Khushi Seth

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Load the CSV file into a pandas DataFrame
df = pd.read_csv(r"C:\Users\User\OneDrive\Desktop\KHUSHI\AI PROJECT\DataFrame.csv")

# Step 2: Display the first few rows to ensure data is loaded correctly
print(df.head())

# Step 3: Perform basic statistics (Mean, Median, Standard Deviation)
# Note: Series.std() returns the sample standard deviation (ddof=1) by default.
mean_test_scores = df['average test scores'].mean()
median_test_scores = df['average test scores'].median()
std_dev_test_scores = df['average test scores'].std()

mean_enrollment = df['student enrollment'].mean()
median_enrollment = df['student enrollment'].median()
std_dev_enrollment = df['student enrollment'].std()

mean_books = df['books'].mean()
median_books = df['books'].median()
std_dev_books = df['books'].std()

print(f"Mean of Average Test Scores: {mean_test_scores}")
print(f"Median of Average Test Scores: {median_test_scores}")
print(f"Standard Deviation of Average Test Scores: {std_dev_test_scores}")

print(f"Mean of Student Enrollment: {mean_enrollment}")
print(f"Median of Student Enrollment: {median_enrollment}")
print(f"Standard Deviation of Student Enrollment: {std_dev_enrollment}")

print(f"Mean of Books: {mean_books}")
print(f"Median of Books: {median_books}")
print(f"Standard Deviation of Books: {std_dev_books}")

# Step 4: Perform other analysis

# Highest and lowest test scores
# The boolean mask keeps every row tied for the extreme value, not just the first.
highest_score = df['average test scores'].max()
highest_score_school = df[df['average test scores'] == highest_score]
print(f"\nHighest Test Score: {highest_score}")
print(f"School with highest test score:\n{highest_score_school}")

lowest_score = df['average test scores'].min()
lowest_score_school = df[df['average test scores'] == lowest_score]
print(f"\nLowest Test Score: {lowest_score}")
print(f"School with lowest test score:\n{lowest_score_school}")

# School with most and least students
school_with_most_students = df[df['student enrollment'] == df['student enrollment'].max()]
school_with_least_students = df[df['student enrollment'] == df['student enrollment'].min()]
print(f"\nSchool with most students:\n{school_with_most_students}")
print(f"School with least students:\n{school_with_least_students}")

# Books per student and most/least books per student
df['books per student'] = df['books'] / df['student enrollment']
most_books_per_student = df[df['books per student'] == df['books per student'].max()]
least_books_per_student = df[df['books per student'] == df['books per student'].min()]

print(f"\nSchool with most books per student:\n{most_books_per_student}")
print(f"School with least books per student:\n{least_books_per_student}")

# Schools above and below average test score
# Schools scoring exactly the average appear in neither group.
overall_average_score = df['average test scores'].mean()
above_average_schools = df[df['average test scores'] > overall_average_score]
below_average_schools = df[df['average test scores'] < overall_average_score]

print(f"\nSchools performing above average:\n{above_average_schools}")
print(f"Schools performing below average:\n{below_average_schools}")
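
The script above splits schools into two separate DataFrames and divides books by enrollment directly. As a possible variation, the sketch below records the category in a single column, so schools exactly at the average are labeled too, and treats zero enrollment as missing before dividing. The function name add_performance_columns, the 'performance' column, the 'school' column, and the sample data are all assumptions made for illustration; they do not come from the original script or its CSV file.

```python
import numpy as np
import pandas as pd


def add_performance_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'books per student' and 'performance' columns.

    Hypothetical helper: assumes the same column names as the script above.
    """
    out = df.copy()

    # Treat zero enrollment as missing so the division yields NaN, not inf.
    enrollment = out["student enrollment"].where(out["student enrollment"] != 0)
    out["books per student"] = out["books"] / enrollment

    # Label every school relative to the mean score, including exact ties.
    diff = out["average test scores"] - out["average test scores"].mean()
    out["performance"] = np.select(
        [diff > 0, diff < 0],
        ["above average", "below average"],
        default="at average",
    )
    return out


# Hypothetical sample data, invented for illustration only.
sample = pd.DataFrame(
    {
        "school": ["A", "B", "C", "D"],
        "average test scores": [60.0, 70.0, 80.0, 70.0],
        "student enrollment": [100, 0, 200, 50],
        "books": [500, 300, 1000, 100],
    }
)

result = add_performance_columns(sample)
print(result[["school", "books per student", "performance"]])
```

Because the helper works on a copy, the input DataFrame is left unchanged, and the labeled column can be passed to groupby or value_counts if per-category summaries are needed.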
