The document contains a Python script that loads a CSV file into a pandas DataFrame and performs statistical analysis on average test scores, student enrollment, and books. It calculates mean, median, and standard deviation for these metrics, identifies schools with the highest and lowest test scores, and analyzes student enrollment. Additionally, it computes books per student and categorizes schools based on their performance relative to the average test score.
DATA ANALYSIS
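The listing starts at Step 2, so the loading step mentioned in the summary is not shown. A minimal sketch of what it might look like follows, assuming pandas is installed. The source does not give the CSV file name, so an in-memory buffer stands in for the file. The sample rows and the `school` column are made up for illustration; only the column names `average test scores`, `student enrollment`, and `books` come from the script.

```python
import io
import pandas as pd

# Hypothetical sample data; only the last three column names come from the script.
SAMPLE_CSV = """school,average test scores,student enrollment,books
Northside,78.5,420,5100
Lakeview,85.0,610,4800
Hillcrest,69.5,300,4500
Riverside,85.0,515,7200
"""

# Step 1: Load the CSV into a DataFrame.
# io.StringIO stands in for a file path here; pd.read_csv accepts either.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Quick sanity checks before running any statistics
print(df.shape)
print(df.dtypes)
```

With a real file, the buffer would be replaced by the path to the CSV.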
# Step 2: Display the first few rows to ensure data is loaded correctly
print(df.head())
# Step 3: Perform basic statistics (Mean, Median, Standard Deviation)
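mean_test_scores = df['average test scores'].mean()
median_test_scores = df['average test scores'].median()
std_dev_test_scores = df['average test scores'].std()

# Same statistics for enrollment and books (definitions inferred from the print statements below)
mean_enrollment = df['student enrollment'].mean()
median_enrollment = df['student enrollment'].median()
std_dev_enrollment = df['student enrollment'].std()

mean_books = df['books'].mean()
median_books = df['books'].median()
std_dev_books = df['books'].std()

print(f"Mean of Average Test Scores: {mean_test_scores}")
print(f"Median of Average Test Scores: {median_test_scores}")
print(f"Standard Deviation of Average Test Scores: {std_dev_test_scores}")

print(f"Mean of Student Enrollment: {mean_enrollment}")
print(f"Median of Student Enrollment: {median_enrollment}")
print(f"Standard Deviation of Student Enrollment: {std_dev_enrollment}")

print(f"Mean of Books: {mean_books}")
print(f"Median of Books: {median_books}")
print(f"Standard Deviation of Books: {std_dev_books}")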
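The nine statistics in Step 3 can also be computed in one call. The sketch below is an alternative to the script's approach, not a replacement for it, and uses `DataFrame.agg` with made-up sample values. Note that pandas' `std` is the sample standard deviation (`ddof=1`), both here and in the script.

```python
import pandas as pd

# Hypothetical sample data using the script's column names
df = pd.DataFrame({
    'average test scores': [78.5, 85.0, 69.5, 85.0],
    'student enrollment': [420, 610, 300, 515],
    'books': [5100, 4800, 4500, 7200],
})

columns = ['average test scores', 'student enrollment', 'books']

# One row per statistic, one column per metric
summary = df[columns].agg(['mean', 'median', 'std'])
print(summary)

# Individual values can be read back by (statistic, column)
mean_test_scores = summary.loc['mean', 'average test scores']
```

The table form makes it easier to compare metrics side by side, while the script's separate variables are convenient when each value needs its own message.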
# Step 4: Perform other analysis
# Highest and lowest test scores
highest_score = df['average test scores'].max()
highest_score_school = df[df['average test scores'] == highest_score]
print(f"\nHighest Test Score: {highest_score}")
print(f"School with highest test score:\n{highest_score_school}")

lowest_score = df['average test scores'].min()
lowest_score_school = df[df['average test scores'] == lowest_score]
print(f"\nLowest Test Score: {lowest_score}")
print(f"School with lowest test score:\n{lowest_score_school}")
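The boolean-mask lookup used for the highest and lowest scores returns every row that matches, so tied schools all appear. The sketch below contrasts this with `idxmax`, which returns only the first matching row label. The `school` column and the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'school': ['Northside', 'Lakeview', 'Hillcrest', 'Riverside'],  # hypothetical column
    'average test scores': [78.5, 85.0, 69.5, 85.0],
})

highest_score = df['average test scores'].max()

# Boolean mask, as in the script: keeps every school tied for the top score
tied_for_highest = df[df['average test scores'] == highest_score]

# idxmax returns only the label of the first row holding the maximum
first_highest = df.loc[df['average test scores'].idxmax()]

print(len(tied_for_highest))
print(first_highest['school'])
```

The mask is the safer choice when ties matter; `idxmax` is enough when any single top school will do.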
# School with most and least students
school_with_most_students = df[df['student enrollment'] == df['student enrollment'].max()]
school_with_least_students = df[df['student enrollment'] == df['student enrollment'].min()]
print(f"\nSchool with most students:\n{school_with_most_students}")
print(f"School with least students:\n{school_with_least_students}")
# Books per student and most/least books per student
df['books per student'] = df['books'] / df['student enrollment']
most_books_per_student = df[df['books per student'] == df['books per student'].max()]
least_books_per_student = df[df['books per student'] == df['books per student'].min()]
print(f"\nSchool with most books per student:\n{most_books_per_student}")
print(f"School with least books per student:\n{least_books_per_student}")
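One edge case in the books-per-student ratio: a school with zero enrollment produces `inf` (or `NaN` when its book count is also zero), and an `inf` value would then win the "most books per student" lookup. The source does not say whether such rows occur in the data, so the following is only a sketch of one way to guard against it, with a hypothetical `school` column and made-up values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'school': ['Northside', 'Lakeview', 'Closed Annex'],  # hypothetical
    'books': [5100, 4800, 250],
    'student enrollment': [420, 600, 0],
})

# Treat zero enrollment as missing so the ratio becomes NaN instead of inf
enrollment = df['student enrollment'].replace(0, np.nan)
df['books per student'] = df['books'] / enrollment

# max() and min() skip NaN by default, so the school with no students is ignored
most = df[df['books per student'] == df['books per student'].max()]
least = df[df['books per student'] == df['books per student'].min()]

print(most['school'].tolist(), least['school'].tolist())
```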
# Schools above and below average test score
overall_average_score = df['average test scores'].mean()
above_average_schools = df[df['average test scores'] > overall_average_score]
below_average_schools = df[df['average test scores'] < overall_average_score]
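Because the two filters use strict comparisons, a school whose score equals the average lands in neither group. One way to make the categorization explicit is a label column, sketched below with `numpy.select`. The `performance` column name, its labels, and the sample data are assumptions for illustration, not part of the script.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'school': ['A', 'B', 'C', 'D'],  # hypothetical
    'average test scores': [70.0, 80.0, 90.0, 80.0],
})

overall_average_score = df['average test scores'].mean()

# np.select takes the first matching condition per row;
# rows matching neither condition receive the default label
conditions = [
    df['average test scores'] > overall_average_score,
    df['average test scores'] < overall_average_score,
]
labels = ['above average', 'below average']
df['performance'] = np.select(conditions, labels, default='at average')

print(df['performance'].value_counts())
```

A single labeled column also makes it straightforward to group or count schools by category afterwards.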