Python Codes Test 1
Load the data with the code below. In Jupyter, upload the file in CSV format and set file_path to its file name; the code then loads the data.
import pandas as pd
file_path = r'DEVP-II_TEST_DATASET.csv'
df = pd.read_csv(file_path)
df.info()
A1 Display the customer surname (and the value) that has the highest repeat transactions or repeat purchase. (Tip: Use count or sum
statistic)
surname_counts = df['surname'].value_counts()
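The line above only builds the counts; the question also asks for the top surname and its value. A minimal sketch of that last step follows, assuming each row of the dataset is one transaction. The frame `demo` and its values are made up for illustration; with the real data, the same calls would be applied to `df`.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({"surname": ["Smith", "Lee", "Smith", "Khan", "Lee", "Smith"]})

# Number of rows (transactions) per surname, most frequent first.
surname_counts = demo["surname"].value_counts()

top_surname = surname_counts.idxmax()  # surname with the most rows
top_count = surname_counts.max()       # how many rows that surname has
print(f"Surname with the most repeat transactions: {top_surname} ({top_count})")
```

If several surnames tie for first place, idxmax() returns only one of them, so it may be worth inspecting surname_counts.head() as well.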
A2. Display the country (and the value) that has: (1) the highest number of membership, (2) the lowest number of membership. (Tip: Use
member = 1 | Use count or sum statistic)
# A2
import pandas as pd
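The A2 cell contains no answer code beyond the import. One possible approach, following the tip (member = 1, sum statistic), is sketched below. It assumes `member` is a 0/1 flag, as the later answers do; the frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "country": ["France", "France", "Spain", "Spain", "Spain", "Germany"],
    "member": [1, 0, 1, 1, 1, 0],
})

# Summing the 0/1 member flag per country gives the number of members there.
members_by_country = demo.groupby("country")["member"].sum()

most_country, most_members = members_by_country.idxmax(), members_by_country.max()
least_country, least_members = members_by_country.idxmin(), members_by_country.min()
print(f"Highest membership: {most_country} ({most_members})")
print(f"Lowest membership: {least_country} ({least_members})")
```

Summing the flag keeps countries with zero members in the result, whereas filtering on member == 1 before counting would drop them.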
A3. Display the gender (and the value) that is: (1) the youngest, (2) the oldest. (Tip: Use median statistic)
# A3
gender_age_median = df.groupby('gender')['age'].median().reset_index()
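The line above computes the median age per gender but does not pick out the youngest and oldest. A small sketch of that step is given here, using a Series rather than the reset_index() DataFrame so that idxmin() and idxmax() return the gender label directly. The frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Male"],
    "age": [30, 40, 50, 20, 40],
})

# Median age per gender, indexed by gender.
median_age_by_gender = demo.groupby("gender")["age"].median()

youngest_gender, youngest_age = median_age_by_gender.idxmin(), median_age_by_gender.min()
oldest_gender, oldest_age = median_age_by_gender.idxmax(), median_age_by_gender.max()
print(f"Youngest gender: {youngest_gender} (median age {youngest_age})")
print(f"Oldest gender: {oldest_gender} (median age {oldest_age})")
```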
A4. Display the country (and the value) that has: (1) the richest customer, (2) the poorest customers. (Tip: Use salary | Use mean statistic)
import pandas as pd
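# Find the country with the richest customer (based on mean salary)
richest_country = df.groupby('country')['salary'].mean().idxmax()
richest_salary = df.groupby('country')['salary'].mean().max()
print(f"The country with the richest customers is {richest_country} with a mean salary of {richest_salary}.")
# Find the country with the poorest customers (based on mean salary)
poorest_country = df.groupby('country')['salary'].mean().idxmin()
poorest_salary = df.groupby('country')['salary'].mean().min()
print(f"The country with the poorest customers is {poorest_country} with a mean salary of {poorest_salary}.")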
A5. Display the gender (and the value) that has: (1) the best credit score, (2) the worst credit score. (Tip: Use mean statistic)
import pandas as pd
# Find the gender with the best credit score (based on mean credit score)
best_credit_gender = df.groupby('gender')['credit_score'].mean().idxmax()
best_credit_score = df.groupby('gender')['credit_score'].mean().max()
print(f"The gender with the best credit score is {best_credit_gender} with a mean credit score of {best_credit_score}.")
# Find the gender with the worst credit score (based on mean credit score)
worst_credit_gender = df.groupby('gender')['credit_score'].mean().idxmin()
worst_credit_score = df.groupby('gender')['credit_score'].mean().min()
print(f"The gender with the worst credit score is {worst_credit_gender} with a mean credit score of {worst_credit_score}.")
df.groupby('gender') groups the DataFrame df by the 'gender' column.
['credit_score'].mean() then calculates the mean credit score for each gender.
idxmax() and max() return the index label (the gender) with the highest mean credit score and that mean, respectively.
In the same way, idxmin() and min() return the gender with the lowest mean credit score and that mean.
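To make the idxmax()/max() and idxmin()/min() pairing concrete, here is a small worked example on made-up numbers. It computes the grouped mean once and reuses it, which is a variation on the code above rather than a change in the result. The frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "credit_score": [600, 700, 650, 720],
})

# Mean credit score per gender, computed once: Female 710.0, Male 625.0.
mean_by_gender = demo.groupby("gender")["credit_score"].mean()

best_gender, best_score = mean_by_gender.idxmax(), mean_by_gender.max()
worst_gender, worst_score = mean_by_gender.idxmin(), mean_by_gender.min()
print(f"Best: {best_gender} ({best_score}), worst: {worst_gender} ({worst_score})")
```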
A6. Display the country (and the value) with: (1) highest variation in salary (2) lowest variation in salary. (Tip: Use std. dev. statistic)
import pandas as pd
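The A6 cell contains no answer code beyond the import. One way to follow the tip (standard deviation of salary per country) is sketched below. The frame `demo` and its values are hypothetical; the column names match those used in the other answers.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "country": ["France", "France", "Spain", "Spain", "Germany", "Germany"],
    "salary": [50000, 70000, 60000, 60000, 30000, 90000],
})

# Sample standard deviation (pandas default, ddof=1) of salary per country.
salary_std_by_country = demo.groupby("country")["salary"].std()

highest_country, highest_std = salary_std_by_country.idxmax(), salary_std_by_country.max()
lowest_country, lowest_std = salary_std_by_country.idxmin(), salary_std_by_country.min()
print(f"Highest salary variation: {highest_country} ({highest_std:.2f})")
print(f"Lowest salary variation: {lowest_country} ({lowest_std:.2f})")
```

A country with a single row has no sample standard deviation, so std() returns NaN for it; by default idxmax() and idxmin() skip such entries.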
A7. Display the gender statistics with respect to country. (Tip: Use count or sum statistic)
1. Country with highest number of male customer 2. Country with lowest number of male customer
3. Country with highest number of female customer 4. Country with lowest number of female customer
import pandas as pd
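The A7 cell contains no answer code beyond the import. A possible sketch uses pd.crosstab to count customers per country and gender, then reads the highest and lowest country for each gender. The frame `demo`, its values, and the `summary` dictionary are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "country": ["France"] * 4 + ["Spain"] * 3 + ["Germany"] * 5,
    "gender": ["Male", "Male", "Male", "Female",
               "Male", "Female", "Female",
               "Male", "Male", "Female", "Female", "Female"],
})

# Rows are countries, columns are genders, cells are customer counts.
gender_by_country = pd.crosstab(demo["country"], demo["gender"])

summary = {}
for gender in gender_by_country.columns:
    counts = gender_by_country[gender]
    summary[gender] = {
        "highest": (counts.idxmax(), int(counts.max())),
        "lowest": (counts.idxmin(), int(counts.min())),
    }
    print(f"{gender}: highest {summary[gender]['highest']}, lowest {summary[gender]['lowest']}")
```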
A8. Display the member counts (both 0 & 1) with respect to both country & gender. (Tip: Use count or sum statistic)
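import pandas as pd
# One way to build the counts: number of rows for each (country, gender, member) combination
member_counts = df.groupby(['country', 'gender', 'member']).size()
print("Member Counts:")
print(member_counts)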
import pandas as pd
import matplotlib.pyplot as plt
A9. Display the combination of country & gender (and the value) that has: (1) the best credit score, (2) the worst credit score. (Tip: Use
mean statistic)
import pandas as pd
# Assuming your DataFrame is loaded as 'df'
# Find the combination of country & gender with the best credit score (based on mean credit score)
best_credit_combination = df.groupby(['country', 'gender'])['credit_score'].mean().idxmax()
best_credit_score = df.groupby(['country', 'gender'])['credit_score'].mean().max()
print(f"The combination of country & gender with the best credit score is {best_credit_combination} with a mean credit score of {best_credit_score}.")
# Find the combination of country & gender with the worst credit score (based on mean credit score)
worst_credit_combination = df.groupby(['country', 'gender'])['credit_score'].mean().idxmin()
worst_credit_score = df.groupby(['country', 'gender'])['credit_score'].mean().min()
print(f"The combination of country & gender with the worst credit score is {worst_credit_combination} with a mean credit score of {worst_credit_score}.")
import seaborn as sns
# Calculating the mean credit score for each combination of country and gender
mean_credit_scores_by_country_gender = df.groupby(['country', 'gender'])['credit_score'].mean()
# Finding the combination with the best (highest) and worst (lowest) mean credit score
best_credit_combo = mean_credit_scores_by_country_gender.idxmax()
best_credit_score = mean_credit_scores_by_country_gender.max()
worst_credit_combo = mean_credit_scores_by_country_gender.idxmin()
worst_credit_score = mean_credit_scores_by_country_gender.min()
# Keeping only the best and worst combinations for the plot
filtered_credit_scores = mean_credit_scores_by_country_gender.loc[[best_credit_combo, worst_credit_combo]]
# Plotting the mean credit scores for the combinations with the highest and lowest scores
plt.figure(figsize=(6, 4))
sns.barplot(x=filtered_credit_scores.index.map(lambda x: f"{x[0]}, {x[1]}"), y=filtered_credit_scores.values, palette="Blues")
plt.title("Best and Worst Credit Scores by Country and Gender")
plt.xlabel("Country, Gender")
plt.ylabel("Mean Credit Score")
plt.show()
A10. Display the top 3 & bottom 3 credit scores in the: (1) members category, (2) non-members category. (Tip: Use n-largest / n-smallest
statistic)
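# Finding the top 3 and bottom 3 credit scores in the members category
top_3_member_credit_scores = df[df['member'] == 1]['credit_score'].nlargest(3)
bottom_3_member_credit_scores = df[df['member'] == 1]['credit_score'].nsmallest(3)
# Finding the top 3 and bottom 3 credit scores in the non-members category
top_3_non_member_credit_scores = df[df['member'] == 0]['credit_score'].nlargest(3)
bottom_3_non_member_credit_scores = df[df['member'] == 0]['credit_score'].nsmallest(3)
# Combining the four results into one DataFrame, one row per rank
# (the original row labels are dropped so the columns line up; column names are illustrative)
credit_scores_combined = pd.DataFrame({
    'Top 3 Members': top_3_member_credit_scores.reset_index(drop=True),
    'Bottom 3 Members': bottom_3_member_credit_scores.reset_index(drop=True),
    'Top 3 Non-Members': top_3_non_member_credit_scores.reset_index(drop=True),
    'Bottom 3 Non-Members': bottom_3_non_member_credit_scores.reset_index(drop=True),
})
# Plotting the top 3 and bottom 3 credit scores for members and non-members
plt.figure(figsize=(10, 6))
credit_scores_combined.plot(kind='bar', ax=plt.gca(), color=["lightblue", "lightcoral", "mediumseagreen", "peachpuff"])
plt.title("Top 3 and Bottom 3 Credit Scores in Members and Non-Members Categories")
plt.xlabel("Rank")
plt.ylabel("Credit Score")
plt.xticks(ticks=[0, 1, 2], labels=['1st', '2nd', '3rd'], rotation=0)
plt.legend(title='Category')
plt.show()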
For members (df['member'] == 1), ['credit_score'].nlargest(3) finds the top 3 credit scores and ['credit_score'].nsmallest(3) finds the bottom 3.
For non-members (df['member'] == 0), the same two calls find the top 3 and bottom 3 credit scores.
In both cases nlargest and nsmallest are applied directly to the credit_score column.
The four results are combined into a new DataFrame, credit_scores_combined, with one column each for the top 3 and bottom 3 credit scores of members and non-members.
A bar graph is drawn from the combined DataFrame: the x-axis represents the rank (1st, 2nd, 3rd), and the y-axis represents the credit score.
Each of the four columns is drawn in its own color, and plt.show() displays the bar graph.
A11. Display the top & bottom country (and the value) in terms of salary in the: (1) members category, (2) non-members category. (Tip:
Use median statistics)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming your DataFrame is loaded as 'df'
# Top country in terms of salary in the members category
top_members_salary_country = df[df['member'] == 1].groupby('country')['salary'].median().nlargest(1).reset_index()
print("Top country in terms of salary in the members category:")
print(top_members_salary_country)
# Calculating the median salary for each country within members and non-members categories
median_salary_members = df[df['member'] == 1].groupby('country')['salary'].median()
median_salary_non_members = df[df['member'] == 0].groupby('country')['salary'].median()
# Finding the top & bottom country in terms of median salary for members and non-members
top_country_salary_members = median_salary_members.idxmax()
top_salary_members = median_salary_members.max()
bottom_country_salary_members = median_salary_members.idxmin()
bottom_salary_members = median_salary_members.min()
top_country_salary_non_members = median_salary_non_members.idxmax()
top_salary_non_members = median_salary_non_members.max()
bottom_country_salary_non_members = median_salary_non_members.idxmin()
bottom_salary_non_members = median_salary_non_members.min()
# Plotting the median salaries for the top & bottom countries in members and non-members categories
plt.figure(figsize=(10, 6))
sns.barplot(x=['Top Member', 'Bottom Member', 'Top Non-Member', 'Bottom Non-Member'],
            y=[top_salary_members, bottom_salary_members, top_salary_non_members, bottom_salary_non_members],
            hue=['Members', 'Members', 'Non-Members', 'Non-Members'],
            palette='Set2')
plt.title("Top & Bottom Countries by Median Salary in Members and Non-Members Categories")
plt.ylabel("Median Salary")
plt.show()
{
"top_country_salary_members": (top_country_salary_members, top_salary_members),
"bottom_country_salary_members": (bottom_country_salary_members, bottom_salary_members),
"top_country_salary_non_members": (top_country_salary_non_members, top_salary_non_members),
"bottom_country_salary_non_members": (bottom_country_salary_non_members, bottom_salary_non_members)
}
A12. Display the top 2 & bottom 2 age groups in the (1) members category, (2) non-members category. (Tip: Use n-largest / n-smallest
statistic)
# Counting the number of customers in each age group for members and non-members
age_group_counts_members = df[df['member'] == 1]['age_group'].value_counts()
age_group_counts_non_members = df[df['member'] == 0]['age_group'].value_counts()
# Combining both counts into one DataFrame; an age group missing from one category gets a count of 0
combined_age_groups = pd.DataFrame({'Members': age_group_counts_members, 'Non-Members': age_group_counts_non_members}).fillna(0)
# Extracting the top 2 and bottom 2 age groups for members and non-members
top_2_members = combined_age_groups['Members'].nlargest(2)
bottom_2_members = combined_age_groups['Members'].nsmallest(2)
top_2_non_members = combined_age_groups['Non-Members'].nlargest(2)
bottom_2_non_members = combined_age_groups['Non-Members'].nsmallest(2)
# Merging top 2 and bottom 2 age groups into a single DataFrame for plotting
# Ensuring all relevant age groups are included in the index
all_age_groups = top_2_members.index.union(bottom_2_members.index).union(top_2_non_members.index).union(bottom_2_non_members.index)
merged_age_groups = pd.DataFrame(index=all_age_groups)
# Adding the top 2 and bottom 2 age groups to the DataFrame
merged_age_groups['Top 2 Members'] = top_2_members.reindex(all_age_groups)
merged_age_groups['Bottom 2 Members'] = bottom_2_members.reindex(all_age_groups)
merged_age_groups['Top 2 Non-Members'] = top_2_non_members.reindex(all_age_groups)
merged_age_groups['Bottom 2 Non-Members'] = bottom_2_non_members.reindex(all_age_groups)
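The A12 code reads an 'age_group' column that none of the earlier cells create. If the dataset only has a numeric 'age' column, one possible way to derive the groups is pd.cut, sketched below. The bin edges, the labels, and the frame `demo` are assumptions made for illustration, not values taken from the test.

```python
import pandas as pd

# Made-up ages standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({"age": [19, 25, 34, 41, 58, 67]})

# Hypothetical bin edges and labels; adjust them to the grouping the test expects.
bins = [0, 29, 39, 49, 59, 120]
labels = ["<30", "30-39", "40-49", "50-59", "60+"]

# Each age falls into the bin whose right edge it does not exceed (right-inclusive by default).
demo["age_group"] = pd.cut(demo["age"], bins=bins, labels=labels)
print(demo)
```

With a column built this way on df, the value_counts(), nlargest(2) and nsmallest(2) calls above can be applied unchanged.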