Capstone Project Assignment
Objective
This capstone project is designed to challenge and enhance your skills in Python
programming, focusing on data preprocessing, cleaning, manipulation, and analysis using
Pandas and NumPy. It also evaluates your ability to create compelling, meaningful
visualizations with Matplotlib, Seaborn, Plotly, and Bokeh.
Project Context
The dataset contains detailed employee information, including demographics, job roles,
salaries, bonuses, performance scores, and other attributes. It has been intentionally
seeded with anomalies (e.g., typos, missing values, and outliers) to simulate the quality
issues found in real-world data. Your task is to clean the data, analyze it, and extract
meaningful insights to guide business decisions.
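As a rough illustration of the cleaning step, the sketch below normalizes a text column, fills missing values, and flags salary outliers with the 1.5 * IQR rule instead of silently dropping them. The toy values, the typo map, and the choice of median imputation are assumptions for illustration only; the real dataset may call for different rules.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: values are illustrative, not taken from the real dataset.
df = pd.DataFrame({
    "Department": ["Sales", "sales ", "Engineering", "Enginering", "Finance", None],
    "Salary": [52000, 48000, 91000, 88000, 45000, 1_000_000],
    "PerformanceScore": [3.5, np.nan, 4.2, 3.9, 3.1, 4.0],
})

# 1. Normalize text: trim whitespace and unify casing, then map known typos.
#    Note: .str.title() also changes acronyms (e.g. "HR" -> "Hr"); map those explicitly.
df["Department"] = df["Department"].str.strip().str.title()
typo_map = {"Enginering": "Engineering"}  # hypothetical typo list
df["Department"] = df["Department"].replace(typo_map)

# 2. Missing values: fill numeric gaps with the column median and
#    give missing categories an explicit label.
df["PerformanceScore"] = df["PerformanceScore"].fillna(df["PerformanceScore"].median())
df["Department"] = df["Department"].fillna("Unknown")

# 3. Flag outliers with the 1.5 * IQR rule so they can be reviewed, not lost.
q1, q3 = df["Salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df["SalaryOutlier"] = ~df["Salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df)
```

Flagging rather than deleting outliers keeps the decision visible in the notebook, which makes it easier to justify in the summary report.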
Project Tasks
1. GroupBy Analysis:
o Calculate the average salary by Department and Gender.
o Identify the top 3 job roles in terms of average PerformanceScore.
2. Correlation and Relationships:
o Compute correlations between Salary, YearsAtCompany, and
PerformanceScore.
o Determine whether Salary is more strongly correlated with PerformanceScore or
with YearsAtCompany.
3. Crosstab Analysis:
o Analyze the relationship between RemoteWork and MaritalStatus using a
crosstab.
4. Filtering and Ranking:
o List the top 5 employees with the highest bonus-to-salary ratio.
o Identify the top 3 cities with the highest average salaries and their
corresponding average performance scores.
5. Departmental Analysis:
o Find the department with the most balanced gender ratio.
o Compare the average salaries of employees in Sales and Engineering across
countries.
6. NumPy Calculations:
o Calculate the median salary for each job role using NumPy.
o Standardize the PerformanceScore column using z-scores.
7. Performance and Age:
o Group employees by AgeGroup and calculate average PerformanceScore and
AnnualBonus.
o Explore how bonuses vary across different ExperienceLevel categories.
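A minimal Pandas sketch of tasks 1 and 2 (GroupBy analysis and correlations), assuming the cleaned data uses the column names listed above. The toy DataFrame is hypothetical, and "JobRole" is an assumed name for the job role column. Pearson correlation (the `DataFrame.corr` default) is used here; comparing absolute values decides which relationship is stronger regardless of sign.

```python
import pandas as pd

# Hypothetical cleaned data; "JobRole" is an assumed column name.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Engineering", "Engineering", "HR", "HR"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "JobRole": ["Rep", "Manager", "Developer", "Architect", "Recruiter", "Manager"],
    "Salary": [50000, 70000, 90000, 110000, 45000, 65000],
    "YearsAtCompany": [2, 8, 4, 10, 1, 6],
    "PerformanceScore": [3.2, 4.1, 3.8, 4.5, 3.0, 4.0],
})

# Task 1a: average salary by Department and Gender, unstacked into a grid.
avg_salary = df.groupby(["Department", "Gender"])["Salary"].mean().unstack("Gender")

# Task 1b: top 3 job roles by mean PerformanceScore.
top_roles = df.groupby("JobRole")["PerformanceScore"].mean().nlargest(3)

# Task 2: Pearson correlation matrix, then compare Salary's two correlations by magnitude.
corr = df[["Salary", "YearsAtCompany", "PerformanceScore"]].corr()
salary_corr = corr.loc["Salary", ["PerformanceScore", "YearsAtCompany"]]
stronger = salary_corr.abs().idxmax()

print(avg_salary, top_roles, corr, f"Salary correlates more strongly with {stronger}", sep="\n\n")
```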
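For tasks 3 and 4 (crosstab, filtering, and ranking), one possible approach is `pd.crosstab` for the RemoteWork/MaritalStatus table and `nlargest` for the rankings. This is a sketch on hypothetical data; "EmployeeID" and "City" are assumed column names, and AnnualBonus is assumed to be the bonus column used in the ratio.

```python
import pandas as pd

# Hypothetical data; "EmployeeID" and "City" are assumed column names.
df = pd.DataFrame({
    "EmployeeID": [101, 102, 103, 104, 105, 106, 107],
    "RemoteWork": ["Yes", "No", "Yes", "Hybrid", "No", "Yes", "No"],
    "MaritalStatus": ["Single", "Married", "Married", "Single", "Single", "Divorced", "Married"],
    "City": ["Austin", "Austin", "Boston", "Boston", "Denver", "Denver", "Chicago"],
    "Salary": [60000, 80000, 95000, 70000, 55000, 65000, 75000],
    "AnnualBonus": [6000, 4000, 14250, 2800, 11000, 1950, 6000],
    "PerformanceScore": [3.5, 4.0, 4.6, 3.1, 3.9, 3.3, 3.7],
})

# Task 3: raw counts with totals, plus row shares (each MaritalStatus row sums to 1).
counts = pd.crosstab(df["MaritalStatus"], df["RemoteWork"], margins=True)
shares = pd.crosstab(df["MaritalStatus"], df["RemoteWork"], normalize="index")

# Task 4a: top 5 employees by bonus-to-salary ratio.
df["BonusRatio"] = df["AnnualBonus"] / df["Salary"]
top5 = df.nlargest(5, "BonusRatio")[["EmployeeID", "Salary", "AnnualBonus", "BonusRatio"]]

# Task 4b: top 3 cities by mean salary, with their mean PerformanceScore.
city_stats = (
    df.groupby("City")
      .agg(AvgSalary=("Salary", "mean"), AvgPerformance=("PerformanceScore", "mean"))
      .nlargest(3, "AvgSalary")
)

print(counts, shares.round(2), top5, city_stats, sep="\n\n")
```

Row-normalized shares are usually easier to compare than raw counts when group sizes differ.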
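Task 5 leaves "most balanced gender ratio" open to interpretation. The sketch below assumes one definition, the smallest gender count divided by the largest within each department (1.0 means perfectly balanced), and uses a pivot table for the Sales vs Engineering comparison. The data and the "Country" column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "Country" is an assumed column name.
df = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["Engineering"] * 5 + ["HR"] * 3,
    "Gender": ["F", "M", "F", "M", "M", "M", "M", "F", "M", "F", "F", "F"],
    "Country": ["US", "US", "UK", "UK", "US", "US", "UK", "UK", "UK", "US", "UK", "US"],
    "Salary": [50000, 54000, 48000, 46000, 90000, 94000, 80000, 84000, 82000,
               45000, 43000, 47000],
})

# Balance metric (an assumption): min gender count / max gender count per department.
# crosstab fills absent combinations with 0, so a single-gender department scores 0.0.
gender_counts = pd.crosstab(df["Department"], df["Gender"])
balance = gender_counts.min(axis=1) / gender_counts.max(axis=1)
most_balanced = balance.idxmax()

# Mean salary of Sales vs Engineering in each country, plus the gap between them.
subset = df[df["Department"].isin(["Sales", "Engineering"])]
comparison = subset.pivot_table(index="Country", columns="Department",
                                values="Salary", aggfunc="mean")
comparison["Gap"] = comparison["Engineering"] - comparison["Sales"]

print(balance.sort_values(ascending=False), f"Most balanced: {most_balanced}",
      comparison, sep="\n\n")
```

Whatever metric you choose, state it explicitly in the report, since a different definition (e.g. distance from a 50/50 split) can rank departments differently when there are more than two gender categories.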
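For task 6, a sketch that keeps the arithmetic in NumPy: `np.median` per job-role group, and the z-score z = (x - mean) / std on the PerformanceScore array. The toy data and the "JobRole" column name are assumptions. Note that `np.std` defaults to the population standard deviation (`ddof=0`), while pandas' `Series.std` defaults to `ddof=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "JobRole" is an assumed column name.
df = pd.DataFrame({
    "JobRole": ["Analyst", "Analyst", "Analyst", "Engineer", "Engineer", "Manager"],
    "Salary": [50000, 60000, 58000, 90000, 100000, 120000],
    "PerformanceScore": [3.0, 3.5, 4.0, 4.5, 3.5, 4.0],
})

# Median salary per job role, computed with np.median on each group's values.
median_by_role = {
    role: float(np.median(group["Salary"].to_numpy()))
    for role, group in df.groupby("JobRole")
}

# z-score with the population standard deviation (np.std's default ddof=0).
scores = df["PerformanceScore"].to_numpy(dtype=float)
df["PerformanceZ"] = (scores - scores.mean()) / scores.std()

print(median_by_role)
print(df[["PerformanceScore", "PerformanceZ"]].round(3))
```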
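For task 7, a possible sketch: group by AgeGroup for mean PerformanceScore and AnnualBonus, then summarize bonuses per ExperienceLevel. If AgeGroup has to be rebuilt, `pd.cut` can bin a raw age column; the "Age" column, the bin edges, and the level names "Junior", "Mid", and "Senior" are all assumptions here. Making ExperienceLevel an ordered categorical keeps the output rows in a logical order.

```python
import pandas as pd

# Hypothetical toy data; "Age" and the ExperienceLevel labels are assumptions.
df = pd.DataFrame({
    "Age": [23, 29, 34, 41, 47, 52, 58, 26],
    "ExperienceLevel": ["Junior", "Junior", "Mid", "Mid", "Senior", "Senior", "Senior", "Junior"],
    "PerformanceScore": [3.1, 3.4, 3.8, 4.0, 4.2, 3.9, 4.1, 3.6],
    "AnnualBonus": [2000, 2500, 5000, 6000, 9000, 8500, 10000, 2200],
})

# Rebuild AgeGroup with assumed bins: [0, 30), [30, 40), [40, 50), [50, 120).
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 40, 50, 120],
                        labels=["<30", "30-39", "40-49", "50+"], right=False)

# Mean PerformanceScore and AnnualBonus per AgeGroup (observed=True skips empty bins).
by_age = df.groupby("AgeGroup", observed=True)[["PerformanceScore", "AnnualBonus"]].mean()

# Bonus distribution per ExperienceLevel, ordered Junior -> Mid -> Senior.
df["ExperienceLevel"] = pd.Categorical(df["ExperienceLevel"],
                                       categories=["Junior", "Mid", "Senior"], ordered=True)
bonus_by_exp = df.groupby("ExperienceLevel", observed=True)["AnnualBonus"].agg(
    ["mean", "median", "min", "max"])

print(by_age.round(2), bonus_by_exp, sep="\n\n")
```

Reporting the median alongside the mean helps show whether a few large bonuses are skewing a category.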
Visualization Tasks
Deliverables
1. Code Notebook:
o A well-documented Jupyter notebook with clean, modular code and comments.
o Include analysis, visualizations, and insights.
2. Summary Report:
o A 2–3 page report summarizing:
   ▪ Key findings and insights.
   ▪ Embedded visuals with brief explanations.
   ▪ Recommendations based on the analysis.
3. Presentation Slides:
o A 7–10 slide deck summarizing the project approach, visuals, and actionable
insights.
Evaluation Criteria