BSBA F23 BAIT3473 Project1 Powerquery
Term Project
Group Members
ABDULLAH (L1F21BSBA0014)
JAVERIA UMAR (L1F21BSBA0025)
ALY (L1F21BSBA0026)
Executive Summary
In our Business Database Strategy project, our focus is on HR analytics, utilizing a dataset
obtained from Kaggle. The dataset, centered around the Human Resources department, contains
detailed employee information. Our primary goal is to employ Power BI for comprehensive data
analysis and visualization.
We initiated the project by connecting the dataset to Power BI, structuring and cleaning the data
through Power Query. This involved tasks such as removing irrelevant columns, handling
missing values, and transforming numerical ratings into qualitative measures.
Our project emphasizes a strategic approach to data capturing, transformation, and cleaning. The
dataset's structured nature ensures efficient data management and analysis. The main objectives
include achieving company goals, fostering a positive workplace culture, and promoting
employee empowerment within the HR domain.
Moving forward, the project involves creating dashboards that visualize key HR metrics,
including attrition rates, job satisfaction, and employee demographics. This approach enables us
to derive meaningful insights, support informed decision-making, and contribute to the overall
success of our organization by optimizing HR strategies. The project aligns with the broader goal
of leveraging data-driven insights to enhance business database strategy.
Table of Contents
Project scope
Business domain
Nature of data
Objectives and features of dataset
Project Scope
Business domain:
The dataset we selected covers the human resources department and contains detailed employee
information relating to their work activities. The Human Resources (HR) function plays a pivotal
role in managing human capital effectively to achieve project objectives. The primary objectives
of HR are achieving company objectives, improving workplace culture, conducting development
and training programs, and promoting employee empowerment, motivation, and teamwork.
Nature of data:
The dataset chosen for our analysis is characterized as a structured dataset, distinguished by
well-defined attributes and key identifiers. This dataset exhibits a clear organization, with
distinct variables and identifiable keys that facilitate efficient data management and analysis. The
structured nature of this dataset ensures a systematic arrangement of information, allowing for
precise retrieval and interpretation of data elements. The inclusion of proper attributes and key
identifiers enhances the overall integrity and reliability of the dataset, providing a solid
foundation for meaningful insights and informed decision-making in our analytical endeavors.
Step 1
Data capturing
Data capturing refers to the process of collecting, extracting, or obtaining raw data from various
sources for further analysis. In the context of analytics and business intelligence, data capturing
is a crucial step that involves gathering information to be used for making informed decisions,
identifying trends, and deriving insights. The importance of data capturing lies in its role as the
foundation for any analytical or reporting endeavor.
Performance Evaluation:
Capturing relevant data allows businesses to assess the performance of various processes,
departments, or initiatives, aiding in strategic planning and resource allocation.
Identification of Patterns and Trends:
Analyzing captured data helps in identifying patterns, trends, and anomalies, which is essential
for understanding the dynamics of a business or system.
Predictive Analytics:
Historical data captured over time can be used to develop predictive models, allowing organizations to
anticipate future trends and make proactive decisions.
Resource Optimization:
Understanding data helps in optimizing resources, streamlining processes, and improving efficiency by
eliminating bottlenecks or inefficiencies.
2. Extraction and Format Conversion: The dataset, initially in a zip folder, was extracted to
access the CSV file containing the raw data. This extraction process is akin to capturing the data
from its original compressed state, making it accessible for further analysis.
3. Power BI Connection: Connecting the dataset with Power BI is a pivotal step in the data
analytics workflow. Power BI serves as a robust tool for visualization, analysis, and reporting,
making it essential for transforming raw data into meaningful insights.
The data loaded into Power BI successfully, as shown below:
In summary, the process of capturing data from Kaggle, extracting it from a zip folder,
converting it to a usable format (CSV), and connecting it to Power BI marks the initial steps in
the data analytics journey. This ensures that the dataset is ready for exploration, transformation,
and visualization within the Power BI environment, setting the stage for comprehensive HR
analytics.
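The capture steps summarized above (unzipping the download and reading the CSV) can also be sketched outside Power BI. The following is a minimal Python illustration, not the procedure used in this project; the file name `hr_data.csv`, the column names, and the two sample rows are hypothetical stand-ins for the real Kaggle download.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member_name):
    """Return the rows of one CSV file inside a zip archive as a list of dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # utf-8-sig drops a byte-order mark if the export added one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_demo_zip():
    """Build a tiny in-memory archive so the sketch runs without the real download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "hr_data.csv",  # hypothetical file name
            "EmployeeNumber,Department,Attrition\n1,Sales,Yes\n2,HR,No\n",
        )
    buffer.seek(0)
    return buffer


captured_rows = read_csv_from_zip(build_demo_zip(), "hr_data.csv")
print(len(captured_rows), captured_rows[0]["Department"])
```

With the real download, `zip_source` would be the path to the zip file on disk and `member_name` the name of the CSV it contains.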
Step 2
Data transformation
Data Transformation:
Data transformation refers to the process of converting raw data from its original format into a
format that is suitable for analysis or other downstream tasks. It involves restructuring,
modifying, or aggregating data to make it more usable and meaningful. Data transformation may
include tasks such as merging datasets, handling missing values, converting data types, creating
new variables, and scaling numerical values. The goal is to prepare the data in a way that makes
it easier to extract valuable insights or feed into machine learning models.
Handling Missing Data:
Transformation techniques can be applied to handle missing data, such as imputing values based
on certain criteria or removing instances with missing values.
Data Type Conversion:
Converting data types ensures that variables are represented in a format suitable for analysis. For
example, transforming a date string into a datetime format.
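As a small illustration of type conversion, the sketch below turns text fields into a datetime and an integer. It is written in Python rather than Power Query, and the `HireDate` and `Age` fields, the date format, and the sample record are assumptions made for the example rather than details taken from the dataset.

```python
from datetime import datetime


def parse_date(value, fmt="%Y-%m-%d"):
    """Convert a date string to a datetime; blank input becomes None."""
    text = (value or "").strip()
    if not text:
        return None
    return datetime.strptime(text, fmt)


def to_int(value):
    """Convert a numeric string to an int; blank input becomes None."""
    text = (value or "").strip()
    return int(text) if text else None


# Hypothetical raw record as it might arrive from a CSV file (all values are text)
raw_record = {"HireDate": "2021-03-15", "Age": "34"}

typed_record = {
    "HireDate": parse_date(raw_record["HireDate"]),
    "Age": to_int(raw_record["Age"]),
}
print(typed_record["HireDate"].year, typed_record["Age"] + 1)
```

Once the values carry proper types, date arithmetic and numeric aggregation behave as expected instead of operating on text.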
Data Cleaning:
Data cleaning, also known as data cleansing or data scrubbing, is the process of identifying and
correcting errors, inconsistencies, and inaccuracies in datasets. It involves tasks such as handling
missing values, removing duplicates, correcting typos, and addressing outliers.
Both data transformation and data cleaning are essential steps in the data preprocessing pipeline.
They contribute to the overall data quality, making the data more suitable for analysis, reporting,
and modeling purposes. Clean and well-transformed data sets the foundation for extracting
meaningful insights and making informed decisions.
Data transformation and cleaning within our dataset:
Removing Unnecessary Columns:
In the initial exploration of the Kaggle dataset, we identified columns such as relationship status
and relationship satisfaction that were not aligned with the specific analytical goals of our
project. To streamline the dataset and focus on key factors influencing attrition, we utilized
Power Query's 'Remove Columns' functionality. This step ensures that only relevant information
is retained for subsequent analysis.
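The effect of removing columns can be sketched in a few lines of Python. This is only an illustration of the idea, not the Power Query step itself; the column names `RelationshipStatus` and `RelationshipSatisfaction` and the sample row are assumed spellings, since the exact headers are not listed in this report.

```python
def remove_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    unwanted = set(unwanted)
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in rows
    ]


# Hypothetical sample row; the column names are assumptions
sample_rows = [
    {
        "EmployeeNumber": 1,
        "Department": "Sales",
        "RelationshipStatus": "Single",
        "RelationshipSatisfaction": 3,
    },
]

trimmed_rows = remove_columns(
    sample_rows, ["RelationshipStatus", "RelationshipSatisfaction"]
)
print(sorted(trimmed_rows[0]))
```

The function builds new dictionaries rather than editing in place, so the original rows stay available for comparison.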
Handling Blank Cells:
Addressing missing or null values is crucial for ensuring the integrity and completeness of the
dataset. Using Power Query, we applied the 'Replace' and 'Fill' functions to deal with empty
cells. For instance, if some employees had not provided salary information (resulting in null or 0
values), we replaced these instances with meaningful values or filled them down based on the
context of the data. This process ensures a consistent and reliable dataset for analysis.
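The two operations described above, replacing blank values and filling values down, can be sketched in Python as follows. This is an illustrative approximation, not the Power Query steps themselves. The column names, the sample rows, and the choice of the median as the "meaningful value" are all assumptions made for the example.

```python
from statistics import median

BLANKS = ("", None)


def replace_blanks(rows, column, replacement, extra_blanks=()):
    """Return copies of rows with blank entries in `column` replaced."""
    blanks = BLANKS + tuple(extra_blanks)
    return [
        {**row, column: replacement if row[column] in blanks else row[column]}
        for row in rows
    ]


def fill_down(rows, column):
    """Return copies of rows where blanks in `column` take the last value seen above."""
    filled, last_seen = [], None
    for row in rows:
        value = row[column]
        if value in BLANKS:
            value = last_seen  # remains None if no value has appeared yet
        else:
            last_seen = value
        filled.append({**row, column: value})
    return filled


# Hypothetical rows with missing salary and department entries
staff = [
    {"Department": "Sales", "MonthlyIncome": 5000},
    {"Department": "", "MonthlyIncome": 0},
    {"Department": "HR", "MonthlyIncome": None},
    {"Department": "", "MonthlyIncome": 4200},
]

# Assumption: treat 0 as "not provided" and substitute the median of known incomes
known_incomes = [
    r["MonthlyIncome"] for r in staff if r["MonthlyIncome"] not in (None, "", 0)
]
cleaned = replace_blanks(staff, "MonthlyIncome", median(known_incomes), extra_blanks=(0,))
cleaned = fill_down(cleaned, "Department")
print(cleaned)
```

Filling down only makes sense when row order carries meaning (for example, grouped exports), so it should be applied selectively.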
BEFORE:
AFTER:
In conclusion, the data transformation and cleaning process using Power Query in Power BI was pivotal
for refining the dataset, eliminating irrelevant information, addressing missing values, and converting
numerical ratings into more meaningful qualitative measures. This meticulous preparation lays the
foundation for accurate analysis and insightful visualizations, aligning the data with the specific
requirements of the HR analytics project.
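The conversion of numerical ratings into qualitative measures mentioned above amounts to a lookup. A minimal Python sketch follows; the four-point scale and its labels are assumptions chosen for illustration, since the report does not list the actual mapping applied in Power Query.

```python
# Assumed 1-4 rating scale; the real labels used in the project may differ
RATING_LABELS = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}


def label_rating(value, labels=RATING_LABELS, default="Unknown"):
    """Map a numeric rating (int or numeric string) to its qualitative label."""
    try:
        return labels.get(int(value), default)
    except (TypeError, ValueError):
        # Blank or non-numeric input falls back to the default label
        return default


ratings = ["3", 1, None, 9]
print([label_rating(r) for r in ratings])
```

Keeping the mapping in one dictionary makes it easy to reuse the same labels for every rating column.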
Step 3
Data visualization
Power BI is a powerful business intelligence tool that allows organizations to transform raw data
into meaningful insights. Its user-friendly interface and robust analytical capabilities make it an
ideal choice for creating interactive and visually compelling dashboards. In the presented Power
BI dashboard for HR analytics, we harnessed the tool's capabilities to visualize key performance
indicators (KPIs) such as total employees, attrition count, attrition rate, active employees, and
average age. Through pie charts, stacked column charts, line charts, stacked bar charts, and
matrices, we visually represented critical HR metrics such as department-wise attrition, age
group distribution, environment satisfaction trends, attrition based on education fields, and job
satisfaction across different departments. By incorporating filters for education, department, and
job role, we enhanced the dashboard's interactivity, allowing users to dynamically explore and
analyze specific aspects of the workforce. The utilization of Power BI facilitated a
comprehensive and intuitive representation of HR analytics data, empowering stakeholders to
make informed decisions and optimize human resource management strategies.
In the HR analytics dashboard, we created KPIs, visuals, and graphs accordingly.
KPIs
KPIs are critical metrics that gauge the performance of an organization in achieving its
objectives. In HR analytics, KPIs like attrition rate, total employees, and average age provide a
holistic view of workforce health.
Importance: KPIs serve as benchmarks, offering clear insights into trends, successes, and areas
that require attention. They guide strategic decision-making, enabling organizations to align their
efforts with overarching goals.
THE KPIs OF OUR DATASET ARE AS FOLLOWS:
Total Employees
This KPI provides a baseline understanding of the organization's
workforce size. Monitoring changes in the total number of employees
over time helps in assessing the overall growth or contraction of the
company.
Attrition Count
The attrition count KPI is crucial for identifying the number of
employees leaving the company. It is a key metric for HR to track
turnover, measure the effectiveness of retention strategies, and assess the
impact on workforce planning.
Attrition Rate
Calculating attrition as a percentage of the total workforce standardizes the measure, making it
easier to compare attrition rates across different time periods or
departments. A high attrition rate may indicate potential issues in
employee satisfaction, work environment, or other factors.
Active Employees
Knowing the number of active employees at any given time provides a real-time
snapshot of the workforce. This KPI is essential for day-to-day
operational management, ensuring that there are enough employees to meet
business demands.
Average Age
The average age of the workforce is valuable for demographic analysis. It
aids in succession planning, identifying potential skill gaps, and tailoring
benefits or training programs to different age groups.
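The arithmetic behind the five KPIs above can be sketched in Python. This is an illustration under stated assumptions, not the measures defined in the dashboard: it assumes an `Attrition` column holding "Yes" or "No", an `Age` column holding numbers, and that active employees equal total employees minus the attrition count. The sample rows are hypothetical.

```python
def compute_kpis(rows):
    """Compute the five headline KPIs from employee rows."""
    total = len(rows)
    attrition_count = sum(1 for row in rows if row["Attrition"] == "Yes")
    return {
        "total_employees": total,
        "attrition_count": attrition_count,
        # Attrition rate as a percentage of the total workforce
        "attrition_rate": attrition_count / total * 100 if total else 0.0,
        # Assumption: everyone who has not left counts as active
        "active_employees": total - attrition_count,
        "average_age": sum(row["Age"] for row in rows) / total if total else 0.0,
    }


# Hypothetical sample data
workforce = [
    {"Age": 30, "Attrition": "No"},
    {"Age": 40, "Attrition": "Yes"},
    {"Age": 50, "Attrition": "No"},
    {"Age": 20, "Attrition": "No"},
]
kpis = compute_kpis(workforce)
print(kpis)
```

Guarding against an empty table avoids a division-by-zero error when filters leave no rows.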
FILTERS
Filters allow users to interactively slice and dice data, focusing on specific subsets. In HR
analytics, filters based on education, department, and job role empower users to tailor their
analysis to particular segments of the workforce.
Importance: Filters enhance the flexibility and depth of analysis. They enable personalized
exploration, facilitating the identification of trends or issues within specific categories and
supporting targeted decision-making.
THE FILTERS OF OUR DATASET ARE AS FOLLOWS:
Education
The education filter enables a deeper analysis based on the educational background of
employees. This filter helps in understanding if education is a factor influencing
attrition, job satisfaction, or performance in specific roles.
Department
The department filter allows for a focused analysis on specific organizational units.
Identifying departments with higher attrition rates or lower job satisfaction
provides insights for targeted interventions, training, or restructuring.
Job Role
The job role filter allows users to narrow down their analysis to specific roles within
the organization. This is essential for understanding whether certain job roles are
more susceptible to attrition or dissatisfaction, guiding role-specific interventions or
adjustments.
Each KPI and filter in our dashboard serves a unique purpose, collectively offering a
comprehensive view of the workforce. This detailed information helps HR professionals make
data-driven decisions, identify trends, and implement targeted strategies to enhance overall
employee satisfaction and retention.
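Conceptually, the three filters narrow the rows that feed every KPI and visual. The Python sketch below mimics that behaviour under assumed column names (`Education`, `Department`, `JobRole`) and hypothetical sample rows; it is not how Power BI implements slicers internally.

```python
def apply_filters(rows, **selections):
    """Keep rows matching every selected value; a value of None means no filter."""
    active = {column: value for column, value in selections.items() if value is not None}
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in active.items())
    ]


# Hypothetical sample data
employees = [
    {"Education": "Bachelor", "Department": "Sales", "JobRole": "Sales Executive"},
    {"Education": "Master", "Department": "Sales", "JobRole": "Manager"},
    {"Education": "Bachelor", "Department": "HR", "JobRole": "Human Resources"},
]

sales_only = apply_filters(employees, Department="Sales")
sales_bachelors = apply_filters(employees, Department="Sales", Education="Bachelor")
print(len(sales_only), len(sales_bachelors))
```

Any KPI computed on the filtered rows then reflects only the selected segment of the workforce.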
VISUALS
Visuals and graphs transform raw data into compelling and easy-to-understand representations.
They include pie charts, bar graphs, line charts, treemaps, matrices, and donut charts, providing a
visual narrative for complex datasets.
Importance: Visuals are crucial for conveying insights quickly and intuitively. They aid in
pattern recognition, trend identification, and storytelling. Visualizations make data accessible to a
broader audience, fostering better understanding and engagement.
Enables quick identification of departments with the highest and lowest attrition.
Facilitates comparison of attrition rates among departments.
Helps HR focus on targeted strategies for specific departments, addressing unique challenges they
might face.
Number of Employees by Age Group (Stacked Column Chart)
This visual is a stacked column chart illustrating the distribution of employees across various age groups.
Allows visualization of how changes in environment satisfaction correlate with workforce size.
Helps HR understand the impact of environmental factors on employee retention.
Provides insights into the effectiveness of initiatives aimed at improving work satisfaction.
Attrition Count per Education Field (Stacked Bar Chart)
This visual is a stacked bar chart representing attrition counts categorized by different education fields.
Identifies whether certain educational backgrounds are associated with higher attrition rates.
Assists in tailoring recruitment strategies based on educational preferences.
Helps HR focus on development programs that resonate with the educational diversity of the
workforce.
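The numbers behind a chart like this are grouped counts. A minimal Python sketch of that aggregation is given here; the column names `EducationField` and `Attrition`, the "Yes"/"No" coding, and the sample rows are assumptions for illustration only.

```python
from collections import Counter


def attrition_count_by(rows, column):
    """Count employees who left, grouped by the values of `column`."""
    return Counter(row[column] for row in rows if row["Attrition"] == "Yes")


# Hypothetical sample data
records = [
    {"EducationField": "Life Sciences", "Attrition": "Yes"},
    {"EducationField": "Life Sciences", "Attrition": "No"},
    {"EducationField": "Medical", "Attrition": "Yes"},
    {"EducationField": "Life Sciences", "Attrition": "Yes"},
]

by_field = attrition_count_by(records, "EducationField")
print(by_field.most_common())
```

Passing a different column name, such as a department column, would produce the counts for a department-wise view in the same way.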
Importance of Using Power BI for Visualization
1. Interactive Dashboards:
Power BI enables the creation of interactive and dynamic dashboards. Users can explore data
intuitively, adjusting filters and interacting with visuals to gain deeper insights. This interactivity
enhances the user experience and encourages data-driven exploration.
2. Data Connectivity:
Power BI allows seamless integration with various data sources, both on-premises and in the
cloud. This flexibility enables organizations to consolidate and analyze diverse datasets,
providing a comprehensive view of HR analytics.
3. User-Friendly Interface:
The user-friendly interface of Power BI makes it accessible to individuals with varying levels of
technical expertise. Its drag-and-drop functionality allows users to build complex visualizations
without extensive coding knowledge.
4. Real-time Updates:
Power BI can connect to real-time data sources, ensuring that dashboards are always up-to-date.
This is particularly crucial in HR analytics, where timely insights into workforce dynamics can
drive proactive decision-making.
5. Scalability:
Power BI is scalable, accommodating both small businesses and large enterprises. Its capabilities
make it suitable for handling diverse HR datasets, from basic metrics to advanced analytics,
ensuring flexibility as organizational needs evolve.
In summary, the combination of effective KPIs, filters, and visuals in HR analytics, when
leveraged through Power BI, enhances the interpretation and communication of data-driven
insights. The interactivity, connectivity, user-friendliness, and scalability of Power BI make it an
ideal platform for creating meaningful and actionable visualizations that support informed
decision-making in human resource management.