BI FILE
EXPERIMENT-1
Aim:- Set up Power BI Desktop.
Theory:-Power BI is a collection of software services, apps, and connectors that work together to turn
your unrelated sources of data into coherent, visually immersive, and interactive insights. Your data
might be an Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses.
Power BI lets you easily connect to your data sources, visualize and discover what's important, and share
that with anyone or everyone you want.
Procedure:-
Setting up Power BI: -
5. To open the Power BI Desktop, on the taskbar, click the Microsoft Power BI Desktop shortcut.
6. When prompted, click Sign In.
7. In Power BI Desktop, at the top-right corner, verify that you see your account.
EXPERIMENT-2
Aim:- Import the data from different sources and load into the target system.
Theory:-Power BI provides several methods for loading data into the target system, offering flexibility to cater to diverse data sources and scenarios. The primary ways include:
1. Direct Query: This method establishes a live connection to the data source, enabling real-time data analysis without importing it into Power BI. It is suitable for large datasets where the focus is on current, up-to-the-minute insights. However, query performance depends on the speed of the underlying data source.
2. Import Data: The import option involves loading data directly into the Power BI model, which resides in-memory. This approach is effective for smaller to moderately sized datasets, providing faster query performance. Users can leverage Power Query to shape the data during import.
3. Power Query Editor: A versatile tool within Power BI, Power Query Editor enables users to connect to various data sources, apply transformations, and shape the data before loading it into the model. This method allows for comprehensive data preparation and cleansing.
4. Dataflows: Dataflows facilitate a Power Query experience in the cloud, where data
transformations are applied in a Power
BI service workspace. This provides a reusable and centralized approach for data preparation,
promoting consistency in data
processing across reports and dashboards.
5. Power Automate Integration: Power BI can integrate with Power Automate (formerly known as Microsoft Flow) to automate data loading processes. This integration enables users to trigger data refreshes based on predefined schedules, events, or triggers from external systems.
Procedure: -
1. In the Power Query Editor window, in the Queries pane, select the DimEmployee query.
2. To rename the query, in the Query Settings pane (located at the right), in the Name box,
replace the text with Salesperson, and then press Enter. The query name will determine
the model table name. It’s recommended to define concise, yet friendly, names.
3. In the Queries pane, verify that the query name has been updated.
4. To locate a specific column, on the Home ribbon tab, from inside the Manage
Columns group, click the Choose Columns down-arrow, and then select Go
to Column.
5. In the Go to Column window, to order the list by column name, click the AZ sort button,
and then select Name.
8. Click OK.
9. In the Query Settings pane, in the Applied Steps list, notice the addition of the Filtered
Rows step.
10. To remove columns, on the Home ribbon tab, from inside the Manage Columns
group, click the Choose Columns icon.
11. In the Choose Columns window, to uncheck all columns, uncheck the (Select All Columns) item.
15. To create a single name column, first select the FirstName column header.
16. While pressing the Ctrl key, select the LastName column.
17. Right-click either of the selected column headers, and then in the context menu, select Merge Columns.
18. In the Merge Columns window, in the Separator dropdown list, select Space.
19. In the New Column Name box, replace the text with Salesperson.
20. Click OK.
21. To rename the EmployeeNationalIDAlternateKey column, double-click the EmployeeNationalIDAlternateKey column header.
22. Replace the text with EmployeeID, and then press Enter.
23. Use the previous steps to rename the EmailAddress column to UPN.
24. At the bottom-left, in the status bar, verify that the query has five columns and 18 rows.
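For readers who want to trace the same shaping logic in Python, here is a minimal pandas sketch; the sample rows are made up, and only the column names come from the steps above.
import pandas as pd
# Illustrative stand-in for the DimEmployee source table (values are made up)
dim_employee = pd.DataFrame({
    "EmployeeNationalIDAlternateKey": ["14417807", "253022876"],
    "FirstName": ["Alice", "Bob"],
    "LastName": ["Smith", "Jones"],
    "EmailAddress": ["alice@example.com", "bob@example.com"],
})
# Merge FirstName and LastName with a space separator into a Salesperson column
salesperson = dim_employee.assign(
    Salesperson=dim_employee["FirstName"] + " " + dim_employee["LastName"]
).drop(columns=["FirstName", "LastName"])
# Rename columns, mirroring steps 21-23
salesperson = salesperson.rename(columns={
    "EmployeeNationalIDAlternateKey": "EmployeeID",
    "EmailAddress": "UPN",
})
print(salesperson)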
EXPERIMENT-3
Aim:- Perform the Extract, Transform, Load (ETL) process to build the Power BI data model.
Theory:-Each loading method in Power BI aligns with theoretical principles of data integration and ETL, allowing users to choose the most suitable approach based on their data volume, performance requirements, and real-time analysis needs. In Power BI, the Extract, Transform, Load (ETL) process is a theoretical framework guiding the construction of a database within the Power BI model; a brief pandas sketch of this flow follows the list below.
1. Extract: The process begins with data extraction from various sources, such as
databases, spreadsheets, or cloud services.
Power BI supports a wide range of connectors, adhering to the principle of extracting diverse data
types. The extraction
aligns with the theoretical foundation of data connectivity, ensuring compatibility with various
source systems.
2. Transform: Power BI's Power Query Editor plays a pivotal role in the transformation phase. Rooted in theoretical principles of relational algebra, it allows users to shape and clean the data. The transformations include filtering, merging, and aggregating data, ensuring data quality and relevance. The M language, based on functional programming theory, underpins these transformations.
3. Load: The transformed data is loaded into the Power BI model, which follows principles
of relational database theory.
The tabular data model organizes data into tables, leveraging relationships between them. This
optimized structure
adheres to normalization principles, ensuring efficient storage, retrieval, and query performance
within the model.
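As an illustration of this ETL pattern only (Power BI performs these steps internally through Power Query and its in-memory model), here is a minimal pandas sketch; the file name, column names, and SQLite target are all assumptions.
import sqlite3
import pandas as pd
# Extract: read raw data from a source file (hypothetical path and columns)
raw = pd.read_csv("sales_raw.csv")
# Transform: clean and aggregate the data
clean = raw.dropna(subset=["amount"])
summary = clean.groupby("product_id", as_index=False)["amount"].sum()
# Load: write the shaped table into a target store (SQLite used for illustration)
with sqlite3.connect("warehouse.db") as conn:
    summary.to_sql("product_sales", conn, if_exists="replace", index=False)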
Procedure: -
1. Open Power BI and Connect to the Data Source
2. Enter the connection settings for the data source. You may be asked for a user ID
and password upon clicking OK.
3. Select the Tables You Need
4. Transform the Data Based on Your Requirements
a. Create a New Query by Referencing an Existing Table
i. Click Group By
ii. Then, click Advanced. We need to group using two columns.
iii. Click Add grouping. When a dropdown list appears, select the appropriate column (for example, product_id).
iv. Then, define the aggregation. Enter Total Premium in the New column name box, select the Operation, and last, select the premium column to sum.
v. Finally, click OK to create the grouping (a pandas sketch of this grouping follows below).
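The same grouping can be expressed in pandas; the second grouping column and all values below are made up for illustration.
import pandas as pd
# Illustrative policy-level data (region is a hypothetical second grouping column)
policies = pd.DataFrame({
    "product_id": [101, 101, 102, 102],
    "region": ["East", "West", "East", "West"],
    "premium": [500.0, 750.0, 300.0, 450.0],
})
# Group on two columns and sum the premium, mirroring the Group By dialog
total_premium = (
    policies.groupby(["product_id", "region"], as_index=False)["premium"]
    .sum()
    .rename(columns={"premium": "Total Premium"})
)
print(total_premium)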
ii. If you do not see all seven tables, scroll horizontally to the right, and then drag and
arrange the tables more closely together so they can all be seen at the same time.
iii. To return to the Report view, at the left, click the Report view icon.
iv. To view all table fields, in the Fields pane, right-click an empty area, and then select Expand All.
v. To create a table visual, in the Fields pane, from inside the Product table, check
the Category field.
vi. To add a column to the table, in the Fields pane, check the Sales | Sales field.
vii. Notice that the table visual lists four product categories, and that the sales value is the same for each, and the same for the total.
viii. On the Modeling ribbon tab, from inside the Relationships group, click Manage Relationships.
ix. In the Manage Relationships window, notice that no relationships are yet defined.
x. To create a relationship, click New.
xi. In the Create Relationship window, in the first dropdown list, select the Product table.
xii. Select other appropriate settings like cardinality, cross-filter direction, etc.
xiii. Click OK.
xiv. In the Manage Relationships window, notice that the new relationship is listed, and
then click Close.
xv. In the report, notice that the table visual has been updated to display different values for
each product category.
xvi. Filters applied to the Product table now propagate to the Sales table
xvii. Switch to Model view, and then notice there is now a connector between the two tables.
xviii. To create a new relationship, from the Reseller table, drag the ResellerKey column on to the
ResellerKey column of the Sales table.
xix. Create other relationships as required.
xx. Save the Power BI Desktop files.
d. Create Hierarchy
i. In Model view, in the Fields pane, if necessary, expand the Product table.
ii. To create a hierarchy, in the Fields pane, right-click the Category column, and then
select Create Hierarchy.
iii. In the Properties pane (to the left of the Fields pane), in the Name box, replace the
text with Products.
iv. To add the second level to the hierarchy, in the Hierarchy dropdown list, select Subcategory.
v. To add the third level to the hierarchy, in the Hierarchy dropdown list, select Product.
vi. To complete the hierarchy design, click Apply Level Changes.
EXPERIMENT-4
Aim:- Visualize data in Power BI using different chart types.
Theory:-
Data visualization within the context of Business Intelligence (BI) is intricately tied to the Extract,
Transform, Load (ETL) process, contributing to the effective communication of insights. Theoretical
foundations guide this process:
1. Data Extraction (Extract): In the initial phase of ETL, data is sourced from diverse
systems. Theoretical principles of data compatibility and integration inform decisions on
extracting relevant information from structured and unstructured sources, ensuring data
consistency.
2. Data Transformation (Transform): The transformation phase involves shaping and structuring
the data. Theoretical aspects of data cleaning, normalization, and enrichment are applied,
aligning with principles from relational database theory. Transformation also encompasses the
preparation of data for optimal visual representation.
3. Data Loading (Load): Loaded data is organized within BI systems, often adopting
principles from data modeling theories. A well-structured data model facilitates efficient
querying and supports the creation of meaningful visualizations.
4. Visualization Design: Theoretical concepts from visual perception and cognitive psychology
guide the design of data visualizations. Principles like Edward Tufte's data-ink ratio, color
theory, and Gestalt principles are employed to create visually appealing and informative
dashboards and reports.
5. Interactivity and User Experience: Theoretical underpinnings of human-computer interaction
influence the incorporation of interactive elements. Users can explore data dynamically,
allowing for a more intuitive and engaging experience, aligning with usability and user
experience design theories.
Procedure:-
In Power BI Desktop, open the Retail Analysis sample PBIX file in Report view, and then select Edit.
There are various methods available in Power BI for visualization. These include:
Area Chart
Doughnut chart
1>On the Visualizations pane, select the icon for doughnut chart to convert your bar
chart to a doughnut chart. If Last Year Sales isn't in the Values section of the
Visualizations pane, drag it there.
2>Select Item > Category to add it to the Legend area of the Visualizations pane.
Cards
1>On the Data pane, expand Store and select the Open Store Count checkbox. By default,
Power BI creates a clustered column chart with a single data value.
You can convert the chart to a card visualization.
Combo chart
7>In the upper-right corner of the visual, select the More options ellipsis (...) and select Sort axis > FiscalMonth.
8>From the Fields pane, drag Sales > Last Year Sales to the Line y-axis bucket.
9>The combo chart now shows Last Year Sales as a line over the column chart.
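For comparison, a doughnut chart like the one above can be sketched in matplotlib; the category names and sales figures are made up.
import matplotlib.pyplot as plt
# Hypothetical Last Year Sales by category
categories = ["Home", "Fashion", "Electronics", "Groceries"]
sales = [120000, 95000, 80000, 60000]
# A doughnut chart is a pie chart with a hollow center (wedge width < 1)
plt.pie(sales, labels=categories, autopct="%1.1f%%", wedgeprops=dict(width=0.4))
plt.title("Last Year Sales by Category")
plt.show()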
EXPERIMENT-5
Software Used:-Python
Theory:-
In Business Intelligence (BI), classification algorithms play a crucial role in analyzing and categorizing data to support data-driven decision-making. These algorithms are rooted in machine learning, specifically supervised learning, where they learn from labeled historical data to predict the category of new, unseen data; a short scikit-learn sketch follows the list below.
1. Training Phase: The classification algorithm is initially trained using a labeled dataset,
learning the associations between input features and specific categories. This process is
grounded in statistical and probabilistic concepts, aiming to identify meaningful patterns and
correlations in the data.
2. Feature Selection and Engineering: Principles of feature selection and engineering are
applied to enhance the model’s capacity to differentiate between classes. Important features are
selected based on their relevance, following information theory and data dimensionality
reduction principles.
3. Model Construction: The algorithm develops a predictive model, often utilizing theoretical
frameworks such as decision trees, support vector machines, or neural networks. These
models embody the learned associations and are used for classifying new data.
4. Evaluation and Validation: Statistical concepts guide the evaluation and validation of
the classification model. Metrics such as precision, recall, and accuracy measure the
model’s effectiveness and ability to generalize to new data.
5. Deployment in BI: Once trained and validated, the classification model is deployed within BI
systems to classify and categorize incoming data. This application of machine learning
enhances BI capabilities, providing insights in areas like customer segmentation, fraud
detection, risk assessment, and other classification-focused business applications.
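A minimal scikit-learn sketch of this train-evaluate workflow; the iris dataset is an arbitrary stand-in for labeled business data, and the support vector machine is one of the model families named above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score
# Labeled historical data: features X and known categories y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Training phase: the model learns associations between features and classes
model = SVC()
model.fit(X_train, y_train)
# Evaluation: precision, recall, and accuracy on held-out data
pred = model.predict(X_test)
print("Accuracy :", accuracy_score(y_test, pred))
print("Precision:", precision_score(y_test, pred, average="macro"))
print("Recall   :", recall_score(y_test, pred, average="macro"))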
Procedure:-
import pandas as pd
import matplotlib.pyplot as plt
# Monthly rainfall values (taken from the output below)
rainfall = [799.0, 1174.8, 865.1, 1334.6, 635.4, 918.5,
            685.5, 998.6, 784.2, 985.0, 882.8, 1071.0]
# Create a pandas DataFrame with a DatetimeIndex to represent the time series data
start_date = "2012-01-01"
end_date = "2013-01-01"  # Adjusted to include December 2012
date_range = pd.date_range(start=start_date, end=end_date, freq='M')  # month-end dates ('ME' in newer pandas)
rainfall_df = pd.DataFrame(rainfall, index=date_range, columns=["Rainfall"])
print(rainfall_df)
rainfall_df.plot(title="Monthly Rainfall", legend=False)
plt.show()
Output:-
Rainfall
2012-01-31 799.0
2012-02-29 1174.8
2012-03-31 865.1
2012-04-30 1334.6
2012-05-31 635.4
2012-06-30 918.5
2012-07-31 685.5
2012-08-31 998.6
2012-09-30 784.2
2012-10-31 985.0
2012-11-30 882.8
2012-12-31 1071.0
EXPERIMENT-6
Theory:-A Decision Tree is a supervised learning algorithm useful for both classification and regression
tasks. It visually represents a decision-making process, making it particularly intuitive and easy to
interpret. Key features of Decision Trees are outlined below:
1. Tree Structure: A Decision Tree is made up of nodes, branches, and leaves. Each node signifies
a decision or test on an attribute, each branch indicates an outcome of that decision, and each
leaf represents a class label (in classification) or a predicted value (in regression).
2. Recursive Partitioning: The process of constructing a Decision Tree involves recursively
splitting the dataset into subsets based on the values of the most informative attributes, aiming
to produce subsets that are as homogenous as possible in terms of the target variable.
3. Attribute Selection: At each node, the algorithm chooses the attribute that provides the optimal
split, typically using metrics such as Information Gain (for classification) or Mean Squared
Error (for regression).
4. Pruning: Decision Trees may overfit by capturing noise in the data. Pruning is a technique
used to reduce overfitting by removing branches that do not significantly enhance the tree's
performance on a validation dataset.
5. Interpretability: A major advantage of Decision Trees is their interpretability. The
decision-making process is easily visualized, making them highly valuable for explaining model
predictions.
Procedure:-
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
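The source lists only the imports above, so what follows is a minimal completion, assuming a built-in dataset (iris) since none is named.
# Load a small labeled dataset (the choice of iris is an assumption)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
# Fit a shallow tree; limiting max_depth curbs overfitting, as discussed under Pruning
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)
# Visualize the tree structure: nodes, branches, and leaves
plt.figure(figsize=(12, 6))
plot_tree(clf, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()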
Output:-
EXPERIMENT-7
Theory:-K-means clustering is a widely used unsupervised machine learning algorithm for dividing a
dataset into distinct, non-overlapping groups or clusters. The algorithm's goal is to group similar data
points closely while keeping different clusters as distinct as possible. It works through an iterative
process, adjusting cluster centroids (representative points) until the solution converges. In Business
Intelligence (BI), K-means clustering has several valuable applications, such as customer segmentation, market basket analysis, and anomaly detection.
Procedure:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
# Generate sample 2-D data (make_blobs is an assumption; the source does not name a dataset)
X, _ = datasets.make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)
# Fit K-means with k=3 clusters and record each point's cluster assignment
k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
cluster_assignments = kmeans.fit_predict(X)
# Plot each cluster with its own color and marker
colors = ['red', 'green', 'blue']
markers = ['o', 's', '^']
for i in range(k):
    plt.scatter(X[cluster_assignments == i, 0], X[cluster_assignments == i, 1],
                color=colors[i], marker=markers[i], label=f'Cluster {i + 1}')
# Mark the cluster centers
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            color='black', marker='X', s=200, label='Centers')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.title('K-means Clustering (k=3)')
plt.show()
Output:-
A scatter plot of the three clusters and their centers.
EXPERIMENT-8
Theory:
Tracking employee productivity is essential for optimizing workplace efficiency and identifying areas of
improvement. Business Intelligence (BI) systems can analyse employee performance data to extract
meaningful insights and trends. This experiment demonstrates the use of Python to visualize and analyse
productivity metrics such as task completion, working hours, and efficiency.
Applications in BI:
1. Performance Analysis: Identify high-performing employees and areas where support is needed.
2. Task Prioritization: Analyse which tasks consume the most time to optimize workflows.
3. Resource Allocation: Ensure resources are distributed effectively to maximize productivity.
4. Employee Engagement: Understand patterns in engagement and motivation based on productivity
trends.
Procedure:
# Importing required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Sample dataset: Employee productivity data
data = {
'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Hours Worked': [40, 35, 45, 50, 38],
'Tasks Completed': [10, 8, 12, 15, 9],
'Efficiency (%)': [90, 85, 92, 88, 87]
}
# Creating a DataFrame
df = pd.DataFrame(data)
# Displaying the dataset
print("Employee Productivity Data:")
print(df)
# Visualizing the data: Bar chart for Hours Worked vs Tasks Completed
plt.figure(figsize=(10, 6))
plt.bar(df['Employee'], df['Hours Worked'], color='blue', alpha=0.7, label='Hours Worked')
plt.bar(df['Employee'], df['Tasks Completed'], color='green', alpha=0.7, label='Tasks Completed')
plt.xlabel('Employee')
plt.ylabel('Count')
plt.title('Hours Worked vs Tasks Completed')
plt.legend()
plt.show()
# Scatter plot: Hours Worked vs Efficiency
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df, x='Hours Worked', y='Efficiency (%)', hue='Employee', s=100)
plt.title('Employee Efficiency Analysis')
plt.xlabel('Hours Worked')
plt.ylabel('Efficiency (%)')
plt.show()
# Heatmap: Correlation analysis
plt.figure(figsize=(6, 4))
correlation = df[['Hours Worked', 'Tasks Completed', 'Efficiency (%)']].corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Analysis')
plt.show()
# Insights: Identifying the most and least productive employee
most_productive = df.loc[df['Efficiency (%)'].idxmax()]
least_productive = df.loc[df['Efficiency (%)'].idxmin()]
print(f"Most Productive Employee: {most_productive['Employee']} (Efficiency: {most_productive['Efficiency
(%)']}%)")
print(f"Least Productive Employee: {least_productive['Employee']} (Efficiency: {least_productive['Efficiency
(%)']}%)")
Output: