Unit 4_BI
• Data analytics helps a business optimize its performance, operate more efficiently, maximize
profit, and make more strategically guided decisions.
• The techniques and processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human consumption.
• Data analytics relies on a variety of software tools including spreadsheets, data visualization,
reporting tools, data mining programs, and open-source languages for the greatest data
manipulation.
Data Analysis Steps
1. The first step is to determine the data requirements or how the data is grouped. Data may
be separated by age, demographic, income, or gender. Data values may be numerical or
divided by category.
2. The second step is to collect the data. This can be done through a
variety of sources such as computers, online sources, cameras, environmental sources, or
through personnel.
3. The data must be organized after it's collected so it can be analyzed. This may take place
on a spreadsheet or other form of software that can handle statistical data.
4. The data is then cleaned up before analysis. It's scrubbed and checked to ensure that
there's no duplication or error and that it is not incomplete. This step helps correct any
errors before the data goes on to a data analyst to be analyzed.
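The cleaning step (step 4) can be sketched with pandas. This is a minimal illustration, not a complete cleaning pipeline: the column names, the values, and the rule that a negative income counts as an error are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical raw records; column names and values are made up for illustration.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4],
        "age": [34, 51, 51, None, 29],
        "income": [52000, 61000, 61000, 48000, -1],
    }
)

# Remove exact duplicate rows (customer 2 appears twice).
deduplicated = raw.drop_duplicates()

# Drop incomplete rows (customer 3 has no age).
complete = deduplicated.dropna()

# Filter out values treated as errors; the "income >= 0" rule is an assumption.
clean = complete[complete["income"] >= 0].reset_index(drop=True)

print(clean)
```

Only the records that are unique, complete, and plausible reach the analyst. In practice the validity rules depend on the data requirements set in step 1.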
Types of Data Analytics
1. Descriptive analytics: This describes what has happened over a given period by
summarizing historical data.
2. Diagnostic analytics: This focuses more on why something happened. It involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did that
latest marketing campaign impact sales?
3. Predictive analytics: This moves to what is likely going to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict
a hot summer this year?
4. Prescriptive analytics: This suggests a course of action. We should add an evening shift to
the brewery and rent an additional tank to increase output if the likelihood of a hot
summer is measured as an average of these five weather models and the average is above
58%.
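The prescriptive rule in the brewery example is simple arithmetic: average five model probabilities and compare the result to the 58% threshold. The sketch below shows this. The five probabilities and the function name recommend_action are hypothetical; only the threshold and the recommended action come from the example above.

```python
from statistics import mean

# Hypothetical probabilities of a hot summer from five weather models.
model_probabilities = [0.62, 0.55, 0.71, 0.48, 0.66]

THRESHOLD = 0.58  # the 58% cutoff from the brewery example


def recommend_action(probabilities, threshold=THRESHOLD):
    """Return a recommendation based on the average model probability."""
    if mean(probabilities) > threshold:
        return "add an evening shift and rent an additional tank"
    return "keep current capacity"


average_probability = mean(model_probabilities)
print(f"Average probability: {average_probability:.1%}")
print("Recommendation:", recommend_action(model_probabilities))
```

With these assumed inputs the average is 60.4%, which is above the threshold, so the rule recommends expanding capacity.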
Data Analytics Techniques
Data analysts can use several analytical methods and techniques to process data and extract information. Some of
the most popular methods include:
• Regression analysis entails analyzing the relationship between a dependent variable and one or more
independent variables to determine how a change in one may affect the other.
• Factor analysis entails taking a large data set and shrinking it into a smaller data set. The goal of this maneuver is
to attempt to discover hidden trends that would otherwise have been more difficult to see.
• Cohort analysis is the process of breaking a data set into groups of similar data, often into a customer
demographic. This allows data analysts and other users of data analytics to further dive into the numbers relating
to a specific subset of data.
• Monte Carlo simulations model the probability of different outcomes happening. They're often used for risk
mitigation and loss prevention. These simulations incorporate multiple values and variables and often have
greater forecasting capabilities than other data analytics approaches.
• Time series analysis tracks data over time and relates the value of a data point to
the time at which it occurred. This data analysis technique is usually used to spot cyclical trends or to project
financial forecasts.
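Two of the techniques above, regression analysis and Monte Carlo simulation, can be sketched in plain Python. Everything numeric here is an assumption chosen for illustration: the temperature and sales figures, the demand and margin distributions, and the fixed cost. The function names fit_line and simulate_loss_probability are hypothetical as well.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def simulate_loss_probability(trials=10_000, seed=0):
    """Monte Carlo estimate of the chance that monthly profit is negative."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fixed_costs = 1800  # assumed fixed monthly cost
    losses = 0
    for _ in range(trials):
        units_sold = rng.gauss(1000, 200)    # assumed demand distribution
        unit_margin = rng.uniform(1.0, 3.0)  # assumed margin per unit
        if units_sold * unit_margin - fixed_costs < 0:
            losses += 1
    return losses / trials


# Regression: hypothetical temperature (independent) vs. sales (dependent).
temperature = [20, 25, 30, 35]
sales = [140, 165, 190, 215]
slope, intercept = fit_line(temperature, sales)
print(f"sales = {slope:.1f} * temperature + {intercept:.1f}")

# Monte Carlo: share of simulated months that end in a loss.
loss_probability = simulate_loss_probability()
print(f"Estimated probability of a loss: {loss_probability:.1%}")
```

The regression answers "how much do sales change per degree?", while the simulation answers "how often does a bad outcome occur across many random scenarios?". The quality of either answer depends on how realistic the assumed inputs are.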
The Role of Data Analytics
Types of Analytical Techniques in BI: Descriptive
Business Intelligence (BI) involves the use of various analytical techniques to help organizations gain
insights from their data. Descriptive analytics, the first stage of the analytics process, focuses on
summarizing historical data to provide a clear picture of what has happened. Here are some common types
of analytical techniques used in descriptive analytics within BI:
Reporting:
Reporting is the foundation of descriptive analytics. It involves presenting data in a structured format,
often in the form of tables, charts, and graphs. Reporting tools like dashboards and scorecards are used
to provide a snapshot of key performance indicators (KPIs) and historical data.
Data Visualization:
Data visualization techniques use charts, graphs, heatmaps, and other graphical representations to make
data more understandable and accessible. Examples include bar charts, line charts, pie charts, and
scatter plots.
1. Data collection: Collecting data from various sources such as sales reports,
customer surveys, social media, etc.
3. Exploratory data analysis: Analyzing the data to find trends, patterns, and
relationships.
4. Data visualization: Creating graphs and charts to visualize the data and
make it easy to understand.
How is Descriptive Analytics Used in BI?
There are several uses for the metrics generated by descriptive analytics, including:
1. Reports: Descriptive analytics is used to provide the primary financial indicators found
in a company's financial statements. It is also often used in other
typical reports to emphasize specific areas of business performance.
3. Dashboards: Dashboards are a tool that executives, managers, and other staff
members can use to monitor progress and organize their daily workload. Dashboards
offer a selection of KPIs and other crucial data that are tailored to the needs of each
individual. To help people quickly digest the information, it may be presented as charts
or other visualizations.
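The KPIs that feed a report or dashboard are usually simple summaries of historical records. The following is a minimal sketch using pandas. The order data, the column names, and the choice of KPIs are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical historical order records.
orders = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "South"],
        "revenue": [120.0, 80.0, 200.0, 50.0, 50.0],
    }
)

# Summarize what has happened: one row of KPIs per region.
kpis = orders.groupby("region")["revenue"].agg(
    total_revenue="sum",
    order_count="count",
    average_order_value="mean",
)

print(kpis)
```

A dashboard would typically display a table like kpis as charts and refresh it as new records arrive.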
The Five Steps Descriptive BI Involves
• Social media, one of the most significant data sources in the current
age, has enormous potential for this type of analytics that can provide
insights and predictions on human behavior. Predictive analytics on
social media can help companies learn about user behavior, make
informed decisions, and boost growth.
How Predictive Analytics Works in Social Media
The iris data are a data frame of 150 measurements of iris petal and
sepal lengths and widths, with 50 measurements for each species of
“setosa,” “versicolor,” and “virginica.” Let us assume that we are doing
some computation on the sepal length.
Three classes in Iris Dataset
There are three classes in this data set (Setosa, Versicolor and Virginica),
each having 50 patterns with four features (sepal length, sepal width,
petal length and petal width). One of the classes (viz. Setosa) is linearly
separable from the other two, while the remaining two are not linearly
separable.
Objective of Iris Dataset
The Iris dataset covers three classes of flowers (Versicolor, Setosa, and
Virginica), each described by four features: 'sepal length', 'sepal width', 'petal
length', and 'petal width'. This Iris dataset project aims to predict the species of a flower
based on these characteristics.
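The snippets that follow operate on a DataFrame named df, but these notes do not show how it is loaded. The sketch below shows one way a CSV of this kind could be read with pandas. The file name in the comment, the column names other than Species, and the six illustrative rows are all assumptions. The full dataset has 150 rows, so this small sample will not reproduce the shapes reported later.

```python
import io

import pandas as pd

# Six illustrative rows standing in for the full 150-row file.
# Column names other than "Species" are assumed, not taken from these notes.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
"""

# With a real file this would be something like pd.read_csv("Iris.csv"),
# where the file name is hypothetical.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(sample_df.head())
print(sample_df.shape)
```

The sample is named sample_df to keep it distinct from the full df used in the rest of the walkthrough.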
df.shape
Output:
(150, 6)
# We can see that the dataframe contains 6 columns and 150 rows.
df.info()
Output:
# We can see that only one column has categorical data and all the other columns are of the numeric type with non-null entries.
The describe() function applies basic statistical computations on the dataset, such as extreme
values, count of data points, standard deviation, etc. Any missing value or NaN value is
automatically skipped. The describe() function gives a good picture of the distribution of data.
df.describe()
Output:
# We can see the count of each column along with their mean value, standard deviation, minimum and maximum values.
Checking Missing Values
We will check whether our data contains any missing values. Missing values can occur
when no information is provided for one or more items or for a whole unit. We will use
the isnull() method.
df.isnull().sum()
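To see how isnull().sum() and describe() behave when values really are missing, here is a small sketch on a made-up frame. The column name SepalLengthCm and all values are assumptions. The frame has gaps only to illustrate the behavior and says nothing about whether the iris data itself has any.

```python
import pandas as pd

# Hypothetical measurements with one missing number and one missing label.
sample = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, float("nan"), 6.3, 5.8],
        "Species": ["Iris-setosa", "Iris-setosa", None, "Iris-virginica"],
    }
)

# Count the missing entries in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# describe() summarizes numeric columns by default and skips NaN values,
# so the count for SepalLengthCm is 3 rather than 4.
summary = sample.describe()
print(summary)
```

A count of zero for every column in isnull().sum() means there is nothing to fill in or drop before analysis.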
Data Visualization
Visualizing the target column
• Our target column is the Species column, because species is what we ultimately want
to predict. Let's look at a count plot of the number of samples per
species.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# count plot of the number of rows for each value of Species
sns.countplot(x='Species', data=df)
plt.show()