Unit 4_BI

Uploaded by itskanishka1202
UNIT-4

BI – DATA ANALYTICS PROCESS


DATA ANALYTICS
Data analytics is the science of analyzing raw data to draw conclusions
from it. Many of the techniques and processes of data
analytics have been automated into mechanical processes and
algorithms that work over raw data for human consumption.
Key Takeaways about Data Analytics
• Data analytics is the science of analyzing raw data to draw conclusions about that
information.

• Data analytics helps a business optimize its performance, operate more efficiently, maximize
profit, or make more strategically guided decisions.

• The techniques and processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human consumption.

• Various approaches to data analytics include looking at what happened (descriptive
analytics), why something happened (diagnostic analytics), what is going to happen
(predictive analytics), or what should be done next (prescriptive analytics).

• Data analytics relies on a variety of software tools, including spreadsheets, data visualization
and reporting tools, data mining programs, and open-source languages that offer the most flexible data
manipulation.
Data Analysis Steps

The data analysis process involves several steps:

1. The first step is to determine the data requirements, that is, how the data is grouped. Data may
be separated by age, demographic, income, or gender. Data values may be numerical or
divided by category.

2. The second step is collecting the data. This can be done through a
variety of sources such as computers, online sources, cameras, environmental sources, or
personnel.

3. Once collected, the data must be organized so it can be analyzed. This may take place
in a spreadsheet or other software that can handle statistical data.

4. The data is then cleaned before analysis. It is scrubbed and checked to ensure that
there are no duplicates or errors and that it is complete. This step corrects
errors before the data goes to a data analyst for analysis.
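Steps 3 and 4 can be sketched with pandas. This is a minimal illustration, not a prescribed workflow: the column names and values are hypothetical, and the cleaning rules (drop exact duplicate rows, then drop incomplete rows) are just one reasonable choice.

```python
import pandas as pd

# Hypothetical raw survey records gathered in step 2
raw = pd.DataFrame({
    "respondent_id": [101, 102, 102, 103, 104],
    "age_group": ["18-24", "25-34", "25-34", "35-44", None],
    "income": [32000, 45000, 45000, None, 51000],
})

# Step 3: organize, giving the grouping column a categorical type
raw["age_group"] = raw["age_group"].astype("category")

# Step 4: clean, removing duplicate rows and then rows with missing values
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

print(clean)
```

Here respondent 102 appears twice and respondents 103 and 104 are incomplete, so only respondents 101 and 102 remain. In practice, analysts often impute missing values instead of dropping rows; which choice fits depends on the data.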
DATA ANALYTICS TYPES
Types of Data Analytics

Data analytics is broken down into four basic types:


1. Descriptive analytics: This describes what has happened over a given period of time. Have
the number of views gone up? Are sales stronger this month than last?

2. Diagnostic analytics: This focuses on why something happened. It involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did the
latest marketing campaign affect sales?

3. Predictive analytics: This moves to what is likely to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict
a hot summer this year?

4. Prescriptive analytics: This suggests a course of action. If the average of five weather
models puts the likelihood of a hot summer above 58%, we should add an evening shift to
the brewery and rent an additional tank to increase output.
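The descriptive question ("are sales stronger this month?") and the prescriptive rule in the brewery example can be sketched in a few lines of Python. The sales figures and the five model probabilities are hypothetical; only the 58% threshold comes from the example above.

```python
# Descriptive analytics: what happened? (hypothetical monthly sales)
sales = {"May": 1200, "June": 1500}
change = (sales["June"] - sales["May"]) / sales["May"]
print(f"Sales change month over month: {change:.0%}")


def recommend(probabilities, threshold=0.58):
    """Prescriptive rule: act only if the average model probability exceeds the threshold."""
    average = sum(probabilities) / len(probabilities)
    if average > threshold:
        return "Add an evening shift and rent an additional tank"
    return "Keep the current production plan"


# Hypothetical hot-summer probabilities from five weather models
model_probabilities = [0.62, 0.55, 0.71, 0.49, 0.66]
print(recommend(model_probabilities))
```

The averaged probability here is about 0.61, so the rule recommends adding capacity. Real prescriptive systems usually weigh costs and benefits rather than a single fixed threshold.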
Data Analytics Techniques

Data analysts can use several analytical methods and techniques to process data and extract information. Some of
the most popular methods include:

• Regression analysis examines the relationship between a dependent variable and one or more independent variables to determine how a change
in one may affect another.

• Factor analysis takes a data set with many variables and reduces it to a smaller number of underlying factors. The goal is
to discover hidden patterns that would otherwise be difficult to see.

• Cohort analysis breaks a data set into groups of similar data, often by customer
demographic. This allows data analysts and other users of data analytics to dive further into the numbers relating
to a specific subset of data.

• Monte Carlo simulations model the probability of different outcomes by running many trials with randomly sampled inputs. They're often used for risk
mitigation and loss prevention. Because they incorporate many values and variables, they describe the full range of likely outcomes
rather than a single-point forecast.

• Time series analysis tracks data over time and examines how the value of a data point relates to the time at which
it occurs. This technique is usually used to spot cyclical trends or to project
financial forecasts.
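Two of the techniques above, regression analysis and Monte Carlo simulation, can be sketched with NumPy. Everything in this sketch is hypothetical: the ad-spend and sales figures, and the assumed cost distributions (normal material cost, uniform labour cost) used in the simulation.

```python
import numpy as np

# Regression analysis: how does sales (dependent) change with ad spend (independent)?
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical, in thousands
sales = np.array([3.1, 4.9, 7.2, 8.8, 11.0])     # hypothetical, in thousands
slope, intercept = np.polyfit(ad_spend, sales, deg=1)  # least-squares straight line
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")

# Monte Carlo simulation: probability that a quarterly cost exceeds the budget
rng = np.random.default_rng(seed=0)
n_trials = 100_000
material = rng.normal(loc=50.0, scale=5.0, size=n_trials)   # assumed distribution
labour = rng.uniform(low=20.0, high=40.0, size=n_trials)    # assumed distribution
total = material + labour
budget = 85.0
p_over = np.mean(total > budget)  # share of simulated quarters over budget
print(f"Estimated probability of exceeding the budget: {p_over:.3f}")
```

The fitted slope (about 1.97) says each extra unit of ad spend is associated with roughly two units of sales in this toy data. The simulation result is an estimate that tightens as `n_trials` grows.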
The Role of Data Analytics
Types of Analytical Techniques in BI: Descriptive
Business Intelligence (BI) involves the use of various analytical techniques to help organizations gain
insights from their data. Descriptive analytics, the first stage of the analytics process, focuses on
summarizing historical data to provide a clear picture of what has happened. Here are some common types
of analytical techniques used in descriptive analytics within BI:

Reporting:
Reporting is the foundation of descriptive analytics. It involves presenting data in a structured format,
often in the form of tables, charts, and graphs. Reporting tools like dashboards and scorecards are used
to provide a snapshot of key performance indicators (KPIs) and historical data.

Data Visualization:
Data visualization techniques use charts, graphs, heatmaps, and other graphical representations to make
data more understandable and accessible. Examples include bar charts, line charts, pie charts, and
scatter plots.

Key Performance Indicators (KPIs):
KPIs are specific, measurable metrics that are critical to an organization's performance. Descriptive
analytics often involves tracking and reporting KPIs to assess historical performance.
Drill-Down Analysis:
Drill-down analysis allows users to start with high-level data and progressively dig deeper into more
detailed data. This is useful for investigating specific issues or understanding the contributing factors behind
certain trends.
Slice and Dice:
This technique involves breaking data down into smaller, more manageable subsets. Users can "slice" data
by fixing a single dimension to one value and "dice" it by selecting values on multiple dimensions to get a more detailed
view of the data.
Data Aggregation:
Data aggregation involves summarizing large datasets to provide an overview. Common aggregation
methods include sum, average, count, and maximum/minimum values.
Trend Analysis:
Trend analysis identifies patterns and changes in data over time. Time-series analysis is a common method
used to understand historical trends and predict future patterns.
Cohort Analysis:
Cohort analysis groups data into segments or cohorts based on common characteristics, such as customer
acquisition date or demographics. It helps identify how different groups perform over time.
Data Mining:
Data mining techniques help uncover hidden patterns and relationships in data. This can include
association analysis, clustering, and classification to identify historical trends or correlations.
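Several of the techniques above (aggregation, slice and dice, drill-down) map directly onto pandas filtering and `groupby`. The sales table below is hypothetical and exists only to illustrate the operations.

```python
import pandas as pd

# Hypothetical sales records with three dimensions: year, region, product
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "region":  ["North", "North", "South", "North", "South", "South"],
    "product": ["A", "B", "A", "A", "A", "B"],
    "revenue": [100, 150, 200, 120, 210, 90],
})

# Data aggregation: summary values over the whole dataset
summary = sales["revenue"].agg(["sum", "mean", "count", "min", "max"])

# Slice: fix one dimension to a single value (year 2024 only)
slice_2024 = sales[sales["year"] == 2024]

# Dice: select a sub-cube using values on several dimensions
dice = sales[(sales["region"] == "South") & (sales["product"] == "A")]

# Drill-down: revenue by region, then broken down further by product
by_region = sales.groupby("region")["revenue"].sum()
by_region_product = sales.groupby(["region", "product"])["revenue"].sum()

print(summary, slice_2024, dice, by_region, by_region_product, sep="\n\n")
```

Drilling down from `by_region` to `by_region_product` shows that most of the South's revenue (410 of 500) comes from product A.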
Descriptive Analytics in BI

There are four main steps in descriptive analytics:

1. Data collection: Collecting data from various sources such as sales reports,
customer surveys, social media, etc.

2. Data preparation: Cleaning and organizing the data so it can be analyzed.

3. Exploratory data analysis: Analyzing the data to find trends, patterns, and
relationships.

4. Data visualization: Creating graphs and charts to visualize the data and
make it easy to understand.
How is Descriptive Analytics Used in BI?

There are several uses for the metrics generated by descriptive analytics, including:

1. Reports: Descriptive analytics is used to produce the primary financial indicators found
in a company's financial statements. It is also often used in other
typical reports to highlight specific areas of business performance.

2. Visualizations: Metrics can be communicated more effectively to a larger audience
when displayed in charts and other graphic forms.

3. Dashboards: Dashboards are tools that executives, managers, and other staff
members can use to monitor progress and organize their daily workload. Dashboards
offer a selection of KPIs and other crucial data tailored to the needs of each
individual. To help people quickly digest the information, it may be presented as charts
or other visualizations.
The Five Steps Descriptive BI Involves

Step 1: Define business metrics.
Step 2: Identify the data required.
Step 3: Extract and preprocess the data.
Step 4: Analyze the data.
Step 5: Present the data.

Descriptive statistics used in the analysis include:
• Measures of frequency.
• Measures of central tendency.
• Measures of dispersion.
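The three families of measures can be computed with Python's standard library alone. The daily order counts below are hypothetical; `Counter` gives frequencies, and the `statistics` module provides the mean, median, mode, and sample standard deviation.

```python
import statistics
from collections import Counter

# Hypothetical daily order counts over two weeks
orders = [12, 15, 15, 18, 20, 15, 22, 12, 19, 15, 17, 21, 16, 18]

# Measures of frequency: how often each value occurs
frequency = Counter(orders)

# Measures of central tendency
mean = statistics.mean(orders)
median = statistics.median(orders)
mode = statistics.mode(orders)

# Measures of dispersion
value_range = max(orders) - min(orders)
std_dev = statistics.stdev(orders)  # sample standard deviation

print(frequency.most_common(3))
print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} std_dev={std_dev:.2f}")
```

With an even number of values, the median is the average of the two middle values (16 and 17), which is why it is 16.5 here.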
Predictive Social Media Analytics
Predictive Analytics in Social Media
• Predictive analytics is the application of statistical algorithms, data,
and machine learning techniques to predict the probabilities of future
outcomes based on past data.

• Social media, one of the most significant data sources in the current
age, has enormous potential for this type of analytics that can provide
insights and predictions on human behavior. Predictive analytics on
social media can help companies learn about user behavior, make
informed decisions, and boost growth.
How Predictive Analytics Works in Social Media

• Predictive analytics for social media is based on gathering and
analyzing the huge quantities of data that users generate. The data
comprises user activity such as comments, shares, likes, and
posts, along with demographic information such as gender,
age, location, and other preferences.

• The collected data is examined and processed by machine learning
algorithms to detect patterns and connections between the
various variables. The algorithms then make predictions based on the
patterns found in the analysis.
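As a minimal sketch of the idea, the example below trains a logistic regression with plain NumPy gradient descent to estimate the probability that a post becomes highly engaging from its early shares and comments. The data, the "high engagement" label, and the choice of logistic regression are all assumptions made for illustration. Real social media models use far more data and features, and usually rely on a library such as scikit-learn.

```python
import numpy as np

# Hypothetical history: [shares in first hour, comments in first hour] per post
X = np.array([[1, 0], [2, 1], [0, 1], [3, 2],
              [15, 9], [20, 12], [18, 7], [25, 15]], dtype=float)
# 1 = the post later became highly engaging (hypothetical label)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Standardize features so gradient descent behaves well
mean, std = X.mean(axis=0), X.std(axis=0)
Xb = np.hstack([np.ones((len(X), 1)), (X - mean) / std])  # intercept column first

# Fit logistic regression weights by gradient descent on the mean log-loss
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probabilities
    w -= 0.5 * Xb.T @ (p - y) / len(y)  # gradient step


def predict_proba(shares, comments):
    """Estimated probability that a new post becomes highly engaging."""
    x = (np.array([shares, comments], dtype=float) - mean) / std
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1:] @ x)))


print(f"{predict_proba(22, 10):.2f}", f"{predict_proba(1, 1):.2f}")
```

The model outputs a probability rather than a yes/no answer, which matches the definition above of predicting the probabilities of future outcomes from past data.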
Benefits of Using Predictive Analytics in Social Media
Exploratory Data Analysis (EDA) is a technique for analyzing data using
visual techniques. With EDA we can get a detailed statistical
summary of the data. We can also deal with duplicate values and
outliers, and see any trends or patterns present in the dataset.
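The duplicate and outlier checks mentioned above can be sketched with pandas. The small sample below is hypothetical, with one repeated row and one implausible value planted on purpose. The 1.5 × IQR fence is a common rule of thumb for flagging outliers, not the only option.

```python
import pandas as pd

# Hypothetical measurements with one duplicate row and one extreme value
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.9, 6.3, 5.8, 12.0, 6.1],
    "sepal_width":  [3.5, 3.0, 3.0, 3.3, 2.7, 3.1, 2.8],
})

# Statistical summary of each numeric column
print(sample.describe())

# Duplicates: count exact repeated rows, then drop them
n_duplicates = sample.duplicated().sum()
sample = sample.drop_duplicates()

# Outliers: flag values outside 1.5 * IQR beyond the first and third quartiles
q1, q3 = sample["sepal_length"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (sample["sepal_length"] < q1 - 1.5 * iqr) | (sample["sepal_length"] > q3 + 1.5 * iqr)
outliers = sample[is_outlier]

print(f"{n_duplicates} duplicate row(s); outliers:\n{outliers}")
```

Flagged outliers should be investigated rather than deleted automatically, since an extreme value can be a genuine observation.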

The Iris dataset is considered the "Hello World" of data science. It
contains five columns, namely Petal Length, Petal Width, Sepal Length,
Sepal Width, and Species Type. Iris is a flowering plant; researchers
measured various features of different iris flowers and
recorded them digitally.
Exploratory Data Analysis: Iris Dataset

The iris data are a data frame of 150 measurements of iris petal and
sepal lengths and widths, with 50 measurements for each species of
“setosa,” “versicolor,” and “virginica.” Let us assume that we are doing
some computation on the sepal length.
Three classes in Iris Dataset
There are three classes in this data set (Setosa, Versicolor and Virginica),
each having 50 patterns with four features (sepal length, sepal width,
petal length and petal width). One of the classes (Setosa) is linearly
separable from the other two, while the remaining two (Versicolor and Virginica) are not
linearly separable from each other.
Objective of Iris Dataset

The Iris dataset covers three classes of flowers, Versicolor, Setosa, and
Virginica, each described by four features: sepal length, sepal width, petal
length, and petal width. The goal of this Iris dataset project is to predict a flower's
species from these measurements.

Machine learning is about making predictions on unseen (test) data, and it
relies on a set of algorithms to perform these tasks. There
are three types of machine learning: supervised, unsupervised,
and reinforcement learning. Here we take the Iris dataset and use the K-Nearest
Neighbors (KNN) classification algorithm. The purpose is to build
a model that can automatically recognize the iris species. The tools
used are NumPy, pandas, Matplotlib, and a machine learning library.
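To show what KNN does internally, here is a from-scratch sketch in NumPy. It is not the library implementation the text refers to (scikit-learn's `KNeighborsClassifier` is a common choice). The nine training rows are illustrative measurements in the style of the Iris data, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one sample by majority vote among its k nearest training samples."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to each sample
    nearest = np.argsort(distances)[:k]                     # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)            # count labels among neighbours
    return votes.most_common(1)[0][0]


# Illustrative rows: sepal length, sepal width, petal length, petal width (cm)
X_train = np.array([
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [5.0, 3.4, 1.5, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [5.9, 3.0, 4.2, 1.5],
    [6.3, 3.3, 6.0, 2.5], [7.1, 3.0, 5.9, 2.1], [6.5, 3.0, 5.8, 2.2],
])
y_train = np.array(["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3)

print(knn_predict(X_train, y_train, np.array([5.0, 3.3, 1.4, 0.2])))
```

With the full dataset, one would instead split the 150 rows into training and test sets and measure accuracy on the test rows, since machine learning is judged on unseen data.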
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

df.shape

Output:

(150, 6)
# We can see that the dataframe contains 6 columns and 150 rows.

df.info()

# We can see that only one column (Species) has categorical data and all the other columns are numeric with non-null entries.
The describe() function applies basic statistical computations to the dataset, such as extreme
values, the count of data points, standard deviation, etc. Any missing (NaN) value is
automatically skipped. describe() gives a good picture of the distribution of the data.
df.describe()


# We can see the count of each column along with their mean value, standard deviation, minimum and maximum values.
Checking Missing Values
We will check whether our data contains any missing
values. Missing values can occur when no information is provided for
one or more items or for a whole unit. We will use the isnull() method.
df.isnull().sum()
Data Visualization
Visualizing the target column
• Our target column is the Species column, because the end goal is to predict
the species. Let's look at a count plot of the
species.

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Count how many samples belong to each species
sns.countplot(x='Species', data=df)
plt.show()
