
Chapter 1

Data Analytics
What Is Data Analytics?
Data analytics is the science of analyzing raw data to make conclusions about that information.
Many of the techniques and processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human consumption.
KEY POINTS
 Data analytics is the science of analyzing raw data to make conclusions about that information.
 The techniques and processes of data analytics have been automated into mechanical processes
and algorithms that work over raw data for human consumption.
 Data analytics help a business optimize its performance.
Understanding Data Analytics
Data analytics is a broad term that encompasses many diverse types of data analysis. Any type of
information can be subjected to data analytics techniques to get insight that can be used to
improve things. Data analytics techniques can reveal trends and metrics that would otherwise be
lost in the mass of information. This information can then be used to optimize processes to
increase the overall efficiency of a business or system.
For example, manufacturing companies often record the runtime, downtime, and work queue for
various machines and then analyze the data to better plan the workloads so the machines operate
closer to peak capacity.
Data analytics can do much more than point out bottlenecks in production. Gaming companies
use data analytics to set reward schedules for players that keep the majority of players active in
the game. Content companies use many of the same data analytics to keep you clicking,
watching, or re-organizing content to get another view or another click.
Data analytics is important because it helps businesses optimize their performances.
Implementing it into the business model means companies can help reduce costs by identifying
more efficient ways of doing business and by storing large amounts of data. A company can also
use data analytics to make better business decisions and help analyze customer trends and
satisfaction, which can lead to new—and better—products and services.
Data Analysis Steps
The data analysis process involves several steps:
1. The first step is to determine the data requirements or how the data is grouped. Data may be
separated by age, demographic, income, or gender. Data values may be numerical or be divided
by category.
2. The second step in data analytics is the process of collecting it. This can be done through a
variety of sources such as computers, online sources, cameras, environmental sources, or
through personnel.
3. Once the data is collected, it must be organized so it can be analyzed. This may take place on a
spreadsheet or other form of software that can take statistical data.
4. The data is then cleaned up before analysis. This means it is scrubbed and checked to ensure
there is no duplication or error, and that it is not incomplete. This step helps correct any errors
before it goes on to a data analyst to be analyzed.
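A minimal sketch of steps 3 and 4 in pandas, assuming the collected data has already been exported to a CSV file; the file name and column names here are hypothetical:

```python
import pandas as pd

# Step 3: organize the collected data into a tabular structure
df = pd.read_csv("survey_data.csv")   # hypothetical file

# Step 4: clean the data before analysis
df = df.drop_duplicates()                             # remove duplicated records
df = df.dropna(subset=["age", "income"])              # drop rows missing key fields
df["gender"] = df["gender"].str.strip().str.lower()   # normalize a categorical column

print(df.describe())   # quick summary the analyst can start from
```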

Types of Data Analytics


Data analytics is broken down into basic types.
1. Descriptive analytics: This describes what has happened over a given period of time. Have
the number of views gone up? Are sales stronger this month than last?
2. Diagnostic analytics: This focuses more on why something happened. This involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did that latest
marketing campaign impact sales?
3. Predictive analytics: This moves to what is likely going to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict a hot
summer this year?
4. Prescriptive analytics: This suggests a course of action. If the likelihood of a hot summer,
measured as the average of these five weather models, is above 58%, we should add an evening
shift to the brewery and rent an additional tank to increase output.
5. Mechanistic analytics (requires the most effort): This seeks to understand the exact changes in
variables that lead to changes in other variables for individual objects.
6. Exploratory Data Analysis (EDA): This is an approach to analyzing data using visual techniques. It is
used to discover trends and patterns, or to check assumptions, with the help of statistical summaries
and graphical representations.
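As a rough EDA sketch (the dataset and column names are hypothetical), statistical summaries and simple plots can be produced in a few lines:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales_data.csv")   # hypothetical dataset

print(df.describe())                 # statistical summary of numeric columns

df["monthly_sales"].hist(bins=30)    # distribution of one variable
plt.title("Monthly sales distribution")
plt.show()

df.plot.scatter(x="temperature", y="beer_sales")   # relationship between two variables
plt.title("Temperature vs. beer sales")
plt.show()
```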
Data analytics underpins many quality control systems in the financial world, including
the ever-popular Six Sigma program. If you aren’t properly measuring something—
whether it's your weight or the number of defects per million in a production line—it is
nearly impossible to optimize it.
Benefits of Data Analytics

1. Decision making improves


Companies may use the information they obtain from data analytics to guide their decisions,
leading to improved results. Data analytics removes a lot of guesswork from preparing
marketing plans, deciding what material to make, creating goods, and more. With advanced
data analytics technologies, new data can be constantly gathered and analyzed to enhance your
understanding of changing circumstances.

2. Marketing becomes more effective


When businesses understand their customers better, they will be able to sell to them more
efficiently. Data analytics also gives businesses invaluable insights into how their marketing
campaigns work so that they can fine-tune them for better results.

3. Customer service improves


Data analytics provides businesses with deeper insight into their clients, helping them to
customize customer experience to their needs, offer more customization, and create better
relationships with them.

4. The efficiency of operations increases


Data analytics will help businesses streamline their operations, save resources, and improve the
bottom line. When businesses obtain a better idea of what the audience needs, they spend less
time producing advertisements that do not meet the desires of the audience.

Who Is Using Data Analytics?


Data analytics has been adopted by several sectors, such as the travel and hospitality industry,
where turnarounds can be quick. This industry can collect customer data and figure out where the
problems, if any, lie and how to fix them. Healthcare is another sector that combines the use of
high volumes of structured and unstructured data and data analytics can help in making quick
decisions. Similarly, the retail industry uses copious amounts of data to meet the ever-changing
demands of shoppers. The information retailers collect and analyze can help them identify trends,
recommend products, and increase profits.
Data Analytics vs Data Analysis
Data analysis, data analytics. Two terms for the same concept? Or different, but related,
terms?

It’s a common misconception that data analysis and data analytics are the same thing.
The generally accepted distinction is:

 Data analytics is the broad field of using data and tools to make business
decisions.
 Data analysis, a subset of data analytics, refers to specific actions.
To explain this confusion—and attempt to clear it up—we’ll look at both terms,
examples, and tools.

What is data analysis?


Consider data analysis one slice of the data analytics pie. Data analysis consists of
cleaning, transforming, modeling, and questioning data to find useful information. (It’s
generally agreed that other slices are other activities, from collection to storage to
visualization.)

The act of data analysis is usually limited to a single, already prepared dataset. You’ll
inspect, arrange, and question the data. Today, in the 2020s, software or a “machine”
usually does a first round of analysis, often directly in one of your databases or tools.
This is then augmented by a human who investigates and interrogates the data with
more context.

When you’re done analyzing a dataset, you’ll turn to other data analytics activities to:

 Give others access to the data


 Present the data (ideally with data visualization or storytelling)
 Suggest actions to take based on the data

Which is better?
Brack Nelson, Marketing Manager at Incrementors SEO Services, suggests that the
outcome of data analytics is more encompassing and beneficial than the output of data
analysis alone.

Mathematical model
A mathematical model is an abstract model that uses mathematical language
to describe the behavior of a system.
Mathematical modelling is the process of describing a real-world problem in
mathematical terms, usually in the form of equations, and then using these equations
both to help understand the original problem and to discover new features of the
problem.

Suppose you are building a rectangular sandbox for your neighbor's toddler to play in,
and you have two options available based on the building materials you have. The
sandbox can have a length of 8 feet and a width of 5 feet, or a length of 7 feet and a
width of 6 feet, and you want the sandbox to have as large an area as possible. In other
words, you want to determine which dimensions will result in the larger area of a
rectangle. Thankfully, in mathematics, we have a formula for the area (A) of a rectangle
based on its length (l) and width (w).

 A=l×w
Awesome! We can use this formula to figure out which dimensions will make a bigger
sandbox!
We can calculate the two areas by plugging in our lengths and widths for each choice:

 A1 = 8 × 5 = 40 square feet
 A2 = 7 × 6 = 42 square feet
We see that a length of 7 feet and a width of 6 feet will result in the larger area of 42
square feet. Problem solved!

Mathematical models are of different types

Linear Models
A linear model is an equation that describes a relationship between two quantities that
show a constant rate of change.

Example
The table below shows the cost of an ice cream cone y with x toppings. Write an
equation that models the relationship between x and y.

Toppings (x)    Cost (y)
2               $3.50
3               $3.75
5               $4.25

Answer
Adding one additional topping costs $0.25. If an ice cream cone with two toppings costs $3.50
and each topping costs $0.25, then a cone without any toppings must cost $3.00. Therefore, the
rate of change is 0.25, the initial value is 3, and y = 0.25x + 3, where x is the number of
toppings.
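The same line can be recovered from the table with a simple least-squares fit; this is only a sketch using the three data points above:

```python
import numpy as np

# Toppings (x) and cost (y) from the table above
x = np.array([2, 3, 5])
y = np.array([3.50, 3.75, 4.25])

# Fit a degree-1 (linear) model: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)   # approximately 0.25 and 3.0, i.e. y = 0.25x + 3
```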

Non Linear Models


If a regression equation doesn’t follow the rules for a linear model, then it must be a nonlinear
model. It’s that simple! A nonlinear model is literally not linear.

A nonlinear model describes nonlinear relationships in experimental data.


Nonlinear regression models are generally assumed to be parametric, where the
model is described as a nonlinear equation.
Example
The nonlinear regression example below models the relationship between density and electron
mobility.

The equation for the nonlinear regression analysis is too long for the fitted line plot:

Electron Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) /
(1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)
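A sketch of how such a model could be fit with SciPy's curve_fit; the data arrays below are placeholders standing in for the experimental density/mobility measurements, which are not reproduced here:

```python
import numpy as np
from scipy.optimize import curve_fit

# Rational-polynomial form of the electron-mobility model, with x = ln(density)
def mobility_model(x, b1, b2, b3, b4, b5, b6, b7):
    numerator = b1 + b2 * x + b3 * x**2 + b4 * x**3
    denominator = 1 + b5 * x + b6 * x**2 + b7 * x**3
    return numerator / denominator

# Placeholder data so the sketch runs; real values would come from the experiment
density_ln = np.linspace(0.5, 4.0, 50)
mobility = mobility_model(density_ln, 1288, 1491, 583, 75, 0.97, 0.4, 0.05)

params, _ = curve_fit(mobility_model, density_ln, mobility,
                      p0=[1000, 1000, 500, 50, 1, 0.5, 0.05])
print(params)   # fitted coefficients b1..b7
```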

Empirical models
Empirical models are supported only by experimental data.
Empirical modelling is a generic term for activities that create models
by observation and experiment. It relies on observation rather than theory.
Example: fitting a curve directly to observed sales figures, with no underlying
theory of why sales behave that way, produces an empirical model.
Mechanistic Model
A mechanistic model uses a theory to predict what will happen in the real
world. The alternative approach, empirical modeling, studies real-world
events to develop a theory.
Mechanistic models are useful if you have good data for making predictions.
For example, if you're designing a new plane, there's lots of information
on how plane design affects the plane's interaction with air pressure, wind
speed and gravity. You'd want to make some empirical tests before taking
passengers aboard, but mechanistic models can give you a good start.
Deterministic models
Deterministic models assume there is no variation in results: they do not
have any probabilistic (random) elements. The output is fully determined once the set of
inputs and the relationships in the model have been specified.
A deterministic model assumes certainty in all aspects.
Example, the conversion between Celsius and Kelvin is deterministic,
because the formula is not random…it is an exact formula that will always
give you the correct answer (assuming you perform the calculations
correctly):
Kelvin = Celsius + 273.15.
Stochastic Models
For a model to be stochastic, it must have a random variable where a level of
uncertainty exists. Due to the uncertainty present in a stochastic model, the results
provide an estimate of the probability of various outcomes.

Example
Stochastic investment models attempt to forecast the variations of prices,
returns on assets (ROA), and asset classes—such as bonds and stocks—
over time.
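A small sketch contrasting the two kinds of model: the temperature conversion is deterministic, while the toy price simulation below is stochastic (the numbers are made up for illustration):

```python
import random

def celsius_to_kelvin(celsius):
    # Deterministic: the same input always produces the same output
    return celsius + 273.15

def simulate_price(start=100.0, days=30):
    # Stochastic: each run differs because of the random daily change
    price = start
    for _ in range(days):
        price += random.gauss(0, 1)
    return price

print(celsius_to_kelvin(25))                 # always 298.15
print(simulate_price(), simulate_price())    # two different outcomes
```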
Black Box Model
In science, computing, and engineering, a black box is a device, system, or
object which produces useful information without revealing any information
about its internal workings. The explanations for its conclusions remain
opaque or “black.”
Financial analysts, hedge fund managers, and investors may use software
that is based on a black-box model in order to transform data into a useful
investment strategy.
Black box is shorthand for models that are sufficiently complex that they are
not straightforwardly interpretable to humans.
 A black box model receives inputs and produces outputs but its
workings are unknowable.
 Black box models are increasingly used to drive decision-making in the
financial markets.
 Technology advances, particularly in machine learning capabilities,
make it impossible for a human mind to analyze or understand
precisely how black box models produce their conclusions.
Machine learning techniques have greatly contributed to the growth and
sophistication of black box models.

Descriptive modeling

Descriptive modeling is a mathematical process that describes real-world events and the
relationships between factors responsible for them. The process is used by consumer-
driven organizations to help them target their marketing and advertising efforts.

In descriptive modeling, customer groups are clustered according to demographics,


purchasing behavior, expressed interests and other descriptive factors. Statistics can
identify where the customer groups share similarities and where they differ. The most active
customers get special attention because they offer the greatest ROI (return on investment).

The main aspects of descriptive modeling include:

 Customer segmentation: Partitions a customer base into groups with various


impacts on marketing and service.
 Value-based segmentation: Identifies and quantifies the value of a customer to
the organization.
 Behavior-based segmentation: Analyzes customer product usage and
purchasing patterns.
 Needs-based segmentation: Identifies ways to capitalize on motives that drive
customer behavior.

Descriptive modeling can help an organization to understand its customers, but predictive
modeling is necessary to facilitate the desired outcomes.
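A minimal customer-segmentation sketch using k-means clustering; the customer attributes and values below are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer attributes: [age, yearly spend, purchases per month]
customers = np.array([
    [25,  300,  2],
    [34, 1200,  8],
    [52,  200,  1],
    [41, 1500, 10],
    [29,  450,  3],
    [60,  150,  1],
])

# Cluster the customer base into segments based on these descriptive factors
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)   # segment assigned to each customer
```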

Evaluation Metrics
Introduction
Evaluation metrics are tied to machine learning tasks. There are
different metrics for the tasks of classification and regression. Some
metrics, like precision-recall, are useful for multiple tasks.
Classification and regression are examples of supervised learning,
which constitutes a majority of machine learning applications. Using
different metrics for performance evaluation, we should be able to
improve our model’s overall predictive power before we roll it out for
production on unseen data. Evaluating a machine learning model only on
accuracy, without using other evaluation metrics, can lead to problems when the
model is deployed on unseen data and may result in poor
predictions.
In the next section, I’ll discuss the Classification evaluation metrics
that could help in the generalization of the ML classification model.
Classification Metrics
Classification is about predicting the class labels given input data. In
binary classification, there are only two possible output classes (i.e., a
dichotomy). In multiclass classification, more than two possible
classes can be present. I’ll focus only on binary classification.

A very common example of binary classification is spam detection,


where the input data could include the email text and metadata
(sender, sending time), and the output label is either “spam” or “not
spam.” (See Figure) Sometimes, people use some other names also
for the two classes: “positive” and “negative,” or “class 1” and
“class 0.”

Figure — Email spam detection is a binary classification problem (source: Evaluating Machine Learning Models, O’Reilly)

There are many ways of measuring classification performance.
Accuracy, confusion matrix, log-loss, and AUC-ROC are some of the
most popular metrics. Precision-recall is a widely used metric for
classification problems.
Accuracy
Accuracy simply measures how often the classifier predicts
correctly. We can define accuracy as the ratio of the number of
correct predictions to the total number of predictions.
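A small sketch of this definition, computed by hand and with scikit-learn on made-up labels:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (hypothetical)

# Accuracy = number of correct predictions / total number of predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                           # 0.75
print(accuracy_score(y_true, y_pred))   # same value via scikit-learn
```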

When a model gives an accuracy rate of 99%, you might think it is
performing very well, but this is not always true and
can be misleading in some situations. I am going to explain this with
the help of an example.

Consider a binary classification problem, where a model can achieve


only two results, either model gives a correct or incorrect prediction.
Now imagine we have a classification task to predict whether an image is a
dog or a cat. In a supervised learning
algorithm, we first fit/train a model on training data, then test the
model on testing data. Once we have the model’s predictions from
the X_test data, we compare them to the true y_values (the correct
labels).
We feed the image of a dog into the trained model. If the
model predicts that this is a dog, we compare the
prediction to the correct label and count it as correct. If the model predicts that this image
is a cat, we again compare it to the correct label, and it
would be incorrect.

We repeat this process for all images in X_test data. Eventually,


we’ll have a count of correct and incorrect matches. But in reality, it
is very rare that all incorrect or correct matches hold equal value.
Therefore one metric won’t tell the entire story.
Accuracy is useful when the target class is well balanced but is not
a good choice for unbalanced classes. Imagine a scenario
where we had 99 images of dogs and only 1 image of a cat
in our training data. Then a model that always predicts
"dog" would achieve 99% accuracy. In reality, data is often imbalanced,
for example in spam email detection, credit card fraud, and
medical diagnosis. Hence, if we want a better model
evaluation and a full picture of model performance, other
metrics such as recall and precision should also be considered.

Confusion Matrix is a performance measurement for the machine


learning classification problems where the output can be two or
more classes. It is a table with combinations of predicted and actual
values.

A confusion matrix is defined as the table that is often used
to describe the performance of a classification model on a
set of test data for which the true values are known.
It is extremely useful for measuring the Recall, Precision, Accuracy,
and AUC-ROC curves.
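A short sketch of building a confusion matrix with scikit-learn, reusing the hypothetical labels from the accuracy example:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (hypothetical)

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```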

Let’s try to understand TP, FP, FN, and TN using a pregnancy analogy.
True Positive: We predicted positive and it’s true. In the image, we
predicted that a woman is pregnant and she actually is.

True Negative: We predicted negative and it’s true. In the image, we


predicted that a man is not pregnant and he actually is not.

False Positive (Type 1 Error)- We predicted positive and it’s false. In


the image, we predicted that a man is pregnant but he actually is
not.

False Negative (Type 2 Error)- We predicted negative and it’s false.


In the image, we predicted that a woman is not pregnant but she
actually is.
We discussed Accuracy, now let’s discuss some other metrics of the
confusion matrix

1. Precision — Precision explains how many of the correctly predicted


cases actually turned out to be positive. Precision is useful in the
cases where False Positive is a higher concern than False Negatives.
The importance of Precision is in music or video recommendation
systems, e-commerce websites, etc. where wrong results could lead
to customer churn and this could be harmful to the business.

Precision for a label is defined as the number of true positives


divided by the number of predicted positives.

2. Recall (Sensitivity) — Recall explains how many of the actual


positive cases we were able to predict correctly with our model. It is
a useful metric in cases where False Negative is of higher concern
than False Positive. It is important in medical cases where it doesn’t
matter whether we raise a false alarm but the actual positive cases
should not go undetected!

Recall for a label is defined as the number of true positives divided


by the total number of actual positives.
3. F1 Score — It gives a combined idea about Precision and Recall
metrics. It is maximum when Precision is equal to Recall.

F1 Score is the harmonic mean of precision and recall.

The F1 score punishes extreme values more. F1 Score could be an


effective evaluation metric in the following cases:
 When FP and FN are equally costly.
 Adding more data doesn’t effectively change the outcome
 True Negative is high
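A sketch computing precision, recall, and the F1 score with scikit-learn on the same hypothetical labels used above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (hypothetical)

print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3 / 4 = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3 / 4 = 0.75
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```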

4. AUC-ROC — The Receiver Operator Characteristic (ROC) is a


probability curve that plots the TPR(True Positive Rate) against the
FPR(False Positive Rate) at various threshold values and separates
the ‘signal’ from the ‘noise’.

The Area Under the Curve (AUC) is a measure of the ability of a
classifier to distinguish between classes. Geometrically, it is simply
the area enclosed between the ROC curve and the X-axis.

From the graph shown below, the greater the AUC, the better the
performance of the model at different threshold points in separating the
positive and negative classes. This simply means that when AUC is
equal to 1, the classifier is able to perfectly distinguish between all
Positive and Negative class points. When AUC is equal to 0, the
classifier would be predicting all Negatives as Positives and vice
versa. When AUC is 0.5, the classifier is not able to distinguish
between the Positive and Negative classes.

Image Source — https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/

Working of AUC — In a ROC curve, the X-axis shows the False
Positive Rate (FPR) and the Y-axis shows the True Positive Rate (TPR).
A higher X-axis value means a higher number of false positives (FP)
relative to true negatives (TN), while a higher Y-axis value
indicates a higher number of true positives (TP) relative to false negatives (FN).
So, the choice of threshold depends on the ability to balance between FP and FN.
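A sketch of computing the ROC curve and AUC with scikit-learn; the labels and predicted probabilities below are made up for illustration:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                       # actual labels (hypothetical)
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]    # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)      # points of the ROC curve
print(roc_auc_score(y_true, y_scores))                  # area under the curve
```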

5. Log Loss — Log loss (Logistic loss) or Cross-Entropy Loss is one of


the major metrics to assess the performance of a classification
problem.

For a single sample with true label y ∈ {0, 1} and a probability
estimate p = Pr(y = 1), the log loss is:

Log loss = −(y · log(p) + (1 − y) · log(1 − p))

Conclusion
Understanding how well a machine learning model will perform on
unseen data is the main purpose behind working with these
evaluation metrics. Metrics like accuracy, precision, and recall are good
ways to evaluate classification models for balanced datasets, but if
the data is imbalanced then other methods like ROC/AUC perform
better in evaluating model performance.

The ROC curve isn’t just a single number; it’s a whole curve that
provides nuanced details about the behavior of the classifier. However, it is
also hard to quickly compare many ROC curves to each other.
Class Imbalance

Data are said to suffer the Class Imbalance Problem when the class
distributions are highly imbalanced. In this context, many classification
learning algorithms have low predictive accuracy for the infrequent
class. Cost-sensitive learning is a common approach to solve this problem.

Class imbalanced datasets occur in many real-world applications where the


class distributions of data are highly imbalanced. For the two-class case,
without loss of generality, one assumes that the minority or rare class is
the positive class, and the majority class is the negative class. Often the
minority class is very infrequent, such as 1% of the dataset. If one applies
most traditional (cost-insensitive) classifiers on the dataset, they are likely
to predict everything as negative (the majority class). This was often
regarded as a problem in learning from highly imbalanced datasets.
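A minimal sketch of cost-sensitive learning with scikit-learn on a synthetic imbalanced dataset; class_weight="balanced" weights mistakes on the rare class more heavily:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic dataset where the positive class is only about 5% of the samples
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Cost-sensitive learning: errors on the rare (positive) class are penalized more
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(clf.score(X, y))
```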
