DA CH1 SLQA
1. Define data science.
= 1) Data science is the field of applying advanced analytics techniques and scientific
principles to extract valuable information from data for business decision-making, strategic
planning and other uses. 2) The purpose of data science is to find patterns. Understanding
patterns means understanding the world. 3) Data science work can include developing
strategies for analyzing data, preparing data for analysis, exploring, analyzing, and
visualizing data, building models with data using programming languages, such as Python
and R, and deploying models into applications.
2. Define the term analytics.
= Analytics is the process of discovering, interpreting, and communicating significant
patterns in data. Quite simply, analytics helps us see insights and meaningful data that we
might not otherwise detect.
3. Enlist types of data analytics.
= Predictive analytics (forecasting), Descriptive analytics (business intelligence and data
mining), Prescriptive analytics (optimization and simulation), Diagnostic analytics, and
Cognitive analytics.
4. Define data analysis.
= Data Analysis is the process of systematically applying statistical and/or logical
techniques to describe and illustrate, condense and recap, and evaluate data.
5. Define mathematical model.
= A structural model of a system is a mathematical relationship between one or several
input variables and parameters and one or several output variables.
6. What is the purpose of diagnostic analytics?
= The purpose of diagnostic analytics is to determine the root cause of an occurrence or
trend.
7. Define class imbalance.
= A classification data set with skewed class proportions is called imbalanced. Classes that
make up a large proportion of the data set are called majority classes. Those that make up
a smaller proportion are minority classes.
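The imbalance can be seen directly from the class proportions. A minimal sketch (the fraud-detection labels below are hypothetical, purely for illustration):

```python
import pandas as pd

# Hypothetical labels for a fraud data set: 0 = legitimate, 1 = fraud
labels = pd.Series([0] * 950 + [1] * 50)

# Class proportions reveal the skew: 95% majority class vs 5% minority class
print(labels.value_counts(normalize=True))
# 0    0.95   <- majority class
# 1    0.05   <- minority class
```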
8. Differentiate between predictive analytics and prescriptive analytics. Any two
points.
= 1) Predictive analytics predicts what is most likely to happen in the future, while
prescriptive analytics recommends actions you can take to affect those outcomes.
2) Predictive analytics provides you with the raw material for making informed decisions,
while prescriptive analytics provides you with data-backed decision options that you can
weigh against one another.
9. Define exploratory analysis.
= Exploratory Data Analysis refers to the critical process of performing initial investigations
on data so as to discover patterns, to spot anomalies, to test hypotheses and to check
assumptions with the help of summary statistics and graphical representations.
10. Define linear model.
= Linear models describe a continuous response variable as a function of one or more
predictor variables. They can help you understand and predict the behavior of complex
systems or analyze experimental, financial, and biological data.
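As an illustration, a minimal sketch of fitting a linear model with scikit-learn (the synthetic data below is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a continuous response as a function of one predictor
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2x

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # estimated slope and intercept
print(model.predict([[6.0]]))         # predicted response for a new input
```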
11. What is model evaluation?
= Model evaluation is the process of using different evaluation metrics to understand a
machine learning model's performance, as well as its strengths and weaknesses.
12. Define predictive analytics.
= Predictive analytics is a branch of advanced analytics that makes predictions about future
outcomes using historical data combined with statistical modeling, data mining techniques
and machine learning.
13. What is the purpose of AUC and ROC curves?
= The AUC-ROC curve is a performance measurement for classification problems at
various threshold settings. ROC is a probability curve and AUC represents the degree or
measure of separability, i.e., how well the model can distinguish between classes.
14. Define baseline model.
= A baseline model is essentially a simple model that acts as a reference in a machine
learning project.
15. Define descriptive analytics.
= Descriptive analytics is the process of using current and historical data to identify trends
and relationships.
16. Define the terms metric and classifier.
= Metrics are the exact, quantitative numbers used to describe data and to measure a
model's performance. A classifier is a model that performs classification, i.e., predicting
the class labels given input data.
2. What is data analytics? Enlist its different roles. Also state its advantages and
disadvantages.
= 1) Data analytics (DA) is the process of examining data sets in order to find trends and
draw conclusions about the information they contain. 2) A business intelligence analyst's
primary job is to extract value from their company's data. At most companies, BI Analysts
need to be comfortable analyzing data, working with SQL, and creating data visualizations
and models.
Advantages: Data analytics helps businesses get real-time insights about sales,
marketing, finance, product development, and more. It allows teams within businesses to
collaborate and achieve better results. It is useful for businesses to analyze past business
performance and optimize future business processes. Disadvantages: Companies may
exchange these useful customer databases for their mutual benefit, which raises privacy
concerns. The cost of data analytics tools varies based on the applications and features
supported. Moreover, some data analytics tools are complex to use and require training.
5. Differentiate between data analysis and data analytics.
= Data Analysis: *It is described as a particularized form of data analytics. *It analyzes
data by focusing on insights into business data. *It uses tools such as Rapid Miner,
OpenRefine, NodeXL, KNIME, etc. *Descriptive analysis can be performed on it. *One
cannot find hidden (previously unknown) relationships with it. *It supports inferential
analysis.
Data Analytics: *It is described as the traditional or generic form of analytics. *It supports
decision making by analyzing enterprise data. *It uses tools to process data such as
Tableau, Python, Excel, etc. *Descriptive analysis cannot be performed on it. *One can find
hidden (previously unknown) relationships with it. *It does not deal with inferential
analysis.
9. Write a short note on: Mechanistic analytics.
= Mechanistic analysis measures the exact changes in variables that lead to changes in
other variables.
3. With the help of a diagram, describe the lifecycle of data analytics.
= Diagram: Discovery <-> Data Preparation <-> Model Planning <-> Model Building <->
Communicate Results -> Operationalize
1) Discovery: The data science team is trained and researches the issue. It creates context,
gains understanding, and learns about the data sources that are needed and accessible to
the project. The team comes up with an initial hypothesis, which can later be confirmed with
evidence.
2) Data Preparation: This phase explores the possibilities of pre-processing, analyzing, and
preparing data before analysis and modeling. It requires an analytic sandbox; the team
performs extract, load, and transform (ELT) steps to bring information into the sandbox.
Data preparation tasks can be repeated and need not follow a predetermined sequence.
Some of the tools commonly used for this process include Hadoop, Alpine Miner,
OpenRefine, etc.
3) Model Planning: The team studies the data to discover the connections between
variables, and then selects the most significant variables as well as the most effective
models. Some of the tools commonly used for this stage are MATLAB and STATISTICA.
4) Model Building: The team creates data sets for training, testing, and production use, and
builds and implements models based on the work completed in the model planning phase.
The team also evaluates whether its current tools are sufficient to run the models or
whether a more robust environment is required. Free or open-source tools: R and PL/R,
Octave, WEKA. Commercial tools: MATLAB, STATISTICA.
5) Communicate Results: Following the execution of the model, team members evaluate
the outcomes of the model to establish criteria for its success or failure. The team
considers how best to present findings and outcomes to the various members of the team
and other stakeholders, taking into consideration caveats and assumptions. The team
should determine the most important findings, quantify their value to the business, and
create a narrative to present and summarize the findings for all stakeholders.
6) Operationalize: The team distributes the benefits of the project to a wider audience. It
sets up a pilot project that deploys the work in a controlled manner before expanding to the
entire enterprise of users. This technique allows the team to gain insight into the
performance and constraints of the model within a production setting at a small scale and
make necessary adjustments before full deployment. The team produces the final reports,
presentations, and code. Open-source or free tools used include WEKA, SQL, MADlib, and
Octave.
4. Explain four layers in data analytics framework diagrammatically.
= Use Cases: A use case is the manner in which the business user leverages data and the
analytics system to derive insights that answer tangible business questions for decision
making.
Data Sets: A data set is a collection of related, discrete items of data that may be accessed
individually or in combination, or managed as a whole entity. A data set is organized into
some type of data structure.
Data Collection: The data collection framework (DCF) is an application built to collect data
from a number of source and inventory systems implemented across the enterprise.
Data Preparation: Data preparation is the process of cleaning and transforming raw data
prior to processing and analysis.
Learning and Intelligent Actions: A system that delivers trustworthy, reliable data, while
also providing intelligence about said data, or metadata.
6. What are the types of data analytics? Describe two of them in detail.
= There are four main types of analytics: descriptive, diagnostic, predictive, and
prescriptive.
Predictive analytics is a branch of advanced analytics that makes predictions about
future outcomes using historical data combined with statistical modeling, data mining
techniques and machine learning. Prescriptive analytics is the process of using data to
determine an optimal course of action. By considering all relevant factors, this type of
analysis yields recommendations for next steps. Prescriptive analytics can cut through the
clutter of immediate uncertainty and changing conditions. It can help prevent fraud, limit
risk, increase efficiency, meet business goals, and create more loyal customers.
8. What is exploratory analytics? What is its purpose? Explain with example.
= 1) Exploratory Data Analysis (EDA) is an approach to analyzing data using visual
techniques. 2) Its purpose is the critical process of performing initial investigations on data
so as to discover patterns, to spot anomalies, to test hypotheses and to check assumptions
with the help of summary statistics and graphical representations. 3) Example: You are
open to the fact that any
number of people might buy any number of different types of shoes. You visualize the data
using exploratory data analysis to find that most customers buy 1-3 different types of
shoes. Sneakers, dress shoes, and sandals seem to be the most popular ones.
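A minimal EDA sketch for the shoe example above (the purchase data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical data: number of distinct shoe types bought per customer
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "shoe_types_bought": [1, 2, 3, 1, 2, 7],
})

# Summary statistics show the typical range (1-3) and expose the outlier
print(df["shoe_types_bought"].describe())

# A histogram gives the graphical view of the same distribution
df["shoe_types_bought"].hist(bins=7)
```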
10. What is a mathematical model? List its types. Explain two of them in detail.
= 1) A mathematical model is a mathematical relationship between one or several input
variables and parameters and one or several output variables; mathematical modeling is
used to analyze complex, large-scale systems and complex structured and unstructured
data sets, in particular financial data sets.
2) Types: Linear Algebra & Calculus, Statistics, Machine Learning/Statistical Models, and
Operations Research.
3) Linear algebra & calculus would be considered the most basic, especially given the
“Deep Learning” environment that we are in. Deep learning requires us to understand
linear algebra and calculus to understand how it works, for example forward propagation,
backward propagation, parameter setting, etc.
Statistics here means simple statistics such as measures of centrality, distributions and
different probability distributions (Weibull, Poisson, etc.), Bayes' Theorem (there is a strong
emphasis on it when it comes to learning about Artificial Intelligence later), hypothesis
testing, etc.
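As a worked example of the statistics listed above, a small sketch of Bayes' Theorem in Python (the spam-filter probabilities are hypothetical):

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# A = "email is spam", B = "email contains the word 'offer'"
p_a = 0.20              # P(spam)
p_b_given_a = 0.60      # P('offer' | spam)
p_b_given_not_a = 0.05  # P('offer' | not spam)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(p_b_given_a * p_a / p_b)  # P(spam | 'offer') = 0.75
```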
12. What is a baseline model? Enlist two of them in detail.
= 1) A baseline model is essentially a simple model that acts as a reference in a machine
learning project. Its main function is to contextualize the results of trained models. Baseline
models usually lack complexity and may have little predictive power. Regardless, their
inclusion is a necessity for many reasons.
There are three types of baseline models: random baseline models, ML baseline models,
and automated ML baseline models.
1) Random Baseline Models: In the real world, data cannot always be predicted. For such
problems, the best baseline model is a dummy classifier or dummy regressor. That
baseline shows you whether your ML model is learning or not (see the sketch below).
2) Automated ML Baseline Models: AutoML uses machine learning to analyze the structure
and meaning of text data. You can use AutoML to train an ML model to classify text data,
extract information, or understand the sentiment of authors.
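A minimal sketch of the random/dummy baseline using scikit-learn's DummyClassifier, with logistic regression standing in as the trained model (scikit-learn's built-in breast-cancer data set is used purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Dummy baseline: always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# The trained model should clearly beat the baseline, otherwise it is not learning
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("baseline accuracy:", baseline.score(X_test, y_test))
print("model accuracy:   ", model.score(X_test, y_test))
```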
19. Write a short note on: Evaluating value prediction models.
= To get the true value of a predictive model, you have to know how well your model fits
the data. Your model should also withstand changes in the data set, or being put through a
completely new data set. To start, you need to be clear about what business challenge the
model is helping to solve.
13. How to evaluate a model? Describe in detail.
= Model evaluation is the process of using different evaluation metrics to understand a
machine learning model's performance, as well as its strengths and weaknesses. Model
evaluation is important to assess the efficacy of a model during initial research phases, and
it also plays a role in model monitoring.
Evaluation is a process during development of the model to check whether the model is
the best fit for the given problem and the corresponding data. For example, Keras models
provide an evaluate() function that performs the evaluation of the model (see the sketch
below).
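A minimal sketch of evaluate() in Keras (the tiny model and the random data are hypothetical, only to show the call):

```python
import numpy as np
from tensorflow import keras

# Hypothetical data: 100 samples, 4 features, binary labels
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# A tiny model, just enough to demonstrate evaluate()
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)

# evaluate() returns the loss and the configured metrics for the given data
loss, accuracy = model.evaluate(X, y, verbose=0)
print(loss, accuracy)
```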
14. Write a short note on: Metrics for evaluating classifiers.
= Classification is about predicting the class labels given input data. In binary classification,
there are only two possible output classes (i.e., Dichotomy). In multiclass classification,
more than two possible classes can be present. Here we focus only on binary
classification. A very common example of binary classification is spam detection, where
the input data could
include the email text and metadata (sender, sending time), and the output label is
either “spam” or “not spam.” Sometimes, people use some other names also for the two
classes: “positive” and “negative,” or “class 1” and “class 0.”
17. What is ROC curve? How to implement it? Explain with example.
= 1) A Receiver Operator Characteristic (ROC) curve is a graphical plot used to show the
diagnostic ability of binary classifiers. It was first used in signal detection theory but is now
used in many other areas such as medicine, radiology, natural hazards and machine
learning.
2) Recipe (implementation steps): *Import the libraries (e.g., GridSearchCV). *Set up the
data. *Split the data and train the model. *Use the model on the test dataset. *Create false
and true positive rates and print the scores. *Plot the ROC curves (see the sketch below).
3) ROC curves are frequently used to show in a graphical way the connection/trade-off
between clinical sensitivity and specificity for every possible cut-off for a test or a
combination of tests. In addition, the area under the ROC curve gives an idea about the
benefit of using the test(s) in question.
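A minimal sketch of the recipe above (scikit-learn's built-in breast-cancer data set stands in for the data; the GridSearchCV tuning step is omitted for brevity):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Set up and split the data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the model and score the test data set
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# Create false and true positive rates at every threshold, and print the AUC
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

# Plot the ROC curve against the chance diagonal
plt.plot(fpr, tpr, label="ROC curve")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```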
16. Define accuracy, precision, recall and f-score.
= 1. Accuracy — Accuracy is the proportion of all predictions that the model gets right: the
number of correct predictions divided by the total number of predictions.
2. Precision — Precision explains how many of the cases predicted as positive actually
turned out to be positive. Precision is useful in cases where a False Positive is a higher
concern than False Negatives. The importance of Precision is in music or video
recommendation systems, e-commerce websites, etc. where wrong results could lead to
customer churn and this could be harmful to the business.
Precision for a label is defined as the number of true positives divided by the
number of predicted positives.
3. Recall (Sensitivity) — Recall explains how many of the actual positive cases we were
able to predict correctly with our model. It is a useful metric in cases where False Negative
is of higher concern than False Positive. It is important in medical cases where it doesn’t
matter whether we raise a false alarm but the actual positive cases should not go
undetected!
Recall for a label is defined as the number of true positives divided by the total
number of actual positives.
4. F1 Score — It gives a combined idea about the Precision and Recall metrics. It is
maximum when Precision is equal to Recall.
F1 Score is the harmonic mean of precision and recall.
The F1 score punishes extreme values more. F1 Score could be an effective evaluation
metric in the following cases: 1) When FP and FN are equally costly. 2) When adding more
data doesn't effectively change the outcome. 3) When True Negatives are high.
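In symbols, with TP, FP, FN, and TN denoting the true/false positive/negative counts from the confusion matrix, the metrics above are:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},
\]
\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]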
5. AUC-ROC — The Receiver Operator Characteristic (ROC) is a probability curve that plots
the TPR (True Positive Rate) against the FPR (False Positive Rate) at various threshold
values and separates the ‘signal’ from the ‘noise’. The Area Under the Curve (AUC) is the
measure of the ability of a classifier to distinguish between classes. Graphically, the AUC
is the area enclosed between the ROC curve and the X-axis.