Data Science Important Questions

Uploaded by

khanrihan77703

CBSE | DEPARTMENT OF SKILL EDUCATION

DATA SCIENCE (SUBJECT CODE - 419)

Q. 1 A survey of 20 people (10 kids and 10 adults) was taken on the food they like to snack on. The following
responses were recorded:
● 7 kids liked pizza while the other kids preferred burger
● 5 adults liked burger while the other adults preferred pizza
Based on the information given above, build a two-way frequency table.

Ans.
            Pizza    Burger    Total
Kids          7        3        10
Adults        5        5        10
Total        12        8        20
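The two-way table for this survey can also be built from the raw responses. Below is a minimal sketch using only Python's standard library; the function name `two_way_table` and the `(group, snack)` encoding of each response are illustrative assumptions, not part of the original answer.

```python
from collections import Counter

# Survey responses from Q. 1, written as (group, snack) pairs:
# 7 kids chose pizza, 3 kids chose burger, 5 adults chose burger, 5 adults chose pizza.
responses = (
    [("Kids", "Pizza")] * 7 + [("Kids", "Burger")] * 3
    + [("Adults", "Burger")] * 5 + [("Adults", "Pizza")] * 5
)

def two_way_table(pairs, rows, cols):
    """Count each (row, column) combination and add row, column and grand totals."""
    counts = Counter(pairs)
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = len(pairs)  # grand total goes at the bottom right
    return table

rows, cols = ["Kids", "Adults"], ["Pizza", "Burger"]
table = two_way_table(responses, rows, cols)

# Print: categories in the first column and top row, totals at the end of each row/column.
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols + ["Total"]))
for r in rows + ["Total"]:
    print(f"{r:8}" + "".join(f"{table[r][c]:>8}" for c in cols + ["Total"]))
```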

Q. 2 Explain uniform distribution with the help of an example.

Ans. A uniform distribution is a probability distribution in which every possible outcome is equally likely, so its graph is
flat: each outcome has the same height (probability). Example: When a fair die is rolled, each of the six faces has a
probability of 1/6; when a fair coin is tossed, heads and tails each have a probability of 1/2.
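To see the "equally likely" idea in numbers, here is a small Python sketch, assuming a fair six-sided die. It lists the exact probability of each face and then simulates rolls with Python's `random` module, seeded so the run is repeatable; the number of rolls is an arbitrary choice.

```python
import random
from collections import Counter
from fractions import Fraction

# Theoretical uniform distribution for a fair six-sided die: every face has probability 1/6.
faces = range(1, 7)
pmf = {face: Fraction(1, 6) for face in faces}

# Simulate many rolls; with a fair die, each face should appear close to 1/6 of the time.
random.seed(0)  # fixed seed so the run is repeatable
n_rolls = 60_000
counts = Counter(random.randint(1, 6) for _ in range(n_rolls))
relative_freq = {face: counts[face] / n_rolls for face in faces}

for face in faces:
    print(face, pmf[face], round(relative_freq[face], 4))
```

The simulated frequencies will not be exactly 1/6, but they stay close to it, which is what a flat (uniform) graph describes.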

Q. 3 What is survivorship bias? Give an example.

Ans. Survivorship bias occurs when we distort our conclusions by focusing on the successful examples (the “survivors”)
and ignoring the failures. This type of bias also occurs when we study only the competitors that are still in business.
Example: A hospital is conducting research on trauma patients admitted to the ER, seeking to find out which
procedures work best. However, researchers can only begin their studies if a patient is stable enough to give consent.
Patients who are too critically injured, or who die before they can consent, are left out, so the study looks only at the
“survivors” and may overestimate how well the procedures work.
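A small Python sketch with made-up numbers shows how the hospital example can skew results. The severity scores, outcomes and the consent threshold `STABLE_LIMIT` are hypothetical, chosen only to illustrate the effect.

```python
# Hypothetical trauma patients: (severity score 1-10, recovered?).
# Made-up numbers purely to illustrate survivorship bias.
patients = [
    (2, True), (3, True), (4, True), (5, True), (5, False),
    (7, True), (8, False), (9, False), (9, False), (10, False),
]

STABLE_LIMIT = 6  # assumption: only patients with severity <= 6 are stable enough to consent

def recovery_rate(group):
    """Fraction of patients in the group who recovered."""
    return sum(recovered for _, recovered in group) / len(group)

everyone = patients
enrolled = [p for p in patients if p[0] <= STABLE_LIMIT]  # only the "survivors" of the filter

print("recovery rate, all patients:     ", recovery_rate(everyone))
print("recovery rate, enrolled patients:", recovery_rate(enrolled))
```

Because the most severe cases never enter the study, the enrolled group's recovery rate (0.8 here) looks much better than the rate for all patients (0.5).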

Q. 4 Confidential data is maintained in the form of digital as well as physical copies. Suggest two ways in which we can
discard the physical copies of confidential data.
Ans. We can safely discard the data in one of the following ways:
● Shredding the documents
● Burning the documents
● Cutting up the documents

Q. 5 Explain the term ‘subset’ and the different ways of subsetting the data.
Ans. For data analysis, we often do not need the entire dataset. Instead of working with the whole dataset, we can take
a certain part of it for our analysis. A smaller set of data taken out of a larger dataset is known as a subset.
Different ways of subsetting data are:
Row-based subset: some rows from the top or bottom of the table are taken into consideration.
Column-based subset: specific columns from the dataset are taken into consideration.
Data-specific subset: only the data that meets a specific condition (for example, students who scored above 90) is
taken into consideration.
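The three kinds of subset can be sketched in plain Python by treating a table as a list of rows (dicts). The student records and the helper `select_columns` are hypothetical; in practice a library such as pandas would usually be used for this.

```python
# Hypothetical student records; each row is a dict of column -> value.
students = [
    {"name": "Asha", "class": 10, "marks": 92},
    {"name": "Ravi", "class": 10, "marks": 78},
    {"name": "Meera", "class": 9, "marks": 85},
    {"name": "Kabir", "class": 9, "marks": 95},
    {"name": "Zoya", "class": 10, "marks": 64},
]

# Row-based subset: take rows from the top (first 2) or bottom (last 2) of the table.
top_rows = students[:2]
bottom_rows = students[-2:]

# Column-based subset: keep only the chosen columns for every row.
def select_columns(rows, columns):
    return [{col: row[col] for col in columns} for row in rows]

names_and_marks = select_columns(students, ["name", "marks"])

# Data-specific subset: keep only the rows whose values meet a condition.
high_scorers = [row for row in students if row["marks"] > 90]

print(top_rows, bottom_rows, names_and_marks, high_scorers, sep="\n")
```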

Q. 6 Manya, a student of class 10, was learning the topic “Statistical Problem Solving Process”. She had trouble
understanding the concept. Help her by explaining the process with a good example.
Ans. The purpose of the Statistical Problem Solving Process is to collect and analyze data to answer statistical
investigative questions. This investigative process involves four components, each of which involves exploring and
addressing variability:
1. Formulate Statistical Investigative Questions
2. Collect/Consider the Data
3. Analyze the Data
4. Interpret the Data
Example: To answer “How many hours do students in our class sleep on a school night?”, Manya could (1) pose this
question, (2) survey her classmates, (3) find the typical value and the spread of their answers, and (4) use these results
to answer the question for her class.
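The four steps can be walked through in a short Python sketch. The question and the sleep survey values are hypothetical; the point is which step each line belongs to.

```python
import statistics

# Step 1: Formulate a statistical investigative question (hypothetical example).
question = "How many hours do students in our class sleep on a school night?"

# Step 2: Collect/consider the data (made-up survey of 10 students, in hours).
sleep_hours = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]

# Step 3: Analyze the data, describing its centre and its variability.
mean_hours = statistics.mean(sleep_hours)
median_hours = statistics.median(sleep_hours)
spread = max(sleep_hours) - min(sleep_hours)

# Step 4: Interpret the results in the context of the original question.
print(question)
print(f"Typical sleep is about {mean_hours} hours (median {median_hours}), "
      f"but it varies by {spread} hours across students.")
```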

Q. 7 “Given a significantly large sample size from a population with finite variance, the mean of all samples from same
set of population will be roughly equal to the mean of the population”. What is this statement about? Explain it with
the help of an example.
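Ans. The statement is about the Central Limit Theorem. The Central Limit Theorem states that the distribution of sample
means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population
distribution. It also states that, given a sufficiently large sample size from a population with finite variance, the mean
of all sample means drawn from the same population will be roughly equal to the mean of the population.
Example: If we roll a fair die 30 times and record the average, and repeat this many times, the averages cluster around
3.5 (the mean of the die's outcomes) and form a bell-shaped curve, even though the outcomes of a single die are
uniformly distributed.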
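The die example can be simulated to check the statement. This Python sketch assumes a fair die (population mean 3.5), samples of size 30 and 2,000 repetitions; these numbers are illustrative choices, and the seed only makes the run repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed for a repeatable run

# Population: outcomes of a fair die (flat, not bell-shaped). Population mean = 3.5.
population_mean = 3.5

def sample_mean(n):
    """Mean of one random sample of n die rolls."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# Draw many samples of size 30 and record each sample's mean.
sample_means = [sample_mean(30) for _ in range(2000)]

print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("spread of sample means:", round(statistics.stdev(sample_means), 3))
```

With these settings the mean of the sample means should come out very close to 3.5, and their spread close to σ/√n ≈ 1.71/√30 ≈ 0.31, where σ ≈ 1.71 is the standard deviation of a single fair die roll.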

Q. 8 We can perform data merging by implementing data joins on data frames. How many types of joins
are there? Explain.
Ans. Data merging can be performed by implementing data joins on data frames. There are three categories
of data joins:
One-to-One Joins: One-to-one join is probably the simplest join technique. In this type of join, each row in one
table is linked to a single row in another table using a “key” column.
One-to-Many Joins: In a one-to-many join, one record in a table can be related to one or many records in another
table.
Many-to-Many Joins: A many-to-many relationship occurs when multiple records in one table are related to
multiple records in another table.
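The difference between the three join types shows up in how many rows a join produces. This plain-Python sketch uses a naive inner join on a key column; the function `inner_join` and all the tables are hypothetical examples.

```python
def inner_join(left, right, key):
    """Join two lists of dicts on `key`, pairing every matching left row with every matching right row."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined

# One-to-one: each student has exactly one ID card.
students = [{"sid": 1, "name": "Asha"}, {"sid": 2, "name": "Ravi"}]
id_cards = [{"sid": 1, "card": "A-101"}, {"sid": 2, "card": "A-102"}]

# One-to-many: one student can have many marks records.
marks = [{"sid": 1, "subject": "Maths"}, {"sid": 1, "subject": "Science"},
         {"sid": 2, "subject": "Maths"}]

# Many-to-many: the key value repeats on both sides (several students and
# several teachers in the same club), so every pairing appears.
members = [{"club": "Chess", "sid": 1}, {"club": "Chess", "sid": 2}]
mentors = [{"club": "Chess", "teacher": "Mr. Rao"}, {"club": "Chess", "teacher": "Ms. Das"}]

print(len(inner_join(students, id_cards, "sid")))   # one-to-one: 2 rows
print(len(inner_join(students, marks, "sid")))      # one-to-many: 3 rows
print(len(inner_join(members, mentors, "club")))    # many-to-many: 4 rows
```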

Q. 8 List 3 real-life applications of Standard Deviation.

Ans. 1. Grading tests
2. Evaluating a survey
3. Weather forecasting
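As a sketch of the grading use case, the Python snippet below computes the mean and standard deviation of a set of hypothetical marks with the standard `statistics` module and flags marks more than one standard deviation above the mean. The one-SD cut-off is an illustrative assumption, not a grading rule.

```python
import statistics

# Hypothetical marks of 8 students in one test.
marks = [20, 40, 40, 40, 50, 50, 70, 90]

mean = statistics.mean(marks)
sd = statistics.pstdev(marks)  # population SD: treats these 8 students as the whole class

# Grading idea: a mark more than one SD above the mean stands out as unusually high.
above_one_sd = [m for m in marks if m > mean + sd]
print(mean, sd, above_one_sd)
```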
Q. 9 What are discrete and continuous data?
Ans. Discrete data is data that can take only specific, separate values. For example, if you take a test, you can either
pass or fail. So, the data is discrete in this case, as it has only two possible outcomes.
Continuous data is data that can take any value within a given range. This range can be either finite or infinite. For
example, the depth of an ocean, the weight of a person or the length of a road.

Q. 10 Explain selection bias.

Ans. Selection bias in data science refers to the systematic error that occurs when the sample data used for analysis is
not representative of the overall population or the intended target group. This can lead to incorrect conclusions,
distorted models, and poor generalization to unseen data. Selection bias can occur at various stages of the data
lifecycle, including data collection, processing, and analysis.
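A tiny Python sketch with made-up numbers shows how a non-representative sample distorts an estimate: if only one of two schools is surveyed, the sample mean can be far from the population mean. The schools and values are hypothetical.

```python
import statistics

# Hypothetical population: daily screen time (hours) of students in two schools.
school_a = [2, 3, 3, 4, 4]   # school the researcher can easily reach
school_b = [6, 7, 7, 8, 9]   # school left out of the study
population = school_a + school_b

true_mean = statistics.mean(population)
biased_mean = statistics.mean(school_a)  # sample drawn only from school A

print("population mean:", true_mean)
print("biased sample mean:", biased_mean)
```

The sample is not representative of the whole population, so the estimate (3.2 hours) is far below the true mean (5.3 hours), no matter how carefully the sampled students are measured.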

Q. 11 What is data merging?

Ans. In Data Science, data merging is the process of combining two or more datasets into a single data frame. This
process is necessary when we have raw data stored in multiple files or data tables that we want to analyze together
in one go.
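When the raw data is split across files with the same columns, merging can be as simple as stacking the rows. Below is a minimal Python sketch, assuming two hypothetical class sections that have already been read into lists of rows; the `section` tag is an illustrative extra column.

```python
# Hypothetical raw data split across two files, e.g. two sections of a class.
section_a = [{"name": "Asha", "marks": 92}, {"name": "Ravi", "marks": 78}]
section_b = [{"name": "Meera", "marks": 85}]

# Merge by stacking rows into one combined table, tagging where each row came from.
merged = ([{**row, "section": "A"} for row in section_a]
          + [{**row, "section": "B"} for row in section_b])

# The merged table can now be analyzed in one go.
average = sum(row["marks"] for row in merged) / len(merged)
print(len(merged), round(average, 2))
```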

Q. 12 What are various techniques of safely discarding digital confidential data?

Ans. 1. Once you are done with the job and no longer need the user data, delete it from the memory and storage of
your devices.
2. Even while storing the data on your device, you can encrypt it so that, even in the case of a data
leak, hackers are not able to read it.
3. You can also format the computer drive/hard disk where the client's confidential data was stored, so that it is
discarded cleanly. A full format or secure wipe that overwrites the data is safer than a quick format, which can leave
the data recoverable.

Q. 13 Why is z-score important?

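Ans. A z-score tells us how many standard deviations a value lies above or below the mean. It is important because:
1. It lets us calculate the probability of a value occurring within a normal distribution.
2. It allows us to compare two values that come from different samples or distributions, by measuring each in
standard deviations from its own mean.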
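The z-score of a value x is z = (x - mean) / standard deviation. Here is a small Python sketch of both uses, with hypothetical test marks; `NormalDist` comes from Python's standard `statistics` module (Python 3.8 and later), and `z_score` is an illustrative helper.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: two students in tests with different means and spreads.
z_a = z_score(80, mean=70, sd=5)    # Student A
z_b = z_score(85, mean=75, sd=10)   # Student B

# Probability that a value from a standard normal distribution lies at or below z_a.
p_below_a = NormalDist().cdf(z_a)

print(z_a, z_b, round(p_below_a, 4))
```

Student B has the higher raw mark, but Student A's z-score (2.0 versus 1.0) shows A did better relative to the rest of A's class.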

Q. 14 Give two practical applications of the Central Limit Theorem.

Ans. Some practical implementations of the Central Limit Theorem include:
1. Voting polls estimate the count of people who support a particular election candidate.
2. The Central Limit Theorem can also be used to calculate the mean family income for a specific region.
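The voting-poll application can be sketched in Python. Assuming a simple random sample (the poll numbers below are hypothetical), the Central Limit Theorem lets us treat the sample proportion as approximately normal and build a 95% confidence interval from it.

```python
import math

# Hypothetical poll: 540 of 1,000 randomly chosen voters support candidate X.
supporters, n = 540, 1000
p_hat = supporters / n

# By the Central Limit Theorem, p_hat is approximately normal for large n,
# with standard error sqrt(p(1 - p) / n). 1.96 is the z-value for 95% confidence.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * std_error
low, high = p_hat - margin, p_hat + margin

print(f"estimate {p_hat:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With these numbers the estimate is 54.0%, with an interval of roughly 50.9% to 57.1%.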

Q. 15 Explain percentile, quartiles, and deciles, with examples.

Ans. A percentile can be defined as the percentage of the total ordered observations at or below it. Therefore, the pth
percentile of a distribution is the value such that p percent of the ordered observations fall at or below it. For example,
a student at the 90th percentile scored as well as or better than 90% of the students.
The quartiles of a dataset partition the data into four equal parts, with one-fourth of the data values in each part. The
total of 100% is divided into four equal parts ending at 25%, 50%, 75% and 100%. Since the median is defined as the
middlemost value in the observations, the median will have 50% of the observations at or below it; that is, the median
is the second quartile (the 50th percentile).
Just like quartiles, we have deciles. While quartiles sort the data into four quarters, deciles sort the data into ten
equal parts, ending at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th and 100th percentiles. The higher the
place in the decile ranking, the higher the overall ranking.
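Quartiles, deciles and percentile ranks can be computed with Python's standard `statistics` module. The scores are hypothetical, and `percentile_rank` is an illustrative helper. Note that `statistics.quantiles` uses its 'exclusive' method by default; other tools may interpolate slightly differently, so cut points can vary a little between programs.

```python
import statistics

# Hypothetical test scores of 11 students, already in order.
scores = [35, 42, 48, 55, 60, 64, 70, 75, 81, 88, 95]

def percentile_rank(data, value):
    """Percentage of observations at or below `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

quartiles = statistics.quantiles(scores, n=4)   # three cut points: Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)    # nine cut points: D1 ... D9

print("quartiles:", quartiles)                  # Q2 equals the median
print("median:", statistics.median(scores))
print("deciles:", deciles)
print("percentile rank of 64:", round(percentile_rank(scores, 64), 1))
```

For these 11 scores the quartiles come out as 48, 64 and 81, and the middle one equals the median (64), which sits at a percentile rank of about 54.5.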

Q. 16 Explain the principles of ethics that one should follow while performing data analysis.
Ans. 1. Protect your customers' privacy: Privacy does not always mean confidentiality, because private data may need
to be audited based on the relevant requirements.
2. The private information that is shared should always be handled with confidentiality. Third-party companies share
sensitive data, whether financial, location-related or medical.
3. Customers should always have a clear view of how their data is being used or traded, and should have the
authority to manage the flow of their confidential information across large third-party systems.
4. Data should never interfere with human will: Data analytics can average people out and, at times, even discover who
we are before we make up our minds.

Q. 17 Explain the concept of Recall Bias in Statistics.

Ans. Recall bias happens when people remember past events or experiences inaccurately or incompletely, which leads
to errors in the data. It is common in studies where participants are asked to recall past events or behaviors.

Q. 18 What is a two-way frequency table? Explain its features with a suitable example.
Ans. A two-way table is a statistical table that shows the observed number, or frequency, for two categorical variables:
the rows indicate the categories of one variable and the columns indicate the categories of the other. Two-way
frequency tables show how many data points fit in each combination of categories.
For example, in a survey of kids and adults about the snack they prefer, the rows could be Kids and Adults, and the
columns Pizza and Burger.
The table has several features:
• Categories are in the left column and top row.
• The counts are placed in the center of the table.
• The totals are at the end of each row and column.
• A sum of all counts (a grand total) is placed at the bottom right.

Q. 19 Explain Central Limit Theorem. Give any two real world scenarios in which it is used.
Ans. The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the
sample size gets larger, regardless of the shape of the population distribution.
It also states that, given a sufficiently large sample size from a population with finite variance, the mean of all sample
means drawn from the same population will be roughly equal to the mean of the population.
Some practical implementations of the Central Limit Theorem include:
1. Voting polls estimate the count of people who support a particular election candidate. The results reported by news
channels with confidence intervals are calculated using methods based on the Central Limit Theorem.
2. The Central Limit Theorem can also be used to calculate the mean family income for a specific region.
