Data Science Important Questions

The document contains a series of questions and answers related to data science concepts, including statistical methods, data types, and ethical considerations in data analysis. Key topics include uniform distribution, survivorship bias, data merging, and the Central Limit Theorem, along with practical applications and examples. It serves as an educational resource for students studying data science under the CBSE curriculum.

Uploaded by

khanrihan77703

CBSE | DEPARTMENT OF SKILL EDUCATION

DATA SCIENCE (SUBJECT CODE - 419)


Q. 1 A survey of 20 people (10 kids and 10 adults) was taken on the type of food they like to snack. The following
responses were recorded:
● 7 kids liked pizza while the other kids preferred burger
● 5 adults liked burger while the other adults preferred pizza
Based on the information given above, build a two-way frequency table.

Ans.
            Pizza   Burger   Total
Kids          7       3       10
Adults        5       5       10
Total        12       8       20

Q. 2 Explain uniform distribution with the help of an example.


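Ans. Every probability distribution is associated with a graph that describes the likelihood of the occurrence of each
event. When every possible outcome is equally likely, the graph is flat, and the distribution is called a Uniform
Distribution. Example: rolling a fair die (each face has probability 1/6), tossing a fair coin (each side has probability 1/2), etc.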
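
As an illustration, the short Python sketch below assigns a probability to each face of a die, assuming the die is fair. The variable names are chosen only for this example.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face is equally likely.
faces = [1, 2, 3, 4, 5, 6]
probabilities = {face: Fraction(1, len(faces)) for face in faces}

for face, p in probabilities.items():
    print(face, p)

# The probabilities of all outcomes must add up to 1.
total = sum(probabilities.values())
print("Total:", total)
```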

Q. 3 What is survivorship bias? Give an example.


Ans. Survivorship bias is the tendency to distort conclusions by focusing on the successful examples in a data set and
ignoring the failures. This type of bias also occurs when we look only at the competitors that are still in business.
Example: A hospital is conducting research on trauma patients admitted to the ER, seeking to find out which
procedures work best. However, researchers can only begin their studies if a patient is stable enough to give consent.
Patients who die or remain too unstable are left out, so the results reflect only the patients who survived.

Q. 4 Confidential data is maintained in the form of digital as well as physical copies. Suggest two ways in which we can
discard the physical copies of confidential data.
Ans. We can safely discard the physical copies in any of the following ways:
● Shredding the documents
● Burning the documents
● Cutting up the documents

Q. 5 Explain the term ‘subset’ and the different ways of subsetting the data.
Ans. For data analysis, we do not always need the entire data set. Instead of working with the whole data set, we can
take a certain part of the data for our analysis. A smaller set of data taken from a larger set of data is known as a
Subset.
Different ways of subsetting data are:
Row-based subset: Some rows, typically from the top or bottom of the table, are selected.
Column-based subset: Only specific columns of the data set are selected.
Data-specific subset: Only the data that meets a specific condition is selected.
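
The three kinds of subset can be sketched in plain Python as follows. The student records and column names are hypothetical and are used only for illustration.

```python
# Hypothetical student records used only for illustration.
rows = [
    {"name": "Asha", "grade": 10, "marks": 82},
    {"name": "Ravi", "grade": 9, "marks": 67},
    {"name": "Meera", "grade": 10, "marks": 91},
    {"name": "Karan", "grade": 9, "marks": 74},
]

# Row-based subset: take the first two rows of the table.
top_rows = rows[:2]

# Column-based subset: keep only the "name" and "marks" columns.
name_marks = [{"name": r["name"], "marks": r["marks"]} for r in rows]

# Data-specific subset: keep only the rows that satisfy a condition.
grade_10 = [r for r in rows if r["grade"] == 10]

print(top_rows)
print(name_marks)
print(grade_10)
```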

Q. 6 Manya, a student of class 10 was learning the topic “Statistical Problem Solving Process”. She did not understand
the concept and had trouble understanding it. Help her by explaining the process with a good example.
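Ans. The purpose of the Statistical Problem Solving Process is to collect and analyze data to answer statistical
investigative questions. This investigative process involves four components, each of which involves exploring and
addressing variability:
1. Formulate Statistical Investigative Questions
2. Collect/Consider the Data
3. Analyze the Data
4. Interpret the Data
For example, to find out how many hours class 10 students sleep on school nights, we would frame that question, survey
a sample of students, summarise the responses with an average and a graph, and then interpret what the summary tells us.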

Q. 7 “Given a significantly large sample size from a population with finite variance, the mean of all samples from same
set of population will be roughly equal to the mean of the population”. What is this statement about? Explain it with
the help of an example.
Ans. This statement is about the Central Limit Theorem. The theorem states that the distribution of sample means
approaches a normal distribution as the sample size gets larger, irrespective of the shape of the population
distribution. It also tells us that, given a sufficiently large sample size from a population with finite variance, the mean
of all sample means from the same population will be roughly equal to the mean of the population.
Example: If we repeatedly draw samples of 50 students from a school and record the average height of each sample, those
averages cluster around the true average height of the school in a bell-shaped pattern.
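
A small simulation can make this concrete. The sketch below is only illustrative: the population (10,000 rolls of a fair die), the sample size of 50 and the number of samples are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: 10,000 rolls of a fair die (a flat, non-normal shape).
population = [random.randint(1, 6) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many samples of size 50 and record the mean of each sample.
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(1_000)
]

# The mean of the sample means should be close to the population mean,
# and the sample means should be far less spread out than the population.
mean_of_sample_means = statistics.mean(sample_means)
print(round(population_mean, 2), round(mean_of_sample_means, 2))
print(round(statistics.pstdev(population), 2), round(statistics.stdev(sample_means), 2))
```

Plotting a histogram of sample_means would show the bell shape, even though the population itself is flat.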

Q. 8 We can perform data merging by implementing data joins on the databases in frame. How many types of joins
are there? Explain.
Ans. We can perform data merging by implementing data joins on the databases in frame. There are three categories
of data joins:
One to One Joins: A one to one join is probably one of the simplest join techniques. In this type of join, each row in one
table is linked to a single row in another table using a “key” column.
One to Many Joins: In a one to many join, one record in a table can be related to one or many records in another
table.
Many to Many Joins: A many to many relationship is said to occur when multiple records in one table are related to
multiple records in another table.
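
As a rough sketch, a one to many join on a key column can be written in plain Python as shown below. The two tables, the key name student_id and the helper function join are hypothetical and exist only for this example.

```python
# Hypothetical tables, each a list of rows that share the key "student_id".
students = [
    {"student_id": 1, "name": "Asha"},
    {"student_id": 2, "name": "Ravi"},
]
marks = [
    {"student_id": 1, "subject": "Maths", "marks": 82},
    {"student_id": 1, "subject": "Science", "marks": 77},
    {"student_id": 2, "subject": "Maths", "marks": 67},
]


def join(left, right, key):
    """Pair every left row with every right row that has the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# One to many: one student row matches one or more rows in the marks table.
merged = join(students, marks, "student_id")
for row in merged:
    print(row)
```

If every key appeared exactly once in both tables, the same function would produce a one to one join; if keys repeated in both tables, it would produce a many to many join.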

Q. 8 List 3 real-life applications of Standard Deviation.


Ans. 1. Grading tests 2. Evaluating a survey 3. Weather forecasting
Q. 9 What is discrete and continuous data?
Ans. Discrete Data is data that can take only specified values. For example, if you take a test, you can either pass or
fail. So, the data is discrete in this case as it has only two specified outcomes.
Continuous Data is data that can take any value within a given range. This range can be either finite or infinite. For
example, the depth of an ocean, the weight of a person or the length of a road.

Q. 10 Explain selection bias.


Ans. Selection bias in data science refers to the systematic error that occurs when the sample data used for analysis is
not representative of the overall population or the intended target group. This can lead to incorrect conclusions,
distorted models, and poor generalization to unseen data. Selection bias can occur at various stages of the data
lifecycle, including data collection, processing, and analysis.

Q. 11 What is data merging?


Ans. In Data Science, data merging is the process of combining two or more data sets into a single data frame. This
process is necessary when we have raw data stored in multiple files or data tables that we want to analyze all in one
go.

Q. 12 What are various techniques of safely discarding digital confidential data?


Ans. 1. Once you are done with the job and no longer need the user data, you can clean out the data from the
memory.
2. Even while the data is stored on your device, you can encrypt it to make sure that, even in the case of a data
leak, hackers are not able to read it.
3. You can also format the computer drive/hard disk where the client's confidential data was stored for a clean discarding.

Q. 13 Why is z-score important?


Ans. 1. It gives us an opportunity to calculate the probability of a value occurring within a normal distribution.
2. The z-score allows us to compare two values that come from different samples.
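
Both uses can be sketched with the standard formula z = (value - mean) / standard deviation. The marks, means and standard deviations below are hypothetical numbers chosen only for illustration.

```python
import statistics


def z_score(value, mean, std_dev):
    """Number of standard deviations that a value lies from the mean."""
    return (value - mean) / std_dev


# Hypothetical marks: 80 in Maths (class mean 70, SD 5)
# and 85 in Science (class mean 80, SD 10).
z_maths = z_score(80, 70, 5)
z_science = z_score(85, 80, 10)

# Use 2: the larger z-score marks the stronger result relative to its class.
print(z_maths, z_science)

# Use 1: probability of a value at or below z in a standard normal distribution.
p_below = statistics.NormalDist().cdf(z_maths)
print(round(p_below, 4))
```

Although 85 is the higher raw mark, the Maths mark lies further above its class mean once the spread of each class is taken into account.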

Q. 14 Give two practical applications of the Central Limit Theorem.


Ans. Some practical implementations of the Central Limit Theorem include:
1. Voting polls estimate the count of people who support a particular election candidate.
2. The Central Limit Theorem can also be used to calculate the mean family income for a specific region.

Q. 15 Explain percentile, quartiles, and deciles, with examples.


Ans. A percentile is a value at or below which a given percentage of the ordered observations fall. Therefore, the pth
percentile of a distribution is the value such that p percent of the ordered observations fall at or below it.
Quartiles partition a data set into four equal parts, with one-fourth of the data values in each part. The total
of 100% is divided into four equal parts: 25%, 50%, 75% & 100%. Since the median is
defined as the middlemost value of the observations, the median will have 50% of the observations at or below it.
Just like quartiles, we have deciles. While quartiles sort the data into four quarters, deciles sort the data into ten
equal parts: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th and 100th percentiles. The higher the place in the
decile ranking, the higher the overall ranking.
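
As an example, the quartile and decile cut points of a data set can be computed with Python's statistics module. The data below (the whole numbers 0 to 100) is a made-up set chosen so that the cut points are easy to check by eye.

```python
import statistics

# Hypothetical data: the scores 0, 1, 2, ..., 100.
scores = list(range(0, 101))

# Three cut points split the ordered data into four equal parts.
quartiles = statistics.quantiles(scores, n=4, method="inclusive")

# Nine cut points split the ordered data into ten equal parts.
deciles = statistics.quantiles(scores, n=10, method="inclusive")

# The median is the second quartile (the 50th percentile).
median = statistics.median(scores)

print(quartiles)
print(deciles)
print(median)
```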

Q. 16 Explain principles on ethics that one should follow while performing data analysis.
Ans. 1. Protect your customer's privacy: Privacy does not always mean confidentiality, because private data may need
to be audited based on the relevant requirements.
2. Private information that is shared should always be handled with confidentiality: Third party companies share
sensitive data, whether financial, location related or medical.
3. Customers should always have a clear view of how their data is being used or traded, and should have the
authority to manage the flow of their confidential information across enormous third party systems.
4. Data should never interfere with human will: Data analytics can average out and, at times, even discover who we
are before we make up our own minds.

Q. 17 Explain the concept of Recall Bias in Statistics.


Ans. Recall bias happens when people remember things differently, leading to errors in data. It’s common in studies
where participants are asked to recall past events or behaviors.

Q. 18 What is a two-way frequency table ? Explain its features with a suitable example.
Ans. A two-way table is a statistical table that shows the observed number or frequency for two variables. The
rows indicate the categories of one variable and the columns indicate the categories of the other. Two-way frequency
tables show how many data points fit in each combination of categories.
The table has several features:
• Categories are in the left column and top row.
• The counts are placed in the center of the table.
• The totals are at the end of each row and column.
• A sum of all counts (a grand total) is placed at the bottom right.
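
For a suitable example, the survey from Q. 1 (10 kids and 10 adults choosing between pizza and burger) can be tabulated with a short Python sketch. The layout and variable names are choices made for this illustration.

```python
from collections import Counter

# Survey responses from Q. 1: (group, preferred snack) for each of the 20 people.
responses = (
    [("Kids", "Pizza")] * 7 + [("Kids", "Burger")] * 3
    + [("Adults", "Pizza")] * 5 + [("Adults", "Burger")] * 5
)
counts = Counter(responses)
groups = ["Kids", "Adults"]
foods = ["Pizza", "Burger"]

# Header row: the categories of the second variable plus a total column.
print(f"{'':8}{'Pizza':>8}{'Burger':>8}{'Total':>8}")

# One row per group: the counts in the center, the row total at the end.
for g in groups:
    row = [counts[(g, f)] for f in foods]
    print(f"{g:8}" + "".join(f"{c:>8}" for c in row) + f"{sum(row):>8}")

# Final row: the column totals, with the grand total at the bottom right.
col_totals = [sum(counts[(g, f)] for g in groups) for f in foods]
print(f"{'Total':8}" + "".join(f"{c:>8}" for c in col_totals) + f"{sum(col_totals):>8}")
```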

Q. 19 Explain Central Limit Theorem. Give any two real world scenarios in which it is used.
Ans. The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the
sample size gets larger, irrespective of the shape of the population distribution.
It also tells us that, given a sufficiently large sample size from a population with finite variance, the mean of all
sample means from the same population will be roughly equal to the mean of the population.
Some practical implementations of the Central Limit Theorem include:
1. Voting polls estimate the count of people who support a particular election candidate. The results that news channels
report with confidence intervals are all calculated using the Central Limit Theorem.
2. The Central Limit Theorem can also be used to calculate the mean family income for a specific region.
