Data Science Important Questions
Q. 4 Confidential data is maintained in the form of digital as well as physical copies. Suggest two ways in which we can
discard the physical copies of confidential data.
Ans. We can safely discard the physical copies in any of the following ways:
Shredding the Documents
Burning the Documents
Cutting up the Documents
Q. 5 Explain the term ‘subset’ and the different ways of subsetting the data.
Ans. For data analysis, we do not always need to consider the entire data set. Instead of working with the whole
data set, we can take a certain part of the data for our analysis. A small set of data taken from a larger set of
data is known as a subset.
Different ways of subsetting data are:
CBSE | DEPARTMENT OF SKILL EDUCATION
DATA SCIENCE (SUBJECT CODE - 419)
Row-based subset: some rows from the top or bottom of the table are taken into consideration for row-based
subsetting.
Column-based subset: specific columns from the dataset are taken into consideration for column-based subsetting.
Data-specific subset: only the data that meets a specific condition is taken into consideration for data-specific
subsetting.
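Example (a minimal sketch in Python using only plain lists and dictionaries): the three kinds of subset described above can be tried out on a small table. The table `students` and its column names are made up for illustration.

```python
# Hypothetical data set: each dictionary is one row of a small table.
students = [
    {"name": "Asha", "grade": 10, "marks": 82},
    {"name": "Ravi", "grade": 9, "marks": 67},
    {"name": "Meena", "grade": 10, "marks": 91},
    {"name": "Kabir", "grade": 9, "marks": 74},
    {"name": "Zoya", "grade": 10, "marks": 58},
]

# Row-based subset: the first two rows and the last two rows.
top_rows = students[:2]
bottom_rows = students[-2:]

# Column-based subset: keep only the "name" and "marks" columns.
name_and_marks = [{"name": s["name"], "marks": s["marks"]} for s in students]

# Data-specific subset: keep only the rows that satisfy a condition.
grade_10_high = [s for s in students if s["grade"] == 10 and s["marks"] >= 80]

print([s["name"] for s in top_rows])
print([s["name"] for s in bottom_rows])
print(name_and_marks[0])
print([s["name"] for s in grade_10_high])
```

In a data-analysis library the same ideas usually appear as "head/tail", "select columns" and "filter rows" operations.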
Q. 6 Manya, a student of class 10 was learning the topic “Statistical Problem Solving Process”. She did not understand
the concept and had trouble understanding it. Help her by explaining the process with a good example.
Ans. The purpose of the Statistical Problem Solving Process is to collect and analyze data to answer statistical
investigative questions. This investigative process involves four components, each of which involves exploring and
addressing variability:
1. Formulate Statistical Investigative Questions
2. Collect/Consider the Data
3. Analyze the Data
4. Interpret the Data
Q. 7 “Given a significantly large sample size from a population with finite variance, the mean of all samples from same
set of population will be roughly equal to the mean of the population”. What is this statement about? Explain it with
the help of an example.
Ans. The statement is about the Central Limit Theorem. The theorem states that the distribution of sample means
approaches a normal distribution as the sample size gets larger, irrespective of the shape of the population
distribution. It also tells us that, given a sufficiently large sample size from a population with finite variance, the
mean of all samples from the same population will be roughly equal to the mean of the population.
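Example (a minimal sketch in Python, standard library only): the simulation below draws many random samples from a skewed population and compares the mean of the sample means with the population mean. The population, the sample sizes and the function name `sample_means` are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the run is repeatable

# Hypothetical population: 20,000 values from a skewed (exponential)
# distribution whose mean is about 10. It is clearly not normal.
population = [random.expovariate(1 / 10) for _ in range(20_000)]
population_mean = statistics.mean(population)


def sample_means(sample_size, num_samples=2_000):
    """Draw num_samples random samples and return the mean of each one."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]


means_small = sample_means(5)   # small samples
means_large = sample_means(50)  # larger samples

print("population mean:            ", round(population_mean, 2))
print("mean of sample means (n=5): ", round(statistics.mean(means_small), 2))
print("mean of sample means (n=50):", round(statistics.mean(means_large), 2))
print("spread of sample means (n=5): ", round(statistics.stdev(means_small), 2))
print("spread of sample means (n=50):", round(statistics.stdev(means_large), 2))
```

Both averages of the sample means should land close to the population mean, and the sample means for n=50 should be less spread out than those for n=5. A histogram of `means_large` would also look far more bell-shaped than the population itself.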
Q. 8 We can perform data merging by implementing data joins on the databases in frame. How many types of joins
are there? Explain.
Ans. We can perform data merging by implementing data joins on the databases in frame. There are three categories
of data joins:
One to One Joins: A one to one join is probably one of the simplest join techniques. In this type of join, each row in
one table is linked to a single row in another table using a “key” column.
One to Many Joins: In a one to many join, one record in a table can be related to one or many records in another
table.
Many to Many Joins: A many to many relationship is said to occur when multiple records in one table are related to
multiple records in another table.
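Example (a minimal sketch in Python using plain lists of dictionaries): the three categories of join can be seen by matching rows on a key column. All tables, column names and the helper `inner_join` are hypothetical and exist only for this illustration.

```python
# Hypothetical tables, each stored as a list of rows (dictionaries).
students = [
    {"student_id": 1, "name": "Asha"},
    {"student_id": 2, "name": "Ravi"},
]
# Exactly one locker row per student: one to one with `students`.
lockers = [
    {"student_id": 1, "locker": "A12"},
    {"student_id": 2, "locker": "B07"},
]
# Several marks rows per student: one to many with `students`.
marks = [
    {"student_id": 1, "subject": "Maths", "score": 82},
    {"student_id": 1, "subject": "Science", "score": 75},
    {"student_id": 2, "subject": "Maths", "score": 67},
]
# A student can join many clubs and a club can have many students:
# many to many, recorded through a linking table of memberships.
clubs = [
    {"club": "Chess", "room": "R1"},
    {"club": "Robotics", "room": "R2"},
]
memberships = [
    {"student_id": 1, "club": "Chess"},
    {"student_id": 1, "club": "Robotics"},
    {"student_id": 2, "club": "Chess"},
]


def inner_join(left, right, key):
    """Pair every left row with every right row that has the same key value."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[key] == right_row[key]
    ]


one_to_one = inner_join(students, lockers, "student_id")
one_to_many = inner_join(students, marks, "student_id")
many_to_many = inner_join(
    inner_join(students, memberships, "student_id"), clubs, "club"
)

print(len(one_to_one), len(one_to_many), len(many_to_many))
for row in many_to_many:
    print(row["name"], "-", row["club"], "-", row["room"])
```

In the one to one result each student appears once; in the one to many result a student appears once per marks row; in the many to many result both students and clubs can repeat.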
Q. 16 Explain principles on ethics that one should follow while performing data analysis.
Ans. 1. Protect your customer's privacy: Privacy does not always mean confidentiality, because private data may need
to be audited based on the relevant requirements.
2. Private information that is shared should always be handled with confidentiality: Third-party companies share
sensitive data, whether financial, location-related or medical.
3. Customers should always have a clear view of how their data is being used or traded, and should have the
authority to manage the flow of their confidential information across large third-party systems.
4. Data should never interfere with human will: Data analytics can average out and, at times, even discover who we
are before we make up our own minds.
Q. 18 What is a two-way frequency table? Explain its features with a suitable example.
Ans. A two-way table is a statistical table that shows the observed number or frequency for two variables. The
rows indicate the categories of one variable and the columns indicate the categories of the other. Two-way frequency
tables show how many data points fit in each combination of categories.
The table has several features:
• Categories are in the left column and top row.
• The counts are placed in the center of the table.
• The totals are at the end of each row and column.
• A sum of all counts (a grand total) is placed at the bottom right.
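Example (a minimal sketch in Python, standard library only): the code below builds a two-way frequency table with the features listed above from a small survey. The survey responses and category names are made up for illustration.

```python
from collections import Counter

# Hypothetical survey: (class, favourite sport) for each student.
responses = [
    ("Class 9", "Cricket"), ("Class 9", "Football"), ("Class 9", "Cricket"),
    ("Class 10", "Football"), ("Class 10", "Cricket"),
    ("Class 10", "Football"), ("Class 10", "Football"),
]

counts = Counter(responses)  # frequency of each (row, column) pair
row_labels = ["Class 9", "Class 10"]
col_labels = ["Cricket", "Football"]

# Totals at the end of each row and column, plus the grand total.
row_totals = {r: sum(counts[(r, c)] for c in col_labels) for r in row_labels}
col_totals = {c: sum(counts[(r, c)] for r in row_labels) for c in col_labels}
grand_total = sum(row_totals.values())

# Print the table: categories on the left and top, counts in the center.
print(f"{'':10}" + "".join(f"{c:>10}" for c in col_labels) + f"{'Total':>10}")
for r in row_labels:
    cells = "".join(f"{counts[(r, c)]:>10}" for c in col_labels)
    print(f"{r:10}{cells}{row_totals[r]:>10}")
print(
    f"{'Total':10}"
    + "".join(f"{col_totals[c]:>10}" for c in col_labels)
    + f"{grand_total:>10}"
)
```

The number in the bottom-right corner equals both the sum of the row totals and the sum of the column totals, which is a quick check that the table is consistent.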
Q. 19 Explain Central Limit Theorem. Give any two real world scenarios in which it is used.
Ans. The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the
sample size gets larger, irrespective of the shape of the population distribution.
The Central Limit Theorem also tells us that, given a sufficiently large sample size from a population with finite
variance, the mean of all samples from the same population will be roughly equal to the mean of the
population.
Some practical implementations of the Central Limit Theorem include:
1. Voting polls estimate the number of people who support a particular election candidate. The poll results that news
channels report with confidence intervals are calculated using the Central Limit Theorem.
2. The Central Limit Theorem can also be used to estimate the mean family income for a specific region.
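Example (a minimal sketch in Python): a confidence interval for a voting poll can be worked out with the normal approximation that the Central Limit Theorem justifies for large samples. The poll numbers below are hypothetical, and 1.96 is the usual multiplier for a 95% interval.

```python
import math

# Hypothetical poll: 520 of 1,000 respondents support candidate A.
supporters = 520
sample_size = 1000

p_hat = supporters / sample_size  # sample proportion

# Standard error of a proportion for a simple random sample.
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)

z = 1.96  # 95% confidence level under the normal approximation
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"estimate: {p_hat:.3f}")
print(f"margin of error: {margin:.3f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

With these numbers the interval includes 0.5, so this hypothetical poll alone could not tell us whether the candidate has majority support.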