
910b7

The document consists of multiple worksheets covering topics related to data, information, data collection, visualization, and ethics in data science. It includes questions and answers on concepts such as data footprints, types of data, data recovery, data visualization techniques, and ethical considerations in handling confidential data. The content is structured into chapters that address the definitions, applications, and implications of data in various contexts.

Uploaded by subhangi1510
Copyright © All Rights Reserved

Chapter-1

INTRODUCTION
Worksheet -1

1) Data and information are the same


a) Yes b) No

2) Social media platforms are responsible for creating data footprints


a) Yes b) No

3) There is no risk of losing data


a) Yes b) No

4) Websites and mobile apps use our search history to provide personalized offers
a) Yes b) No

5) Which of the following is not part of the DIKW model?


a) Data b) Information c) Security d) Knowledge

6) Should you keep a data recovery plan?


a) Yes b) No

7) What is your data footprint?


a) The data trail left by you when you surf the internet b) The time you spend on your computer
c) The number of electronic devices you buy in a year d) The number of apps you have on your mobile

8) How long is your data footprint visible?


a) It depends on the websites you visit b) The data footprint wipes clean after a year
c) It creates a permanent record d) The record expires after a month

9) Who can use or see data from your data footprint?


a) It is visible to professionals, but they need special access to go through the data
b) No one can access data from your data footprint
c) Only the police have access to the information on your data footprint
d) Your data footprint is potentially visible to anyone

10) You regret posting a particular picture and want to take it down. Is it possible, and how would you
do that?
a) It is a little tricky but can be done by asking a professional to do it. Then no one can see the photo.
b) You can delete the picture by clicking on the delete button. Then no one can see the photo anymore.
c) Only the police can delete a picture uploaded by you
d) A photo can be deleted from your account, but someone might have already saved it or copied it.

11) How can you improve your data footprint?


a) It is best not to post anything if you want to stay safe
b) It is not necessary to improve your data footprint
c) Check your social media accounts' privacy settings to make sure you share your posts with people you
trust and know
d) Share your personal details with a good friend or family member so they can help you stay safe online
Chapter-1

INTRODUCTION
Worksheet -2

1. Data can be defined as facts or information which when stored, can be used as a
basis for decision making, calculation, or discussion.
2. Processed, managed, and structured data is called Information.
3. Information is a collection of data that makes logical sense.
4. Data, after it is transformed into information, can be further converted into knowledge and
wisdom. This is called the DIKW (Data, Information, Knowledge, Wisdom) model.
5. When we use the internet, we send and receive data through the internet. With
all the activities we do on the internet, we create trails of data. These trails of data
are called data footprints.
6. Data footprints can be classified into two categories: Active and Passive.

7. We regularly use several social media platforms and post images or content, which are
stored on those platforms. This is a form of Active data footprint, as we have
knowingly shared information about ourselves.
8. Our browsing history and product searches may be stored by search engines.
Organizations use these records for personalized marketing. This is an example
of a passive data footprint.
9. The process of restoring inaccessible, lost, corrupted, damaged, or deleted data is
called data recovery.
10. Some of the reasons for Data Loss are – System Failure (Power failure, Hardware
failure, System Crash), Disaster (Natural disaster, Fire), Crime (Theft, Hacking,
Computer Virus, Ransomware, etc.), Unintentional actions (Accidental deletion of
files, loss of pen drive or laptop) and Intentional actions (Deletion of files and
programs intentionally).
11. Healthcare, Education, Travel, Online shopping and Online shows are some of the
areas in which data influences our daily lives.
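To make the difference between data and information concrete, here is a minimal Python sketch. The temperature readings are made-up values used only for illustration: the raw list is data, and the processed summary (average, highest, lowest, trend) is information that can support a decision.

```python
from statistics import mean

# Raw data: hypothetical daily maximum temperatures (in degrees C) for one week.
# On their own, these numbers are just facts without context.
temperatures = [31, 33, 35, 34, 36, 38, 37]

# Processing and structuring the data turns it into information.
summary = {
    "average": round(mean(temperatures), 1),  # mean of all readings, 1 decimal place
    "highest": max(temperatures),
    "lowest": min(temperatures),
    # Compare the last reading with the first to describe the overall direction.
    "trend": "rising" if temperatures[-1] > temperatures[0] else "not rising",
}

print(summary)
# {'average': 34.9, 'highest': 38, 'lowest': 31, 'trend': 'rising'}
```

In the spirit of the DIKW model, noticing that the week is getting hotter (information) could then lead to knowledge and wisdom, such as deciding to carry extra water.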
Chapter-2

Arranging and Collecting Data


Worksheet-1

1. A school named ABC has recorded the total marks of every student in the class. This is an example of:

a. Qualitative data b. Quantitative data c. Both qualitative and quantitative data d. None of the above

2. A food delivery app has asked for your feedback on the quality of the food. You have written two paragraphs
to describe the food. This is an example of:

a. Qualitative data b. Quantitative data c. Both qualitative and quantitative data d. None of the above

3. You need to predict what the temperature will be for next Friday. Which algorithm will you use?

a) Clustering b) Regression c) Anomaly detection d) Binary classification

4. You need to predict if your car tyre will last for the next 1000 km. Which algorithm will you use?

a) Clustering b) Regression c) Anomaly detection d) Binary classification

5. Which of the following are the benefits of Big data processing?

a) Businesses can utilize outside intelligence while making decisions b) Improved customer service
c) Better operational efficiency d) All of the above

6. The analysis of large amounts of data to see what patterns or other useful information can be found is
known as

a) Data Analysis b) Information Analytics c) Big data Analytics d) Data Analytics

7. Big data analysis does the following except

a) Collects data b) Spreads data c) Organizes data d) Analyzes data

8. Primary data for the research process can be collected through

a) Experiment b) Survey c) Both a and b d) None of the above

9. The advantages of secondary data are low cost, speed, availability, and flexibility.

a) True b) False

10. The method of collecting primary data by watching people is called

a) Survey b) Informative c) Observational d) Experimental


Chapter-2

Arranging and Collecting Data


Worksheet-2

1. Data Collection is defined as the procedure of collecting data in order to measure and analyze it for accurate insights, using standard validated techniques.

2. There are two types of variables: Numerical and Categorical.

3. Data can be divided into two categories: Quantitative and Qualitative.

4. Quantitative data are numbers or values that can be measured.

5. Data sources can be classified into Primary and Secondary sources.

6. At times, data is already recorded for some other purpose and then re-used for analysis. These are Secondary data sources.

7. Online surveys, interviews, feedback forms are some methods of collecting Primary data.

8. Web traffic tracking and Satellite data tracking are some methods of collecting Secondary data.

9. When data volume exceeds certain limits and specialized systems are required to manage the data, it is called Big Data.

10. Systems capable of extracting statistical insights from huge amounts of data are called Big Data Systems.

11. Volume, Variety, and Velocity are some of the key characteristics that define Big Data.

12. Binary classification, regression, anomaly detection, and clustering are some of the algorithms used to interpret the data.

13. Univariate data has a single variable.

14. Multivariate data involves multiple variables and the relationships among them.

15. Big Data techniques are widely used in different sectors. Some of them are Health Care, Retail, Science, Sports, Social Media, etc.


16. Based on the type of data, we can ask the data five simple questions to interpret it:

a. Is this A or B? – Binary Classification

Example – Q) Will India win this match?

b. Is this odd? – Anomaly Detection

Example – Q) You are checking your car tyre pressure. Is the reading regular?

c. How much and how many? – Regression Algorithm

Example – Q) How many goals will your favorite team score in this football match?

d. Can I group the data? – Clustering

Example - Consider a class of 60 students. The students can be categorized into groups based on their height.

e. What should I do now? – Reinforcement learning

Example - I am a self-driving car. I am at a traffic signal with a red light. What should I do now?
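Three of these questions can be sketched in a few lines of plain Python. This is a simplified illustration, not how production systems implement these algorithms: anomaly detection uses a z-score rule (the 3-standard-deviation threshold is an assumption), regression fits an ordinary least-squares straight line, and clustering uses a basic one-dimensional k-means. The function names and all readings, temperatures, and heights are invented for the example.

```python
from statistics import mean, stdev


def is_anomaly(past_readings, new_reading, threshold=3.0):
    """Is this odd? Flag new_reading if it lies more than `threshold`
    standard deviations away from the mean of the past readings."""
    mu = mean(past_readings)
    sigma = stdev(past_readings)
    return abs(new_reading - mu) > threshold * sigma


def fit_line(xs, ys):
    """How much / how many? Least-squares fit of y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


def kmeans_1d(values, k=2, iterations=20):
    """Can I group the data? Simple 1-D k-means (assumes k >= 2).

    Starting centres are spread evenly across the sorted values. Each
    iteration assigns every value to its nearest centre, then moves each
    centre to the mean of its group.
    """
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        centres = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
    return groups


# Tyre pressure (psi): is the latest reading regular?
pressures = [32, 33, 32, 31, 32, 33, 32]
print(is_anomaly(pressures, 32.5))  # False
print(is_anomaly(pressures, 45))    # True

# Temperatures for days 1-4, then a prediction for day 5.
a, b = fit_line([1, 2, 3, 4], [20, 22, 24, 26])
print(a + b * 5)  # 28.0

# Heights (cm) of students, grouped into two clusters.
print(kmeans_1d([140, 142, 145, 160, 162, 165]))
# [[140, 142, 145], [160, 162, 165]]
```

Binary classification and reinforcement learning are left out here because even a toy version needs more setup than fits a short example.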
Chapter - 3

Data Visualization
Worksheet-1
1. Data can be visualized using:
a. Graphs
b. Maps
c. Charts
d. All of the above

Answer: d
2. Which of the following statements is false?
a. Data visualization helps us absorb information quickly.
b. Data visualization decreases insights and slows down decision-making.
c. Data visualization is a type of visual art.
d. None of the above

Answer: b

3. Which of the following is a use case of data visualization?


a. Healthcare
b. Sales and Marketing
c. Politics/Campaigning
d. All of the above

Answer: d
4. Bar Graph is a
a. One-dimensional graph
b. Two-dimensional graph
c. Graph with no dimension
d. None of the above

Answer: a

5. The data represented through a histogram can help in finding graphically the
a. Median
b. Mean
c. Mode
d. All of the above

Answer: c
6. Pie Chart is a
a. One-dimensional graph
b. Two-dimensional graph
c. Graph with no dimension
d. None of the above

Answer: a

7. Can a Line chart be used to plot multiple variables?


a. True
b. False

Answer: a

8. The heights of your classmates are recorded and arranged in ascending order.
The data is represented as a histogram. What shape does the histogram have?
a. Right-skewed Distribution
b. Left-skewed Distribution
c. Bimodal Distribution
d. Random Distribution

Answer: b
Chapter - 3

Data Visualization
Worksheet-2

1. Data visualization is the mechanism of representing raw data in the form of graphical
representations that allow users to explore the data and uncover quick insights.

2. Representing data through visualizations like graphs, charts, maps, etc., gives us a
visual context of the data.

3. Data visualization makes complex data simple and enables the human mind to
understand its significance.

4. Visualizations allow us to recognize trends, patterns, and outliers from seemingly
meaningless records of data.

5. Data visualization techniques use visual data in a universal, fast, and powerful way to
communicate information.

6. A dot plot is a graphical display of data using dots.

7. In the Chart Elements section, we can provide the title, subtitle, x-axis name
and y-axis name of the chart.

8. A bar graph is a graphical display of data using bars of different heights. It is possible to
plot the bars vertically or horizontally.

9. A vertical bar graph is called a column chart or graph.

10. The minimum is the smallest value in the data set. The maximum is the largest value
in the data set.

11. The frequency of a data value is the number of times the data value occurs/repeats.

12. A histogram is a graphical illustration of frequency plotted against intervals.

13. Bin width is the size of the range (interval) covered by each bar of a histogram.

14. Data points in a normal distribution are as likely to occur on one side of the average
as on the other side of the average.

15. A right-skewed distribution occurs when the data has a range boundary on the left-hand
side of the histogram.

16. A right-skewed distribution is also known as a positively skewed distribution.

17. A left-skewed distribution usually occurs when the data has a range boundary on the
histogram's right-hand side.

18. A left-skewed distribution is also known as a negatively skewed distribution.

19. A bimodal distribution has two peaks. In a bimodal distribution, the data should be
separated and analyzed as separate normal distributions.

20. A random distribution lacks an apparent pattern and has several peaks.

21. Multi-variable plots are used to display relationships among several variables.

22. Write a few real-life uses of data visualizations.

a) Tracking student progress with scorecards
b) Identifying usage trends of a website
c) Monitoring goals and results of a sales executive
d) Visualizing the spread and impact of pandemics

23. Understand the different shapes of a histogram and name the type of distribution.

[Figure: histogram shapes labeled Normal Distribution, Right-Skewed Distribution, Left-Skewed Distribution, Bimodal Distribution and Random Distribution]
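The ideas of minimum, maximum, frequency, bins, and skew above can be checked with a short Python sketch. The scores are invented sample values, the function names are hypothetical, the bin boundaries are an assumption chosen for the example, and the mean-versus-median comparison is only a common rule of thumb for spotting skew, not a guaranteed test.

```python
from collections import Counter
from statistics import mean, median


def frequencies(values):
    """Frequency of each data value: how many times it occurs."""
    return Counter(values)


def histogram_counts(values, low, bin_width, bins):
    """Count how many values fall into each interval
    [low + i * bin_width, low + (i + 1) * bin_width), for i = 0 .. bins - 1.
    Values outside all intervals are ignored."""
    counts = [0] * bins
    for v in values:
        index = int((v - low) // bin_width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


def skew_direction(values):
    """Rule of thumb: mean > median hints at a right (positive) skew,
    mean < median hints at a left (negative) skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "roughly symmetric"


scores = [2, 3, 3, 4, 4, 4, 5, 5, 12]
print(min(scores), max(scores))           # 2 12
print(frequencies(scores)[4])             # 3
print(histogram_counts(scores, 0, 5, 3))  # [6, 2, 1]
print(skew_direction(scores))             # right-skewed
```

Here the single high score of 12 stretches the tail to the right and pulls the mean (about 4.67) above the median (4), matching the description of a right-skewed distribution with its range boundary on the left.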


Chapter-4
Ethics in Data Science
Worksheet-1

Please choose the correct option in the questions below.

1) Which of the following is not one of the principles of a data governance framework?
a) Protect your customer
b) Data should never institutionalize unfair biases
c) Never collect confidential data from users

2) The private information that is shared should always be handled with confidentiality.
a) True
b) False

3) If you are done with using the confidential data collected from users, you should:
a) Safely store it. We may need it in future for some analysis or reports
b) Effectively destroy it in a way that it is unreadable

4) Confidential data can be stored in which of the following formats?


a) Digital Data
b) Physical Copies
c) Both

5) Data should never institutionalize unfair biases


a) True
b) False

6) Digital confidential data should be discarded by


a) Formatting the drive in which data was stored
b) Temporarily deleting the data

7) Which of the following is not an appropriate way of discarding confidential
data?
a) Shredding the data
b) Cutting the files which contain confidential data
c) Burning the confidential data
d) Crumpling the papers which contain confidential data and throwing them in the
dustbin
Chapter-4
Ethics in Data Science
Worksheet -2

1. The private data acquired from a person with their consent should never be
exposed for use by different businesses or individuals.

2. The private information that is shared should always be handled with confidentiality.

3. Third-party companies should always have restrictions on whether and how that
information is allowed to be passed on.

4. Customers should always have a clear view of how their data is being
used or traded, and should have the authority to manage the flow of their
confidential information across enormous third-party systems.

5. Data should never interfere with human will.

6. Data should never institutionalize unfair biases like sexism or racism.

7. Once we are done with the user data, especially confidential data, it is
important that we discard this data in an appropriate way to make sure that it
is not accessed by any unauthorized person and is not misused in any way.

8. There are two ways in which you may have stored the data – in the digital
format or as a physical copy.

9. Discard the digital data in a proper way to prevent unauthorized access to the data.

10. Shredding of the documents (Physical copy) which contain confidential data is an effective way of discarding the data.

11. Do not use confidential customer data for business purposes without
consent.

12. Be transparent with customers on how their data is used.

13. All confidential data that you possess should be appropriately discarded.
Worksheet

Computer Virus - A computer virus is a type of malicious software, or malware, that infects
computers and corrupts their data and software.

Ransomware - Ransomware is a type of malware designed to extort money from its victims,
who are blocked or prevented from accessing data on their systems.

Excel

Excel is a spreadsheet program from Microsoft and a component of its Office product group
for business applications.

Cell - A cell is the intersection of a row and a column, in other words, where a row and a
column meet. Every cell is identified by its cell address, which consists of its column letter
followed by its row number (if a cell is in row 7 and column B, its address is B7).

Active cell - The selected cell in which data is entered when you begin typing. Only one cell is
active at a time. The active cell is bounded by a heavy border.

Cell reference - The set of coordinates that a cell occupies on a worksheet. For example, the
reference of the cell that appears at the intersection of column B and row 3 is B3.
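The naming rule for cell addresses can be expressed as a small Python function. This is a sketch, not part of Excel itself: the function names are hypothetical, and it assumes 1-based column numbers where column 1 is A, column 26 is Z, and column 27 continues as AA, as in Excel's column headings.

```python
def column_letters(col):
    """Convert a 1-based column number to Excel-style letters (1 -> A, 27 -> AA)."""
    if col < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col > 0:
        # Shift to 0-based so that 26 maps to Z rather than rolling over early.
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_address(col, row):
    """Build a cell reference: column letters followed by the row number."""
    return f"{column_letters(col)}{row}"


print(cell_address(2, 7))   # B7
print(cell_address(2, 3))   # B3
print(column_letters(28))   # AB
```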

Active sheet - The sheet that you're working on in a workbook. The name on the tab of the
active sheet is bold.

Row - In Microsoft Excel, a row runs horizontally across a worksheet's grid structure. Rows are
labeled with numbers such as 1, 2, 3 and 4.

Column - In Microsoft Excel, a column runs vertically across a worksheet's grid structure.
Vertical columns use letters such as A, B, C and D as labels.

Fill Handle - The fill handle is the small black square in the lower-right corner of the selection.
Dragging it auto-fills rows or columns by following the pattern of the values in the selected cells,
creating a series. When you point to the fill handle, the pointer changes to a black cross.
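For the simplest case, where two numeric cells are selected and then dragged, the fill handle continues an arithmetic pattern. The sketch below models only that case with a hypothetical function; Excel's real AutoFill also recognizes dates, day and month names, and text patterns, which are not covered here.

```python
def fill_series(first, second, count):
    """Continue the pattern set by two selected cells.

    The step is the difference between the two starting values, and each
    new cell adds that step to the previous one (an arithmetic series).
    """
    step = second - first
    return [second + step * i for i in range(1, count + 1)]


print(fill_series(2, 4, 3))   # [6, 8, 10]
print(fill_series(10, 7, 3))  # [4, 1, -2]
```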

Address Bar (called the Name Box in Excel) – It shows the address of the active cell. If you have
selected more than one cell, it shows the address of the first cell in the range.

Formula Bar – The formula bar is an input bar, below the ribbon. It shows the content of the
active cell, and you can also use it to enter a formula in a cell.

Formula - A sequence of values, cell references, names, functions, or operators in a cell that
together produce a new value. A formula always begins with an equal sign (=).

Function - A prewritten formula that takes a value or values, performs an operation, and
returns a value or values. Use functions to simplify and shorten formulas on a worksheet,
especially those that perform lengthy or complex calculations.
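As an illustration of what a function such as SUM computes, here is a rough Python analogue of the formula =SUM(A1:A3). The worksheet is modelled as a dictionary from cell address to number, which is an assumption of this sketch; the function name is hypothetical, it handles only single-column ranges, and it treats empty cells as 0.

```python
import re


def sum_column_range(sheet, cell_range):
    """Rough analogue of =SUM(A1:A3) for a range inside one column.

    `sheet` maps addresses such as "A1" to numbers; missing (empty)
    cells contribute 0 to the total.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError(f"not a range like A1:A3: {cell_range!r}")
    col1, row1, col2, row2 = match.groups()
    if col1 != col2:
        raise ValueError("this sketch only handles single-column ranges")
    start, end = sorted((int(row1), int(row2)))
    return sum(sheet.get(f"{col1}{row}", 0) for row in range(start, end + 1))


sheet = {"A1": 10, "A2": 20, "A3": 30, "B1": 5}
print(sum_column_range(sheet, "A1:A3"))  # 60
print(sum_column_range(sheet, "A1:A5"))  # 60 (A4 and A5 are empty)
```

Writing =SUM(A1:A3) in a cell replaces the longer formula =A1+A2+A3, which is the kind of shortening the definition of a function above describes.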
