Unit 8 - Reading Task
Unit 8: LAW
Faced with an unprecedented torrent of information, data scientists have turned to the visual arts to make
sense of big data. The results of this unlikely marriage, often called “data visualizations” or “infographics,”
have repeatedly provided us with new and insightful perspectives on the world around us.
However, time and time again we have seen that data visualizations can easily be manipulated to lie. By
misrepresenting, distorting, or faking the data they visualize, data scientists can twist public opinion to their
benefit and even profit at our expense. We have a natural tendency to trust images more than text. As a result,
we’re easily fooled by data visualizations. Fortunately, there are three easy steps we can follow to save ourselves
from getting duped in the data deluge.
Check the data presentation
The subtlest way a data visualization can fool you is by using visual cues to make data stand out that normally
wouldn’t. Be on the lookout for these visual tricks.
1. Color cues: Color is one popular tool for making certain data
more prominent than the rest. When considering the map below, Kentucky
and Utah (the darkest and the lightest) will most likely stand out to us first.
If the map in Figure 1 were showing percentage of the population that
smokes (where dark colors indicate more smokers and light colors fewer
smokers), we might quickly conclude that Kentucky has a serious
smoking problem. But what if we looked at the raw numbers and saw that
27% of Kentuckians and 23% of Utahans smoke? Now the difference doesn’t look so big after all. Make sure to
look at what the colors actually represent before drawing a conclusion from the visualization.
2. Structural cues: Structure is another popular tool for making data immediately stand out. In the bar charts
in Figure 2, we’re looking at the same data, but with different scales on the y-axes. Notice how such a simple
structural change can make differences in the data look
much more significant. Is an increase of 15 fraudulent
visualizations from last year really “skyrocketing”?
Don’t let the structure of the visualization decide that for
you. Always check the numbers that the visualization is
representing.
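The effect of the y-axis scale can be checked with a little arithmetic. The sketch below is only an illustration: the counts 400 and 415 and the axis starting point of 395 are hypothetical (the text gives only the increase of 15), and `drawn_height_ratio` is a helper name invented for this example.

```python
def drawn_height_ratio(previous, current, axis_start=0.0):
    """Return how many times taller the 'current' bar is drawn than the
    'previous' bar when the y-axis begins at axis_start."""
    if axis_start >= min(previous, current):
        raise ValueError("axis_start must be below both values")
    return (current - axis_start) / (previous - axis_start)


# Hypothetical counts: only the increase of 15 comes from the text.
last_year, this_year = 400, 415

full_axis = drawn_height_ratio(last_year, this_year)            # y-axis starts at 0
truncated_axis = drawn_height_ratio(last_year, this_year, 395)  # y-axis starts at 395

print(f"Bar height ratio, axis from 0:   {full_axis:.2f}")
print(f"Bar height ratio, axis from 395: {truncated_axis:.2f}")
```

With these assumed numbers, the same increase of 15 makes the second bar only slightly taller on a full axis, but four times taller on the truncated one.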
Check the data source
Make sure the data source is reliable. Data collected by an amateur is more error-prone than data collected by
a professional scientist. Do a quick Web search to see if the people who collected and organized the data have a
good track record of collecting and distributing data. You should also make sure the data source isn’t biased. A
drug company may be inclined to present fake data showing that their latest drug is more effective than it really
is, or a political campaign may manipulate data to discredit their political opponents. Think twice when
considering data provided by biased groups.
Generally, we can trust data provided by government organizations, university research centers, and non-
partisan organizations. However, we should look more closely at data provided by for-profit companies, political
organizations, and advocacy groups. If the data source isn’t listed, take the data visualization with many grains of
salt.
Check the data alterations
Many data sets require a little bit of housecleaning before they can be visualized, but excessive editing can be
a sign of misrepresented data. Every good data visualization will come with explanations describing how the data
was manipulated from its raw form into the visualization you see. Read the explanations and watch out for the
following data alterations.
1. Excluded data: Ensure that the explanations for excluding that data are reasonable. Sometimes the
“explanation” may be that the data inconveniently contrasted with the story the author wanted to tell.
2. Transformed data: Data transformation, the process of converting data from one format to another format,
can complicate the relationships between data. It’s difficult to interpret a finding such as “The log transform of a
city’s productivity is related to the log transform of the city’s population.” See how that doesn’t make any sense
to us in practical terms? While a transformation can make complicated mathematics accessible, it can also
potentially be misleading. Be wary if several transformations have been applied to the data.
3. Statistics: Statistics are an often-abused tool in data science. “Fatal shark attacks have risen 100% this
year” sounds like an alarming statistic until you realize that only one person was fatally attacked by a shark last
year. Check the raw numbers when data visualizations present only the
statistics.
Comparing statistics is even trickier. If a survey shows that 50% of Latinos
and only 30% of Caucasians enjoy watching baseball, those results could easily
have been purely due to chance if the survey interviewed only 20 people of each
ethnicity (Figure 3).
If the visualization doesn’t indicate the researchers’ confidence in the
comparison (called statistical significance), then we shouldn’t be confident in
their comparisons. If the details on the data alterations aren’t provided with the
visualization, always keep in mind how easy it is to make data lie when it’s
visualized.
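The baseball survey can be checked for statistical significance with a short calculation. The sketch below is one possible approach, not the method of the article, which does not name a test: it uses Fisher's exact test, a common choice for small samples. The counts 10 of 20 and 6 of 20 follow from the 50% and 30% figures in the text. The function name is invented for this example.

```python
from math import comb


def fisher_exact_two_sided(yes_a, n_a, yes_b, n_b):
    """Two-sided Fisher's exact test for two groups with yes/no answers.

    Returns the probability of a split at least as unlikely as the observed
    one, assuming both groups share the same underlying rate."""
    total_yes = yes_a + yes_b
    denom = comb(n_a + n_b, total_yes)

    def prob(k):
        # Hypergeometric probability that group A contains k of the "yes" answers.
        return comb(n_a, k) * comb(n_b, total_yes - k) / denom

    observed = prob(yes_a)
    lowest = max(0, total_yes - n_b)
    highest = min(n_a, total_yes)
    # Sum every outcome that is no more likely than the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# 50% of 20 people = 10; 30% of 20 people = 6.
p_value = fisher_exact_two_sided(10, 20, 6, 20)
print(f"p-value: {p_value:.3f}")
```

For these counts, the p-value is far above the conventional 0.05 threshold, which agrees with the warning that the difference could easily be due to chance.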
Remember: To save yourself from getting tricked by deceitful data, check
the presentation, data source, and alterations.
A. Complete the sentences with the correct form of the given words/phrases.
campaign (n.) - manipulate (v.) - scale (n.) - transformation (n.) - distort (v.) - misleading (adj.) - skyrocket (v.) -
unprecedented (adj.) - error-prone (adj.) - prominent (adj.) - take … with a grain of salt (phr.) - visualize (v.)
1. A graph must not ____________ important data by making differences look greater than they actually are. (Answer: distort)
2. The photograph was in a(n) _____________ position on the home page of the website. (Answer: prominent)
3. The ___________ of the map made the distance look short, but it took all day to drive between the two cities. (Answer: scale)
4. Our last advertising ______________ was successful; a week’s worth of ads increased sales by 20 percent. (Answer: campaign)
5. A good graph helps readers to _____________ the data described in the text. (Answer: visualize)
6. The number of complaints last year ____________ as readers became more aware of data misrepresentation. (Answer: skyrocketed)
7. _______________ that email promising large cash rewards _______________. (Answer: Take … with a grain of salt)
8. The picture has been _________________ to hide cracks. (Answer: manipulated)
9. Analyzing data can often be a complex and _____________ process. (Answer: error-prone)
10. The chart is ______________ because it does not state the number of people interviewed. (Answer: misleading)
11. After so many _______________, it was impossible to retrieve the raw data. (Answer: transformations)
12. The newspaper took the _______________ decision to publish an article explaining its policy for creating infographics. (Answer: unprecedented)
B. Where can you find the information below? Write the paragraph number. (The paragraphs haven't been numbered yet.)
1. Colors are commonly used in data representations to deliberately deceive readers. Paragraph: ___ (Answer: “Color cues ... visualization”)
2. Infographics help us visualize large amounts of information. Paragraph: ___ (Answer: 1, “Faced with ... around us”)
3. Some sources of data are generally trusted to present data accurately. Paragraph: ___ (Answer: from “Check the data source ... grains of salt.”)
4. It is important to know how the information in an infographic has been changed from the raw numbers. Paragraph: ___ (Answer: from “Statistics ... present only the statistics”)
5. Pharmaceutical companies may produce infographics that exaggerate the benefits of drugs they make. Paragraph: ___ (Answer: “Make sure the data source is reliable ... by biased groups”)
6. It is easy to be manipulated by infographics because most people believe what they see more than what they read. Paragraph: ___ (Answer: from “However, time and time again ... in the data deluge.”)
C. Which pieces of advice are recommended in the article? Choose the correct answers.
1. Make sure the scientists who collected the data are experienced professionals.
2. Do not trust infographics that use color to show differences.
3. Read the scale on the vertical axis (or y-axis) carefully.
4. You should be highly suspicious of infographics that do not indicate the source of the data.
5. It is important to check the actual numbers when reading a graph that displays percentages.
6. You should look carefully for the date when the information was collected, as infographics sometimes
present old information.