
Stats What Question 2015


INSIGHTS | PERSPECTIVES

Many of these lines were found to be of other carbocations, such as CH3+, C2H3+, C2H2+, and CH2+ [for a review, see (8)]. Each time I proudly reported these discoveries, Olah responded, “impressive, but what about CH5+?” After “weeding out” those thousands of understood spectral lines, the remaining messy spectrum was undecipherable, and the 900 lines of CH5+ were reported without assignment (1). Even purely empirical attempts at finding some regularity of the spectrum were not successful.

Asvany et al. have been able to determine the energy separation between several pairs of lowest levels using the action spectroscopy invented by Schlemmer and Gerlich (9). The proton affinity of CH4 (5.72 eV) is slightly greater than that of CO2 (5.68 eV) so the reaction CH5+ + CO2 → CH4 + CO2H+ is endothermic. Addition of a resonant 3.3-µm (0.37 eV) laser photon makes this reaction exothermic. Thus, they can do spectroscopy by counting CO2H+ ions rather than photons. This paradigm shift from photon- to ion-counting spectroscopy has increased the sensitivity: instead of needing 10^13 ions, 10^3 CH5+ suffice. Also, trapped ions can be cooled to cryogenic temperature, which leads to a 100 times increase in accuracy.

The 2897 lines observed by Asvany et al. at 10 K demonstrate the complexity of the CH5+ spectrum. In contrast, CH4 at 10 K has only four rotational levels with quantum number J < 3 populated and only 10 transitions can be observed. The 300 times increase in spectral density from CH4 to CH5+ is caused by proton scrambling and inversion motion. Rotational assignments could be made that differ from those they reported, but these are minor details; the CoDiff values are correct and the key to advancing our understanding.

In spite of the complexity, each quantum level can be specified by using the total proton spin quantum number and the parity (10). The scrambling of the five protons with spin 1/2 produces a total nuclear spin angular momentum I according to the formula

$$[D_{1/2}]^5 = D_{5/2} + 4D_{3/2} + 5D_{1/2}$$

This formula means that each of the levels of CH5+ has I = 5/2 (A1), 3/2 (G1), or 1/2 (H1), and the numbers of levels are in the ratio of 1:4:5 and the CoDiffs 1:16:25. Each level also has a definite parity of + or – and their numbers are equal. The CoDiffs reported by Asvany et al. are for levels with I = 3/2 and 1/2 and the same parity. The next step will be to find CoDiffs with different parity, which the authors note could be tackled by applying their method for far-infrared spectroscopy.

The results by Asvany et al. put the experiment far ahead of the theory. To date, Wang and Carrington’s computation (11), based on the potential energy surface (PES) of Jin et al. (12), seems to be the only frontal attack to this problem, but it does not include rotation. A brute-force variational calculation of the five protons with an accurate PES may be the way to solve this problem. Such treatment has been successful for H3+, but the formalism and computation will be much more demanding for a five-proton system.

As in Olah’s chemistry on Earth, CH5+ is pivotal for producing hydrocarbons in space. The lines list by Asvany et al. suffices for detecting interstellar CH5+, but we badly need strongest lines for I = 5/2 (A1). The classical CH3+ ion is yet to be detected, so detection of the nonclassical CH5+ will be difficult but worth a try. Once the far-infrared transitions are observed, including I = 5/2 levels, more sensitive observational techniques can be used. I anticipate that this enfant terrible will be caught in interstellar space far ahead of its theoretical understanding, which will take at least a few more decades. ■

REFERENCES
1. E. T. White, J. Tang, T. Oka, Science 284, 135 (1999).
2. P. R. Schreiner, S.-J. Kim, H. F. Schaefer, III, P. von Ragué Schleyer, J. Chem. Phys. 99, 3716 (1993).
3. H. Müller, W. Kutzelnigg, J. Noga, W. Klopper, J. Chem. Phys. 106, 1863 (1997).
4. O. Asvany, K. M. T. Yamada, S. Brünken, A. Potapov, S. Schlemmer, Science 347, 1346 (2015).
5. T. Oka, Phys. Rev. Lett. 45, 531 (1980).
6. G. A. Olah, Angew. Chem. Int. Ed. Engl. 34, 1393 (1995).
7. C. S. Gudeman, M. H. Begemann, J. Pfaff, R. J. Saykally, Phys. Rev. Lett. 50, 727 (1983).
8. T. Oka, J. Phys. Chem. A 117, 9308 (2013).
9. S. Schlemmer, T. Kuhn, E. Lescop, D. Gerlich, Int. J. Mass Spectrom. 185–187, 589 (1999).
10. P. R. Bunker, B. Ostojić, S. Yurchenko, J. Mol. Struct. 695–696, 253 (2004).
11. X.-G. Wang, T. Carrington Jr., J. Chem. Phys. 129, 234102 (2008).
12. Z. Jin, B. J. Braams, J. M. Bowman, J. Phys. Chem. A 110, 1569 (2006).

Department of Chemistry and Department of Astronomy and Astrophysics, The Enrico Fermi Institute, University of Chicago, Chicago, IL 60637, USA. E-mail: [email protected]
10.1126/science.aaa6935

STATISTICS

What is the question?

Mistaking the type of question being considered is the most common error in data analysis

By Jeffery T. Leek and Roger D. Peng

Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA. E-mail: [email protected], jtleek@gmail.com

Over the past 2 years, increased focus on statistical analysis brought on by the era of big data has pushed the issue of reproducibility out of the pages of academic journals and into the popular consciousness (1). Just weeks ago, a paper about the relationship between tissue-specific cancer incidence and stem cell divisions (2) was widely misreported because of misunderstandings about the primary statistical argument in the paper (3). Public pressure has contributed to the massive recent adoption of reproducible research tools, with corresponding improvements in reproducibility. But an analysis can be fully reproducible and still be wrong. Even the most spectacularly irreproducible analyses, like those underlying the ongoing lawsuits (4) over failed genomic signatures for chemotherapy assignment (5), are ultimately reproducible (6). Once an analysis is reproducible, the key question we want to answer is, “Is this data analysis correct?” We have found that the most frequent failure in data analysis is mistaking the type of question being considered.

Any specific data analysis can be broadly classified into one of six types (see the figure). The least challenging of these is a descriptive data analysis, which seeks to summarize the measurements in a single data set without further interpretation. An example is the United States Census, which aims to describe how many people live in different parts of the United States, leaving the interpretation and use of these counts to Congress and the public.

An exploratory data analysis builds on a descriptive analysis by searching for discoveries, trends, correlations, or relationships between the measurements to generate ideas or hypotheses. The four-star planetary system Tatooine was discovered

1314 20 MARCH 2015 • VOL 347 ISSUE 6228 sciencemag.org SCIENCE

Published by AAAS
when amateur astronomers explored public astronomical data from the Kepler telescope (7). An exploratory analysis like this seeks to make discoveries, but can rarely confirm those discoveries. Follow-up studies and additional data were needed to confirm the existence of Tatooine (8).

An inferential data analysis quantifies whether an observed pattern will likely hold beyond the data set in hand. This is the most common statistical analysis in the formal scientific literature. An example is a study of whether air pollution correlates with life expectancy at the state level in the United States (9). In nonrandomized experiments, it is usually only possible to determine the existence of a relationship between two measurements, but not the underlying mechanism or the reason for it.

Going beyond an inferential data analysis, which quantifies the relationships at population scale, a predictive data analysis uses a subset of measurements (the features) to predict another measurement (the outcome) on a single person or unit. Web sites like FiveThirtyEight.com use polling data to predict how people will vote in an election. Predictive data analyses only show that you can predict one measurement from another; they do not necessarily explain why that choice of prediction works.

A causal data analysis seeks to find out what happens to one measurement on average if you make another measurement change. Such an analysis identifies both the magnitude and direction of relationships between variables on average. For example, decades of data show a clear causal relationship between smoking and cancer (10). If you smoke, it is certain that your risk of cancer will increase. The causal effect is real, but it affects your average risk.

Finally, a mechanistic data analysis seeks to show that changing one measurement always and exclusively leads to a specific, deterministic behavior in another. For example, data analysis has shown how wing design changes air flow over a wing, leading to decreased drag. Outside of engineering, mechanistic data analysis is extremely challenging and rarely achievable.

Mistakes in the type of data analysis and therefore the conclusions that can be drawn from data are made regularly. In the last 6 months, we have seen inferential analyses of the relationship between cellphones and brain cancer interpreted as causal (11) or the exploratory analysis of Google search terms related to flu outbreaks interpreted as a predictive analysis (12). The mistake is so common that it has been codified in standard phrases (see the table).

Determining which question is being asked can be even more complicated when multiple analyses are performed in the same study or on the same data set. A key danger is causal creep: for example, when a randomized trial is used to infer causation for a primary analysis and data from secondary analyses are given the same weight. To accurately represent a data analysis, each step in the analysis should be labeled according to its original intent.

Confusion between data analytic question types is central to the ongoing replication crisis, misconstrued press releases describing scientific results, and the controversial claim that most published research findings are false (13, 14). The solution is to ensure that data analytic education is a key component of research training. The most important step in that direction is to know the question. ■

REFERENCES
1. “How science goes wrong,” The Economist, 19 October 2013; see www.economist.com/news/leaders/21588069-scientific-research-has-changed-world-now-it-needs-change-itself-how-science-goes-wrong.
2. C. Tomasetti, B. Vogelstein, Science 347, 78 (2015).
3. See www.bbc.com/news/magazine-30786970.
4. Duke’s Legal Stance: We Did No Harm, The Cancer Letter Publications (2015); see www.cancerletter.com/articles/20150123_2.
5. A. Potti et al., Nat. Med. 12, 1294 (2006).
6. K. A. Baggerly, K. R. Coombes, Ann. Appl. Stat. 3, 1309 (2009).
7. “Planet with four stars discovered by citizen astronomers,” Wired UK (2012); see www.wired.co.uk/news/archive/2012-10/15/four-starred-planet.
8. M. E. Schwamb et al.; http://arxiv.org/abs/1210.3612 (2013).
9. A. W. Correia et al., Epidemiology 24, 23 (2013).
10. O. A. Panagiotou et al., Cancer Res. 74, 2157 (2014).
11. E. Oster, Cellphones Do Not Give You Brain Cancer, FiveThirtyEight (2015); see http://fivethirtyeight.com/features/cellphones-do-not-give-you-brain-cancer/.
12. D. M. Lazer, R. Kennedy, G. King, A. Vespignani, The Parable of Google Flu: Traps in Big Data Analysis (2014); see http://dash.harvard.edu/handle/1/12016836.
13. L. R. Jager, J. T. Leek, Biostatistics 15, 1 (2014).
14. A. Gelman, K. O’Rourke, Biostatistics 15, 18 (2014).

Published online 26 February 2015; 10.1126/science.aaa6146

Data analysis flowchart
1. Did you summarize the data? No: not a data analysis. Yes: go to 2.
2. Did you report the summaries without interpretation? Yes: Descriptive. No: go to 3.
3. Did you quantify whether your discoveries are likely to hold in a new sample? No: Exploratory. Yes: go to 4.
4. Are you trying to figure out how changing the average of one measurement affects another? No: go to 5. Yes: go to 6.
5. Are you trying to predict measurement(s) for individuals? No: Inferential. Yes: Predictive.
6. Is the effect you are looking for an average effect or a deterministic effect? Average: Causal. Deterministic: Mechanistic.

Common mistakes

REAL QUESTION TYPE    PERCEIVED QUESTION TYPE    PHRASE DESCRIBING ERROR
Inferential           Causal                     “Correlation does not imply causation”
Exploratory           Inferential                “Data dredging”
Exploratory           Predictive                 “Overfitting”
Descriptive           Inferential                “n of 1 analysis”
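
As a rough illustration, the data analysis flowchart and the table of common mistakes can be read as a short chain of yes/no questions plus a lookup. The Python sketch below is one possible encoding and is not code from the article: the names `AnalysisAnswers`, `classify_analysis`, `COMMON_MISTAKES`, and `describe_mistake` are hypothetical, and the order in which the change-in-average and prediction questions are asked is an assumption about the figure's layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisAnswers:
    """Yes/no answers to the flowchart questions (hypothetical container)."""
    summarized_data: bool
    reported_without_interpretation: bool = False
    quantified_for_new_sample: bool = False
    asks_effect_of_changing_average: bool = False
    predicts_for_individuals: bool = False
    effect_is_deterministic: bool = False


def classify_analysis(a: AnalysisAnswers) -> str:
    """Walk the flowchart questions in order and return the analysis type."""
    if not a.summarized_data:
        return "not a data analysis"
    if a.reported_without_interpretation:
        return "descriptive"
    if not a.quantified_for_new_sample:
        return "exploratory"
    # Assumption: the change-in-average question is asked before prediction.
    if a.asks_effect_of_changing_average:
        return "mechanistic" if a.effect_is_deterministic else "causal"
    return "predictive" if a.predicts_for_individuals else "inferential"


# (real question type, perceived question type) -> phrase from the table
COMMON_MISTAKES = {
    ("inferential", "causal"): "Correlation does not imply causation",
    ("exploratory", "inferential"): "Data dredging",
    ("exploratory", "predictive"): "Overfitting",
    ("descriptive", "inferential"): "n of 1 analysis",
}


def describe_mistake(real: str, perceived: str) -> Optional[str]:
    """Return the table's phrase for a mislabeled analysis, or None if unlisted."""
    return COMMON_MISTAKES.get((real.lower(), perceived.lower()))


if __name__ == "__main__":
    # A census-style summary reported without interpretation.
    census = AnalysisAnswers(summarized_data=True,
                             reported_without_interpretation=True)
    print(classify_analysis(census))
    print(describe_mistake("Inferential", "Causal"))
```

Labeling each step of a study with `classify_analysis` and comparing the result with how the step is later described is one way to notice causal creep; the lookup covers only the four pairs listed in the table.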

What is the question?
Jeffery T. Leek and Roger D. Peng

Science 347 (6228), 1314-1315.


DOI: 10.1126/science.aaa6146 (originally published online February 26, 2015)

Downloaded from http://science.sciencemag.org/ on October 17, 2017


ARTICLE TOOLS: http://science.sciencemag.org/content/347/6228/1314

RELATED CONTENT: http://science.sciencemag.org/content/sci/348/6234/512.full
http://stke.sciencemag.org/content/sigtrans/8/371/fs7.full

REFERENCES: This article cites 7 articles, 1 of which you can access for free
http://science.sciencemag.org/content/347/6228/1314#BIBL

PERMISSIONS: http://www.sciencemag.org/help/reprints-and-permissions

Use of this article is subject to the Terms of Service

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. 2017 © The Authors, some rights reserved; exclusive
licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. The title
Science is a registered trademark of AAAS.
