Using R For Data Analysis and Graphing in An Introductory Physics Laboratory
1. Introduction
An experiment is not completed when the experimental data are collected. Usually,
the data require some processing, an analysis, and possibly some sort of visualization.
The three most commonly encountered approaches to data analysis and graphing are:
- the manual method using grid paper,
- general purpose spreadsheet software,
- specialized scientific graphing software.
The manual method involving grid paper and a pencil often still seems to be the
one favoured by teachers. Indeed, it has a certain pedagogical merit, for it does
not hide anything in black boxes: the student has to work through all the steps
himself or herself. It is also the preferred option in environments where students
cannot be expected to own a personal computer. However, students find the method
tedious, painstaking and old-fashioned. In all honesty, their teachers do not use
this method in their own research work.
General purpose spreadsheet programs like Microsoft Excel or the freely available
OpenOffice Calc contain most functions required for data analysis and graphing.
They are commonly available and students are generally well-versed in their use.
However, spreadsheets have their drawbacks. They are difficult to debug, and they
often require a skilled operator in order to produce a clean graph free of
visual distractions.
Finally, there exists specialized software for scientific graphing such as Origin
(formerly MicroCal, Inc., now OriginLab), SigmaPlot (formerly Jandel Scientific, now
[Figure: data points plotted against concentration c [mol/L] (horizontal axis, 0.000 to 0.010); vertical axis in (Ω cm)^-1, 0 to 6e-04.]
Since we only have five data points, the data were entered directly into the
program rather than being read from an external data file. The c() function
is used to concatenate data into a vector.
R encourages programming with vectors. In the expression k/resistance, each
element of the resulting vector is computed as a reciprocal value of the element
of the original vector, multiplied by k.
Inspired by the TeX typesetting system, R offers a capable method of entering
mathematical expressions into graph labels using the expression() function [10].
The plot() function is the generic function for plotting objects in R; here we
used it to produce a scatterplot.
The lm() function is used to fit any of several linear models [11]; we used it to
perform a simple bivariate regression. R is not overly talkative: lm() does
not produce any output on screen; it merely creates the object cond.fit, which
we can later manipulate at will. In the example we used abline(cond.fit) to
plot the regression line atop the data points. If we want to print the coefficients,
we can use coef(cond.fit).
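The remarks above can be gathered into one short script. The following is only a sketch under stated assumptions: the concentrations, the resistances and the cell constant k are invented for illustration and are not the measured data, and the variable names other than cond.fit are hypothetical.

```r
# Hypothetical data, entered directly with c(): five concentrations (mol/L)
# and resistances (ohm). These values are invented, not the measured ones.
conc       <- c(0.002, 0.004, 0.006, 0.008, 0.010)
resistance <- c(5000, 2500, 1650, 1250, 1000)

# Hypothetical cell constant (1/cm). The division is vectorised: each
# element of the result is k divided by the matching element of resistance.
k <- 0.5
conductivity <- k / resistance

# Scatterplot with mathematical annotation in the axis labels.
plot(conc, conductivity,
     xlab = expression(italic(c) ~ "[mol/L]"),
     ylab = expression(sigma ~ "[" * (Omega ~ cm)^{-1} * "]"))

# Simple linear regression; lm() prints nothing, it only creates cond.fit.
cond.fit <- lm(conductivity ~ conc)

# Draw the regression line over the data points and print the coefficients.
abline(cond.fit)
print(coef(cond.fit))
```

Keeping the fit in an object rather than printing it immediately means the same object can feed abline(), coef() or summary() later without refitting.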
The complete sequence of commands can be saved into a script file, e.g.,
electrolyte.R, and can be executed at any later time using the source() function:
> source("electrolyte.R")
2.4. Case 2: Nonlinear dependence
Not all dependencies are linear. Sometimes, they can be linearized by an appropriate
transformation (e.g., logarithmic), which reduces the task of finding the optimal fitted
curve back to the linear case. With a computer at hand, however, this is not necessary.
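To illustrate the difference between the two routes, the sketch below fits the same synthetic data once through a logarithmic linearization with lm() and once directly with nls(). It assumes a single exponential decay purely for illustration; this is not necessarily the model appropriate to the circuit discussed in this section, and the data, the noise level, the starting values and all variable names are hypothetical.

```r
# Synthetic data (invented): a single exponential decay with a little noise.
set.seed(1)
time.s  <- seq(0, 500, by = 50)
voltage <- 12 * exp(-time.s / 150) + rnorm(length(time.s), sd = 0.05)

# Route 1: linearise with a logarithm and fit a straight line.
lin.fit <- lm(log(voltage) ~ time.s)
U0.lin  <- exp(coef(lin.fit)[["(Intercept)"]])
tau.lin <- -1 / coef(lin.fit)[["time.s"]]

# Route 2: fit the nonlinear model directly by least squares.
# The start values are rough guesses that nls() refines iteratively.
nl.fit <- nls(voltage ~ U0 * exp(-time.s / tau),
              start = list(U0 = 10, tau = 100))

# Compare the two sets of estimates.
print(c(U0 = U0.lin, tau = tau.lin))
print(coef(nl.fit))
```

The direct fit weights all points equally in the original units, whereas the logarithmic transformation inflates the influence of the noisy small-voltage points; nls() does, however, need reasonable starting values.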
In this example, we examine the time dependence of voltage in a circuit with two
capacitors (figure 2). The student first charges the capacitor C1 and then monitors
the voltage as C1 discharges. The data, saved in CSV format:
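[Figure 2: circuit diagram with two capacitors; surviving labels: R2, C2.]
[Figure: voltage U [V] plotted against time t [s].]
[Figure: histogram; horizontal axis: decays / 10 s, vertical axis: frequency.]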
introduction course has been recommended even in the case of graduate students of
computational biology [8]. The best synergy might be achieved when students take an
introductory course in statistics before the physics laboratory course or simultaneously
with it.
In many ways, the position of R in the area of data analysis and scientific graphing
can be compared to the position TeX holds in typesetting texts with mathematical
content. Both are free software packages, maintained by a tightly knit network
of users/developers rather than a corporation. Both were designed as complete
programming languages, which enabled the community to extend their usefulness with
numerous add-on packages. Both, admittedly, have a relatively steep learning curve.
Both found their use in their specific niches, which received a certain degree of neglect
by the mainstream software industry, and seem fairly entrenched there.
4. Conclusions
The extent of utilization that R has recently experienced in various scientific disciplines
signifies it is more than a marginal phenomenon. In this paper, we have demonstrated
that R can be used productively for data analysis and graphing in an introductory
physics laboratory, and have illustrated its use on a few experiments taken from an
actual laboratory course. The examples include a linear dependence, a nonlinear
dependence, and a histogram. The positive and negative aspects of R were discussed
against three options often used for data analysis and graphing: manual graphing using
grid paper, general purpose spreadsheet software, and specialized scientific graphing
software.
Acknowledgments
This work has been supported by the Slovenian Research Agency through grant J3-2268.
References
[1] Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. J. Comput.
Graph. Stat., 5:299–314, 1996.
[2] R Development Core Team. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria, 2008. ISBN 3-900051-07-0.
[3] R. A. Becker and J. M. Chambers. S: An Interactive Environment for Data Analysis and
Graphics. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA, 1984.
[4] R. A. Becker, J. M. Chambers, and A. R. Wilks. The New S Language: A Programming
Environment for Data Analysis and Graphics. Wadsworth & Brooks/Cole, Pacific Grove,
CA, USA, 1988.
[5] John M. Chambers. Programming with Data: A guide to the S language. Springer, New York,
NY, USA, 1998.
[6] Nicholas J. Horton, Elizabeth R. Brown, and Linjuan Qian. Use of R as a toolbox for
mathematical statistics exploration. Am. Stat., 58:343–357, 2004.
[7] Jeff Racine and Rob Hyndman. Using R to teach econometrics. J. Appl. Econ., 17:175–189,
2002.
[8] Stephen J. Eglen. A quick guide to teaching R programming to computational biology students.
PLOS Comput. Biol., 5:e1000482, 2009.
[9] Bojan Božič, Jure Derganc, Gregor Gomišček, Vera Kralj-Iglič, Janja Majhenc, Primož Peterlin,
Saša Svetina, and Boštjan Žekš. Vaje iz biofizike. Inštitut za biofiziko MF, 6th edition, 2003.
In Slovenian.
[10] Paul M. Murrell and Ross Ihaka. An approach to providing mathematical annotation in plots.
J. Comput. Graph. Stat., 9:582–599, 2000.
[11] John M. Chambers. Linear models. In John M. Chambers and Trevor J. Hastie, editors,
Statistical Models in S, chapter 4. Chapman & Hall/CRC, Boca Raton, FL, USA, 1991.
[12] Lillian C. McDermott, Mark L. Rosenquist, and Emily H. van Zee. Student difficulties in
connecting graphs and physics: Examples from kinematics. Am. J. Phys., 55:503–513, 1987.
[13] Robert J. Beichner. Testing student interpretation of kinematics graphs. Am. J. Phys.,
62:750–762, 1994.
[14] Vida Kariž Merhar, Gorazd Planinšič, and Mojca Čepič. Sketching graphs – an efficient way of
probing students' conceptions. Eur. J. Phys., 30:163–175, 2009.
[15] Roy Barton. Why do we ask pupils to plot graphs? Phys. Educ., 33:366–367, 1998.
[16] R. Dory. Spreadsheets for physics. Comput. Phys., 2:70–74, 1988.
[17] Linda Webb. Spreadsheets in physics teaching. Phys. Educ., 28:77–82, 1993.
[18] B. A. Cooke. Some ideas for using spreadsheets in physics. Phys. Educ., 32(2):80–87, 1997.
[19] Rick Guglielmino. Using spreadsheets in an introductory physics lab. Phys. Teach., 27:175–178,
1989.
[20] M. Krieger and J. Stith. Spreadsheets in the physics laboratory. Phys. Teach., 28:378–384,
1990.
[21] L. Lévesque. Simple smoothing technique to reduce data scattering in physics experiments.
Eur. J. Phys., 29:155–162, 2008.
[22] Gaj Vidmar. Statistically sound distribution plots in Excel. Metodol. Zv., 4(1):83–98, 2007.
[23] Yu-Sung Su. It's easy to produce chartjunk using Microsoft Excel 2007 but hard to make good
graphs. Comput. Stat. Data Anal., 52:4594–4601, 2008.
[24] Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire,
CT, USA, 2nd edition, 2001.
[25] John W. Tukey. Exploratory Data Analysis. Addison-Wesley, 1977.
[26] John M. Chambers, William S. Cleveland, Beat Kleiner, and Paul A. Tukey. Graphical Methods
for Data Analysis. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA, 1983.
[27] William S. Cleveland. Visualizing Data. Hobart Press, Summit, NJ, USA, 1993.
[28] William S. Cleveland. The Elements of Graphing Data. Hobart Press, Summit, NJ, USA, 1994.
[29] Leland Wilkinson. The Grammar of Graphics. Springer-Verlag, New York, NY, USA, 2nd
edition, 2005.
[30] Matthias Schwab, Martin Karrenbach, and Jon Claerbout. Making scientific computations
reproducible. Comput. Sci. Eng., 2(6):61–67, 2000.
[31] Friedrich Leisch and Anthony J. Rossini. Reproducible statistical research. Chance, 16(2):46–50,
2003.