Introduction To KS

This document outlines a course on computational statistics. The course covers topics such as different statistical software like SPSS, R, and Excel; data preparation, management, and visualization; generating statistical distributions; simple linear regression and correlation; basic R programming; resampling methods; statistical inference including estimation and hypothesis testing of means, proportions, and variances; analysis of variance; and introduction to design of experiments. Students will complete a group project and presentation. Reference materials and websites with additional resources are also listed.

Computational Statistics

Setia Pramana
2015

Course Outline
Introduction
Different Statistical Software
Data Preparation, Management, Manipulation, and Summarization with:
SPSS
R (R Commander)
MS Excel
Data Tabulation and Visualization
Generating Different Statistical Distributions (with Rcmdr)
Simple Linear Regression and Correlation
Basic R Programming
Developing a Simple Graphical User Interface in R
Resampling Methods
Statistical Inference (point and interval estimation)
Hypothesis Testing: one- and two-sample t-tests (tests for mean difference, proportion, and variance)
Analysis of Variance (ANOVA): one-way and two-way ANOVA
Introduction to Design of Experiments
Final Project

Course Workload

20% theory, 80% practice
Group project (5 students)
Presentation every week
R code will be provided
Slides are available at:
http://www.slideshare.net/hafidztio/

Reference Books
John Maindonald and W. John Braun. Data Analysis and Graphics Using R: An Example-Based Approach. 3rd edition. Cambridge University Press: Cambridge. 2010.
John Fox. The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software, Volume 14, Issue 9, September 2005.
Chris Beeley. Web Application Development with R Using Shiny. Packt Publishing: Birmingham. 2013.
SPSS Statistics Base User's Guide 17.0. Polar Engineering and Consulting: Chicago. 2007.
Jurusan Komputasi Statistik (Department of Computational Statistics). Modul Mata Kuliah Komputasi Statistik (Computational Statistics Course Module). 2014.
G. Jay Kerns. Introduction to Probability and Statistics Using R. E-book. GNU Free Documentation License. 2010.
Geof H. Givens and Jennifer A. Hoeting. Computational Statistics. 2nd edition. John Wiley and Sons: New Jersey. 2013.
Jochen Voss. Statistical Computing. E-book. 2011.
Brent B. Welch, Ken Jones, and Jeffrey Hobbs. Practical Programming in Tcl and Tk. 4th edition. Prentice Hall PTR: New Jersey. 2003.

Other Materials
https://sites.google.com/site/biostatinfocore/home/rworkshop
https://sites.google.com/site/biostatinfocore/biostatistics-workshop

Introduction

Statistics?

What is Statistics?
Statistics is the science that deals with the collection, classification, and tabulation of numerical facts as the basis for the explanation, description, and comparison of phenomena.

Observations on the Bills of Mortality (1662)
Recorded plague-related deaths for 100 years

What is Statistics?
Exploring data: using graphical and numerical techniques to study patterns and departures from patterns (in order to interpret data).
Sampling and experimentation: clarifying the question and deciding on methods of collection and analysis to produce valid information.
Anticipating patterns: exploring random phenomena using probability and simulation. Probability is our tool for anticipating distributions.
Statistical inference: estimating population parameters and testing hypotheses.

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H. G. Wells)

Areas of Statistics
Two areas of statistics:
Descriptive Statistics: collection, presentation,
and description of sample data.
Inferential Statistics: making decisions and
drawing conclusions about populations.

Descriptive Statistics

What is your conclusion?

The fatality rate is:
40% in the group of drivers who did not wear seat belts
20% in the group of drivers who did wear seat belts

Seat belts appear to save lives.


Inferential Statistics
Are results applicable to the population of all drivers?
(generalization)
Does wearing seat belts save lives? (assess strength of
evidence)
Is the fatality rate of those not wearing seat belts higher than
the fatality rate of those wearing seat belts? (comparison)
How many lives can be saved by wearing seat belts?
(prediction)
Do other variables influence the conclusion? For example:
the age of driver, alcohol use, type of car, speed at impact
(ask more questions)
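
The comparison question above can be sketched as a two-proportion z-test. This is only an illustration, written in Python rather than R so that it is self-contained. The slides give the fatality rates (40% and 20%) but not the group sizes, so the counts below (50 drivers per group) are hypothetical, and the function name `two_proportion_z` is made up for this sketch.

```python
import math


def two_proportion_z(deaths_a, n_a, deaths_b, n_b):
    """One-sided z-test for H1: rate in group A > rate in group B."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical counts chosen to reproduce the 40% and 20% rates
p_no_belt, p_belt, z, p_value = two_proportion_z(20, 50, 10, 50)
print(f"no belt: {p_no_belt:.0%}, belt: {p_belt:.0%}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

With other group sizes the same two rates could give much weaker or much stronger evidence, which is why the descriptive summary alone does not settle the inferential questions.
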

Statistics and Technology

Electronic technology has had a tremendous effect on the field of statistics.
Many statistical techniques are repetitive in nature, and computers and calculators are good at repetitive work.
Many statistical software packages are available: R, MINITAB, SYSTAT, STATA, SAS, Statgraphics, SPSS, and MS Excel, as well as calculators.
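
As one illustration of a repetitive technique, a bootstrap (one of the resampling methods listed in the course outline) repeats the same calculation thousands of times. The sketch below is in Python rather than R, uses made-up sample values, and the function name `bootstrap_mean_ci` is hypothetical. It computes a simple percentile interval for the mean.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, for illustration only
sample = [4.1, 5.3, 6.0, 5.8, 4.9, 7.2, 5.5, 6.4, 5.1, 6.8]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"approximate 95% interval = ({low:.2f}, {high:.2f})")
```

Doing 2000 resamples by hand would be impractical. On a computer the loop runs in a fraction of a second.
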

Available Statistical Packages

Proprietary
Excel
SPSS
MINITAB
SAS
Stata
Statistica
Many more

Free Software
LibreOffice Calc
R
CSPro
WinBUGS
Epi Info
Many more

Microsoft Excel

Which one do you use?

Why?

Statistical Software Used

R is HOT!
http://r4stats.com/articles/popularity/

What is R?
A language and environment for statistical computing and graphics.
An integrated suite of software facilities for data manipulation, calculation, and graphical display.
First appeared in 1996, developed by Prof. Ross Ihaka and Robert Gentleman of the University of Auckland, NZ.
GNU software, so it is free. Similar to the S language.
Open source, maintained and developed by a community of developers.
Runs on Windows, Unix, and Mac OS.

R includes
An effective data handling and storage facility
A suite of operators for calculations on arrays, in particular matrices
A large, coherent, integrated collection of intermediate tools for data analysis
Graphical facilities for data analysis and display, either on-screen or on hardcopy
A well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities
http://www.r-project.org/

Why R?
It is not only statistical software but also a language.
5000 add-on packages: lots of pre-prepared packages (http://cran.r-project.org/web/packages/)
With many applications: http://cran.r-project.org/web/views/, http://www.revolutionanalytics.com/rlanguage-features-applications-andextensions#thirdparty
Access to powerful, cutting-edge analytics.
Flexible (complex or standard statistical practices, Bayesian modelling, GIS map building, building interactive web applications, building interactive tests, etc.)
We can make our own package and publish it.
Great graphics and data visualization.
Can be used on high-performance computing clusters.
Well supported by the R community (http://www.inside-r.org/rresources-web)
Can be integrated with other languages (C/C++, Java).
R can interact with many data sources and other statistical packages (SAS, Stata, SPSS, and Minitab).
For high-performance computing tasks, R can use multiple cores, either on a single machine or across a network.
And many more.

But...
R has no warranty.
Command-line interface: difficult for some users.
Users must learn a new way of thinking about data and the data analysis sequence.
That's all, I guess.

Companies using R in 2013

The New York Times routinely uses R for interactive and print data visualization.
Google has more than 500 R users.
The FDA supports the use of R for clinical trials of new drugs.
The National Weather Service uses R to predict the extent of flooding events.
Zillow uses R to model housing prices.
The Consumer Financial Protection Bureau uses R and other open source tools.
Twitter uses R for data science applications on the Twitter database.
FourSquare uses R to develop its recommendation engine.
Facebook uses R to model all sorts of user behaviour.
Source: Revolution Analytics

R Library/packages

Diagram: R base packages together with add-on packages such as IsoGene, nlme, lme4, foreign, zoo, survival, reshape2, and ggplot2

My R Packages

IsoGene
IsoGeneGUI
nea
neaGUI
biclustGUI
OCRME
More detail: http://setiopramono.wordpress.com/rprogramming/

R for Cutting-Edge Technologies

R Graphics and Visualization

R provides a wide range of graphics and visualizations:
Basic plots: bar plots, basic 3D plots, heatmaps, etc.
Geographic maps
Projection maps
Social network graphs
Animated graphics and movies (animation)
Motion charts (googleVis)
Interactive graphics (rggobi)
Image formats: BMP, JPEG, PDF, PNG, etc.
And many more

R Graphics

RCircos
https://gjabel.wordpress.com/
A map of worldwide email traffic
Facebook connections between city centers around the world

R Graphical User Interfaces

R uses a command-line interface, which advanced users prefer: it allows direct control, is more accurate and flexible, and makes the analysis reproducible.
It requires good knowledge of the language, so it is difficult for beginners or less frequent users.
R provides tools for building GUIs (RGUI).

R GUI Projects
Integrated development environments (IDEs) and script editors, which aim to provide feature-rich environments for editing R scripts and code: RStudio (www.rstudio.com) and Architect (www.openanalytics.eu)
Web-based applications: Rweb (Banfield, 1999), R.Net (www.u.arizona.edu/~ryckman/Net.php), and gWidgetsWWW (Verzani, 2012)
Python: OpenMeta-Analyst (Wallace et al., 2012)
Java: JGR (Java GUI for R), Deducer (Fellows, 2012), and Glotaran (Snellenburg, 2012)
PHP: R-php (http://dssm.unipa.it/R-php/)
Other extensions connect R to graphical toolboxes for developing menus and dialog boxes: tcltk, Gtk.

RStudio

Download from rstudio.com
A powerful IDE (Integrated Development Environment) for R.

RGUI Developed using tcltk

RGUI: R Commander
Rcommander.com
Helpful for R beginners
Installed from within R

RGUI using C#: Wires

Developed by STIS students
For spatial data analysis
Still under development

RGUI: Web Based App

WebBUGS
Conducting Bayesian statistical analysis online
Combines OpenBUGS and R
www.webbugs.psychstat.org

RGUI: Shiny
A new package from RStudio for building interactive web applications with R.
Really easy!
Build useful web applications with only a few lines of code; no JavaScript required.
Self-learning: http://shiny.rstudio.com/ and http://www.showmeshiny.com/

RGUI using Shiny: FAST

Figure 5. FAST main page

Dynamic Report Generation

Sweave
knitr
markdown

Want to Learn R? Need Help?

Lots of self-learning resources:
http://www.rdatamining.com/resources/onlinedocs

Blogs:
Software   # Blogs   Blog source
R          550       R-Bloggers.com
Python     60        SciPy.org
SAS        40        PROC-X.com, sasCommunity.org Planet
Stata      11        Stata-Bloggers.com

User groups: Stockholm R User Group, etc.
Indonesia/Jakarta?
https://sites.google.com/site/biostatinfocore/introduction-to-r

Need Help?

Figure: Number of R- or SAS-related posts to Stack Overflow by week.