From Lisp to Clojure/Incanter and R
             An Introduction




             Shane M. Conway
              January 7, 2010
“Back to the Future”
• The goal of this presentation is to draw some rough
  comparisons between Incanter and R.
• There has been a substantial amount of discussion about
  the “future of R”.
• Ross Ihaka, a co-creator of R, has been especially vocal about
  his concerns regarding R’s performance (see his homepage for
  more detail). In “Back to the Future: Lisp as a Base for a Statistical
  Computing System” (August 2008), Ihaka and Duncan Temple
  Lang (of UC Davis and Omegahat) state:
      “The application of cutting-edge statistical methodology is limited by the capabilities of
      the systems in which it is implemented. In particular, the limitations of R mean that
      applications developed there do not scale to the larger problems of interest in practice.
      We identify some of the limitations of the computational model of the R language that
      reduces its effectiveness for dealing with large data efficiently in the modern era.
        We propose developing an R-like language on top of a Lisp-based engine for statistical
      computing that provides a paradigm for modern challenges and which leverages the
      work of a wider community.”
Lisp and Fortran
• Modern programming languages began primarily with two
  languages that had different philosophies and goals: Fortran
  and Lisp. They came from different sides of academia:
   – Physicists and engineers wanted numeric computations to be run
     in the most efficient way to solve concrete problems
   – Mathematicians were interested in algorithmic research for solving
     more abstract problems
• Both R and Clojure draw on the Lisp model of “functional
  programming”, in which functions are first-class values that can
  be passed around like any other object.
• The name Lisp comes from "list processing," and it is
  sometimes said that everything in Lisp is a list.
Timeline
• Looking at the history of programming languages is complex,
  as new languages tend to be informed by all prior
  developments.

• 1950/60s: Fortran (54), Lisp (58), Cobol (59), APL (62), Basic
  (64)
• 1970s: Pascal (70), C (72), S (75), SQL (78)
• 1980s: C++ (83), Erlang (86), Perl (87)
• 1990s: Haskell (90), Python (91), Java (91), R (93), Ruby (93),
  Common Lisp (94), PHP (95)
• 2000s: C# (00), Scala (03), Groovy (04), F# (05), Clojure (07),
  Go (09)
R
• S began as a project at Bell Laboratories in 1975, involving
  John Chambers, Rick Becker, Doug Dunn, Paul Tukey, and
  Graham Wilkinson.
• R is a “Scheme-like” language. R is written primarily in C and
  Fortran, although it is being extended through other
  languages (e.g. Java).
JVM
• The Java Virtual Machine (JVM) is very similar in theory to the
  Common Language Runtime (CLR) for the .NET framework: it
  provides a virtual machine for the execution of programs.
• It offers memory and other resource management (garbage
  collection), JIT compilation, and a type system.
• JVM was designed for Java, but it operates on Java bytecode
  so it can be used by other languages such as Jython, JRuby,
  Groovy, Scala, and Clojure.
Clojure
• Clojure is a Lisp language that runs on the JVM. It was
  released in 2007 by Rich Hickey, who continues to be the
  primary contributor.
   – “Clojure (pronounced like closure) is a modern dialect of the Lisp
     programming language. It is a general-purpose language supporting
     interactive development that encourages a functional programming
     style, and simplifies multithreaded programming. Clojure runs on the
     Java Virtual Machine and the Common Language Runtime. Clojure
     honors the code-as-data philosophy and has a sophisticated Lisp
     macro system.”
• Clojure can be used interactively (REPL) or compiled and
  deployed as an executable. REPL stands for “read-eval-print
  loop”.
Incanter
• Incanter is a Clojure-based, R-like platform for statistical
  computing and graphics, created by David Edgar Liebke.
   – Incanter “leverages both the power of Clojure, a dynamically-typed,
     functional programming language, and the rich set of libraries
     available on the JVM for accessing, processing, and visualizing data. At
     its core are the Parallel Colt numerics library, a multithreaded version
     of Colt, the JFreeChart charting library, the Processing visualization
     library, as well as several other Java and Clojure libraries.”
• http://www.jstatsoft.org/v13 “Lisp-Stat, Past, Present and
  Future” in Journal of Statistical Software Vol. 13, Dec. 2004
• Why Incanter? The primary reason is easy access to Java.
Comparison
Similarities:
• They can both be used interactively (for Clojure: REPL)
• They are both functional, based on Scheme
• Both languages have type inference
• “Code as data”

Differences:
• R requires more effort to integrate with Java
• R influenced more by C and Fortran
• Clojure can be compiled
• Clojure is not OO, while R has S3, S4, and R.oo
• Clojure has many more data types
• R is more of a DSL
Tradeoffs
Advantages:
• Clojure runs on the JVM, so it can reference any Java
  library, and can be called by other languages on the JVM
• Clojure natively deals with concurrency
• Vectors/Lists/etc. in Clojure allow you to add/remove

Disadvantages:
• Incanter is very immature in comparison; there is no
  equivalent to CRAN
• Clojure has 339 questions on stackoverflow compared
  to 562 for R
• Clojure/Incanter are each primarily developed by 1
  person; no Core team
Using Clojure/Incanter
• Clojure is a set of jars, so it can be used from the command
  line by calling java.
• To use Incanter, just load the desired library into a Clojure
  session:
   – (use '(incanter core stats charts))
• Many IDE options:
   – I use Eclipse for all my development (R: StatET, Python: Pydev, C/C++:
     CDT). For Clojure, see http://code.google.com/p/counterclockwise/ and
     http://www.ibm.com/developerworks/opensource/library/os-eclipse-clojure/index.html
   – Using Emacs: http://incanter-blog.org/2009/12/20/getting-started/
Hello World
• R takes syntax from both Lisp and C.

  // Java
  public void hello(String name) {
        System.out.println("Hello, " + name);
  }

  ; Clojure
  (defn hello [name]
         (println "Hello," name))

  #R
  hello <- function(name) {
         print(paste("Hello,", name))
  }
Basic Syntax
                                   Clojure                    R
Statements in R use more of a      (+ 1 2)     ; => 3         `+`(1, 2)  # => 3
C-like syntax                      (range 1 4) ; => (1 2 3)   seq(1, 3)  # => 1 2 3
Getting help                       (doc functionname)         help(functionname)
Checking an object type            (type objectname)          class(objectname)
Timing performance                 (time functioncall)        system.time(functioncall)
Browsing the workspace             (ns-publics 'user)         ls()
Navigating the workspace           (all-ns)                   search()
Collections
                                    Clojure                                             R
Lists                               (def stooges (list "Moe" "Larry" "Curly" "Shemp"))  stooges <- list("Moe", "Larry", "Curly", "Shemp")
Vectors                             (def stooges ["Moe" "Larry" "Curly" "Shemp"])       stooges <- c("Moe", "Larry", "Curly", "Shemp")

Maps                                (def popsicle-map                                   popsicle.map <-
                                      {:red :cherry, :green :apple, :purple :grape})      list("red"="cherry",
                                    (def popsicle-map                                          "green"="apple",
                                      (sorted-map :red :cherry, :green :apple,                 "purple"="grape")
                                                  :purple :grape))
Matrix (does not exist as part of   (def A (matrix [[1 2 3] [4 5 6] [7 8 9]]))          A <- matrix(1:9, nrow=3, byrow=TRUE)
Clojure)                            (def A2 (matrix [1 2 3 4 5 6 7 8 9] 3))
Count                               (count stooges)                                     length(stooges)
Filtering                           (filter #(> (count %) 3) stooges)                   stooges[nchar(stooges) > 3]
                                    (some #(= % "Moe") stooges)                         stooges[stooges == "Moe"]
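For comparison with the Java baseline used on the Hello World slide, the Count and Filtering rows might look roughly like this in plain Java. This is only a sketch: the class name StoogesSketch and its method names are made up for illustration, and it assumes Java 8 or later for streams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StoogesSketch {
    // The same four names used for the stooges collection in the table.
    static final List<String> STOOGES = Arrays.asList("Moe", "Larry", "Curly", "Shemp");

    // Roughly (filter #(> (count %) 3) stooges): keep names longer than minLength.
    static List<String> longerThan(List<String> names, int minLength) {
        return names.stream()
                    .filter(name -> name.length() > minLength)
                    .collect(Collectors.toList());
    }

    // Roughly (some #(= % "Moe") stooges): is the wanted name present at all?
    static boolean contains(List<String> names, String wanted) {
        return names.stream().anyMatch(wanted::equals);
    }

    public static void main(String[] args) {
        System.out.println(STOOGES.size());            // 4
        System.out.println(longerThan(STOOGES, 3));    // [Larry, Curly, Shemp]
        System.out.println(contains(STOOGES, "Moe"));  // true
    }
}
```

The Java version needs an explicit stream pipeline where Clojure and R each use a single expression, which is part of the appeal of both languages for interactive work.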
Matrices
                                    Clojure (Incanter)                            R
Matrix (does not exist as part of   (def A (matrix [[1 2 3] [4 5 6] [7 8 9]]))    A <- matrix(1:9, nrow=3, byrow=TRUE)
Clojure)                            (def A2 (matrix [1 2 3 4 5 6 7 8 9] 3))
Dimensions                          (dim A)                                       dim(A)
                                    (ncol A)                                      ncol(A)
                                    (nrow A)                                      nrow(A)
Filtering                           (use 'incanter.datasets)
                                    (def iris (to-matrix (get-dataset :iris)))
                                    (sel iris 0 0)                                iris[1,1]
                                    (sel iris :rows 0 :cols 0)                    iris[1,1]
                                    (sel iris :except-cols 1)                     iris[,-2]
Statistics
              Clojure (Incanter)              R
Quantile      (quantile (range 10))           quantile(1:10)
Sampling      (sample (range 100) :size 10)   sample(1:100, 10)
Mean          (mean (range 10))               mean(1:10)
Skewness      (skewness (range 10))           moments::skewness(rnorm(100))
Regression    (linear-model y x)              lm(y ~ x)
Correlation   (correlation x y)               cor(x, y)
              (correlation matrix)            cor(x)
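To make the Mean and Correlation rows more concrete, here is a minimal Java sketch of what such functions compute. It assumes the Pearson product-moment definition, which is the default method of R's cor(). The class and method names are hypothetical and are not part of Incanter or R.

```java
final class PearsonSketch {
    // Arithmetic mean of the values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Pearson correlation: covariance of x and y divided by the product
    // of their standard deviations (the 1/n factors cancel out).
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};  // same values as (range 10)
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = 2 * x[i] + 1;                       // perfectly linear in x
        }
        System.out.println(mean(x));                   // 4.5
        System.out.println(correlation(x, y));         // approximately 1.0
    }
}
```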
Loops
• Several different ways to loop in Clojure:

    ;; Version 1
    (loop [i 1]
      (when (< i 20)
        (println i)
        (recur (+ 2 i))))

    ;; Version 2
    (dorun (for [i (range 1 20 2)]
             (println i)))

    ;; Version 3
    (doseq [i (range 1 20 2)]
      (println i))

• Some examples of the same sequence in R:

    for(i in seq(1, 20, 2)) print(i)

• R also makes heavy usage of the apply family of functions (Clojure
  also has an apply function):

    sapply(seq(1, 20, 2), print)

• R also has a while() function.
Java and Clojure
• Clojure interacts with Java seamlessly. A trivial example:

  (. javax.swing.JOptionPane (showMessageDialog nil "Hello World"))


• Or a slightly more advanced example:

   (defn fetch-xml [uri]
    (xml-zip
     (parse
      (org.xml.sax.InputSource.
       (java.io.StringReader.
        (slurp* (java.net.URI. (re-gsub #"\s+" "+" (str uri)))))))))
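The second example works because Clojure can construct java.io.StringReader and org.xml.sax.InputSource directly. As a rough point of comparison, the sketch below uses the same two constructors from plain Java, with the JDK's DOM parser standing in for Clojure's parse and xml-zip. The class and method names are hypothetical, and the XML comes from a literal string rather than a URI so that the sketch runs offline.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XmlFromString {
    // Replaces each run of whitespace with "+", like the regex step in fetch-xml.
    static String encodeSpaces(String uri) {
        return uri.replaceAll("\\s+", "+");
    }

    // Wraps the text in a StringReader and InputSource, then parses it into a DOM tree.
    static Document parse(String xml) {
        try {
            InputSource source = new InputSource(new StringReader(xml));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSpaces("three  little words"));    // three+little+words
        Document doc = parse("<stooges><stooge>Moe</stooge></stooges>");
        System.out.println(doc.getDocumentElement().getTagName());  // stooges
    }
}
```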
R and Java: rJava
• Calling Java code from R (and vice versa) can be done with the
  rJava package and JRI.
  #R
  library(rJava)
  .jinit()  # start the JVM; HelloJavaWorld must be on the class path
  helloJavaWorld <- function(){
    hjw <- .jnew("HelloJavaWorld") # create instance of class
    out <- .jcall(hjw, "S", "sayHello") # invoke sayHello method
    return(out)
  }

  // Java
  public class HelloJavaWorld {
    public String sayHello() {
      String result = new String("Hello Java World!");
      return result;
    }
    public static void main(String[] args) {
    }
  }
Java and R: JRI
• JRI allows you to pass R commands to an R console and get
  results back:
  import java.util.Enumeration;
  import org.rosuda.JRI.REXP;
  import org.rosuda.JRI.RVector;
  import org.rosuda.JRI.Rengine;
  ...

  // Rexecutor is not shown; the third argument must implement RMainLoopCallbacks
  Rengine re = new Rengine(args, false, new Rexecutor());
  REXP x;
  re.eval("data(iris)", false);
  System.out.println(x = re.eval("iris"));
  RVector v = x.asVector();
  if (v.getNames() != null) {
    System.out.println("has names:");
    for (Enumeration e = v.getNames().elements(); e.hasMoreElements();) {
      System.out.println(e.nextElement());
    }
  }
Performance
• Problem Number 1 from Project Euler: “Find the sum of all
  the multiples of 3 or 5 below 1000.” I changed this to a count
  over ten million integers instead:

  ; Clojure
  user=> (defn divisible-by-3-or-5? [num]
           (or (== (mod num 3) 0) (== (mod num 5) 0)))
  user=> (time (println (count (filter divisible-by-3-or-5? (range 10000000)))))
  4666667
  "Elapsed time: 29321.981146 msecs"
  nil

  #R
  divby3or5 <- function(n) {
    n[(n %% 3) == 0 | (n %% 5) == 0]
  }
  system.time(print(length(divby3or5(1:10000000))))
  [1] 4666667
     user  system elapsed
    12.21    0.22   12.70

• This is a trivial example, but R significantly outperformed
  Clojure on this simple operation.
• A more thorough benchmarking of Incanter is necessary.
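As a possible starting point for that benchmarking, the same count can be written as a plain Java loop to give a JVM baseline. This is a sketch only: the class and method names are hypothetical, no timing result is claimed, and a single timed call like this ignores JIT warm-up.

```java
final class MultiplesCount {
    // Counts the integers in [0, limit) that are divisible by 3 or by 5.
    static int countDivisibleBy3Or5(int limit) {
        int count = 0;
        for (int n = 0; n < limit; n++) {
            if (n % 3 == 0 || n % 5 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int count = countDivisibleBy3Or5(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count);  // 4666667, the same count as the Clojure and R runs
        System.out.println("Elapsed time: " + elapsedMs + " msecs");
    }
}
```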
Final Thoughts
• Clojure is a very promising Lisp-based programming language,
  and Incanter builds a statistical platform on top of it. Together
  they provide functional programming with seamless Java
  integration and native concurrency.
• R has a remarkable user community of dedicated scientists
  and mathematicians that continues to grow.
  Performance issues can be mitigated by using parallelization
  (e.g. MPI), and there are efforts to create compilers that
  promise 10x speed improvements.
• Incanter can be used in place of R for projects that use
  relatively basic statistics and that rely on Java
  (especially for something that is web-based).
Resources
Some useful resources for Clojure/Incanter
• http://clojure.org/
• http://incanter.org/
• http://java.ociweb.com/mark/clojure/article.html
• http://en.wikibooks.org/wiki/Clojure_Programming/

For R:
• http://r-project.org and http://cran.r-project.org
• http://www.stats.uwo.ca/faculty/murdoch/2864/Flourish.pdf

Lastly, http://rosettacode.org/ has good examples for both
   languages.

More Related Content

What's hot (20)

PDF
Nx tutorial basics
Deepakshankar S
 
KEY
R for Pirates. ESCCONF October 27, 2011
Mandi Walls
 
PDF
Compact and safely: static DSL on Kotlin
Dmitry Pranchuk
 
PDF
Hadoop + Clojure
elliando dias
 
PDF
Python 2.5 reference card (2009)
gekiaruj
 
PDF
Futures e abstração - QCon São Paulo 2015
Leonardo Borges
 
PDF
core.logic introduction
Norman Richards
 
PDF
Hw09 Hadoop + Clojure
Cloudera, Inc.
 
PDF
Rainer Grimm, “Functional Programming in C++11”
Platonov Sergey
 
PPTX
19. Java data structures algorithms and complexity
Intro C# Book
 
PPT
Oop lecture9 13
Shahriar Robbani
 
PDF
Clojure intro
Basav Nagur
 
PDF
Python Cheat Sheet
GlowTouch
 
PDF
Thinking Functionally In Ruby
Ross Lawley
 
PDF
The Ring programming language version 1.7 book - Part 39 of 196
Mahmoud Samir Fayed
 
PDF
Coscup2021-rust-toturial
Wayne Tsai
 
PDF
Coscup2021 - useful abstractions at rust and it's practical usage
Wayne Tsai
 
PDF
Swift for TensorFlow - CoreML Personalization
Jacopo Mangiavacchi
 
PDF
Why Haskell
Susan Potter
 
Nx tutorial basics
Deepakshankar S
 
R for Pirates. ESCCONF October 27, 2011
Mandi Walls
 
Compact and safely: static DSL on Kotlin
Dmitry Pranchuk
 
Hadoop + Clojure
elliando dias
 
Python 2.5 reference card (2009)
gekiaruj
 
Futures e abstração - QCon São Paulo 2015
Leonardo Borges
 
core.logic introduction
Norman Richards
 
Hw09 Hadoop + Clojure
Cloudera, Inc.
 
Rainer Grimm, “Functional Programming in C++11”
Platonov Sergey
 
19. Java data structures algorithms and complexity
Intro C# Book
 
Oop lecture9 13
Shahriar Robbani
 
Clojure intro
Basav Nagur
 
Python Cheat Sheet
GlowTouch
 
Thinking Functionally In Ruby
Ross Lawley
 
The Ring programming language version 1.7 book - Part 39 of 196
Mahmoud Samir Fayed
 
Coscup2021-rust-toturial
Wayne Tsai
 
Coscup2021 - useful abstractions at rust and it's practical usage
Wayne Tsai
 
Swift for TensorFlow - CoreML Personalization
Jacopo Mangiavacchi
 
Why Haskell
Susan Potter
 

Similar to From Lisp to Clojure/Incanter and RAn Introduction (20)

PDF
The Rise of Dynamic Languages
greenwop
 
PDF
An Analytics Toolkit Tour
Rory Winston
 
PDF
Clojure - An Introduction for Lisp Programmers
elliando dias
 
ODP
Clojure made simple - Lightning talk
John Stevenson
 
PDF
Clojure Interoperability
rik0
 
PDF
SimpleR: tips, tricks & tools
Rob Hyndman
 
PDF
Clojure made-simple - John Stevenson
JAX London
 
PDF
I know Java, why should I consider Clojure?
sbjug
 
PDF
Los Angeles R users group - Nov 17 2010 - Part 2
rusersla
 
PDF
Learning notes of r for python programmer (Temp1)
Chia-Chi Chang
 
PDF
Continuation Passing Style and Macros in Clojure - Jan 2012
Leonardo Borges
 
KEY
The Return of the Living Datalog
Mike Fogus
 
PPTX
R programming Language
SarthakBhargava7
 
PDF
Introduction to R software, by Leire ibaibarriaga
DTU - Technical University of Denmark
 
ZIP
Clojure: Functional Concurrency for the JVM (presented at Open Source Bridge)
Howard Lewis Ship
 
ODP
Clojure made really really simple
John Stevenson
 
PDF
Clojure values
Christophe Grand
 
PPT
R programming by ganesh kavhar
Savitribai Phule Pune University
 
PDF
Clojure: Functional Concurrency for the JVM (presented at OSCON)
Howard Lewis Ship
 
PPT
R Programming for Statistical Applications
drputtanr
 
The Rise of Dynamic Languages
greenwop
 
An Analytics Toolkit Tour
Rory Winston
 
Clojure - An Introduction for Lisp Programmers
elliando dias
 
Clojure made simple - Lightning talk
John Stevenson
 
Clojure Interoperability
rik0
 
SimpleR: tips, tricks & tools
Rob Hyndman
 
Clojure made-simple - John Stevenson
JAX London
 
I know Java, why should I consider Clojure?
sbjug
 
Los Angeles R users group - Nov 17 2010 - Part 2
rusersla
 
Learning notes of r for python programmer (Temp1)
Chia-Chi Chang
 
Continuation Passing Style and Macros in Clojure - Jan 2012
Leonardo Borges
 
The Return of the Living Datalog
Mike Fogus
 
R programming Language
SarthakBhargava7
 
Introduction to R software, by Leire ibaibarriaga
DTU - Technical University of Denmark
 
Clojure: Functional Concurrency for the JVM (presented at Open Source Bridge)
Howard Lewis Ship
 
Clojure made really really simple
John Stevenson
 
Clojure values
Christophe Grand
 
R programming by ganesh kavhar
Savitribai Phule Pune University
 
Clojure: Functional Concurrency for the JVM (presented at OSCON)
Howard Lewis Ship
 
R Programming for Statistical Applications
drputtanr
 
Ad

More from elliando dias (20)

PDF
Clojurescript slides
elliando dias
 
PDF
Why you should be excited about ClojureScript
elliando dias
 
PDF
Functional Programming with Immutable Data Structures
elliando dias
 
PPT
Nomenclatura e peças de container
elliando dias
 
PDF
Geometria Projetiva
elliando dias
 
PDF
Polyglot and Poly-paradigm Programming for Better Agility
elliando dias
 
PDF
Javascript Libraries
elliando dias
 
PDF
How to Make an Eight Bit Computer and Save the World!
elliando dias
 
PDF
Ragel talk
elliando dias
 
PDF
A Practical Guide to Connecting Hardware to the Web
elliando dias
 
PDF
Introdução ao Arduino
elliando dias
 
PDF
Minicurso arduino
elliando dias
 
PDF
Incanter Data Sorcery
elliando dias
 
PDF
Rango
elliando dias
 
PDF
Fab.in.a.box - Fab Academy: Machine Design
elliando dias
 
PDF
The Digital Revolution: Machines that makes
elliando dias
 
PDF
Hadoop - Simple. Scalable.
elliando dias
 
PDF
Hadoop and Hive Development at Facebook
elliando dias
 
PDF
Multi-core Parallelization in Clojure - a Case Study
elliando dias
 
PDF
FleetDB A Schema-Free Database in Clojure
elliando dias
 
Clojurescript slides
elliando dias
 
Why you should be excited about ClojureScript
elliando dias
 
Functional Programming with Immutable Data Structures
elliando dias
 
Nomenclatura e peças de container
elliando dias
 
Geometria Projetiva
elliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
elliando dias
 
Javascript Libraries
elliando dias
 
How to Make an Eight Bit Computer and Save the World!
elliando dias
 
Ragel talk
elliando dias
 
A Practical Guide to Connecting Hardware to the Web
elliando dias
 
Introdução ao Arduino
elliando dias
 
Minicurso arduino
elliando dias
 
Incanter Data Sorcery
elliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
elliando dias
 
The Digital Revolution: Machines that makes
elliando dias
 
Hadoop - Simple. Scalable.
elliando dias
 
Hadoop and Hive Development at Facebook
elliando dias
 
Multi-core Parallelization in Clojure - a Case Study
elliando dias
 
FleetDB A Schema-Free Database in Clojure
elliando dias
 
Ad

Recently uploaded (20)

PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
PDF
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
PDF
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PDF
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PDF
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
PPTX
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PPTX
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 

From Lisp to Clojure/Incanter and RAn Introduction

  • 1. From Lisp to Clojure/Incanter and R An Introduction Shane M. Conway January 7, 2010
  • 2. “Back to the Future” • The goal of this presentation is to draw some rough comparisons between Incanter and R. • There has been a not insubstantial amount of discussion over the “future of R”. • Ross Ihaka, a co-creater of R, has been especially vocal over his concerns of R’s performance (see his homepage for more detail). In “Back to the Future: Lisp as a Base for a Statistical Computing System” (August 2008) Ihaka and Duncan Temple Lang (of UC Davis and Omegahat) state: “The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implemented. In particular, the limitations of R mean that applications developed there do not scale to the larger problems of interest in practice. We identify some of the limitations of the computational model of the R language that reduces its effectiveness for dealing with large data efficiently in the modern era. We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community.”
  • 3. Lisp and Fortran • Modern programming languages began primarily with two languages that had different philosophies and goals: Fortran and Lisp. They came from different sides of academia: – Physicists and engineers wanted numeric computations to be run in the most efficient way to solve concrete problems – Mathematicians were interested in algorithmic research for solving more abstract problems • Both R and Clojure are based on the Lisp model of “functional programming” where everything is treated as an object. • The name Lisp comes from "list processing," and it is sometimes said that everything in Lisp is a list.
  • 4. Timeline • Looking at the history of programming languages is complex, as new languages tend to be informed by all prior developments. • 1950/60s: Fortran (54), Lisp (58), Cobol (59), APL (62), Basic (64) • 1970s: Pascal (70), C (72), S (75), SQL (78) • 1980s: C++ (83), Erlang (86), Perl (87) • 1990s: Haskell (90), Python (91), Java (91), R (93), Ruby (93), Common Lisp (94), PHP (95) • 2000s: C# (00), Scala (03), Groovy (04), F# (05), Clojure (07), Go (09)
  • 5. R • S began as a project at Bell Laboratories in 1975, involving John Chambers, Rick Becker, Doug Dunn, Paul Tukey, and Graham Wilkinson. • R is a “Scheme-like” language. R is written primarily in C and Fortran, although it is being extended through other languages (e.g. Java).
  • 6. JVM • The Java Virtual Machine (JVM) is very similar in theory to the Common Language Runtime (CLR) for the .Net framework: it provides a virtual machine for the execution of programs. • Offers memory and other resource management (garbage collection), JIT, a type system. • JVM was designed for Java, but it operates on Java bytecode so it can be used by other languages such as Jython, JRuby, Groovy, Scala, and Clojure.
  • 7. Clojure • Clojure is a Lisp language that runs on the JVM. It was released in 1997 by Rich Hickey, who continues to be the primary contributor. – “Clojure (pronounced like closure) is a modern dialect of the Lisp programming language. It is a general-purpose language supporting interactive development that encourages a functional programming style, and simplifies multithreaded programming. Clojure runs on the Java Virtual Machine and the Common Language Runtime. Clojure honors the code-as-data philosophy and has a sophisticated Lisp macro system.” • Clojure can be used interactively (REPL) or compiled and deployed as an executable. REPL stands for “read-eval-print loop”.
  • 8. Incanter • Incanter is a Clojure-based, R-like platform for statistical computing and graphics, created by David Edgar Liebke. – Incanter “leverages both the power of Clojure, a dynamically-typed, functional programming language, and the rich set of libraries available on the JVM for accessing, processing, and visualizing data. At its core are the Parallel Colt numerics library, a multithreaded version of Colt, the JFreeChart charting library, the Processing visualization library, as well as several other Java and Clojure libraries.” • http://www.jstatsoft.org/v13 “Lisp-Stat, Past, Present and Future” in Journal of Statistical Software Vol. 13, Dec. 2004 • Why Incanter? The primary reason is easy access to Java.
  • 9. Comparison Similarities: Differences: • They can both be used • R requires more effort to interactively (for Clojure: integrate with Java REPL) • R influenced more by C and • They are both functional, Fortran based on Scheme • Clojure can be compiled • Both languages have type • Clojure is not OO, while R inference has S3, S4, and r.oo • “Code as data” • Clojure has many more data types • R is more of a DSL
  • 10. Tradeoffs Advantages: Disadvantages • Clojure runs on the JVM, so • Incanter is very immature in it can reference any Java comparison; there is no library, and can be called by equivalent to CRAN other languages on the JVM • Clojure has 339 questions • Clojure natively deals with on stackoverflow compared concurrency to 562 for R • Vectors/Lists/etc. in Clojure • Clojure/Incanter are each allow you to add/remove primarily developed by 1 person; no Core team
  • 11. Using Clojure/Incanter • Clojure is a set of jars, so it can be used from the command line by calling java. • To use Incanter, just load the desired library into a Clojure session: – (use '(incanter core stats charts)) • Many IDE options: – I use Eclipse for all my development (R: StatET, Python: Pydev, C/C++: CDT: http://code.google.com/p/counterclockwise/ and http://www.ibm.com/developerworks/opensource/library/os-eclipse- clojure/index.html – Using Emacs: http://incanter-blog.org/2009/12/20/getting-started/
  • 12. Hello World • R takes syntax from both Lisp and C. // Java public void hello(String name) { System.out.println("Hello, " + name); } ; Clojure (defn hello [name] (println "Hello," name)) #R hello <- function(name) { print(paste("Hello,“, name)) }
  • 13. Basic Syntax Statements in R use more of a C- (+ 1 2) ; => 3 `+`(1,2) # => 3 like syntax (range 3) ; => (1 2 3) seq(1,3) # => (1 2 3) Getting help (doc functionname) help(functionname) Checking an object type (type objectname) class(objectname) Timing performance (time functioncall) System.time(functioncall) Browsing the workspace (ns-publics 'user) ls() Nagivating the workspace (all-ns) search()
  • 14. Collections Lists [def stooges ["Moe" "Larry" stooges <- c(“Moe”, “Larry”, "Curly" "Shemp"]] “Curly”, “Shemp”) Vectors (def stooges ["Moe" "Larry" stooges <- c(“Moe”, “Larry”, "Curly" "Shemp"]) “Curly”, “Shemp”) Maps (def popsicle-map popsicle.map <- {:red :cherry, :green :apple, list(“red”=“cherry”, :purple :grape}) “green”=“apply”, def popsicle-map “purple”=“grape”) (sorted-map :red :cherry, :green :apple, :purple :grape)) Matrix (does not exist as part of (def A (matrix [[1 2 3] [4 5 6] [7 8 A <- matrix(1:9, nrow=3) Clojure) 9]])) (def A2 (matrix [1 2 3 4 5 6 7 8 9] 3)) Count (count stooges) length(stooges) Filtering (filter #(> (count %) 3) stooges) stooges[nchar(stooges)==3] (some #(= % "Moe") stooges) stooges*stooges==“Moe”+
  • 15. Matrices Matrix (does not exist as part of (def A (matrix [[1 2 3] [4 5 6] [7 8 A <- matrix(1:9, nrow=3) Clojure) 9]])) (def A2 (matrix [1 2 3 4 5 6 7 8 9] 3)) Dimensions (dim A) dim(a) (ncol A) ncol(a) (nrow A) nrow(a) Filtering (use 'incanter.datasets) iris[1,1] (def iris (to-matrix (get-dataset Iris[,-1] :iris))) (sel iris 0 0) (sel iris :rows 0 :cols 0) (sel iris :except-cols 1)
  • 16. Statistics
    • Quantile
       – Clojure: (quantile (range 10))
       – R: quantile(1:10)
    • Sampling
       – Clojure: (sample (range 100) :size 10)
       – R: sample(1:100, 10)
    • Mean
       – Clojure: (mean (range 10))
       – R: mean(1:10)
    • Skewness
       – Clojure: (skewness (range 10))
       – R: moments::skewness(rnorm(100))
    • Regression
       – Clojure: (linear-model y x)
       – R: lm(y ~ x)
    • Correlation
       – Clojure: (correlation x y)
       – R: cor(x, y)
       – Clojure: (correlation matrix)
       – R: cor(x)
  • 17. Loops
    • Several different ways to loop in Clojure:
      ;; Version 1
      (loop [i 1]
        (when (< i 20)
          (println i)
          (recur (+ 2 i))))
      ;; Version 2
      (dorun (for [i (range 1 20 2)]
        (println i)))
      ;; Version 3
      (doseq [i (range 1 20 2)]
        (println i))
    • Some examples of the same sequence in R:
      for(i in seq(1, 20, 2)) print(i)
    • R also makes heavy use of the apply family of functions (Clojure also has an apply function):
      sapply(seq(1, 20, 2), print)
    • R also has a while() loop.
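The same sequence (odd numbers from 1 up to, but not including, 20) can also be written in Java, which may help when mapping the Clojure and R idioms above onto a more familiar loop. This is a sketch for comparison only and not from the original slides; `OddLoopSketch` and `oddsBelow` are hypothetical names, and the three-argument `IntStream.iterate` assumes Java 9 or later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OddLoopSketch {
    // Counted loop, closest to R's for(i in seq(1, 20, 2)) print(i).
    static void printWithForLoop() {
        for (int i = 1; i < 20; i += 2) {
            System.out.println(i);
        }
    }

    // Stream version, closest to (range 1 20 2): start at 1, step by 2, stop before limit.
    static List<Integer> oddsBelow(int limit) {
        return IntStream.iterate(1, i -> i < limit, i -> i + 2)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        printWithForLoop();
        oddsBelow(20).forEach(System.out::println);
    }
}
```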
  • 18. Java and Clojure
    • Clojure interacts with Java seamlessly. A trivial example:
      (. javax.swing.JOptionPane (showMessageDialog nil "Hello World"))
    • Or a slightly more advanced example:
      ;; assumes parse (clojure.xml), xml-zip (clojure.zip), and slurp* and re-gsub (clojure.contrib) are already referred
      (defn fetch-xml [uri]
        (xml-zip
          (parse
            (org.xml.sax.InputSource.
              (java.io.StringReader.
                (slurp* (java.net.URI. (re-gsub #"\s+" "+" (str uri)))))))))
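To make the interop in fetch-xml easier to follow, here is a rough Java rendering of the two Java-facing steps: replacing runs of whitespace with "+" and wrapping a string in a StringReader and InputSource before parsing. It is a sketch under assumptions, not the author's code: it parses an in-memory string instead of fetching a URI, it uses the JDK's DOM DocumentBuilder as a stand-in for clojure.xml's parser, and the names `InteropSketch`, `encodeSpaces`, and `rootTag` are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InteropSketch {
    // Java counterpart of (re-gsub #"\s+" "+" (str uri)):
    // every run of whitespace becomes a single "+".
    static String encodeSpaces(String uri) {
        return uri.replaceAll("\\s+", "+");
    }

    // Java counterpart of (parse (org.xml.sax.InputSource. (java.io.StringReader. text))),
    // here returning only the root element's tag name.
    static String rootTag(String xml) {
        try {
            InputSource source = new InputSource(new StringReader(xml));
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .parse(source);
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSpaces("http://example.com/search?q=hello   world"));
        System.out.println(rootTag("<greeting><to>World</to></greeting>"));
    }
}
```

Each `(ClassName. arg)` form in the Clojure version corresponds to a `new ClassName(arg)` call here, which is why the nesting reads inside-out in both languages.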
  • 19. R and Java: rJava
    • Calling Java code from R (and vice versa) can be done with the rJava package and JRI.
      # R
      library(rJava)
      .jinit()  # start the JVM; HelloJavaWorld must be on the class path
      helloJavaWorld <- function() {
        hjw <- .jnew("HelloJavaWorld")       # create instance of class
        out <- .jcall(hjw, "S", "sayHello")  # invoke sayHello method
        return(out)
      }
      // Java
      public class HelloJavaWorld {
          public String sayHello() {
              String result = new String("Hello Java World!");
              return result;
          }
          public static void main(String[] args) { }
      }
  • 20. Java and R: JRI
    • JRI allows you to pass R commands to an R console and get results back:
      import java.util.Enumeration;
      import org.rosuda.JRI.REXP;
      import org.rosuda.JRI.RVector;
      import org.rosuda.JRI.Rengine;
      ...
      Rengine re = new Rengine(args, false, new Rexecutor());
      REXP x;
      re.eval("data(iris)", false);
      System.out.println(x = re.eval("iris"));
      RVector v = x.asVector();
      if (v.getNames() != null) {
          System.out.println("has names:");
          for (Enumeration e = v.getNames().elements(); e.hasMoreElements();) {
              System.out.println(e.nextElement());
          }
      }
  • 21. Performance
    • Problem Number 1 from Project Euler: “Find the sum of all the multiples of 3 or 5 below 1000.” I changed this to a count instead, over ten million integers:
      ; Clojure
      user=> (defn divisible-by-3-or-5? [num]
               (or (== (mod num 3) 0) (== (mod num 5) 0)))
      user=> (time (println (count (filter divisible-by-3-or-5? (range 10000000)))))
      4666667
      "Elapsed time: 29321.981146 msecs"
      nil
      # R
      divby3or5 <- function(n) {
        n[(n %% 3) == 0 | (n %% 5) == 0]
      }
      system.time(print(length(divby3or5(1:10000000))))
      [1] 4666667
         user  system elapsed
        12.21    0.22   12.70
    • This is a trivial example, but R significantly outperformed Clojure on this simple operation (about 12.7 seconds elapsed versus about 29.3 seconds).
    • A more thorough benchmarking of Incanter is necessary.
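The count of 4666667 reported by both runs can be checked independently with inclusion-exclusion: multiples of 3, plus multiples of 5, minus multiples of 15. The Java sketch below is not part of the original benchmark and makes no timing claim; the class and method names are made up for illustration. It also shows why the two runs agree even though the ranges differ slightly: Clojure's (range 10000000) covers 0 to 9,999,999 (and 0 counts as a multiple), while R's 1:10000000 covers 1 to 10,000,000.

```java
class MultiplesCount {
    // Brute-force count of multiples of 3 or 5 in [from, toExclusive).
    static long countBrute(long from, long toExclusive) {
        long count = 0;
        for (long i = from; i < toExclusive; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
                count++;
            }
        }
        return count;
    }

    // Inclusion-exclusion count of multiples of 3 or 5 in [1, n].
    static long countClosedForm(long n) {
        return n / 3 + n / 5 - n / 15;
    }

    public static void main(String[] args) {
        // Clojure's range: 0 .. 9,999,999 (0 is divisible by both 3 and 5).
        System.out.println(countBrute(0, 10_000_000));
        System.out.println(countClosedForm(9_999_999) + 1);
        // R's range: 1 .. 10,000,000.
        System.out.println(countBrute(1, 10_000_001));
        System.out.println(countClosedForm(10_000_000));
    }
}
```

All four lines print 4666667: the Clojure range gains 0 but loses 10,000,000 (a multiple of 5) relative to the R range, so the totals coincide.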
  • 22. Final Thoughts
    • Clojure/Incanter is a very promising programming language based on Lisp. It provides functional programming with seamless Java integration and native concurrency.
    • R has a remarkable user community of dedicated scientists and mathematicians which is continuing to grow. Performance issues can be mitigated by using parallelization (e.g. MPI), and there are efforts to create compilers that promise 10x speed improvements.
    • Incanter can be used in place of R for projects that use relatively basic statistics and that rely on Java (especially for something that is web-based).
  • 23. Resources
    Some useful resources for Clojure/Incanter:
    • http://clojure.org/
    • http://incanter.org/
    • http://java.ociweb.com/mark/clojure/article.html
    • http://en.wikibooks.org/wiki/Clojure_Programming/
    For R:
    • http://r-project.org and http://cran.r-project.org
    • http://www.stats.uwo.ca/faculty/murdoch/2864/Flourish.pdf
    Lastly, http://rosettacode.org/ has good examples for both languages.