This document provides an overview of using Clojure for data science. It discusses why Clojure is suitable for data science due to its functional programming capabilities, performance on the JVM, and rich library ecosystem. It introduces core.matrix, a Clojure library that provides multi-dimensional array programming functionality through Clojure protocols. The document covers core.matrix concepts like array creation and manipulation, element-wise operations, broadcasting, and optional support for mutability. It also discusses core.matrix implementation details like the performance benefits of using Clojure protocols.
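The element-wise operations and broadcasting mentioned above can be illustrated with a small sketch. It is plain Python over nested lists, not core.matrix's API: the names `shape`, `broadcast_to`, and `emap` are hypothetical, and the rule assumed here is that a lower-dimensional argument is repeated along missing leading dimensions.

```python
import operator


def shape(a):
    """Return the shape of a nested list as a tuple; a scalar has shape ()."""
    if isinstance(a, list):
        return (len(a),) + (shape(a[0]) if a else ())
    return ()


def broadcast_to(a, target):
    """Expand `a` to `target` shape by repeating it along missing leading dimensions."""
    s = shape(a)
    if s == target:
        return a
    # Assumption: only trailing dimensions may already match; anything else is an error.
    if len(s) >= len(target) or target[len(target) - len(s):] != s:
        raise ValueError(f"cannot broadcast shape {s} to {target}")
    return [broadcast_to(a, target[1:]) for _ in range(target[0])]


def _zip_with(f, a, b):
    """Apply f to matching elements of two arrays that already share a shape."""
    if isinstance(a, list):
        return [_zip_with(f, x, y) for x, y in zip(a, b)]
    return f(a, b)


def emap(f, a, b):
    """Element-wise map: broadcast the lower-dimensional argument, then apply f."""
    sa, sb = shape(a), shape(b)
    target = sa if len(sa) >= len(sb) else sb
    return _zip_with(f, broadcast_to(a, target), broadcast_to(b, target))


m = [[1, 2, 3], [4, 5, 6]]
row_added = emap(operator.add, m, [10, 20, 30])  # vector broadcast over rows
doubled = emap(operator.mul, m, 2)               # scalar broadcast over every element
```

The sketch returns new lists and never modifies its inputs, which matches the immutable-by-default style the summary describes; optional mutability is left out.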
From Lisp to Clojure/Incanter and R: An Introduction | elliando dias
This document provides a comparison between the statistical computing languages R and Clojure/Incanter. It discusses the histories and philosophies behind Lisp, Fortran, R and Clojure. Key differences noted are that Clojure runs on the Java Virtual Machine, allowing it to leverage Java libraries, while R is primarily written in C and Fortran. Incanter is presented as a Clojure-based platform for statistical computing and graphics that is less mature than R but allows easier access to Java capabilities. Basic syntax comparisons are provided.
A talk given to the Bristol Clojurians on 21st April 2015.
The book (print and ebook) is available here: http://cljds.com/cljds-book
This document summarizes a presentation about machine learning. It begins with a definition of machine learning as giving computers the ability to learn without being explicitly programmed. It then provides examples of tasks that machine learning can perform, such as spam filtering and stock market prediction. The document notes that machine learning works to some degree but not perfectly. It introduces a company called Nuroko that is building a machine learning toolkit with certain desirable properties such as being general purpose, powerful, scalable, real-time, and pragmatic. The document explains why the company chose Clojure as its programming language and provides an overview of some key machine learning concepts and abstractions like vectors, coders, tasks, modules, and algorithms. It concludes
Procedural Content Generation with Clojure | Mike Anderson
This document provides an introduction to procedural content generation with Clojure. It defines procedural content generation as the programmatic generation of content using algorithms, which may incorporate random or pseudo-random processes. It gives some examples of content types like images, music, and game content that can be generated procedurally. The document then introduces Clojure as a functional programming language that runs on the JVM and has data types like keywords, symbols, and immutable collections. It describes the Clisk library for Clojure, which allows functional composition of images. The document provides examples of generating simple images and transforming them using techniques like scaling, offsetting, and warping with noise functions. It demonstrates live coding of images using Clisk.
The document provides an agenda for a two-day workshop on Clojure. Day one covers Clojure overviews and fundamentals including syntax, functions, flow control, and collections. Day two covers additional topics like testing, concurrency, polymorphism, performance, and tooling. The document also provides background on Clojure being a Lisp designed for functional programming and concurrency on the JVM.
Presentation given at the 2013 Clojure Conj on core.matrix, a library that brings multi-dimensional array and matrix programming capabilities to Clojure.
The document discusses how abstraction is central to programming and how Clojure is a good language for creating abstractions. It notes that Clojure provides primitive expressions, means of combination through functions, and means of abstraction through functions, records, multimethods and protocols, so that complex programs can be built from simple ideas.
These are the outline slides that I used for the Pune Clojure Course.
The slides may not be very useful on their own, but I have uploaded them for reference.
This document discusses various iteration techniques in Java including for loops, iterators, and enhanced for loops. It provides examples of iterating over lists, sets, maps, and arrays. It also summarizes common object methods like toString(), equals(), hashCode(), and finalize(). The finalize() method is called by the garbage collector before an object is destroyed to allow for cleanup.
This document provides a concise reference card summarizing key aspects of the Python 2.5 programming language, including variable types, basic syntax, object orientation, modules, exceptions, input/output, and the standard library. It covers topics like numbers, sequences, dictionaries, sets, functions, classes, imports, exceptions, files, and common library modules.
The document outlines topics covered in a NetworkX tutorial, including installation, basic classes, generating graphs, analyzing graphs, saving/loading graphs, and plotting graphs with Matplotlib. Specific sections cover local and cluster installation of NetworkX, adding nodes and edges to graphs along with attributes, basic graph properties like number of nodes/edges and neighbors, simple graph generators, random graph generators, and the algorithms package.
An example of using Kotlin language features to write a DSL for the Spark-Cassandra connector, with a comparison of Kotlin's DSL features against similar features in other JVM languages (Scala, Groovy).
The document discusses using Clojure for Hadoop programming. Clojure is a dynamic functional programming language that runs on the Java Virtual Machine. The document provides an overview of Clojure and how its features like immutability and concurrency make it well-suited for Hadoop. It then shows examples of implementing Hadoop MapReduce jobs using Clojure by defining mapper and reducer functions.
19. Java data structures, algorithms and complexity | Intro C# Book
In this chapter we compare the data structures covered so far by the performance (execution speed) of their basic operations (addition, search, deletion, etc.). We also give specific tips on which data structure to use in which situation.
The document discusses using Clojure for Hadoop programming. It introduces Clojure as a new Lisp dialect that runs on the Java Virtual Machine. It then covers Clojure's data types and collections. The remainder of the document demonstrates how to write mappers and reducers for Hadoop jobs using Clojure, presenting three different approaches to defining jobs.
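The mapper/reducer split described above can be sketched in a few lines. This is an in-memory illustration in Python rather than the Clojure code from the slides, and it does not use Hadoop's API: `mapper`, `reducer`, and `run_job` are hypothetical names, and the word-count task is an assumed example.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return (word, sum(counts))


def run_job(lines):
    """Stand-in for the framework: map every line, group by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


word_counts = run_job(["a b a", "B c"])
```

Because `mapper` and `reducer` are pure functions with no shared state, the grouping step in `run_job` is the only part a distributed framework would need to replace.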
This document summarizes core.logic, a relational logic programming library for Clojure. It provides examples of core.logic concepts like unification, conde, fresh, membero, distincto, everyg, lvar, finite domains, and using core.logic to solve logic puzzles like map coloring, rock paper scissors, cryptarithmetic, and sudoku. Core.logic allows defining relations and facts to constrain logic variables and find all solutions that satisfy the goals.
The document discusses functional programming concepts in Ruby. It begins by stating that functional programming and Enumerable methods can be useful in Ruby. It then provides examples of various Enumerable methods like zip, select, partition, map, and inject. It encourages thinking functionally by avoiding side effects, mutating values, and using functional parts of the standard library. The document concludes by suggesting learning a true functional language to further improve functional programming skills.
The document discusses programming with futures in Java and Scala. It introduces futures in Java 8 using CompletableFuture and shows how they allow composing asynchronous operations without blocking threads. It then discusses how streams and futures in Java 8 share similar composition concepts using thenApply and thenCompose. The talk moves on to introduce more abstract concepts from category theory - monads, foldables and monoids. It shows how these concepts can be implemented for futures and lists to provide generic sequencing and folding of asynchronous and synchronous operations in a precise way.
The document compares TypeScript and Rust by providing examples of common programming concepts like variables, functions, collections, and iterators. It shows how concepts are implemented similarly in both languages, though the syntax differs. Key points covered include declaring immutable and mutable variables, defining and calling functions, working with collections like arrays/vectors through methods like map and filter, and how iterators are implemented and consumed in each language.
Coscup2021 - useful abstractions at rust and it's practical usage | Wayne Tsai
This document provides a summary of a presentation in Chinese about useful abstractions and syntax in Rust. It begins with an introduction of the speaker and their background. The content covers why Rust is useful, collections and iterators in Rust, the Option and Result enums, and concludes with a discussion of how Rust is being used. Key points include:
- Rust provides memory safety and high performance through its borrowing system and compiler checks
- Collections like vectors can be iterated over and methods like map, filter and collect allow transforming and collecting values
- Option and Result are useful for handling errors and absent values, avoiding panics
- Fast fail validation can be done by chaining Results with and
Rainer Grimm, “Functional Programming in C++11” | Platonov Sergey
C++ is a multi-paradigm language, so the programmer can choose and combine structured, object-oriented, generic, and functional approaches. The functional side of C++ was expanded considerably by the C++11 standard: lambda functions, variadic templates, std::function, std::bind. (Talk language: English.)
The Logical Burrito - pattern matching, term rewriting and unification | Norman Richards
The document summarizes three related concepts: pattern matching, term rewriting, and unification. It provides examples of these concepts in languages like ML, Mathematica, Prolog, and Clojure. Unification allows terms to be matched and variables substituted so that terms become identical. Pattern matching is used for conditional dispatch. Term rewriting uses rule-based substitutions to reduce terms. Prolog demonstrates how unification works for logic programming.
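A minimal sketch of unification as described above, written in Python rather than any of the languages the slides use. The representation is an assumption made for illustration: logic variables are strings starting with `?`, compound terms are tuples, and everything else is an atom. The occurs check is omitted to keep the sketch short.

```python
def is_var(term):
    """Assumed convention: a logic variable is a string beginning with '?'."""
    return isinstance(term, str) and term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst=None):
    """Return a substitution making a and b identical, or None if none exists."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}  # bind without mutating the caller's substitution
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # two different atoms, or terms of different shape


bindings = unify(("point", "?x", 2), ("point", 1, "?y"))
clash = unify(("f", "?x", "?x"), ("f", 1, 2))
```

Pattern matching is the one-sided case of the same idea: if only one of the two terms may contain variables, `unify` behaves like a matcher that returns the bindings for that pattern.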
Monads, also known as Kleisli triples in Category Theory, are an (endo-)functor together with two natural transformations, which are surprisingly useful in pure languages like Haskell, but this talk will NOT reference monads. Ever. (Well, at least not in this talk.)
Instead what I intend to impress upon an audience of newcomers to Haskell is the wide array of freely available libraries most of which are liberally licensed open source software, intuitive package management, practical build tools, reasonable documentation (when you know how to read it and where to find it), interactive shell (or REPL), mature compiler, stable runtime, testing tools that will blow your mind away, and a small but collaborative and knowledgeable community of developers. Oh, and some special features of Haskell - the language - too!
Building a website in Haskell coming from Node.js | Nicolas Hery
This document summarizes Nicolas Hery's experience building a website in Haskell after coming from a Node.js background. It discusses choosing a web framework in Haskell, using types to document data, handling optional values, refactoring code, and deploying to Docker and Heroku. It also notes both benefits of Haskell like compiler-checked refactoring but also challenges like syntax and documentation.
The document discusses exporting models trained with S4TF to CoreML format in Swift.
It provides code to:
1. Generate Swift data structures from CoreML protobuf definitions to represent models
2. Export an S4TF model defined with layers, weights, and hyperparameters to the CoreML format
3. Compile, make predictions, and perform personalization/training using the exported CoreML model
The personalization process involves:
1. Generating training data
2. Preparing batch providers for input/output
3. Configuring and running a training task on the CoreML model
4. Saving the retrained model
The document suggests automating the export process by extending S
The Ring programming language version 1.7 book - Part 39 of 196 | Mahmoud Samir Fayed
The document provides documentation on various functions in the Ring programming language stdlib related to string manipulation, lists, stacks, queues, hashtables and other data types and classes. It includes the syntax and examples of using over 45 functions and methods, such as TrimLeft() and TrimRight() to remove spaces from strings, ListAllFiles() to get files in a folder, and classes for common data types like strings, lists, stacks and queues with methods like add(), remove(), sort() etc.
This document summarizes a presentation about the speaker's 3 years of experience using Clojure. Some key points include: the speaker wanted an operationally sane environment with good tooling and performance on the JVM; Clojure provided a productive, concise language with an excellent concurrency story and stability; Clojure's consistent, well-designed core and easy upgrades have made it a fun and motivating language to work with.
Having programmers do data science is terrible, if only everyone else were not even worse. The problem is of course tools. We seem to have settled on either: a bunch of disparate libraries thrown into a more or less agnostic IDE, or some point-and-click wonder which no matter how glossy, never seems to truly fit our domain once we get down to it. The dual lisp tradition of grow-your-own-language and grow-your-own-editor gives me hope there is a third way. This presentation is a meditation on how I approach data problems with Clojure, what I believe the process of doing data science should look like and the tools needed to get there. Some already exist (or can at least be bodged together); others can be made with relative ease (and we are already working on some of these); but a few will take a lot more hammock time.
Talk delivered at :clojureD 2016 http://www.clojured.de/
Having programmers do data science is terrible, if only everyone else were not even worse. The problem is of course tools. We seem to have settled on either: a bunch of disparate libraries thrown into a more or less agnostic IDE, or some point-and-click wonder which no matter how glossy, never seems to truly fit our domain once we get down to it. The dual lisp tradition of grow-your-own-language and grow-your-own-editor gives me hope there is a third way.
This talk is a meditation on the ideal environment for doing data science and how to (almost) get there. I will cover how I approach data problems with Clojure (and why Clojure in the first place), what I believe the process of doing data science should look like and the tools needed to get there. Some already exist (or can at least be bodged together); others can be made with relative ease (and we are already working on some of these); but a few will take a lot more hammock time.
Clojure has always been good at manipulating data. With the release of spec and Onyx (“a masterless, cloud scale, fault tolerant, high performance distributed computation system”) good became best. In this talk you will learn about a data layer architecture built around Kafka and Onyx that is self-describing, declarative, scalable and convenient to work with for the end user. The focus will be on the power and elegance of describing data and computation with data; and the inferences and automations that can be built on top of that.
The document discusses tips for improving developer productivity by optimizing the edit-build-test cycle. It presents several tools that can help speed up and streamline the development workflow, such as PeepOpen for navigating code, Kerl for managing Erlang versions, Rebar for simplifying build processes, Mochiweb Reloader for automatic reloading of code changes, and Sync for automatic compilation and reloading without using the shell. It also recommends writing unit tests with EUnit and Cover to catch errors and measure code coverage. The overall message is that by reducing inefficiencies in the edit-build-test cycle through these kinds of tools, developers can significantly increase the progress they are able to make.
This document provides an overview of Clojure and why one may want to try it. Some key points include:
- Clojure is a functional programming language that runs on the JVM and allows easy interoperability with Java.
- It has a very small and elegant syntax based on Lisp with sensible macro names and prefix notation.
- Clojure encourages pure functional programming and the use of immutable data structures, while providing tools like Software Transactional Memory to allow safe mutable state changes.
- Its focus on functions as first-class citizens and referential transparency can provide benefits for writing parallel and concurrent code more easily compared to other languages.
20 reasons why we don't need architects (@pavlobaron) | Pavlo Baron
This document discusses the changing role of software architects and argues that they are no longer needed. It notes that agility has become mainstream and that conflicts between architects and developers are too large. It suggests that the team as a whole can serve as the architect and that what is needed are tools to help with architecture management, workflow, and reproducing architectures across teams. The document questions what an architect should be in this new landscape, listing roles like visionary, chief motivator, and worker.
This document provides an introduction to the Elixir programming language. It discusses what Elixir and Erlang are, how to install Elixir, manage packages and environments with Mix, set up a new project, run code, and work with basic data types like integers, floats, booleans, strings, lists, tuples, and maps. It also covers conditionals, functions, modules, debugging, exercises, and suggests topics to learn next like iterating, mapping, recursion, OTP, Phoenix, Ecto, and Nerves.
Erlang and XMPP can be used together in several ways:
1. Erlang is well-suited for implementing XMPP servers due to its high concurrency and reliability. ejabberd is an example of a popular Erlang XMPP server.
2. The XMPP protocol can be used to connect Erlang applications and allow them to communicate over the XMPP network. Libraries like Jabberlang facilitate writing Erlang XMPP clients.
3. XMPP provides a flexible messaging backbone that can be extended using Erlang modules. This allows Erlang code to integrate with and enhance standard XMPP server functionality.
Erlang - Because s**t Happens by Mahesh Paolini-Subramanya | Hakka Labs
Mahesh talks about the buddha-nature of Erlang/OTP, pointing out how the various features of the language tie together into one seamless Fault Tolerant whole. Mahesh emphasizes that Erlang begins and ends with Fault Tolerance. Fault Tolerance is baked into the very genes of Erlang/OTP - something that ends up being amazingly useful when building any kind of system. Mahesh Paolini-Subramanya is the V.P. of R&D at Ubiquiti Networks - a manufacturer of disruptive technology platforms for emerging markets. He has spent the recent past building out Erlang-based massively concurrent Cloud Services and VoIP platforms. Mahesh was previously the CTO of Vocalocity after its merger with Aptela, where he was a founder and CTO.
The document discusses Erlang and scalability. It introduces common scalability killers like synchronization and resource contention. It describes Erlang's design decisions that promote scalability, including processes with no sharing, no implicit synchronization, and concurrency-oriented programming. The document provides examples of thinking concurrently, rules of thumb for scalability, and case studies showing how Erlang scales on multicore systems.
What can be done with Java, but should better be done with Erlang (@pavlobaron) | Pavlo Baron
Erlang excels at building distributed, fault-tolerant, concurrent applications due to its lightweight process model and built-in support for distribution. However, Java is more full-featured and is generally a better choice for applications that require more traditional object-oriented capabilities or need to interface with existing Java libraries and frameworks. Both languages have their appropriate uses depending on the requirements of the specific application being developed.
The document describes a talk on thinking in the Clojure way. It discusses Clojure's syntax, core functional programming concepts, recursion and lazy sequences. It also covers Clojure's spirit of pragmatism, correctness through immutable data, uniform interfaces, and use of sequences as computation media. Mutation is handled through reference types that embody well-defined patterns. The document encourages thinking functionally by avoiding imperative and object-oriented habits, and provides Conway's Game of Life as an example of rethinking a typically loop-based problem functionally.
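The functional rethinking of Conway's Game of Life mentioned above can be sketched as follows. This is an illustrative Python version and not the code from the talk: the board is assumed to be an immutable set of live `(x, y)` cells, and `neighbours` and `step` are hypothetical names. No grid is allocated and no loop indexes into one.

```python
from collections import Counter


def neighbours(cell):
    """Return the eight cells surrounding `cell`."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def step(live):
    """Compute the next generation from a set of live cells, without mutating it."""
    # Count how many live neighbours each candidate cell has.
    counts = Counter(n for cell in live for n in neighbours(cell))
    # A cell is alive next turn with exactly 3 live neighbours,
    # or with 2 if it is already alive.
    return frozenset(cell for cell, n in counts.items()
                     if n == 3 or (n == 2 and cell in live))


blinker = frozenset({(1, 0), (1, 1), (1, 2)})
next_generation = step(blinker)
```

Each generation is a new value derived from the previous one, so earlier generations remain available for comparison or replay.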
VoltDB and Erlang: two very promising beasts, made for the new parallel world, but still lingering in the wings. Not only are they addressing today's challenges, but they are using parallel architectures as the cornerstone of their new and surprising approach to be faster and more productive. What are they good for? Why are we working to team them up?
Erlang promises faster implementation, way better maintenance and 4 times shorter code. VoltDB claims to be two orders of magnitude faster than its competitors. The two share many similarities: both are the result of scientific research and designed from scratch to address the new reality of parallel architectures with full force.
This talk presents the case for Erlang as server language, where it shines, how it looks, and how to get started. It details Erlang's secret sauce: microprocesses, actors, atoms, immutable variables, message passing and pattern matching. (Note: for a longer version of this treatment of Erlang only see: Why Erlang? http://www.slideshare.net/eonblast/why-erlang-gdc-online-2012)
VoltDB's inner workings are explained to understand why it can be so incredibly fast and still better than its NoSQL competitors. The well publicized Node.js benchmark clocking in at 695,000 transactions per second is described and the simple steps to get VoltDB up and running to see the prodigy from up close.
Source examples are presented that show Erlang and VoltDB in action.
The speaker is creator and maintainer of the Erlang VoltDB driver Erlvolt.
Ruben Amortegui discusses his journey from Perl to Elixir. He was drawn to Elixir for building real-time and IoT applications due to its ability to scale and its functional programming syntax similar to Ruby. He ported his Perl e-commerce cart plugin EcCart to Elixir to familiarize himself with Elixir and functional programming concepts. The ported version of EcCart is now available on Hex.pm and its code is on GitHub. Amortegui found Elixir and its tools like Phoenix, Ecto and Hex made him productive and he enjoys Elixir's approach to functional programming.
Clojure is a LISP-like programming language that runs on the Java Virtual Machine. It was created in 2007 by Rich Hickey and is currently at version 1.1. Clojure is a functional, immutable, and concurrency-oriented language. It features LISP syntax, macros, immutability, functional programming, and easy interoperability with Java. Data structures in Clojure are code, allowing homoiconicity. Clojure also supports lazy sequences, STM-based concurrency without locks, and dynamic behavior via its REPL.
The document is a presentation about Elixir for aspiring Erlang developers. It begins with background information, stating the presentation is aimed at computer science students who have taken a course on Erlang. It then covers an introduction to Elixir and some key differences between Erlang and Elixir such as syntax and semantics. The presentation demonstrates how to create a simple "Hello World" application in Elixir and also how to build a simple web page using Phoenix. It concludes by discussing when it may be appropriate to use Elixir versus other options.
Introduction to Erlang for Python ProgrammersPython Ireland
What is Erlang? Why it is important? Why should Python programmers learn Erlang? How is Erlang different? How is Erlang the same? These and other questions will be answered during this talk, as well as this one: Should Erlang be the new programming language you learn this year?
7. 7
Modern array programming
- Standalone environment for statistical programming / graphics
- Python library for array programming
- A new language (2012) based on array programming principles
- .... and many others
8. 8
"It is better to have 100 functions
operate on one data structure than
10 functions on 10 data structures."
—Alan Perlis
abstraction
Design wisdom
9. 9
What is an array?
Dimensions | Example                     | Terminology
1          | 0 1 2                       | Vector
2          | 0 1 2                       | Matrix
           | 3 4 5                       |
           | 6 7 8                       |
3          | (stack of 3 x 3 matrices)   | 3D Array (3rd order Tensor)
N          | ...                         | ND Array
10. 10
Multi-dimensional array properties
[Figure: a 3 x 3 array holding the values 0 to 8, with indices 0 to 2 marked along Dimension 0 and Dimension 1]
- Dimensions (ordered and indexed)
- Each of the array elements is a regular value
- Dimension sizes together define the shape of the array (e.g. 3 x 3)
11. 11
Arrays = data about relationships
        :R  :S  :T  :U    (Set Y)
:A       0   1   2   3
:B       4   5   6   7
:C       8   9  10  11
(Set X)
(foo :A :T) => 2
Each element is a fact about a relationship between a value in Set X and a value in Set Y
ND array lookup is analogous to arity-N functions!
12. 12
Why arrays instead of functions?
[Figure: the 3 x 3 array of values 0 to 8, indexed 0 to 2 on each dimension]
vs. (fn [i j]
      (+ j (* 3 i)))
1. Precomputed values with O(1) access
2. Efficient computation with optimised bulk operations
3. Data driven representation
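To make the comparison on this slide concrete, here is a minimal sketch in plain Clojure (no core.matrix needed). The names `index-fn` and `table` are illustrative only and do not come from the talk; the sketch simply precomputes the slide's function into nested vectors and then looks a value up instead of computing it.

```clojure
;; Hypothetical names: index-fn and table are for illustration only.
(def index-fn (fn [i j] (+ j (* 3 i))))

;; Precompute the function over a 3 x 3 index grid into nested vectors.
(def table
  (vec (for [i (range 3)]
         (vec (for [j (range 3)]
                (index-fn i j))))))

table                 ;; => [[0 1 2] [3 4 5] [6 7 8]]
(get-in table [1 2])  ;; => 5, the same value as (index-fn 1 2)
```

The lookup and the function call agree for every index pair in the grid, which is the sense in which an array is "data about a function".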
13. 13
Principle of array programming:
generalise operations on regular (scalar) values to multi-dimensional data
(+ 1 2) => 3
(+ ) => 2
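As a rough illustration of the principle, the following plain-Clojure sketch generalises addition from scalars to equally shaped nested vectors. `ewise-add` is a hypothetical helper written for this example, not the core.matrix operator, and it assumes both arguments have the same shape.

```clojure
(defn ewise-add
  "Adds two numbers, or two equally shaped nested vectors element by element.
   Hypothetical helper; assumes a and b have the same shape."
  [a b]
  (if (number? a)
    (+ a b)
    (mapv ewise-add a b)))

(ewise-add 1 2)                              ;; => 3
(ewise-add [1 2] [3 4])                      ;; => [4 6]
(ewise-add [[1 2] [3 4]] [[10 20] [30 40]])  ;; => [[11 22] [33 44]]
```

The same function body handles 0, 1 or N dimensions because the recursion follows the nesting of the data.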
14. 14
Contents
Why Clojure for Data Science
Array Programming Essentials
core.matrix
Library Ecosystem Overview
Examples and discussion
20. 20
Array creation
;; Build an array from a sequence
(array (range 5))
=> [0 1 2 3 4]
;; ... or from nested arrays/sequences
(array
(for [i (range 3)]
(for [j (range 3)]
(str i j))))
=> [["00" "01" "02"]
["10" "11" "12"]
["20" "21" "22"]]
21. 21
Shape
;; Shape of a 3 x 2 matrix
(shape [[1 2]
[3 4]
[5 6]])
=> [3 2]
;; Regular values have no shape
(shape 10.0)
=> nil
22. 22
Dimensionality
;; Dimensionality = number of dimensions
;; = length of shape vector
;; = nesting level
(dimensionality [[1 2]
[3 4]
[5 6]])
=> 2
(dimensionality [1 2 3 4 5])
=> 1
;; Regular values have zero dimensionality
(dimensionality "Foo")
=> 0
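The relationship between shape, dimensionality and nesting level can be sketched for nested Clojure vectors without any library. `shape-of` and `dimensionality-of` are hypothetical names chosen to avoid implying they are the core.matrix functions, and the sketch assumes the nested vectors are regular (every sub-vector at a given level has the same length).

```clojure
(defn shape-of
  "Shape of a regular nested vector, or nil for a scalar.
   Assumes all sub-vectors at a given level have equal length."
  [x]
  (when (vector? x)
    (into [(count x)] (shape-of (first x)))))

(defn dimensionality-of
  "Number of dimensions = length of the shape vector."
  [x]
  (count (shape-of x)))

(shape-of [[1 2] [3 4] [5 6]])           ;; => [3 2]
(shape-of 10.0)                          ;; => nil
(dimensionality-of [[1 2] [3 4] [5 6]])  ;; => 2
(dimensionality-of [1 2 3 4 5])          ;; => 1
(dimensionality-of "Foo")                ;; => 0
```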
23. 23
Scalars vs. arrays
(array? [[1 2] [3 4]])
=> true
(array? 12.3)
=> false
(scalar? [1 2 3])
=> false
(scalar? "foo")
=> true
Everything is either an array or a scalar
A scalar works like a 0-dimensional array
30. 30
Broadcasting Rules
1. Designed for elementwise operations
- other uses must be explicit
2. Extends shape vector by adding new leading dimensions
• original shape [4 5]
• can broadcast to any shape [x y ... z 4 5]
• scalars can broadcast to any shape
3. Fills the new array space by duplication of the original array over the new dimensions
4. Smart implementations can avoid making full copies by structural sharing or clever indexing tricks
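The rules above can be sketched for nested Clojure vectors as follows. This is an illustration under stated assumptions, not the core.matrix implementation: `shape-of` and `broadcast-to` are hypothetical helpers, and the input is assumed to be a regular nested vector or a scalar.

```clojure
(defn shape-of
  "Shape of a regular nested vector, or nil for a scalar."
  [x]
  (when (vector? x)
    (into [(count x)] (shape-of (first x)))))

(defn broadcast-to
  "Broadcasts x to target-shape by adding leading dimensions (rule 2)
   and duplicating x over them (rule 3). Hypothetical helper."
  [x target-shape]
  (let [own   (vec (shape-of x))
        extra (- (count target-shape) (count own))]
    (assert (and (>= extra 0) (= own (vec (drop extra target-shape))))
            "shape of x must match the trailing dimensions of target-shape")
    ;; Wrap from the innermost new dimension outwards.
    (reduce (fn [acc n] (vec (repeat n acc)))
            x
            (reverse (take extra target-shape)))))

(broadcast-to 7 [2 2])      ;; => [[7 7] [7 7]]
(broadcast-to [4 5] [3 2])  ;; => [[4 5] [4 5] [4 5]]
```

Because Clojure vectors are immutable, each repeated slot in the result refers to the same original value rather than a deep copy, which is one simple form of the structural sharing mentioned in rule 4.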
38. 38
Mutability – the tradeoffs
Avoid mutability. But it’s an option if you really need it.
Pros:
- Faster
- Reduces GC pressure
- Standard in many existing matrix libraries
Cons:
✘ Mutability is evil
✘ Harder to maintain / debug
✘ Hard to write concurrent code
✘ Not idiomatic in Clojure
✘ Not supported by all core.matrix implementations
✘ “Place Oriented Programming”
39. 39
Mutability – performance benefit
Time for addition of vectors* (ns)
Mutable add!     28
Immutable add    120
4x performance benefit
* Length 10 double vectors, using :vectorz implementation
40. 40
Mutability – syntax
A core.matrix function name ending with “!” performs mutation
(usually on the first argument only)
(add [1 2] 1)
=> [2 3]
(add! [1 2] 1)
=> RuntimeException ...... not mutable!
(def a (mutable [1 2])) ;; coerce to a mutable format
=> #<Vector2 [1.0,2.0]>
(add! a 1)
=> #<Vector2 [2.0,3.0]>
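The difference between the two styles can also be seen without core.matrix, using a Clojure vector for the immutable case and a Java double[] for the mutable one. The function names below are hypothetical and only follow the "!" naming convention described on the slide.

```clojure
(defn add-immutable
  "Returns a new vector; the input vector is left untouched."
  [v x]
  (mapv #(+ % x) v))

(defn add-in-place!
  "Adds x to every element of the double[] arr, mutating it. Returns arr."
  [^doubles arr ^double x]
  (dotimes [i (alength arr)]
    (aset arr i (+ (aget arr i) x)))
  arr)

(def v [1.0 2.0])
(add-immutable v 1.0)  ;; => [2.0 3.0]
v                      ;; => [1.0 2.0], unchanged

(def a (double-array [1.0 2.0]))
(add-in-place! a 1.0)
(vec a)                ;; => [2.0 3.0], the array itself was changed
```

The in-place version allocates nothing per call, which is where the reduced GC pressure listed among the pros comes from; the cost is that every holder of `a` sees the change.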
44. 44
Lots of trade-offs
Native Libraries vs. Pure JVM
Mutability vs. Immutability
Specialized elements (e.g. doubles) vs. Generalised elements (Object, Complex)
Multi-dimensional vs. 2D matrices only
Memory efficiency vs. Runtime efficiency
Concrete types vs. Abstraction (interfaces / wrappers)
Specified storage format vs. Multiple / arbitrary storage formats
License A vs. License B
Lightweight (zero-copy) views vs. Heavyweight copying / cloning
45. 45
What’s the best data structure?
Length 50 “range” vector: 0 1 2 3 .. 49
1. Clojure Vector
[0 1 2 …. 49]
2. Java double[] array
new double[]
{0, 1, 2, …. 49};
3. Custom deftype
(deftype RangeVector
  [^long start
   ^long end])
4. Native vector format
(org.jblas.DoubleMatrix. params)
48. 48
Clojure Protocols
(defprotocol PSummable
"Protocol to support the summing of all elements in
an array. The array must hold numeric values only,
or an exception will be thrown."
(element-sum [m]))
clojure.core.matrix.protocols
1. Abstract Interface
2. Open Extension
3. Fast Dispatch
49. 49
Protocols are fast and open
Function call costs (ns)
Call type                 | Cost (ns) | Open extension
Multimethod*              | 89        | ✓
Protocol call             | 13.8      | ✓
Boxed function call       | 7.9       | ✘
Primitive function call   | 1.9       | ✘
Static / inlined code     | 1.2       | ✘
* Using class of first argument as dispatch function
50. 50
Typical core.matrix call path
User Code
(esum [1 2 3 4])
core.matrix API (matrix.clj)
(defn esum
  "Calculates the sum of all the elements in a
  numerical array."
  [m]
  (mp/element-sum m))
Impl. code
(extend-protocol mp/PSummable
  SomeImplementationClass
  (element-sum [a]
    ………))
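The three layers of this call path can be imitated in a self-contained way. The sketch below is an assumption-laden stand-in, not core.matrix code: `PSum`, `sum-elements` and `esum*` are hypothetical names mirroring `PSummable`, `element-sum` and `esum`, and the double[] case uses the `extend` function rather than `extend-protocol`.

```clojure
;; Implementation-facing layer: a protocol (hypothetical stand-in for PSummable).
(defprotocol PSum
  (sum-elements [m] "Sums all elements of m."))

;; Default implementations: a number sums to itself; any other object is
;; treated as a (possibly nested) sequence of elements.
(extend-protocol PSum
  Number
  (sum-elements [a] a)
  Object
  (sum-elements [a] (reduce + (map sum-elements a))))

;; Specialised implementation for Java double[] arrays.
(extend (Class/forName "[D")
  PSum
  {:sum-elements (fn [arr]
                   (let [^doubles arr arr]   ; type hint avoids reflection
                     (areduce arr i res 0.0 (+ res (aget arr i)))))})

;; API layer: user code calls this function, never the protocol directly.
(defn esum* [m] (sum-elements m))

(esum* [1 2 3 4])                     ;; => 10
(esum* [[1 2] [3 4]])                 ;; => 10
(esum* (double-array [1.0 2.0 3.0]))  ;; => 6.0
```

User code stays the same whichever implementation handles the call; adding a faster implementation for a new type only requires another `extend`.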
51. 51
Most protocols are optional
MANDATORY
Required for a working core.matrix implementation
PImplementation
PDimensionInfo
PIndexedAccess
PIndexedSetting
PMatrixEquality
PSummable
PRowOperations
PVectorCross
PCoercion
PTranspose
PVectorDistance
PMatrixMultiply
PAddProductMutable
PReshaping
PMathsFunctionsMutable
PMatrixRank
PArrayMetrics
PAddProduct
PVectorOps
PMatrixScaling
PMatrixOps
PMatrixPredicates
PSparseArray
…..
OPTIONAL
Everything in the API will work without these
core.matrix provides a “default implementation”
Implement for improved performance
52. 52
Default implementations
(extend-protocol mp/PSummable
Number
(element-sum [a] a)
Object
(element-sum [a]
(mp/element-reduce a +)))
clojure.core.matrix.impl.default
Protocol name - from namespace
clojure.core.matrix.protocols
Implementation for any Number
Implementation for an arbitrary Object
(assumed to be an array)
53. 53
Extending a protocol
(extend-protocol mp/PSummable
(Class/forName "[D")
(element-sum [m]
(let [^doubles m m]
(areduce m i res 0.0 (+ res (aget m i))))))
Class to implement protocol for, in
this case a Java array : double[]
Optimised code to add up all the
elements of a double[] array
Add type hint to avoid reflection
54. 54
Speedup vs. default implementation
Timing for element sum of length 100 double array (ns)
(esum v) "Specialised"    201
(reduce + v)              2859
(esum v) "Default"        3690
15-20x benefit
55. 55
Internal Implementations
Implementation | Key Features
:persistent-vector | Support for Clojure vectors. Immutable. Not so fast, but great for quick testing
:double-array | Treats Java double[] objects as 1D arrays. Mutable – useful for accumulating results etc.
:sequence | Treats Clojure sequences as arrays. Mostly useful for interop / data loading
:ndarray, :ndarray-double, :ndarray-long, ..... | Google Summer of Code project by Dmitry Groshev. Pure Clojure. N-Dimensional arrays similar to NumPy. Support arbitrary dimensions and data types
:scalar-wrapper, :slice-wrapper, :nd-wrapper | Internal wrapper formats. Used to provide efficient default implementations for various protocols
57. 57
External Implementations
Implementation | Key Features
vectorz-clj | Pure JVM (wraps Java library Vectorz). Very fast, especially for vectors and small-medium matrices. Most mature core.matrix implementation at present
Clatrix | Uses native BLAS libraries by wrapping the jblas library. Very fast, especially for large 2D matrices. Used by Incanter
parallel-colt-matrix | Wraps Parallel Colt library from Java. Support for multithreaded matrix computations
arrayspace | Experimental. Ideas around distributed matrix computation. Builds on ideas from Blaze, Chapel, ZPL
image-matrix | Treats a Java BufferedImage as a core.matrix array. Because you can?
59. 59
Mixing implementations
(def A (array :persistent-vector (range 5)))
=> [0 1 2 3 4]
(def B (array :vectorz (range 5)))
=> #<Vector [0.0,1.0,2.0,3.0,4.0]>
(* A B)
=> [0.0 1.0 4.0 9.0 16.0]
(* B A)
=> #<Vector [0.0,1.0,4.0,9.0,16.0]>
core.matrix implementations can be mixed
(but: behaviour depends on the first argument)
60. 60
Contents
Why Clojure for Data Science
Array Programming Essentials
core.matrix
Library Ecosystem Overview
Examples and discussion
61. 61
Data Science Libraries for Clojure
• Still not as mature as R or Python, but developing rapidly
• Clojure philosophy of small libraries rather than all-encompassing
frameworks
• Key areas:
• Interactive environments
• Visualisation
• Databases / data access
• Realtime data processing
• Machine Learning
63. 63
Visualisation
Library | Description
quil | Clojure interface to the Processing library/environment for dynamic visualisations
gyptis | Clojure + ClojureScript library for producing Vega.js graphs
imagez | Library for generating and manipulating bitmap images
64. 64
Databases / data access
Library | Description
Datomic | Awesome database supporting immutable “time travel” over database history. Great scalability for reads / analytics
java.jdbc | Clojure library for access to SQL databases. Mature workhorse
Yesql | Arguably better way to do SQL in Clojure
Sparkling | Clojure library for Apache Spark
flambo | Clojure library for Apache Spark
Cascalog | Clojure library for querying and data processing with Apache Hadoop
many, many, more.....
65. 65
Realtime Data Processing
Library | Description
Storm | Mature stream processing library for highly scalable realtime computation over large distributed clusters of compute nodes
Onyx | More modern / better designed alternative to Storm with growing traction
core.async | “Roll your own” concurrent data processing pipelines
66. 66
Machine Learning
Library | Description
clj-ml | Wrapper for the popular and venerable “Weka” machine learning library for Java
enclog | Wrapper for the “Encog” machine learning library
Clortex / Comportex | Libraries implementing Numenta’s Hierarchical Temporal Memory model
synaptic | Basic neural networks in Clojure
 | State of the art “Deep Learning” library
67. 67
Contents
Why Clojure for Data Science
Array Programming Essentials
core.matrix
Library Ecosystem Overview
Examples and discussion
#4: When I say language extension, it is of course in the sense that Clojure seems to have this ability to absorb new paradigms just by plugging in new libraries.
Clojure already stole many good pure functional programming techniques from languages like Haskell
And of course we have the macro meta-programming capabilities from Lisp
More recently we’ve got core.logic bringing in Logic programming, inspired by Prolog and miniKanren
And core.async bringing in the Communicating Sequential Processes with some syntax similar to Go
And core.matrix is designed very much in the same way, to provide array programming capabilities. And if we want to trace the roots of array programming, we can go all the way back to this language called APL
#7: About the same age as Lisp? First specified in 1958
Love the fact that it has its own keyboard, with all these symbols inspired by mathematical notation
And you get some crazy code.
Might seem like a bit of a dinosaur now
#8: Array programming has had quite a renaissance in recent years.
This is because of the increasing importance of data science and numerical computing in many fields
- So we’ve seen languages like R that provide an environment for statistical computing
Highlight value of paradigm – clearly a demand for this kind of numerical computing capability
#9: Start off with one of my favourite quotes, because it contains a pretty important insight.
“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures”
There is of course one error here….. (click)
We should of course be talking about an abstraction here, not a concrete data structure.
A great example of this is the sequence abstraction in Clojure – there are literally hundreds of functions that operate on Clojure sequences. Because so many functions produce and consume sequences, it gives you many different ways to compose them together.
And it’s more than just the clojure.core API: other code can build on the same abstraction, which means that the composability extends to any code you write that uses the same abstraction. It makes entire libraries composable.
In some ways I think the key to building systems using simple, composable components is about having shared abstractions.
We’ve taken this principle very much to heart in core.matrix. Our abstraction, of course, is the array, or more specifically the multi-dimensional array
And the rest of core.matrix is really all about giving you a powerful set of composable operations you can do with arrays
#10: Overloaded terminology!
- Vector = 1D array (maths / array programming sense) – Also a Clojure vector
- Matrix: conventionally used to indicate a 2 dimensional numerical array,
- Array: in the sense of the N-dimensional array, but also the specific concrete example of a Java array
Dimensions: also overloaded! Here it is used in the sense of the number of dimensions in an array, but it’s also used to refer to the number of dimensions in a vector space, e.g. 3 dimensional Euclidean space.
If we’re lucky it should be clear from the context what we’re talking about.
#12: Give you an idea about how general array programming can be –
An array is a way of representing a function using data
Instead of computing a value for each combination of inputs, we’re typically pre-computing all such values
#16: Today I’m going to be talking about core.matrix, and it’s quite appropriate that I’m talking about it here today at the Clojure Conj because this project actually came about as a direct result of conversations I had with many people at last year’s Conj
The focus of those discussions was very much about how we could make numerical computing better in Clojure.
And the solution I’ve been working on over the past year along with a number of collaborators is core.matrix, which offers array programming as a language extension to Clojure
#17: Example of adding a 3D array.
In Java it’s just a big nested loop…
In Clojure you can do it with nested maps, which is a bit more of a functional style, but still you’ve got this three-level nesting
With core.matrix it’s really simple. We just generalise + to arbitrary multi-dimensional arrays and it all just works
Does conciseness matter? Well if you’re writing a lot of code manipulating arrays it’s going to save you quite a bit of time, but more importantly it makes it much easier to avoid errors. Very easy to get off-by-one errors in this kind of code.
core.matrix gives you a nice DSL that does all the index juggling for you
Also it helps you to be mentally much closer to the problem that you are modelling. You ideally want an API that reflects the way that you think about the problem you are solving.
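For reference, the "nested maps" version mentioned in these notes might look like the sketch below in plain Clojure. `add-3d` and the sample data are made up for illustration; the point is only to show the three-level nesting that, according to the notes, core.matrix replaces with a single call to its generalised `+`.

```clojure
;; Two 2 x 2 x 2 arrays as nested vectors (sample data for illustration).
(def a [[[1 2] [3 4]] [[5 6] [7 8]]])
(def b [[[10 20] [30 40]] [[50 60] [70 80]]])

(defn add-3d
  "Adds two equally shaped 3D nested vectors with three explicit levels
   of mapping. Hypothetical helper, written out to show the nesting."
  [x y]
  (mapv (fn [plane-x plane-y]
          (mapv (fn [row-x row-y]
                  (mapv + row-x row-y))
                plane-x plane-y))
        x y))

(add-3d a b)  ;; => [[[11 22] [33 44]] [[55 66] [77 88]]]
```

Every extra dimension would need another nested `mapv`, which is the index-juggling boilerplate the notes argue an array API should remove.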
#18: So today I’m going to talk about core.matrix with three different lenses
First I want to talk about the abstraction – what are these arrays?
Then I’m going to talk about the core.matrix API
Finally the implementation: how this all works, and some of the engineering choices we’ve made
#19: So let’s talk about the core.matrix API.
This isn’t going to be an exhaustive tour, but I’m going to highlight a few of the key features to give you a taste of what is possible
#20: One of the important API design objectives was to exploit the “natural equivalence of arrays to nested Clojure vectors”.
1D array is a Clojure vector, 2D array is like a vector of vectors
Most things in the core.matrix API work with nested Clojure vectors.
This is nice – gives a natural syntax, and great for dynamic, exploratory work at the REPL.
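A small REPL-style sketch of treating nested vectors as arrays, assuming core.matrix is on the classpath. The example values are made up; dimensionality, mget and transpose are functions in the clojure.core.matrix namespace.

```clojure
(require '[clojure.core.matrix :as m])

(def v   [1 2 3])           ; a 1D array is just a Clojure vector
(def mat [[1 2] [3 4]])     ; a 2D array is a vector of vectors

(m/dimensionality v)    ; 1
(m/dimensionality mat)  ; 2
(m/mget mat 1 0)        ; element at row 1, column 0, which is 3
(m/transpose mat)       ; rows become columns
```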
#21: The most fundamental attribute of an array is probably the shape
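To make shape concrete, here is a short sketch assuming core.matrix is on the classpath. The arrays are made-up examples; the comments describe the sizes rather than the exact printed form, which may differ between implementations.

```clojure
(require '[clojure.core.matrix :as m])

(m/shape [1 2 3])                        ; one dimension of size 3
(m/shape [[1 2 3] [4 5 6]])              ; 2 rows by 3 columns
(m/shape [[[1 2] [3 4]] [[5 6] [7 8]]])  ; 2 x 2 x 2

;; The total element count is the product of the shape.
(m/ecount [[1 2 3] [4 5 6]])             ; 6
```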
#26: Arrays are compositions of arrays!
This is one of the best signs that you have a good abstraction: if the abstraction can be recursively defined as a composition of the same abstraction.
#27: So of course we have quite a few different functions that let you work with slices of arrays.
Most useful is probably the slices function, which cuts an array into a sequence of its slices
Pretty common to want to do this – imagine if each slice is a row in your data set
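A sketch of working with slices, assuming core.matrix is on the classpath. The data set is made up, with each row standing in for one record.

```clojure
(require '[clojure.core.matrix :as m])

;; Made-up data set: three rows of two values each.
(def data [[1 2] [3 4] [5 6]])

(m/slices data)    ; a sequence of the three rows
(m/slice data 1)   ; just the row at index 1

;; Because slices are ordinary arrays in a sequence, regular
;; sequence functions apply, e.g. summing each row.
(def row-sums (map m/esum (m/slices data)))
```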
#28: We define array versions of the common mathematical operators.
These use the same names as clojure.core
You have to use the clojure.core.matrix.operators namespace if you want to use these names instead of the standard clojure.core operators
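A hedged sketch of the array operators, assuming core.matrix is on the classpath. It uses a namespace alias (mo, chosen here for illustration) so the example does not shadow clojure.core; to use the bare names in your own namespace you would exclude + and - from clojure.core and refer them from clojure.core.matrix.operators instead.

```clojure
(require '[clojure.core.matrix.operators :as mo])

;; Same names as clojure.core, but they work on whole arrays.
(mo/+ [1 2] [3 4])                      ; elementwise sum
(mo/- [[5 5] [5 5]] [[1 2] [3 4]])      ; elementwise difference

;; The clojure.core versions only accept numbers, so
;; (+ [1 2] [3 4]) would throw a ClassCastException.
```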
#29: Question: what should happen if we add a scalar number to an array?
We have a feature called broadcasting, which allows a lower dimensional array to be treated as a higher dimensional array
#30: The idea of broadcasting also generalises to arrays!
Here the semantics is the same, we just duplicate the smaller array to fill out the shape of the larger array
#31: So we have some rules for broadcasting
Note that it only really makes sense for elementwise operations. You can broadcast arrays explicitly if you want to, but it only happens automatically for elementwise operations at present.
Can only add leading dimensions.
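The broadcasting behaviour from the last few slides can be sketched as follows, assuming core.matrix is on the classpath. The values are made up.

```clojure
(require '[clojure.core.matrix :as m])

;; A scalar is broadcast to the shape of the array.
(m/add [[1 2] [3 4]] 10)

;; A 1D array is broadcast by adding a leading dimension,
;; so [10 20] is treated as [[10 20] [10 20]].
(m/add [[1 2] [3 4]] [10 20])

;; Explicit broadcasting to a target shape of 3 x 2.
(m/broadcast [10 20] [3 2])
```

Note that the trailing dimension of the smaller array (size 2) has to match the larger one; only leading dimensions get added.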
#32: So let’s talk about some higher order functions
Two of my favourite Clojure functions, map and reduce, are extremely useful higher order functions
#33: So one of the interesting observations about array programming is that you can also see it as a generalisation of sequences in multiple dimensions, so it probably isn’t too surprising that many of the sequence functions in Clojure actually have a nice array programming equivalent
emap is the equivalent of map: it maps a function over all elements of an array. The key difference is that it preserves the structure of the array, so here we’re mapping over a 2x2 matrix and therefore we get a 2x2 result
ereduce is the equivalent of reduce over all elements
eseq is a handy bridge between core.matrix arrays and regular Clojure sequences – it just returns all the elements of an array in order
Note row-major ordering of eseq and ereduce
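A short sketch of the three functions on a 2x2 matrix, assuming core.matrix is on the classpath. The matrix is made up.

```clojure
(require '[clojure.core.matrix :as m])

(def a [[1 2] [3 4]])

(m/emap inc a)    ; maps over every element, result is still 2x2
(m/ereduce + a)   ; reduces over all elements: 10
(m/eseq a)        ; all elements in row-major order: 1 2 3 4
```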
#39: Basically mutability is horrible. You should be avoiding it as much as you can
But it turns out that it is needed in some cases – performance matters for numerical work
Mutability OK for library implementers, e.g. accumulation of a result in a temporary array
Once a value is constructed, shouldn’t be mutated any more
#40: Usually 4x performance benefit isn’t a big deal – unless it happens to be your bottleneck
There are cases where it might be important: e.g. if you are crunching through a lot of data and need to add to some sort of accumulator…
#41: Mutability OK for library implementers, e.g. accumulation of a result in a temporary array
Once a value is constructed, shouldn’t be mutated any more
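A plain Clojure sketch of the pattern described here: mutation is confined to a temporary accumulator inside one function, and callers only ever see an immutable value. The function add-rows is hypothetical and does not use core.matrix’s own mutable operations.

```clojure
(defn add-rows
  "Adds up all rows elementwise and returns an immutable vector."
  [rows]
  (let [n   (count (first rows))
        acc (double-array n)]          ; temporary mutable buffer, zero-filled
    (doseq [row rows
            i   (range n)]
      ;; In-place accumulation: no intermediate arrays are allocated.
      (aset acc i (+ (aget acc i) (double (nth row i)))))
    ;; Copy into a persistent vector; the buffer never escapes.
    (vec acc)))

(add-rows [[1 2 3] [4 5 6]]) ; [5.0 7.0 9.0]
```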
#44: Clearly this is insane – why so many matrix libraries?
#45: This explains the problem. But doesn’t really help us….
#47: The point is – there isn’t ever going to be a perfect right answer when choosing a concrete data type to implement an abstraction.
There are always going to be inherent advantages of different approaches
#48: Luckily we have a secret weapon, and I think this is actually what really distinguishes core.matrix from all other array programming systems
#49: Of course the secret weapon is Clojure protocols.
Here’s an example: the PSummable protocol is a very simple protocol that allows you to compute the sum of all values in an array
Three things are important to know about protocols
First is that they define an abstract interface – which is exactly what we need to define operations that work on our array abstraction
Secondly they feature open extension: which means that we can solve the expression problem and use protocols with arbitrary types – importantly, this includes types that weren’t written with the protocol in mind – e.g. arbitrary Java classes
The third feature is really fast dispatch, which is important if we want core.matrix to be useful in high performance situations.
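A self-contained sketch of the first two points in plain Clojure. PSum and element-sum here are hypothetical stand-ins defined locally for illustration, not core.matrix’s actual protocol definitions. The interesting part is the last extension: java.util.ArrayList was certainly not written with this protocol in mind.

```clojure
;; Abstract interface: one operation, no concrete type implied.
(defprotocol PSum
  (element-sum [m] "Returns the sum of all elements of m."))

;; Open extension: the protocol is attached to existing types after the fact.
(extend-protocol PSum
  Number
  (element-sum [m] m)

  clojure.lang.IPersistentVector
  (element-sum [m] (reduce + 0 (map element-sum m)))

  ;; A plain Java class that knows nothing about the protocol.
  java.util.ArrayList
  (element-sum [m] (reduce + 0 (map element-sum m))))

(element-sum [[1 2] [3 4]])                   ; 10
(element-sum (java.util.ArrayList. [1.5 2.5])) ; 4.0
```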
#50: Protocols are really the “sweet spot” of being both fast and open
We benchmarked a pretty wide variety of different function calls
#52: It’s easy to make a working core.matrix implementation!
It’s more work if you want to make it perform across the whole API
But that’s OK because it can be done incrementally
So hopefully this provides a smooth development path for core.matrix implementations to integrate
#53: The secret is having default implementations for all protocols, that get used if you haven’t extended the protocol for your particular type
Note that the default implementation delegates to another protocol call – this is generally the case, ultimately all these protocol calls have to be implemented in terms of the lower-level mandatory protocols if we want them to work on any array.
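The default-implementation idea can be sketched in plain Clojure as below. PElements and PSum are hypothetical protocols invented for this example (the real core.matrix protocols are different and more numerous). The default for PSum is attached to Object and delegates to the mandatory protocol, so a new array type works as soon as it implements PElements and can add a fast path later.

```clojure
;; Mandatory protocol: every array type must implement this.
(defprotocol PElements
  (elements [m] "Returns a sequence of all elements of m."))

;; Optional protocol: implementations may override it for speed.
(defprotocol PSum
  (element-sum [m] "Returns the sum of all elements of m."))

(extend-protocol PElements
  Number
  (elements [m] [m])
  clojure.lang.IPersistentVector
  (elements [m] (mapcat elements m)))

;; Java double arrays join in by implementing only the mandatory protocol.
(extend (Class/forName "[D")
  PElements
  {:elements (fn [m] (seq m))})

;; Default for any Object: delegates to the mandatory protocol.
(extend-protocol PSum
  Object
  (element-sum [m] (reduce + 0 (elements m))))

(element-sum [[1 2] [3 4]])           ; 10, via the default
(element-sum (double-array [1 2 3]))  ; 6.0, also via the default

;; Later, an optional fast path just for double arrays.
(extend (Class/forName "[D")
  PSum
  {:element-sum (fn [^doubles m]
                  (areduce m i acc 0.0 (+ acc (aget m i))))})

(element-sum (double-array [1 2 3]))  ; still 6.0, now via the fast path
```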
#57: Makes some operations very efficient
- For example if you want to transpose an NDArray, you just need to reverse the shape and reverse the strides.
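A toy model of a strided array in plain Clojure, to show why transposing is cheap. This is only a sketch under simplified assumptions (a map with :data, :shape, :strides and :offset keys, all hypothetical names), not the actual NDArray implementation.

```clojure
;; Element lookup: offset + sum of (index * stride) over all dimensions.
(defn strided-get [{:keys [data strides offset]} indices]
  (nth data (reduce + offset (map * indices strides))))

;; The 2x3 matrix [[1 2 3] [4 5 6]] stored in row-major order.
(def a {:data    [1 2 3 4 5 6]
        :shape   [2 3]
        :strides [3 1]
        :offset  0})

;; Transpose: reverse shape and strides, share the same data.
(defn transpose-view [arr]
  (-> arr
      (update :shape   (comp vec reverse))
      (update :strides (comp vec reverse))))

(def t (transpose-view a))

(strided-get a [0 2]) ; 3
(strided-get t [2 0]) ; 3, the same element seen through the transposed view
```

No element is copied: the transposed view reads the same underlying data with a different index calculation.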
#58: vectorz-clj: probably the best choice if you want general purpose double numerics
clatrix: probably the best choice if you want linear algebra with big matrices
#60: Not only can you switch implementation: you can also mix them!
This is actually quite a unique capability
How do we do this? We provide generic coercion functionality, so implementations typically use this to coerce the second argument to the type of the first
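A minimal sketch of mixing array types in one operation, assuming core.matrix is on the classpath and that Java double arrays and persistent vectors are both usable as arrays out of the box. The values are made up, and the concrete type of the result is deliberately not asserted since it depends on the implementation.

```clojure
(require '[clojure.core.matrix :as m])

(def a (double-array [1 2 3]))  ; a Java double[] as a 1D array
(def b [10 20 30])              ; a persistent vector as a 1D array

;; Two different underlying types in a single call.
(def result (m/add a b))

;; Compare by value, independent of the concrete result type.
(m/equals result [11 22 33])
```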