Crab
                A Python Framework for Building
                    Recommendation Engines
                       Scipy 2011, Austin TX


Marcel Caraciolo Ricardo Caspirro             Bruno Melo
   @marcelcaraciolo       @ricardocaspirro        @brunomelo

                                                               1
What is Crab ?

 A Python framework for building recommendation engines
A Scikit module for collaborative, content-based and hybrid filtering
       A Mahout alternative for Python developers :D
             Open source under the BSD license


             https://github.com/muricoca/crab




                                                                 2
When did it start ?

It began one year ago
Community-driven, 4 members
In April 2011, the open-source lab Muriçoca incorporated it
Since April 2011, we have been rewriting it as a Scikit




                https://github.com/muricoca/
                                                             3
Knowing Scikits
Scikits are Scipy Toolkits: independent projects hosted
                under a common namespace.


                       Scikits Image
                     Scikits MlabWrap
                     Scikits AudioLab
                      Scikit Learn
                             ....

           http://scikits.appspot.com/scikits



                                                               4
Knowing Scikits

                        Scikit-Learn

    Machine Learning Algorithms + scientific Python packages
                (Numpy, Scipy and Matplotlib)

           http://scikit-learn.sourceforge.net/


Our goal: turn Crab into a Scikit and contribute
           some parts of it to Scikit-learn


                                                              5
Why Recommendations ?
The world is an over-crowded place




                                      6
Why Recommendations ?
                   We are overloaded
     * Thousands of news articles and blog posts each day
     * Millions of movies, books and music tracks online
     * In Hanoi, TV channels air thousands of programs each day
     * In New York, several thousand ad messages are sent to us per day
     * Several places, offers and events
     * Even friends sometimes overload us!




                                                         7
Why Recommendations ?
We really need and consume only a few of them!

   “A lot of times, people don’t know what
   they want until you show it to them.”
                                         Steve Jobs

  “We are leaving the Information age, and
  entering into the Recommendation age.”
                      Chris Anderson, from the book The Long Tail



                                                            8
Why Recommendations ?
Can Google help ?
  Yes, but only when we really know what we are looking for
           But what does “interesting” mean ?
Can Facebook help ?
  Yes, I tend to find my friends’ stuff interesting
   What if I had only a few friends, and what they like does not always
                             attract me ?
Can experts help ?
  Yes, but it won’t scale well.
    But it is what they like, not what I like! Everyone gets exactly the same advice!


                                                                     9
Why Recommendations ?
         Recommendation Systems
Systems designed to recommend to me something I may like




                                                           10
Why Recommendations ?
     Recommendation Systems

     [Figure: Graph Representation. People (Me, My friend, You, Another guy) linked to items (Titanic, Taken and one more title).]




                                                                    11
The current Crab

Collaborative Filtering algorithms
  User-Based, Item-Based and Slope One
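
A minimal sketch of the user-based idea named above, assuming a plain dictionary of ratings and cosine similarity over co-rated items. The data and the `cosine_similarity` and `predict` helpers are hypothetical illustrations, not Crab's API.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {item: rating}
ratings = {
    "ana":   {"A": 5.0, "B": 3.0, "C": 4.0},
    "bruno": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carla": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = cosine_similarity(ratings[user], prefs)
        num += sim * prefs[item]
        den += abs(sim)
    return num / den if den else None

# Estimate how "ana" would rate item "D", which she has not rated yet.
estimate = predict(ratings, "ana", "D")
print(estimate)
```

The estimate lands closer to bruno's rating than to carla's because bruno's tastes are more similar to ana's. An item-based variant would compare item columns instead of user rows.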

Evaluation of the Recommender Algorithms
 Precision, Recall, F1-Score, RMSE




                           Precision-Recall Charts

                                                     12
The current Crab




Using REST APIs to deploy the recommender
          django-piston, django-rest, django-tastypie




                                                        15
Crab is already in production

  A Brazilian social network called Atepassar.com
        Educational network with more than 60,000 students and 3,000 video classes




     Running on Python + Numpy + Scipy and Django

Backend for Recommendations
MongoDB - mongoengine

   Daily Recommendations with Explanations



                                                                                    16
Evaluating your recommender
 Crab implements the most widely used recommender metrics.
     Precision, Recall, F1-Score, RMSE



     Uses matplotlib for a plotting utility

 Implement new metrics

Simulation support, maybe (??)
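
For reference, the four metrics listed above can be sketched in a few lines of plain Python. The function names and the sample inputs are hypothetical and only illustrate the standard definitions, not Crab's evaluator interface.

```python
from math import sqrt

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(errors) / len(errors))

# Hypothetical example: 4 recommended items, 3 relevant ones, 2 in common.
p, r, f1 = precision_recall_f1(["A", "B", "C", "D"], ["A", "C", "E"])
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
print(p, r, f1, error)
```

Precision, recall and F1 judge a ranked list of recommendations, while RMSE judges the predicted rating values, so the two families need different test data.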




                                                  17
Evaluating your recommender
All you have to do is implement your Evaluator




                                                 18
Distributing the recommendation computations


Use Hadoop and Map-Reduce intensively
  Investigating the Yelp mrjob framework   https://github.com/pfig/mrjob
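
To make the Map-Reduce idea concrete, here is a small sketch that counts item co-occurrences in map, shuffle and reduce steps. It runs in a single process with plain Python and does not use Hadoop or mrjob. The records and the `mapper`, `reducer` and `run_job` names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: one record per user with the items that user liked.
records = [
    ("ana", ["A", "B", "C"]),
    ("bruno", ["A", "C"]),
    ("carla", ["B", "C"]),
]

def mapper(user, items):
    """Emit ((item_i, item_j), 1) for every pair of items liked by one user."""
    for pair in combinations(sorted(items), 2):
        yield pair, 1

def reducer(pair, counts):
    """Sum the partial counts collected for one item pair."""
    return pair, sum(counts)

def run_job(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    # Shuffle phase: group mapper output by key before reducing.
    grouped = defaultdict(list)
    for user, items in records:
        for key, value in mapper(user, items):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())

cooccurrence = run_job(records)
print(cooccurrence)
```

Because each mapper call looks at one user only and each reducer call at one pair only, the same two functions could in principle be spread across many machines.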



Implement the state-of-the-art techniques used by Netflix and newer ones
   Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
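
As a rough illustration of the SVD idea, the sketch below builds a low-rank approximation of a small dense ratings matrix with NumPy. It is a simplification: real rating matrices are sparse and have missing values, which plain SVD does not handle. The matrix and the `low_rank_approximation` helper are hypothetical.

```python
import numpy as np

# Hypothetical dense ratings matrix: rows are users, columns are items.
R = np.array([
    [5.0, 3.0, 4.0, 4.0],
    [3.0, 1.0, 2.0, 3.0],
    [4.0, 3.0, 4.0, 3.0],
    [1.0, 5.0, 5.0, 2.0],
])

def low_rank_approximation(matrix, k):
    """Keep the k largest singular values and rebuild the matrix from them."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Two latent factors give a smoothed version of the original ratings.
R2 = low_rank_approximation(R, k=2)
error_k2 = np.linalg.norm(R - R2)
# Keeping all four factors reproduces the matrix up to rounding error.
error_k4 = np.linalg.norm(R - low_rank_approximation(R, k=4))
print(error_k2, error_k4)
```

The rows of `U` and the columns of `Vt` act as latent user and item factors. Fewer factors give a coarser but more general picture of the ratings.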



Slope One is the most commonly used technique.
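
A minimal sketch of weighted Slope One in plain Python, assuming ratings stored as nested dictionaries. The data set and the `build_deviations` and `predict` helpers are hypothetical and are not taken from Crab.

```python
from collections import defaultdict

# Toy ratings (hypothetical data): user -> {item: rating}
ratings = {
    "ana":   {"A": 5.0, "B": 3.0, "C": 2.0},
    "bruno": {"A": 3.0, "B": 4.0},
    "carla": {"B": 2.0, "C": 5.0},
}

def build_deviations(ratings):
    """Average difference dev[j][i] = mean(r_j - r_i) over users who rated both."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for j, r_j in prefs.items():
            for i, r_i in prefs.items():
                if i != j:
                    diffs[j][i] += r_j - r_i
                    counts[j][i] += 1
    for j in diffs:
        for i in diffs[j]:
            diffs[j][i] /= counts[j][i]
    return diffs, counts

def predict(ratings, user, item):
    """Weighted Slope One: each known rating votes with weight = number of co-raters."""
    diffs, counts = build_deviations(ratings)
    item_counts = counts.get(item, {})
    num = den = 0.0
    for i, r_i in ratings[user].items():
        n = item_counts.get(i, 0)
        if n:
            num += (diffs[item][i] + r_i) * n
            den += n
    return num / den if den else None

# carla has not rated "A"; bruno has not rated "C".
carla_a = predict(ratings, "carla", "A")
bruno_c = predict(ratings, "bruno", "C")
print(carla_a, bruno_c)
```

With this toy data the prediction for carla on item A is 13/3 and for bruno on item C is 10/3. The deviation table only needs sums and counts, which is one reason Slope One is simple to update and to distribute.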



                                                                                 19
Why migrate ?
The old Crab ran on pure Python only
     Recommendations demand heavy math and lots of processing

Compatible with the Numpy and Scipy libraries
   High-quality, popular libraries optimized for scientific computing in Python

Scikits projects are amazing!
    Active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)

Make the Crab framework visible to the community
 Invite scientific researchers and machine learning developers around the globe who code in
                                 Python to help us with this project


                              Be Fast and Furious

                                                                                                  20
Why migrate ?



Numpy optimized with PyPy

     2x to 48x faster



  http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html




                                                                               21
How are we working ?
            Sprints, Online Discussions and Issues




https://github.com/muricoca/crab/wiki/UpcomingEvents

                                                       22
Future Releases
        Planned Release 0.1
   Collaborative Filtering Algorithms working, sample datasets to load and test


        Planned Release 0.11
       Evaluation of Recommendation Algorithms and Database Models support


        Planned Release 0.12
   Recommendation as Services with REST APIs




....



                                                                                  23
Join us!

1. Read our Wiki Page
    https://github.com/muricoca/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues
    https://github.com/muricoca/crab/issues

3. Forks and pull requests are mandatory
4. Join us at irc.freenode.net #muricoca or on our discussion list
    http://groups.google.com/group/scikit-crab




                                                                24
Crab
              A Python Framework for Building
                  Recommendation Engines

           https://github.com/muricoca/crab

Marcel Caraciolo Ricardo Caspirro                            Bruno Melo
   @marcelcaraciolo           @ricardocaspirro                 @brunomelo

                      {marcel, ricardo,bruno}@muricoca.com

                                                                            25
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
Unlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive GuideUnlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive Guide
vikasascentbpo
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Social Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTechSocial Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTech
Steve Jonas
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Vaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without HallucinationsVaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without Hallucinations
john409870
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Are Cloud PBX Providers in India Reliable for Small Businesses (1).pdf
Are Cloud PBX Providers in India Reliable for Small Businesses (1).pdfAre Cloud PBX Providers in India Reliable for Small Businesses (1).pdf
Are Cloud PBX Providers in India Reliable for Small Businesses (1).pdf
Telecoms Supermarket
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 

Introduction to Crab - Python Framework for Building Recommender Systems

  • 1. Crab: A Python Framework for Building Recommendation Engines. SciPy 2011, Austin, TX. Marcel Caraciolo (@marcelcaraciolo), Ricardo Caspirro (@ricardocaspirro), Bruno Melo (@brunomelo)
  • 2. What is Crab? A Python framework for building recommendation engines. A Scikit module for collaborative, content-based and hybrid filtering. A Mahout alternative for Python developers :D Open source under the BSD license. https://github.com/muricoca/crab
  • 3. When did it start? It began one year ago. Community-driven, 4 members. Since April 2011, the open-source lab Muriçoca has incorporated it. Since April 2011, it is being rewritten as a Scikit. https://github.com/muricoca/
  • 4. Knowing Scikits. Scikits are SciPy toolkits: independent projects hosted under a common namespace. Examples: Scikits Image, Scikits MlabWrap, Scikits AudioLab, Scikit Learn, ... http://scikits.appspot.com/scikits
  • 5. Knowing Scikits: Scikit-Learn. Machine learning algorithms plus scientific Python packages (NumPy, SciPy and Matplotlib). http://scikit-learn.sourceforge.net/ Our goal: make Crab a Scikit and contribute some parts of it to Scikit-learn.
  • 6. Why Recommendations? The world is an over-crowded place.
  • 7. Why Recommendations? We are overloaded: thousands of news articles and blog posts each day; millions of movies, books and music tracks online; several places, offers and events; in Hanoi, TV channels with thousands of programs each day; in New York, several thousand ad messages sent to us per day. Even with friends, we are sometimes overloaded!
  • 8. Why Recommendations? We really need and consume only a few of them! “A lot of times, people don’t know what they want until you show it to them.” (Steve Jobs) “We are leaving the Information age, and entering into the Recommendation age.” (Chris Anderson, from the book The Long Tail)
  • 9. Why Recommendations? Can Google help? Yes, but only when we really know what we are looking for. But what does “interesting” mean? Can Facebook help? Yes, I tend to find my friends’ stuff interesting. But what if I have only a few friends, and what they like does not always appeal to me? Can experts help? Yes, but it won’t scale well. And it is what they like, not what I like: everyone gets exactly the same advice!
  • 10. Why Recommendations? Recommendation Systems: systems designed to recommend to me something I may like.
  • 11. Why Recommendations? Recommendation Systems: Graph Representation (figure: a graph linking the users Me, My Friend, You and Another guy to items such as Titanic and Taken).
  • 12. The current Crab. Collaborative filtering algorithms: User-Based, Item-Based and Slope One. Evaluation of the recommender algorithms: Precision, Recall, F1-Score, RMSE. Precision-Recall charts.
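
To make the user-based approach named on this slide concrete, here is a minimal sketch in NumPy. It does not use Crab's own API: the toy `ratings` matrix, the choice of cosine similarity, and the function names `cosine_similarity` and `predict_user_based` are all assumptions made for illustration. A missing rating is predicted as the similarity-weighted average of the ratings other users gave to that item.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0],
])


def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and anyone who has not rated the item.
        if other == user or ratings[other, item] == 0:
            continue
        sim = cosine_similarity(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    # 0.0 signals "no prediction possible" in this sketch.
    return num / den if den > 0 else 0.0


prediction = predict_user_based(ratings, user=1, item=1)
```

In this toy matrix, user 1 is far more similar to user 0 than to user 2, so the prediction for item 1 lands closer to user 0's rating of 3 than to user 2's rating of 1. An item-based variant would apply the same idea to the columns instead of the rows.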
  • 13. The current Crab: Precision-Recall charts.
  • 15. The current Crab. Using REST APIs to deploy the recommender: django-piston, django-rest, django-tastypie.
  • 16. Crab is already in production. Used by a Brazilian social network called Atepassar.com, an educational network with more than 60,000 students and 3,000 video classes. Runs on Python + NumPy + SciPy, with Django as the backend for recommendations. MongoDB via mongoengine. Daily recommendations with explanations.
  • 17. Evaluating your recommender. Crab implements the most commonly used recommender metrics: Precision, Recall, F1-Score, RMSE. Uses matplotlib for a plotting utility. Implement new metrics. Simulation support, maybe (??)
  • 18. Evaluating your recommender. All you have to do is implement your Evaluator.
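
The four metrics listed on the evaluation slides can be written down in a few lines of plain Python. The sketch below is not Crab's Evaluator interface, which these slides do not show; the function names `precision_recall_f1` and `rmse` and their argument shapes are assumptions chosen for illustration. Precision, recall and F1 compare a list of recommended items against the items known to be relevant, while RMSE compares predicted ratings against actual ones.

```python
import math


def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one user's recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical example: 4 items recommended, 3 items actually relevant.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
error = rmse([3.0, 4.0], [1.0, 4.0])
```

Here two of the four recommended items are relevant, so precision is 0.5 and recall is 2/3. Averaging these per-user values over a held-out test set is one common way to score a recommender as a whole.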
  • 19. Distributing the recommendation computations. Use Hadoop and MapReduce intensively. Investigating the Yelp mrjob framework: https://github.com/pfig/mrjob Develop the techniques used at Netflix and other novel state-of-the-art methods: matrix factorization, singular value decomposition (SVD), Boltzmann machines. The most commonly used technique is Slope One.
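
Since Slope One is named here as the most commonly used technique, a small single-machine sketch of its weighted variant may help. It is not taken from Crab's code base: the dictionary layout `{user: {item: rating}}`, the function names `slope_one_deviations` and `slope_one_predict`, and the sample ratings are assumptions for illustration. The idea is to compute the average rating difference between every pair of items, then predict a missing rating by adding those differences to the ratings the user has already given, weighted by how many users support each difference.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """Return (dev, freq): dev[i][j] is the mean of r_i - r_j over the
    users who rated both items, and freq[i][j] is how many did so."""
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    dev[i][j] += r_i - r_j
                    freq[i][j] += 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq


def slope_one_predict(ratings, user, item):
    """Weighted Slope One prediction, or None if no item pair overlaps."""
    dev, freq = slope_one_deviations(ratings)
    num, den = 0.0, 0
    for j, r_j in ratings[user].items():
        n = freq[item].get(j, 0)  # users who rated both `item` and `j`
        if n:
            num += (dev[item][j] + r_j) * n
            den += n
    return num / den if den else None


# Hypothetical sample data.
sample = {
    "John": {"A": 5.0, "B": 3.0, "C": 2.0},
    "Mark": {"A": 3.0, "B": 4.0},
    "Lucy": {"B": 2.0, "C": 5.0},
}
lucy_a = slope_one_predict(sample, "Lucy", "A")
```

For this data the prediction of Lucy's rating for item A is 13/3, about 4.33. The pairwise deviation table is the expensive part, and because it is a sum over users it is the natural piece to distribute with MapReduce.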
  • 20. Why migrate? The old Crab ran on pure Python only, while recommendations demand heavy math and lots of processing. Compatibility with the NumPy and SciPy libraries: high-standard, popular libraries optimized for scientific calculations in Python. Scikits projects are amazing: active communities, scientific conferences and up-to-date projects (e.g. scikit-learn). Make the Crab framework visible to the community: join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project. Be fast and furious.
  • 21. Why migrate? NumPy optimized with PyPy: 2x to 48x faster. http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html
  • 22. How are we working? Sprints, online discussions and issues. https://github.com/muricoca/crab/wiki/UpcomingEvents
  • 23. Future Releases. Planned Release 0.1: collaborative filtering algorithms working, sample datasets to load and test. Planned Release 0.11: evaluation of recommendation algorithms and support for database models. Planned Release 0.12: recommendations as services with REST APIs. ...
  • 24. Join us! 1. Read our wiki page: https://github.com/muricoca/crab/wiki/Developer-Resources 2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues 3. Forks and pull requests are mandatory. 4. Join us at irc.freenode.net #muricoca or on our discussion list: http://groups.google.com/group/scikit-crab
  • 25. Crab: A Python Framework for Building Recommendation Engines. https://github.com/muricoca/crab Marcel Caraciolo (@marcelcaraciolo), Ricardo Caspirro (@ricardocaspirro), Bruno Melo (@brunomelo). {marcel, ricardo, bruno}@muricoca.com