Building Recommender Systems with Python

Recommender Systems
using Python
Marcel Pinheiro Caraciolo
marcel@pingmind.com
@marcelcaraciolo

http://www.pycursos.com

Who is Marcel?
Marcel Pinheiro Caraciolo - @marcelcaraciolo

Born in Sergipe, but a Recife local at heart.
M.Sc. in Computer Science from CIN/UFPE, in the area of data mining
Director of Research and Development at Atépassar
CEO and co-founder of PyCursos/Pingmind
Member and moderator of the Pernambuco Python Users Group (PUG-PE)
Areas of interest: mobile computing and intelligent computing

My blogs: http://www.mobideia.com (on mobility, since 2006)
http://aimotion.blogspot.com (on A.I., since 2009)

(Figure: the evolution of the Web. Web 1.0: a source of information; Web 2.0: a continuous flow of information; Web 3.0: the semantic web. Layers shown: web sites, web applications, web services, users.)
6th PUG-PE Meeting (VI Encontro do PUG-PE)

Use collective information
effectively in order to
improve an application

Intelligence from
Mining Data

(Figure: users connected to each other)
A user influences others
through reviews, ratings, recommendations and blogs

A user is influenced by others
through reviews, ratings, recommendations and blogs

(Figure: Collective Intelligence in your application. User-generated content such as aggregated information (lists), ratings, reviews, blogs, recommendations, wikis, voting, bookmarking, tagging, tag clouds and saving feeds the application, together with search, natural language processing, clustering and predictive models, and harnessed external content.)

before...

we are overloaded
with information

sometimes
we are looking
for this...

meeee...

google?

social media?

“A lot of times, people don’t know what
they want until you show it to them.”
Steve Jobs

“We are leaving the Information age, and
entering into the Recommendation age.”
Chris Anderson, from the book The Long Tail

Social Recommendations

Family/Friends
Friends/Family
What should I
read?

Ref: Flickr-BlueAlgae

"I think you
should read
these books."
Ref: Flickr photostream: jefield

Recommendations through Interaction

Input: rate some books

What should I
read?

Output:
"Books you
might like
are …"

Systems designed to suggest things
that match my interests!

Netflix
- 2/3 of the movies rented come from recommendations

Google News
- 38% of the most-clicked news stories come from recommendations

Amazon
- 38% of sales come from recommendations

Source: Celma & Lamere, ISMIR 2007

* We are overloaded with information
* Thousands of new articles and posts every day
* Millions of songs, movies and books
* Thousands of offers and promotions

What can be recommended?

Contacts in social networks
Articles
Products
Advertising messages
E-learning courses
Books
Tags
Music
Future girlfriends
Clothes
Movies
Restaurants
TV shows
Videos
Papers
Investment options
Professionals
Code modules

And how does
recommendation work?

What do recommender systems
actually do?
1. Predict how much you may like a certain product or service
2. Suggest a list of N items ranked according to your interests
3. Suggest a ranked list of N users for a product/service
4. Explain to you why these items were recommended
5. Adjust the prediction and the recommendation based on your feedback and that of others.

Content-Based Filtering

(Figure: Marcel likes an item, and items similar to it are recommended to him. Items shown: Duro de Matar, Armagedon, O Vento Levou, Toy Store.)
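The idea in the figure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each item is described by a set of genre tags (the tags below are hypothetical, invented for illustration), the similarity between two items is the Jaccard index of their tag sets, and an unseen item is scored by its best match to any item the user already likes. Real systems usually rely on richer features such as text descriptions or metadata.

```python
# Hypothetical item descriptions: genre tags invented for illustration only.
ITEM_TAGS = {
    "Duro de Matar": {"action", "thriller"},
    "Armagedon": {"action", "sci-fi", "disaster"},
    "O Vento Levou": {"drama", "romance"},
    "Toy Store": {"animation", "comedy"},
}


def jaccard(tags_a, tags_b):
    """Fraction of tags shared by two items (0.0 = nothing in common, 1.0 = identical)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def content_based_recommend(liked_items, item_tags, n=3):
    """Rank items the user has not liked yet by their best match to a liked item."""
    scores = {}
    for candidate, tags in item_tags.items():
        if candidate in liked_items:
            continue
        # score = similarity to the closest liked item (0.0 if nothing is liked yet)
        best = max((jaccard(tags, item_tags[liked]) for liked in liked_items),
                   default=0.0)
        if best > 0:
            scores[candidate] = best
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


print(content_based_recommend({"Duro de Matar"}, ITEM_TAGS))
# prints [('Armagedon', 0.25)]
```

Only Armagedon shares a tag ("action") with Duro de Matar, so it is the only item recommended; the user never sees anything outside the genres they already like.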

Problems with content-based
filtering
1. Limited content analysis
- Items and users have few descriptive details. It is even worse for audio or images

2. Over-specialization
- A person with no experience with sushi is never recommended
the best sushi restaurant in town
3. Portfolio effect
- Just because I watched one Xuxa movie as a child, it has to
recommend all of her movies to me

Collaborative Filtering

(Figure: users Marcel, Rafael and Amanda; items Thor, Armagedon, O Vento Levou, Toy Store. Items liked by users similar to Marcel are recommended to him.)

Problems with collaborative filtering
1. Scalability
- Amazon with 5M users, 50K items, 1.4B ratings
2. Sparse data
- New users and items that have no history
3. Cold start
- I have rated only a single book on Amazon!
4. Popularity
- Everybody reads 'Harry Potter'
5. Hacking
- The person who reads 'Harry Potter' reads the Kama Sutra

Hybrid Filtering
A combination of multiple methods

(Figure: users Marcel, Rafael and Luciana; items Duro de Matar, Armagedon, O Vento Levou, Toy Store; the hybrid approach also draws on ontologies and symbolic data.)
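One simple way to combine methods is a weighted hybrid: blend the scores produced by a collaborative recommender and by a content-based one. The sketch below is only one of several hybrid designs (switching and cascading are others), and it assumes both recommenders already return scores on the same 0 to 1 scale; the blend weight `alpha=0.5` and the example scores are hypothetical.

```python
def hybrid_recommend(cf_scores, cb_scores, alpha=0.5, n=3):
    """Weighted hybrid: alpha * collaborative score + (1 - alpha) * content score.

    Both inputs map item -> score in [0, 1] (an assumption); an item missing
    from one recommender simply contributes 0 from that side."""
    items = set(cf_scores) | set(cb_scores)
    blended = {
        item: alpha * cf_scores.get(item, 0.0) + (1 - alpha) * cb_scores.get(item, 0.0)
        for item in items
    }
    # highest blended score first; ties broken alphabetically for stable output
    ranked = sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


# Hypothetical scores produced by the two recommenders for one user
collaborative = {"Thor": 0.8, "Armagedon": 0.4}
content_based = {"Armagedon": 0.9, "Toy Store": 0.2}
print(hybrid_recommend(collaborative, content_based))
```

Armagedon ends up first because both methods vote for it, even though neither ranks it highest on its own; tuning `alpha` shifts trust between the two sources.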

How are they
presented?
Highlights
More about this artist...
Someone similar to you also liked this
The most popular in your group...
Since you listened to this one, you may want this one...
New releases
Listen to songs by similar artists.
These two items go together...

How are they evaluated?
How do we know whether a recommendation is good?
The data is usually split into training and test sets (80/20)

Criteria used:
- Prediction error: RMSE

- ROC curve*, rank utility, F-measure
*http://code.google.com/p/pyplotmining/
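A minimal sketch of this protocol in plain Python, assuming ratings are stored as `(user, item, rating)` triples: a seeded 80/20 split, RMSE for rating predictions, and precision, recall and F-measure for a top-N list. The helper names are hypothetical, and ROC curves and rank utility are left out.

```python
import random
from math import sqrt


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and hold out a fraction for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def precision_recall_f1(recommended, relevant):
    """Top-N quality of a recommendation list against the items the user liked."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


ratings = [("u%d" % k, "item%d" % k, 1 + k % 5) for k in range(10)]
train, test = train_test_split(ratings)
print(len(train), len(test))                 # 8 2
print(rmse([4.0, 3.0], [5.0, 3.0]))          # sqrt(0.5), about 0.707
print(precision_recall_f1(["a", "b", "c", "d"], {"a", "c", "e"}))
# precision 0.5, recall 2/3, F-measure 4/7
```

In practice the recommender is trained on `train` only, and its predictions for the held-out `test` triples are compared against the true ratings.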

How to build a recommender
system with Python?

There is one option... But it's still in development!

Crab
A Python Framework for Building
Recommendation Engines

https://github.com/python-recsys/crab

But here we will build one
from scratch with Python!
Find someone similar to you

(Figure: users Marcel, Rafael and Amanda; items Thor, Armagedon, O Vento Levou, Toy Store. We find the user most similar to Marcel and recommend items that user likes.)

Step Zero with Python!
Movie Ratings Dataset

Mr. X rated
Snow Crash 4 and
The Girl with the Dragon Tattoo 2.
What should we recommend to him?


We find that Amy is the most similar among the options,

so we can recommend a movie she rated 5 stars :)


Another similarity metric: Euclidean distance
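Both distances are special cases of the Minkowski distance, computed over the n bands that both users rated, where x_k and y_k are the two users' ratings for band k. As a worked check using Hailey's and Veronica's ratings from the dataset below, the only bands both of them rated are Norah Jones (4.0 vs 5.0) and The Strokes (4.0 vs 3.0):

```latex
d_r(x, y) = \left( \sum_{k=1}^{n} \lvert x_k - y_k \rvert^{r} \right)^{1/r},
\qquad r = 1:\ \text{Manhattan}, \qquad r = 2:\ \text{Euclidean}

d_1(\text{Hailey}, \text{Veronica}) = \lvert 4 - 5 \rvert + \lvert 4 - 3 \rvert = 2,
\qquad
d_2(\text{Hailey}, \text{Veronica}) = \sqrt{(4 - 5)^2 + (4 - 3)^2} = \sqrt{2} \approx 1.414
```

The larger r is, the more a single big disagreement dominates the distance.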

Show me the code!
>>> # Representing the data in Python
>>> users = {"Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0,
        "Norah Jones": 4.5, "Phoenix": 5.0,
        "Slightly Stoopid": 1.5,
        "The Strokes": 2.5, "Vampire Weekend": 2.0},
    "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5,
        "Deadmau5": 4.0,
        "Phoenix": 2.0, "Slightly Stoopid": 3.5,
        "Vampire Weekend": 3.0},
    "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0,
        "Deadmau5": 1.0, "Norah Jones": 3.0,
        "Phoenix": 5, "Slightly Stoopid": 1.0},
    "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0,
        "Deadmau5": 4.5, "Phoenix": 3.0,
        "Slightly Stoopid": 4.5, "The Strokes": 4.0,
        "Vampire Weekend": 2.0},
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0,
        "Norah Jones": 4.0, "The Strokes": 4.0,
        "Vampire Weekend": 1.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0,
        "Phoenix": 5.0, "Slightly Stoopid": 4.5,
        "The Strokes": 4.0, "Vampire Weekend": 4.0},
    "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0,
        "Norah Jones": 3.0, "Phoenix": 5.0,
        "Slightly Stoopid": 4.0, "The Strokes": 5.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0,
        "Phoenix": 4.0, "Slightly Stoopid": 2.5,
        "The Strokes": 3.0}}

Coding the Manhattan distance

def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are
    dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1  # Indicates no ratings in common

>>> manhattan(users['Hailey'], users['Veronica'])
2.0
>>> manhattan(users['Hailey'], users['Jordyn'])
7.5
>>>

Coding the Euclidean distance
def euclidean(rating1, rating2):
    """Computes the Euclidean distance.
    Both rating1 and rating2 are dictionaries of the form
    {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0.0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += pow(abs(rating1[key] - rating2[key]), 2.0)
            commonRatings = True
    if commonRatings:
        return pow(distance, 1 / 2.0)
    else:
        return -1  # Indicates no ratings in common

>>> euclidean(users['Hailey'], users['Veronica'])

1.4142135623730951

Find the closest users
def computeNearestNeighbor(username, users):
    """creates a sorted list of users based on their distance to
    username"""
    distances = []
    for user in users:
        if user != username:
            distance = manhattan(users[user], users[username])
            # skip users with no ratings in common (manhattan returns -1)
            if distance >= 0:
                distances.append((distance, user))
    # sort based on distance -- closest first
    distances.sort()
    return distances


>>> computeNearestNeighbor('Hailey', users)
[(2.0, 'Veronica'), (4.0, 'Chan'), (4.0, 'Sam'), (4.5, 'Dan'), (5.0,
'Angelica'), (5.5, 'Bill'), (7.5, 'Jordyn')]
>>>


The recommender
def recommend(username, users):
    """Give list of recommendations"""
    # first find nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]
    recommendations = []
    # now find bands neighbor rated that user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if artist not in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    recommendations.sort(key=lambda artistTuple: artistTuple[1],
                         reverse=True)
    return recommendations

The recommender

>>> recommend('Hailey', users)
[('Phoenix', 4.0), ('Blues Traveler', 3.0), ('Slightly Stoopid', 2.5)]

>>> recommend('Chan', users)
[('The Strokes', 4.0), ('Vampire Weekend', 1.0)]

>>> recommend('Angelica', users)
[]


The recommender

>>> computeNearestNeighbor('Angelica', users)
[(3.5, 'Veronica'), (4.5, 'Chan'), (5.0, 'Hailey'), (8.0, 'Sam'), (9.0,
'Bill'), (9.0, 'Dan'), (9.5, 'Jordyn')]


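But we need to improve it more... Angelica gets no recommendations at all, because her nearest neighbor, Veronica, has not rated any band that Angelica has not already rated.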

The Pearson Correlation

Output: -1 (perfect disagreement) to 1 (perfect agreement)
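For reference, the function that follows computes r in a single pass using the computational form below (left), where the sums run over the n bands both users rated and x_i, y_i are the two users' ratings for band i. It is algebraically equal to the textbook definition (right), since the sum of (x_i - mean_x)(y_i - mean_y) expands to the sum of x_i y_i minus (sum of x_i)(sum of y_i)/n:

```latex
r = \frac{\sum_i x_i y_i - \frac{\sum_i x_i \sum_i y_i}{n}}
         {\sqrt{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}\,
          \sqrt{\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}}}
  = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
```

Because each user's mean is subtracted, a user who rates every band exactly one point higher than another user still gets r = 1, so Pearson is not fooled by users who simply grade more generously.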

Pearson Correlation
from math import sqrt

def pearson(rating1, rating2):
    sum_xy = 0
    sum_x = 0
    sum_y = 0
    sum_x2 = 0
    sum_y2 = 0
    n = 0
    for key in rating1:
        if key in rating2:
            n += 1
            x = rating1[key]
            y = rating2[key]
            sum_xy += x * y
            sum_x += x
            sum_y += y
            sum_x2 += x**2
            sum_y2 += y**2
    # no bands in common: the correlation is undefined, treat it as 0
    if n == 0:
        return 0
    # now compute denominator
    denominator = (sqrt(sum_x2 - (sum_x**2) / n) *
                   sqrt(sum_y2 - (sum_y**2) / n))
    if denominator == 0:
        return 0
    else:
        return (sum_xy - (sum_x * sum_y) / n) / denominator

Pearson Correlation

>>> pearson(users['Angelica'], users['Bill'])
-0.90405349906826993
>>> pearson(users['Angelica'], users['Hailey'])
0.42008402520840293
>>> pearson(users['Angelica'], users['Jordyn'])
0.76397486054754316
>>>


K-nearest Neighbors (kNN)
Find the k users most similar to you
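A minimal sketch of kNN on top of the Pearson correlation. The weighting scheme is an assumption of this sketch, one common variant rather than the only one: take the k users with the highest correlation, drop neighbors with non-positive correlation, give each remaining neighbor a share of the vote proportional to its correlation, and score every band the target user has not rated as the weighted sum of the neighbors' ratings. The block repeats a compact `pearson()` so it runs on its own, and the dataset (Ana, Bia, Caio, Davi) is hypothetical.

```python
from math import sqrt


def pearson(rating1, rating2):
    """Pearson correlation over the bands both users rated (0.0 when undefined)."""
    common = [band for band in rating1 if band in rating2]
    n = len(common)
    if n == 0:
        return 0.0
    sum_x = sum(rating1[b] for b in common)
    sum_y = sum(rating2[b] for b in common)
    sum_xy = sum(rating1[b] * rating2[b] for b in common)
    sum_x2 = sum(rating1[b] ** 2 for b in common)
    sum_y2 = sum(rating2[b] ** 2 for b in common)
    # max(0.0, ...) guards against tiny negative values caused by rounding
    denominator = (sqrt(max(0.0, sum_x2 - sum_x ** 2 / n)) *
                   sqrt(max(0.0, sum_y2 - sum_y ** 2 / n)))
    if denominator == 0:
        return 0.0
    return (sum_xy - sum_x * sum_y / n) / denominator


def knn_recommend(username, users, k=3, n=5):
    """Score unseen bands using the k users most correlated with username."""
    target = users[username]
    neighbors = sorted(
        ((pearson(target, users[other]), other) for other in users if other != username),
        reverse=True,
    )[:k]
    # keep only neighbors whose tastes agree with the target user
    neighbors = [(sim, other) for sim, other in neighbors if sim > 0]
    total = sum(sim for sim, _ in neighbors)
    scores = {}
    for sim, other in neighbors:
        weight = sim / total  # this neighbor's share of the vote
        for band, rating in users[other].items():
            if band not in target:
                scores[band] = scores.get(band, 0.0) + weight * rating
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical ratings, made up for this example
users = {
    "Ana": {"Phoenix": 5, "Norah Jones": 3, "The Strokes": 4},
    "Bia": {"Phoenix": 5, "Norah Jones": 3, "The Strokes": 4, "Deadmau5": 5},
    "Caio": {"Phoenix": 1, "Norah Jones": 3, "The Strokes": 2, "Deadmau5": 1},
    "Davi": {"Phoenix": 4, "Norah Jones": 2, "The Strokes": 3, "Broken Bells": 4},
}
print(knn_recommend("Ana", users, k=2))
# prints [('Deadmau5', 2.5), ('Broken Bells', 2.0)]
```

Bia and Davi both correlate perfectly with Ana (Davi rates everything one point lower), so each gets half of the vote; Caio is negatively correlated and would be ignored even with k=3. Note that a band rated by only one neighbor gets a score below that neighbor's rating, which is a known trait of this simple weighting.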


A challenge for you!

Swap people and items, turning

{'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}

into
{'Lady in the Water': {'Lisa Rose': 2.5, 'Gene Seymour': 3.0},
'Snakes on a Plane': {'Lisa Rose': 3.5, 'Gene Seymour': 3.5}}, and so on.


def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            # Flip item and person
            result[item][person] = prefs[person][item]
    return result



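>>> movies = recommendations.transformPrefs(recommendations.critics)
>>> recommendations.computeNearestNeighbor('Superman Returns', movies)
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'), (0.111, 'Snakes on a
Plane'), (-0.179, 'The Night Listener'), (-0.422, 'Just My Luck')]
(Illustrative output on the movie critics dataset from Programming Collective Intelligence, scored with a Pearson-correlation similarity instead of the Manhattan distance, so higher values mean more similar items.)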

User-based filtering so far!

Scalability and sparsity problems

Item-Based Filtering
Find the k items most similar to a given item

Find the closest items
def calculateSimilarItems(prefs, sim_distance=manhattan):
    # Create a dictionary of items showing which other items they
    # are most similar to.
    result = {}

    # Invert the preference matrix to be item-centric
    itemPrefs = transformPrefs(prefs)
    c = 0
    for item in itemPrefs:
        # Status updates for large datasets
        c += 1
        if c % 100 == 0:
            print("%d / %d" % (c, len(itemPrefs)))

        # Find the most similar items to this one, turning each distance d
        # into a similarity 1 / (1 + d) so that closer items get larger
        # weights in recommend()
        scores = []
        for other in itemPrefs:
            if other != item:
                d = sim_distance(itemPrefs[item], itemPrefs[other])
                if d >= 0:  # -1 means no user rated both items
                    scores.append((1.0 / (1.0 + d), other))
        scores.sort(reverse=True)
        result[item] = scores

    return result


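(Illustrative output on the movie critics dataset from Programming Collective Intelligence:)
>>> itemsim=recommendations.calculateSimilarItems(recommendations.critics)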
>>> itemsim
{'Lady in the Water': [(0.40000000000000002, 'You, Me and Dupree'), (0.2857142857142857, 'The
Night Listener'),... 'Snakes on a Plane': [(0.22222222222222221, 'Lady in the Water'),
(0.18181818181818182, 'The Night Listener'),... etc.

The recommender
def recommend(username, users, similarities, n=3):
    scores = {}
    totalSim = {}
    # now get the ratings for the user
    userRatings = users[username]
    # Loop over items rated by this user
    for item, rating in userRatings.items():
        # Loop over items similar to this one
        for sim, other_item in similarities[item]:
            # Ignore if this user has already rated this item
            if other_item in userRatings:
                continue
            # Weighted sum of rating times similarity
            scores.setdefault(other_item, 0.0)
            scores[other_item] += sim * rating
            # Sum of all the similarities
            totalSim.setdefault(other_item, 0.0)
            totalSim[other_item] += sim
    # Divide each total score by total weighting to get an average
    recommendations = [(score / totalSim[item], item)
                       for item, score in scores.items()]
    # finally sort by predicted score, highest first
    recommendations.sort(key=lambda scoreTuple: scoreTuple[0], reverse=True)
    # Return the first n items
    return recommendations[:n]

The recommender

>>> recommend('Hailey', users, similarities, 3)
[(3.1176470588235294, 'Slightly Stoopid'),
(2.64476386036961, 'Blues Traveler'), (2.639207507820647, 'Phoenix')]


Content Based Filtering

(Figure: Marcel likes an item, and items similar to it are recommended to him. Items shown: Duro de Matar, Armagedon, O Vento Levou, Toy Store.)

Crab is already in production

(Figure: a page from a paper proposing a hybrid meta recommender for a mobile application. Content-based filtering selects products/services from a catalogue, collaborative filtering selects reviews from similar users (from sources such as the location-based social network Foursquare and Google HotPot), text mining classifies each review as positive or negative, and every recommendation comes with an explanation built from those reviews. Fig. 1. Meta Recommender Architecture; Fig. 2. User Reviews from Foursquare Social Network; Fig. 3. Mobile Recommender System Architecture.)


A Brazilian social network called Atepassar.com
An educational network with more than 60,000 students and 120 video classes

Running on Python
+ NumPy + SciPy and
Django

Backend for Recommendations
MongoDB - mongoengine

Daily Recommendations
with Explanations

Distributing the recommendation computations

Use Hadoop and MapReduce intensively
Investigating Yelp's mrjob framework https://github.com/pfig/mrjob

Develop the novel, state-of-the-art techniques popularized by the Netflix Prize:
matrix factorization, singular value decomposition (SVD), Boltzmann machines

A commonly used one is the Slope One technique:
simple algebra with predictors of the form y = x + b (a line y = a*x + b whose slope a is fixed to 1)
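A minimal sketch of the weighted Slope One scheme (as described by Lemire and Maclachlan), written for the same dictionary-of-dictionaries format used earlier. For every ordered pair of bands it stores the average rating difference (the b in y = x + b) and how many users rated both; a prediction for an unrated band adds each difference to the user's own rating of the other band and averages the results, weighted by those counts. The function names and the ratings are hypothetical.

```python
from collections import defaultdict


def compute_deviations(users):
    """For every ordered pair of bands (j, i): average of rating_j - rating_i over
    the users who rated both (dev), and how many such users there are (freq)."""
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for ratings in users.values():
        for j, rating_j in ratings.items():
            for i, rating_i in ratings.items():
                if i != j:
                    freq[j][i] += 1
                    dev[j][i] += rating_j - rating_i
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= freq[j][i]
    return dev, freq


def slope_one_predict(user_ratings, band, dev, freq):
    """Weighted Slope One prediction for a band the user has not rated (None if impossible)."""
    numerator, denominator = 0.0, 0
    for i, rating_i in user_ratings.items():
        count = freq.get(band, {}).get(i, 0)
        if i == band or count == 0:
            continue
        # each co-rated band votes rating_i + average difference, weighted by count
        numerator += (dev[band][i] + rating_i) * count
        denominator += count
    return numerator / denominator if denominator else None


# Hypothetical ratings, made up for this example
users = {
    "Amy": {"Phoenix": 4, "Norah Jones": 3, "The Strokes": 4},
    "Ben": {"Phoenix": 5, "Norah Jones": 2},
    "Clara": {"Norah Jones": 3.5, "The Strokes": 4},
    "Daisy": {"Phoenix": 5, "The Strokes": 3},
}
dev, freq = compute_deviations(users)
print(slope_one_predict(users["Ben"], "The Strokes", dev, freq))  # 3.375
```

The Strokes is rated on average 1.0 below Phoenix and 0.75 above Norah Jones, so Ben's 5 and 2 become votes of 4.0 and 2.75; each pair was co-rated by two users, giving (4.0 * 2 + 2.75 * 2) / 4 = 3.375. The deviation table can be computed once, offline, which is what makes the scheme attractive at scale.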

Distributed Computing with mrjob
https://github.com/Yelp/mrjob

http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html


It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or
local mode (for testing)



"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()
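To see what the mapper and reducer above do without a Hadoop cluster, the same pattern can be simulated in plain Python. The sketch below is not mrjob's API: `run_local` is a hypothetical stand-in that applies a mapper, groups the emitted values by key (the shuffle step) and applies a reducer. Here it counts how many users rated each pair of bands together, a typical first step for item-to-item recommendations ("these two items go together"); the ratings are a small hypothetical sample.

```python
from collections import defaultdict
from itertools import combinations


def pair_mapper(user, ratings):
    """Emit ((band_a, band_b), 1) for every pair of bands one user rated."""
    for a, b in combinations(sorted(ratings), 2):
        yield (a, b), 1


def count_reducer(key, values):
    """Sum the counts emitted for one pair of bands."""
    yield key, sum(values)


def run_local(records, mapper, reducer):
    """Hypothetical local stand-in for a MapReduce run: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record_key, record_value in records:
        for key, value in mapper(record_key, record_value):
            groups[key].append(value)  # shuffle: group values by key
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Hypothetical sample of ratings
ratings = {
    "Angelica": {"Blues Traveler": 3.5, "Norah Jones": 4.5, "Phoenix": 5.0},
    "Chan": {"Blues Traveler": 5.0, "Norah Jones": 3.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0},
}
print(run_local(ratings.items(), pair_mapper, count_reducer))
```

In a real mrjob job, `pair_mapper` and `count_reducer` would become the `mapper` and `reducer` methods of an `MRJob` subclass, and the framework would take care of the shuffle across machines.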



Future studies with sparse matrices
Real datasets come with lots of empty values
http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html

Solutions:

scipy.sparse package

Sharding operations

Matrix Factorization
techniques (SVD)

Apontador Reviews Dataset


Crab implements a Matrix
Factorization with Expectation
Maximization algorithm
scikits.crab.svd package
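Crab's own implementation is described above as matrix factorization with Expectation Maximization. The sketch below swaps in a different, widely used technique and is not the scikits.crab.svd API: stochastic gradient descent on a low-rank factorization with a global-mean baseline (often called "Funk SVD", popularized during the Netflix Prize). Only the observed ratings are visited, so the empty cells of the sparse matrix cost nothing; each user and each item get `n_factors` latent numbers whose dot product, plus the global mean, approximates the rating. The ratings, learning rate `lr`, regularization `reg` and number of epochs are illustrative assumptions.

```python
import random
from math import sqrt


def predict(mu, p_u, q_i):
    """Global mean plus the dot product of a user's and an item's factors."""
    return mu + sum(a * b for a, b in zip(p_u, q_i))


def factorize(ratings, n_factors=2, epochs=3000, lr=0.02, reg=0.02, seed=0):
    """Low-rank factorization of a sparse rating matrix by stochastic gradient descent.

    ratings: list of (user, item, rating) triples, i.e. only the observed cells.
    Returns (mu, P, Q): global mean, user factors and item factors."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # sorted() keeps the random initialization reproducible across runs
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for u in sorted({u for u, _, _ in ratings})}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for i in sorted({i for _, i, _ in ratings})}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(mu, P[u], Q[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # gradient step with L2 regularization on both factors
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def train_rmse(ratings, mu, P, Q):
    """RMSE of the model on the ratings it was trained on."""
    return sqrt(sum((r - predict(mu, P[u], Q[i])) ** 2 for u, i, r in ratings)
                / len(ratings))


# Hypothetical ratings: 12 of the 16 user/item cells are known
ratings = [
    ("Marcel", "Thor", 5), ("Marcel", "Armagedon", 4), ("Marcel", "Toy Store", 1),
    ("Rafael", "Thor", 4), ("Rafael", "O Vento Levou", 1), ("Rafael", "Toy Store", 2),
    ("Amanda", "O Vento Levou", 5), ("Amanda", "Toy Store", 4), ("Amanda", "Armagedon", 1),
    ("Luciana", "Thor", 1), ("Luciana", "O Vento Levou", 4), ("Luciana", "Armagedon", 2),
]
mu, P, Q = factorize(ratings)
# fill in a missing cell: Marcel never rated "O Vento Levou"
print(round(predict(mu, P["Marcel"], Q["O Vento Levou"]), 2))
```

Training error alone says little about quality; the 80/20 split and RMSE on held-out ratings discussed earlier are the fairer test.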

How are we working?
Our Project's Home Page

http://github.com/python-recsys/crab

Future Releases
Planned Release 0.1
Collaborative filtering algorithms working, sample datasets to load and test

Planned Release 0.11
Sparse matrices and database models support

Planned Release 0.12
Slope One algorithm, new factorization techniques implemented

....

Join us!

1. Read our Wiki Page
https://github.com/python-recsys/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues
https://github.com/python-recsys/crab/issues

3. Forks and pull requests are mandatory!
4. Join us at irc.freenode.net #muricoca or on our
discussion list
http://groups.google.com/group/scikit-crab

Building
the
Social
Genome

collect discounts

http://aimotion.blogspot.com.br/2013/01/how-recommend-deals-on-line-for-coupon.html

www.favoritoz.com

Recommended Books

Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

ACM RecSys, KDD, SBSC...

Recommended Conferences
- ACM RecSys
- ICWSM: Weblogs and Social Media
- WebKDD: Web Knowledge Discovery and Data Mining
- WWW: The original WWW conference
- SIGIR: Information Retrieval
- ACM KDD: Knowledge Discovery and Data Mining
- ICML: Machine Learning
