The Search for Better Search at Reddit - Nick Caldwell, Chris Slowe, and Luis Bitencourt-Emilio, Reddit

The Search for Better Search
Nick Caldwell, Chris Slowe & Luis Bitencourt-Emilio

What is Reddit?
Nick Caldwell, VP of Engineering

What is Reddit?
Reddit is the front page of the internet
A social network where there are tens of thousands of communities
around whatever passions or interests you might have
It’s where people converse about the things that
are most important to them

Reddit by the numbers
Alexa Rank (US/World): 4th/7th
MAU: 320M
Communities: 1.1M
Posts per day: 1M
Comments per day: 5M
Votes per day: 75M
Searches per day: 70M

SCALE
ENDLESS CONTENT

So, what are we doing with all that power?

Cat Walking a Human
Cat Fist Bumping

Wait, it’s not just cat pictures!

Community > Content > Individual
● Authenticity
● Creative freedom
● Empathy @ scale

r/confession
Secrets that if revealed
would change your life
forever?

r/assistance
Empathy and support at
scale

None of that matters if you
can’t FIND the content!

Storytime: History of Search @ Reddit
Chris Slowe, CTO

In the beginning (2005), there was Postgres...
And it was mostly good.
● Tsearch2 is pretty sweet
○ “Oh wow it does stemming!”
● We also really liked TRIGGERs back then (“No, it’s cool. The database does
all the work and it’s guaranteed to be accurate”)
● Eventually we grew to the point where a small minority of traffic (~2%,
the search queries) was bogging down the majority of Postgres queries

2007: Oh! There’s a tool for that. Enter Lucene.
And it was mostly better.
● This was actually implemented (by me!) just over 10 years ago in July 2007.
● The runner-up was a Google Search Appliance. Remember those?
○ Would have made a nice addition to our one rack
● Implemented in Python as a hand-rolled RPC server over TCP
○ Lucene Index files all on a single machine
○ “We’ll scale it later.”
● Actually had posts and comments

2008: What’s this “Solr” the kids are on about?
● Continuing the long tradition of “make the new guy fix it” we passed the “fix
search pls” task onto our third hire, David King (u/ketralnis)
● Implemented in early 2008 and pre-Solr 1.3
● Set up an actual search cluster.
● Wrapper/driver code in Python for both indexing and searching.
● Scalable and tunable relevance!

2010: Up and to the right!
● The site continued to grow and we first cracked a billion pageviews/month
● With an engineering team of four, we put all of our effort into:
○ 503 mitigation
○ continuing to add Postgres read slaves
○ adding more cache
○ adding a very early version of Cassandra (0.6.0)
● Oh. Right. Search. How’s that going?

2010: If you love something, you outsource it
Said no one ever.
● For starters, we were out of new guys to fix it.
● Also, no chance to really focus on it full time
○ Only about 2% of traffic
○ Because if you don’t build it, they won’t come…
● Contracted with a company called IndexTank, who provided a nifty drop-in
replacement so we could stop worrying about it.
● “We launched a new search engine yesterday. Calm down. It’s okay. I know.
You’ve been hurt before.” - David King ‘10

2012: start worrying!
● IndexTank got bought by LinkedIn (yay!)
○ ...and shut down their API (boo!)
○ ...with a 6 month grace period (d’oh!)
● AWS CloudSearch to the rescue!

2017: Fix it Fix it Fix it!
● You’ve heard of “five nines”? We had “nine fives”
● Giant CloudSearch Cluster, terrible performance
● Worked closely with AWS on it
● Did the equivalent of “turn it off and on again”
○ Full index rebuild
○ Drop seldom used indices
○ Consider blood sacrifice
○ Success!

What I’m saying is, Search is hard...
“we fixed a bug in the search results ordering” - Steve Huffman ‘06
“We updated the search system this morning to help alleviate some load problems” - Steve ‘06
“Jeremy is working on search! It’s not a complicated fix (basically, the sorting is whacky).” - Steve ‘06
“Search works much better, tagging and user-controlled subreddits are right around the corner” - Steve ‘07
“Search is better, but not quite where we’d like it.” - Steve ‘07
“Stats and search are temporarily disabled, but will be coming back as soon as we can get them repaired.” - Steve ‘07
“we were hoping to include an upgraded search, which, unlike the last version, was actually useful and helped you find what you were looking for.
Unfortunately, the version we settled on didn’t quite load test as nicely” - Steve ‘07
“I made a quick fix to search that I hope helps until we get a chance to really fix it.” - Steve ‘07
“[David]’s been fixing search and hacking mystery projects in Erlang.” - Alexis Ohanian ‘08
“I’ve totally replaced the reddit search function.” - David King ‘08
“We launched a new search engine yesterday. Calm down. It’s okay. I know. You’ve been hurt before.” - David King ‘10
“The old [cloud]search domain had significant performance issues: roughly 33% of queries took over 5 seconds to complete and would result in
the search error page.” - Brian Simpson ‘17

...and we have fully internalized it.

Search Today @ Reddit
Luis Bitencourt-Emilio, Sr. Director - Search & Discovery

Deep Dive: Ingestion… our first attempt

...let’s try that again; keep it simple, stupid.

The pudding: Bye bye error pages!

The pudding: If it’s not relevant it doesn’t matter

Recap: The numbers
● Over a quarter billion posts indexed
● 1000 index updates per second
● 400 QPS from first party apps
● 75M searches per day (including public API)
● Up next: ingesting many billions of comments, messages, and user profiles.
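
As a rough sanity check on the figures above, the daily search total can be converted into an average rate, and the first-party rate into a daily total. This is back-of-the-envelope arithmetic only: it assumes traffic is spread evenly over the day (real traffic is not), and the slides do not say whether 400 QPS is an average or a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(searches_per_day: float) -> float:
    """Average queries per second if the daily total were spread evenly."""
    return searches_per_day / SECONDS_PER_DAY


def daily_total(qps: float) -> float:
    """Searches per day if the given rate were sustained for 24 hours."""
    return qps * SECONDS_PER_DAY


total_avg_qps = average_qps(75_000_000)   # all searches, including public API
first_party_daily = daily_total(400)      # first-party apps, if 400 QPS were sustained

print(f"75M searches/day is about {total_avg_qps:.0f} QPS on average")
print(f"400 QPS sustained is about {first_party_daily / 1e6:.2f}M searches/day")
```

Under those assumptions, 75M searches per day works out to roughly 868 QPS on average, and a sustained 400 QPS would account for about 34.56M searches per day.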

Show and tell: A better subreddit search
The challenge: Redditors are very creative in their subreddit naming (e.g.
r/superbowl is about superb owl pictures), which, whilst fun, poses a challenge
for discovery.
The answer: faceted search on posts!
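
One way to read "faceted search on posts": run the query against the post index, count the matching posts by subreddit, and rank subreddits by that count, so a community surfaces because of what is posted in it and not because of its name. The sketch below is an in-memory toy under that assumption. The `Post` type, the sample data, the naive term matching, and `rank_subreddits` are all hypothetical; they stand in for the facet feature of a real search engine and are not Reddit's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    subreddit: str
    title: str


def matches(post: Post, query: str) -> bool:
    """Naive matching: every query term must appear as a word in the title."""
    words = post.title.lower().split()
    return all(term in words for term in query.lower().split())


def rank_subreddits(posts, query, limit=5):
    """Facet the matching posts on subreddit; return (subreddit, count), most first."""
    counts = Counter(p.subreddit for p in posts if matches(p, query))
    return counts.most_common(limit)


# Hypothetical sample data for illustration only.
POSTS = [
    Post("superbowl", "Superb owl spotted at dusk"),
    Post("superbowl", "Barn owl in flight"),
    Post("aww", "Baby owl learns to hop"),
    Post("nfl", "Super Bowl halftime thread"),
]

print(rank_subreddits(POSTS, "owl"))
```

Here a search for "owl" ranks r/superbowl first because two of its posts match, even though the subreddit name itself never contains the word "owl" as a separate term.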

The pudding: A better subreddit search

The future: A better subreddit search
Coming soon: normalization!
In a similar vein, once we ingest comments we can leverage a similar strategy
for posts!
This will make a big impact on things like image, video, and link-only posts,
which have little self text to index but lots of comments!
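
The idea of borrowing comment text for posts can be sketched as building each post's indexed document from its own text plus the text of its comments. This is a minimal illustration under that assumption: the `Post` fields, `build_index_document`, the `max_comments` cap, and the toy `search` function are hypothetical names, not part of Reddit's pipeline or any search engine's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Post:
    post_id: str
    title: str
    self_text: str = ""
    comments: List[str] = field(default_factory=list)


def build_index_document(post: Post, max_comments: int = 50) -> Dict[str, str]:
    """Flatten a post into searchable fields, folding in its first comments."""
    return {
        "id": post.post_id,
        "title": post.title,
        "body": post.self_text,
        "comment_text": " ".join(post.comments[:max_comments]),
    }


def search(docs: List[Dict[str, str]], term: str) -> List[str]:
    """Toy search: return ids of documents with the term in any text field."""
    term = term.lower()
    fields = ("title", "body", "comment_text")
    return [d["id"] for d in docs if any(term in d[f].lower().split() for f in fields)]


# An image post with no self text; only the comments mention the subject.
image_post = Post(
    post_id="example_post",
    title="Look at this",
    comments=["What a gorgeous snowy owl", "Great photo"],
)
doc = build_index_document(image_post)
print(search([doc], "owl"))
```

The image post is found for "owl" only because its comments were folded into the document; its title and empty self text would not have matched.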

The Future of Search @ Reddit
Nick Caldwell, Reddit VPE

...but that’s not all
● Relevance
○ Relevance model experimentation
○ Query understanding & rewriting
● Smarter search for new content types
○ Comments, user profiles, messages, etc
○ Image and video search
○ Q&A

The Future of Reddit
Welcoming
Personalized

Thanks!
Nick Caldwell: nickc@reddit.com
Chris Slowe: chris@reddit.com
Luis Bitencourt-Emilio: luis@reddit.com
PS: We’re hiring!
http://reddit.com/jobs
