Building Distributed Systems in Scala
A presentation to Emerging Technologies for the Enterprise
April 8, 2010 – Philadelphia, PA




About @al3x
‣   At Twitter since 2007
‣   Working on the Web
    since 1995
‣   Co-author of
    Programming Scala
    (O’Reilly, 2009)
‣   Into programming
    languages,
    distributed systems.
About Twitter
‣   Social messaging – a
    new way to
    communicate
‣   Launched in
    mid-2006
‣   Hit the mainstream in
    2008
‣   50+ million tweets per
    day (600+ per
    second)
‣   Millions of users
    worldwide
Technologies Used At Twitter
Languages                         Frameworks
‣   Ruby, JavaScript              ‣   Rails
‣   Scala                         ‣   jQuery
‣   lil’ bit of C, Python, Java


Data Storage                      Misc.
‣   MySQL                         ‣   memcached
‣   Cassandra                     ‣   ZooKeeper
‣   HBase (Hadoop)                ‣   Jetty
                                  ‣   so much more!
Why Scala?
‣   A language that’s both fun and productive.
‣   Great performance (on par with Java).
‣   Object-oriented and functional programming,
    together.
‣   Ability to reuse existing Java libraries.
‣   Flexible concurrency (Actors, threads, events).
‣   A smart community with infectious momentum.
Hawkwind
A case study in (re)building
a distributed system in Scala.
Requirements
‣   Search for people by name, username, eventually
    by other attributes.
‣   Order the results some sensible way (ex: by
    number of followers).
‣   Offer suggestions for misspellings/alternate names.
‣   Handle case-folding and other text normalization
    concerns on the query string.
‣   Return results in about a second, preferably less.
Finding People on Twitter
[Screenshots of Twitter people search, with callouts: results, suggestion, speedy!]
First Attempt: acts_as_solr
‣   Crunched on time, so we wanted the fastest
    route to working user search.
‣   Uses the Solr distribution/platform from Apache
    Lucene.
‣   Tries to make Rails integration straightforward
    and idiomatic.
‣   Easy to get running, hard to operationalize.
In the Interim: A Move to SOA
‣   Stopped thinking of our architecture as just a
    Rails app and the components that orbit it.
‣   Started building isolated services that
    communicate with the rest of the system via
    Thrift (an RPC and server framework).
‣   Allows us freedom to change the underlying
    implementation of services without modifying the
    rest of the system.
Thrift Example
   struct Results {
     1: list<i64> people
     2: string suggestion
     3: i32 processingTime /* milliseconds */
     4: list<i32> timings
     5: i32 totalResults
   }

   service NameSearch {
     Results find(1: string name, 2: i32 maxResults, 3: bool wantSuggestion)

     Results find_with_ranking(1: string name, 2: i32 maxResults, 3: bool wantSuggestion, 4: Ranker ranking)
   }
Second Attempt: Hawkwind 1
‣   A quick (three weeks) bespoke Scala project to
    “stop the bleeding”.
‣   Vertically but not horizontally scalable: no
    sharding, no failover, machine-level redundancy.
‣   Ran into memory and disk space limits.
‣   Reused Java code but didn’t offer nice Scala
    wrappers or rewrites.
‣   Still, planned to grow 10x, grew 25x!
Goals for Hawkwind 2
‣   Horizontally scalable: sharded corpus,
    replication of shards, easy to grow the service.
‣   Faster.
‣   Higher-quality results.
‣   Better use of Scala (language features,
    programming style).
‣   Maintainable code base, make it easy to add
    features.
High-Level Concepts
‣   Shards: pieces of the user corpus.
‣   Replicas: copies of shards.
‣   Document Servers.
‣   Merge Servers.
‣   Every machine gets the same code, can be
    either a Document Server or a Merge Server.
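To make the vocabulary above concrete, here is a minimal sketch of shards and replicas as plain Scala values. The names `ReplicaAddr`, `ShardSketch`, and `Sharding.shardFor` are hypothetical and do not come from Hawkwind. In particular, the modulo rule for assigning a user to a shard is an assumption, since the slides do not say how the corpus is split.

```scala
import scala.util.Random

// Hypothetical address of one Document Server holding a copy of a shard.
final case class ReplicaAddr(host: String, port: Int)

// A shard is a piece of the user corpus; its replicas are copies of that piece.
final case class ShardSketch(id: Int, replicas: Vector[ReplicaAddr]) {
  // Pick one replica at random so that load spreads across the copies.
  def randomReplica(rng: Random): Option[ReplicaAddr] =
    if (replicas.isEmpty) None
    else Some(replicas(rng.nextInt(replicas.size)))
}

object Sharding {
  // Assumed assignment rule: user id modulo the number of shards.
  // floorMod keeps the result non-negative even for negative ids.
  def shardFor(userId: Long, shardCount: Int): Int =
    java.lang.Math.floorMod(userId, shardCount.toLong).toInt
}
```

Returning an `Option` from `randomReplica` forces the caller to decide what to do when every replica of a shard has been evicted.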
Hawkwind 2 High-Level Architecture

   Internet
      |  queries for users, API requests
      v
   Rails Cluster
      |  Thrift call to semi-random Merge Server
      v
   Merge Server, Merge Server, Merge Server
      |  Thrift calls to semi-random replica of each shard
      v
   Shard 1 Doc Server, Shard 1 Doc Server,
   Shard 2 Doc Server, Shard 2 Doc Server,
   Shard 3 Doc Server, Shard 3 Doc Server
      ^
      |  periodic deliveries of sharded user corpus
   Hadoop (HBase)
Taking Care of Data
‣   A Hadoop job gathers up the user data and slices it
    into shards.
‣   A cron job fetches these data dumps several times
    per day.
‣   To load a new corpus on a Document Server, simply
    restart the process.
‣   Redundancy and staggered scheduling keep the
    system from running too hot while restarts are in
    progress.
What a Document Server does
‣   On startup, load Thrift-serialized User objects.
‣   Populate an Inverted Index, Map, and Trie with
    normalized attributes of those User objects.
‣   Once ready, listen for queries.
‣   Answering a query basically means looking
    stuff up in those pre-populated data structures.
‣   Maintains a connection pool for Thrift requests,
    wrapping org.apache.commons.pool.
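The lookup structures described above can be sketched as follows. This is a simplified illustration, not Hawkwind's code: `UserDoc` and `UserIndex` are hypothetical names, the trie is left out, and both the whitespace tokenization and the follower-count ordering are assumptions chosen to keep the example short.

```scala
// Hypothetical, simplified user record; the real Thrift-generated type is not shown in the slides.
final case class UserDoc(id: Long, username: String, name: String, followers: Int)

final class UserIndex(users: Seq[UserDoc]) {
  // Map from user id to the full record.
  private val byId: Map[Long, UserDoc] = users.map(u => u.id -> u).toMap

  // Assumed normalization: lowercase and split on whitespace.
  private def tokens(s: String): Seq[String] =
    s.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty)

  // Inverted index from token to the ids of users whose name or username contains it.
  private val inverted: Map[String, Set[Long]] =
    users
      .flatMap(u => (tokens(u.name) ++ tokens(u.username)).map(t => t -> u.id))
      .groupBy(_._1)
      .map { case (token, pairs) => token -> pairs.map(_._2).toSet }

  // Intersect the postings of every query token, then order by follower count.
  def find(query: String, maxResults: Int): Seq[UserDoc] = {
    val postings = tokens(query).map(t => inverted.getOrElse(t, Set.empty[Long]))
    if (postings.isEmpty) Seq.empty
    else
      postings.reduce(_ intersect _).toSeq
        .map(byId)
        .sortBy(u => (-u.followers, u.id))
        .take(maxResults)
  }
}
```

Because everything is built once at construction time and never mutated, queries only read from immutable maps, which fits the "restart to load a new corpus" approach.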
What a Merge Server does
‣   Gets queries.
‣   Fans out queries to Document Servers.
‣   Waits for queries to come back using a custom
    ParallelFuture class, which wraps a number of
    java.util.concurrent classes.
‣   Merges together the result sets, re-ranks them,
    and ships ‘em back to the requesting client.
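The fan-out and merge steps above might look roughly like this. The custom ParallelFuture class is not shown in the slides, so this sketch swaps in the standard library's `scala.concurrent.Future` instead. `Hit`, `MergeSketch`, and the function-valued `shards` parameter (standing in for Thrift calls to Document Servers) are all hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical result type: a user id and a ranking score.
final case class Hit(userId: Long, score: Int)

object MergeSketch {
  // Each element of `shards` stands in for a Thrift call to one replica of a shard.
  def fanOut(
      query: String,
      shards: Seq[String => Seq[Hit]],
      maxResults: Int,
      timeout: FiniteDuration
  )(implicit ec: ExecutionContext): Seq[Hit] = {
    // Start all shard queries in parallel.
    val pending: Seq[Future[Seq[Hit]]] = shards.map(shard => Future(shard(query)))
    // A shard that throws contributes no hits instead of failing the whole query.
    val tolerant = pending.map(_.recover { case _: Exception => Seq.empty[Hit] })
    // Block until every shard has answered; throws TimeoutException if the deadline passes.
    val all = Await.result(Future.sequence(tolerant), timeout).flatten
    // Merge and re-rank: highest score first, ties broken by user id.
    all.sortBy(h => (-h.score, h.userId)).take(maxResults)
  }
}
```

Recovering per shard is one possible policy: a single unhealthy replica degrades the result set rather than turning the whole search into an error.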
How to model a distributed system?
‣   Literal decomposition: classes for all
    architectural components (Shard, Replica, etc.).
‣   Each component knows/does as little as
    possible.
‣   Isolate mutable state, test carefully.
‣   Cleanly delegate calls.
Literal Decomposition: Replica
case class Replica(shard: Shard, server: Server) {
  private val log = Logger.get
  val BACKEND_ERROR = Stats.getCounter("backend_timeout")

  // Time each query under the "replica-query" key and run it over a pooled Thrift client.
  def query(q: Query): DocResults = w3c.time("replica-query") {
    server.thriftCall { client =>
      // logic goes here
    }
  }

  // Health check: delegate to the Thrift ping call and log the outcome.
  def ping(): Boolean = server.thriftCall { client =>
    log.debug("calling ping via thrift for %s", server)
    val rv = client.ping()
    log.debug("ping returned %s from %s", rv, server)
    rv
  }
}
Literal Decomposition: Server
case class Server(hostname: String, port: Int) {
  val pool = ConnectionPool(hostname, port)
  private val log = Logger.get

  // Borrow a client from the connection pool and run f against it.
  def thriftCall[A](f: Client => A) = {
    log.debug("making thriftCall for server %s", this)
    pool.withClient { client => f(client) }
  }

  // Look up this server's Shard in the ShardMap and pair the two as a Replica.
  def replica: Replica = {
    Replica(ShardMap.serversToShards(this), this)
  }
}
Hawkwind 2 Query Call Graph

   MergeLayer.query
   -> ShardMap.query                  (what’s this?)
   -> shard.replicaManager ! query
   -> shard.query
   -> randomReplica()
   -> replica.query
   -> server.thriftCall
   -> NameSearchDocumentLayerClient.find
ShardMap: Isolating Mutable State
‣   A singleton and an Actor.
‣   Contains a map from Servers to their
    corresponding Shards.
‣   Also contains a map from Shards to the Replicas
    of those shards.
‣   Responsible for populating and managing
    those maps.
‣   Send it a message to evict or reinsert a Replica.
‣   Fans out queries to Shards.
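The idea of keeping mutable state behind a message-processing boundary can be sketched without any actor library. The version below is a stand-in, not the Scala Actors code the talk refers to: it uses a single-threaded `java.util.concurrent` executor as the mailbox, so messages are handled one at a time and the mutable map is only ever touched from that one thread. `ShardMapSketch`, `Evict`, and `Reinsert` are hypothetical names, and replicas are reduced to plain strings.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory}
import scala.collection.mutable

sealed trait ShardMapMessage
final case class Evict(shard: Int, replica: String) extends ShardMapMessage
final case class Reinsert(shard: Int, replica: String) extends ShardMapMessage

final class ShardMapSketch(initial: Map[Int, Set[String]]) {
  // Mutable state: only the mailbox thread reads or writes it after construction.
  private val replicas = mutable.Map.empty[Int, Set[String]] ++= initial

  // One daemon thread acts as the mailbox, processing messages in order.
  private val mailbox = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "shard-map-sketch")
      t.setDaemon(true)
      t
    }
  })

  // Fire-and-forget send, in the spirit of `actor ! message`.
  def !(msg: ShardMapMessage): Unit = mailbox.execute(new Runnable {
    def run(): Unit = msg match {
      case Evict(s, r)    => replicas(s) = replicas.getOrElse(s, Set.empty[String]) - r
      case Reinsert(s, r) => replicas(s) = replicas.getOrElse(s, Set.empty[String]) + r
    }
  })

  // Reads also go through the mailbox, so they observe every earlier message.
  def healthyReplicas(shard: Int): Set[String] =
    mailbox.submit(new Callable[Set[String]] {
      def call(): Set[String] = replicas.getOrElse(shard, Set.empty[String])
    }).get()

  def shutdown(): Unit = mailbox.shutdown()
}
```

Since callers can only send messages or ask for an immutable snapshot, no lock is needed around the map, which is the property the slide attributes to making ShardMap an Actor.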
ReplicaHealthChecker
‣   Much like the ShardMap, a singleton and an
    Actor.
‣   Maintains mutable lists of unhealthy Replicas
    (“the penalty box”).
‣   Constantly checking to see if evicted Replicas
    are healthy again (back online).
‣   Sends messages to itself – an effective Actor
    technique.
Challenges, Large and Small
‣   Fast importing of huge serialized Thrift object
    dumps.
‣   Testing the ShardMap and ReplicaHealthChecker
    (mutable state wants to hurt you).
‣   Efficient accent normalization and filtering for
    special characters.
‣   Working with the Apache Commons object pool.
‣   Breaking out different ranking mechanisms in a
    clean, reusable way.
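For the accent normalization item above, one common JVM approach is Unicode decomposition with `java.text.Normalizer`, followed by stripping combining marks. The sketch below illustrates that technique only; it is not taken from Hawkwind, the `TextNormalizer` name and the choice to keep underscores are assumptions, and the regex-based implementation favors clarity over the efficiency the slide asks for.

```scala
import java.text.Normalizer
import java.util.Locale

object TextNormalizer {
  def normalize(s: String): String =
    Normalizer
      .normalize(s, Normalizer.Form.NFD)        // split accented letters into base letter + combining mark
      .replaceAll("\\p{M}+", "")                // drop the combining marks
      .toLowerCase(Locale.ROOT)                 // locale-independent case folding
      .replaceAll("[^\\p{L}\\p{N}_\\s]", " ")   // replace other special characters with a space
      .trim
      .replaceAll("\\s+", " ")                  // collapse runs of whitespace
}
```

Applying the same function to both the indexed attributes and the incoming query string keeps the two sides comparable.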
Libraries & Tools
Things that make working in Scala
way more productive.
sbt – the Simple Build Tool
‣   Scala’s answer to Ant and Maven.
‣   Sets up new projects.
‣   Maintains project configuration, build tasks,
    and dependencies in pure Scala. Totally open-
    ended.
‣   Interactive console.
‣   Will run tasks as soon as files in your project
    change – automatically compile and run tests!
Ostrich
‣   Gather statistics about your application.
‣   Counters, gauges, and timings.
‣   Share stats via JMX, a plain-text socket, a web
    interface, or log files.
‣   Ex:
          Stats.time("foo") {
            timeConsumingOperation()
          }
Configgy
‣   Manages configuration files and logging.
‣   Flexible file format, can include files in other files.
‣   Inheritance, variable substitution.
‣   Tunable logging, logging with Scribe.
‣   Subscription API: push and validate
    configuration changes to running processes.
‣   Ex:
      val foo = config.getString("foo")
Specs + xrayspecs
 ‣   A behavior-driven development (BDD) testing
     framework for Scala.
 ‣   Elegant, readable, fun-to-write tests.
 ‣   Support for several mocking frameworks (we
     like Mockito).
 ‣   Test concurrent operations, time, much more.
 ‣   Ex:
"suggestion with a List of null does not blow up" in {
  MergeLayer.suggestion("steve", List(null)) mustEqual None
}
Questions?
Follow me at twitter.com/al3x

Learn with us at engineering.twitter.com
Work with us at jobs.twitter.com




                                                   TM
Ad

More Related Content

What's hot (20)

How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
Peng Cheng
 
Introduction to Spark with Scala
Introduction to Spark with ScalaIntroduction to Spark with Scala
Introduction to Spark with Scala
Himanshu Gupta
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
Mark Miller
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
Sandy Ryza
 
Hadoop on osx
Hadoop on osxHadoop on osx
Hadoop on osx
Devopam Mittra
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Lucidworks
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTP
Roland Kuhn
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
thelabdude
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Legacy Typesafe (now Lightbend)
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database
huguk
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
thelabdude
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per month
daveconnors
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
DataWorks Summit
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop Users
Kathleen Ting
 
Topic Modeling via Tensor Factorization - Use Case for Apache REEF
Topic Modeling via Tensor Factorization - Use Case for Apache REEFTopic Modeling via Tensor Factorization - Use Case for Apache REEF
Topic Modeling via Tensor Factorization - Use Case for Apache REEF
Sergiy Matusevych
 
Topic Modeling via Tensor Factorization Use Case for Apache REEF Framework
Topic Modeling via Tensor Factorization Use Case for Apache REEF FrameworkTopic Modeling via Tensor Factorization Use Case for Apache REEF Framework
Topic Modeling via Tensor Factorization Use Case for Apache REEF Framework
DataWorks Summit
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
Peng Cheng
 
Introduction to Spark with Scala
Introduction to Spark with ScalaIntroduction to Spark with Scala
Introduction to Spark with Scala
Himanshu Gupta
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
Mark Miller
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
Sandy Ryza
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Lucidworks
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTP
Roland Kuhn
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
thelabdude
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Legacy Typesafe (now Lightbend)
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database
huguk
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
thelabdude
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per month
daveconnors
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
DataWorks Summit
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop Users
Kathleen Ting
 
Topic Modeling via Tensor Factorization - Use Case for Apache REEF
Topic Modeling via Tensor Factorization - Use Case for Apache REEFTopic Modeling via Tensor Factorization - Use Case for Apache REEF
Topic Modeling via Tensor Factorization - Use Case for Apache REEF
Sergiy Matusevych
 
Topic Modeling via Tensor Factorization Use Case for Apache REEF Framework
Topic Modeling via Tensor Factorization Use Case for Apache REEF FrameworkTopic Modeling via Tensor Factorization Use Case for Apache REEF Framework
Topic Modeling via Tensor Factorization Use Case for Apache REEF Framework
DataWorks Summit
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 

Viewers also liked (20)

Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1
datamantra
 
Purely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaPurely Functional Data Structures in Scala
Purely Functional Data Structures in Scala
Vladimir Kostyukov
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
Patrick Nicolas
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
Kevin Weil
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Data Structures In Scala
Data Structures In ScalaData Structures In Scala
Data Structures In Scala
Knoldus Inc.
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
Ryan King
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
Tudor Lapusan
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in Python
Sarah Mount
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
ctrezzo
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
봉조 김
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
봉조 김
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
봉조 김
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
봉조 김
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
봉조 김
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
Taposh Roy
 
Java 8 Lambda Expressions
Java 8 Lambda ExpressionsJava 8 Lambda Expressions
Java 8 Lambda Expressions
Haim Michael
 
AWS Innovate: AWS Container Management using Amazon EC2 Container Service an...
AWS Innovate:  AWS Container Management using Amazon EC2 Container Service an...AWS Innovate:  AWS Container Management using Amazon EC2 Container Service an...
AWS Innovate: AWS Container Management using Amazon EC2 Container Service an...
Amazon Web Services Korea
 
Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1
datamantra
 
Purely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaPurely Functional Data Structures in Scala
Purely Functional Data Structures in Scala
Vladimir Kostyukov
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
Patrick Nicolas
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
Kevin Weil
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Data Structures In Scala
Data Structures In ScalaData Structures In Scala
Data Structures In Scala
Knoldus Inc.
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
Ryan King
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in Python
Sarah Mount
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
ctrezzo
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
봉조 김
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
봉조 김
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
봉조 김
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
봉조 김
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
봉조 김
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
Taposh Roy
 
Java 8 Lambda Expressions
Java 8 Lambda ExpressionsJava 8 Lambda Expressions
Java 8 Lambda Expressions
Haim Michael
 
AWS Innovate: AWS Container Management using Amazon EC2 Container Service an...
AWS Innovate:  AWS Container Management using Amazon EC2 Container Service an...AWS Innovate:  AWS Container Management using Amazon EC2 Container Service an...
AWS Innovate: AWS Container Management using Amazon EC2 Container Service an...
Amazon Web Services Korea
 
Ad

Similar to Building Distributed Systems in Scala (20)

High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStack
Kamesh Pemmaraju
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Akara Sucharitakul
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Reactivesummit
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Mac Moore
 
The Why and How of Scala at Twitter
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at Twitter
Alex Payne
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Timothy McPhillips
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Julian Hyde
 
Martin Odersky: What's next for Scala
Martin Odersky: What's next for ScalaMartin Odersky: What's next for Scala
Martin Odersky: What's next for Scala
Marakana Inc.
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lightbend
 
Scala+data
Scala+dataScala+data
Scala+data
Samir Bessalah
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
Florent Ramiere
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Lucidworks
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStack
Cisco DevNet
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Dibyendu Bhattacharya
 
Couchbase Data Pipeline
Couchbase Data PipelineCouchbase Data Pipeline
Couchbase Data Pipeline
Justin Michaels
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Christian Tzolov
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStack
Kamesh Pemmaraju
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Building Distributed Systems in Scala

  • 1. Building Distributed Systems in Scala. A presentation to Emerging Technologies for the Enterprise, April 8, 2010 – Philadelphia, PA
  • 2. About @al3x ‣ At Twitter since 2007 ‣ Working on the Web since 1995 ‣ Co-author of Programming Scala (O’Reilly, 2009) ‣ Into programming languages, distributed systems.
  • 3. About Twitter ‣ Social messaging – a new way to communicate ‣ Launched in mid-2006 ‣ Hit the mainstream in 2008 ‣ 50+ million tweets per day (600+ per second) ‣ Millions of users worldwide
  • 4. Technologies Used At Twitter
        Languages: ‣ Ruby, JavaScript ‣ Scala ‣ lil’ bit of C, Python, Java
        Frameworks: ‣ Rails ‣ jQuery
        Data Storage: ‣ MySQL ‣ Cassandra ‣ HBase (Hadoop)
        Misc.: ‣ memcached ‣ ZooKeeper ‣ Jetty ‣ so much more!
  • 5. Why Scala? ‣ A language that’s both fun and productive. ‣ Great performance (on par with Java). ‣ Object-oriented and functional programming, together. ‣ Ability to reuse existing Java libraries. ‣ Flexible concurrency (Actors, threads, events). ‣ A smart community with infectious momentum.
  • 6. Hawkwind A case study in (re)building a distributed system in Scala.
  • 7. Requirements ‣ Search for people by name, username, eventually by other attributes. ‣ Order the results some sensible way (ex: by number of followers). ‣ Offer suggestions for misspellings/alternate names. ‣ Handle case-folding and other text normalization concerns on the query string. ‣ Return results in about a second, preferably less.
  • 9. Finding People on Twitter results
  • 10. Finding People on Twitter suggestion results
  • 11. Finding People on Twitter speedy! suggestion results
  • 12. First Attempt: acts_as_solr ‣ Crunched on time, so we wanted the fastest route to working user search. ‣ Uses the Solr distribution/platform from Apache Lucene. ‣ Tries to make Rails integration straightforward and idiomatic. ‣ Easy to get running, hard to operationalize.
  • 13. In the Interim: A Move to SOA ‣ Stopped thinking of our architecture as just a Rails app and the components that orbit it. ‣ Started building isolated services that communicate with the rest of the system via Thrift (an RPC and server framework). ‣ Allows us freedom to change the underlying implementation of services without modifying the rest of the system.
  • 14. Thrift Example
        struct Results {
          1: list<i64> people
          2: string suggestion
          3: i32 processingTime /* milliseconds */
          4: list<i32> timings
          5: i32 totalResults
        }
        service NameSearch {
          Results find(1: string name, 2: i32 maxResults, 3: bool wantSuggestion)
          Results find_with_ranking(1: string name, 2: i32 maxResults, 3: bool wantSuggestion, 4: Ranker ranking)
        }
  • 15. Second Attempt: Hawkwind 1 ‣ A quick (three weeks) bespoke Scala project to “stop the bleeding”. ‣ Vertically but not horizontally scalable: no sharding, no failover, machine-level redundancy. ‣ Ran into memory and disk space limits. ‣ Reused Java code but didn’t offer nice Scala wrappers or rewrites. ‣ Still, planned to grow 10x, grew 25x!
  • 16. Goals for Hawkwind 2 ‣ Horizontally scalable: sharded corpus, replication of shards, easy to grow the service. ‣ Faster. ‣ Higher-quality results. ‣ Better use of Scala (language features, programming style). ‣ Maintainable code base, make it easy to add features.
  • 17. High-Level Concepts ‣ Shards: pieces of the user corpus. ‣ Replicas: copies of shards. ‣ Document Servers. ‣ Merge Servers. ‣ Every machine gets the same code, can be either a Document Server or a Merge Server.
  • 18. Hawkwind 2 High-Level Architecture
        Internet (queries for users, API requests)
          -> Rails Cluster
          -> Merge Servers (three shown), via a Thrift call to a semi-random Merge Server
          -> Doc Servers (two each for Shard 1, Shard 2, and Shard 3), via Thrift calls to a semi-random replica of each shard
        Hadoop (HBase) -> Doc Servers: periodic deliveries of sharded user corpus
  • 19. Taking Care of Data ‣ A Hadoop job gathers up the user data and slices it into shards. ‣ A cron job fetches these data dumps several times per day. ‣ To load a new corpus on a Document Server, simply restart the process. ‣ Redundancy and staggered scheduling keeps the system from running too hot while restarts are in progress.
  • 20. What a Document Server does ‣ On startup, load Thrift serialized User objects. ‣ Populate an Inverted Index, Map, and Trie with normalized attributes of those User objects. ‣ Once ready, listen for queries. ‣ Answering a query basically means looking stuff up in those pre-populated data structures. ‣ Maintains a connection pool for Thrift requests, wrapping org.apache.commons.pool.
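The lookup side of a Document Server can be illustrated with a small inverted index. This is a minimal sketch, not Hawkwind's code: the `User` fields, the whitespace tokenizer, the lowercase default for `normalize`, and ranking by follower count are all assumptions chosen to match the requirements listed earlier in the deck.

```scala
// Hypothetical stand-in for the Thrift-generated User type.
final case class User(id: Long, name: String, screenName: String, followers: Int)

// Maps each normalized token to the ids of users whose name or username contains it.
final class InvertedIndex(users: Seq[User], normalize: String => String = _.toLowerCase) {
  private def tokens(s: String): Seq[String] =
    normalize(s).split("\\s+").toSeq.filter(_.nonEmpty)

  private val byId: Map[Long, User] = users.map(u => u.id -> u).toMap

  // Built once at startup; queries only read from it.
  private val postings: Map[String, Set[Long]] =
    users
      .flatMap(u => (tokens(u.name) ++ tokens(u.screenName)).map(t => t -> u.id))
      .groupBy(_._1)
      .map { case (token, pairs) => token -> pairs.map(_._2).toSet }

  // Ids of users matching every token of the query, most-followed first.
  def find(query: String, maxResults: Int): Seq[Long] = {
    val perToken = tokens(query).map(t => postings.getOrElse(t, Set.empty[Long]))
    val matches = if (perToken.isEmpty) Set.empty[Long] else perToken.reduce(_ intersect _)
    matches.toSeq.sortBy(id => -byId(id).followers).take(maxResults)
  }
}
```

Because the index is immutable once constructed, any number of request threads can query it without locking, which fits the "restart the process to load a new corpus" approach described on the previous slide.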
  • 21. What a Merge Server does ‣ Gets queries. ‣ Fans out queries to Document Servers. ‣ Waits for queries to come back using a custom ParallelFuture class, which wraps a number of java.util.concurrent classes. ‣ Merges together the result sets, re-ranks them, and ships ‘em back to the requesting client.
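The fan-out and merge steps can be sketched with the standard library's `scala.concurrent.Future` standing in for the custom ParallelFuture class, whose interface the slides do not show. The `Hit` type, the function-per-shard representation, and ranking by a single integer score are assumptions made for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.FiniteDuration

// Hypothetical per-shard hit: a user id plus the score used for re-ranking.
final case class Hit(userId: Long, score: Int)

object MergeSketch {
  // Queries every shard concurrently, waits for all answers, then merges and re-ranks.
  def fanOut(shards: Seq[String => Seq[Hit]], query: String, maxResults: Int, timeout: FiniteDuration)
            (implicit ec: ExecutionContext): Seq[Hit] = {
    val pending: Seq[Future[Seq[Hit]]] = shards.map(shard => Future(shard(query)))
    val all: Seq[Hit] = Await.result(Future.sequence(pending), timeout).flatten
    all.sortBy(h => -h.score).take(maxResults)
  }
}
```

In this sketch a single slow or failing shard fails the whole query, because `Await.result` rethrows the failure or a `TimeoutException`. Whether Hawkwind returned partial results in that case is not stated in the slides.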
  • 22. How to model a distributed system? ‣ Literal decomposition: classes for all architectural components (Shard, Replica, etc.). ‣ Each component knows/does as little as possible. ‣ Isolate mutable state, test carefully. ‣ Cleanly delegate calls.
  • 23. Literal Decomposition: Replica
        case class Replica(val shard: Shard, val server: Server) {
          private val log = Logger.get
          val BACKEND_ERROR = Stats.getCounter("backend_timeout")

          def query(q: Query): DocResults = w3c.time("replica-query") {
            server.thriftCall { client =>
              // logic goes here
            }
          }

          def ping(): Boolean = server.thriftCall { client =>
            log.debug("calling ping via thrift for %s", server)
            val rv = client.ping()
            log.debug("ping returned %s from %s", rv, server)
            rv
          }
        }
  • 24. Literal Decomposition: Server
        case class Server(val hostname: String, val port: Int) {
          val pool = ConnectionPool(hostname, port)
          private val log = Logger.get

          def thriftCall[A](f: Client => A) = {
            log.debug("making thriftCall for server %s", this)
            pool.withClient { client => f(client) }
          }

          def replica: Replica = {
            Replica(ShardMap.serversToShards(this), this)
          }
        }
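The `thriftCall` and `pool.withClient` methods above follow what is often called the loan pattern: the caller passes a function, and the pool lends it a client and takes the client back afterwards. The slides do not show `ConnectionPool` itself, so the following is only a sketch of the idea; `SimplePool` and its blocking-queue implementation are hypothetical, whereas the real pool wraps org.apache.commons.pool.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical minimal pool used only to illustrate the loan pattern.
final class SimplePool[C](clients: Seq[C]) {
  private val idle = new LinkedBlockingQueue[C]()
  clients.foreach(c => idle.put(c))

  // Number of clients currently not on loan.
  def available: Int = idle.size

  // Borrow a client, run f with it, and always hand the client back,
  // even when f throws.
  def withClient[A](f: C => A): A = {
    val client = idle.take() // blocks until a client is free
    try f(client)
    finally idle.put(client)
  }
}
```

The benefit of this shape is that callers such as `Replica.ping` never see borrow or return calls, so they cannot leak a connection by forgetting to release it.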
  • 25. Hawkwind 2 Query Call Graph
        MergeLayer.query
          -> ShardMap.query
          -> shard.replicaManager ! query
          -> shard.query
          -> randomReplica()
          -> replica.query
          -> server.thriftCall
          -> NameSearchDocumentLayerClient.find
  • 26. Hawkwind 2 Query Call Graph
        MergeLayer.query
          -> ShardMap.query   (callout: what’s this?)
          -> shard.replicaManager ! query
          -> shard.query
          -> randomReplica()
          -> replica.query
          -> server.thriftCall
          -> NameSearchDocumentLayerClient.find
  • 27. ShardMap: Isolating Mutable State ‣ A singleton and an Actor. ‣ Contains a map from Servers to their corresponding Shards. ‣ Also contains a map from Shards to the Replicas of those shards. ‣ Responsible for populating and managing those maps. ‣ Send it a message to evict or reinsert a Replica. ‣ Fans out queries to Shards.
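The core idea of the ShardMap, mutable maps that only one thread of control ever touches, can be sketched without an actor library. The original used Scala Actors; the version below substitutes a single-threaded executor as the "mailbox", and the message types, the `Int` shard ids, and the `String` replica names are all assumptions.

```scala
import java.util.concurrent.{Callable, Executors}

sealed trait ShardMapMessage
final case class Evict(shard: Int, replica: String) extends ShardMapMessage
final case class Reinsert(shard: Int, replica: String) extends ShardMapMessage

final class ShardMapSketch(initial: Map[Int, Set[String]]) {
  // Every read and write of `replicas` runs on this one thread, so no locks are needed.
  private val mailbox = Executors.newSingleThreadExecutor()
  private var replicas: Map[Int, Set[String]] = initial

  // Fire-and-forget, like `actor ! message`.
  def send(msg: ShardMapMessage): Unit = mailbox.execute(new Runnable {
    def run(): Unit = msg match {
      case Evict(s, r)    => replicas = replicas.updated(s, replicas.getOrElse(s, Set.empty[String]) - r)
      case Reinsert(s, r) => replicas = replicas.updated(s, replicas.getOrElse(s, Set.empty[String]) + r)
    }
  })

  // Queued behind earlier messages, so it observes their effects.
  def snapshot(): Map[Int, Set[String]] =
    mailbox.submit(new Callable[Map[Int, Set[String]]] {
      def call(): Map[Int, Set[String]] = replicas
    }).get()

  def shutdown(): Unit = mailbox.shutdown()
}
```

Messages are processed strictly in arrival order, which is what makes eviction and reinsertion safe to issue from any thread.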
  • 28. ReplicaHealthChecker ‣ Much like the ShardMap, a singleton and an Actor. ‣ Maintains mutable lists of unhealthy Replicas (“the penalty box”). ‣ Constantly checking to see if evicted Replicas are healthy again (back online). ‣ Sends messages to itself – an effective Actor technique.
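A single pass over "the penalty box" can be written as a pure function, which also makes it easier to test than the actor around it. This is a sketch under assumptions: replicas are represented by name, `ping` is any function that returns true for a healthy replica, and a thrown exception counts as unhealthy.

```scala
object HealthCheckSketch {
  // One health-check pass. Returns (still unhealthy, recovered):
  // replicas that failed the ping stay in the penalty box, the rest can be reinserted.
  def recheck(penaltyBox: List[String], ping: String => Boolean): (List[String], List[String]) = {
    val (recovered, stillDown) =
      penaltyBox.partition(r => try ping(r) catch { case _: Exception => false })
    (stillDown, recovered)
  }
}
```

In an actor, each pass would typically end with the actor sending itself a message to trigger the next pass, which is the self-messaging technique the slide mentions.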
  • 29. Challenges, Large and Small ‣ Fast importing of huge serialized Thrift object dumps. ‣ Testing the ShardMap and ReplicaHealthChecker (mutable state wants to hurt you). ‣ Efficient accent normalization and filtering for special characters. ‣ Working with the Apache Commons object pool. ‣ Breaking out different ranking mechanisms in a clean, reusable way.
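For the accent normalization challenge, one common JVM approach (not necessarily the one Hawkwind used) is Unicode decomposition via `java.text.Normalizer` followed by stripping the combining marks. The choice to keep only letters, digits, and whitespace is an assumption made for this sketch.

```scala
import java.text.Normalizer
import java.util.Locale

object TextNormalizer {
  // Decomposes accented characters (NFD), strips the combining marks,
  // drops everything except letters, digits, and whitespace, then case-folds.
  def normalize(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)
      .replaceAll("\\p{M}+", "")
      .replaceAll("[^\\p{L}\\p{Nd}\\s]", "")
      .trim
      .toLowerCase(Locale.ROOT)
}
```

Regex-based stripping like this is simple but allocates several intermediate strings per call, so a hot path would likely want a single pass over the characters instead.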
  • 30. Libraries & Tools Things that make working in Scala way more productive.
  • 31. sbt – the Simple Build Tool ‣ Scala’s answer to Ant and Maven. ‣ Sets up new projects. ‣ Maintains project configuration, build tasks, and dependencies in pure Scala. Totally open-ended. ‣ Interactive console. ‣ Will run tasks as soon as files in your project change – automatically compile and run tests!
  • 32. Ostrich ‣ Gather statistics about your application. ‣ Counters, gauges, and timings. ‣ Share stats via JMX, a plain-text socket, a web interface, or log files. ‣ Ex: Stats.time("foo") { timeConsumingOperation() }
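The `Stats.time("foo") { ... }` call shape relies on a by-name parameter in a second parameter list. The sketch below shows how such a helper can be written; it is not Ostrich's implementation, and `TimingSketch`, `samples`, and the in-memory map are hypothetical.

```scala
import scala.collection.mutable

object TimingSketch {
  private val timingsMs = mutable.Map.empty[String, List[Long]]

  // Runs `body`, records how long it took under `name`, and returns body's result.
  // The timing is recorded even if `body` throws.
  def time[A](name: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsedMs = (System.nanoTime() - start) / 1000000L
      synchronized { timingsMs(name) = elapsedMs :: timingsMs.getOrElse(name, Nil) }
    }
  }

  // Recorded durations for `name`, newest first.
  def samples(name: String): List[Long] = synchronized(timingsMs.getOrElse(name, Nil))
}
```

Because `body` is passed by name, the block is evaluated inside `time`, which lets the helper wrap arbitrary code without the caller building a closure explicitly.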
  • 33. Configgy ‣ Manages configuration files and logging. ‣ Flexible file format, can include files in other files. ‣ Inheritance, variable substitution. ‣ Tunable logging, logging with Scribe. ‣ Subscription API: push and validate configuration changes to running processes. ‣ Ex: val foo = config.getString("foo")
  • 34. Specs + xrayspecs ‣ A behavior-driven development (BDD) testing framework for Scala. ‣ Elegant, readable, fun-to-write tests. ‣ Support for several mocking frameworks (we like Mockito). ‣ Test concurrent operations, time, much more. ‣ Ex: "suggestion with a List of null does not blow up" in { MergeLayer.suggestion("steve", List(null)) mustEqual None }
  • 35. Questions? Follow me at twitter.com/al3x. Learn with us at engineering.twitter.com. Work with us at jobs.twitter.com.

Editor's Notes

  • #24: This is literally all there is to this class!