NewSQL Database Overview




                민형기 (S-Core)
                hg.min@samsung.com
                2013. 2. 22.
Contents
I. Why NewSQL?
II. NewSQL Basic Concepts
III. Types of NewSQL
IV. NewSQL Summary



Why NewSQL?




Thinking – Extreme Data




QCon London 2012
Thinking - Traffic Explosion




Source: Netflix in the Cloud (http://www.slideshare.net/adrianco/netflix-in-the-cloud-2011)
Organizations need deeper insights




QCon London 2012
Solutions

□Buy high-end technology
□Hire more developers
□Use NoSQL
□Use NewSQL




Solution – Buy High End Technology




Oracle, IBM
Solution – Hire more developers

   □Application Level Sharding
   □Build your replication middleware
   □…




http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
Solutions – Use NoSQL
 □A new class of non-relational databases
 □Distributed architecture
 □Horizontal scalability
 □No fixed table schema
 □No JOIN, UPDATE, or DELETE operations
 □No transactions
 □No SQL support



NoSQL Ecosystems




451 Group
MongoDB
 □Document-oriented database
  JSON-style documents: Lists, Maps, primitives
  Schema-less
 □Transaction = update of a single document
 □Rich query language for dynamic queries
 □Tunable writes: speed vs. reliability
 □Highly scalable and available


MongoDB Use Cases
□Use cases
  High volume writes
  Complex data
  Semi-structured data


□Major customers
    Foursquare
    Bit.ly, Intuit
    SourceForge, NY Times
    GILT Groupe, Evite,
    SugarCRM

Apache Cassandra
 □Column-oriented database/Extensible row store
     Think Row ~= java.util.SortedMap
 □Transaction = update of a row
 □Fast writes = append to a log
 □Tunable reads/writes: consistency / availability
 □Extremely scalable
   Transparent and dynamic clustering
   Rack and datacenter aware data replication
 □CQL = “SQL”-like DDL and DML
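
The "tunable reads/writes" bullet can be illustrated with the usual quorum rule for replicated stores: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write to W replicas when R + W > N. The sketch below is only an illustration of that arithmetic; `quorum` and `read_sees_latest_write` are hypothetical helper names, not part of Cassandra's API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def read_sees_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acknowledgement counts must be between 1 and the replica count")
    return read_acks + write_acks > replicas


# Quorum writes plus quorum reads overlap; ONE/ONE trades consistency for availability.
strong = read_sees_latest_write(3, quorum(3), quorum(3))
weak = read_sees_latest_write(3, 1, 1)
```

Lower acknowledgement counts keep the cluster writable when replicas are down, which is the consistency / availability trade-off the slide refers to.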



Apache Cassandra Use Cases
 □Use cases
    Big data
    Multiple Data Center distributed database
    Persistent cache
    (Write intensive) Logging
    High-availability (writes)


 □Major customers
  Digg, Facebook, Twitter, Reddit, Rackspace
  Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
  “The largest production cluster has over 100 TB of
   data in over 150 machines.” – Cassandra web site
Solutions – Use NewSQL
 □A new class of relational databases

 □Retains SQL and ACID transactions
 □New, improved distributed architecture
 □Delivers high scalability and performance

 □NewSQL vendors: ScaleDB, NimbusDB, ...,
  VoltDB


http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
NewSQL Definition – Wikipedia


      NewSQL is a class of modern relational
        database management systems that seek
        to provide the same scalable performance
        of NoSQL systems for OLTP workloads while
        still maintaining the ACID guarantees of a
        traditional single-node database system


http://en.wikipedia.org/wiki/NewSQL
NewSQL Definition – 451 Group


       A DBMS that delivers the scalability and
         flexibility promised by NoSQL while retaining
         the support for SQL queries and/or ACID, or
         to improve performance for appropriate
         workloads.



http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
NewSQL Definition – Stonebraker


        SQL as the primary interface.

        ACID support for transactions

        Non-locking concurrency control.

        High per-node performance.

        Parallel, shared-nothing architecture.


http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
NewSQL Category
 New Database
 New MySQL Storage Engines
 Transparent Clustering




The evolving database landscape




OSBC
MySQL Ecosystem




NewSQL Ecosystem




New Database
     □ Newly designed from scratch to achieve
      scalability and performance
         One of the key considerations in improving the
          performance is making non-disk (memory) or new
          kinds of disks (flash/SSD) the primary data store.
         Some (hopefully minor) changes to the code will be
          required, and data migration is still needed.


     □Solutions
         Software-Only: VoltDB, NuoDB, Drizzle, Google Spanner
         Supported as an appliance: Clustrix, Translattice.

http://www.linuxforu.com/2012/01/newsql-handle-big-data/
New MySQL Storage Engines
     □Highly optimized storage engines for MySQL
     □Scale better than built-in engines, such as
      InnoDB.
         Upside: uses the MySQL interface
         Downside: data migration from other databases


     □Solutions
         TokuDB, MemSQL, Xeround, Akiban, NDB




http://www.linuxforu.com/2012/01/newsql-handle-big-data/
Transparent Clustering
     □Retain the OLTP databases in their original
      format, but provide a pluggable feature
         Cluster transparently
         Ensure Scalability
     □Avoid the need to rewrite code or perform any
      data migration
     □Solutions
         Cluster transparently: Schooner MySQL, Continuent
          Tungsten, ScalArc
         Ensure Scalability: ScaleBase, dbShards


http://www.linuxforu.com/2012/01/newsql-handle-big-data/
NewSQL Products
 VoltDB
 Google Spanner




VoltDB
   □ VoltDB, 2010, GPL/VoltDB Proprietary License, Java/C++
   □ Type: NewSQL, New Database
   □ Main Point: In-memory Database, Java Stored Procedure, VoltDB
     implements the design of the academic H-Store project
   □ Protocol: SQL
   □ Transaction: Yes
   □ Data Storage: Memory
   □ Features
      □ in-memory relational database
      □ Durability through replication, snapshots, logging
      □ Transparent partitioning
      □ ACID-level consistency
      □ Synchronous multi-master replication
      □ Database Replication




http://voltdb.com/products-services/products, http://www.slideshare.net/chris.e.richardson/polygot-persistenceforjavadevs-jfokus2012reorgpptx
VoltDB - Technical Overview
   “OLTP Through the Looking Glass”
       http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf


   VoltDB avoids the overhead of traditional databases
       K-safety for fault tolerance
        • no logging
       In memory operation for maximum throughput
        • no buffer management
       Partitions operate autonomously and single-threaded
        • no latching or locking
   Built to horizontally scale
VoltDB - Partitions (1/3)
    1 partition per physical CPU core
    – Each physical server has multiple VoltDB partitions
    Data - Two types of tables
    – Partitioned
     Single column serves as partitioning key
     Rows are spread across all VoltDB partitions by partition column
     Transactional data (high frequency of modification)
    – Replicated
     All rows exist within all VoltDB partitions
     Relatively static data (low frequency of modification)
    Code - Two types of work – both ACID
    – Single-Partition
     All insert/update/delete operations within single partition
     Majority of transactional workload
    – Multi-Partition
     CRUD against partitioned tables across multiple partitions
     Insert/update/delete on replicated tables
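
The difference between the two table types can be sketched as follows. This is a toy model, not VoltDB code: the `Cluster` class, its method names, and the modulo placement rule are assumptions made for illustration, and VoltDB's real hashing is not shown on the slide.

```python
from collections import defaultdict


class Cluster:
    """Toy model of partitioned versus replicated tables."""

    def __init__(self, partition_count: int):
        self.partition_count = partition_count
        # One dict per partition: table name -> list of rows.
        self.partitions = [defaultdict(list) for _ in range(partition_count)]

    def partition_for(self, key: int) -> int:
        # Assumption: simple modulo placement for integer partition keys.
        return key % self.partition_count

    def insert_partitioned(self, table: str, key: int, row: tuple) -> list:
        """Store the row in exactly one partition; return the partitions touched."""
        target = self.partition_for(key)
        self.partitions[target][table].append(row)
        return [target]

    def insert_replicated(self, table: str, row: tuple) -> list:
        """Store the row in every partition; return the partitions touched."""
        for partition in self.partitions:
            partition[table].append(row)
        return list(range(self.partition_count))


cluster = Cluster(3)
single = cluster.insert_partitioned("orders", 5, (5, 501, 3))
everywhere = cluster.insert_replicated("products", (1, "knife"))
```

A write to a partitioned table touches one partition, while a write to a replicated table touches all of them, which is why the slide classifies the latter as multi-partition work.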



VoltDB - Partitions (2/3)
   Single-partition vs. Multi-partition

        select count(*) from orders where customer_id = 5
                        single-partition

        select count(*) from orders where product_id = 3
                        multi-partition

        insert into orders (customer_id, order_id, product_id) values (3,303,2)
                        single-partition

        update products set product_name = 'spork' where product_id = 3
                     multi-partition




        Partition 1               Partition 2                Partition 3

         1     101     2            2    201     1            3     201     1     table orders :   customer_id (partition key)
         1     101     3            5    501     3            6     601     1     (partitioned)    order_id
         4     401     2            5    502     2            6     601     2                      product_id


         1     knife                1    knife                1     knife         table products : product_id
         2     spoon                2    spoon                2     spoon         (replicated)     product_name
         3     fork                 3    fork                 3     fork
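
The routing decision for the two `select count(*)` statements can be sketched with the rows drawn on this slide. The placement function `(customer_id - 1) % 3` is an assumption chosen only because it reproduces the slide's layout; `count_orders` and the other names are hypothetical, not VoltDB APIs.

```python
# Rows of the partitioned "orders" table as drawn on the slide:
# (customer_id, order_id, product_id), one list per partition.
PARTITIONS = [
    [(1, 101, 2), (1, 101, 3), (4, 401, 2)],
    [(2, 201, 1), (5, 501, 3), (5, 502, 2)],
    [(3, 201, 1), (6, 601, 1), (6, 601, 2)],
]
COLUMNS = ("customer_id", "order_id", "product_id")
PARTITION_KEY = "customer_id"


def partition_for(customer_id: int) -> int:
    # Assumed placement rule that matches the slide's layout.
    return (customer_id - 1) % len(PARTITIONS)


def count_orders(column: str, value: int):
    """Return (row count, partitions visited) for: count(*) where column = value."""
    index = COLUMNS.index(column)
    if column == PARTITION_KEY:
        targets = [partition_for(value)]          # single-partition
    else:
        targets = list(range(len(PARTITIONS)))    # multi-partition
    count = sum(1 for p in targets for row in PARTITIONS[p] if row[index] == value)
    return count, targets


by_customer = count_orders("customer_id", 5)
by_product = count_orders("product_id", 3)
```

Filtering on the partition key lets the request go to one partition; filtering on any other column forces a visit to every partition.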

VoltDB - Partitions (3/3)
    Looking inside a VoltDB partition…
    – Each partition contains data and an execution engine.
    – The execution engine contains a queue for transaction requests.
    – Requests are executed sequentially (single threaded).

    [Figure: a partition with a work queue, an execution engine, table data, and index data]
    - Complete copy of all replicated tables
    - Portion of rows (about 1/partitions) of all partitioned tables
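
A minimal sketch of this execution model, assuming a partition is just a queue plus some data: requests are drained one at a time, so no request ever observes another one half-finished and no locks are needed. The `Partition` class and the `increment` procedure are hypothetical names used for illustration.

```python
from collections import deque


class Partition:
    """Toy partition: private data plus a work queue drained by one thread."""

    def __init__(self):
        self.data = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        """Enqueue a transaction request; nothing runs yet."""
        self.queue.append((procedure, args))

    def run_pending(self):
        """Execute queued requests strictly in arrival order."""
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.data, *args))
        return results


def increment(data, key):
    # Read-modify-write that would need a lock under concurrent execution.
    data[key] = data.get(key, 0) + 1
    return data[key]


partition = Partition()
for _ in range(3):
    partition.submit(increment, "hits")
results = partition.run_pending()
```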

VoltDB - Compiling
   The database is constructed from
   – The schema (DDL)
   – The work load (Java stored procedures)
   – The Project (users, groups, partitioning)

   VoltCompiler creates application catalog
   – Copy to servers along with 1 .jar and 1 .so
   – Start servers

   Schema
       CREATE TABLE HELLOWORLD (
           HELLO CHAR(15),
           WORLD CHAR(15),
           DIALECT CHAR(15),
           PRIMARY KEY (DIALECT)
       );

   Stored Procedures
       import org.voltdb.*;
       @ProcInfo(
           partitionInfo = "HELLOWORLD.DIA…
           singlePartition = true
       )
       public class Insert extends VoltPr…
           public final SQLStmt sql =
               new SQLStmt("INSERT INTO HELLO…
           public VoltTable[] run( String hel…

   Project.xml
       <?xml version="1.0"?>
       <project>
         <database name='data…
           <schema path='ddl.…
           <partition table='…
         </database>
       </project>
VoltDB - Transactions

   All access to VoltDB is via Java stored procedures (Java +
    SQL)
   A single invocation of a stored procedure is a transaction
    (committed on success)
   Limits round trips between DBMS
    and application
   High performance client applications communicate
    asynchronously with VoltDB
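
The "one invocation = one transaction" rule can be sketched as below. This is not VoltDB's mechanism: copying the whole database to get rollback is a deliberate simplification, and `call_procedure` and `transfer` are hypothetical names.

```python
import copy


def call_procedure(database: dict, procedure, *args):
    """Run one procedure call as a transaction: commit on success, discard on error."""
    working = copy.deepcopy(database)   # assumption: whole-copy rollback, for illustration only
    try:
        result = procedure(working, *args)
    except Exception:
        return False, None              # original database is left untouched
    database.clear()
    database.update(working)            # commit all changes at once
    return True, result


def transfer(db, source, target, amount):
    # Several statements that must succeed or fail together.
    db[source] -= amount
    db[target] += amount
    if db[source] < 0:
        raise ValueError("insufficient funds")
    return db[source]


accounts = {"a": 100, "b": 0}
committed = call_procedure(accounts, transfer, "a", "b", 30)
rolled_back = call_procedure(accounts, transfer, "a", "b", 500)
```

Because the whole procedure runs inside the database, the client makes one round trip per transaction rather than one per SQL statement.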




VoltDB - Clusters/Durability
   Scalability
   – Increase RAM in servers to add capacity
   – Add servers to increase performance / capacity
   – Consistently measuring 90% of single-node performance increase per additional
     node
   High availability
   – K-safety for redundancy
   Snapshots
   – Scheduled, continuous, on demand
   Spooling to data warehouse
   Disaster Recovery/WAN replication (Future)
   – Asynchronous replication
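
A rough sizing sketch for the scalability and K-safety bullets. It assumes the common reading of K-safety (K + 1 copies of every partition, so the cluster tolerates the loss of K nodes) and models the slide's "90% per additional node" figure as a simple linear rule; both function names are hypothetical.

```python
def unique_partitions(hosts: int, sites_per_host: int, k_safety: int) -> int:
    """Distinct partitions available when every partition is stored k_safety + 1 times."""
    copies = k_safety + 1
    total_sites = hosts * sites_per_host
    if hosts < copies:
        raise ValueError("need at least k_safety + 1 hosts to place the copies")
    if total_sites % copies != 0:
        raise ValueError("total sites must divide evenly among the copies")
    return total_sites // copies


def estimated_throughput(single_node_tps: int, nodes: int, efficiency_pct: int = 90) -> int:
    """Linear model: each node after the first adds efficiency_pct% of one node's throughput."""
    return single_node_tps * (100 + efficiency_pct * (nodes - 1)) // 100


partitions_k1 = unique_partitions(hosts=4, sites_per_host=8, k_safety=1)
tps_4_nodes = estimated_throughput(100_000, nodes=4)
```

Raising K buys redundancy at the cost of usable capacity: the same 4 hosts offer 32 partitions at K = 0 but 16 at K = 1.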




Google Spanner
   □ Google, 2012, Paper, C++
   □ Type: NewSQL, New Database
   □ Main Point: Google's scalable, multi-version, globally-distributed, and
     synchronously-replicated database

   □ Distributed multiversion database
       General-purpose transactions (ACID)
       SQL query language
       Schematized tables
       Semi-relational data model
   □ Running in production
       Storage for Google’s ad data
       Replaced a sharded MySQL database




http://research.google.com/archive/spanner.html
Google Spanner Overview
   □Feature: Lock-free distributed read transactions
   □Property: External consistency of distributed transactions
       First system at global scale
   □Implementation: Integration of concurrency control, replication, and 2PC
       Correctness and performance
   □Enabling technology: TrueTime
       Interval-based global time
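
The role of interval-based time can be sketched with a simulated clock. In the Spanner paper, `TT.now()` returns an interval [earliest, latest] guaranteed to contain the true time, and a committing transaction picks a timestamp and then waits until that timestamp has definitely passed ("commit wait"). The code below is a simulation only: `SimulatedTrueTime`, its integer millisecond clock, and `commit` are assumptions for illustration, not Google's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int
    latest: int


class SimulatedTrueTime:
    """Clock whose reading is only known to within +/- epsilon milliseconds."""

    def __init__(self, epsilon_ms: int):
        self.epsilon_ms = epsilon_ms
        self.true_now_ms = 0          # simulated absolute time

    def now(self) -> TTInterval:
        return TTInterval(self.true_now_ms - self.epsilon_ms,
                          self.true_now_ms + self.epsilon_ms)

    def after(self, timestamp_ms: int) -> bool:
        """True only when timestamp_ms has definitely passed."""
        return self.now().earliest > timestamp_ms

    def advance(self, delta_ms: int) -> None:
        self.true_now_ms += delta_ms


def commit(tt: SimulatedTrueTime, step_ms: int = 1):
    """Pick a commit timestamp, then wait out the clock uncertainty."""
    commit_ts = tt.now().latest
    waited = 0
    while not tt.after(commit_ts):    # commit wait
        tt.advance(step_ms)
        waited += step_ms
    return commit_ts, waited


clock = SimulatedTrueTime(epsilon_ms=4)
commit_ts, waited_ms = commit(clock)
```

The wait is roughly twice the uncertainty, so any transaction that starts after this commit is visible must receive a later timestamp, which is what external consistency requires.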

http://research.google.com/archive/spanner.html
Design Goals for Spanner




http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
MySQL Cluster – NDB Architecture




http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-overview.html
Schooner MySQL Active Cluster




http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-overview.html
dbShards Architecture




http://www.linuxforu.com/2012/01/newsql-handle-big-data/
NewSQL Summary




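Three Trends in the Database Industry
□ NoSQL databases:
    Designed to meet the scalability requirements of distributed architectures
     and to fit schema-less data management requirements.
□ NewSQL databases:
    Designed either to meet the scalability requirements of distributed architectures,
     or to improve performance where horizontal scaling is not required.
□ Data grid/cache products:
    Designed to store data in memory to increase application and database
     performance.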




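Conclusion
□ Many solutions exist for storing data
  □ Drop the assumption that Oracle and MySQL are the only options
  □ First identify the system's data characteristics and requirements (CAP, ACID/BASE)
  □ Apply multiple solutions within one system
      Small-scale data with complex relationships: RDBMS
      Large-scale data processed in real time: NoSQL, NewSQL
      Large-scale data kept for storage: Hadoop, etc.
□ Choose an appropriate solution
  □ Always verify issues that may arise in operation before adopting
  □ Most NewSQL solutions are in beta (a hasty choice can backfire)
  □ Verification down to the solution's source-code level is needed
□ Securing the stability of NewSQL solutions
  □ The stability of the solutions themselves still needs verification; they do not yet
    offer the stability of current DBMSs
  □ Apply only after securing a reliable way to store the data
  □ Developers with operations and development experience are hard to find
  □ Select a NewSQL solution that matches the requirements
□ Pilot first rather than applying to critical systems from the start
  □ Validate the selected solution and build in-house expertise

Thank you.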




Appendix.




Early – 2000s

   □All the big players were heavyweight
    and expensive.
       Oracle, DB2, Sybase, SQL Server, etc.


   □Open-source databases were missing
    important features.
       Postgres, mSQL, and MySQL.


http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
Early – 2000s : eBay Architecture




http://highscalability.com/ebay-architecture
Early – 2000s : eBay Architecture




                                            Push functionality to application:
                                               Joins
                                               Referential integrity
                                               Sorting done

                                            No distributed transactions




http://highscalability.com/ebay-architecture
Mid – 2000s

   □MySQL + InnoDB is widely adopted by
    new web companies:
       Supported transactions, replication,
        recovery.
       Still must use custom middleware to scale
        out across multiple machines.
       Memcache for caching queries.



http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
Mid – 2000s : Facebook Architecture




http://www.techthebest.com/2011/11/29/technology-used-in-facebook/
Mid – 2000s : Facebook Architecture




                             Scale out using custom middleware.
                             Store ~75% of database in Memcache.
                             No distributed transactions.




http://www.techthebest.com/2011/11/29/technology-used-in-facebook/
Late – 2000s

   □MySQL + InnoDB is widely adopted by
    new web companies:
       Supported transactions, replication,
        recovery.
       Still must use custom middleware to scale
        out across multiple machines.
       Memcache for caching queries.



http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
Late – 2000s : MongoDB Architecture




http://sett.ociweb.com/sett/settAug2011.html
Late – 2000s : MongoDB Architecture




                              Easy to use.
                              Becoming more like a DBMS over time.
                              No transactions.




http://sett.ociweb.com/sett/settAug2011.html
Early – 2010s

   □New DBMSs that can scale across
    multiple machines natively and provide
    ACID guarantees.
       MySQL Middleware
       Brand New Architectures




http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
Database SPRAIN




Database SPRAIN
 □“An injury to ligaments... caused by being
  stretched beyond normal capacity”

 □Six key drivers for NoSQL/NewSQL/DDG
  adoption
     Scalability
     Performance
     Relaxed consistency
     Agility
     Intricacy
     Necessity
Database SPRAIN - Scalability
 □Associated sub-driver: Hardware
  economics
   Scale-out across clusters of commodity servers
 □Example project/service/vendor
   BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop
   Amazon RDS, Xeround, SQL Azure, NimbusDB
   Data grid/cache
 □Associated use case:
   Large-scale distributed data storage
   Analysis of continuously updated data
   Multi-tenant PaaS data layer
Database SPRAIN - Scalability
 □User: StumbleUpon
 □Problem:
   Scaling problems with recommendation engine on
    MySQL
 □Solution: HBase
   Started using Apache HBase to provide real-time
    analytics on Su.pr
   MySQL lacked the performance headroom and scale
   Multiple benefits including avoiding declaring schema
   Enables the data to be used for multiple applications
    and use cases

Database SPRAIN - Performance
 □Associated sub-driver: MySQL limitations
   Inability to perform consistently at scale
 □Example project/service/vendor
   Hypertable, Couchbase, Membrain, MongoDB, Redis
   Data grid/cache
   VoltDB, Clustrix
 □Associated use case:
   Real time data processing of mixed read/write
    workloads
   Data caching
   Large-scale data ingestion
Database SPRAIN - Performance
 □User: AOL Advertising
 □Problem:
   Real-time data processing to support targeted
    advertising
 □Solution: Membase Server
   Segmentation analysis runs in CDH, results passed into
    Membase
   Make use of its sub-millisecond data delivery
   More time for analysis as part of a 40ms targeted ad
    response time
   Also real time log and event management

Database SPRAIN – Relaxed Consistency
 □Associated sub-driver: CAP theorem
   The need to relax consistency in order to maintain
    availability
 □Example project/service/vendor:
   Dynamo, Voldemort, Cassandra
   Amazon SimpleDB
 □Associated use case:
   Multi-data center replication
   Service availability
   Non-transactional data off-load


Database SPRAIN – Relaxed Consistency
 □User: Wordnik
 □Problem:
   MySQL too consistent: it blocked access to data during
    inserts and created numerous temp files to stay
    consistent.
 □Solution: MongoDB
   Single word definition contains multiple data items
    from various sources
   MongoDB stores data as a complete document
   Reduced the complexity of data storage


Database SPRAIN – Agility
 □ Associated sub-driver: Polyglot
  persistence
   Choose most appropriate storage technology for app
    in development
 □Example project/service/vendor
   MongoDB, CouchDB, Cassandra
   Google App Engine, SimpleDB, SQL Azure
 □Associated use case:
   Mobile/remote device synchronization
   Agile development
   Data caching
Database SPRAIN – Agility
 □ User: Dimagi BHOMA (Better Health
  Outcomes through Mentoring and
  Assessments) project
 □Problem:
   Deliver patient information to clinics despite a lack of
    reliable Internet connections
 □Solution: Apache CouchDB
   Replicates data from regional to national database
   When Internet connection, and power, is available
   Upload patient data from cell phones to local clinic


Database SPRAIN – Intricacy
 □ Associated sub-driver: Big data, total
  data
   Rising data volume, variety and velocity
 □Example project/service/vendor
   Neo4j GraphDB, InfiniteGraph
   Apache Cassandra, Hadoop,
   VoltDB, Clustrix
 □Associated use case:
   Social networking applications
   Geo-locational applications
   Configuration management database
Database SPRAIN – Intricacy
 □ User: Evident Software
 □Problem:
   Mapping infrastructure dependencies for application
    performance management
 □Solution: Neo4j
   Apache Cassandra stores performance data
   Neo4j used to map the correlations between different
    elements
   Enables users to follow relationships between
    resources while investigating issues


Database SPRAIN – Necessity
 □ Associated sub-driver: Open source
   The failure of existing suppliers to address the
    performance, scalability and flexibility requirements of
    large-scale data processing
 □ Example project/service/vendor
     BigTable, Dynamo, MapReduce, Memcached
     Hadoop, HBase, Hypertable, Cassandra, Membase
     Voldemort, Riak, BigCouch
     MongoDB, Redis, CouchDB, Neo4J
 □Associated use case:
   All of the above

Database SPRAIN – Necessity
 □BigTable: Google
 □Dynamo: Amazon
 □Cassandra: Facebook
 □HBase: Powerset
 □Voldemort: LinkedIn
 □Hypertable: Zvents
 □Neo4j: Windh Technologies
   Yahoo: Apache Hadoop and Apache HBase
   Digg: Apache Cassandra
   Twitter: Apache Cassandra, Apache Hadoop and
    FlockDB
NewSQL: The Best of Both "OldSQL" and "NoSQL"NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"
Sushant Choudhary
 
Kubernetes #6 advanced scheduling
Kubernetes #6   advanced schedulingKubernetes #6   advanced scheduling
Kubernetes #6 advanced scheduling
Terry Cho
 

NewSQL Database Overview

  • 1. NewSQL Database Overview, 민형기 (S-Core), hg.min@samsung.com, 2013. 2. 22.
  • 2. Contents: I. Why NewSQL? II. NewSQL Basic Concepts III. Types of NewSQL IV. NewSQL Summary
  • 4. Thinking – Extreme Data (QCon London 2012)
  • 5. Thinking – Traffic Explosion (source: Netflix in the Cloud, http://www.slideshare.net/adrianco/netflix-in-the-cloud-2011)
  • 6. Organizations need deeper insights (QCon London 2012)
  • 7. Solutions
      □ Buy high-end technology
      □ Hire more developers
      □ Use NoSQL
      □ Use NewSQL
  • 8. Solution – Buy High-End Technology (Oracle, IBM)
  • 9. Solution – Hire More Developers
      □ Application-level sharding
      □ Build your own replication middleware
      □ …
      (http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5)
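  • 10. Solutions – Use NoSQL
      □ A new class of non-relational databases
      □ Distributed architecture
      □ Horizontal scalability
      □ No fixed table schema
      □ Generally no JOIN, UPDATE, or DELETE operations
      □ Generally no transactions
      □ Generally no SQL support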
  • 12. MongoDB
      □ Document-oriented database
          – JSON-style documents: lists, maps, primitives
          – Schema-less
      □ Transaction = update of a single document
      □ Rich query language for dynamic queries
      □ Tunable writes: speed vs. reliability
      □ Highly scalable and available
  • 13. MongoDB Use Cases
      □ Use cases
          – High-volume writes
          – Complex data
          – Semi-structured data
      □ Major customers
          – Foursquare
          – Bit.ly, Intuit
          – SourceForge, NY Times
          – GILT Groupe, Evite
          – SugarCRM
  • 14. Apache Cassandra
      □ Column-oriented database / extensible row store
          – Think Row ~= java.util.SortedMap
      □ Transaction = update of a row
      □ Fast writes = append to a log
      □ Tunable reads/writes: consistency / availability
      □ Extremely scalable
          – Transparent and dynamic clustering
          – Rack- and datacenter-aware data replication
      □ CQL = "SQL"-like DDL and DML
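The two ideas on this slide, a row as a sorted map of columns and a write as a log append, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `TinyRowStore` and its methods are hypothetical names, everything lives in memory, and it does not describe how Cassandra itself is implemented.

```python
class TinyRowStore:
    """Toy row store: each row behaves like a sorted map of column -> value."""

    def __init__(self):
        self.commit_log = []   # append-only record of every write
        self.rows = {}         # row key -> {column name: value}

    def write(self, row_key, column, value):
        # A write is one cheap append plus an in-memory update.
        self.commit_log.append((row_key, column, value))
        self.rows.setdefault(row_key, {})[column] = value

    def read_row(self, row_key):
        # Columns come back ordered by name, like iterating a SortedMap.
        return sorted(self.rows.get(row_key, {}).items())

    def column_slice(self, row_key, start, end):
        # Range query over column names within a single row.
        return [(c, v) for c, v in self.read_row(row_key) if start <= c <= end]


store = TinyRowStore()
store.write("user:1", "name", "kim")
store.write("user:1", "city", "seoul")
store.write("user:1", "email", "kim@example.com")
```

Because columns are kept in name order, a slice such as `store.column_slice("user:1", "a", "f")` returns a contiguous range of columns from one row.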
  • 15. Apache Cassandra Use Cases
      □ Use cases
          – Big data
          – Multi-datacenter distributed database
          – Persistent cache
          – (Write-intensive) logging
          – High availability (writes)
      □ Major customers
          – Digg, Facebook, Twitter, Reddit, Rackspace
          – Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
          – "The largest production cluster has over 100 TB of data in over 150 machines." (Cassandra web site)
  • 16. Solutions – Use NewSQL
      □ A new class of relational databases
      □ Retain SQL and ACID transactions
      □ New, improved distributed architectures
      □ Support high scalability and performance
      □ NewSQL vendors: ScaleDB, NimbusDB, ..., VoltDB
  • 18. NewSQL Definition – Wikipedia
      "NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for OLTP workloads while still maintaining the ACID guarantees of a traditional single-node database system."
      http://en.wikipedia.org/wiki/NewSQL
  • 19. NewSQL Definition – 451 Group
      "A DBMS that delivers the scalability and flexibility promised by NoSQL while retaining the support for SQL queries and/or ACID, or to improve performance for appropriate workloads."
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 20. NewSQL Definition – Stonebraker
      □ SQL as the primary interface
      □ ACID support for transactions
      □ Non-locking concurrency control
      □ High per-node performance
      □ Parallel, shared-nothing architecture
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 21. NewSQL Categories
      □ New databases
      □ New MySQL storage engines
      □ Transparent clustering
  • 22. The evolving database landscape (OSBC)
  • 25. New Database
      □ Newly designed from scratch to achieve scalability and performance
          – A key consideration in improving performance is making non-disk storage (memory) or new kinds of disks (flash/SSD) the primary data store.
          – Some (hopefully minor) changes to application code are required, and data migration is still needed.
      □ Solutions
          – Software-only: VoltDB, NuoDB, Drizzle, Google Spanner
          – Supported as an appliance: Clustrix, TransLattice
      http://www.linuxforu.com/2012/01/newsql-handle-big-data/
  • 26. New MySQL Storage Engines
      □ Highly optimized storage engines for MySQL
      □ Scale better than built-in engines such as InnoDB
          – Upside: applications keep using the MySQL interface
          – Downside: data must be migrated from other databases
      □ Solutions
          – TokuDB, MemSQL, Xeround, Akiban, NDB
      http://www.linuxforu.com/2012/01/newsql-handle-big-data/
  • 27. Transparent Clustering
      □ Retain the OLTP databases in their original format, but provide a pluggable feature to
          – Cluster transparently
          – Ensure scalability
      □ Avoid the need to rewrite code or perform any data migration
      □ Solutions
          – Cluster transparently: Schooner MySQL, Continuent Tungsten, ScalArc
          – Ensure scalability: ScaleBase, dbShards
      http://www.linuxforu.com/2012/01/newsql-handle-big-data/
  • 28. NewSQL Products
      □ VoltDB
      □ Google Spanner
  • 29. VoltDB
      □ VoltDB, 2010, GPL/VoltDB Proprietary License, Java/C++
      □ Type: NewSQL, New Database
      □ Main point: in-memory database, Java stored procedures; VoltDB implements the design of the academic H-Store project
      □ Protocol: SQL
      □ Transactions: yes
      □ Data storage: memory
      □ Features
          – In-memory relational database
          – Durability through replication, snapshots, logging
          – Transparent partitioning
          – ACID-level consistency
          – Synchronous multi-master replication
          – Database replication
      http://voltdb.com/products-services/products, http://www.slideshare.net/chris.e.richardson/polygot-persistenceforjavadevs-jfokus2012reorgpptx
  • 30. VoltDB - Technical Overview
      □ "OLTP Through the Looking Glass": http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf
      □ VoltDB avoids the overhead of traditional databases
          – K-safety for fault tolerance: no logging
          – In-memory operation for maximum throughput: no buffer management
          – Partitions operate autonomously and single-threaded: no latching or locking
      □ Built to scale horizontally
  • 31. VoltDB - Partitions (1/3)
      □ One partition per physical CPU core
          – Each physical server has multiple VoltDB partitions
      □ Data: two types of tables
          – Partitioned
              · A single column serves as the partitioning key
              · Rows are spread across all VoltDB partitions by the partition column
              · Transactional data (high frequency of modification)
          – Replicated
              · All rows exist within all VoltDB partitions
              · Relatively static data (low frequency of modification)
      □ Code: two types of work, both ACID
          – Single-partition
              · All insert/update/delete operations stay within a single partition
              · The majority of the transactional workload
          – Multi-partition
              · CRUD against partitioned tables across multiple partitions
              · Insert/update/delete on replicated tables
  • 32. VoltDB - Partitions (2/3)
      □ Single-partition vs. multi-partition
          – select count(*) from orders where customer_id = 5  (single-partition)
          – select count(*) from orders where product_id = 3  (multi-partition)
          – insert into orders (customer_id, order_id, product_id) values (3,303,2)  (single-partition)
          – update products set product_name = 'spork' where product_id = 3  (multi-partition)
      □ Partitioned table orders (customer_id is the partition key); columns: customer_id, order_id, product_id
              Partition 1    Partition 2    Partition 3
              1 101 2        2 201 1        3 201 1
              1 101 3        5 501 3        6 601 1
              4 401 2        5 502 2        6 601 2
      □ Replicated table products (product_id, product_name); every partition holds the same rows
              1 knife
              2 spoon
              3 fork
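The single-partition versus multi-partition distinction on this slide can be sketched as a small classifier. The names `partition_for` and `classify` are hypothetical. The modulo placement rule is an assumption chosen only because it reproduces the slide's example layout (VoltDB's real hashing may differ), and treating reads of replicated tables as single-partition is likewise an assumption.

```python
NUM_PARTITIONS = 3
PARTITION_COLUMN = {"orders": "customer_id"}   # partitioned table -> partitioning column
REPLICATED_TABLES = {"products"}               # full copy on every partition


def partition_for(customer_id):
    # Assumed placement rule: customers 1 and 4 land on index 0, 2 and 5 on 1, 3 and 6 on 2.
    return (customer_id - 1) % NUM_PARTITIONS


def classify(table, key_column, is_write):
    """Classify a statement that identifies its rows by key_column."""
    if table in REPLICATED_TABLES:
        # Any copy can answer a read; a write must change every copy.
        return "multi-partition" if is_write else "single-partition"
    if PARTITION_COLUMN[table] == key_column:
        # The partitioning key pins the statement to exactly one partition.
        return "single-partition"
    # Any other column can match rows on every partition.
    return "multi-partition"
```

Applied to the four statements on the slide, `classify` gives the same labels: the two statements keyed on `customer_id` are single-partition, while the `product_id` filter on `orders` and the update of `products` are multi-partition.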
  • 33. VoltDB - Partitions (3/3)
      □ Looking inside a VoltDB partition
          – Each partition contains data and an execution engine.
          – The execution engine contains a work queue for transaction requests.
          – Requests are executed sequentially (single-threaded).
      □ Partition contents: table data and index data
          – A complete copy of all replicated tables
          – A portion of the rows (about 1/partitions) of all partitioned tables
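A rough sketch of the partition structure described above: private data plus a work queue that is drained one request at a time. `Partition`, `submit`, `run_pending`, and `deposit` are hypothetical names for illustration, not VoltDB APIs.

```python
from collections import deque


class Partition:
    """Toy partition: private data plus a queue drained by a single thread."""

    def __init__(self):
        self.data = {}         # this partition's share of the rows
        self.queue = deque()   # pending transaction requests

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run_pending(self):
        # Requests run strictly one after another, so no two requests ever
        # touch self.data at the same time and no locks or latches are needed.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.data, *args))
        return results


def deposit(data, account, amount):
    data[account] = data.get(account, 0) + amount
    return data[account]


partition = Partition()
partition.submit(deposit, "acct-1", 10)
partition.submit(deposit, "acct-1", 5)
balances = partition.run_pending()
```

The trade-off in this design is that one slow request delays everything queued behind it on the same partition, which is why short stored procedures suit it well.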
  • 34. VoltDB - Compiling
      □ The database is constructed from:
          – The schema (DDL):
                CREATE TABLE HELLOWORLD (
                  HELLO CHAR(15),
                  WORLD CHAR(15),
                  DIALECT CHAR(15),
                  PRIMARY KEY (DIALECT)
                );
          – The workload (Java stored procedures); the listing is cut off at the slide edge:
                import org.voltdb.*;
                @ProcInfo(
                  partitionInfo = "HELLOWORLD.DIA…
                  singlePartition = true
                )
                public class Insert extends VoltPr…
                  public final SQLStmt sql =
                    new SQLStmt("INSERT INTO HELLO…
                  public VoltTable[] run( String hel…
          – The project (users, groups, partitioning), Project.xml, also cut off:
                <?xml version="1.0"?>
                <project>
                  <database name='data…
                    <schema path='ddl.…
                    <partition table='…
                  </database>
                </project>
      □ VoltCompiler creates the application catalog
          – Copy to servers along with 1 .jar and 1 .so
          – Start servers
  • 35. VoltDB - Transactions
      □ All access to VoltDB is via Java stored procedures (Java + SQL)
      □ A single invocation of a stored procedure is a transaction (committed on success)
      □ Limits round trips between DBMS and application
      □ High-performance client applications communicate asynchronously with VoltDB
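The rule that one procedure invocation is one transaction, committed on success, can be illustrated with Python's built-in `sqlite3` module standing in for the database. This is a stand-in sketch, not VoltDB: `call_procedure`, `place_order`, `new_database`, and the table layout are all hypothetical.

```python
import sqlite3


def call_procedure(conn, procedure, *args):
    """Run one procedure call as one transaction."""
    try:
        with conn:  # sqlite3 commits if the block finishes, rolls back if it raises
            return ("committed", procedure(conn, *args))
    except ValueError as exc:
        return ("rolled back", str(exc))


def place_order(conn, customer_id, product_id, quantity):
    # Both statements belong to the same transaction: either both apply or neither does.
    conn.execute(
        "INSERT INTO orders (customer_id, product_id, quantity) VALUES (?, ?, ?)",
        (customer_id, product_id, quantity),
    )
    cur = conn.execute(
        "UPDATE stock SET on_hand = on_hand - ? WHERE product_id = ? AND on_hand >= ?",
        (quantity, product_id, quantity),
    )
    if cur.rowcount != 1:
        raise ValueError("insufficient stock")
    return quantity


def new_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, product_id INTEGER, quantity INTEGER)")
    conn.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, on_hand INTEGER)")
    conn.execute("INSERT INTO stock VALUES (3, 5)")
    conn.commit()
    return conn
```

Calling `call_procedure(conn, place_order, 1, 3, 10)` against the five units in stock fails inside the procedure, so the order row inserted a moment earlier is rolled back along with it. Bundling both statements into one call is also what limits round trips between application and database.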
  • 36. VoltDB - Clusters/Durability
      □ Scalability
          – Increase RAM in servers to add capacity
          – Add servers to increase performance and capacity
          – Each additional node consistently adds about 90% of single-node performance
      □ High availability
          – K-safety for redundancy
      □ Snapshots
          – Scheduled, continuous, on demand
      □ Spooling to a data warehouse
      □ Disaster recovery/WAN replication (future)
          – Asynchronous replication
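Read as a formula, the 90% figure above suggests near-linear scaling. A back-of-the-envelope sketch, assuming the figure applies uniformly to every added node (the function name and the sample throughput are hypothetical):

```python
def estimated_throughput(single_node_tps, nodes, efficiency=0.9):
    """First node counts fully; each extra node adds `efficiency` of one node."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return single_node_tps * (1 + efficiency * (nodes - 1))


# Hypothetical cluster: a single node handles 100,000 transactions per second.
four_node_estimate = estimated_throughput(100_000, 4)
```

Under this reading, four nodes deliver roughly 3.7 times the throughput of one. Real clusters depend heavily on the share of multi-partition work, so this is an estimate rather than a guarantee.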
  • 37. Google Spanner
      □ Google, 2012, paper, C++
      □ Type: NewSQL, New Database
      □ Main point: Google's scalable, multi-version, globally distributed, and synchronously replicated database
      □ Distributed multi-version database
          – General-purpose transactions (ACID)
          – SQL query language
          – Schematized tables
          – Semi-relational data model
      □ Running in production
          – Storage for Google's ad data
          – Replaced a sharded MySQL database
      http://research.google.com/archive/spanner.html
  • 38. Google Spanner Overview
      □ Feature: lock-free distributed read transactions
      □ Property: external consistency of distributed transactions
          – First system at global scale
      □ Implementation: integration of concurrency control, replication, and 2PC
          – Correctness and performance
      □ Enabling technology: TrueTime
          – Interval-based global time
      http://research.google.com/archive/spanner.html
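Interval-based time is what makes external consistency achievable: the clock returns a range guaranteed to contain the true time, and a commit waits until its timestamp has definitely passed. The sketch below simulates that "commit wait" idea from the Spanner paper with a fake clock. `FakeTrueTime`, the fixed uncertainty bound, and the microsecond units are assumptions for illustration, not Google's implementation.

```python
class FakeTrueTime:
    """Simulated clock that reports an interval [earliest, latest] around true time."""

    def __init__(self, epsilon_us):
        self.epsilon_us = epsilon_us   # assumed fixed clock uncertainty, in microseconds
        self.true_now_us = 0

    def advance(self, microseconds):
        self.true_now_us += microseconds

    def now(self):
        return (self.true_now_us - self.epsilon_us, self.true_now_us + self.epsilon_us)

    def after(self, timestamp_us):
        # True only once the timestamp is in the past for every possible clock error.
        earliest, _latest = self.now()
        return earliest > timestamp_us


def commit(clock, step_us=1000):
    """Pick a commit timestamp, then wait until it is guaranteed to have passed."""
    _earliest, latest = clock.now()
    commit_ts = latest
    waited_us = 0
    while not clock.after(commit_ts):   # commit wait, roughly 2 * epsilon
        clock.advance(step_us)
        waited_us += step_us
    return commit_ts, waited_us


clock = FakeTrueTime(epsilon_us=4000)
first_ts, first_wait = commit(clock)
second_ts, second_wait = commit(clock)
```

Because the first commit is not acknowledged until its timestamp has passed everywhere, any transaction that starts afterwards receives a larger timestamp, which is the external consistency property named on the slide. The cost is a wait of about twice the clock uncertainty per commit.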
  • 39. Design Goals for Spanner (http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
  • 40. MySQL Cluster – NDB Architecture (http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-overview.html)
  • 41. Schooner MySQL Active Cluster (http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-overview.html)
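  • 44. Three Trends in the Database Industry
      □ NoSQL databases
          – Designed to meet requirements such as the scalability of a distributed architecture and to fit schema-less data management needs.
      □ NewSQL databases
          – Designed either to meet requirements such as the scalability of a distributed architecture, or to improve performance where horizontal scaling is not needed.
      □ Data grid/cache products
          – Designed to keep data in memory to increase application and database performance.
  • 45. Conclusion
      □ Many solutions exist for storing data
      □ Drop the assumption that Oracle and MySQL are the only options
      □ First identify the system's data characteristics and requirements (CAP, ACID/BASE)
      □ Apply several solutions within one system
          – Small-scale data with complex relationships: RDBMS
          – Large-scale data processed in real time: NoSQL, NewSQL
          – Large-scale data kept for storage: Hadoop and similar
      □ Choose an appropriate solution
      □ Adopt a solution only after verifying the issues that can arise in operation
      □ Most NewSQL solutions are still in beta (a hasty choice can backfire)
      □ Verification is needed down to the level of the solution's program code
      □ Secure the stability of the NewSQL solution
      □ The stability of the solution itself must be verified; it does not yet match the stability of current DBMSs
      □ Apply it only after securing a reliable way to store the data
      □ Developers with operational and development experience are hard to find
      □ Select a NewSQL product that fits the requirements
      □ Run a pilot first rather than starting with critical systems
      □ Verify the selected solution and build in-house expertise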
  • 47. Appendix
  • 48. Early 2000s
      □ All the big players were heavyweight and expensive.
          – Oracle, DB2, Sybase, SQL Server, etc.
      □ Open-source databases were missing important features.
          – Postgres, mSQL, and MySQL.
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 49. Early 2000s: eBay Architecture (http://highscalability.com/ebay-architecture)
  • 50. Early 2000s: eBay Architecture
      □ Push functionality to the application:
          – Joins
          – Referential integrity
          – Sorting
      □ No distributed transactions
      http://highscalability.com/ebay-architecture
  • 51. Mid 2000s
      □ MySQL + InnoDB is widely adopted by new web companies:
          – Supported transactions, replication, recovery.
          – Still must use custom middleware to scale out across multiple machines.
          – Memcache for caching queries.
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 52. Mid 2000s: Facebook Architecture (http://www.techthebest.com/2011/11/29/technology-used-in-facebook/)
  • 53. Mid 2000s: Facebook Architecture
      □ Scale out using custom middleware.
      □ Store ~75% of the database in Memcache.
      □ No distributed transactions.
      http://www.techthebest.com/2011/11/29/technology-used-in-facebook/
  • 54. Late 2000s
      □ MySQL + InnoDB is widely adopted by new web companies:
          – Supported transactions, replication, recovery.
          – Still must use custom middleware to scale out across multiple machines.
          – Memcache for caching queries.
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 55. Late 2000s: MongoDB Architecture (http://sett.ociweb.com/sett/settAug2011.html)
  • 56. Late 2000s: MongoDB Architecture
      □ Easy to use.
      □ Becoming more like a DBMS over time.
      □ No transactions.
      http://sett.ociweb.com/sett/settAug2011.html
  • 57. Early 2010s
      □ New DBMSs that can scale across multiple machines natively and provide ACID guarantees.
          – MySQL middleware
          – Brand-new architectures
      http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
  • 59. Database SPRAIN
      □ "An injury to ligaments... caused by being stretched beyond normal capacity"
      □ Six key drivers for NoSQL/NewSQL/DDG adoption
          – Scalability
          – Performance
          – Relaxed consistency
          – Agility
          – Intricacy
          – Necessity
  • 60. Database SPRAIN - Scalability
      □ Associated sub-driver: hardware economics
          – Scale-out across clusters of commodity servers
      □ Example projects/services/vendors
          – BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop
          – Amazon RDS, Xeround, SQL Azure, NimbusDB
          – Data grid/cache
      □ Associated use cases
          – Large-scale distributed data storage
          – Analysis of continuously updated data
          – Multi-tenant PaaS data layer
  • 61. Database SPRAIN - Scalability
      □ User: StumbleUpon
      □ Problem
          – Scaling problems with the recommendation engine on MySQL
      □ Solution: HBase
          – Started using Apache HBase to provide real-time analytics on Su.pr
          – MySQL lacked the performance headroom and scale
          – Multiple benefits, including not having to declare a schema
          – Enables the data to be used for multiple applications and use cases
  • 62. Database SPRAIN - Performance
      □ Associated sub-driver: MySQL limitations
          – Inability to perform consistently at scale
      □ Example projects/services/vendors
          – Hypertable, Couchbase, Membrain, MongoDB, Redis
          – Data grid/cache
          – VoltDB, Clustrix
      □ Associated use cases
          – Real-time data processing of mixed read/write workloads
          – Data caching
          – Large-scale data ingestion
  • 63. Database SPRAIN - Performance
      □ User: AOL Advertising
      □ Problem
          – Real-time data processing to support targeted advertising
      □ Solution: Membase Server
          – Segmentation analysis runs in CDH, with results passed into Membase
          – Makes use of its sub-millisecond data delivery
          – Leaves more time for analysis within a 40 ms targeted ad response time
          – Also used for real-time log and event management
  • 64. Database SPRAIN – Relaxed Consistency
      □ Associated sub-driver: CAP theorem
          – The need to relax consistency in order to maintain availability
      □ Example projects/services/vendors
          – Dynamo, Voldemort, Cassandra
          – Amazon SimpleDB
      □ Associated use cases
          – Multi-datacenter replication
          – Service availability
          – Non-transactional data off-load
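The consistency versus availability trade-off above is often expressed with the quorum rule used by Dynamo-style stores: with N replicas, a read quorum R and a write quorum W, a read is guaranteed to overlap the latest write only when R + W > N. A minimal sketch with hypothetical function names:

```python
def reads_see_latest_write(n, r, w):
    """True when every read quorum overlaps every write quorum (R + W > N)."""
    return r + w > n


def replicas_that_may_fail(n, r, w):
    """How many replicas can be down while both reads and writes still succeed."""
    return n - max(r, w)


# Three replicas: quorum reads and writes versus the most relaxed setting.
strict = (reads_see_latest_write(3, 2, 2), replicas_that_may_fail(3, 2, 2))
relaxed = (reads_see_latest_write(3, 1, 1), replicas_that_may_fail(3, 1, 1))
```

With R = W = 2 the store stays consistent but tolerates only one failed replica. With R = W = 1 it tolerates two failures, at the price of reads that may return stale data.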
  • 65. Database SPRAIN – Relaxed Consistency
      □ User: Wordnik
      □ Problem
          – MySQL was too consistent: it blocked access to data during inserts and created numerous temp files to stay consistent.
      □ Solution: MongoDB
          – A single word definition contains multiple data items from various sources
          – MongoDB stores data as a complete document
          – Reduced the complexity of data storage
  • 66. Database SPRAIN – Agility
      □ Associated sub-driver: polyglot persistence
          – Choose the most appropriate storage technology for the app in development
      □ Example projects/services/vendors
          – MongoDB, CouchDB, Cassandra
          – Google App Engine, SimpleDB, SQL Azure
      □ Associated use cases
          – Mobile/remote device synchronization
          – Agile development
          – Data caching
  • 67. Database SPRAIN – Agility
      □ User: Dimagi BHOMA (Better Health Outcomes through Mentoring and Assessments) project
      □ Problem
          – Deliver patient information to clinics despite a lack of reliable Internet connections
      □ Solution: Apache CouchDB
          – Replicates data from regional to national database
          – Runs when an Internet connection, and power, is available
          – Uploads patient data from cell phones to the local clinic
  • 68. Database SPRAIN – Intricacy
      □ Associated sub-driver: big data, total data
          – Rising data volume, variety and velocity
      □ Example projects/services/vendors
          – Neo4j, GraphDB, InfiniteGraph
          – Apache Cassandra, Hadoop
          – VoltDB, Clustrix
      □ Associated use cases
          – Social networking applications
          – Geo-locational applications
          – Configuration management database
  • 69. Database SPRAIN – Intricacy
      □ User: Evident Software
      □ Problem
          – Mapping infrastructure dependencies for application performance management
      □ Solution: Neo4j
          – Apache Cassandra stores performance data
          – Neo4j is used to map the correlations between different elements
          – Enables users to follow relationships between resources while investigating issues
  • 70. Database SPRAIN – Necessity
      □ Associated sub-driver: open source
          – The failure of existing suppliers to address the performance, scalability and flexibility requirements of large-scale data processing
      □ Example projects/services/vendors
          – BigTable, Dynamo, MapReduce, Memcached
          – Hadoop, HBase, Hypertable, Cassandra, Membase
          – Voldemort, Riak, BigCouch
          – MongoDB, Redis, CouchDB, Neo4j
      □ Associated use cases
          – All of the above
  • 71. Database SPRAIN – Necessity
      □ BigTable: Google
      □ Dynamo: Amazon
      □ Cassandra: Facebook
      □ HBase: Powerset
      □ Voldemort: LinkedIn
      □ Hypertable: Zvents
      □ Neo4j: Windh Technologies
          – Yahoo: Apache Hadoop and Apache HBase
          – Digg: Apache Cassandra
          – Twitter: Apache Cassandra, Apache Hadoop and FlockDB