Fit For Purpose:
The New Database Revolution




     Mark Madsen & Robin Bloor
Introduction
Significant and revolutionary changes are taking place
in database technology

In order to investigate and analyze these changes and
where they may lead, The Bloor Group has teamed up
with Third Nature to launch an Open Research
project.

This is the first webinar in a series of webinars and
research activities that will comprise the project

All research will be made available through our web
site: Databaserevolution.com
Sponsors of This Research
General Webinar Structure
What & why

History of Database Part 1: How we got to the RDBMS

History of Database Part 2: Relational and Post-relational

Food For Thought: Issues, Problems, Assumptions,
Challenges

Current Conclusions: Insofar as we have any
Change? Why?
Increased data volumes

Significant hardware changes

Database product innovation

New workloads, different data structures

Established database concepts are being challenged

Market Forces can drive change
Data Volumes: Moore’s Law Cubed
Moore’s Law suggests that CPU power
increases 10-fold every 6 years (and other
technologies have stayed in step to some
degree)
Large database volumes have grown 1000-
fold every 6 years:
  In 1992, measured in megabytes
  In 1998 measured in gigabytes
  In 2004 measured in terabytes
  In 2010 measured in petabytes

Exabytes by 2016?
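
The "cubed" in the slide title follows from the two figures quoted above: 1000 is 10 to the power of 3. A minimal sketch of that arithmetic, using only the slide's own numbers (the function and variable names are illustrative):

```python
# Compare the two growth rates quoted on the slide, expressed per year.
CPU_GROWTH_PER_6_YEARS = 10      # from the slide: CPU power, 10-fold every 6 years
DATA_GROWTH_PER_6_YEARS = 1000   # from the slide: large databases, 1000-fold every 6 years


def annual_factor(growth_per_period: float, years: int = 6) -> float:
    """Convert growth over a multi-year period into an equivalent yearly factor."""
    return growth_per_period ** (1 / years)


cpu_yearly = annual_factor(CPU_GROWTH_PER_6_YEARS)
data_yearly = annual_factor(DATA_GROWTH_PER_6_YEARS)

# 1000 == 10 ** 3, so the yearly data factor is the cube of the yearly CPU factor.
print(f"CPU power grows about {cpu_yearly:.2f}x per year")
print(f"Large database volume grows about {data_yearly:.2f}x per year")
print(f"Gap opened up over 6 years: {DATA_GROWTH_PER_6_YEARS // CPU_GROWTH_PER_6_YEARS}x")
```

Read this way, data volume outgrows CPU power by a factor of 100 in each six-year step, which is the pressure the rest of the deck responds to.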
Hardware Changes
Moore’s Law now proceeds by adding cores
rather than by increasing clock speed.

Computer grids using commodity servers are
now relatively inexpensive

Parallelism is now on the rise and will eventually
become the normal mode of processing

Memory is about 1 million times faster than
disk and random reads have become very
expensive in respect of latency

SSDs are augmenting and may eventually replace
spinning disk
Majority of Data Becomes Historical Data Over Time
(or even all historic when no longer active)

[Figure: transactional data over time, split into active and static data, shown against application performance and "Cost $$$ and PAIN". Labels: Active 70%, Static 30%; 10%, 90%, 100%. Image courtesy: RainStor]
Market Forces
A new set of products appears

They include some fundamental innovations

A few are sufficiently popular to last

Fashion and marketing drive greater adoption

Product defects begin to be addressed

They eventually challenge the dominant products
Section 1:
         History Part 1
 Pre-relational and Relational
What we had in prior technology regimes

Where we came from

What we traded away and why
The Dawn of Database
Schema defines logical structure of data
   The schema enables extensive reuse
   Logical structure vs Physical structure

ACID properties
   Atomicity – transactions must be
   atomic
   Consistency – a transaction ensures
   consistency
   Isolation – a transaction runs in
   isolation
   Durability – a completed transaction
   causes permanent change to data
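
Atomicity and consistency can be illustrated with a small transfer between two accounts. This is a minimal sketch using Python's standard sqlite3 module. The `account` table, the `CHECK` rule and the `transfer` helper are hypothetical, invented for illustration. An in-memory database is used, so durability is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK rule, the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # a transfer that fits the rule
print(transfer(conn, 1, 2, 500), balances(conn))  # an overdraft: nothing changes
```

The credit is deliberately applied before the debit so that a failed transfer leaves a half-finished change for the rollback to undo.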
Database Performance Bottlenecks
  CPU saturation

  Memory saturation

  Disk I/O channel saturation

  Locking

  Network saturation

  Parallelism – inefficient load balancing
The Joys of SQL?
SQL is a declarative query language
targeted at data organized in two-
dimensional tables.
It enables set operations on those
tables via: Select, Project and Join
operations which can be qualified
(Order By, etc.)
It imposes some limitations on the
logical model of data.
It can create a barrier between the user
and the data....
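
The three operations named above can be seen in a single query. A minimal sketch using Python's standard sqlite3 module; the `customer` and `orders` tables and their contents are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'Ada', 'EU'), (2, 'Bob', 'US'), (3, 'Cy', 'EU');
    INSERT INTO orders VALUES (10, 1, 250), (11, 2, 40), (12, 1, 75), (13, 3, 120);
""")

rows = conn.execute("""
    SELECT c.name, o.total                      -- project: keep two columns
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id    -- join: match rows across tables
    WHERE c.region = 'EU' AND o.total >= 100    -- select: filter rows
    ORDER BY o.total DESC                       -- qualifier: order the result
""").fetchall()

print(rows)
```

The query states what is wanted, not how to fetch it. The access path is left to the database.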
The Ordering Of Data
“A data set is an unordered collection of
unique, non-duplicated items.”

Data is naturally ordered by time if by
nothing else.
   Events are ordered by time.
   Changes to entities are ordered by
   time

Having an inherent physical order to data
can save many processing cycles in some
areas of application

This is particularly the case for time
series applications.
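
The saving from physical order can be sketched with a time-range lookup. The event log below is hypothetical, and Python lists stand in for stored data. When rows are known to be in time order, two binary searches replace a test of every row.

```python
from bisect import bisect_left, bisect_right

# Hypothetical event log: (timestamp, value) pairs appended in time order.
events = [(t, t * 10) for t in range(0, 100_000, 5)]
timestamps = [t for t, _ in events]


def range_scan(start, end):
    """No order assumed: every row is tested against the predicate."""
    return [e for e in events if start <= e[0] <= end]


def range_ordered(start, end):
    """Order assumed: two binary searches, then one contiguous slice."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return events[lo:hi]


# Both approaches return the same rows; only the work done differs.
print(len(range_scan(500, 600)), len(range_ordered(500, 600)))
```

The scan does work proportional to the whole log. The ordered lookup does work proportional to the logarithm of the log size plus the size of the result.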
The RDBMS Optimizer
The database can know how to access data better and
faster than any programmer…
   It wasn’t true
   It became true
   It isn’t always true

It only optimizes for persistent data
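
One way to watch an optimizer choose an access path is to ask it for its plan before and after an index exists. This is a minimal sketch using SQLite through Python's standard sqlite3 module. The `events` table and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (ts, payload) VALUES (?, ?)", [(i, "x") for i in range(1000)]
)

QUERY = "SELECT payload FROM events WHERE ts = 500"


def plan(conn, sql):
    """Return the optimizer's plan as text; the last column of each row describes a step."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(conn, QUERY)  # no index on ts yet, so the table is scanned
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
after = plan(conn, QUERY)   # same SQL text, different access path

print("before:", before)
print("after: ", after)
```

The query text never changes. The programmer adds an index and the optimizer picks it up, which is the sense in which the database "knows" how to access the data.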
Section 2:
       History Part 2
Relational and Post-relational
Where we are today: oldsql, newsql and nosql

The finalizing of the distributed web architecture

Rediscovery of the past, when we had purpose-built data stores
of different types, with a twist.

Revisiting of old arguments

Challenging old assumptions
Database Product Innovation
Column Stores and Query-biased Workloads
      Column store databases are still RDBMSs

      Most SQL queries do not require all columns of a table
         So partitioning data by columns (vertically) will usually
         be better than partitioning by rows (horizontally)
         And data compression can be more efficient

      Column store databases scale up [somewhat] better
      than traditional RDBMSs depending on workload,
      queries, etc.

      Column store <> column family
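
The two claims above, that queries rarely need every column and that columns compress well, can be sketched with plain Python structures. The sample rows are hypothetical. In-memory lists only mimic the layouts; they do not model real disk I/O.

```python
from itertools import groupby

# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "EU", "amount": 80},
    {"id": 3, "region": "EU", "amount": 50},
    {"id": 4, "region": "US", "amount": 200},
    {"id": 5, "region": "US", "amount": 10},
    {"id": 6, "region": "US", "amount": 40},
]

# Column layout: each column's values are kept together.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(rows):
    """Every whole record is visited even though one field is needed."""
    return sum(r["amount"] for r in rows)


def total_amount_column_store(columns):
    """Only the 'amount' column is touched."""
    return sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


print(total_amount_row_store(rows), total_amount_column_store(columns))
print(run_length_encode(columns["region"]))
```

Run-length encoding is only one of several compression schemes a column store might use. It is shown here because values within a single column tend to repeat.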
New Lamps For Old
   Google, Yahoo!, Facebook and others had data management
   problems that established products did not cater for: Big Data,
   unusual data structures, new workloads

   They had money to invest and some smart engineers

   They built their own solutions: Big Table, MapReduce,
   Cassandra, etc.

   In doing so, they provoked a database revolution



In other words, the internet happened and some people noticed.
A random selection of databases
Sybase IQ, ASE             EnterpriseDB   Algebraix
Teradata, Aster Data       LucidDB        Intersystems Caché
Oracle, RAC                Vectorwise     Streambase
Microsoft SQLServer, PDW   MonetDB        SQLStream
IBM DB2s, Netezza          Exasol         Coral8
Paraccel                   Illuminate     Ingres
Kognitio                   Vertica        Postgres
EMC/Greenplum              InfiniDB       Cassandra
Oracle Exadata             1010 Data      CouchDB
SAP HANA                   SAND           Mongo
Infobright                 Endeca         Hbase
MySQL                      Xtreme Data    Redis
MarkLogic                  IMS            RainStor
Tokyo Cabinet              Hive           Scalaris
                  And a few hundred more…
Section 3: Database Discussion Topics
The core post-relational changes
in assumptions.

Key aspects of the code-
database mismatch

Reclassifying pre-relational as
NoSQL

Complex data, emergent
structure, types and schemas

Cloud and databases, uh-oh?
Changing Assumptions
One single scalable piece of reliable hardware

You really need a schema all the time

A handful of discrete types are all anybody will ever need, and
when they need more they can code UDTs and UDFs in C++

SQL is the optimal way to write and retrieve data

ACID always applies

Data integrity is a key component of a database
No SQL, New Concepts
Maybe SQL is an unacceptable constraint

Maybe SQL is unnecessary for some fit-for-purpose databases,
or perhaps just unimportant

Maybe the impedance mismatch can be avoided

Maybe a formal schema is a constraint

Maybe ACID properties can be compromised
The “Impedance Mismatch”
The RDBMS stores data organized
according to table structures

The OO programmer manipulates data
organized according to complex object
structures, which may have specific
methods associated with them.

The data does not simply map to the
structure it has within the database

Consequently a mapping activity is
necessary to get and put data

Basically: hierarchies, types, result sets,
crappy APIs, language bindings, tools
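
The mapping activity can be made concrete with a small sketch: one object with a nested list has to be split across two tables on the way in and reassembled on the way out. The `Order` and `LineItem` classes, the table names and the `save` and `load` helpers are all hypothetical, using only Python's standard library.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)

    def total_units(self) -> int:
        """Behaviour lives on the object; the tables know nothing about it."""
        return sum(item.quantity for item in self.items)


def save(conn, order):
    """Flatten one object graph into rows in two tables."""
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
        conn.executemany(
            "INSERT INTO line_item VALUES (?, ?, ?)",
            [(order.order_id, item.sku, item.quantity) for item in order.items],
        )


def load(conn, order_id):
    """Rebuild the object graph from two separate result sets."""
    oid, customer = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    items = [
        LineItem(sku, quantity)
        for sku, quantity in conn.execute(
            "SELECT sku, quantity FROM line_item WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
    ]
    return Order(oid, customer, items)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE line_item (order_id INTEGER, sku TEXT, quantity INTEGER)")

order = Order(1, "Ada", [LineItem("A-1", 2), LineItem("B-7", 1)])
save(conn, order)
loaded = load(conn, 1)
print(loaded, loaded.total_units())
```

Everything in `save` and `load` is mapping code that exists only because the object shape and the table shape differ. Object-relational mappers generate this kind of code, and stores that persist the object shape directly aim to remove it.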
NoSQL Directions: Technology Types
  Some NoSQL DBs do not attempt to provide all ACID properties.
  (Atomicity, Consistency, Isolation, Durability)

  Some NoSQL DBs deploy a distributed scale-out architecture with
  data redundancy.

  XML DBMS using XQuery are NoSQL DBs

  Some document stores are NoSQL DBs (OrientDB, Terrastore,
  etc.)

  Object databases are NoSQL DBs (Gemstone, Objectivity,
  ObjectStore, etc.)

  Key value stores = schema-less stores (Cassandra, MongoDB,
  Berkeley DB, etc.)

  Graph DBMS (DEX, OrientDB, etc.) are NoSQL DBs

  Large data pools (BigTable, Hbase, Mnesia, etc.) are NoSQL DBs
The Cloud, uh-oh
Negative implications for shared-everything databases
that have scalability needs
There are architectural implications and possible
incompatibilities for shared-nothing databases too
Running at scale and not at scale (concurrency, ingest volumes
and frequencies, etc.) are different problems
How does the database permit dynamic provisioning,
elasticity (+/-), etc?
The new database problems for IT
 …are probably like old problems for people who went
 through the Unix client-server era.
 Best of breed, no standards for anything, “polyglot
 persistence” = silos on steroids, data integration
 challenges, shifting data movement architectures
Recognize Tradeoffs
Read consistency vs programmatic correction
Schema vs a program to interpret each data structure
Standard access interface vs an API for each type of store
Data integrity enforcement vs programmatic control
Query performance for arbitrary queries vs planned access paths
Space efficiency vs simplicity / latency
Network transfer performance vs simplicity / latency
For the primary goals of
   Horizontal scale
   Looser coupling
   Flexibility for developers building and changing applications
Information Management Through Human History


         New technology development
                    creates
             New methods to cope
                    creates
     New information scale and availability
                   creates…
Big Data
Big data?




      Unstructured data isn’t 
      really unstructured.
      The problem is that this 
      data is unmodeled.
The holy grail of databases under current market hype




The other problem is that 
we’re talking mostly about 
computation over data 
when we talk about “big 
data” and analytics, 
another potential 
mismatch.
Conclusion
Wherein all is revealed, or ignorance exposed

Best of breed is back, baby

Workload types and characteristics

The importance of understanding workload in order to select
technology

Pragmatism, babies and bathwater
Solving the Problem Depends on the Diagnosis
Types of workloads
Write-biased:
  ▪ OLTP
  ▪ OLTP, batch
  ▪ OLTP, lite
  ▪ Object persistence
  ▪ Data ingest, batch
  ▪ Data ingest, real-time

Read-biased:
  ▪ Query
  ▪ Query, simple retrieval
  ▪ Query, complex
  ▪ Query-hierarchical / object / network
  ▪ Analytic

Mixed?
The real challenge is that few systems are all one 
workload.

Who said you have to write everything to one 
place, and read everything from the same place?
SOA offers a partial way out, and is how many 
apps work.
You must understand your 
workload ‐ throughput and 
response time requirements 
aren’t enough.
  ▪ 100 simple queries accessing 
    month‐to‐date data
  ▪ 90 simple queries accessing 
    month‐to‐date data and 10 
    complex queries using two 
    years of history
  ▪ Hazard calculation for the 
    entire customer master
  ▪ Performance problems are 
    rarely due to a single factor. 
Seven Key Query Workload Elements
These characteristics help determine suitability of 
technologies to improve query performance.
  1. Retrieval – how much data comes back?
  2. Selectivity – how much data is filtered?
  3. Repetition – how often is the same query run?
  4. Concurrency – how many queries at once?
  5. Data volume – how much data is being queried?
  6. Query complexity – how many joins, 
     aggregations, columns, filters, subselects, etc.?
  7. Computational complexity – how much 
     computation is performed over the data?
Characteristics of BI workloads

Workload                    Selectivity   Retrieval         Repetition    Complexity
Reporting / BI              Moderate      Low               Moderate      Moderate
Dashboards / scorecards     Moderate      Low               High          Low
Ad-hoc query and analysis   Low to high   Moderate to low   Low           Low to moderate
Analytics (batch)           Low           High              Low to high   Low*
Analytics (inline)          High          Low               High          Low*
Operational / embedded BI   High          Low               High          Low

* Low for retrieving the data, high if doing analytics in SQL
Choosing Hardware Architectures
Compute and data sizes are key requirements
[Figure: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Regions, from low to high on both axes: PC; shared everything or shared disk; shared nothing; MR and related.]
Choosing Hardware Architectures
Today’s reality, and true for a while in most businesses.
[Figure: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Annotation near the low end of both axes: "The bulk of the market resides here!"]
Choosing Hardware Architectures
Today’s reality, and true for a while in most businesses.
[Figure: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Annotation near the low end of both axes: "The bulk of the market resides here!" Annotation higher up the computations axis: "…but analytics pushes many things into the MPP zone."]
Evaluating DB Technology

1. Define the key problems: 
   response time, 
   throughput, scalability?
2. Examine the workloads 
   and their requirements
3. Match those to suitable 
   technologies
4. Look for vendors using 
   those technologies
5. Evaluate on real data 
   with real workloads
  Copyright Third Nature, Inc.
Thank You
For Your
Attention
Back-Up Slides
NoSQL Directions
Some NDBMS do not attempt to provide all ACID properties.
(Atomicity, Consistency, Isolation, Durability)

Some NDBMS deploy a distributed scale-out architecture with data
redundancy.

XML DBMS using XQuery are NDBMS.

Some document stores are NDBMS (OrientDB, Terrastore, etc.)

Object databases are NDBMS (Gemstone, Objectivity, ObjectStore,
etc.)

Key value stores = schema-less stores (Cassandra, MongoDB,
Berkeley DB, etc.)

Graph DBMS (DEX, OrientDB, etc.) are NDBMS

Large data pools (BigTable, Hbase, Mnesia, etc.) are NDBMS
The SQL Barrier
SQL has:
  DDL (for data definition)
  DML (for Select, Project and Join)
  But it has no MML (Math) or TML
  (Time)

Usually result sets are brought to
the client for further analytical
manipulation, but this creates
problems

Alternatively doing all analytical
manipulation in the database
creates problems
Discussion Topics
If not covered in history through today:
    the core post-relational change in assumptions
    nosql core drivers, persistence in cloud, finalizing of web
    arch, SOAizing
    a NoSQL classification list (types and projects/products)
    key aspects of the OR mismatch

complex data and emergent structure

database technology types

a giant list of databases

cloud and databases, uh-oh?

Database Revolution - Exploratory Webcast

  • 1. Fit For Purpose: The New Database Revolution Mark Madsen & Robin Bloor
  • 2. Introduction Significant and revolutionary changes are taking place in database technology In order to investigate and analyze these changes and where they may lead, The Bloor Group has teamed up with Third Nature to launch an Open Research project. This is the first webinar in a series of webinars and research activities that will comprise the project All research will be made available through our web site: Databaserevolution.com
  • 3. Sponsors of This Research
  • 4. General Webinar Structure What & why History of Database Part 1: How we got to the RDBMS History of Database Part 2: Relational and Post- relational Food For Thought: Issues, Problems, Assumptions, Challenges Current Conclusions: Insofar as we have any
  • 5. Change? Why? Increased data volumes Significant hardware changes Database product innovation New workloads, different data structures Established database concepts are being challenged Market Forces can drive change
  • 6. Data Volumes: Moore’s Law Cubed Moore’s Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree) Large database volumes have grown 1000- fold every 6 years: In 1992, measured in megabytes In 1998 measured in gigabytes In 2004 measured in terabytes In 2010 measured in petabytes Exabytes by 2016?
  • 7. Hardware Changes Moore’s Law now proceeds by adding cores rather than by increasing clock speed. Computer grids using commodity servers are now relatively inexpensive Parallelism is now on the rise and will eventually become the normal mode of processing Memory is about 1 million times faster than disk and random reads have become very expensive in respect of latency SSD are augmenting and may eventually replace spinning disk
  • 8. Majority of Data becomes Historical Data over time or even all historic when no longer active Data Application Performance 10% 100% Active 70% 90% Static 30% Cost $$$ and PAIN Transactional Data Time Image courtesy: RainStror
  • 9. Market Forces A new set of products appear They include some fundamental innovations A few are sufficiently popular to last Fashion and marketing drive greater adoption Products defects begin to be addressed They eventually challenge the dominant products
  • 10. Section 1: History Part 1 Pre-relational and Relational What we had in prior technology regimes Where we came from What we traded away and why
  • 11. The Dawn of Database Schema defines logical structure of data The schema enables extensive reuse Logical structure vs Physical structure ACID properties Atomicity – transactions must be atomic Consistency – a transaction ensures consistency Isolation – a transaction runs in isolation Durability – a completed transaction causes permanent change to data
  • 12. Database Performance Bottlenecks CPU saturation Memory saturation Disk I/O channel saturation Locking Network saturation Parallelism – inefficient load balancing
  • 13. The Joys of SQL? SQL is a declarative query language targeted at data organized in two- dimensional tables. It enables set operations on those tables via: Select, Project and Join operations which can be qualified (Order By, etc.) It imposes some limitations on the logical model of data. It can create a barrier between the user and the data....
  • 14. The Ordering Of Data “A data set is an unordered collection of unique, non-duplicated items.” Data is naturally ordered by time if by nothing else. Events are ordered by time. Changes to entities are ordered by time Having an inherent physical order to data can save many processing cycles in some areas of application This is particularly the case for time series applications.
  • 15. The RDBMS Optimizer The database can know how to access data better and faster than any programmer… It wasn’t true It became true It isn’t always true It only optimizes for persistent data
  • 16. Section 2: History Part 2 Relational and Post-relational Where we are today: oldsql, newsql and nosql The finalizing of the distributed web architecture Rediscovery of the past, when we had purpose-built data stores of different types, with a twist. Revisiting of old arguments Challenging old assumptions
  • 18. Column Stores and Query-biased Workloads Column store databases are still RDBMSs Most SQL queries do not require all columns of a table So partitioning data by columns (vertically) will usually be better than partitioning by rows (horizontally) And data compression can be more efficient Column store databases scale up [somewhat] better than traditional RDBMSs depending on workload, queries, etc. Column store <> column family
  • 19. New Lamps For Old Google, Yahoo!, Facebook and others had data management problems that established products did not cater for: Big Data, unusual data structures, new workloads. They had money to invest and some smart engineers. They built their own solutions: BigTable, MapReduce, Cassandra, etc. In doing so, they provoked a database revolution. In other words, the internet happened and some people noticed.
  • 20. A random selection of databases Sybase IQ, ASE; EnterpriseDB; Algebraix; Teradata, Aster Data; LucidDB; Intersystems Caché; Oracle, RAC; Vectorwise; Streambase; Microsoft SQLServer, PDW; MonetDB; SQLStream; IBM DB2s, Netezza; Exasol; Coral8; Paraccel; Illuminate; Ingres; Kognitio; Vertica; Postgres; EMC/Greenplum; InfiniDB; Cassandra; Oracle Exadata; 1010 Data; CouchDB; SAP HANA; SAND; Mongo; Infobright; Endeca; Hbase; MySQL; Xtreme Data; Redis; MarkLogic; IMS; RainStor; Tokyo Cabinet; Hive; Scalaris. And a few hundred more…
  • 21. Section 3: Database Discussion Topics The core post-relational changes in assumptions. Key aspects of the code-database mismatch. Reclassifying pre-relational as NoSQL. Complex data, emergent structure, types and schemas. Cloud and databases, uh-oh?
  • 22. Changing Assumptions One single scalable piece of reliable hardware. You really need a schema all the time. A handful of discrete types are all anybody will ever need, and when they need more they can code UDTs and UDFs in C++. SQL is the optimal way to write and retrieve data. ACID always applies. Data integrity is a key component of a database.
  • 23. No SQL, New Concepts Maybe SQL is an unacceptable constraint. Maybe SQL is unnecessary for some fit-for-purpose databases, or perhaps just unimportant. Maybe the impedance mismatch can be avoided. Maybe a formal schema is a constraint. Maybe ACID properties can be compromised.
  • 24. The “Impedance Mismatch” The RDBMS stores data organized according to table structures. The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them. The data does not simply map to the structure it has within the database. Consequently a mapping activity is necessary to get and put data. Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools.
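The mapping activity described above can be sketched with a small nested object that has to be split across two tables on the way in and reassembled on the way out. The `Order` and `LineItem` classes, the table layout, and the `save` and `load` helpers are all hypothetical, written here against Python's built-in `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LineItem:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # nested list of LineItem

def save(conn, order):
    """Put: flatten one object graph into rows in two tables."""
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer)
        )
        conn.executemany(
            "INSERT INTO line_items VALUES (?, ?, ?)",
            [(order.order_id, item.sku, item.quantity) for item in order.items],
        )

def load(conn, order_id):
    """Get: rebuild the object graph from two result sets."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    items = [
        LineItem(sku, quantity)
        for sku, quantity in conn.execute(
            "SELECT sku, quantity FROM line_items WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
    ]
    return Order(order_id, customer, items)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE line_items (order_id INTEGER, sku TEXT, quantity INTEGER)")

order = Order(1, "Ada", [LineItem("widget", 2), LineItem("gadget", 1)])
save(conn, order)
print(load(conn, 1) == order)
```

Even for a two-level object, the program carries two hand-written translation functions that must be kept in step with both the class definitions and the table definitions.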
  • 25. NoSQL Directions: Technology Types Some NoSQL DBs do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability). Some NoSQL DBs deploy a distributed scale-out architecture with data redundancy. XML DBMS using XQuery are NoSQL DBs. Some document stores are NoSQL DBs (OrientDB, Terrastore, etc.). Object databases are NoSQL DBs (Gemstone, Objectivity, ObjectStore, etc.). Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.). Graph DBMS (DEX, OrientDB, etc.) are NoSQL DBs. Large data pools (BigTable, Hbase, Mnesia, etc.) are NoSQL DBs.
  • 26. The Cloud, uh-oh Negative implications for shared-everything databases that have scalability needs. There are architectural implications and possible incompatibilities for shared-nothing databases too. Not at scale and at scale (concurrency, ingest volumes and frequencies, etc.) are different. How does the database permit dynamic provisioning, elasticity (+/-), etc.?
  • 27. The new database problems for IT …are probably like old problems for people who went through the Unix client-server era. Best of breed, no standards for anything, “polyglot persistence” = silos on steroids, data integration challenges, shifting data movement architectures
  • 28. Recognize Tradeoffs Read consistency vs programmatic correction; schema vs a program to interpret each data structure; standard access interface vs an API for each type of store; data integrity enforcement vs programmatic control; query performance for arbitrary queries vs planned access paths; space efficiency vs simplicity / latency; network transfer performance vs simplicity / latency. For the primary goals of: horizontal scale; looser coupling; flexibility for developers building and changing applications.
  • 29. Information Management Through Human History New technology development creates New methods to cope creates New information scale and availability creates…
  • 31. Big data? Unstructured data isn’t really unstructured. The problem is that this data is unmodeled.
  • 33. Conclusion Wherein all is revealed, or ignorance exposed. Best of breed is back, baby. Workload types and characteristics. The importance of understanding workload in order to select technology. Pragmatism, babies and bathwater.
  • 35. Types of workloads Write-biased: ▪ OLTP ▪ OLTP, batch ▪ OLTP, lite ▪ Object persistence ▪ Data ingest, batch ▪ Data ingest, real-time. Read-biased: ▪ Query ▪ Query, simple retrieval ▪ Query, complex ▪ Query-hierarchical / object / network ▪ Analytic. Mixed?
  • 37. You must understand your workload; throughput and response time requirements aren’t enough. ▪ 100 simple queries accessing month-to-date data ▪ 90 simple queries accessing month-to-date data and 10 complex queries using two years of history ▪ Hazard calculation for the entire customer master ▪ Performance problems are rarely due to a single factor.
  • 38. Seven Key Query Workload Elements These characteristics help determine suitability of technologies to improve query performance. 1. Retrieval: how much data comes back? 2. Selectivity: how much data is filtered? 3. Repetition: how often for the same query? 4. Concurrency: how many queries at once? 5. Data volume: how much data is being queried? 6. Query complexity: how many joins, aggregations, columns, filters, subselects, etc.? 7. Computational complexity: how much computation is performed over the data?
  • 39. Characteristics of BI workloads
      Workload                   | Selectivity | Retrieval       | Repetition  | Complexity
      Reporting / BI             | Moderate    | Low             | Moderate    | Moderate
      Dashboards / scorecards    | Moderate    | Low             | High        | Low
      Ad-hoc query and analysis  | Low to high | Moderate to low | Low         | Low to moderate
      Analytics (batch)          | Low         | High            | Low to High | Low*
      Analytics (inline)         | High        | Low             | High        | Low*
      Operational / embedded BI  | High        | Low             | High        | Low
      * Low for retrieving the data, high if doing analytics in SQL
  • 40. Choosing Hardware Architectures Compute and data sizes are key requirements. [Chart: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Region labels: PC; shared everything or shared disk; shared nothing; MR and related.]
  • 41. Choosing Hardware Architectures Today’s reality, and true for a while in most businesses. [Chart: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Annotation: The bulk of the market resides here!]
  • 42. Choosing Hardware Architectures Today’s reality, and true for a while in most businesses. [Chart: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Annotations: The bulk of the market resides here! …but analytics pushes many things into the MPP zone.]
  • 43. Evaluating DB Technology 1. Define the key problems: response time, throughput, scalability? 2. Examine the workloads and their requirements. 3. Match those to suitable technologies. 4. Look for vendors using those technologies. 5. Evaluate on real data with real workloads. Copyright Third Nature, Inc.
  • 46. NoSQL Directions Some NDBMS do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability). Some NDBMS deploy a distributed scale-out architecture with data redundancy. XML DBMS using XQuery are NDBMS. Some document stores are NDBMS (OrientDB, Terrastore, etc.). Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.). Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.). Graph DBMS (DEX, OrientDB, etc.) are NDBMS. Large data pools (BigTable, Hbase, Mnesia, etc.) are NDBMS.
  • 47. The SQL Barrier SQL has: DDL (for data definition); DML (for Select, Project and Join). But it has no MML (Math) or TML (Time). Usually result sets are brought to the client for further analytical manipulation, but this creates problems. Alternatively, doing all analytical manipulation in the database creates problems.
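The client-side pattern described above can be sketched as follows, using Python's built-in `sqlite3` and `statistics` modules. The `readings` table and its data are invented for the example, and the median stands in for any statistic outside the basic SQL aggregates used here. The average is computed inside the database, while the median requires every row to be shipped to the client first.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 2.0), ("a", 9.0), ("b", 4.0), ("b", 6.0)],
)

# In the database: one small result row per sensor crosses to the client.
avg_by_sensor = dict(
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor")
)

# On the client: every detail row is transferred before any math happens.
values_by_sensor = {}
for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
    values_by_sensor.setdefault(sensor, []).append(value)

median_by_sensor = {
    sensor: statistics.median(vals) for sensor, vals in values_by_sensor.items()
}

print(avg_by_sensor, median_by_sensor)
```

With five rows the difference is invisible, but the second pattern moves data in proportion to the table size, which is one of the problems the slide alludes to.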
  • 48. Discussion Topics If not covered in history through today: the core post-relational change in assumptions; nosql core drivers, persistence in cloud, finalizing of web arch, SOAizing; a NoSQL classification list (types and projects/products); key aspects of the OR mismatch; complex data and emergent structure; database technology types; a giant list of databases; cloud and databases, uh-oh?