Building Apps with Distributed In-Memory Computing using Apache Geode (incubating)
Nitin Lamba (@nlamba9)
William Markito (@william_markito)
Agenda
Introduction (Nitin)
• WHAT? Overview & history
• WHY? Relevance & differentiators
• HOW? Features & basic concepts
• SEE! Quick start
Hands-on (William)
• LEARN: Advanced concepts (persistence, functions, PDX, …)
• SHOW: Demos (Docker, PDX)
Resources
Q & A
Introduction (Nitin)
From GEM to GEODE…
What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• an event-driven data architecture
High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in memory and on disk
• Flexible data retention policies
[Diagram: a peer-to-peer distributed system of one locator and several servers, with REST access]
* Experimental and awaiting community feedback
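The synchronous read-through and write-through options above can be illustrated with a small in-memory model. This is plain Java, not Geode's API: `ReadWriteThroughCache` is a hypothetical class, and the `loader` and `writer` functions stand in for calls to a backing store such as an RDBMS. In Geode itself this behavior comes from a cache loader and cache writer attached to a region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Geode's API): misses are loaded from a backing
 * store (read-through) and puts reach the store before memory (write-through).
 */
class ReadWriteThroughCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // called on a cache miss
    private final BiConsumer<K, V> writer; // called synchronously on every put

    ReadWriteThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    /** Read-through: return the cached value, or load it from the store and cache it. */
    V get(K key) {
        return memory.computeIfAbsent(key, loader);
    }

    /** Write-through: update the store first; if the writer throws, memory is left unchanged. */
    void put(K key, V value) {
        writer.accept(key, value);
        memory.put(key, value);
    }

    boolean isCached(K key) {
        return memory.containsKey(key);
    }
}
```

Because `put` calls the writer before touching memory, a failed store write never leaves the cache ahead of the store, which is the point of writing through synchronously.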
Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting-edge use cases
Timeline, from before 2000 to 2016:
Early drivers
• Data volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time-to-market needs
• Flexible data models
• Persistent + in-memory
Global data
• Visibility across data centers
• Fast ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache incubation
• GemFire > Geode
• M1 release
• 1st Geode Summit
Industries: financial services, US DoD, trade clearing, travel portal, online gambling, telcos, manufacturing, auto insurance, payroll processing, rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3TB of data in memory
• 17B records in memory
• 120K concurrent users
… and impacting a LOT of people!
• China Railway Corporation: 19% of the world population
• Indian Railways: 17% of the world population
• Together: 36% of the world population
Built for PERFORMANCE…
[Chart: operations per second (0 to 800,000), Cassandra vs. Geode, on YCSB workloads A (reads, updates), B (reads, updates), C (reads), D (inserts, reads) and F (reads, updates)]
…and horizontal, consistent SCALABILITY!
[Chart: horizontal scaling for reads with consistent latency and CPU; speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
Let's talk about a few (basic) CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
• Events & Listeners
• High Availability
• Serialization
What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, the Java API or the CLI
• A collection of Regions
[Diagram: a Cache holding several Regions inside a JVM]
What is a REGION?
• A distributed java.util.Map on steroids (key/value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, with redundant copies on cache members
[Diagram: a Region inside a Cache in a JVM, seen as a java.util.Map with sample entries K01 -> May, K02 -> Tim]
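Since a region exposes a map-like API that can be observed, the "observable map" idea can be sketched in plain Java. `EntryListener` and `ObservableRegion` are hypothetical names, not Geode classes (Geode's own `Region` interface extends `java.util.concurrent.ConcurrentMap`); the toy only shows listeners being called with create and update events after each `put`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener interface for the toy region below (not Geode's CacheListener). */
interface EntryListener<K, V> {
    void afterCreate(K key, V newValue);
    void afterUpdate(K key, V oldValue, V newValue);
}

/** Hypothetical toy "region": a concurrent map that notifies listeners after each put. */
class ObservableRegion<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<EntryListener<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(EntryListener<K, V> listener) {
        listeners.add(listener);
    }

    /** Stores the value, then fires a create or update event depending on the previous value. */
    V put(K key, V value) {
        V old = entries.put(key, value);
        for (EntryListener<K, V> listener : listeners) {
            if (old == null) {
                listener.afterCreate(key, value);
            } else {
                listener.afterUpdate(key, old, value);
            }
        }
        return old;
    }

    V get(K key) {
        return entries.get(key);
    }
}
```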
Region: Types & Options
• Local, replicated or partitioned
• In-memory or persistent
• Redundant
• LRU eviction
• Overflow to disk
[Diagram: two JVMs, each with a Region in a Cache (a java.util.Map with sample entries K01 -> May, K02 -> Tim)]
Region shortcuts:
• LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
• PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
• REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
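The LRU idea can be illustrated with `java.util.LinkedHashMap` in access order, a standard JDK technique. This is a simplified stand-in: it caps the number of entries, whereas the `*_HEAP_LRU` shortcuts evict based on heap usage, and the `*_OVERFLOW` shortcuts write evicted values to disk instead of dropping them as this sketch does. `LruRegion` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical entry-count LRU map. Not Geode's eviction code: Geode's
 * *_HEAP_LRU shortcuts evict by heap usage, and overflow variants spill to disk.
 */
class LruRegion<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRegion(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; drop the least recently used entry once over the limit.
        return size() > maxEntries;
    }
}
```

Reading `K01` just before inserting a third entry makes `K02` the least recently used, so `K02` is the one evicted.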
Persistent Regions
• Durability
• Write-ahead log (WAL) for efficient writing
• Consistent recovery
• Compaction
[Diagram: on Member 1, Put k4->v7 is appended as "Modify k4->v7" to Oplog3.crf, after earlier operations (Modify k1->v5, Create k6->v6, Create k2->v2, Create k4->v4) in Oplog2.crf; Servers 1 to N each persist their own Region]
What is a MEMBER?
• A process that has a connection to the distributed system
• A process that has created a cache
• Embeddable within your application
[Diagram: client, locator and server processes]
What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can keep a local copy of the data
• Can run OQL queries on local data
• Can be notified about events on the servers
[Diagram: an application with a client cache holding Regions, connected to a server holding Regions]
How to START? Easy as 1-2-3!
1. Clone & build
    git clone https://github.com/apache/incubator-geode
    cd incubator-geode
    ./gradlew build -Dskip.tests=true
2. Start services
    cd gemfire-assembly/build/install/apache-geode
    ./bin/gfsh
    gfsh> start locator --name=locator
    gfsh> start server --name=server
3. Create & monitor a region
    gfsh> create region --name=myRegion --type=REPLICATE
    gfsh> start pulse   (or: gfsh> start jconsole)
Hands-on (William)
More (advanced) CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
• Events & Listeners
• High Availability
• Serialization
Persistence - Shared Nothing
[Diagram, built up over several slides: Servers 1, 2 and 3 each hold primary and secondary copies of buckets B1, B2 and B3 and persist them independently]
• Server 1 waits for the others when it starts
• It fetches missed operations on restart
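The bucket layout in these diagrams can be approximated with a simplified placement model: keys hash into a fixed number of buckets, and each bucket gets a primary and one secondary copy on different servers. `BucketPlacement` is hypothetical, and its round-robin placement is an assumption made for illustration; Geode assigns and rebalances buckets across members on its own.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of partitioning with one redundant copy. The
 * round-robin placement is an assumption, not Geode's placement logic.
 */
class BucketPlacement {
    private final int numBuckets;
    private final List<String> servers;

    BucketPlacement(int numBuckets, List<String> servers) {
        if (servers.size() < 2) {
            throw new IllegalArgumentException("need at least two servers for a redundant copy");
        }
        this.numBuckets = numBuckets;
        this.servers = new ArrayList<>(servers);
    }

    /** Map a key to a bucket id in [0, numBuckets); floorMod keeps negative hashes in range. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    /** Server holding the primary copy of a bucket. */
    String primaryFor(int bucket) {
        return servers.get(bucket % servers.size());
    }

    /** Server holding the secondary copy: the next server, so never the primary's server. */
    String secondaryFor(int bucket) {
        return servers.get((bucket + 1) % servers.size());
    }
}
```

With three buckets on three servers, each server is primary for one bucket and secondary for another, so losing any single server still leaves one copy of every bucket.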
Persistence - Operational Logs
[Diagram: Member 1 appends each operation (Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6) to its operation logs, Oplog1.crf and Oplog2.crf; the latest Put k6->v6 is appended to the current log]
Persistence - Operational Logs: Compaction
[Diagram: the same operation logs on Member 1 (Oplog1.crf, Oplog2.crf); compaction copies the live data forward, so superseded records such as Create k1->v1 and Modify k1->v3 can be dropped]
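The append-then-compact cycle on these slides can be sketched as a toy log in plain Java. `OpLog` is hypothetical: it keeps records in memory rather than in `.crf` files, and it compacts the whole log at once, whereas Geode works on individual oplog files. Replaying the slide's six operations leaves four live records after compaction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy operation log (not Geode's .crf format): every put is
 * appended as a Create or Modify record; compaction copies only the latest
 * ("live") value of each key forward into a fresh log.
 */
class OpLog {
    /** One logged operation, e.g. Create k1->v1 or Modify k1->v3. */
    static final class Op {
        final String kind;
        final String key;
        final String value;

        Op(String kind, String key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    private List<Op> records = new ArrayList<>();
    private final Map<String, String> live = new LinkedHashMap<>(); // current in-memory view

    void put(String key, String value) {
        String kind = live.containsKey(key) ? "Modify" : "Create";
        records.add(new Op(kind, key, value)); // append only, never rewrite in place
        live.put(key, value);
    }

    int recordCount() {
        return records.size();
    }

    /** Rebuild the data by replaying every record in order, as recovery would. */
    Map<String, String> replay() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Op op : records) {
            state.put(op.key, op.value);
        }
        return state;
    }

    /** Copy live data forward: one record per key, holding its latest value. */
    void compact() {
        List<Op> fresh = new ArrayList<>();
        for (Map.Entry<String, String> e : live.entrySet()) {
            fresh.add(new Op("Create", e.getKey(), e.getValue()));
        }
        records = fresh; // superseded records are discarded
    }
}
```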
Functions
• Used for distributed, concurrent processing (Map/Reduce, stored procedures)
• Highly available
• Data-oriented
• Member-oriented
[Diagram: a client submits f1, one of the available functions f1, f2, … fn, and the servers execute it]
Functions
[Diagram: from a client region, a client calls FunctionService.onRegion(...).withFilter(...).execute(...) with filter = keys X, Y. The call is routed through the distributed system to the partitioned-region data stores holding X and Y; each executes the function and returns its result, and the client collects the results with ResultCollector.getResult(). The diagram also shows a data store for Z and two data accessors.]
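Data-aware routing with a filter can be modeled without Geode at all. In this sketch, `FunctionRouter` and `executeWithFilter` are hypothetical stand-ins for the `FunctionService.onRegion(...).withFilter(...).execute(...)` call: each member is represented by a map of its local entries, the function runs only where filtered keys live (one thread per member here), and the caller gathers the partial results much as `ResultCollector.getResult()` does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Hypothetical model of data-aware function execution with a key filter
 * (not Geode's FunctionService). Each member owns a slice of a partitioned map.
 */
class FunctionRouter {
    private final Map<String, Map<String, Integer>> dataByMember; // member name -> local entries

    FunctionRouter(Map<String, Map<String, Integer>> dataByMember) {
        this.dataByMember = dataByMember;
    }

    <R> List<R> executeWithFilter(Set<String> filter, Function<Map<String, Integer>, R> function) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dataByMember.size()));
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (Map<String, Integer> local : dataByMember.values()) {
                // Hand the function only the filtered keys that this member owns.
                Map<String, Integer> slice = new HashMap<>();
                for (String key : filter) {
                    Integer value = local.get(key);
                    if (value != null) slice.put(key, value);
                }
                if (!slice.isEmpty()) {
                    // Simulate the member running the function against its local data.
                    pending.add(pool.submit(() -> function.apply(slice)));
                }
            }
            // Gather one partial result per participating member.
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a filter of X and Y, only the two members that own those keys run the function; a store holding only Z and an accessor holding no data are skipped.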
Events & Notifications
• Register interest
  • Individual keys OR regular expressions for keys
  • Updates the local copy
  • Examples:
    region.registerInterest("key-1");
    region1.registerInterestRegex("[a-z]+");
• Continuous query
  • Receive a notification when the query condition is met on the server
  • Example:
    SELECT * FROM /tradeOrder t WHERE t.price > 100.00
Can be DURABLE
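The matching side of register-interest and continuous queries can be sketched as a server-side router. Everything here is a hypothetical model: `EventRouter` is not a Geode class, the CQ condition is a Java `Predicate` rather than OQL, full-string regex matching is an assumption, and durability is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical model of server-side event routing (not Geode's API): clients
 * register interest in a key, in keys matching a regex, or in values matching
 * a continuous-query condition; each update goes only to matching clients.
 */
class EventRouter<V> {
    private static final class Subscription<T> {
        final String client;
        final Predicate<String> keyMatch;
        final Predicate<T> valueMatch;

        Subscription(String client, Predicate<String> keyMatch, Predicate<T> valueMatch) {
            this.client = client;
            this.keyMatch = keyMatch;
            this.valueMatch = valueMatch;
        }
    }

    private final List<Subscription<V>> subscriptions = new ArrayList<>();

    void registerInterest(String client, String key) {
        subscriptions.add(new Subscription<V>(client, key::equals, value -> true));
    }

    void registerInterestRegex(String client, String regex) {
        Pattern pattern = Pattern.compile(regex);
        // Assumption: the whole key must match the pattern.
        subscriptions.add(new Subscription<V>(client, k -> pattern.matcher(k).matches(), value -> true));
    }

    void registerContinuousQuery(String client, Predicate<V> condition) {
        subscriptions.add(new Subscription<V>(client, k -> true, condition));
    }

    /** Returns the clients notified about this update, in registration order. */
    List<String> onUpdate(String key, V value) {
        List<String> notified = new ArrayList<>();
        for (Subscription<V> s : subscriptions) {
            if (s.keyMatch.test(key) && s.valueMatch.test(value)) {
                notified.add(s.client);
            }
        }
        return notified;
    }
}
```

Here a predicate such as `price -> price > 100.00` plays the role of the `t.price > 100.00` condition in the CQ example.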
Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue/batch)
  • Parallel or serial
  • Conflation
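Conflation in an asynchronous queue can be illustrated with an insertion-ordered map keyed by entry key: a newer update for a key that is still waiting replaces the older one, and the listener drains events in batches. `ConflatingQueue` is hypothetical, and keeping the replaced event's original queue position is a choice made in this sketch, not a statement about how Geode's asynchronous event queues order conflated events.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical asynchronous event queue with conflation (not Geode's
 * implementation): only the latest pending value per key is delivered.
 */
class ConflatingQueue<K, V> {
    private final LinkedHashMap<K, V> pending = new LinkedHashMap<>();

    /** Re-putting an existing key replaces its value but keeps its queue position. */
    synchronized void enqueue(K key, V value) {
        pending.put(key, value);
    }

    /** Remove and return up to maxBatchSize events, oldest first. */
    synchronized List<Map.Entry<K, V>> drainBatch(int maxBatchSize) {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = pending.entrySet().iterator();
        while (it.hasNext() && batch.size() < maxBatchSize) {
            Map.Entry<K, V> e = it.next();
            // Copy the entry before removing it from the backing map.
            batch.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            it.remove();
        }
        return batch;
    }

    synchronized int size() {
        return pending.size();
    }
}
```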
High Availability
Fixed or Flexible schema?
Fixed (relational row): id | name | age | pet_id
or flexible (document):
{
  id   : 1,
  name : "Fred",
  age  : 42,
  pet  : {
    name : "Barney",
    type : "dino"
  }
}
Portable Data eXchange (PDX)
• C#, C++, Java, JSON
• No IDL, no schemas, no hand-coding
• Schema evolution (forward and backward compatible)
• Domain object classes not required
Serialized layout:
header = pdx | length | dsid | typeid
data   = fields | offsets
Efficient for queries
{
  id   : 1,
  name : "Fred",
  age  : 42,
  pet  : {
    name : "Barney",
    type : "dino"
  }
}
SELECT p.name FROM /Person p WHERE p.pet.type = "dino"
Single-field deserialization: the query reads only the fields it needs instead of deserializing the whole object
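The benefit of a `fields | offsets` layout is that one field can be located and decoded without decoding the rest. The toy format below (hypothetical `OffsetRecord`, NOT the real PDX wire format) stores UTF-8 strings followed by an offset table; the nested `pet.type` field from the slide is flattened to a plain field for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical offset-indexed record: [field count][field bytes...][start offset of each field].
 * Illustrates single-field reads; it is not PDX's actual encoding.
 */
final class OffsetRecord {
    private OffsetRecord() {}

    static byte[] write(String... fields) {
        byte[][] encoded = new byte[fields.length][];
        int dataLength = 0;
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            dataLength += encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + dataLength + 4 * fields.length);
        buf.putInt(fields.length);
        int[] offsets = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            offsets[i] = buf.position(); // remember where this field starts
            buf.put(encoded[i]);
        }
        for (int offset : offsets) {
            buf.putInt(offset); // offset table at the end of the record
        }
        return buf.array();
    }

    /** Decode only field {@code index}: read its start and end offsets, then one byte range. */
    static String readField(byte[] record, int index) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int count = buf.getInt(0);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("no field " + index);
        }
        int tableStart = record.length - 4 * count;
        int start = buf.getInt(tableStart + 4 * index);
        int end = (index + 1 < count) ? buf.getInt(tableStart + 4 * (index + 1)) : tableStart;
        return new String(record, start, end - start, StandardCharsets.UTF_8);
    }
}
```

A query filtering on the pet type would call `readField(record, 3)` and never decode `id`, `name` or `age`.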
But HOW to serialize data?
Benchmark: https://github.com/eishay/jvm-serializers
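Schema Evolution
[Diagram: Applications #1 and #2, using versions v1 and v2 of a class, run on Members A and B, which share distributed type definitions]
• When a v1 application reads v2 data, the fields it does not know are preserved rather than lost
• When a v2 application reads v1 data, the new fields are filled in with default values
• PDX provides forward and backward compatibility, no code required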
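The two compatibility rules can be made concrete by representing serialized data as a field-name-to-value map. `PersonV1` and `PersonV2` are hypothetical classes, and the explicit `unreadFields` map only makes visible what PDX handles without application code: a v1 reader keeps fields it does not understand, and a v2 reader defaults fields that v1 never wrote.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical v1 class: knows only "name", but carries unknown fields through a round trip. */
class PersonV1 {
    String name;
    private final Map<String, Object> unreadFields = new HashMap<>();

    static PersonV1 fromData(Map<String, Object> data) {
        PersonV1 p = new PersonV1();
        p.name = (String) data.get("name");
        for (Map.Entry<String, Object> e : data.entrySet()) {
            if (!e.getKey().equals("name")) {
                p.unreadFields.put(e.getKey(), e.getValue()); // preserve fields added in later versions
            }
        }
        return p;
    }

    Map<String, Object> toData() {
        Map<String, Object> data = new HashMap<>(unreadFields); // write unknown fields back unchanged
        data.put("name", name);
        return data;
    }
}

/** Hypothetical v2 class: adds "age", defaulting it when reading data written by v1. */
class PersonV2 {
    String name;
    int age;

    static PersonV2 fromData(Map<String, Object> data) {
        PersonV2 p = new PersonV2();
        p.name = (String) data.get("name");
        Object age = data.get("age");
        p.age = (age == null) ? 0 : (Integer) age; // default value for a field v1 never wrote
        return p;
    }

    Map<String, Object> toData() {
        Map<String, Object> data = new HashMap<>();
        data.put("name", name);
        data.put("age", age);
        return data;
    }
}
```

In a round trip, a v2 writer stores `age`, a v1 application updates `name` and writes the object back, and `age` survives.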
Demo (Docker, PDX, …)
How to CONTRIBUTE?
Code
• New features
• Bug fixes
• Writing tests
Documentation
• Wiki
• Website
• User guide
Community
• Join the mailing list
• Ask or answer questions
• Join our HipChat
• Become a speaker
• Find bugs
• Test an RC/beta
Where to BEGIN?
• Website: http://geode.incubator.apache.org/
• JIRA: https://issues.apache.org/jira/browse/GEODE
• Wiki: cwiki.apache.org/confluence/display/GEODE
• GitHub: https://github.com/apache/incubator-geode
• Mailing lists: mail-archives.apache.org/mod_mbox/incubator-geode-dev/
Thank you!
http://geode.incubator.apache.org
https://github.com/Pivotal-Open-Source-Hub
