Geospatial and bitemporal search in C* with pluggable Lucene index by Andrés de la Peña at Big Data Spain 2015
Andrés de la Peña
andres@stratio.com
GEOSPATIAL AND BITEMPORAL SEARCH IN C* WITH PLUGGABLE LUCENE INDEX
@a_de_la_pena
CONTENTS
1. Pluggable Lucene based 2i
2. Geospatial Search
3. Bitemporal Indexes
PLUGGABLE LUCENE 2i
Cassandra query methods
[Figure: primary key, secondary indexes and token ranges compared by throughput and expressiveness]
#BDS15 4
Cassandra query methods by use case
[Figure: primary key, secondary indexes and token ranges mapped onto the operational and analytics use cases]
Cassandra query methods trade-offs
• Primary key (operational)
- Pure-range queries limited to a partition
- No Boolean logic
- No full-text search
- Sorting limited to a partition
• Token ranges (analytics)
- Full-table scan
- High load
- High latency
- Low concurrency
A third use case
[Figure: the Operational, Search and Analytics use cases]
• Not as fast as primary key queries
• Not as expressive as MapReduce
• Search can serve both use cases
CQL + Lucene
A Lucene based secondary index implementation
• Proven, stable and fast indexing solution
• Expressive queries
- Multivariable, ranges, full text, sorting, top-k, etc.
• Mature distributed search solutions built on top of it
- Solr, Elasticsearch
• Just a small embeddable library
• Easily extensible
• Published under the Apache License
Cassandra query methods
• Primary key (operational): low expressiveness, low latency, low load
• Secondary indexes (search): mid expressiveness, mid latency, mid load
• Token ranges (analytics): high expressiveness, high latency, high load
A Lucene based secondary index implementation
[Figure: a client connected to three C* nodes, each running its own Lucene index inside the node's JVM]
• Each node indexes its own data
• Keeps the P2P architecture
• Distribution and replication managed by C*
• Just a single pluggable JAR file
- CASSANDRA-8717
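Since each node indexes only its own data, a query asking for a sorted top-k has to combine the per-node answers. The Python sketch below is a simplified model of that scatter-gather step, not the plugin's code; the `merge_top_k` helper and the `(sort_key, row_id)` hit layout are hypothetical. Each node returns its local hits already sorted, and the coordinator merges them lazily and keeps the first k.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit layout for illustration: (sort_key, row_id).
Hit = Tuple[float, str]


def merge_top_k(per_node_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node hit lists, each sorted ascending by sort_key,
    and return the k best hits overall."""
    # heapq.merge walks the sorted inputs lazily instead of re-sorting everything.
    merged = heapq.merge(*per_node_hits)
    return list(islice(merged, k))


# Each node answers from its own local index (modelled here as a sorted list).
node_a = [(1.0, "a1"), (4.0, "a2")]
node_b = [(2.0, "b1"), (3.0, "b2")]
node_c = [(0.5, "c1")]
top3 = merge_top_k([node_a, node_b, node_c], 3)
```

Each node has to send up to k hits of its own, because in the worst case all k global winners live on a single node.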
Create index
• Can be built in the background at any moment
• Near real-time updates
• Mapping eases ETL
• Language-aware text analysis
CREATE TABLE tweets (
  id bigint,
  created timestamp,
  message text,
  userid bigint,
  username text,
  PRIMARY KEY (userid, created, id)
);
ALTER TABLE tweets ADD lucene TEXT;
CREATE CUSTOM INDEX tweets_idx ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'refresh_seconds' : '10',
  'schema' : '{
    fields : {
      created  : {type : "date", pattern : "yyyy-MM-dd"},
      message  : {type : "text", analyzer : "english"},
      userid   : {type : "string"},
      username : {type : "string"}
    }
  }'
};
Searching for rows
SELECT * FROM tweets WHERE lucene = '{
  filter : {
    type : "boolean",
    must : [
      {type : "range", field : "created", lower : "2015-01-01"},
      {type : "wildcard", field : "username", value : "a*"}
    ],
    not : [
      {type : "match", field : "username", value : "andres"}
    ]
  },
  sort : {
    fields : [
      {field : "created", reverse : true},
      {field : "username", reverse : false}
    ]
  }
}' LIMIT 10000;
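To make the semantics of a request like this concrete, here is a brute-force Python evaluator for the subset of the syntax it uses: `boolean` with `must`/`should`/`not`, `range`, `wildcard`, `match`, and a multi-field `sort`. It is an illustrative sketch, not how the index executes queries (Lucene answers these filters from inverted indexes rather than by scanning rows). The helper names are hypothetical; range bounds are assumed inclusive unless `include_lower`/`include_upper` say otherwise; a non-empty `should` list is assumed to require at least one match; and `fnmatch` stands in for Lucene wildcards (it also accepts `[...]` sets, which Lucene wildcards do not). The sample rows use the `created` and `username` fields of the tweets schema.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

Row = Dict[str, Any]
Filter = Dict[str, Any]


def matches(row: Row, cond: Filter) -> bool:
    """Evaluate one filter condition against one row by brute force."""
    kind = cond["type"]
    if kind == "boolean":
        must_ok = all(matches(row, c) for c in cond.get("must", []))
        should = cond.get("should", [])
        # Assumption: a non-empty should list needs at least one match.
        should_ok = not should or any(matches(row, c) for c in should)
        not_ok = not any(matches(row, c) for c in cond.get("not", []))
        return must_ok and should_ok and not_ok
    value = row[cond["field"]]
    if kind == "match":
        return value == cond["value"]
    if kind == "wildcard":
        # '*' and '?' behave like Lucene wildcards here.
        return fnmatchcase(value, cond["value"])
    if kind == "range":
        # Assumption: bounds are inclusive unless the flags say otherwise.
        lower, upper = cond.get("lower"), cond.get("upper")
        if lower is not None:
            if value < lower or (value == lower and not cond.get("include_lower", True)):
                return False
        if upper is not None:
            if value > upper or (value == upper and not cond.get("include_upper", True)):
                return False
        return True
    raise ValueError(f"unsupported filter type: {kind}")


def search(rows: List[Row], request: Dict[str, Any], limit: int) -> List[Row]:
    """Filter, then sort by several fields, then apply the LIMIT."""
    hits = [r for r in rows if matches(r, request["filter"])]
    # Stable sorts applied from the last sort field to the first give a multi-key sort.
    for spec in reversed(request.get("sort", {}).get("fields", [])):
        hits.sort(key=lambda r: r[spec["field"]], reverse=spec.get("reverse", False))
    return hits[:limit]


# ISO dates compare correctly as strings, which keeps the example small.
rows = [
    {"id": 1, "created": "2015-03-01", "username": "alice"},
    {"id": 2, "created": "2014-12-31", "username": "anna"},
    {"id": 3, "created": "2015-06-01", "username": "andres"},
    {"id": 4, "created": "2015-02-01", "username": "bob"},
    {"id": 5, "created": "2015-05-01", "username": "anna"},
]
request = {
    "filter": {"type": "boolean",
               "must": [{"type": "range", "field": "created", "lower": "2015-01-01"},
                        {"type": "wildcard", "field": "username", "value": "a*"}],
               "not": [{"type": "match", "field": "username", "value": "andres"}]},
    "sort": {"fields": [{"field": "created", "reverse": True},
                        {"field": "username", "reverse": False}]},
}
```

Row 2 fails the date range, row 3 is excluded by `not`, and row 4 fails the wildcard, so only rows 5 and 1 remain, newest first.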
Integrating Lucene & Spark
[Figure: a client, a Spark master and three C* nodes, each node with its own Lucene index]
• Process large amounts of data
• Push filtering down to the index
• Avoid systematic full scans
• Reduce the amount of data to be processed
Index performance in Spark
[Figure: time in seconds (0 to 2,500) versus millions of collected rows (0 to 100), comparing the index with a full scan]
SPATIAL SEARCH
Lucene spatial module
• Spatial4j shapes
- Points, rectangles, circles, etc.
• Spatial search strategies
- BBox, RecursivePrefixTree, PointVector, etc.
• Not only geographical data
- Numbers, dates
• Can be combined with other searches
Indexing geographical locations
CREATE TABLE restaurants (
  name text PRIMARY KEY,
  stars bigint,
  lat double,
  lon double,
  lucene text
);
CREATE CUSTOM INDEX restaurants_idx
ON restaurants (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'refresh_seconds' : '1',
  'schema' : '{
    fields : {
      location : {
        type : "geo_point",
        latitude : "lat",
        longitude : "lon"
      },
      stars : {type : "integer"}
    }
  }'
};
• No native shape data types in CQL
• Many-to-one column mapping
• Just points. For now.
Bounding box search
SELECT * FROM restaurants
WHERE lucene =
'{
filter :
{
type : "geo_bbox",
field : "location",
min_latitude : 40.425978,
max_latitude : 40.445886,
min_longitude : -3.808252,
max_longitude : -3.770999
}
}';
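The `geo_bbox` filter keeps points whose latitude and longitude both fall within the given bounds. The sketch below shows that check on rows shaped like the `restaurants` table, with the `location` point assembled from the `lat` and `lon` columns (the many-to-one mapping). The `BBox` class and `geo_bbox_filter` function are hypothetical; the bounds are assumed inclusive, and boxes crossing the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BBox:
    min_latitude: float
    max_latitude: float
    min_longitude: float
    max_longitude: float

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive bounds; assumes min_longitude <= max_longitude (no antimeridian wrap).
        return (self.min_latitude <= lat <= self.max_latitude
                and self.min_longitude <= lon <= self.max_longitude)


def geo_bbox_filter(rows: List[Dict], box: BBox) -> List[str]:
    """Return the names of rows whose (lat, lon) point lies inside the box."""
    # Many-to-one mapping: the point is built from two separate columns.
    return [r["name"] for r in rows if box.contains(r["lat"], r["lon"])]


# Same bounds as the query above; the rows are made-up sample data.
box = BBox(40.425978, 40.445886, -3.808252, -3.770999)
restaurants = [
    {"name": "inside", "lat": 40.4353, "lon": -3.7900},
    {"name": "too_far_south", "lat": 40.4165, "lon": -3.7900},
    {"name": "too_far_east", "lat": 40.4353, "lon": -3.7038},
]
```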
Distance search
SELECT * FROM restaurants
WHERE lucene =
'{
filter :
{
type : "geo_distance",
field : "location",
latitude : 40.443270,
longitude : -3.800498,
min_distance : "100m",
max_distance : "2km"
}
}';
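A `geo_distance` filter with both `min_distance` and `max_distance` selects a ring around the center point. The following sketch shows that test using the haversine great-circle distance. It is an assumption-laden illustration, not the index's implementation: it treats the Earth as a sphere with mean radius 6,371,008.8 m, and the hypothetical `parse_distance` helper only understands the `m` and `km` suffixes used in the query above.

```python
import math
import re
from typing import Optional

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius, in metres
_UNITS = {"m": 1.0, "km": 1000.0}


def parse_distance(text: str) -> float:
    """Parse strings such as "100m" or "2km" into metres (m and km only)."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(km|m)\s*", text)
    if not match:
        raise ValueError(f"unsupported distance: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_distance_match(lat: float, lon: float, center_lat: float, center_lon: float,
                       min_distance: Optional[str] = None,
                       max_distance: Optional[str] = None) -> bool:
    """True if the point lies in the ring [min_distance, max_distance] around the center."""
    d = haversine_m(lat, lon, center_lat, center_lon)
    if min_distance is not None and d < parse_distance(min_distance):
        return False
    if max_distance is not None and d > parse_distance(max_distance):
        return False
    return True
```

With the center and limits of the query above, the center itself is rejected by `min_distance`, a point about 1 km north is accepted, and a point about 3.3 km north is rejected.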
Combining geospatial searches
SELECT * FROM restaurants WHERE lucene =
'{ filter : {
type : "boolean",
must : [
{
type : "geo_distance",
field : "location",
latitude : 40.443270,
longitude : -3.800498,
max_distance : "10km"
},
{
type : "range",
field : "stars",
lower : 2,
upper : 4
}
] } }';
Lucene spatial is not only geospatial…
• General geometry
• Numeric ranges
- NumberRangePrefixTree
• Date ranges/durations
- DateRangePrefixTree
Temporal/Date durations
• A pair composed of a start date and a stop date
- Can be indexed as points in a 2D space
• David Smiley's DateRangePrefixTree
- Levels for common date ranges: years, months, days…
- Spatial operations: intersects, is_within, contains
[Figure: date ranges compared against a 27 Nov 2015 to 29 Dec 2015 period to illustrate intersects, is_within and contains]
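The level idea can be illustrated with a simplified decomposition: a date range is covered by the coarsest aligned year, month and day cells that fit entirely inside it, so a long range needs a handful of coarse cells rather than one entry per day. This is a hypothetical sketch of the principle only (the `cover` function and the cell labels are made up); Lucene's DateRangePrefixTree has more levels, including finer time-of-day ones, and encodes its cells differently.

```python
import calendar
from datetime import date, timedelta
from typing import List


def month_end(year: int, month: int) -> date:
    """Last day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def cover(start: date, stop: date) -> List[str]:
    """Cover the inclusive range [start, stop] with the coarsest aligned
    year ("2015"), month ("2015-01") and day ("2015-01-05") cells."""
    cells: List[str] = []
    day = start
    while day <= stop:
        year_end = date(day.year, 12, 31)
        this_month_end = month_end(day.year, day.month)
        if (day.month, day.day) == (1, 1) and year_end <= stop:
            cells.append(f"{day.year}")  # a whole year fits
            day = year_end + timedelta(days=1)
        elif day.day == 1 and this_month_end <= stop:
            cells.append(f"{day.year}-{day.month:02d}")  # a whole month fits
            day = this_month_end + timedelta(days=1)
        else:
            cells.append(day.isoformat())  # fall back to a single day
            day += timedelta(days=1)
    return cells
```

For the 27 Nov 2015 to 29 Dec 2015 period on the slide, this yields 33 day cells because neither month is fully covered, whereas the whole of 2015 collapses into the single cell "2015".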
Indexing date ranges
CREATE TABLE breakdowns (
  system text PRIMARY KEY,
  cause text,
  start_date timestamp,
  stop_date timestamp,
  lucene text
);
CREATE CUSTOM INDEX breakdowns_idx
ON breakdowns (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'refresh_seconds' : '1',
  'schema' : '{
    fields : {
      duration : {
        type : "date_range",
        from : "start_date",
        to : "stop_date",
        pattern : "yyyy-MM-dd"
      },
      cause : {type : "string"}
    }
  }'
};
• No native date range type in CQL
• Many-to-one column mapping
• Spatial operations
Searching for date ranges
SELECT * FROM breakdowns
WHERE lucene =
'{
filter :
{
type : "date_range",
field : "duration",
from : "2015-01-01",
to : "2015-01-05",
operation : "intersects"
}
}';
SELECT * FROM breakdowns
WHERE lucene =
'{ filter : {
type : "boolean",
must : [
{
type : "date_range",
field : "duration",
from : "2015-01-01",
to : "2015-01-05",
operation : "is_within"
},
{
type : "match",
field : "cause",
value : "human error"
}
] } }';
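The three operations used in these queries become plain comparisons once a stored range [start, stop] is treated as the 2D point (start, stop). This is what makes spatial indexing applicable to durations: each operation selects a quadrant of that plane. Below is a hypothetical Python sketch (the `DateRange` class and `relate` function are illustrative names), assuming inclusive bounds on both ends.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    start: date
    stop: date  # inclusive

    def as_point(self) -> tuple:
        """A duration [start, stop] seen as the point (x=start, y=stop)."""
        return (self.start, self.stop)


def relate(stored: DateRange, query: DateRange, operation: str) -> bool:
    """Check a stored range against a query range with a spatial-style operation."""
    x, y = stored.as_point()
    if operation == "intersects":
        # Quadrant x <= query.stop, y >= query.start.
        return x <= query.stop and y >= query.start
    if operation == "is_within":
        # Quadrant x >= query.start, y <= query.stop.
        return x >= query.start and y <= query.stop
    if operation == "contains":
        # Quadrant x <= query.start, y >= query.stop.
        return x <= query.start and y >= query.stop
    raise ValueError(f"unsupported operation: {operation}")


# Query period from the examples above, plus some made-up stored breakdowns.
q = DateRange(date(2015, 1, 1), date(2015, 1, 5))
a = DateRange(date(2014, 12, 30), date(2015, 1, 2))  # overlaps the start
b = DateRange(date(2015, 1, 2), date(2015, 1, 3))    # fully inside
c = DateRange(date(2014, 12, 1), date(2015, 2, 1))   # fully covers the query
d = DateRange(date(2015, 2, 1), date(2015, 2, 3))    # entirely later
```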
INDEXING BITEMPORAL DATA
The bitemporal data model
• Stores WHAT and WHEN
• Supports corrections
• Reproduces the business perspective of history at any point in time
• Traces why a decision was made
The bitemporal data model
• Valid time
- The application period
- WHAT happened: the real-life period of the fact
• Transaction time
- The system period
- WHEN the system considers it true
The bitemporal data model: example
person  city        vt_from      vt_to        tt_from      tt_to
John    Smallville  3-Apr-1975   ∞            4-Apr-1975   26-Dec-1994
John    Smallville  3-Apr-1975   25-Aug-1994  27-Dec-1994  ∞
John    Bigtown     26-Aug-1994  ∞            27-Dec-1994  1-Feb-2001
John    Bigtown     26-Aug-1994  30-May-1995  2-Feb-2001   ∞
John    Beachy      1-Jun-1995   3-Sep-2000   2-Feb-2001   ∞
John    Bigtown     3-Sep-2000   ∞            2-Feb-2001   31-Mar-2001
John    Mediumtown  1-Apr-2001   ∞            1-Apr-2001   ∞
Modified example from Wikipedia
https://en.wikipedia.org/wiki/Temporal_database
A naïve approach: using 4 dates
CREATE CUSTOM INDEX census_idx
ON census (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'refresh_seconds' : '1',
  'schema' : '{
    fields : {
      vt_from : {type : "date", pattern : "yyyyMMdd"},
      vt_to   : {type : "date", pattern : "yyyyMMdd"},
      tt_from : {type : "date", pattern : "yyyyMMdd"},
      tt_to   : {type : "date", pattern : "yyyyMMdd"}
    }
  }'
};
A naïve approach
SELECT * FROM census WHERE lucene =
'{ filter : { type : "boolean",
  must : [
    { type : "boolean", should : [
      { type : "range", field : "vt_from", lower : "", upper : "",
        include_lower : true, include_upper : true },
      { type : "range", field : "vt_to", lower : "", upper : "",
        include_lower : true, include_upper : true },
      { type : "boolean", must : [
        { type : "range", field : "vt_from", upper : "", include_upper : true },
        { type : "range", field : "vt_to", lower : "", include_lower : true } ] } ] },
    { type : "boolean", should : [
      { type : "range", field : "tt_from", lower : "", upper : "",
        include_lower : true, include_upper : true },
      { type : "range", field : "tt_to", lower : "", upper : "",
        include_lower : true, include_upper : true },
      { type : "boolean", must : [
        { type : "range", field : "tt_from", upper : "", include_upper : true },
        { type : "range", field : "tt_to", lower : "", include_lower : true } ] } ] }
  ] } }' AND person = 'John';
A naïve approach: issues
• Queries are very difficult to understand and build
• Representing the now value (∞) as Long.MAX_VALUE is costly
A spatial approach: using 2 date ranges
CREATE CUSTOM INDEX census_idx
ON census (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'schema' : '{
    fields : {
      vt : {
        type : "date_range",
        pattern : "yyyyMMdd",
        from : "vt_from",
        to : "vt_to"
      },
      tt : {
        type : "date_range",
        pattern : "yyyyMMdd",
        from : "tt_from",
        to : "tt_to"
      }
    }
  }'
};
SELECT * FROM census WHERE lucene =
'{ filter : {
  type : "boolean",
  must : [
    {
      type : "date_range",
      field : "vt",
      from : "20150501",
      to : "99999999",
      operation : "intersects"
    },
    {
      type : "date_range",
      field : "tt",
      from : "20150501",
      to : "99999999",
      operation : "intersects"
    }
  ] } }';
A spatial approach: performance issues
• Queries are very difficult to understand and build
• Representing the now value (∞) as Long.MAX_VALUE is costly
4R-Tree to the rescue
• Based on Bliujute, R., Jensen, C. S., & Slivinskas, G. (2000). Light-weight indexing of general bitemporal data
• The now value is never stored
• The data is stored in 4 R-Trees
• Queries are transformed and distributed among the trees
[Figure: the stored shapes Point(vt_from, tt_from), Line(vt_from, vt_to, tt_to), Rectangle(vt_from, vt_to, tt_from, tt_to) and Line(vt_from, vt_to, tt_to)]
4R-Tree to the rescue: storing data
R1: TT_TO == NOW && VT_TO == NOW
R2: TT_TO == NOW && VT_TO != NOW
R3: TT_TO != NOW && VT_TO == NOW
R4: TT_TO != NOW && VT_TO != NOW
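The storing rule can be written as a small classification function. This sketch is hypothetical (`tree_for` and `stored_entry` are made-up names) and reuses the "99999999" sentinel that the census index declares as its `now_value`. It also drops the open-ended bounds before storing, reflecting that the now value is never stored: the choice of tree already says which ends are open.

```python
from typing import Dict, Tuple

NOW = "99999999"  # sentinel for "now"/∞, as in the census index's now_value


def tree_for(vt_to: str, tt_to: str, now: str = NOW) -> str:
    """Choose R1..R4 from whether the valid-time and transaction-time ends are open."""
    if tt_to == now and vt_to == now:
        return "R1"
    if tt_to == now:  # vt_to is a fixed date
        return "R2"
    if vt_to == now:  # tt_to is a fixed date
        return "R3"
    return "R4"


def stored_entry(vt_from: str, vt_to: str, tt_from: str, tt_to: str,
                 now: str = NOW) -> Tuple[str, Dict[str, str]]:
    """Return the target tree and only the bounds that actually get stored."""
    tree = tree_for(vt_to, tt_to, now)
    bounds = {"vt_from": vt_from, "vt_to": vt_to, "tt_from": tt_from, "tt_to": tt_to}
    # Open ends are implied by the tree choice, so the now value is never stored.
    kept = {name: value for name, value in bounds.items() if value != now}
    return tree, kept
```

In the census example, the Mediumtown row (both ends open) goes to R1, while the first Smallville row (open valid time, closed transaction time) goes to R3.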
4R-Tree to the rescue: searching data
IF (TT_FROM != NOW) && (TT_TO >= VT_FROM):
    searchR1(0, TT_TO, 0, VT_TO) U
    searchR2(0, TT_TO, VT_FROM, VT_TO) U
    searchR3(max(TT_FROM, VT_FROM), TT_TO, 0, VT_TO) U
    searchR4(TT_FROM, TT_TO, VT_FROM, VT_TO)
IF (TT_FROM != NOW) && (TT_TO < VT_FROM):
    searchR2(0, TT_TO, VT_FROM, VT_TO) U
    searchR4(TT_FROM, TT_TO, VT_FROM, VT_TO)
IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] ≠ [0, MAX]) && (TT_TO >= VT_FROM):
    searchR1(0, TT_TO, 0, VT_TO) U searchR2(0, TT_TO, VT_FROM, VT_TO)
IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] ≠ [0, MAX]) && (TT_TO < VT_FROM):
    searchR2(0, TT_TO, VT_FROM, VT_TO)
IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] = [0, MAX]):
    R1 U R2
4R-Tree to the rescue
• Problem: Lucene does not support R-Trees
• Our solution:
- Use 2 DateRangePrefixTrees for each R-Tree
• Future work: experiment with other Lucene spatial trees and strategies
Indexing bitemporal data
CREATE TABLE census (
  person text,
  city text,
  vt_from text,
  vt_to text,
  tt_from text,
  tt_to text,
  lucene text,
  PRIMARY KEY ((person), vt_from, tt_from)
);
CREATE CUSTOM INDEX census_idx
ON census (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'schema' : '{
    fields : {
      bitemporal : {
        type : "bitemporal",
        vt_from : "vt_from",
        vt_to : "vt_to",
        tt_from : "tt_from",
        tt_to : "tt_to",
        pattern : "yyyyMMdd",
        now_value : "99999999"
      },
      city : {type : "string"}
    }
  }'
};
Searching for bitemporal data: several queries
Where does the system currently think that John lives right now?
SELECT * FROM census WHERE lucene =
'{
  filter : {
    type : "bitemporal",
    field : "bitemporal",
    vt_from : "99999999",
    vt_to : "99999999",
    tt_from : "99999999",
    tt_to : "99999999"
  }
}' AND person = 'John';
person  city        vt_from      vt_to        tt_from      tt_to
John    Mediumtown  1-Apr-2001   ∞            1-Apr-2001   ∞
Searching for bitemporal data
Where does the system currently think that John lived in 1999?
SELECT * FROM census WHERE lucene =
'{
  filter : {
    type : "bitemporal",
    field : "bitemporal",
    vt_from : "19990101",
    vt_to : "19991231",
    tt_from : "99999999",
    tt_to : "99999999"
  }
}' AND person = 'John';
person  city        vt_from      vt_to        tt_from      tt_to
John    Beachy      1-Jun-1995   3-Sep-2000   2-Feb-2001   ∞
On 1 Jan 2000, where did the system think John was living back in 1999?
SELECT * FROM census WHERE lucene =
'{
  filter : {
    type : "bitemporal",
    field : "bitemporal",
    vt_from : "19990101",
    vt_to : "19991231",
    tt_from : "20000101",
    tt_to : "20000101"
  }
}' AND person = 'John';
person  city        vt_from      vt_to        tt_from      tt_to
John    Bigtown     26-Aug-1994  ∞            27-Dec-1994  1-Feb-2001
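The answers shown for these bitemporal queries can be cross-checked with a brute-force reference over the census example. This Python sketch assumes intersection semantics with inclusive bounds on both time axes, and uses `date.max` to stand for both ∞ and "now", mirroring the role of the `99999999` sentinel. It scans every row, which is exactly what the 4R-tree layout avoids; treat it as a correctness oracle, not as the index's algorithm. `Fact`, `CENSUS` and `bitemporal_search` are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple

INF = date.max  # stands in for the open-ended "∞" and for "now"


class Fact(NamedTuple):
    city: str
    vt_from: date
    vt_to: date
    tt_from: date
    tt_to: date


# John's rows from the example table.
CENSUS = [
    Fact("Smallville", date(1975, 4, 3), INF, date(1975, 4, 4), date(1994, 12, 26)),
    Fact("Smallville", date(1975, 4, 3), date(1994, 8, 25), date(1994, 12, 27), INF),
    Fact("Bigtown", date(1994, 8, 26), INF, date(1994, 12, 27), date(2001, 2, 1)),
    Fact("Bigtown", date(1994, 8, 26), date(1995, 5, 30), date(2001, 2, 2), INF),
    Fact("Beachy", date(1995, 6, 1), date(2000, 9, 3), date(2001, 2, 2), INF),
    Fact("Bigtown", date(2000, 9, 3), INF, date(2001, 2, 2), date(2001, 3, 31)),
    Fact("Mediumtown", date(2001, 4, 1), INF, date(2001, 4, 1), INF),
]


def overlaps(a_from: date, a_to: date, b_from: date, b_to: date) -> bool:
    """Closed intervals [a_from, a_to] and [b_from, b_to] share at least one day."""
    return a_from <= b_to and b_from <= a_to


def bitemporal_search(facts: List[Fact], vt_from: date, vt_to: date,
                      tt_from: date, tt_to: date) -> List[str]:
    """Cities of the facts whose valid and transaction periods both intersect the query."""
    return [f.city for f in facts
            if overlaps(f.vt_from, f.vt_to, vt_from, vt_to)
            and overlaps(f.tt_from, f.tt_to, tt_from, tt_to)]
```

Querying with `INF` on all four bounds returns Mediumtown; valid time 1999 with transaction time `INF` returns Beachy; valid time 1999 as known on 1 Jan 2000 returns Bigtown, matching the slides.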
Searching for bitemporal data
Who currently lives in Smallville?
SELECT * FROM census WHERE lucene =
'{
  filter : {
    type : "boolean",
    must : [
      { type : "bitemporal", field : "bitemporal",
        vt_from : "99999999", vt_to : "99999999",
        tt_from : "99999999", tt_to : "99999999"
      },
      { type : "match",
        field : "city",
        value : "Smallville"
      }
    ]
  }
}';
CONCLUSIONS
Conclusions
• Pluggable Lucene features in Cassandra
• Basic geospatial features
• Date/Time durations
• Bitemporal data model indexing
• Compatible with MapReduce frameworks
• Preserves Cassandra's functionality
It's open source
• github.com/stratio/cassandra-lucene-index
• Published as a plugin for Apache Cassandra
• Apache License, Version 2.0
BIG DATA, CHILD'S PLAY
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
Keshav Murthy
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
Keshav Murthy
 
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Dan Robinson
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.
Keshav Murthy
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
Jonathan Sedar
 
MongoDB Stitch Introduction
MongoDB Stitch IntroductionMongoDB Stitch Introduction
MongoDB Stitch Introduction
MongoDB
 
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Lucian Precup
 
Ad

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 
Ad

Recently uploaded (20)

IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 

Geospatial and bitemporal search in C* with pluggable Lucene index by Andrés de la Peña at Big Data Spain 2015

  • 2. GEOSPATIAL AND BITEMPORAL SEARCH IN C* WITH PLUGGABLE LUCENE INDEX
    Andrés de la Peña (andres@stratio.com, @a_de_la_pena)
  • 3. CONTENTS
    1. Pluggable Lucene-based 2i (secondary index)
    2. Geospatial search
    3. Bitemporal indexes
  • 5. Cassandra query methods: primary key, secondary indexes and token ranges, compared by throughput and expressiveness
  • 6. Cassandra query methods by use case
    • Operational: primary key, secondary indexes
    • Analytics: token ranges
  • 7. Cassandra query methods trade-offs
    • Operational (primary key, secondary indexes)
      - Pure-range queries limited to a partition
      - No Boolean logic
      - No full-text search
      - Sorting limited to a partition
    • Analytics (token ranges)
      - Full-table scan
      - High load
      - High latency
      - Low concurrency
  • 8. A third use case: search, between operational and analytics
    • Not as fast as primary key queries
    • Not as expressive as MapReduce
    • Search can be used for both cases
  • 9. CQL + Lucene: a Lucene-based secondary index implementation
  • 10. A Lucene-based secondary index implementation
    • Proven, stable and fast indexing solution
    • Expressive queries
      - Multivariable, ranges, full text, sorting, top-k, etc.
    • Mature distributed search solutions built on top of it
      - Solr, Elasticsearch
    • Just a small embeddable library
    • Easily extensible
    • Published under the Apache License
  • 11. Cassandra query methods compared
    • Primary key (operational): low expressiveness, low latency, low load
    • Lucene secondary indexes (search): mid expressiveness, mid latency, mid load
    • Token ranges (analytics): high expressiveness, high latency, high load
  • 12. A Lucene-based secondary index implementation
    [Diagram: a client talks to three C* nodes; each node runs its own Lucene index inside the same JVM]
    • Each node indexes its own data
    • Keeps the P2P architecture
    • Distribution and replication managed by C*
    • Just a single pluggable JAR file
      - CASSANDRA-8717
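Each node indexes only its own data, so a sorted query limited to k rows has to combine the results from every node. The following is a minimal sketch of that merge step. It assumes, and this is not taken from the slides, that each node returns its local hits already sorted by descending score and that the coordinator keeps the global top k. `merge_top_k` and the `(score, row_key)` tuples are hypothetical names used only for illustration.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# A hit is (score, row_key); higher scores rank first.
Hit = Tuple[float, str]


def merge_top_k(per_node_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node hit lists, each sorted by descending score, into the global top k.

    Hypothetical coordinator-side step: each node searched only its local
    Lucene index and returned its own top k hits.
    """
    # heapq.merge lazily interleaves already-sorted inputs; reverse=True expects descending inputs
    merged = heapq.merge(*per_node_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


node_a = [(0.9, "tweet-1"), (0.4, "tweet-7")]
node_b = [(0.8, "tweet-3"), (0.7, "tweet-5")]
node_c = [(0.95, "tweet-9")]
print(merge_top_k([node_a, node_b, node_c], k=3))
# [(0.95, 'tweet-9'), (0.9, 'tweet-1'), (0.8, 'tweet-3')]
```

Each node only needs to send its local top k. Any row in the global top k must also be in the top k of the node that holds it.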
  • 13. Create index
    • Built in the background at any moment
    • Real-time updates
    • Mapping eases ETL
    • Language aware

```sql
CREATE TABLE tweets (
    id bigint,
    created timestamp,
    message text,
    userid bigint,
    username text,
    PRIMARY KEY (userid, created, id)
);

ALTER TABLE tweets ADD lucene TEXT;

CREATE CUSTOM INDEX tweets_idx ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds' : '10',
    'schema' : '{
        fields : {
            created  : {type : "date", pattern : "yyyy-MM-dd"},
            message  : {type : "text", analyzer : "english"},
            userid   : {type : "string"},
            username : {type : "string"}
        }
    }'
};
```
  • 14. Searching for rows

```sql
SELECT * FROM tweets WHERE lucene = '{
    filter : {
        type : "boolean",
        must : [
            {type : "range", field : "created", lower : "2015-01-01"},
            {type : "wildcard", field : "username", value : "a*"}
        ],
        not : [
            {type : "match", field : "username", value : "andres"}
        ]
    },
    sort : {
        fields : [
            {field : "created", reverse : true},
            {field : "username", reverse : false}
        ]
    }
}' LIMIT 10000;
```
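The following is a small in-memory model of the search above. Every `must` clause has to match and no `not` clause may match. The sort applies `created` descending, with `username` ascending as a tie-breaker. This is a hypothetical sketch, not the index's implementation. The helper names are illustrative. Treating range bounds as exclusive unless included explicitly is an assumption about the index's defaults.

```python
from datetime import date
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def range_filter(field: str, lower: Any = None, upper: Any = None,
                 include_lower: bool = False, include_upper: bool = False) -> Predicate:
    """Range condition; a missing bound is open. Bounds are exclusive by default (assumed)."""
    def check(row: Row) -> bool:
        value = row[field]
        if lower is not None and (value < lower or (value == lower and not include_lower)):
            return False
        if upper is not None and (value > upper or (value == upper and not include_upper)):
            return False
        return True
    return check


def wildcard_filter(field: str, pattern: str) -> Predicate:
    # Shell-style '*' and '?' matching, case-sensitive
    return lambda row: fnmatchcase(row[field], pattern)


def match_filter(field: str, value: Any) -> Predicate:
    return lambda row: row[field] == value


def search(rows: Sequence[Row], must: Sequence[Predicate], must_not: Sequence[Predicate],
           sort: Sequence[Tuple[str, bool]], limit: int) -> List[Row]:
    """Keep rows matching every `must` and no `must_not`, then sort by (field, reverse) keys."""
    hits = [r for r in rows if all(p(r) for p in must) and not any(p(r) for p in must_not)]
    # Sort from the least to the most significant key; Python's sort is stable,
    # so keys listed first in `sort` take precedence.
    for field, reverse in reversed(sort):
        hits.sort(key=lambda r: r[field], reverse=reverse)
    return hits[:limit]


rows = [
    {"username": "alice", "created": date(2015, 3, 1)},
    {"username": "andres", "created": date(2015, 4, 1)},
    {"username": "anna", "created": date(2015, 3, 1)},
    {"username": "abe", "created": date(2015, 6, 1)},
    {"username": "bob", "created": date(2015, 5, 1)},
    {"username": "amy", "created": date(2014, 12, 31)},
]
result = search(
    rows,
    must=[range_filter("created", lower=date(2015, 1, 1)), wildcard_filter("username", "a*")],
    must_not=[match_filter("username", "andres")],
    sort=[("created", True), ("username", False)],
    limit=10000,
)
print([r["username"] for r in result])  # ['abe', 'alice', 'anna']
```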
  • 15. Integrating Lucene & Spark
    [Diagram: a client and a Spark master working with three C* nodes, each with its own Lucene index]
    • Process large amounts of data
    • Filtering push-down
    • Avoid systematic full scans
    • Reduce the amount of data to be processed
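A toy model that counts how many rows reach Spark shows what filtering push-down saves. In this sketch, the predicate evaluated on each node stands in for that node's Lucene index. The function names and data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def full_scan(nodes: Sequence[List[Row]], predicate: Predicate) -> Tuple[List[Row], int]:
    """Every row is shipped to Spark and filtered there. Returns (matches, rows shipped)."""
    shipped = [row for node in nodes for row in node]
    return [row for row in shipped if predicate(row)], len(shipped)


def pushed_down(nodes: Sequence[List[Row]], predicate: Predicate) -> Tuple[List[Row], int]:
    """Each node filters locally (standing in for its Lucene index); only matches are shipped."""
    shipped = [row for node in nodes for row in node if predicate(row)]
    return shipped, len(shipped)


nodes = [[{"stars": s} for s in range(0, 5)], [{"stars": s} for s in range(5, 10)]]
good = lambda row: row["stars"] >= 8
print(full_scan(nodes, good)[1], pushed_down(nodes, good)[1])  # 10 2
```

Both approaches return the same matches. The difference is how much data crosses the network and how much Spark has to process.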
  • 16. Index performance in Spark
    [Chart: time in seconds versus millions of collected rows, comparing index-based queries with a full scan]
  • 18. Lucene spatial module
    • Spatial4J shapes
      - Points, rectangles, circles, etc.
    • Spatial search strategies
      - BBox, RecursivePrefixTree, PointVector, etc.
    • Not only geographical data
      - Numbers, dates
    • Can be combined with other searches
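RecursivePrefixTree indexes shapes as cells of a hierarchical grid, and Lucene's spatial module offers geohash-based and quad-based grids for it. The geohash encoder below illustrates the grid idea; it is not code from the index. It shows the key property that longer hashes refine shorter ones, so a cell's hash is a prefix of the hash of every point inside it.

```python
# Geohash alphabet: base32 without the letters a, i, l, o
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 11) -> str:
    """Encode a point by repeatedly halving the lon/lat intervals (bits interleaved, lon first)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character (one tree level)
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash(57.64911, 10.40744))  # u4pruydqqvj
```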
  • 19. Indexing geographical locations

```sql
CREATE TABLE restaurants (
    name text PRIMARY KEY,
    stars bigint,
    lat double,
    lon double,
    lucene text
);

CREATE CUSTOM INDEX restaurants_idx ON restaurants (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds' : '1',
    'schema' : '{
        fields : {
            location : {type : "geo_point", latitude : "lat", longitude : "lon"},
            stars    : {type : "integer"}
        }
    }'
};
```

    • No native shape data types in CQL
    • Many-to-one column mapping
    • Just points, for now
  • 20. Bounding box search

```sql
SELECT * FROM restaurants WHERE lucene = '{
    filter : {
        type : "geo_bbox",
        field : "location",
        min_latitude : 40.425978,
        max_latitude : 40.445886,
        min_longitude : -3.808252,
        max_longitude : -3.770999
    }
}';
```
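A bounding box filter keeps the points whose latitude and longitude both fall between the given limits. The following is a minimal sketch of that test. It assumes inclusive edges and a box that does not cross the antimeridian. The `BBox` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_latitude: float
    max_latitude: float
    min_longitude: float
    max_longitude: float

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive on all edges; boxes crossing the antimeridian are not handled
        return (self.min_latitude <= lat <= self.max_latitude
                and self.min_longitude <= lon <= self.max_longitude)


# The same limits as the query above
box = BBox(40.425978, 40.445886, -3.808252, -3.770999)
print(box.contains(40.443270, -3.800498))  # True
```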
  • 21. Distance search

```sql
SELECT * FROM restaurants WHERE lucene = '{
    filter : {
        type : "geo_distance",
        field : "location",
        latitude : 40.443270,
        longitude : -3.800498,
        min_distance : "100m",
        max_distance : "2km"
    }
}';
```
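The distance filter keeps the points whose distance to the given center lies between `min_distance` and `max_distance`. The sketch below makes three assumptions, since the index may use a different distance model. It uses the haversine formula on a spherical Earth with a mean radius of 6,371,008.8 m. It treats both limits as inclusive. It parses only the `mm`, `cm`, `m` and `km` units. The helper names are hypothetical.

```python
import math
import re
from typing import Optional

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (assumed spherical model)
_UNITS = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}


def parse_distance(text: str) -> float:
    """Convert strings such as "100m" or "2km" to meters."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(mm|cm|m|km)\s*", text)
    if match is None:
        raise ValueError(f"unsupported distance: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(lat: float, lon: float, center_lat: float, center_lon: float,
                    min_distance: Optional[str] = None, max_distance: Optional[str] = None) -> bool:
    d = haversine_m(center_lat, center_lon, lat, lon)
    if min_distance is not None and d < parse_distance(min_distance):
        return False
    if max_distance is not None and d > parse_distance(max_distance):
        return False
    return True


# A point about 1 km north of the query center passes the 100m..2km filter
print(within_distance(40.443270 + 0.009, -3.800498, 40.443270, -3.800498, "100m", "2km"))  # True
```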
  • 22. Combining geospatial searches
    SELECT * FROM restaurants WHERE lucene = '{
        filter : {
            type : "boolean",
            must : [
                { type : "geo_distance", field : "location",
                  latitude : 40.443270, longitude : -3.800498, max_distance : "10km" },
                { type : "range", field : "stars", lower : 2, upper : 4 }
            ]
        }
    }';
  • 23. Lucene spatial is not only geospatial…
    • General geometry
    • Numeric ranges
      - NumberRangePrefixTree
    • Date ranges/durations
      - DateRangePrefixTree
  • 24. Temporal/date durations
    • A pair composed of a start date and a stop date
      - Can be indexed as points in a 2D space
    • David Smiley's DateRangePrefixTree
      - Levels for common date ranges: years, months, days…
      - Spatial operations: intersects, is_within, contains
    (Diagram: the range 27 Nov 2015 to 29 Dec 2015, illustrating intersects, is_within and contains.)
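The "points in a 2D space" view can be made concrete: treat a duration as the point (start, stop), and each of the three operations becomes an axis-aligned (possibly unbounded) rectangle test in that plane. This Python sketch only illustrates the geometry; it is not how DateRangePrefixTree encodes ranges, and reading each operation as "indexed range `r` relative to query range `q`" is our assumption about the convention.

```python
from datetime import date
from typing import NamedTuple

class DateRange(NamedTuple):
    """A duration, viewed as the 2D point (start, stop); both ends inclusive."""
    start: date
    stop: date

# Each test compares an indexed range r with a query range q. With r seen as
# the point (r.start, r.stop), every condition bounds start and stop
# independently, i.e. it is a rectangular region of the (start, stop) plane.

def intersects(r: DateRange, q: DateRange) -> bool:
    return r.start <= q.stop and r.stop >= q.start

def is_within(r: DateRange, q: DateRange) -> bool:
    return q.start <= r.start and r.stop <= q.stop

def contains(r: DateRange, q: DateRange) -> bool:
    return r.start <= q.start and r.stop >= q.stop

query = DateRange(date(2015, 11, 27), date(2015, 12, 29))
inner = DateRange(date(2015, 12, 1), date(2015, 12, 10))
print(intersects(inner, query), is_within(inner, query), contains(inner, query))
```

Here `inner` intersects and lies within the query range but does not contain it; swapping the arguments (`contains(query, inner)`) is true.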
  • 25. Indexing date ranges
    CREATE TABLE breakdowns (
        system text PRIMARY KEY,
        cause text,
        start_date timestamp,
        stop_date timestamp,
        lucene text
    );
    CREATE CUSTOM INDEX breakdowns_idx ON breakdowns (lucene)
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
        'refresh_seconds' : '1',
        'schema' : '{
            fields : {
                duration : { type : "date_range", from : "start_date", to : "stop_date", pattern : "yyyy-MM-dd" },
                cause : { type : "string" }
            }
        }'
    };
    • No native date range type in CQL
    • Many-to-one column mapping
    • Spatial operations
  • 26. Searching for date ranges
    SELECT * FROM breakdowns WHERE lucene = '{
        filter : {
            type : "date_range",
            field : "duration",
            from : "2015-01-01",
            to : "2015-01-05",
            operation : "intersects"
        }
    }';
    SELECT * FROM breakdowns WHERE lucene = '{
        filter : {
            type : "boolean",
            must : [
                { type : "date_range", field : "duration",
                  from : "2015-01-01", to : "2015-01-05", operation : "is_within" },
                { type : "match", field : "cause", value : "human error" }
            ]
        }
    }';
  • 28. The bitemporal data model
    • Stores WHAT and WHEN
    • Supports corrections
    • Reproduces the business view of history as of any point in time
    • Traces why a decision was made
  • 29. The bitemporal data model
    • Valid time
      - The application period
      - WHAT happened: the period of the real-life fact
    • Transaction time
      - The system period
      - WHEN the system considered it true
  • 30. The bitemporal data model: example
    person  city        vt_from      vt_to        tt_from      tt_to
    John    Smallville  3-Apr-1975   ∞            4-Apr-1975   26-Dec-1994
    John    Smallville  3-Apr-1975   25-Aug-1994  27-Dec-1994  ∞
    John    Bigtown     26-Aug-1994  ∞            27-Dec-1994  1-Feb-2001
    John    Bigtown     26-Aug-1994  30-May-1995  2-Feb-2001   ∞
    John    Beachy      1-Jun-1995   3-Sep-2000   2-Feb-2001   ∞
    John    Bigtown     3-Sep-2000   ∞            2-Feb-2001   31-Mar-2001
    John    Mediumtown  1-Apr-2001   ∞            1-Apr-2001   ∞
    Modified example from Wikipedia: https://en.wikipedia.org/wiki/Temporal_database
  • 31. A naïve approach
    Using 4 dates:
    CREATE CUSTOM INDEX census_idx ON census (lucene)
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
        'refresh_seconds' : '1',
        'schema' : '{
            fields : {
                vt_from : { type : "date", pattern : "yyyyMMdd" },
                vt_to : { type : "date", pattern : "yyyyMMdd" },
                tt_from : { type : "date", pattern : "yyyyMMdd" },
                tt_to : { type : "date", pattern : "yyyyMMdd" }
            }
        }'
    };
  • 32. A naive approach
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "boolean",
            must : [
                { type : "boolean", should : [
                    { type : "range", field : "vt_from", lower : "", upper : "", include_lower : true, include_upper : true },
                    { type : "range", field : "vt_to", lower : "", upper : "", include_lower : true, include_upper : true },
                    { type : "boolean", must : [
                        { type : "range", field : "vt_from", upper : "", include_upper : true },
                        { type : "range", field : "vt_to", lower : "", include_lower : true } ] } ] },
                { type : "boolean", should : [
                    { type : "range", field : "tt_from", lower : "", upper : "", include_lower : true, include_upper : true },
                    { type : "range", field : "tt_to", lower : "", upper : "", include_lower : true, include_upper : true },
                    { type : "boolean", must : [
                        { type : "range", field : "tt_from", upper : "", include_upper : true },
                        { type : "range", field : "tt_to", lower : "", include_lower : true } ] } ] }
            ]
        }
    }' AND person = 'John Doe';
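Each `should` block in the naive query expresses "the row's interval intersects the query interval" as three alternatives: the row starts inside, ends inside, or spans the whole query. The Python sketch below writes that form out and brute-forces a check that it agrees with the usual two-comparison overlap test. Reading the nested `must` as "starts before and ends after the query" is our interpretation of the slide's elided bounds, and all function names are hypothetical.

```python
from itertools import product

def naive_intersects(row_from, row_to, q_from, q_to):
    """The slide's three-clause 'should', under our reading of its bounds."""
    starts_inside = q_from <= row_from <= q_to
    ends_inside = q_from <= row_to <= q_to
    # Assumption: the nested 'must' means "starts before and ends after the query".
    spans_query = row_from <= q_from and row_to >= q_to
    return starts_inside or ends_inside or spans_query

def overlaps(row_from, row_to, q_from, q_to):
    """The usual two-comparison interval-overlap test."""
    return row_from <= q_to and row_to >= q_from

def naive_bitemporal(row, query):
    """row/query are (vt_from, vt_to, tt_from, tt_to); both dimensions must match."""
    vf, vt, tf, tt = row
    qvf, qvt, qtf, qtt = query
    return naive_intersects(vf, vt, qvf, qvt) and naive_intersects(tf, tt, qtf, qtt)

def forms_agree(n: int = 8) -> bool:
    """Brute-force check that both forms agree on every valid interval pair."""
    for rf, rt, qf, qt in product(range(n), repeat=4):
        if rf <= rt and qf <= qt:
            if naive_intersects(rf, rt, qf, qt) != overlaps(rf, rt, qf, qt):
                return False
    return True

print(forms_agree())  # True
```

The outer `must` simply ANDs the valid-time and transaction-time checks, which is what `naive_bitemporal` does.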
  • 33. A naive approach: issues
    • The query is very difficult to understand and build.
    • Representing the "now" value (∞) as Long.MAX_VALUE is costly.
  • 34. A spatial approach
    Using 2 date ranges:
    CREATE CUSTOM INDEX census_idx ON census (lucene)
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
        'schema' : '{
            fields : {
                vt : { type : "date_range", pattern : "yyyyMMdd", from : "vt_from", to : "vt_to" },
                tt : { type : "date_range", pattern : "yyyyMMdd", from : "tt_from", to : "tt_to" }
            }
        }'
    };
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "boolean",
            must : [
                { type : "date_range", field : "vt", from : "20150501", to : "99999999", operation : "intersects" },
                { type : "date_range", field : "tt", from : "20150501", to : "99999999", operation : "intersects" }
            ]
        }
    }';
  • 35. A spatial approach: performance issues
    • The query is very difficult to understand and build.
    • Representing the "now" value (∞) as Long.MAX_VALUE is costly.
  • 36. 4R-Tree to the rescue
    • Based on Bliujute, R., Jensen, C. S., & Slivinskas, G. (2000). Light-weight indexing of general bitemporal data.
    • The "now" value is never stored.
    • The data is stored in 4 R-trees.
    • Queries are transformed and distributed among the trees.
  • 37. 4R-Tree to the rescue: storing data
    Each row goes to one of four R-trees, depending on which end values are NOW:
    • R1: TT_TO == NOW && VT_TO == NOW
    • R2: TT_TO == NOW && VT_TO != NOW
    • R3: TT_TO != NOW && VT_TO == NOW
    • R4: TT_TO != NOW && VT_TO != NOW
    Shapes in the diagram: Point(vt_from, tt_from), Line(vt_from, vt_to, tt_to), Rectangle(vt_from, vt_to, tt_from, tt_to), Line(vt_from, vt_to, tt_to).
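The routing rule on this slide is just a four-way case split on whether each end value is NOW. A direct Python transcription follows; `choose_tree` is a hypothetical helper name, and representing dates as yyyyMMdd strings with `"99999999"` as NOW follows the index definition used later in the deck.

```python
NOW = "99999999"  # the now_value used in the deck's bitemporal index definition

def choose_tree(vt_to: str, tt_to: str) -> str:
    """Pick the R-tree for a row, following the four NOW cases on the slide."""
    if tt_to == NOW and vt_to == NOW:
        return "R1"
    if tt_to == NOW:   # VT_TO != NOW
        return "R2"
    if vt_to == NOW:   # TT_TO != NOW
        return "R3"
    return "R4"        # neither end is NOW

# (city, vt_to, tt_to) for some rows of the census example, in yyyyMMdd form.
rows = [
    ("Mediumtown", NOW, NOW),
    ("Beachy", "20000903", NOW),
    ("Bigtown", NOW, "20010201"),
]
print([choose_tree(vt_to, tt_to) for _, vt_to, tt_to in rows])  # ['R1', 'R2', 'R3']
```

Because the cases are mutually exclusive, each row is stored exactly once, and the NOW value itself never has to be written into a tree.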
  • 38. 4R-Tree to the rescue: searching data
    IF (TT_FROM != NOW) && (TT_TO >= VT_FROM):
        searchR1(0, TT_TO, 0, VT_TO) ∪ searchR2(0, TT_TO, VT_FROM, VT_TO)
        ∪ searchR3(max(TT_FROM, VT_FROM), TT_TO, 0, VT_TO) ∪ searchR4(TT_FROM, TT_TO, VT_FROM, VT_TO)
    IF (TT_FROM != NOW) && (TT_TO < VT_FROM):
        searchR2(0, TT_TO, VT_FROM, VT_TO) ∪ searchR4(TT_FROM, TT_TO, VT_FROM, VT_TO)
    IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] ≠ [0, MAX]) && (TT_TO >= VT_FROM):
        searchR1(0, TT_TO, 0, VT_TO) ∪ searchR2(0, TT_TO, VT_FROM, VT_TO)
    IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] ≠ [0, MAX]) && (TT_TO < VT_FROM):
        searchR2(0, TT_TO, VT_FROM, VT_TO)
    IF (TT_FROM == NOW) && ([VT_FROM, VT_TO] = [0, MAX]):
        R1 ∪ R2
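The case analysis on this slide can be transcribed into a small planner that returns which trees to search and with which bounds. This is a literal transcription sketch: the argument order of each bound tuple follows the slide's `searchRn(...)` calls, dates are assumed to be yyyyMMdd integers with NOW as the maximum, and `plan_search` is a hypothetical name rather than the plugin's code.

```python
from typing import List, Optional, Tuple

NOW = 99_999_999       # assumption: dates as yyyyMMdd integers, now_value as max
MIN, MAX = 0, NOW      # the slide's 0 and MAX bounds

Plan = List[Tuple[str, Optional[Tuple[int, int, int, int]]]]

def plan_search(vt_from: int, vt_to: int, tt_from: int, tt_to: int) -> Plan:
    """Which trees to search, with the slide's bounds; None means the whole tree."""
    r1 = ("R1", (0, tt_to, 0, vt_to))
    r2 = ("R2", (0, tt_to, vt_from, vt_to))
    r4 = ("R4", (tt_from, tt_to, vt_from, vt_to))
    if tt_from != NOW:
        if tt_to >= vt_from:
            r3 = ("R3", (max(tt_from, vt_from), tt_to, 0, vt_to))
            return [r1, r2, r3, r4]
        return [r2, r4]
    if (vt_from, vt_to) != (MIN, MAX):
        return [r1, r2] if tt_to >= vt_from else [r2]
    # Current transaction time, unrestricted valid time: all of R1 and R2.
    return [("R1", None), ("R2", None)]

# "On 01-Jan-2000, where did the system think John was living back in 1999?"
print([tree for tree, _ in plan_search(19990101, 19991231, 20000101, 20000101)])
```

Queries about the current system state (TT_FROM == NOW) never touch R3 or R4, since those trees only hold rows whose transaction time has already ended.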
  • 39. 4R-Tree to the rescue
    • Problem: Lucene does not support R-trees.
    • Our solution:
      - Use 2 DateRangePrefixTrees for each R-tree
    • Future work: experiment with other Lucene spatial trees and strategies.
  • 40. The bitemporal data model: example
    person  city        vt_from      vt_to        tt_from      tt_to
    John    Smallville  3-Apr-1975   ∞            4-Apr-1975   26-Dec-1994
    John    Smallville  3-Apr-1975   25-Aug-1994  27-Dec-1994  ∞
    John    Bigtown     26-Aug-1994  ∞            27-Dec-1994  1-Feb-2001
    John    Bigtown     26-Aug-1994  30-May-1995  2-Feb-2001   ∞
    John    Beachy      1-Jun-1995   3-Sep-2000   2-Feb-2001   ∞
    John    Bigtown     3-Sep-2000   ∞            2-Feb-2001   31-Mar-2001
    John    Mediumtown  1-Apr-2001   ∞            1-Apr-2001   ∞
    Modified example from Wikipedia: https://en.wikipedia.org/wiki/Temporal_database
  • 41. Indexing bitemporal data
    CREATE TABLE census (
        person text,
        city text,
        vt_from text,
        vt_to text,
        tt_from text,
        tt_to text,
        lucene text,
        PRIMARY KEY ((person), vt_from, tt_from)
    );
    CREATE CUSTOM INDEX census_idx ON census (lucene)
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
        'schema' : '{
            fields : {
                bitemporal : {
                    type : "bitemporal",
                    vt_from : "vt_from",
                    vt_to : "vt_to",
                    tt_from : "tt_from",
                    tt_to : "tt_to",
                    pattern : "yyyyMMdd",
                    now_value : "99999999"
                },
                city : { type : "string" }
            }
        }'
    };
  • 42. Searching for bitemporal data: several queries
    Where does the system currently think John lives right now?
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "bitemporal",
            field : "bitemporal",
            vt_from : "99999999",
            vt_to : "99999999",
            tt_from : "99999999",
            tt_to : "99999999"
        }
    }' AND person = 'John Doe';
    Result:
    person  city        vt_from      vt_to        tt_from      tt_to
    John    Mediumtown  1-Apr-2001   ∞            1-Apr-2001   ∞
  • 43. Searching for bitemporal data
    Where does the system currently think John lived in 1999?
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "bitemporal",
            field : "bitemporal",
            vt_from : "19990101",
            vt_to : "19991231",
            tt_from : "99999999",
            tt_to : "99999999"
        }
    }' AND person = 'John Doe';
    Result:
    person  city        vt_from      vt_to        tt_from      tt_to
    John    Beachy      1-Jun-1995   3-Sep-2000   2-Feb-2001   ∞
  • 44. Searching for bitemporal data
    On 01-Jan-2000, where did the system think John was living back in 1999?
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "bitemporal",
            field : "bitemporal",
            vt_from : "19990101",
            vt_to : "19991231",
            tt_from : "20000101",
            tt_to : "20000101"
        }
    }' AND person = 'John Doe';
    Result:
    person  city        vt_from      vt_to        tt_from      tt_to
    John    Bigtown     26-Aug-1994  ∞            27-Dec-1994  1-Feb-2001
  • 45. Searching for bitemporal data
    Who currently lives in Smallville?
    SELECT * FROM census WHERE lucene = '{
        filter : {
            type : "boolean",
            must : [
                { type : "bitemporal", field : "bitemporal",
                  vt_from : "99999999", vt_to : "99999999",
                  tt_from : "99999999", tt_to : "99999999" },
                { type : "match", field : "city", value : "smallville" }
            ]
        }
    }';
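To check the expected answers against the example table, the sketch below evaluates the same bitemporal questions naively in Python: a row matches when its valid-time interval and its transaction-time interval both intersect the query's. It is a reference evaluation, not the 4R-tree index; using `date.max` for ∞ (so that a query at NOW only matches open-ended rows, as with the `"99999999"` now_value) and the helper names are our assumptions.

```python
from datetime import date

INF = date.max  # stands in for the open-ended value (∞ / now_value)

# (person, city, vt_from, vt_to, tt_from, tt_to), from the census example.
ROWS = [
    ("John", "Smallville", date(1975, 4, 3), INF, date(1975, 4, 4), date(1994, 12, 26)),
    ("John", "Smallville", date(1975, 4, 3), date(1994, 8, 25), date(1994, 12, 27), INF),
    ("John", "Bigtown", date(1994, 8, 26), INF, date(1994, 12, 27), date(2001, 2, 1)),
    ("John", "Bigtown", date(1994, 8, 26), date(1995, 5, 30), date(2001, 2, 2), INF),
    ("John", "Beachy", date(1995, 6, 1), date(2000, 9, 3), date(2001, 2, 2), INF),
    ("John", "Bigtown", date(2000, 9, 3), INF, date(2001, 2, 2), date(2001, 3, 31)),
    ("John", "Mediumtown", date(2001, 4, 1), INF, date(2001, 4, 1), INF),
]

def overlaps(a_from, a_to, b_from, b_to):
    """Closed-interval intersection test."""
    return a_from <= b_to and a_to >= b_from

def bitemporal(vt_from, vt_to, tt_from, tt_to, rows=ROWS):
    """Cities of rows whose valid and transaction periods both hit the query."""
    return [city for _, city, vf, vt, tf, tt in rows
            if overlaps(vf, vt, vt_from, vt_to) and overlaps(tf, tt, tt_from, tt_to)]

year_1999 = (date(1999, 1, 1), date(1999, 12, 31))
print(bitemporal(INF, INF, INF, INF))                           # ['Mediumtown']
print(bitemporal(*year_1999, INF, INF))                         # ['Beachy']
print(bitemporal(*year_1999, date(2000, 1, 1), date(2000, 1, 1)))  # ['Bigtown']
print([c for c in bitemporal(INF, INF, INF, INF) if c == "Smallville"])  # []
```

The last line mirrors the Smallville query: in this example nobody currently lives in Smallville, so it returns no rows.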
  • 47. Conclusions
    • Pluggable Lucene features in Cassandra
    • Basic geospatial features
    • Date/time durations
    • Bitemporal data model indexing
    • Compatible with MapReduce frameworks
    • Preserves Cassandra's functionality
  • 48. It's open source
    • github.com/stratio/cassandra-lucene-index
    • Published as a plugin for Apache Cassandra
    • Apache License Version 2.0