An Introduction to DSE Search
Caleb Rackliffe
Software Engineer
caleb.rackliffe@datastax.com
@calebrackliffe
What problem were we trying to solve?
Application
DataStax Driver
SELECT * FROM customers WHERE country LIKE '%land%';
What about secondary indexes?
Why not just create your own secondary index
implementation that supports wildcard queries?
I need full-text search!
DataStax: An Introduction to DataStax Enterprise Search
Why did we build something new?
Application
DataStax Driver Solr Client
Polyglot Persistence!
Application
DataStax Driver Solr Client
Consistency
Cost
Complexity
partitioning
multi-DC
replication
geospatial
wildcards
monitoring
C* field type support (UDT, Tuple, collections)
security
live indexing
sorting
faceting
fault-tolerant distributed search
caching
text analysis
grouping
automatic index updates
JVM
CQL
repair
Application
DataStax Driver Solr Client
Consistency
Complexity
Cost
How about some examples?
Creating a Solr Core
Start a node…
bash$ dse cassandra -s
Create a table…
cqlsh> CREATE KEYSPACE test
WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};
cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY,
fullname text,
address_ map<text, text>);
Create the core…
bash$ dsetool create_core test.user generateResources=true
bash$ dsetool get_core_schema test.user
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TextField" name="text">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="org.apache.solr.schema.StrField" name="string"/>
</types>
<fields>
<field indexed="true" name="username" stored="true" type="string"/>
<field indexed="true" name="fullname" stored="true" type="text"/>
<dynamicField indexed="true" name="address_*" stored="true" type="string"/>
</fields>
<uniqueKey>username</uniqueKey>
</schema>
The Schema
Insert Rows (…and Index Documents)
cqlsh:test> INSERT INTO user(username, fullname, address_)
VALUES('sbtourist', 'Sergio Bossa', {'address_home' : 'UK', 'address_work' : 'UK'});
cqlsh:test> INSERT INTO user(username, fullname, address_)
VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});
cqlsh:test> INSERT INTO user(username, fullname, address_)
VALUES('thegrinch', 'Sven Delmas', {'address_home' : 'US', 'address_work' : 'HQ'});
…and that’s it. No ETL. No writing to a second datastore.
Wildcards
cqlsh:test> SELECT username, address_
FROM user
WHERE solr_query='{"q":"address_home:U*"}';
username | address_
-----------+----------------------------------------------------
sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
(2 rows)
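When the query text comes from application code, it helps to build the `solr_query` JSON programmatically rather than by hand. The following is a minimal sketch in Python; `solr_query_clause` is a hypothetical helper (not part of any DataStax driver), and it only assumes that the predicate is a CQL string literal containing JSON, as in the slide above.

```python
import json


def solr_query_clause(q, **params):
    """Build a CQL predicate of the form solr_query='<json>'.

    Hypothetical helper for illustration only; it is not a driver API.
    """
    body = json.dumps({"q": q, **params})
    # Inside a CQL string literal, a single quote is escaped by doubling it.
    return "solr_query='" + body.replace("'", "''") + "'"


# Wildcard query on the dynamic field from the slide.
print(solr_query_clause("address_home:U*"))

# Extra JSON keys, such as "sort", are passed as keyword arguments.
print(solr_query_clause("*:*", sort="address_home desc"))
```

In real code, binding the JSON string as a statement parameter is usually preferable to string concatenation, where the driver and DSE version allow it.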
Sorting and Limits
cqlsh:test> SELECT username, address_
FROM user
WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';
username | address_
-----------+----------------------------------------------------
thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
bereng | {'address_home': 'ES', 'address_work': 'ES'}
(3 rows)
cqlsh:test> SELECT username, address_
FROM user
WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}'
LIMIT 1;
username | address_
-----------+----------------------------------------------------
thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
(1 rows)
Faceting
cqlsh:test> SELECT *
FROM user
WHERE solr_query='{"q":"*:*", "facet":{"field" : "address_work"}}';
facet_fields
--------------------------------------------
{"address_work" : {"ES" : 1 , "HQ" : 1 , "UK" : 1}}
(1 rows)
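To make the facet result concrete, here is a small Python sketch that computes the same per-value counts over the three rows inserted earlier. It is only an in-memory illustration of what a field facet reports; `facet_counts` is a hypothetical function and says nothing about how DSE Search computes facets internally.

```python
from collections import Counter

# The three rows from the INSERT statements in this deck, flattened.
rows = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]


def facet_counts(rows, field):
    """Count how many rows carry each distinct value of `field`."""
    counts = Counter(row[field] for row in rows if field in row)
    return dict(sorted(counts.items()))


print(facet_counts(rows, "address_work"))
```

Each distinct `address_work` value occurs once, which matches the `facet_fields` row shown above.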
Partition Restrictions
cqlsh:test> CREATE TABLE event(sensor_id bigint,
recording_time timestamp,
description text,
PRIMARY KEY(sensor_id, recording_time));
…
cqlsh:test> SELECT recording_time, description
FROM test.event
WHERE sensor_id = 2314234432
AND solr_query='description:unremarkable';
What do the internals look like?
Indexing
[Diagram: new documents are Buffered in the in-memory RAMBuffer, become Searchable as in-memory segments after a Soft Commit, and become Durable as on-disk segments after a Hard Commit.]
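The buffered, searchable, and durable stages can be sketched as a toy model in Python. Everything here is an assumption made for illustration: `ToyIndex` and its methods are hypothetical names, segments are plain lists, and a hard commit is modelled as flushing the buffer first. Real Lucene/Solr behaviour depends on configuration and is considerably more involved.

```python
class ToyIndex:
    """Toy model of the buffered -> searchable -> durable lifecycle."""

    def __init__(self):
        self.ram_buffer = []       # buffered: accepted, not yet searchable
        self.memory_segments = []  # searchable, but not yet durable
        self.disk_segments = []    # searchable and durable

    def add(self, doc):
        self.ram_buffer.append(doc)

    def soft_commit(self):
        # Turn the buffer into a new in-memory segment, making it searchable.
        if self.ram_buffer:
            self.memory_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def hard_commit(self):
        # Assumption of this toy: flush the buffer, then persist all segments.
        self.soft_commit()
        self.disk_segments.extend(self.memory_segments)
        self.memory_segments.clear()

    def search(self, term):
        # Only segments are searched; the RAM buffer is invisible to queries.
        segments = self.memory_segments + self.disk_segments
        return [doc for segment in segments for doc in segment if term in doc]


index = ToyIndex()
index.add("sergio bossa")
print(index.search("sergio"))   # nothing yet: the document is only buffered
index.soft_commit()
print(index.search("sergio"))   # now searchable
index.hard_commit()
print(len(index.disk_segments)) # now also durable
```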
Querying
Replica Selection
[Diagram: a ring of five numbered nodes (1-5) with a coordinator, holding shards A-E at RF=2 so that each shard appears on two nodes; nodes are marked Healthy or Unhealthy.]
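The slides do not spell out the selection rule, so the following Python sketch is only one plausible reading: for each shard, pick a healthy replica, preferring the coordinator itself and then nodes already chosen, to keep the fan-out small. The shard placement, the health set, and the `select_replicas` function are all hypothetical.

```python
def select_replicas(placement, healthy, coordinator):
    """Pick one healthy node per shard (greedy illustration, not DSE's algorithm)."""
    chosen = {}
    used = set()
    for shard in sorted(placement):
        candidates = [node for node in placement[shard] if node in healthy]
        if not candidates:
            raise RuntimeError(f"no healthy replica for shard {shard}")
        # Prefer the coordinator, then nodes already in use, then the lowest id.
        candidates.sort(key=lambda n: (n != coordinator, n not in used, n))
        chosen[shard] = candidates[0]
        used.add(candidates[0])
    return chosen


# Assumed placement: five shards, RF=2, each shard on two adjacent nodes.
placement = {"A": [1, 2], "B": [2, 3], "C": [3, 4], "D": [4, 5], "E": [5, 1]}

# Assume node 4 is unhealthy and node 1 coordinates the query.
plan = select_replicas(placement, healthy={1, 2, 3, 5}, coordinator=1)
print(plan)
```

With node 4 unhealthy, shards C and D are served by their other replicas (nodes 3 and 5), and the coordinator serves both shards it holds locally.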
What happens if a shard query fails?
Failover: Phase 1
4 nodes, RF = 2, shards: A-D, no vnodes
[Diagram: ring of nodes 1-4]
Failover: Phase 2
4 nodes, RF = 2, shards: A-D, no vnodes
[Diagram: ring of nodes 1-4]
Failover: Phase 3
4 nodes, RF = 2, shards: A-D, no vnodes
[Diagram: ring of nodes 1-4]
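The diagrams for the three phases did not survive extraction, so the exact protocol is not recoverable here. As a rough illustration of the general idea, this Python sketch retries a failed shard query on the shard's other replica. The placement, the `send` callback, and `query_with_failover` are hypothetical and are not taken from DSE.

```python
def query_with_failover(placement, plan, send):
    """Query each shard on its planned node, retrying on other replicas."""
    results = {}
    for shard, node in plan.items():
        # Try the planned node first, then the remaining replicas in order.
        candidates = [node] + [n for n in placement[shard] if n != node]
        for candidate in candidates:
            try:
                results[shard] = send(candidate, shard)
                break
            except ConnectionError:
                continue
        else:
            raise RuntimeError(f"all replicas failed for shard {shard}")
    return results


# Assumed layout: 4 nodes, RF = 2, shards A-D, no vnodes.
placement = {"A": [1, 2], "B": [2, 3], "C": [3, 4], "D": [4, 1]}
initial_plan = {"A": 1, "B": 2, "C": 3, "D": 4}


def send(node, shard):
    # Stand-in for a real shard request; node 2 is assumed to be failing.
    if node == 2:
        raise ConnectionError(f"node {node} did not respond")
    return f"{shard}@{node}"


print(query_with_failover(placement, initial_plan, send))
```

Here the query for shard B fails on node 2 and is answered by node 3, the other replica of that shard.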
Platform Integrations
Search + Analytics: Explicit Predicate Pushdown
bash$ dse spark
scala> val table = sc.cassandraTable("wiki","solr")
scala> val result = table.select("id","title")
.where("solr_query='body:dog'")
.collect
http://docs.datastax.com
Ad

More Related Content

What's hot (20)

Cassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra InternalsCassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra Internals
DataStax
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
Patrick McFadin
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
Patrick McFadin
 
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
DataStax
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
Patrick McFadin
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, stronger
Patrick McFadin
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
zznate
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
Patrick McFadin
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
DataStax
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
Russell Spitzer
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
Altinity Ltd
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
Patrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
Patrick McFadin
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
Sematext Group, Inc.
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
DataStax
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
Patrick McFadin
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Spark Summit
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
Jon Haddad
 
DataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New WorldDataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New World
Databricks
 
Cassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra InternalsCassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra Internals
DataStax
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
Patrick McFadin
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
Patrick McFadin
 
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
DataStax
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
Patrick McFadin
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, stronger
Patrick McFadin
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
zznate
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
Patrick McFadin
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
DataStax
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
Russell Spitzer
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
Altinity Ltd
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
Patrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
Patrick McFadin
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
Sematext Group, Inc.
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
DataStax
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
Patrick McFadin
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Spark Summit
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
Jon Haddad
 
DataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New WorldDataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New World
Databricks
 

Viewers also liked (20)

Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
Joshua McKenzie
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE Search
Caleb Rackliffe
 
Understanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpUnderstanding DSE Search by Matt Stump
Understanding DSE Search by Matt Stump
DataStax
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
DataStax Academy
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax
 
Copa menstrual y esponjas vaginales
Copa menstrual y esponjas vaginalesCopa menstrual y esponjas vaginales
Copa menstrual y esponjas vaginales
Tupper Sex Andalucia
 
Servidor web lamp
Servidor web lampServidor web lamp
Servidor web lamp
yaser6700
 
Magonia getxo blog
Magonia  getxo blogMagonia  getxo blog
Magonia getxo blog
Mikel Agirregabiria
 
Scala for rubyists
Scala for rubyistsScala for rubyists
Scala for rubyists
Michel Perez
 
Accesus - Catalogo andamio para vias ferroviarias
Accesus - Catalogo andamio para vias ferroviariasAccesus - Catalogo andamio para vias ferroviarias
Accesus - Catalogo andamio para vias ferroviarias
Accesus Plataformas Suspendidas
 
Tams 2012
Tams 2012Tams 2012
Tams 2012
Lance McConkey
 
Adquirir una propiedad en españa en 7 pasos
Adquirir una propiedad en españa en 7 pasosAdquirir una propiedad en españa en 7 pasos
Adquirir una propiedad en españa en 7 pasos
Mariscal Abogados | International Law Firm in Spain
 
2013 brand id&print
2013 brand id&print2013 brand id&print
2013 brand id&print
Carl H. Bradford III
 
Pairform cci formpro
Pairform   cci formproPairform   cci formpro
Pairform cci formpro
Christian Colin
 
los bracekts
los bracekts los bracekts
los bracekts
pv996073774
 
9Guia1
9Guia19Guia1
9Guia1
Wilson
 
Una modesta proposición
Una modesta proposiciónUna modesta proposición
Una modesta proposición
Shanie Weissman
 
Dossier ii torneo once caballeros c.f.
Dossier ii torneo once caballeros c.f.Dossier ii torneo once caballeros c.f.
Dossier ii torneo once caballeros c.f.
Jacobo Vázquez Mariño
 
Presentacion corporativa sevenminds agosto2012 (1)
Presentacion corporativa sevenminds agosto2012 (1)Presentacion corporativa sevenminds agosto2012 (1)
Presentacion corporativa sevenminds agosto2012 (1)
Rafael Lopez Rodriguez
 
Project Management Diploma with Instructors
Project Management Diploma with InstructorsProject Management Diploma with Instructors
Project Management Diploma with Instructors
Cisco
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
Joshua McKenzie
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE Search
Caleb Rackliffe
 
Understanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpUnderstanding DSE Search by Matt Stump
Understanding DSE Search by Matt Stump
DataStax
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
DataStax Academy
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax
 
Copa menstrual y esponjas vaginales
Copa menstrual y esponjas vaginalesCopa menstrual y esponjas vaginales
Copa menstrual y esponjas vaginales
Tupper Sex Andalucia
 
Servidor web lamp
Servidor web lampServidor web lamp
Servidor web lamp
yaser6700
 
Scala for rubyists
Scala for rubyistsScala for rubyists
Scala for rubyists
Michel Perez
 
9Guia1
9Guia19Guia1
9Guia1
Wilson
 
Una modesta proposición
Una modesta proposiciónUna modesta proposición
Una modesta proposición
Shanie Weissman
 
Presentacion corporativa sevenminds agosto2012 (1)
Presentacion corporativa sevenminds agosto2012 (1)Presentacion corporativa sevenminds agosto2012 (1)
Presentacion corporativa sevenminds agosto2012 (1)
Rafael Lopez Rodriguez
 
Project Management Diploma with Instructors
Project Management Diploma with InstructorsProject Management Diploma with Instructors
Project Management Diploma with Instructors
Cisco
 
Ad

Similar to DataStax: An Introduction to DataStax Enterprise Search (20)

Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.
Keshav Murthy
 
Proxysql sharding
Proxysql shardingProxysql sharding
Proxysql sharding
Marco Tusa
 
Advanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & moreAdvanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & more
Lukas Fittl
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
Karam Abuataya
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
fcamachob
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Ontico
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
DataStax
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
Jim Hatcher
 
Wait Events 10g
Wait Events 10gWait Events 10g
Wait Events 10g
sagai
 
Meetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbcMeetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbc
zznate
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After Configuration
Cumulus Networks
 
SQLMAP Tool Usage - A Heads Up
SQLMAP Tool Usage - A  Heads UpSQLMAP Tool Usage - A  Heads Up
SQLMAP Tool Usage - A Heads Up
Mindfire Solutions
 
07 application security fundamentals - part 2 - security mechanisms - data ...
07   application security fundamentals - part 2 - security mechanisms - data ...07   application security fundamentals - part 2 - security mechanisms - data ...
07 application security fundamentals - part 2 - security mechanisms - data ...
appsec
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cql
zznate
 
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IASEnable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Invenire Aude
 
Presentation
PresentationPresentation
Presentation
Dimitris Stripelis
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
DataStax Academy
 
Dbms lab Manual
Dbms lab ManualDbms lab Manual
Dbms lab Manual
Vivek Kumar Sinha
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI Indexes
Jim Hatcher
 
Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.
Keshav Murthy
 
Proxysql sharding
Proxysql shardingProxysql sharding
Proxysql sharding
Marco Tusa
 
Advanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & moreAdvanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & more
Lukas Fittl
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
Karam Abuataya
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
fcamachob
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Ontico
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
DataStax
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
Jim Hatcher
 
Wait Events 10g
Wait Events 10gWait Events 10g
Wait Events 10g
sagai
 
Meetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbcMeetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbc
zznate
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After Configuration
Cumulus Networks
 
SQLMAP Tool Usage - A Heads Up
SQLMAP Tool Usage - A  Heads UpSQLMAP Tool Usage - A  Heads Up
SQLMAP Tool Usage - A Heads Up
Mindfire Solutions
 
07 application security fundamentals - part 2 - security mechanisms - data ...
07   application security fundamentals - part 2 - security mechanisms - data ...07   application security fundamentals - part 2 - security mechanisms - data ...
07 application security fundamentals - part 2 - security mechanisms - data ...
appsec
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cql
zznate
 
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IASEnable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Enable Database Service over HTTP or IBM WebSphere MQ in 15_minutes with IAS
Invenire Aude
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
DataStax Academy
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI Indexes
Jim Hatcher
 
Ad

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
DataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
Editor's Notes

  • #2: “Hello! My name is Caleb Rackliffe, and I’m a member of the search team at DataStax. Today I’d like to walk you through a brief (but action-packed) introduction to DataStax Enterprise Search. I’ll start with a question…”
  • #3: “Before we talk about what DSE Search is, let’s make sure we know why we built it.”
  • #4: “Here we have a small Cassandra cluster and an application sitting on top of it, using the DataStax driver. We can go a long way with CQL and proper denormalization, but what happens when we find ourselves wanting to do something as seemingly simple as…”
  • #5: “…this. You’ll recognize the SQL-style wildcard query, which Cassandra does not support out of the box.”
  • #6: Cassandra’s built-in secondary indexes might seem like a solution, but they don’t support wildcard queries, can perform poorly unless limited to a single partition, can perform poorly for very high or very low cardinality fields, and may fail for a frequently updated or deleted column.
  • #7: “You could, but then you’d be saddled with the cost of building and maintaining that, and you’ll still end up with something that is designed for a fairly specific use-case.”
  • #8: “So when our search problem lacks the structure to make denormalization effective, and is beyond the capabilities of C* secondary indexes, we need to think a bit more broadly.”
  • #9: “Fortunately, there are technologies out there that handle full-text and other more advanced kinds of search well, and most of them, like Solr, are built on the foundation of the Apache Lucene project.”
  • #10: “Well, let’s see what it would look like to use a separate, Lucene-based search cluster alongside our Cassandra cluster…”
  • #11: “…here we are. Our application is now sitting on top of both a Cassandra cluster and separate search cluster. Notice that we’ve added a new client to our application, specifically for search. So we’ve got Cassandra doing key-value lookups and probably some range queries…we’ve got our search cluster handling the more advanced ad-hoc queries for us.”
  • #12: “This is polyglot persistence at its best…right?”
  • #13: “Well, maybe not…and we can talk about this along 3 axes.” Complexity - The persistence layer of our application is now more complex. We have to configure two clients, write to two data stores, and, if we write to one of them asynchronously, manage a queueing solution. Consistency - Since the two data stores have no explicit knowledge of each other, we have to manage questions of consistency between them in our application. Cost - Aside from the implicit cost of complexity, we’ll also need to deal with the explicit cost of infrastructure and hardware for a separate cluster.
  • #14: “So if you need to avoid data loss, scale your writes, and replicate your index over multiple DCs, your architecture might start to look like this lovely Rube Goldberg machine. We wanted to provide all of this in an operationally simple package…”
  • #15: “DSE Search is designed to address those problems. We’ve built a coherent search platform that integrates Cassandra’s distributed persistence, Lucene’s core search and indexing functionality, and the advanced features of Solr in the same JVM…and then we’ve made a number of our own enhancements, which we’ll see in the coming slides.”
  • #16: “So back to our architecture diagram. First, with DSE Search, we can eliminate the cost associated with running a separate search cluster. We can eliminate much of the complexity at the application layer, since we don’t have to deal with two clients, and we only have to manage one write path…and with all of our data stored in Cassandra alone and collocated with the relevant shards of our search index, we’ve eliminated many of the potential issues of consistency between the two.”
  • #17: “We’ll go into more details on the indexing and query paths, but before we do that, let’s run through some basic examples and get a feel for the ergonomics of our solution.”
  • #18: “First, we’ll start up a single node. (The -s switch here tells the node it’s going to handle a search workload.) Second, we create a table from the CQL prompt. Third, we create a Solr core over that table from dsetool…and that’s it. We’re ready to index documents. Note that we don’t have to create the Solr schema explicitly, because DSE Search creates it for us, using the CQL schema to determine its type mappings.”
  • #19: “Under the hood, the schema actually looks something like this, but you shouldn’t need to trouble yourself with it, unless our default type mappings aren’t quite right for you. In that case, you can just tweak the auto-generated schema and re-upload it.”
  • #20: “Next we insert a few rows, which will be indexed automatically for search. There is no ETL involved and no explicit writing to a second data store. We’re ready to make some queries…”
  • #21: “…so let’s start with a simple wildcard query. Here, we want to find everyone whose home address starts with a U, and of course we find users in the United States and the UK.”
  • #22: “Sorting and Limits! In the first query, we just find all our users and sort them descending by home address. In the second query, we do the same thing except we also use the CQL LIMIT keyword to narrow our results down to just the top result by home address.”
  • #23: “Faceting allows us to take the results of a query, in this case a query for all documents, group them, and count the members in each group. In this example, faceting on our users’ work addresses tells us that we have one working in Spain, one at corporate headquarters, and one in the UK. This is very common in the context of a product search, where a user wants to drill into results by brand.”
  • #24: “What if we want to restrict our search to a specific partition? Here I have another table, one that records series of sensor events. Using a CQL partition key restriction in our WHERE clause, we can ensure that our query visits only the node that contains that partition and then filters on it once we get there. Much like our earlier usage of LIMIT, this is a case where we’re translating CQL instructions to search-specific instructions under the hood.”
  • #25: “Now that we have an idea of what basic usage looks like, let’s take a high-level look at what’s going on in the indexing and query internals…”
  • #26: “The indexing process starts with a Cassandra write. It arrives at the coordinator, is distributed to the proper replicas, and is written to the commit log and Memtable, as you would expect. At this point, we create an updated Lucene document and queue it up for indexing, then we return to the coordinator and the client. Then, asynchronously, we update the index. Finally, also in the background, when a C* Memtable is flushed to disk, we also flush the corresponding index updates to disk, ensuring their durability.”
  • #27: “In near-real-time search systems, updated documents, once indexed, progress through 3 stages: a buffered stage, where they are just accumulated in memory; a searchable stage, where they move to disk and become visible to ongoing queries; and a durable stage, where they are permanently added to the index and will survive restart.” “Because moving from the “buffered” layer to the “searchable” layer is expensive, we are forced to make a tradeoff between the visibility of our data and indexing throughput. That is, we can make our writes visible to ongoing searches more quickly at the cost of slower indexing throughput, or we can maximize indexing throughput with longer delays before writes are visible to searches.”
  • #28: In DSE 4.7, we released a feature called “Live Indexing”. Essentially, we’ve made indexed documents buffered in memory searchable, eliminating the need to build a separate “searchable” representation of the index and the need to make a hard decision between update availability and throughput. This might remind you of the Cassandra write path, where we have “searchable” Memtables buffered in memory that are periodically flushed to “durable” SSTables.
  • #29: “This is what it would look like if we mapped these stages to their equivalents in Solr. Notice that the soft commit process creates searchable segments, which must later be merged by Lucene in the background. Since live indexing bypasses this second level, we can accumulate larger segments before flushing to disk, and this reduces the cost of the segment merges that occur in the background.”
  • #30: “On the query side, we’ve implemented our own distributed search, informed by the topology of the cluster that Cassandra makes available to us. Here we have a 4-node cluster with a replication factor of 2. Our first step is to determine the set of nodes that optimally covers the ring, in this case, the tokens from 0 -> 1000. We then scatter the query to those nodes, find the IDs for matching documents, and read the documents themselves, which are stored only in Cassandra. Notice here that, to minimize fan-out, we contact only node 3, rather than both nodes 4 and 2, to cover ranges 0 -> 250 and 250 -> 500.”
  • #31: “When we need to choose between replicas of a particular token, we do our best to minimize fan-out, to cover the entire dataset optimally. When multiple nodes could be optimal selections, we look more closely at the health and activity of those nodes. In this example we have a 5-node cluster with a replication factor of 2 and index shards A-E. We’ll denote health here by color, with green being healthy, red being unhealthy, and yellow in the middle. If we need to cover shard B, we can query either node 2 or node 3, but we’ll pick node 2, because it’s healthier.”
  • #32: “However, node health is not the only criterion we use for selection. If node 2 is healthy, but is also in the middle of an expensive operation, let’s say, rebuilding its search index, we’ll want to choose node 3, since node 2 is potentially both out of date and unable to devote as many resources to handling incoming queries.”
  • #34: “Here we have a healthy 4-node cluster with a replication factor of 2 and 4 index shards. If node 1 coordinates our request, it only needs to contact itself and node 3 to cover all 4 of the shards A-D…”
  • #35: “…but then node 3 fails. It could have been a disk failure or a network issue…”
  • #36: “…but it was probably because you let this guy near it.”
  • #37: “In any case, we still need to cover shards B and C, but node 3 was the only node that contained both of them, so we’ll need to contact nodes 2 and 4.”
  • #38: “To this point, I’ve talked about search in a fairly isolated way, but in the context of a larger platform, there are opportunities to step outside that.”
  • #39: “One example is the integration we released in DSE 4.7 with Spark - a component of DSE Analytics. There are cases where pushing a search query through a Spark job can meaningfully cut down on the size of the RDD Spark presents for analysis. In this example, we’re filtering every Wikipedia article that contains the word ‘dog’ using search, avoiding some unnecessary filtering after we build the RDD.”
  • #40: “Well that wraps it up for me. If you’d like to dig deeper into any of the topics I covered here, or you’d like to try DSE out for yourself, please visit docs.datastax.com. Thank you all so much for coming, and enjoy the rest of your Summit!”
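The buffered, searchable, and durable stages from notes #27 through #29 can be illustrated with a toy in-memory model. This is only a sketch for building intuition: the `ToyIndex` class, its method names, and the whitespace tokenization are all hypothetical and are not part of DSE, Solr, or Lucene. With `live_indexing=False`, a document becomes visible only after an explicit `soft_commit()`; with `live_indexing=True`, buffered documents are visible to searches immediately.

```python
from typing import Dict, List


class ToyIndex:
    """Hypothetical toy model of the buffered / searchable / durable stages."""

    def __init__(self, live_indexing: bool) -> None:
        self.live_indexing = live_indexing
        self.buffered: Dict[str, str] = {}    # accumulated in memory
        self.searchable: Dict[str, str] = {}  # visible to queries, not yet durable
        self.durable: Dict[str, str] = {}     # flushed to the permanent index

    def index(self, doc_id: str, text: str) -> None:
        """Accept an updated document into the in-memory buffer."""
        self.buffered[doc_id] = text

    def soft_commit(self) -> None:
        """Move buffered documents to the searchable stage."""
        self.searchable.update(self.buffered)
        self.buffered.clear()

    def flush(self) -> None:
        """Make everything accumulated so far durable (newest version wins)."""
        self.durable.update(self.searchable)
        self.durable.update(self.buffered)
        self.searchable.clear()
        self.buffered.clear()

    def search(self, term: str) -> List[str]:
        """Return sorted IDs of visible documents containing the term."""
        visible = dict(self.durable)
        visible.update(self.searchable)
        if self.live_indexing:
            # Live indexing: buffered documents are searchable right away.
            visible.update(self.buffered)
        return sorted(
            doc_id for doc_id, text in visible.items()
            if term in text.lower().split()
        )


if __name__ == "__main__":
    for mode in (False, True):
        idx = ToyIndex(live_indexing=mode)
        idx.index("1", "Big dog")
        print(mode, idx.search("dog"))
```

In this model, the only difference between the two modes is whether `search()` consults the buffer, which is the tradeoff the notes describe: without live indexing, visibility has to be bought with extra `soft_commit()` calls.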
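The shard-coverage behavior described in notes #30 through #37 can be sketched with a simple greedy heuristic. This is not DSE's actual selection algorithm, which the notes do not spell out. The `choose_nodes` function, the numeric health scores, and the shard layout (node 1 holding A and D, node 2 holding A and B, node 3 holding B and C, node 4 holding C and D) are assumptions chosen to match the 4-node example. A greedy cover is also not guaranteed to be minimal in general.

```python
from typing import Dict, List, Set


def choose_nodes(
    shards_by_node: Dict[int, Set[str]],
    health: Dict[int, float],
    coordinator: int,
) -> List[int]:
    """Greedily pick nodes until every shard is covered (hypothetical sketch)."""
    # Assumption: a health score of 0.0 (or a missing score) means "down".
    live = {n: s for n, s in shards_by_node.items() if health.get(n, 0.0) > 0.0}
    needed: Set[str] = set().union(*shards_by_node.values())
    chosen: List[int] = []

    # The coordinator can serve its own shards without a network hop.
    if coordinator in live:
        chosen.append(coordinator)
        needed -= live[coordinator]

    while needed:
        candidates = [n for n in live if n not in chosen and live[n] & needed]
        if not candidates:
            raise RuntimeError(f"cannot cover shards: {sorted(needed)}")
        # Prefer the node covering the most uncovered shards (less fan-out),
        # then the healthiest node, then the lowest node ID.
        best = max(candidates, key=lambda n: (len(live[n] & needed), health[n], -n))
        chosen.append(best)
        needed -= live[best]
    return chosen


if __name__ == "__main__":
    layout = {1: {"A", "D"}, 2: {"A", "B"}, 3: {"B", "C"}, 4: {"C", "D"}}
    healthy = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
    print(choose_nodes(layout, healthy, coordinator=1))

    node3_down = dict(healthy)
    node3_down[3] = 0.0
    print(choose_nodes(layout, node3_down, coordinator=1))
```

A single health number is a simplification. Note #32 points out that activity such as an index rebuild also affects selection, and in this sketch that could be folded into the score.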