DataStax: An Introduction to DataStax Enterprise Search

Oct 2, 2015

1) Why We Built DSE Search
2) Basics of the Read and Write Paths
3) Fault-tolerance and Adaptive Routing
4) Analytics with Search and Spark
5) Live Indexing

An Introduction to DSE Search
Caleb Rackliffe
Software Engineer
caleb.rackliffe@datastax.com
@calebrackliffe

SELECT * FROM customers WHERE country LIKE '%land%';

Why not just create your own secondary index
implementation that supports wildcard queries?

[Figure: an application using both the DataStax Driver and a separate Solr Client. A second build labels the problems with this setup: Consistency, Cost, Complexity.]

partitioning
multi-DC
replication
geospatial
wildcards
monitoring
C* field type support (UDT, Tuple, collections)
security
live indexing
sorting
faceting
fault-tolerant distributed search
caching
text analysis
grouping
automatic index updates
JVM
CQL
repair

[Figure: the same application diagram (DataStax Driver, Solr Client), revisiting Consistency, Complexity, and Cost.]

Creating a Solr Core
Start a node…
bash$ dse cassandra -s
Create a table…
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};
cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY,
                                   fullname text,
                                   address_ map<text, text>);
Create the core…
bash$ dsetool create_core test.user generateResources=true

The Schema
bash$ dsetool get_core_schema test.user
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="string"/>
  </types>
  <fields>
    <field indexed="true" name="username" stored="true" type="string"/>
    <field indexed="true" name="fullname" stored="true" type="text"/>
    <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
  </fields>
  <uniqueKey>username</uniqueKey>
</schema>
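To make the analyzer chain in the generated schema more concrete, here is a rough Python approximation of what happens to a value at index time for the `text` type versus the `string` type. This is only a sketch: the regex is a stand-in for Solr's StandardTokenizerFactory (which follows Unicode word-break rules), and the function names are hypothetical, not part of DSE or Solr.

```python
import re

def analyze_text(value):
    """Approximate the `text` field type: tokenize, then lowercase.

    Assumption: a simple word regex stands in for StandardTokenizerFactory.
    """
    tokens = re.findall(r"\w+", value)      # stand-in for the tokenizer step
    return [t.lower() for t in tokens]      # LowerCaseFilterFactory step

def analyze_string(value):
    """Approximate the `string` field type (StrField): one verbatim token."""
    return [value]

fullname_tokens = analyze_text("Sergio Bossa")   # `fullname` is type="text"
username_tokens = analyze_string("sbtourist")    # `username` is type="string"
print(fullname_tokens)
print(username_tokens)
```

Under this model, a query on `fullname` can match either word of the name regardless of case, while `username` and the `address_*` values only match as whole, case-sensitive strings.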

Insert Rows (…and Index Documents)
cqlsh:test> INSERT INTO user(username, fullname, address_)
            VALUES('sbtourist', 'Sergio Bossa', {'address_home': 'UK', 'address_work': 'UK'});
cqlsh:test> INSERT INTO user(username, fullname, address_)
            VALUES('bereng', 'Berenguer Blasi', {'address_home': 'ES', 'address_work': 'ES'});
cqlsh:test> INSERT INTO user(username, fullname, address_)
            VALUES('thegrinch', 'Sven Delmas', {'address_home': 'US', 'address_work': 'HQ'});
…and that's it. No ETL. No writing to a second datastore.

Wildcards
cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"address_home:U*"}';

 username  | address_
-----------+----------------------------------------------
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(2 rows)
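As a plain-Python illustration of what the `address_home:U*` wildcard selects, the sketch below applies the same pattern to the three sample rows in memory. It does not talk to DSE; `fnmatchcase` is used as an approximation of Solr's `*` wildcard, and the `rows` list and `wildcard_match` helper are hypothetical names introduced for this example.

```python
from fnmatch import fnmatchcase

# The three sample rows from the slides, held in memory for illustration.
rows = [
    {"username": "sbtourist", "address_": {"address_home": "UK", "address_work": "UK"}},
    {"username": "bereng",    "address_": {"address_home": "ES", "address_work": "ES"}},
    {"username": "thegrinch", "address_": {"address_home": "US", "address_work": "HQ"}},
]

def wildcard_match(rows, field, pattern):
    """Return usernames whose dynamic field value matches the wildcard pattern."""
    return [
        row["username"]
        for row in rows
        if field in row["address_"] and fnmatchcase(row["address_"][field], pattern)
    ]

matches = wildcard_match(rows, "address_home", "U*")
print(matches)  # ['sbtourist', 'thegrinch']
```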

Sorting and Limits
cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

 username  | address_
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 bereng    | {'address_home': 'ES', 'address_work': 'ES'}

(3 rows)

cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

 username  | address_
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(1 rows)
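Because the `solr_query` value is a JSON document nested inside a CQL string literal, quoting mistakes are easy to make by hand. The following is a minimal sketch of building such a statement as a string in Python. The helper name `solr_query_cql` is hypothetical, and the sketch only constructs text; how the statement is sent to the cluster is left out.

```python
import json

def solr_query_cql(table, columns, query, limit=None):
    """Build a CQL SELECT whose solr_query literal is generated from a dict."""
    payload = json.dumps(query)               # always emits straight double quotes
    literal = payload.replace("'", "''")      # CQL escapes ' inside a literal by doubling it
    cql = "SELECT {} FROM {} WHERE solr_query='{}'".format(
        ", ".join(columns), table, literal
    )
    if limit is not None:
        cql += " LIMIT {}".format(int(limit))
    return cql + ";"

stmt = solr_query_cql(
    "user",
    ["username", "address_"],
    {"q": "*:*", "sort": "address_home desc"},
    limit=1,
)
print(stmt)
```

Generating the JSON with `json.dumps` avoids the curly-quote and unbalanced-quote problems that creep in when queries are copied from slides or documents.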

Faceting
cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field":"address_work"}}';

 facet_fields
-----------------------------------------------------
 {"address_work" : {"ES" : 1 , "HQ" : 1 , "UK" : 1}}

(1 rows)
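A field facet is essentially a count of documents per distinct value of a field. The sketch below reproduces the `address_work` counts from the sample rows in plain Python; it is an in-memory illustration only, and `facet_counts` is a hypothetical helper, not a DSE or Solr function.

```python
from collections import Counter

# The three sample rows from the slides, held in memory for illustration.
rows = [
    {"username": "sbtourist", "address_": {"address_home": "UK", "address_work": "UK"}},
    {"username": "bereng",    "address_": {"address_home": "ES", "address_work": "ES"}},
    {"username": "thegrinch", "address_": {"address_home": "US", "address_work": "HQ"}},
]

def facet_counts(rows, field):
    """Count how many rows carry each distinct value of the given field."""
    counts = Counter(
        row["address_"][field] for row in rows if field in row["address_"]
    )
    return dict(sorted(counts.items()))   # sorted by value for stable output

facets = {"address_work": facet_counts(rows, "address_work")}
print(facets)  # {'address_work': {'ES': 1, 'HQ': 1, 'UK': 1}}
```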

Partition Restrictions
cqlsh:test> CREATE TABLE event(sensor_id bigint,
recording_time timestamp,
description text,
PRIMARY KEY(sensor_id, recording_time));
…
cqlsh:test> SELECT recording_time, description
FROM test.event
WHERE sensor_id = 2314234432
AND solr_query='description:unremarkable';

[Figure, three builds: the index write path. Documents are first Buffered in the RAMBuffer (memory), become Searchable as a Segment after a Soft Commit, and become Durable as a Segment on disk after a Hard Commit.]
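The Buffered, Searchable, and Durable stages named on these slides can be sketched as a toy state model. This is a deliberately simplified illustration, not how Lucene or DSE is implemented: the `ToyIndex` class and all its methods are hypothetical, a hard commit is assumed to flush the buffer as well, and `crash` models only the index's own state, not how DSE recovers documents that were never hard-committed.

```python
class ToyIndex:
    """Toy model of the buffered -> searchable -> durable stages."""

    def __init__(self):
        self.ram_buffer = []        # Buffered: accepted, not yet searchable
        self.memory_segments = []   # Searchable: visible to queries, not yet on disk
        self.disk_segments = []     # Durable: visible to queries and on disk

    def add(self, doc):
        self.ram_buffer.append(doc)

    def soft_commit(self):
        """Turn buffered documents into a searchable in-memory segment."""
        if self.ram_buffer:
            self.memory_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def hard_commit(self):
        """Make everything durable (assumption: also flushes the buffer)."""
        self.soft_commit()
        self.disk_segments.extend(self.memory_segments)
        self.memory_segments.clear()

    def search(self, term):
        """Search every segment, in memory or on disk, but never the buffer."""
        segments = self.memory_segments + self.disk_segments
        return [doc for seg in segments for doc in seg if term in doc]

    def crash(self):
        """Lose everything held only in memory."""
        self.ram_buffer.clear()
        self.memory_segments.clear()

index = ToyIndex()
index.add("unremarkable reading")
before_commit = index.search("unremarkable")   # still only buffered
index.soft_commit()
after_soft = index.search("unremarkable")      # now searchable
index.hard_commit()
index.crash()
after_crash = index.search("unremarkable")     # durable, so still found
print(before_commit, after_soft, after_crash)
```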

Replica Selection
[Figure: a coordinator choosing replicas for a distributed query. RF=2, shards A-E spread over nodes 1-5, with nodes marked Healthy or Unhealthy.]
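The general idea of replica selection, a coordinator picking one healthy replica for every shard so the whole index is covered, can be sketched as follows. The slides do not show DSE's actual routing algorithm, so this is a generic illustration: the shard-to-node `placement`, the set of healthy nodes, and the `select_replicas` helper are all hypothetical.

```python
def select_replicas(placement, healthy):
    """Pick one healthy node per shard; fail if a shard has no healthy replica."""
    plan = {}
    for shard in sorted(placement):
        candidates = [node for node in placement[shard] if node in healthy]
        if not candidates:
            raise RuntimeError("no healthy replica for shard " + shard)
        plan[shard] = candidates[0]   # simplest policy: first healthy replica
    return plan

# Hypothetical layout: 5 nodes, shards A-E, RF=2 (each shard on two nodes).
placement = {"A": [1, 2], "B": [2, 3], "C": [3, 4], "D": [4, 5], "E": [5, 1]}
healthy = {1, 2, 3, 5}                # node 4 is marked unhealthy

plan = select_replicas(placement, healthy)
print(plan)  # {'A': 1, 'B': 2, 'C': 3, 'D': 5, 'E': 5}
```

With RF=2, losing a single node still leaves one replica of every shard, which is why the query above can be fully served without node 4.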

Failover: Phases 1-3
4 nodes, RF = 2, shards: A-D, no vnodes
[Figures: three diagrams of nodes 1-4, one per failover phase.]

Search + Analytics: Explicit Predicate Pushdown
bash$ dse spark
scala> val table = sc.cassandraTable("wiki","solr")
scala> val result = table.select("id","title")
.where("solr_query='body:dog'")
.collect

http://docs.datastax.com

Wait! Back away from the Cassandra 2ndary index. It’s ok for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadins videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With it’s easy Solr API and Spark integration so you can search and analyze data stored in your Cassandra database until your heart’s content. Take our hand. WE will show you how.

Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy

This document provides an overview of using Datastax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.

Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy

Cassandra EU - Data model on firePatrick McFadin

Functional data models are great, but how can you squeeze out more performance and make them awesome! Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many time's just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?

Cassandra and Spark datastaxjp

This document discusses Apache Spark and Cassandra. It provides an overview of Cassandra as a shared-nothing, masterless, peer-to-peer database with great scaling. It then discusses how Spark can be used to analyze large amounts of data stored in Cassandra in parallel across a cluster. The Spark Cassandra connector allows Spark to create partitions that align with the token ranges in Cassandra, enabling efficient distributed queries across the cluster.

Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax

With the addition of vnodes (Virtual Nodes), Cassandra users were able to gain a few benefits as a result of streaming when it came to bootstrapping and decommissioning nodes. On the flip side, having to route requests on larger clusters became a lot more intensive of a workload for all nodes that were then forced to act coordinator nodes. By setting up a tier of proxy nodes, we were able to have our cluster of 50 nodes perform with a 300% improvement on average in a mixed workload environment. This is an explanation of what we did, how we did it, and why it works. About the Speaker Eric Lubow CTO, SimpleReach Eric Lubow is CTO of SimpleReach, where he builds highly-scalable distributed systems for processing analytics data. Eric is also a DataStax MVP for Cassandra, and co-author of Practical Cassandra. In his spare time, Eric is a skydiver, motorcycle rider, mixed martial artist, and dog dad.

Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...DataStax

We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector. In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent. About the Speakers Matthias Niehoff IT-Consultant, codecentric AG works as an IT-Consultant at codecentric AG in Germany. His focus is on big data & streaming applications with Apache Cassandra & Apache Spark. Yet he does not lose track of other tools in the area of big data. Matthias shares his experiences on conferences, meetups and usergroups. Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG Dr. Stephan Kepser is an expert on cloud computing and big data. He wrote a couple of journal articles and blog posts on subjects of both fields. His interests reach from legal questions to questions of architecture and design of cloud computing and big data systems to technical details of NoSQL databases.

How We Used Cassandra/Solr to Build Real-Time Analytics PlatformDataStax Academy

This session will discuss how Cassandra/Solr can be used to create real-time analytics platform – jKool. jKool provides an in-memory analysis of time-series data, automatically performing sequencing, correlation, grouping, enriching, synchronizing, computing, querying and displaying data streams. The session will discuss architecture, challenges and approaches taken to create a real-time analytics platform on top of open source big data analytics platforms: Cassandra, Solr, Kafka & Spark.

Cassandra Community Webinar: Apache Cassandra InternalsDataStax

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years. In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation. Speaker: Aaron Morton, Apache Cassandra Committer Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

Cassandra 2.0 and timeseriesPatrick McFadin

Successful Architectures for Fast DataPatrick McFadin

Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world. Fast in. Fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems. The three main areas are covered: Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind. Process - Analyze volumes of data you receive in near real-time or in a batch. Be ready for fast serving in your application. Store - Reliably store data in the data models to support your application. Never accept downtime or slow response times.

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax

Making sure your Data Model will work on the production cluster after 6 months as well as it does on your laptop is an important skill. It's one that we use every day with our clients at The Last Pickle, and one that relies on tools like the cassandra-stress. Knowing how the data model will perform under stress once it has been loaded with data can prevent expensive re-writes late in the project. In this talk Christopher Batey, Consultant at The Last Pickle, will shed some light on how to use the cassandra-stress tool to test your own schema, graph the results and even how to extend the tool for your own use cases. While this may be called premature optimisation for a RDBS, a successful Cassandra project depends on it's data model. About the Speaker Christopher Batey Consultant / Software Engineer, The Last Pickle Christopher (@chbatey) is a part time consultant at The Last Pickle where he works with clients to help them succeed with Apache Cassandra as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can checkout his blog at: http://www.batey.info

Real data models of silicon valleyPatrick McFadin

Cassandra 2.0 better, faster, strongerPatrick McFadin

Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big impact features like Light Weight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise. Let's put the spotlight on all the things! Changes in memory management, file handling and internals. Low hype but they pack a big punch. While we were at it, we also did a bit of house cleaning.

Advanced Apache Cassandra Operations with JMXzznate

Nodetool is a command line interface for managing a Cassandra node. It provides commands for node administration, cluster inspection, table operations and more. The nodetool info command displays node-specific information such as status, load, memory usage and cache details. The nodetool compactionstats command shows compaction status including active tasks and progress. The nodetool tablestats command displays statistics for a specific table including read/write counts, space usage, cache usage and latency.

Time series with apache cassandra strataPatrick McFadin

Cassandra Community Webinar | In Case of Emergency Break GlassDataStax

The design of Apache Cassandra allows applications to provide constant uptime. Peer-to-Peer technology ensures there are no single points of failure, and the Consistency guarantees allow applications to function correctly while some nodes are down. There is also a wealth of information provided by the JMX API and the system log. All of this means that when things go wrong you have the time, information and platform to resolve them without downtime. This presentation will cover some of the common, and not so common, performance issues, failures and management tasks observed in running clusters. Aaron will discuss how to gather information and how to act on it. Operators, Developers and Managers will all benefit from this exposition of Cassandra in the wild.

Cassandra Fundamentals - C* 2.0Russell Spitzer

- Apache Cassandra is a linearly scalable and fault tolerant NoSQL database that increases throughput linearly with additional machines - It is an AP system that is eventually consistent according to the CAP theorem, sacrificing consistency in favor of availability and partition tolerance - Cassandra uses replication and consistency levels to control fault tolerance at the server and client levels respectively - Its data model and use of SSTables allows for fast writes and queries along clustering columns

Advanced data modeling with apache cassandraPatrick McFadin

New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd

Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin

Cassandra 3.0 advanced previewPatrick McFadin

This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, ground breaking features that you’ll want to use. Indexing changes that will make your applications faster and spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made it’s arrival. There is more, but you’ll just have to some see for yourself. Get your front row seat and don’t miss it!

Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.

This talk was given during Lucene Revolution 2017. They say optimize is bad for you, they say you shouldn't do it, they say it will invalidate operating system caches and make your system suffer. This is all true, but is it true in all cases? In this presentation we will look closer on what optimize or better called force merge does to your Solr search engine. You will learn what segments are, how they are built and how they are used by Lucene and Solr for searching. We will discuss real-life performance implications regarding Solr collections that have many segments on a single node and compare that to the Solr where the number of segments is moderate and low. We will see what we can do to tune the merging process to trade off indexing performance for better query performance and what pitfalls are there waiting for us. Finally, at the end of the talk we will discuss possibilities of running force merge to avoid system disruption and still benefit from query performance boost that single segment index provides.

What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax

Have you ever wondered what is in all of those SSTable files and how it helps Cassandra find and manage your data? If you go to the Datastax website they will give you a high level explanation of what is in each file. In this talk we will go much deeper explaining each file and walking through a dump of its contents. We will also explore the differences between Cassandra 2.1 and 3.4. About the Speaker John Schulz Prinicipal Consultant, The Pythian Group John has 40 of years experience working with data. Data in files and in Databases from flat files through ISAM to relational databases and most recently NoSQL. For the last 15 he's worked on a variety of Open source technologies including MySQL, PostgreSQL, Cassandra, Riak, Hadoop and Hbase. He has been working with Cassandra since 2010. For the last eighteen months he has been working for The Pythian Group to help their customers improve their existing databases and select new ones.

Owning time series with team apache Strata San Jose 2015Patrick McFadin

Break out your laptops for this hands-on tutorial is geared around understanding the basics of how Apache Cassandra stores and access time series data. We’ll start with an overview of how Cassandra works and how that can be a perfect fit for time series. Then we will add in Apache Spark as a perfect analytics companion. There will be coding as a part of the hands on tutorial. The goal will be to take a example application and code through the different aspects of working with this unique data pattern. The final section will cover the building of an end-to-end data pipeline to ingest, process and store high speed, time series data.

Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit

This document discusses how the Spark Cassandra Connector optimizes for data locality when performing analytics on Cassandra data using Spark. It does this by using the partition keys and token ranges to create Spark partitions that correspond to the data distribution across the Cassandra nodes, allowing work to be done locally to each data node without moving data across the network. This improves performance and avoids the costs of data shuffling.

Cassandra 3.0 AwesomenessJon Haddad

This document summarizes new features in Cassandra 3.0, including user defined functions, improved garbage collection, hints management, materialized views, and a new storage engine. User defined functions allow running custom Java or JavaScript functions on Cassandra data. The G1 garbage collector replaces older collectors for better performance and predictability. Hints are now written to files instead of using Cassandra as a queue. Materialized views automatically create and maintain secondary indexes. The new storage engine reduces data duplication and wasted space.

DataSource V2 and Cassandra – A Whole New WorldDatabricks

Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie

Cassandra Summit 2015: Intro to DSE SearchCaleb Rackliffe

More Related Content

What's hot (20)

Cassandra Community Webinar: Apache Cassandra InternalsDataStax

Cassandra 2.0 and timeseriesPatrick McFadin

Successful Architectures for Fast DataPatrick McFadin

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax

Real data models of silicon valleyPatrick McFadin

Cassandra 2.0 better, faster, strongerPatrick McFadin

Advanced Apache Cassandra Operations with JMXzznate

Time series with apache cassandra strataPatrick McFadin

Cassandra Community Webinar | In Case of Emergency Break GlassDataStax

Cassandra Fundamentals - C* 2.0Russell Spitzer

Advanced data modeling with apache cassandraPatrick McFadin

New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd

Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin

Cassandra 3.0 advanced previewPatrick McFadin

Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.

What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax

Owning time series with team apache Strata San Jose 2015Patrick McFadin

Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit

Cassandra 3.0 AwesomenessJon Haddad

DataSource V2 and Cassandra – A Whole New WorldDatabricks

Cassandra Community Webinar: Apache Cassandra InternalsDataStax

Cassandra 2.0 and timeseriesPatrick McFadin

Successful Architectures for Fast DataPatrick McFadin

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax

Real data models of silicon valleyPatrick McFadin

Cassandra 2.0 better, faster, strongerPatrick McFadin

Advanced Apache Cassandra Operations with JMXzznate

Time series with apache cassandra strataPatrick McFadin

Cassandra Community Webinar | In Case of Emergency Break GlassDataStax

Cassandra Fundamentals - C* 2.0Russell Spitzer

Advanced data modeling with apache cassandraPatrick McFadin

New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd

Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin

Cassandra 3.0 advanced previewPatrick McFadin

Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.

What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax

Owning time series with team apache Strata San Jose 2015Patrick McFadin

Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit

Cassandra 3.0 AwesomenessJon Haddad

DataSource V2 and Cassandra – A Whole New WorldDatabricks

Viewers also liked (20)

Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie

Cassandra Summit 2015: Intro to DSE SearchCaleb Rackliffe

Understanding DSE Search by Matt StumpDataStax

Apache Cassandra Developer Training Slide DeckDataStax Academy

DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax

Join us as we talk about the current state as well as the future of DSE Search. Nick Panahi will discuss high level architecture while Ariel will dive deep into some of the integration. We'll talk about future features, improvements and enhancements as well as some of the challenges of our custom integration and what that means for scale and availability. About the Speakers Nick Panahi Sr. Product Manager, DSE Search, DataStax I am the product manager for DSE search, prior to product management, I was a solution architect for DataStax. Ariel Weisberg Software Engineer, DataStax Ariel is currently a Cassandra contributor and Datastax employee and former lead architect for VoltDB. Ariel aspires to be or considers himself a shared-nothing database expert depending on the time of day and whether Benedict is in the room, and has a passion for things measured in nanoseconds. Ariel has presented at events like Strangeloop, PAX Dev, OpenSQL camp Boston, NYC MySQL Meetup, and Boston New Technology Group meetup.

Copa menstrual y esponjas vaginalesTupper Sex Andalucia

Descripción: Una alternativa para la mujer, sana, higiénica y económica que otros productos menstruales. Protección invisible que no afecta en la vida cotidiana y que incluso se puede usar por la noche. Su capacidad para recoger el flujo menstrual es mayor que la de los productos convencionales con diferentes tamaños. Características: La copa menstrual Ave es innovadora, cómoda, y ecológica. La solución ideal para su higiene íntima durante la menstruación. Elaborada en silicona 100% platinium, su tolerancia es óptima, permitiendo una utilización prolongada máxima de 12 horas. No produce alergias, ni ocasiona infecciones fúngicas, ni uretritis. No altera negativamente al medio natural de la vagina. Ideal para la práctica del deporte. Cómoda de día, de noche o a la hora de viajar. Incluye una práctica bolsa para después de su uso Libro de instrucciones en su interior. Testado dermatológicamente en mujeres Europeas.

Servidor web lampyaser6700

Magonia getxo blogMikel Agirregabiria

Scala for rubyistsMichel Perez

This document discusses Scala and compares it to other languages like Ruby and Java. It covers Scala's features like functional programming, object-oriented programming, pattern matching, implicit parameters, monads, actors and testing frameworks. Pros and cons are listed for Scala and other languages like Clojure and Erlang. The document encourages learning Scala through online courses and exercises.

Accesus - Catalogo andamio para vias ferroviariasAccesus Plataformas Suspendidas

Tams 2012Lance McConkey

This document provides suggestions for prewriting activities to engage middle school students in writing. It discusses quick-writes, photo analysis, interviews, experiences, music and more as potential prewriting strategies. Common strategies like freewriting and brainstorming are explained. Procedures for teaching prewriting like modeling and providing examples are outlined. Specific methods like the four-square prewriting model and brags/whines imagined monologues are presented. Finally, using student-created children's books as a prewriting activity is proposed.

Adquirir una propiedad en españa en 7 pasosMariscal Abogados | International Law Firm in Spain

2013 brand id&printCarl H. Bradford III

This document provides branding and identity materials for "In the Mix Magazine". It includes the magazine's masthead, headers, sectionals and other branding elements. It also includes brand extension examples like media kits, print collateral, event signage and product advertisements. The document provides these materials to guide consistent branding efforts across In the Mix Magazine's applications. It establishes file formats and contact information for questions about using the branding elements.

Pairform cci formproChristian Colin

los bracekts pv996073774

Este documento describe los diferentes tipos de brackets utilizados en ortodoncia. Explica que los brackets son elementos que los odontólogos recomiendan para corregir malformaciones dentales y que existen variaciones como brackets metálicos, linguales, de zafiro y de porcelana. También resume que aunque el tipo de bracket depende de la elección del paciente, todos cumplen el objetivo de alinear los dientes para lograr una sonrisa deseada.

9Guia1Wilson

Una modesta proposiciónShanie Weissman

Este documento propone una solución poco convencional al alto costo de vida en Tel Aviv: trasladar a todos los residentes de Tel Aviv a la ciudad más barata de Haifa, y simultáneamente trasladar a todos los residentes de Haifa a Tel Aviv. De esta manera, los residentes de Tel Aviv podrían disfrutar de una vida más asequible en Haifa, mientras que los precios de la vivienda en ambas ciudades se estabilizarían. El documento sugiere que esta solución radical podría funcionar ya que los seres

Dossier ii torneo once caballeros c.f.Jacobo Vázquez Mariño

Presentacion corporativa sevenminds agosto2012 (1)Rafael Lopez Rodriguez

Este documento presenta la empresa Sevenminds y su solución de captura de datos en tiempo real. Sevenminds fue fundada en 2006 en Barcelona y ahora tiene su sede principal en Colombia. Su solución permite a las empresas recolectar datos de campo usando dispositivos móviles u ordenadores y generar informes en tiempo real para tomar mejores decisiones. La solución beneficia procesos como la productividad, control de calidad y aumento de ventas.

Project Management Diploma with InstructorsCisco

The document discusses a Project Management Diploma program offered by the American University of Beirut. The program aims to prepare project practitioners through applying best practices to improve organizational delivery capabilities. It is composed of 11 components divided into project management certification workshops and diploma courses focusing on areas like risk management, program management and engineering project controls. The program provides international certificates and is taught by experienced instructors and subject matter experts in project management.

Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie

Cassandra Summit 2015: Intro to DSE SearchCaleb Rackliffe

Understanding DSE Search by Matt StumpDataStax

Apache Cassandra Developer Training Slide DeckDataStax Academy

DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax

Copa menstrual y esponjas vaginalesTupper Sex Andalucia

Servidor web lampyaser6700

Magonia getxo blogMikel Agirregabiria

Scala for rubyistsMichel Perez

Accesus - Catalogo andamio para vias ferroviariasAccesus Plataformas Suspendidas

Tams 2012Lance McConkey

Adquirir una propiedad en españa en 7 pasosMariscal Abogados | International Law Firm in Spain

2013 brand id&printCarl H. Bradford III

Pairform cci formproChristian Colin

los bracekts pv996073774

9Guia1Wilson

Una modesta proposiciónShanie Weissman

Dossier ii torneo once caballeros c.f.Jacobo Vázquez Mariño

Presentacion corporativa sevenminds agosto2012 (1)Rafael Lopez Rodriguez

Project Management Diploma with InstructorsCisco

DataStax: An Introduction to DataStax Enterprise Search

1. An Introduction to DSE Search. Caleb Rackliffe, Software Engineer, caleb.rackliffe@datastax.com, @calebrackliffe

2. What problem were we trying to solve?

3. Application, DataStax Driver

4. SELECT * FROM customers WHERE country LIKE '%land%';

5. What about secondary indexes?

6. Why not just create your own secondary index implementation that supports wildcard queries?

7. I need full-text search!

9. Why did we build something new?

10. Application, DataStax Driver, Solr Client

11. Polyglot Persistence!

12. Application, DataStax Driver, Solr Client; Consistency, Cost, Complexity

14. partitioning, multi-DC, replication, geospatial, wildcards, monitoring, C* field type support (UDT, Tuple, collections), security, live indexing, sorting, faceting, fault-tolerant distributed search, caching, text analysis, grouping, automatic index updates, JVM, CQL, repair

15. Application, DataStax Driver, Solr Client; Consistency, Complexity, Cost

16. How about some examples?

17. Creating a Solr Core
    Start a node…
        bash$ dse cassandra -s
    Create a table…
        cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr':1};
        cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY, fullname text, address_ map<text, text>);
    Create the core…
        bash$ dsetool create_core test.user generateResources=true

18. The Schema
    bash$ dsetool get_core_schema test.user
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <schema name="autoSolrSchema" version="1.5">
      <types>
        <fieldType class="org.apache.solr.schema.TextField" name="text">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldType>
        <fieldType class="org.apache.solr.schema.StrField" name="string"/>
      </types>
      <fields>
        <field indexed="true" name="username" stored="true" type="string"/>
        <field indexed="true" name="fullname" stored="true" type="text"/>
        <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
      </fields>
      <uniqueKey>username</uniqueKey>
    </schema>
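To make the two field types in the schema concrete, here is a minimal Python sketch of what their analysis roughly amounts to. It is an approximation under stated assumptions: the regex split is only a stand-in for StandardTokenizerFactory (which follows Unicode word-boundary rules), and the function names `analyze_text` and `analyze_string` are hypothetical, not part of Solr or DSE.

```python
import re


def analyze_text(value):
    """Approximate the `text` field type: tokenize, then lowercase each token."""
    # Assumption: splitting on runs of word characters is a rough stand-in
    # for StandardTokenizerFactory; LowerCaseFilterFactory lowercases tokens.
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_string(value):
    """Approximate the `string` (StrField) type: the value is one verbatim term."""
    return [value]


print(analyze_text("Sergio Bossa"))   # ['sergio', 'bossa']
print(analyze_string("sbtourist"))    # ['sbtourist']
```

The practical consequence is that `fullname` can be matched by individual lowercase words, while `username` and the `address_*` values match only as whole, case-sensitive terms (or by prefix, as in the wildcard slide).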

19. Insert Rows (…and Index Documents)
    cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('sbtourist', 'Sergio Bossa', {'address_home' : 'UK', 'address_work' : 'UK'});
    cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});
    cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('thegrinch', 'Sven Delmas', {'address_home':'US','address_work':'HQ'});
    …and that's it. No ETL. No writing to a second datastore.

20. Wildcards
    cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"address_home:U*"}';

     username  | address_
    -----------+----------------------------------------------
     sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
     thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

    (2 rows)
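The prefix wildcard above can be modelled in a few lines of Python. This is only an in-memory illustration of the matching semantics, assuming the three rows from the insert slide; the `rows` dictionary and `prefix_match` helper are hypothetical, and a real index would consult a sorted term dictionary rather than scan every row.

```python
# The three sample rows, keyed by username, with their dynamic address fields.
rows = {
    "sbtourist": {"address_home": "UK", "address_work": "UK"},
    "bereng": {"address_home": "ES", "address_work": "ES"},
    "thegrinch": {"address_home": "US", "address_work": "HQ"},
}


def prefix_match(rows, field, prefix):
    """Return usernames whose `field` value starts with `prefix` (like field:prefix*)."""
    return sorted(
        username
        for username, fields in rows.items()
        if fields.get(field, "").startswith(prefix)
    )


print(prefix_match(rows, "address_home", "U"))  # ['sbtourist', 'thegrinch']
```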

21. Sorting and Limits
    cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

     username  | address_
    -----------+----------------------------------------------
     thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
     sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
     bereng    | {'address_home': 'ES', 'address_work': 'ES'}

    (3 rows)

    cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

     username  | address_
    -----------+----------------------------------------------
     thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

    (1 rows)
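Because the `solr_query` value is a JSON document nested inside a CQL string literal, quoting mistakes are easy to make when typing it by hand. The sketch below builds the JSON with Python's standard `json` module and wraps it in a CQL literal. The helper names `build_solr_query` and `cql_literal` are hypothetical; the only keys used (`q`, `sort`, `facet`) are the ones that appear on these slides.

```python
import json


def build_solr_query(q, sort=None, facet_field=None):
    """Serialize a solr_query body using only the keys shown on the slides."""
    body = {"q": q}
    if sort is not None:
        body["sort"] = sort
    if facet_field is not None:
        body["facet"] = {"field": facet_field}
    return json.dumps(body)


def cql_literal(text):
    """Wrap text in a CQL string literal; embedded single quotes are doubled."""
    return "'" + text.replace("'", "''") + "'"


query = build_solr_query("*:*", sort="address_home desc")
statement = (
    "SELECT username FROM user WHERE solr_query=" + cql_literal(query) + " LIMIT 1;"
)
print(statement)
```

Generating the JSON this way keeps straight double quotes inside the document and straight single quotes around it, which is what cqlsh expects.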

22. Faceting
cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field" : "address_work"}}';
 facet_fields
-----------------------------------------------------
 {"address_work" : {"ES" : 1 , "HQ" : 1 , "UK" : 1}}
(1 rows)

23. Partition Restrictions
cqlsh:test> CREATE TABLE event(sensor_id bigint, recording_time timestamp, description text, PRIMARY KEY(sensor_id, recording_time));
…
cqlsh:test> SELECT recording_time, description FROM test.event WHERE sensor_id = 2314234432 AND solr_query='description:unremarkable';

24. What do the internals look like?

25. Indexing

26. [Diagram] Stages: Buffered, Searchable, Durable. Storage: Memory, Disk.

27. [Diagram] Stages: Buffered, Searchable, Durable. Storage: Memory, Disk.

28. [Diagram] RAMBuffer and Segments across Memory and Disk. Stages: Buffered, Searchable, Durable. Transitions: Soft Commit, Hard Commit.

29. Querying

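30. Replica Selection [Diagram] RF=2, shards A-E, two replicas of each shard spread across nodes 1-5, one coordinator. Node color ranges from Healthy to Unhealthy.

31. Replica Selection [Diagram] RF=2, shards A-E, two replicas of each shard spread across nodes 1-5, one coordinator. Node color ranges from Healthy to Unhealthy.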

32. What happens if a shard query fails?

33. Failover: Phase 1 4 nodes RF = 2 shards: A-D no vnodes 1 2 3 4

34. Failover: Phase 2 4 nodes RF = 2 shards: A-D no vnodes 1 2 3 4

36. Failover: Phase 3 4 nodes RF = 2 shards: A-D no vnodes 1 2 3 4

37. Platform Integrations

38. Search + Analytics: Explicit Predicate Pushdown
bash$ dse spark
scala> val table = sc.cassandraTable("wiki","solr")
scala> val result = table.select("id","title").where("solr_query='body:dog'").collect

39. http://docs.datastax.com

Editor's Notes

#2: “Hello! My name is Caleb Rackliffe, and I’m a member of the search team at DataStax. Today I’d like to walk you through a brief (but action-packed) introduction to DataStax Enterprise Search. I’ll start with a question…”
#3: “Before we talk about what DSE Search is, let’s make sure we know why we built it.”
#4: “Here we have a small Cassandra cluster and an application sitting on top of it, using the DataStax driver. We can go a long way with CQL and proper denormalization, but what happens when we find ourselves wanting to do something as seemingly simple as…”
#5: “…this. You’ll recognize the SQL-style wildcard query, which Cassandra does not support out of the box.”
#6: Cassandra’s built-in secondary indexes might seem like a solution, but they… …don’t support wildcard queries. …can perform poorly unless limited to a single partition. …can perform poorly for very high or very low cardinality fields. …may fail for a frequently updated/deleted column.
#7: “You could, but then you’d be saddled with the cost of building and maintaining that, and you’ll still end up with something that is designed for a fairly specific use-case.”
#8: “So when our search problem lacks the structure to make denormalization effective, and is beyond the capabilities of C* secondary indexes, we need to think a bit more broadly.”
#9: “Fortunately, there are technologies out there that handle full-text and other more advanced kinds of search well, and most of them, like Solr, are built on the foundation of the Apache Lucene project. ”
#10: “Well, let’s see what it would look like to use a separate, Lucene-based search cluster alongside our Cassandra cluster…”
#11: “…here we are. Our application is now sitting on top of both a Cassandra cluster and separate search cluster. Notice that we’ve added a new client to our application, specifically for search. So we’ve got Cassandra doing key-value lookups and probably some range queries…we’ve got our search cluster handling the more advanced ad-hoc queries for us.”
#12: “This is polyglot persistence at its best…right?”
#13: “Well, maybe not…and we can talk about this along 3 axes.” Complexity - The persistence layer of our application is now more complex. We have to configure two clients, write to two data stores, and, if we write to one of them asynchronously, manage a queueing solution. Consistency - Since the two data stores have no explicit knowledge of each other, we have to manage questions of consistency between them in our application. Cost - Aside from the implicit cost of complexity, we’ll also need to deal with the explicit cost of infrastructure and hardware for a separate cluster.
#14: “So if you need to avoid data loss, scale your writes, and replicate your index over multiple DCs, your architecture might start to look like this lovely Rube Goldberg machine. We wanted to provide all of this in an operationally simple package…”
#15: “DSE Search is designed to address those problems. We’ve built a coherent search platform that integrates Cassandra’s distributed persistence, Lucene’s core search and indexing functionality, and the advanced features of Solr in the same JVM…and then we’ve made a number of our own enhancements, which we’ll see in the coming slides.”
#16: “So back to our architecture diagram. First, with DSE Search, we can eliminate the cost associated with running a separate search cluster. We can eliminate much of the complexity at the application layer, since we don’t have to deal with two clients, and we only have to manage one write path…and with all of our data stored in Cassandra alone and collocated with the relevant shards of our search index, we’ve eliminated many of the potential issues of consistency between the two.”
#17: “We’ll go into more details on the indexing and query paths, but before we do that, let’s run through some basic examples and get a feel for the ergonomics of our solution.”
#18: “First, we’ll start up a single node. (The -s switch here tells the node it’s going to handle a search workload.) Second, we create a table from the CQL prompt. Third, we create a Solr core over that table from dsetool…and that’s it. We’re ready to index documents. Note that we don’t have to create the Solr schema explicitly, because DSE Search creates it for us, using the CQL schema to determine its type mappings.”
#19: “Under the hood, the schema actually looks something like this, but you shouldn’t need to trouble yourself with it, unless our default type mappings aren’t quite right for you. In that case, you can just tweak the auto-generated schema and re-upload it.”
#20: “Next we insert a few rows, which will be indexed automatically for search. There is no ETL involved and no explicit writing to a second data store. We’re ready to make some queries…”
#21: “…so let’s start with a simple wildcard query. Here, we want to find everyone whose home address starts with a U, and of course we find users in the United States and the UK.”
#22: “Sorting and Limits! In the first query, we just find all our users and sort them descending by home address. In the second query, we do the same thing except we also use the CQL LIMIT keyword to narrow our results down to just the top result by home address.”
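To make the semantics of the wildcard, sort, and LIMIT queries concrete, here is a minimal Python sketch that applies the same operations to the three example rows in memory. It is only an illustration of what the queries return, not of how DSE Search executes them; the helper names `wildcard` and `sort_desc` are hypothetical.

```python
from fnmatch import fnmatchcase

# Rows copied from the INSERT statements on the "Insert Rows" slide.
users = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]

def wildcard(rows, field, pattern):
    """Keep rows whose `field` matches a shell-style wildcard such as 'U*'."""
    return [r for r in rows if fnmatchcase(r.get(field, ""), pattern)]

def sort_desc(rows, field, limit=None):
    """Sort rows descending by `field`; `limit` keeps only the top results."""
    ordered = sorted(rows, key=lambda r: r[field], reverse=True)
    return ordered if limit is None else ordered[:limit]

# address_home:U*  ->  sbtourist, thegrinch
print([r["username"] for r in wildcard(users, "address_home", "U*")])
# sort by address_home desc  ->  thegrinch, sbtourist, bereng
print([r["username"] for r in sort_desc(users, "address_home")])
# same sort with LIMIT 1  ->  thegrinch
print([r["username"] for r in sort_desc(users, "address_home", limit=1)])
```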
#23: “Faceting allows us to take the results of a query, in this case a query for all documents, group them, and count the members in each group. In this example, faceting on our users’ work addresses tells us that we have one working in Spain, one at corporate headquarters, and one in the UK. This is very common in the context of a product search, where a user wants to drill into results by brand.”
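Faceting as described here amounts to a group-and-count over the matching documents. The following Python sketch reproduces the counts from the faceting slide using the three example rows; `facet_counts` is a hypothetical helper, not part of any DSE or Solr API.

```python
from collections import Counter

# Rows copied from the INSERT statements on the "Insert Rows" slide.
users = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]

def facet_counts(rows, field):
    """Group rows by the value of `field` and count the members of each group."""
    return dict(Counter(row[field] for row in rows if field in row))

# One user each in ES, HQ, and UK, matching the facet_fields output on the slide.
print(facet_counts(users, "address_work"))
```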
#24: “What if we want to restrict our search to a specific partition? Here I have another table, one that records series of sensor events. Using a CQL partition key restriction in our WHERE clause, we can ensure that our query visits only the node that contains that partition and then filters on it once we get there. Much like our earlier usage of LIMIT, this is a case where we’re translating CQL instructions to search-specific instructions under the hood.”
#25: “Now that we have an idea of what basic usage looks like, let’s take a high-level look at what’s going on in the indexing and query internals…”
#26: “The indexing process starts with a Cassandra write. It arrives at the coordinator, is distributed to the proper replicas, and is written to the commit log and Memtable, as you would expect. At this point, we create an updated Lucene document and queue it up for indexing, then we return to the coordinator and the client. Then, asynchronously, we update the index. Finally, also in the background, when a C* Memtable is flushed to disk, we also flush the corresponding index updates to disk, ensuring their durability.”
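The ordering of those steps on a single replica can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class and method names are hypothetical, the "index" is a plain term-to-keys dictionary rather than Lucene, and the background indexing thread is replaced by an explicit `index_pending()` call so the behavior is deterministic.

```python
from collections import deque

class SearchNodeSketch:
    """Toy model of one replica; every name here is hypothetical, not DSE's API."""

    def __init__(self):
        self.commit_log = []        # durable record of writes
        self.memtable = {}          # in-memory rows awaiting flush
        self.index_queue = deque()  # documents waiting to be indexed
        self.index = {}             # term -> set of row keys
        self.flushed_rows = {}      # stands in for SSTables plus flushed index data

    def write(self, key, text):
        """Synchronous part: commit log, Memtable, enqueue, then acknowledge."""
        self.commit_log.append((key, text))
        self.memtable[key] = text
        self.index_queue.append((key, text))
        return "ack"

    def index_pending(self):
        """Asynchronous part, driven by hand here instead of a background thread."""
        while self.index_queue:
            key, text = self.index_queue.popleft()
            for term in text.lower().split():
                self.index.setdefault(term, set()).add(key)

    def flush(self):
        """Memtable flush: pending index updates are applied and flushed with it."""
        self.index_pending()
        self.flushed_rows.update(self.memtable)
        self.memtable.clear()

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

node = SearchNodeSketch()
print(node.write("sbtourist", "Sergio Bossa"))  # acknowledged before indexing
print(node.search("sergio"))                    # not yet visible to search
node.index_pending()
print(node.search("sergio"))                    # visible once indexed
```

The point of the sketch is that the acknowledgement does not wait for the index update, so a search issued right after a write may not see it yet.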
#27: “In near-real-time search systems, updated documents, once indexed, progress through 3 stages: a buffered stage, where they are just accumulated in memory; a searchable stage, where they move to disk and become visible to ongoing queries; and a durable stage, where they are permanently added to the index and will survive restart. Because moving from the “buffered” layer to the “searchable” layer is expensive, we are forced to make a tradeoff between the visibility of our data and indexing throughput. That is, we can make our writes visible to ongoing searches more quickly at the cost of slower indexing throughput, or we can maximize indexing throughput with longer delays before writes are visible to searches.”
#28: In DSE 4.7, we released a feature called “Live Indexing”. Essentially, we’ve made indexed documents buffered in memory searchable, eliminating the need to build a separate “searchable” representation of the index and the need to make a hard decision between update availability and throughput. This might remind you of the Cassandra write path, where we have “searchable” Memtables buffered in memory that are periodically flushed to “durable” SSTables.
#29: “This is what it would look like if we mapped these stages to their equivalents in Solr. Notice that the soft commit process creates searchable segments, which must later be merged by Lucene in the background. Since live indexing bypasses this second level, we can accumulate larger segments before flushing to disk, and this reduces the cost of the segment merges that occur in the background.”
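The difference between the soft-commit path and live indexing can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not Solr's or DSE's implementation: segments are plain lists, a "hard commit" simply moves everything to a durable list, and the `live_indexing` flag is a hypothetical switch that makes the RAM buffer itself searchable.

```python
class NrtIndexSketch:
    """Toy model of the buffered -> searchable -> durable stages (hypothetical names)."""

    def __init__(self, live_indexing=False):
        self.live_indexing = live_indexing
        self.ram_buffer = []            # buffered documents
        self.searchable_segments = []   # segments created by soft commits
        self.durable_segments = []      # segments that would survive a restart

    def add(self, doc):
        self.ram_buffer.append(doc)

    def soft_commit(self):
        """Turn the buffer into a searchable segment (the expensive step)."""
        if self.ram_buffer:
            self.searchable_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def hard_commit(self):
        """Make everything durable; a non-empty buffer becomes one segment."""
        if self.ram_buffer:
            self.searchable_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()
        self.durable_segments.extend(self.searchable_segments)
        self.searchable_segments.clear()

    def visible_docs(self):
        """Documents a query can see right now."""
        segments = self.durable_segments + self.searchable_segments
        docs = [d for seg in segments for d in seg]
        if self.live_indexing:
            docs += self.ram_buffer  # buffered documents are searchable directly
        return docs

docs = ["d1", "d2", "d3", "d4"]

classic = NrtIndexSketch()
for i, d in enumerate(docs, start=1):
    classic.add(d)
    if i % 2 == 0:
        classic.soft_commit()   # needed for visibility, creates a small segment
classic.hard_commit()

live = NrtIndexSketch(live_indexing=True)
for d in docs:
    live.add(d)                 # visible immediately, no soft commit needed
live.hard_commit()

print(len(classic.durable_segments), len(live.durable_segments))
```

In this model the soft-commit path ends up with two small segments and the live path with one larger segment, which is the reason given above for cheaper background merges.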
#30: “On the query side, we’ve implemented our own distributed search, informed by the topology of the cluster that Cassandra makes available to us. Here we have a 4-node cluster with a replication factor of 2. Our first step is to determine the set of nodes that optimally covers the ring, in this case, the tokens from 0 -> 1000. We then scatter the query to those nodes, find the IDs for matching documents, and read the documents themselves, which are stored only in Cassandra. Notice here that, to minimize fan-out, we only contact node 3, rather than both nodes 4 and 2, to cover ranges 0 -> 250 and 250 -> 500.”
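Choosing a small set of nodes that together cover every shard is a set-cover problem. The Python sketch below uses a simple greedy heuristic that starts from the coordinator; this is an assumption made for illustration, not a description of the algorithm DSE Search actually uses. The replica layout is also hypothetical: 4 nodes, RF=2, shards A-D, each node holding two adjacent shards.

```python
def choose_nodes(replicas, coordinator):
    """Greedy cover: pick nodes until every shard is held by some chosen node.

    `replicas` maps node -> set of shards it holds. The coordinator is
    preferred first (assumption: a local shard query is cheapest).
    """
    all_shards = set().union(*replicas.values())
    candidates = dict(replicas)
    chosen, covered = [], set()

    if coordinator in candidates:
        chosen.append(coordinator)
        covered |= candidates.pop(coordinator)

    while covered != all_shards:
        if not candidates:
            raise RuntimeError("some shards have no available replica")
        # Pick the node that adds the most uncovered shards (lowest name wins ties).
        best = max(sorted(candidates), key=lambda n: len(candidates[n] - covered))
        gain = candidates.pop(best) - covered
        if gain:
            chosen.append(best)
            covered |= gain
    return chosen

# Hypothetical layout: each shard lives on exactly two nodes.
replicas = {
    "node1": {"A", "D"},
    "node2": {"A", "B"},
    "node3": {"B", "C"},
    "node4": {"C", "D"},
}

# node1 covers A and D locally; node3 alone covers B and C, so fan-out is 2.
print(choose_nodes(replicas, coordinator="node1"))
```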
#31: “When we need to choose between replicas of a particular token, we do our best to minimize fan-out, to cover the entire dataset optimally. When multiple nodes could be optimal selections, we look more closely at the health and activity of those nodes. In this example we have a 5-node cluster with a replication factor of 2 and index shards A-E. We’ll denote health here by color, with green being healthy, red being unhealthy, and yellow in the middle. If we need to cover shard B, we can query either node 2 or node 3, but we’ll pick node 2, because it’s healthier.”
#32: “However, node health is not the only criterion we use for selection. If node 2 is healthy, but is also in the middle of an expensive operation, let’s say, rebuilding its search index, we’ll want to choose node 3, since node 2 is potentially both out of date and unable to devote as many resources to handling incoming queries.”
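The two tie-breaking rules above (prefer the healthier replica, but avoid one that is busy rebuilding its index) can be written as a single ordering. The Python sketch below is hypothetical: the numeric `health` score and the `reindexing` flag stand in for whatever signals DSE Search really tracks, and the rule that activity outranks health is an assumption drawn from the example.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    name: str
    health: float             # 0.0 = unhealthy (red) .. 1.0 = healthy (green)
    reindexing: bool = False  # True while rebuilding its search index

def pick_replica(candidates):
    """Prefer nodes that are not reindexing; among those, prefer the healthiest."""
    return max(candidates, key=lambda n: (not n.reindexing, n.health))

# Shard B lives on node 2 and node 3.
node2 = NodeStatus("node2", health=0.9)
node3 = NodeStatus("node3", health=0.5)
print(pick_replica([node2, node3]).name)   # node2: healthier

# node 2 is still healthy but starts rebuilding its index.
node2_busy = NodeStatus("node2", health=0.9, reindexing=True)
print(pick_replica([node2_busy, node3]).name)   # node3: not reindexing
```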
#34: “Here we have a healthy 4-node cluster with a replication factor of 2 and 4 index shards. If node 1 coordinates our request, it only needs to contact itself and node 3 to cover all 4 of the shards A-D…”
#35: “…but then node 3 fails. It could have been a disk failure or a network issue…”
#36: “…but it was probably because you let this guy near it.”
#37: “In any case, we still need to cover shards B and C, but node 3 was the only node that contained both of them, so we’ll need to contact nodes 2 and 4.”
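The failover step can be sketched as re-planning only the shards that the failed node was serving. The Python below is an illustration under assumptions: the replica layout is a hypothetical 4-node, RF=2 arrangement consistent with the example (node 1 holds A and D, node 3 holds B and C), and the greedy choice of replacement nodes is not taken from DSE Search itself.

```python
def replan(plan, replicas, failed):
    """Reassign the shards served by `failed` to surviving replicas.

    `plan` maps shard -> node currently serving it for this query.
    `replicas` maps node -> set of shards it holds.
    """
    lost = {shard for shard, node in plan.items() if node == failed}
    new_plan = {shard: node for shard, node in plan.items() if node != failed}
    alive = {n: shards for n, shards in replicas.items() if n != failed}

    while lost:
        if not alive:
            raise RuntimeError("no surviving nodes")
        # Pick the surviving node that holds the most lost shards.
        best = max(sorted(alive), key=lambda n: len(alive[n] & lost))
        gained = alive[best] & lost
        if not gained:
            raise RuntimeError("no surviving replica for: %s" % sorted(lost))
        for shard in gained:
            new_plan[shard] = best
        lost -= gained
    return new_plan

# Hypothetical layout matching the failover slides.
replicas = {
    "node1": {"A", "D"},
    "node2": {"A", "B"},
    "node3": {"B", "C"},
    "node4": {"C", "D"},
}
# Phase 1: node 1 coordinates and contacts only itself and node 3.
plan = {"A": "node1", "D": "node1", "B": "node3", "C": "node3"}

# Phase 2 and 3: node 3 fails, so B and C must come from nodes 2 and 4.
print(replan(plan, replicas, failed="node3"))
```

Because no single surviving node holds both B and C, the fan-out grows from two nodes to three.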
#38: “To this point, I’ve talked about search in a fairly isolated way, but in the context of a larger platform, there are opportunities to step outside that.”
#39: “One example is the integration we released in DSE 4.7 with Spark - a component of DSE Analytics. There are cases where pushing a search query through a Spark job can meaningfully cut down on the size of the RDD Spark presents for analysis. In this example, we’re filtering every Wikipedia article that contains the word ‘dog’ using search, avoiding some unnecessary filtering after we build the RDD.”
#40: “Well that wraps it up for me. If you’d like to dig deeper into any of the topics I covered here, or you’d like to try DSE out for yourself, please visit docs.datastax.com. Thank you all so much for coming, and enjoy the rest of your Summit!”
