We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on Azure.
Tsinghua University: Two Exemplary Applications in China (DataStax Academy)
In this talk, we will share our experience applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading machinery manufacturer, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved an online write throughput of 10,000 operations per second with 5 servers. MapReduce is also used on Cassandra for value-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed at the China Meteorological Administration to manage meteorological data. We designed a hybrid schema to support both slice queries and time-window-based queries efficiently. We also explored optimized compaction and deletion strategies for meteorological data in this case.
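As a rough sketch of the kind of time-series model such a workload calls for (the keyspace, table, bucket granularity, and Python driver usage below are illustrative assumptions, not the schema used at Sany or the CMA), a partition per machine per day keeps partitions bounded while serving both slice and time-window queries from a single partition:

```python
# Minimal bucketed time-series sketch for sensor data, using the DataStax Python driver.
# Table/column names and the daily bucket are assumptions for illustration.
from datetime import date, datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # contact point is a placeholder
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS telemetry.sensor_readings (
        machine_id text,
        day        date,          -- daily bucket keeps partitions bounded
        ts         timestamp,
        metric     text,
        value      double,
        PRIMARY KEY ((machine_id, day), ts, metric)
    ) WITH CLUSTERING ORDER BY (ts DESC, metric ASC)
""")

# Writes go through a prepared statement so the hot path avoids re-parsing CQL.
insert = session.prepare(
    "INSERT INTO telemetry.sensor_readings (machine_id, day, ts, metric, value) "
    "VALUES (?, ?, ?, ?, ?)"
)

# A time-window query touches exactly one partition per machine per day.
window = session.prepare(
    "SELECT ts, metric, value FROM telemetry.sensor_readings "
    "WHERE machine_id = ? AND day = ? AND ts >= ? AND ts < ?"
)
rows = session.execute(window, ("SY-215-0042", date(2015, 9, 1),
                                datetime(2015, 9, 1, 8), datetime(2015, 9, 1, 9)))
```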
The Last Pickle: Distributed Tracing from Application to Database (DataStax Academy)
Monitoring provides information on system performance, but tracing is necessary to understand individual request performance. Detailed query tracing has been available in Cassandra since version 1.2 and is invaluable when diagnosing problems. However, knowing which queries to trace, and why the application makes them, still requires deep technical knowledge. By merging application tracing via Zipkin with Cassandra query tracing, we automate the process and make it easier to identify and resolve problems. In this talk Mick Semb Wever, Team Member at The Last Pickle, will introduce Cassandra query tracing and Zipkin. He will then propose an extension that allows clients to pass a trace identifier through to Cassandra, and a way to integrate Zipkin tracing into Cassandra. Driving all this is the desire to create one tracing view across the entire system.
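For context, here is a minimal sketch of the Cassandra-side tracing the talk builds on, using the Python driver (the keyspace, table, and printed fields are illustrative); a Zipkin bridge would turn these trace events into child spans of the application's trace:

```python
# Minimal query-tracing sketch with the DataStax Python driver.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")   # keyspace name is a placeholder

stmt = SimpleStatement("SELECT * FROM events WHERE user_id = %s LIMIT 50")
result = session.execute(stmt, ("user-123",), trace=True)

trace = result.get_query_trace()           # pulled from system_traces by the driver
print("coordinator:", trace.coordinator, "duration:", trace.duration)
for event in trace.events:
    # Each event is one step the coordinator/replicas took for this query;
    # a Zipkin integration would emit these as spans under the application trace.
    print(event.source_elapsed, event.source, event.description)
```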
This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
1) The process that led to our decision to use Cassandra
2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
4) Our lessons learned and next steps
Managing Cassandra Databases with OpenStack Trove (Tesora)
This document summarizes OpenStack Trove, an OpenStack service for provisioning and managing databases in OpenStack clouds. It discusses what OpenStack and Trove are, how Trove integrates with other OpenStack services, and Trove's capabilities like provisioning, backup/restore, replication, clustering, and resizing for both SQL and NoSQL databases like Cassandra, MongoDB, and PostgreSQL. It also introduces Tesora as a major contributor to Trove that provides an enterprise-grade Trove platform with additional support and customization options.
Capital One: Using Cassandra In Building A Reporting Platform (DataStax Academy)
As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.
Data Pipelines with Spark & DataStax Enterprise (DataStax)
This document discusses building data pipelines for both static and streaming data using Apache Spark and DataStax Enterprise (DSE). For static data, it recommends using optimized data storage formats, distributed and scalable technologies like Spark, interactive analysis tools like notebooks, and DSE for persistent storage. For streaming data, it recommends using scalable distributed technologies, Kafka to decouple producers and consumers, and DSE for real-time analytics and persistent storage across datacenters.
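As a sketch of the batch leg of such a pipeline (the contact point, keyspace, and table names are placeholders, and the spark-cassandra-connector is assumed to be on the Spark classpath), PySpark can read a Cassandra table as a DataFrame, aggregate it, and persist the result back to DSE:

```python
# Batch pipeline sketch: Cassandra -> Spark aggregation -> Cassandra.
# Names are placeholders; the spark-cassandra-connector package is assumed available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("dse-batch-pipeline")
    .config("spark.cassandra.connection.host", "127.0.0.1")   # placeholder host
    .getOrCreate()
)

raw = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="sensor_readings")    # names are assumptions
    .load()
)

daily = (
    raw.groupBy("machine_id", "day", "metric")
       .agg(F.avg("value").alias("avg_value"), F.max("value").alias("max_value"))
)

(
    daily.write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="daily_rollups")      # target table assumed to exist
    .mode("append")
    .save()
)
```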
Why you need benchmarks
Finding the right database solution for your use case can be an arduous journey. The database deployment touches aspects of throughput performance, latency control, high availability and data resilience.
You will need to decide on the infrastructure to use: Cloud, on-premise or a hybrid solution.
Data models also have an impact on finding the right fit for the use case. Once you establish a requirements set, the next step is to test your use case against the databases of choice.
In this workshop, we will discuss the different data points you need to collect in order to get the most realistic testing environment.
We will cover:
Data model impact on performance and latency
Client behavior related to database capabilities
Failover and high availability testing
Hardware selection and cluster configuration impact
We will show two benchmarking tools you can use to test and benchmark your clusters and identify the optimal deployment scenario for your use case; a small illustrative latency probe is sketched after the list below.
Attend this virtual workshop if you are:
Looking to minimize the cost of your database deployment
Making a database decision based on performance and scale data
Planning to emulate your workload on a pre-production system where you can test, fail fast and learn.
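The sketch below is not one of the workshop's benchmarking tools; it is only a minimal illustration, with an invented schema and workload, of the client-observed latency data you would want to collect:

```python
# Tiny latency probe: measures client-observed write latency for a toy workload.
import statistics
import time
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("bench")            # keyspace/table are placeholders
insert = session.prepare(
    "INSERT INTO events (id, payload, ts) VALUES (?, ?, toTimestamp(now()))"
)

latencies_ms = []
for _ in range(10_000):
    start = time.perf_counter()
    session.execute(insert, (uuid.uuid4(), "x" * 256))
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print("p50 %.2f ms  p99 %.2f ms  mean %.2f ms" % (
    latencies_ms[len(latencies_ms) // 2],
    latencies_ms[int(len(latencies_ms) * 0.99)],
    statistics.mean(latencies_ms),
))
```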
Proofpoint: Fraud Detection and Security on Social Media (DataStax Academy)
Social media has become the new frontier for cyber-attackers. The explosive growth of this new communications platform, combined with the potential to reach millions of people through a single post, has provided a low barrier for exploitation. In this talk, we will focus on how Cassandra is used to enable our fight against bad actors on social media. In particular, we will discuss how we use Cassandra for anomaly detection, social mob alerting, trending topics, and fraudulent classification. We will also speak about our Cassandra data models, integration with Spark Streaming, and how we use KairosDB for our time series data. Watch us don our superhero-Cassandra capes as we fight against the bad guys!
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ... (DataStax Academy)
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Cassandra is a better alternative to an RDBMS for a scalable solution that requires a distributed database, but it is more commonly found in clustered solutions targeted at a single installation. The key reason is maintainability and life-cycle management.
Ericsson has re-engineered its voucher management solution for prepaid billing by replacing the RDBMS with Cassandra. Cassandra facilitates clusters with large sets of nodes that can easily scale up and down, so one does not have to deal with multiple clusters. However, skills for its administration are sparse, unlike RDBMS administration. Activities like nodetool repair, compaction, and scaling up or down become challenging. Moreover, new Cassandra releases come frequently, and rolling them out to several deployments is challenging.
Key technical challenges were consistency of denormalized data, performance of full-table scans, and porting the product from Thrift to CQL. Challenges with large-scale global deployments lie with anti-entropy repair and size-tiered compaction.
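One common way to address consistency of denormalized data, shown here purely as an illustrative sketch (the voucher tables and columns are invented, not Ericsson's model), is to apply the duplicate writes in a logged batch so they either all succeed or are replayed from the batch log:

```python
# Logged-batch sketch for keeping two denormalized views of the same voucher in sync.
# Table and column names are invented for illustration.
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("billing")

by_serial = session.prepare(
    "INSERT INTO vouchers_by_serial (serial, batch_id, state) VALUES (?, ?, ?)"
)
by_batch = session.prepare(
    "INSERT INTO vouchers_by_batch (batch_id, serial, state) VALUES (?, ?, ?)"
)

batch = BatchStatement(batch_type=BatchType.LOGGED,
                       consistency_level=ConsistencyLevel.LOCAL_QUORUM)
batch.add(by_serial, ("SN-0001", "B-42", "ACTIVE"))
batch.add(by_batch, ("B-42", "SN-0001", "ACTIVE"))
session.execute(batch)   # both writes succeed, or the batch log replays them
```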
About the Speaker
Brij Bhushan Ravat Chief Architect, Ericsson
Brij is Chief Architect for the prepaid billing product at Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led the Centre of Excellence for Network Applications, which tracks emerging trends in application development in the telecom area, including telecom services and OSS, and leveraging big data technologies for innovative new-age solutions. His focus is on the application of big data in telecom, including analytics using Spark and NoSQL.
Migration Best Practices: From RDBMS to Cassandra without a Hitch (DataStax Academy)
Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat, and voicemail services regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contact records. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latency below 10 milliseconds.
Cassandra Summit 2014: Apache Cassandra Best Practices at eBay (DataStax Academy)
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis, and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments (DataStax Academy)
The SimianViz microservices simulator contains a model of Cassandra that allows large scale global deployments to be created and exercised by simulating failure modes and connecting the simulation to real monitoring tools to visualize the effects. The simulator is open source Go code at github.com/adrianco/spigo and is developing rapidly.
DataStax recently announced the general availability of DataStax Enterprise 4.7 (DSE 4.7), the leading database platform purpose-built for the performance and availability demands of web, mobile, and IoT applications. In this product launch webinar, Robin Schumacher, VP of Products, explores the wide range of enhancements in DSE 4.7, including enterprise-class search, analytics, and in-memory.
Webinar: How to Shrink Your Datacenter Footprint by 50% (ScyllaDB)
Eliran Sinvani presented on how to shrink a datacenter footprint by 50% using workload prioritization. He discussed how OLTP and OLAP workloads have different needs, and how existing solutions such as multi-datacenter deployments and time-division sharing waste resources. Workload prioritization uses CPU scheduling to divide resources dynamically based on workload priorities, allowing workloads to be combined without degrading performance or wasting hardware.
Webinar: Diagnosing Apache Cassandra Problems in Production (DataStax Academy)
This document provides guidance on diagnosing problems in Cassandra production systems. It recommends first using OpsCenter to identify issues, then monitoring servers, applications, and logs. Common problems discussed include incorrect timestamps, tombstones slowing queries, not using a snitch, version mismatches, and disk space not being reclaimed. Diagnostic tools like htop, iostat, and nodetool are presented. The document also covers JVM garbage collection profiling to identify issues like early object promotion and long minor GCs slowing the system.
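One lightweight way to surface the slow queries such a diagnosis targets, sketched here with an assumed sampling rate and threshold, is to enable probabilistic tracing (for example `nodetool settraceprobability 0.001`) and then pull the slowest sessions out of the system_traces keyspace:

```python
# Pull the slowest sampled queries from system_traces after enabling sampling,
# e.g. with `nodetool settraceprobability 0.001`. The threshold is an arbitrary example.
from cassandra.cluster import Cluster

SLOW_MICROS = 500_000   # 500 ms, an assumed threshold

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("system_traces")

rows = session.execute("SELECT session_id, request, coordinator, duration FROM sessions")
slow = [r for r in rows if r.duration and r.duration > SLOW_MICROS]
for r in sorted(slow, key=lambda r: r.duration, reverse=True)[:20]:
    # duration is recorded in microseconds by Cassandra's tracing
    print(f"{r.duration / 1000:.1f} ms  {r.coordinator}  {r.request}  ({r.session_id})")
```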
Netflix stores 98 percent of the data related to its streaming services: from bookmarks and viewing history to billing and payment information. These services and applications require a highly available and scalable persistence solution to keep running efficiently, in both normal operation and disaster scenarios. How does Netflix plan capacity for its new as well as existing services?
In this talk, Arun Agrawal, Senior Software Engineer, and Ajay Upadhyay, Cloud Data Architect at Netflix, will talk about capacity planning and capacity forecasting in the Cassandra world.
We will take you through the science behind forecasting short- and long-term usage and auto-scaling adequate capacity well before C* clusters reach their limit. This guarantees a highly scalable and available persistence solution that meets our SLAs at Netflix.
About the Speakers
Ajay Upadhyay Senior Database Engineer, Netflix
Responsible for the persistence layer at Netflix as part of the CDE (Cloud Database Engineering) team. Works with application teams, suggesting and guiding them on best practices for the various persistence layers provided by the CDE team.
Arun Agrawal Senior Software Engineer, Netflix
Arun Agrawal is part of Cloud Database Engineering, where they provide CaaS (Cassandra as a Service), ensuring smooth operation of the service and finding innovative ways to reduce the management overhead of running CaaS.
mParticle's Journey to Scylla from Cassandra (ScyllaDB)
mParticle processes 50 billion monthly messages and needed a data store that provides full availability and performance. They previously used Cassandra but faced issues with high latency, complicated tuning, and backlogs of up to 20 hours. They tested Scylla and found it provided significantly lower latency and compaction backlogs with minimal tuning needed. Scylla also offered knowledgeable support. mParticle migrated their data from Cassandra to Scylla, which immediately kept up with their data loads with little to no backlog.
Cassandra Cluster Management by Japan Cassandra Community (Hiromitsu Komatsu)
This document discusses best practices for managing Cassandra clusters based on Instaclustr's experience managing over 500 nodes and 3 million node-hours. It covers choosing the right Cassandra version, hardware configuration, cost estimation, load testing, data modeling practices, common issues like modeling errors and overload, and important monitoring techniques like logs, metrics, cfstats and histograms. Maintaining a well-designed cluster and proactively monitoring performance are keys to avoiding issues with Cassandra.
Making Every Drop Count: How i2O Addresses the Water Crisis with the IoT and ... (DataStax)
Depleting water supplies coupled with increasing global demand is an environmental challenge with lasting impact on societies across the world. Join this webinar to learn how i2O Water, a pioneer in smart water management technologies, is leading the charge against a global crisis with an Internet of Things (IoT) solution built on Apache Cassandra™.
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known (DataStax)
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure with Cassandra. This presentation includes the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
Most Cassandra usages take advantage of its exceptional performance and ability to handle massive data sets. At PagerDuty, we use Cassandra for entirely different reasons: to reliably manage mutable application states and to maintain durability requirements even in the face of full data center outages. We achieve this by deploying Cassandra clusters with hosts in multiple WAN-separated data centers, configured with per-data center replica placement requirements, and with significant application-level support to use Cassandra as a consistent datastore. Accumulating several years of experience with this approach, we've learned to accommodate the impact of WAN network latency on Cassandra queries, how to horizontally scale while maintaining our placement invariants, why asymmetric load is experienced by nodes in different data centers, and more. This talk will go over our workload and design goals, detail the resultant Cassandra system design, and explain a number of our unintuitive operational learnings about this novel Cassandra usage paradigm.
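A hedged sketch of the two building blocks this approach relies on (datacenter names, replication factors, and the keyspace are placeholders): per-datacenter replica placement via NetworkTopologyStrategy, and a driver profile that pins requests to the local datacenter at quorum consistency:

```python
# Per-DC replica placement plus a DC-pinned, quorum-consistency driver profile.
# DC names, replication factors, and the keyspace are placeholders.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra import ConsistencyLevel

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="us-east")),
    # LOCAL_QUORUM keeps latency local; EACH_QUORUM is available for writes that
    # must be acknowledged by a quorum in every datacenter.
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Three replicas in each WAN-separated datacenter.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS incidents
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'us-east': 3, 'us-west': 3, 'eu-central': 3}
""")
```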
DataStax C*ollege Credit: What and Why NoSQL? (DataStax)
In the first of our bi-weekly C*ollege Credit series, Aaron Morton, DataStax MVP for Apache Cassandra and Apache Cassandra committer, and Robin Schumacher, VP of product management at DataStax, will take a look back at the history of NoSQL databases and provide a foundation of knowledge for people looking to get started with NoSQL, or just wanting to learn more about this growing trend. You will learn how to know whether NoSQL is right for your application, and how to pick a NoSQL database. This webinar is C* 101 level.
Muvr is a real-time personal trainer system. It must be highly available, resilient, and responsive, and so it relies heavily on Spark, Mesos, Akka, Cassandra, and Kafka—the quintet also known as the SMACK stack. In this talk, we are going to explore the architecture of the entire muvr system, examining in particular the challenges of ingesting very large volumes of data, applying trained models to the data to provide real-time advice to our users, and training and evaluating new models using the collected data. We will specifically emphasize how we have used Cassandra to consume large volumes of fast incoming biometric data from devices and sensors, and how to securely access these big data sets from Cassandra in Spark to compute the models.
We will finish by showing the mechanics of deploying such a distributed application. You will get a clear understanding of how Mesos and Marathon, in conjunction with Docker, are used to build an immutable infrastructure that allows us to provide reliable service to our users and a great environment for our engineers.
Cisco: Cassandra adoption on Cisco UCS & OpenStack (DataStax Academy)
In this talk we will address how we developed our Cassandra environments utilizing the Cisco UCS OpenStack Platform with DataStax Enterprise software. In addition, we are utilizing open source Ceph storage in our infrastructure to optimize performance and reduce costs.
Dyn delivers exceptional Internet performance. Enabling high-quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters, enabling sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to big data solution, the path that led to Spark, and the lessons they've learned in the process.
Building Event Streaming Architectures on Scylla and Kafka (ScyllaDB)
This document discusses building event streaming architectures using Scylla and Confluent Kafka. It provides an overview of Scylla and how it can be used with Kafka at Numberly. It then discusses change data capture (CDC) in Scylla and how to stream data from Scylla to Kafka using Kafka Connect and the Scylla source connector. The Kafka Connect framework and connectors allow capturing changes from Scylla tables in Kafka topics to power downstream applications and tasks.
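Downstream of the connector, the captured changes are ordinary Kafka records. As a small sketch (the broker address is a placeholder and the topic name merely assumes a typical <prefix>.<keyspace>.<table> layout), a consumer can pick them up and hand them to whatever task needs them:

```python
# Minimal consumer for a CDC topic produced by the Scylla source connector.
# Broker address and topic name are assumptions for illustration.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "cdc-readers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["scylla.app.user_events"])   # assumed <prefix>.<keyspace>.<table>

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        change = json.loads(msg.value())          # one row-level change event
        print(change)                             # hand off to the downstream task here
finally:
    consumer.close()
```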
Many NoSQL DBaaS vendors limit which cloud platform you can run on and the size of the data you can manage, and require you to over-provision cloud infrastructure resources while failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
We run multiple DataStax Enterprise clusters in Azure, each holding 300 TB+ of data, to deeply understand Office 365 users. In this talk, we will dive deep into some of the key challenges we faced, and the takeaways from running these clusters reliably for over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down, and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network-attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark, and Azure to generate per-user insights for Office 365 users.
The document discusses building fault tolerant Java applications using Apache Cassandra. It provides an overview of fault tolerance, Cassandra's architecture, failure scenarios, and using the Cassandra Java driver. Examples are given of modeling data in Cassandra and performing common operations like getting all events for a customer or events within a time slice.
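The document centers on the Java driver; purely as a sketch of the same ideas with the Python driver (the policy choices, timeouts, and table below are illustrative, not the document's recommendations), much of the client-side fault tolerance lives in the execution profile the application runs with:

```python
# Client-side fault-tolerance knobs: local-DC routing, quorum consistency, and
# speculative execution for idempotent reads. Values are illustrative only.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import (DCAwareRoundRobinPolicy, TokenAwarePolicy,
                                ConstantSpeculativeExecutionPolicy)
from cassandra import ConsistencyLevel

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,   # tolerates one replica down at RF=3
    request_timeout=2.0,                               # seconds
    speculative_execution_policy=ConstantSpeculativeExecutionPolicy(
        delay=0.2, max_attempts=2),                    # hedge against slow replicas
)
cluster = Cluster(["127.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("store")                     # keyspace is a placeholder

# Speculative execution only fires for statements marked idempotent, such as
# the "events within a time slice" read mentioned above.
events_in_slice = session.prepare(
    "SELECT * FROM events WHERE customer_id = ? AND ts >= ? AND ts < ?"
)
events_in_slice.is_idempotent = True
```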
Cassandra - A Decentralized Structured Storage System (Varad Meru)
Slides created as a part of CS 295's week 4 on NoSQL Basics.
CS 295 (Cloud Computing and BigData) at UCI - https://sites.google.com/site/cs295cloudcomputing/
DataStax: How to Roll Cassandra into Production Without Losing your Health, M... (DataStax Academy)
This document provides guidance on how to successfully implement Apache Cassandra in a production environment without issues. It recommends starting with a small, well-defined project like monitoring website events or users, rather than trying to build a large, multi-year platform. The document outlines choosing a specific pain point to address, implementing a simple proof of concept using Cassandra for tasks like event tracking, and iterating from there. It cautions against copying relational data models into Cassandra and emphasizes understanding how Cassandra works differently from SQL databases. The goal is to start small and grow capability over time rather than taking on too much at once.
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch (DataStax Academy)
Do you love some Cassandra, but that relational brain is still on? You aren't alone. Let's take that OLAP data model and get it OLTP. This will be an updated talk with some of the new features brought to you by Cassandra 3.0. Real techniques to translate application patterns into effective models. Common pitfalls that can slow you down and send you running back to RDBMS land. Don't do it! Finally, if you didn't get it right the first time, I'll show you how to fix that data model without any downtime. Turn a hot cup of fail into a tall glass of awesome!
Cassandra is a distributed database that provides high availability and scalability. It uses a ring topology to replicate and distribute data across multiple nodes. Cassandra sacrifices consistency in favor of availability and partition tolerance. Data is modeled using tables containing partitions and clustered rows accessed by partition and clustering keys. Writes are replicated across the ring and stored in memory and on disk for fault tolerance.
This document summarizes new features in Cassandra 3.0, including user defined functions, improved garbage collection, hints management, materialized views, and a new storage engine. User defined functions allow running custom Java or JavaScript functions on Cassandra data. The G1 garbage collector replaces older collectors for better performance and predictability. Hints are now written to files instead of using Cassandra as a queue. Materialized views automatically create and maintain secondary indexes. The new storage engine reduces data duplication and wasted space.
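As a small illustration of the materialized-view feature mentioned here (table and view names are invented), the view below has the server maintain a by-email lookup that would otherwise be a hand-managed duplicate table:

```python
# Materialized view sketch (Cassandra 3.0+). Names are invented for illustration.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("app")

session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id uuid PRIMARY KEY,
        email   text,
        name    text
    )
""")

# Cassandra keeps this view in sync with the base table on every write.
session.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email AS
        SELECT email, user_id, name FROM users
        WHERE email IS NOT NULL AND user_id IS NOT NULL
        PRIMARY KEY (email, user_id)
""")

row = session.execute("SELECT user_id, name FROM users_by_email WHERE email = %s",
                      ("ada@example.com",)).one()
```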
An introduction to core concepts in Apache Cassandra. We cover the evolution of database architecture as you try to scale a relational database to solve big data problems, and explain how Cassandra handles these problems efficiently.
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr... (DataStax Academy)
In this in-depth workshop you will gain hands-on experience using Spark and Cassandra inside the DataStax Enterprise platform. The focus of the workshop will be working through data analytics exercises to understand the major developer considerations. You will also gain an understanding of the internals behind the integration that allow for large-scale data loading and analysis. It will also review some of the major machine learning libraries in Spark as an example of data analysis.
The workshop will start with a review of the basics of how Spark and Cassandra are integrated. Then we will work through a series of exercises that show how to perform large-scale data analytics with Spark and Cassandra. A major part of the workshop will be understanding effective data modeling techniques in Cassandra that allow for fast parallel loading of the data into Spark to perform large-scale analytics on that data. The exercises will also look at how to use the open source Spark Notebook to run interactive data analytics with the DataStax Enterprise platform.
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? Wouldn't it be great to know what NOT to do? Come to this talk to hear about some of the common Ops mistakes that new users make and what the better decision will be.
DataStax: Making Cassandra Fail (for effective testing) (DataStax Academy)
This document discusses testing Cassandra applications by making Cassandra fail deterministically. It introduces Stubbed Cassandra, a tool that allows priming Cassandra to respond to queries and prepared statements with different failures like timeouts, unavailability, and coordinator issues. Tests can verify application behavior and retries under failures. Stubbed Cassandra runs as a separate process and exposes REST APIs to prime failures and verify query activity, allowing integration into testing frameworks. It aims to help test edge cases and fault tolerance more effectively than existing Cassandra testing tools.
Security is often an afterthought, configured and applied at the last minute before rolling out a new system. Instaclustr has deployed Cassandra for customers with many different requirements.
From deployments in Heroku requiring total public access through to private data centres, we will walk you through securing Cassandra the right way.
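As a client-side sketch of the basics (credentials, certificate path, and contact points are placeholders, and a Python driver recent enough to accept ssl_context is assumed), once authentication and client-to-node encryption are enabled on the cluster, applications connect with an auth provider and a TLS context rather than an open socket:

```python
# Connecting to a cluster with password authentication and client-to-node TLS.
# Credentials, CA path, and contact points are placeholders.
import ssl
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_context = ssl.create_default_context(cafile="/etc/cassandra/certs/ca.pem")
ssl_context.check_hostname = False         # depends on how node certificates were issued

cluster = Cluster(
    ["10.0.1.10", "10.0.1.11"],
    auth_provider=PlainTextAuthProvider(username="app_user", password="app_secret"),
    ssl_context=ssl_context,
)
session = cluster.connect()
```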
Diagnosing Problems in Production (Nov 2015) (Jon Haddad)
Diagnosing Problems in Production involves first preparing monitoring tools like OpsCenter, server monitoring, application metrics, and log aggregation. Common issues include incorrect server times causing data inconsistencies, tombstone overhead slowing queries, not using the proper snitch, and version mismatches breaking functionality. Diagnostic tools like htop, iostat, vmstat, dstat, strace, jstack, nodetool, histograms, and query tracing help narrow down performance problems which could be due to compaction, garbage collection, or other bottlenecks.
Diagnosing Problems in Production - Cassandra (Jon Haddad)
1) The document discusses various tools for diagnosing problems in Cassandra production environments, including OpsCenter for monitoring, application metrics collection with Statsd/Graphite, and log aggregation with Splunk or Logstash.
2) Some common issues covered are incorrect server times causing data inconsistencies, tombstone overhead slowing queries, not using the proper snitch, and disk space not being reclaimed on new nodes.
3) Diagnostic tools described are htop, iostat, vmstat, dstat, strace, tcpdump, and nodetool for investigating process activity, disk usage, memory, networking, and Cassandra-specific statistics. GC profiling and query tracing are also recommended.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression and recommendations systems (collaborative filtering).
The talk was recorded and is available on YouTube: https://www.youtube.com/watch?v=_gFgU3phogQ
DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise (DataStax Academy)
This document provides an overview of how to enable search capabilities in Cassandra applications using Datastax Enterprise (DSE). It discusses how DSE allows indexing and searching of Cassandra data by integrating the Solr/Lucene search engine. Specifically, it explains that with DSE, data remains stored in Cassandra while indexes are maintained in Solr/Lucene. This provides search capabilities without requiring ETL processes to migrate data out of Cassandra. The document includes code examples of how to define a table and secondary index in Cassandra to support full-text search on tags columns using DSE.
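The document's own examples are not reproduced here; as a rough sketch of the idea using DSE 5.1+ syntax (keyspace, table, and the query string are assumptions), a search index over a tags column is created once and then queried through the solr_query pseudo-column:

```python
# DSE Search sketch: index a tags column and run a full-text query through CQL.
# Assumes DSE 5.1+ `CREATE SEARCH INDEX` syntax; names are illustrative.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("catalog")

session.execute("""
    CREATE TABLE IF NOT EXISTS articles (
        id    uuid PRIMARY KEY,
        title text,
        tags  set<text>
    )
""")
session.execute("CREATE SEARCH INDEX IF NOT EXISTS ON catalog.articles")

# Lucene-style query against the indexed tags, served by the co-located search core.
rows = session.execute(
    "SELECT id, title FROM catalog.articles WHERE solr_query = 'tags:cassandra'"
)
for row in rows:
    print(row.id, row.title)
```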
Cake Solutions: Cassandra as event sourced journal for big data analytics (DataStax Academy)
The document discusses using event sourcing, CQRS, and related technologies like Spark, Mesos, Akka, Cassandra, and Kafka to handle large amounts of data and enable analytics. It provides an overview of these techniques and technologies, and uses an exercise domain as an example to discuss preprocessing data, extracting features, training and testing models, and performing both batch and streaming analytics. The goal is to enable insights from data and create value.
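As a hedged sketch of what "Cassandra as the event-sourced journal" typically looks like (the table below is a generic append-only journal, not the schema from the talk), events are appended under an entity's id and replayed in sequence order to rebuild state or feed analytics:

```python
# Generic append-only event journal sketch; not the schema from the talk.
import json
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("journal")

session.execute("""
    CREATE TABLE IF NOT EXISTS events (
        persistence_id text,        -- the entity/aggregate the event belongs to
        sequence_nr    bigint,      -- strictly increasing per entity
        event_type     text,
        payload        text,        -- serialized event (JSON here for simplicity)
        PRIMARY KEY (persistence_id, sequence_nr)
    )
""")

append = session.prepare(
    "INSERT INTO events (persistence_id, sequence_nr, event_type, payload) "
    "VALUES (?, ?, ?, ?) IF NOT EXISTS"          # reject duplicate sequence numbers
)
session.execute(append, ("user-42", 1, "ExerciseStarted", json.dumps({"set": "squats"})))

# Replay: read the entity's events in order and fold them into current state.
for row in session.execute(
        "SELECT sequence_nr, event_type, payload FROM events "
        "WHERE persistence_id = %s", ("user-42",)):
    print(row.sequence_nr, row.event_type, json.loads(row.payload))
```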
Cassandra meetup slides - Oct 15 Santa Monica Coloft (Jon Haddad)
This document summarizes Shift.com's migration from MongoDB to Cassandra. Shift is a platform that enables marketers to communicate across organizations. The initial database stack included MongoDB, but it was replaced with Cassandra for better operational benefits like easier node management, better control of data storage, and improved long-term scalability. The migration goals were zero downtime and no loss of performance. The strategy involved carefully structuring the Cassandra data model and schema to match MongoDB's performance. Benefits of Cassandra included its familiar CQL query language and improved support for features like time series data storage.
Cassandra Core Concepts - Cassandra Day Toronto (Jon Haddad)
- Traditional relational databases do not scale well for large datasets due to limitations in replication, sharding, and consistency.
- Lessons from using relational databases for big data problems include that consistency is impractical, manual sharding is difficult, and additional components increase complexity.
- Apache Cassandra addresses these issues with a distributed architecture that sacrifices consistency for availability and scalability, automates replication and sharding, and uses a simplified design.
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store (DataStax Academy)
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier (DataStax)
Want help building applications with real-time value at epic scale? How about solving your database performance and availability issues? Then, you want to hear more about DataStax Enterprise 5.0. Join this webinar to learn what’s new in DSE 5.0 ‒ the largest software release to date at DataStax. DSE 5.0 introduces multi-model support including Graph and JSON data models along with a ton of new and enhanced enterprise database capabilities.
View webinar recording here: https://youtu.be/3pfm4ntASJ0
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks (Databricks)
The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software.
Navigating the turbulence on take-off: Setting up SharePoint on Azure IaaS th... (Jason Himmelstein)
The document discusses setting up SharePoint on Azure IaaS. It begins with an introduction of the speaker and their background. It then provides an overview of key Azure IaaS concepts like virtual machines, disks, availability sets, and virtual networks. The document discusses why SharePoint may be deployed on IaaS and provides examples use cases like development/testing environments and disaster recovery. It then outlines the "Jumpstart Method" for automating SharePoint deployments on Azure and provides recommendations for SharePoint, SQL Server, storage, and Active Directory configurations.
The document summarizes VisiQuate's journey migrating a client's data architecture to Azure. It describes initial architectures using Azure services like SQL Database and HDInsight that required improvements. The architecture evolved through versions 2 and 3 using Spark and Hive on HDInsight and Azure Synapse for analytics. Key lessons included performance issues, undocumented features, and differences between Spark and Hive metadata. The summary recommends considering multiple migration options and being prepared to iterate on rebuilding architectures in the cloud.
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users? (TechWell)
When you’re building the next killer mobile app, how can you ensure that your app is both stable and capable of near-instant data updates? The answer: Build a backend! Siva Katir says that there’s much more to building a backend than standing up a SQL server in your datacenter and calling it a day. Since different types of apps demand different backend services, how do you know what sort of backend you need? And, more importantly, how can you ensure that your backend scales so you can survive an explosion of users when you are featured in the app store? Siva discusses the common scenarios facing mobile app developers looking to expand beyond just the device. He’ll share best practices learned while building the PlayFab and other companies’ backends. Join Siva to learn how you can ensure that your app can scale safely and affordably into the millions of concurrent users and across multiple platforms.
Big data journey to the cloud - 5.30.18 - Asher Bartch (Cloudera, Inc.)
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Big Data Advanced Analytics on Microsoft Azure (Mark Tabladillo)
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
Horses for Courses: Database Roundtable (Eric Kavanagh)
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to [email protected], or tweet with #DBSurvival.
SQL Server 2016 introduces several new features for In-Memory OLTP including support for up to 2 TB of user data in memory, system-versioned tables, row-level security, and Transparent Data Encryption. The in-memory processing has also been updated to support more T-SQL functionality such as foreign keys, LOB data types, outer joins, and subqueries. The garbage collection process for removing unused memory has also been improved.
How to grow to a modern workplace in 16 steps with Microsoft 365 (Tim Hermie ☁️)
In this session we will give actual insights on how we move customers to Microsoft 365 in a 15+ step approach. From identity to Endpoint Manager, security mechanisms, and migration of data, we'll cover the whole stack.
Sandeep Grandhi has over 6 years of experience in data warehousing and ETL development. He currently works as a Technology Analyst at Infosys where he has led several projects involving extracting data from various sources such as Salesforce, Oracle, and flat files and loading it into data warehouses. Some of the key projects he has worked on include migrating a CRM platform from STARS to Salesforce and building a compliance data repository for a bank.
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...) (DataStax)
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both advanced analytics and traditional business intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we've built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT data (vehicle telematics) in real time and batch, combine data from multiple data sources into a single data model, and support standardized and ad-hoc reporting requirements.
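As a sketch of the streaming leg of such a stack (topic, schema, keyspace, and table names are placeholders, and both the Kafka and Cassandra Spark connectors are assumed to be available), Structured Streaming can parse telematics events off Kafka and write each micro-batch into the Cassandra serving table:

```python
# Streaming sketch: Kafka telematics events -> Spark Structured Streaming -> Cassandra.
# Topic, schema, and table names are placeholders; connector packages assumed present.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (SparkSession.builder.appName("telematics-stream")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

schema = StructType([
    StructField("vehicle_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("speed_kph", DoubleType()),
    StructField("fuel_pct", DoubleType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "vehicle-telematics")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    # Upsert each micro-batch into the serving table read by reports and dashboards.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="fleet", table="vehicle_events")
     .mode("append").save())

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```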
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
Building a Turbo-fast Data Warehousing Platform with Databricks (Databricks)
Traditionally, data warehouse platforms have been perceived as cost prohibitive, challenging to maintain and complex to scale. The combination of Apache Spark and Spark SQL – running on AWS – provides a fast, simple, and scalable way to build a new generation of data warehouses that revolutionizes how data scientists and engineers analyze their data sets.
In this webinar you will learn how Databricks - a fully managed Spark platform hosted on AWS - integrates with a variety of AWS services such as Amazon S3, Kinesis, and VPC. We’ll also show you how to build your own data warehousing platform in a very short amount of time and how to integrate it with other tools such as Spark’s machine learning library and Spark Streaming for real-time processing of your data.
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...IDERA Software
You need to start moving some on-premises databases to the cloud.
- Where do you begin?
- What are your options?
- What will your job look like afterward?
- What tools can you use to manage databases in the cloud?
- How does troubleshooting database performance problems in the cloud differ from on-premises?
- How can you help manage monthly cloud costs so the effort actually is cost effective?
Moving to the cloud is not as easy as one might think, so knowing the answers to these kinds of questions will put you on the path to success. See how DB PowerStudio can readily assist with these efforts and questions.
The presenter, Bert Scalzo, is an Oracle ACE, blogger, author, speaker and database technology consultant. He has worked with all major relational databases, including Oracle, SQL Server, Db2, Sybase, MySQL, and PostgreSQL. Bert’s work experience includes stints as product manager for multiple-database tools, such as DBArtisan and Aqua Data Studio at IDERA. He has three decades of Oracle database experience and previously worked for both Oracle Education and Oracle Consulting. Bert holds several Oracle Masters certifications and his academic credentials include a BS, MS, and PhD in computer science, as well as an MBA.
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
Speaker: Mohammed Guller, Application Architect & Lead Developer at Glassbeam.
Learn how Cassandra can be used to build a multi-tenant solution for analyzing operational data from Internet of Complex Things (IoCT). IoCT includes complex systems such as computing, storage, networking and medical devices. In this session, we will discuss why Glassbeam migrated from a traditional RDBMS-based architecture to a Cassandra-based architecture. We will discuss the challenges with our first-generation architecture and how Cassandra helped us overcome those challenges. In addition, we will share our next-gen architecture and lessons learned.
HarishKumar Chennupati provides a curriculum vitae summarizing his professional experience and technical skills. He has over 8 years of experience in information technology as a team lead, scrum master, and senior developer working with technologies like .NET, SQL Server, SSIS, SSRS, Informatica, and QlikView. Some of his projects include applications for HP, Western Digital, and American International Assurance involving development, testing, reporting, ETL processes, and maintenance support. He is proficient in languages like C#, VB.NET, and databases like SQL Server, Oracle, and Vertica.
Cisco has a large global IT infrastructure supporting many applications, databases, and employees. The document discusses Cisco's existing customer service and commerce systems (CSCC/SMS3) and some of the performance, scalability, and user experience issues. It then presents a proposed new architecture using modern technologies like Elasticsearch, Cassandra, and microservices to address these issues and improve agility, performance, scalability, uptime, and the user interface.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of the competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
DataStax Enterprise (DSE) Graph is a graph database built to manage, analyze, and search highly connected data. DSE Graph, built on Apache Cassandra, delivers continuous uptime along with predictable performance and scale for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
This document discusses using Docker containers to run Cassandra clusters at Walmart. It proposes transforming existing Cassandra hardware into containers to better utilize unused compute. It also suggests building new Cassandra clusters in containers and migrating old clusters to double capacity on existing hardware and save costs. Benchmark results show Docker containers outperforming virtual machines on OpenStack and Azure in terms of reads, writes, throughput and latency for an in-house application.
The document discusses the evolution of Cassandra's data modeling capabilities over different versions of CQL. It covers features introduced in each version such as user defined types, functions, aggregates, materialized views, and storage attached secondary indexes (SASI). It provides examples of how to create user defined types, functions, materialized views, and SASI indexes in CQL. It also discusses when each feature should and should not be used.
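To make those features concrete, here is a minimal sketch that creates a user defined type and a SASI index through the DataStax Python driver; the contact point, keyspace, table, and column names are hypothetical, and the target table is assumed to already exist.

```python
from cassandra.cluster import Cluster

# Contact point and keyspace are illustrative.
session = Cluster(["127.0.0.1"]).connect("demo")

# User defined type (CQL 3.1+): group related fields into a single column value.
session.execute("""
    CREATE TYPE IF NOT EXISTS address (street text, city text, zip text)
""")

# SASI secondary index (Cassandra 3.4+): enables prefix/LIKE-style queries,
# with the usual caveats about when secondary indexes are appropriate.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS users_name_idx ON users (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
""")
```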
Data modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to use the DataStax Java drivers effectively. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
This document promotes Datastax Academy and Certification resources for learning Cassandra including a three step process of learning Cassandra, getting certified, and profiting. It lists community evangelists like Luke Tillman, Patrick McFadin, Jon Haddad, and Duy Hai Doan who can provide help and resources.
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
This document summarizes three presentations from a Cassandra Meetup:
1. Jason Cacciatore discussed monitoring Cassandra health at scale across hundreds of clusters and thousands of nodes using the reactive stream processing system Mantis.
2. Minh Do explained how Cassandra uses the gossip protocol for tasks like discovering cluster topology and sharing load information. Gossip also has limitations and race conditions that can cause problems.
3. Chris Kalantzis presented Cassandra Tickler, an open source tool he created to help repair operations that get stuck by running lightweight consistency checks on an old Cassandra version or a node with space issues.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra for a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
The document discusses Cassandra's use by Sony Network Entertainment to handle the large amount of user and transaction data from the growing PlayStation Network. It describes how the relational database they previously used did not scale sufficiently, so they transitioned to using Cassandra in a denormalized and customized way. Some of the techniques discussed include caching user data locally on application servers, secondary indexing, and using a real-time indexer to enable personalized search by friends.
This document provides guidance on setting up server monitoring, application metrics, log aggregation, time synchronization, replication strategies, and garbage collection for a Cassandra cluster. Key recommendations include:
1. Use monitoring tools like Monit, Munin, Nagios, or OpsCenter to monitor processes, disk usage, and system performance. Aggregate all logs centrally with tools like Splunk, Logstash, or Graylog.
2. Install NTP to synchronize server times which are critical for consistency.
3. Use the NetworkTopologyStrategy replication strategy and avoid SimpleStrategy for production.
4. Avoid shared storage and focus on low latency and high throughput using multiple local disks.
5. Understand
This document discusses real time analytics using Spark and Spark Streaming. It provides an introduction to Spark and highlights limitations of Hadoop for real-time analytics. It then describes Spark's advantages like in-memory processing and rich APIs. The document discusses Spark Streaming and the Spark Cassandra Connector. It also introduces DataStax Enterprise which integrates Spark, Cassandra and Solr to allow real-time analytics without separate clusters. Examples of streaming use cases and demos are provided.
Introduction to Data Modeling with Apache CassandraDataStax Academy
This document provides an introduction to data modeling with Apache Cassandra. It discusses how Cassandra data models are designed based on the queries an application will perform, unlike relational databases which are designed based on normalization rules. Key aspects covered include avoiding joins by denormalizing data, using a partition key to group related data on nodes, and controlling the clustering order of columns. The document provides examples of modeling time series and tag data in Cassandra.
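To make the query-first, partition-key, and clustering-order ideas above concrete, here is a minimal time-series sketch using the DataStax Python driver; the table, the per-day bucketing scheme, and the contact point are assumptions for illustration, not taken from the talk.

```python
import datetime
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # illustrative contact point/keyspace

# One partition per sensor per day keeps partitions bounded; newest readings first.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        sensor_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# The query the model was designed around: recent readings for one sensor on one day.
rows = session.execute(
    "SELECT ts, value FROM readings WHERE sensor_id = %s AND day = %s LIMIT 10",
    ("pump-42", datetime.date(2016, 6, 1)),
)
for row in rows:
    print(row.ts, row.value)
```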
The document discusses different data storage options for small, medium, and large datasets. It argues that relational databases do not scale well for large datasets due to limitations with replication, normalization, sharding, and high availability. The document then introduces Apache Cassandra as a fast, distributed, highly available, and linearly scalable database that addresses these limitations through its use of a hash ring architecture and tunable consistency levels. It describes Cassandra's key features including replication, compaction, and multi-datacenter support.
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
This document provides an overview of using Datastax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.
The document discusses common bad habits that can occur when working with Apache Cassandra and provides recommendations to avoid them. Specifically, it addresses issues like sliding back into a relational mindset when the data model is different, improperly benchmarking Cassandra systems, having slow client performance, and neglecting important operations tasks. The presentation provides guidance on how to approach data modeling, querying, benchmarking, driver usage, and operations management in a Cassandra-oriented way.
This document provides an overview and examples of modeling data in Apache Cassandra. It begins with an introduction to thinking about data models and queries before modeling, and emphasizes that Cassandra requires modeling around queries due to its limitations on joins and indexes. The document then provides examples of modeling user, video, and other entity data for a video sharing application to support common queries. It also discusses techniques for handling queries that could become hotspots, such as bucketing or adding random values. The examples illustrate best practices for data duplication, materialized views, and time series data storage in Cassandra.
The document discusses best practices for using Apache Cassandra, including:
- Topology considerations like replication strategies and snitches
- Booting new datacenters and replacing nodes
- Security techniques like authentication, authorization, and SSL encryption
- Using prepared statements for efficiency
- Asynchronous execution for request pipelining (see the sketch after this list)
- Batch statements and their appropriate uses
- Improving performance through techniques like the new row cache
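As a minimal illustration of the prepared-statement and asynchronous-execution items above, a sketch with the DataStax Python driver; the events table, its columns, and the contact point are hypothetical.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # illustrative contact point/keyspace

# Prepare once: the server parses and plans the statement a single time,
# so each later execution only ships bind values.
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

# Pipeline requests instead of blocking on each response.
futures = [session.execute_async(insert, (i, "event-%d" % i)) for i in range(100)]
for f in futures:
    f.result()  # surfaces any write error; we only block here, at the end
```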
This is a two-part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability, as well as how DataStax Drivers take full advantage of it to provide developers with nicely designed and speedy clients that are extendable to the core.
21. Resource Group
• container for multiple resources
• resources exist in one* resource group
• resource groups can span regions
• resource groups can span services
Deployment
• tracks template execution
• created within a resource group
• allows nested deployments (see the deployment sketch below)
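A minimal sketch of the resource group / deployment relationship using the Azure SDK for Python (this assumes the azure-identity and azure-mgmt-resource v15+ packages; the subscription id, group name, location, and template file are placeholders, not values from the deck):

```python
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The resource group is the container; deployments are created and tracked inside it.
client.resource_groups.create_or_update("dse-demo-rg", {"location": "westus"})

with open("azuredeploy.json") as f:   # placeholder ARM template on local disk
    template = json.load(f)

poller = client.deployments.begin_create_or_update(
    "dse-demo-rg",
    "dse-demo-deployment",
    {"properties": {"mode": "Incremental", "template": template, "parameters": {}}},
)
poller.wait()  # the deployment record stays in the resource group for later inspection
```

Nested deployments follow the same pattern: a Microsoft.Resources/deployments resource inside the template starts a child deployment that is tracked alongside the parent.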
22. Inside the Box vs. Outside the Box
• Template describes the topology (outside the box)
• Template extensions can initiate state configuration (inside the box); see the extension sketch after this list
• Multiple extensions available for Windows and Linux VMs
– DSC
– Chef
– Puppet
– Custom Scripts
– AppService + WebDeploy
– SQLDB + BACPAC
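As one concrete "inside the box" example, this is roughly the shape of a Linux Custom Script extension resource as it would appear in a template's resources array, written here as a Python dict; the VM name, script URL, and API version are placeholders, while the publisher/type values are the standard ones for the Linux Custom Script extension.

```python
# Sketch of a Custom Script extension resource for a Linux VM (placeholder values).
custom_script_extension = {
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": "dse-node-0/installdse",              # "<vmName>/<extensionName>"
    "apiVersion": "2019-07-01",                   # placeholder API version
    "location": "[resourceGroup().location]",
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines', 'dse-node-0')]"
    ],
    "properties": {
        "publisher": "Microsoft.Azure.Extensions",  # Linux Custom Script extension
        "type": "CustomScript",
        "typeHandlerVersion": "2.1",
        "autoUpgradeMinorVersion": True,
        "settings": {
            "fileUris": ["https://example.com/install_dse.sh"],  # placeholder script
            "commandToExecute": "bash install_dse.sh",
        },
    },
}
```

The DSC, Chef, and Puppet options listed above are likewise VM extensions, differing mainly in their publisher, type, and settings values.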
23. Common Use Cases for ARM Templates
• Enterprises and System Integrators
– Delivering a capability or cloud capacity (building block templates, e.g. DSE)
– Delivering an end-to-end application (solution templates)
• Cloud Service Vendors (CSVs)
– Support different multi-tenancy approaches
• Distinct deployments per customer
– Within the CSV’s subscription
– “Bring Your Own Subscription” model that uses customer subscriptions
• Scale units within a central multi-tenant system
• Marketplace integration
• All deploy known configurations/SKUs/t-shirt sizes
– Lots of variables make free-form deployments less desirable
– T-shirt sizes / SKUs are the common approach (see the sketch below)
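A sketch of the t-shirt-size pattern, written as Python dicts mirroring an ARM template's parameters and variables sections; the size names and the VM size / node count mapping are invented for illustration.

```python
# Expose a small, fixed menu of cluster sizes instead of free-form knobs.
parameters = {
    "clusterSize": {
        "type": "string",
        "defaultValue": "Small",
        "allowedValues": ["Small", "Medium", "Large"],
    }
}

# Each t-shirt size maps to a known, tested configuration (illustrative values).
variables = {
    "sizeMap": {
        "Small":  {"vmSize": "Standard_DS13", "nodeCount": 4},
        "Medium": {"vmSize": "Standard_DS14", "nodeCount": 8},
        "Large":  {"vmSize": "Standard_GS5",  "nodeCount": 16},
    }
}

# In a template the lookup is an expression along the lines of
#   "[variables('sizeMap')[parameters('clusterSize')].vmSize]"
selected = variables["sizeMap"][parameters["clusterSize"]["defaultValue"]]
print(selected)  # {'vmSize': 'Standard_DS13', 'nodeCount': 4}
```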
24. Design and deploy a building block template
Go to http://github.com/azure/azure-quickstart-templates to find hundreds of quick-start deployment templates for finished solutions.
DataStax is evolving the ARM deployment templates in this GitHub repo to include DSE-specific capabilities (e.g. multi-region topology) for those who want to manage their own deployment; see the sketch after this slide.
Deploying DataStax with the Azure CLI
Deploying DataStax with Azure Marketplace
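The deck demos the Azure CLI and Marketplace paths; as a rough Python companion, this sketch pulls one of the quick-start templates straight from the repo so its parameters and resources can be inspected before deploying it with deployments.begin_create_or_update() as in the earlier resource-group sketch. The template path is a placeholder; browse the repo for the current DataStax/DSE template location.

```python
import json
import urllib.request

# Placeholder path: substitute the actual DataStax/DSE folder from the repo.
RAW = ("https://raw.githubusercontent.com/Azure/azure-quickstart-templates/"
       "master/<path-to-datastax-template>/azuredeploy.json")

with urllib.request.urlopen(RAW) as resp:
    template = json.load(resp)

# See what the building block expects before wiring it into a solution template
# or passing it to the deployment call shown earlier.
print(sorted(template.get("parameters", {}).keys()))
print([r["type"] for r in template.get("resources", [])])
```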
25. Compute and storage options for nodes in the cluster
• Compute families for production clusters
– D-Series, G-Series (Xeon® E5 v3)
• Local SSD disks
– DS-Series, GS-Series
• Premium Storage optimized, host caching for reads
• Storage options for nodes
– Maintain data and logs on local ephemeral SSD disks
• ~100k IOPS and 1.5 GB/sec on G5
– Leverage Premium Storage disks for persistent data and logs
• P10, P20, P30 (128 GB to 1 TB, up to 5000 IOPS and 200 MB/sec)
• Striped volumes to balance storage size, throughput and costs (see the sizing sketch below)
• Max 64 TB, 80000 IOPS and 1 GB/sec per node
– Use Standard Storage for backup snapshots
• Low cost, geo-replicated
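As a worked example of sizing a striped Premium Storage volume against those limits, a small helper; the P30 figures match the slide, the P10/P20 per-disk figures are the published Azure numbers for that disk generation (double-check current documentation), and the targets at the bottom are made-up inputs.

```python
import math

# Per-disk limits: (size GB, IOPS, MB/sec). P30 matches the slide; P10/P20 are
# the published Azure figures for that disk generation.
PREMIUM_DISKS = {
    "P10": (128, 500, 100),
    "P20": (512, 2300, 150),
    "P30": (1024, 5000, 200),
}

# Per-node caps from the slide: 64 TB, 80,000 IOPS, 1 GB/sec.
NODE_CAPS = {"size_gb": 64 * 1024, "iops": 80_000, "mbps": 1024}

def disks_needed(disk, size_gb, iops, mbps):
    """How many disks of one type must be striped to meet the targets."""
    d_size, d_iops, d_mbps = PREMIUM_DISKS[disk]
    n = max(math.ceil(size_gb / d_size),
            math.ceil(iops / d_iops),
            math.ceil(mbps / d_mbps))
    # Whatever the stripe adds up to, the node-level limits still apply.
    achieved = (min(n * d_size, NODE_CAPS["size_gb"]),
                min(n * d_iops, NODE_CAPS["iops"]),
                min(n * d_mbps, NODE_CAPS["mbps"]))
    return n, achieved

# Example targets (made up): 4 TB of data, 20k IOPS, 600 MB/sec per node.
print(disks_needed("P30", 4096, 20_000, 600))   # -> (4, (4096, 20000, 800))
```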
26. Networking deployment options
• Supporting your replication topology (NetworkTopologyStrategy), including geo-replication, for disaster recovery or workload segregation purposes (see the keyspace sketch at the end of this slide)
• Within a VNET, bandwidth is a function of VM type/size
– Up to 20Gbps for G5
• Cross-region VNET gateways
– Standard (100Mbps) or High Performance (200Mbps), No-Crypto option
– Latency impact proportional to distance
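To connect the networking topology back to Cassandra, a minimal sketch of a geo-replicated keyspace and a DC-aware client using the DataStax Python driver; the data center names must match what your snitch reports, and the addresses, keyspace, and replication factors here are placeholders.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Keep routine traffic in the local region; the remote DC serves DR or other workloads.
cluster = Cluster(
    ["10.1.0.4", "10.1.0.5"],  # placeholder seed addresses inside the local VNET
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc-westus")),
)
session = cluster.connect()

# NetworkTopologyStrategy places replicas per data center, which is what makes the
# cross-region replication on this slide work (RF values are illustrative).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc-westus': 3,
        'dc-eastus': 3
    }
""")
```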