Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) to improve system resilience and scalability. We cover the Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Caching for Microservices Architectures: Session IVMware Tanzu
This document discusses how caching can help address performance, scalability, and autonomy challenges for microservices architectures. It introduces Pivotal Cloud Cache (PCC) as a caching solution for microservices on Pivotal Cloud Foundry. PCC provides an in-memory cache that can scale horizontally and increase performance. It also allows for data autonomy between microservices and teams while providing high availability. PCC offers an easy and cost-effective way to cache data and adopt microservices on Pivotal Cloud Foundry.
Cosmos DB Real-time Advanced Analytics WorkshopDatabricks
The workshop implements an innovative fraud detection solution as a PoC for a bank that provides payment processing services for commerce to merchant customers all across the globe, helping them save costs by applying machine learning and advanced analytics to detect fraudulent transactions. Since their customers are around the world, the right solution should minimize the latency customers experience by distributing as much of the solution as possible, as closely as possible, to the regions in which the service is used. The workshop designs a data pipeline solution that leverages Cosmos DB for both the scalable ingest of streaming data and the globally distributed serving of both pre-scored data and machine learning models. Cosmos DB’s major advantage when operating at global scale is its high concurrency with low latency and predictable results.
This combination is unique to Cosmos DB and ideal for the bank needs. The solution leverages the Cosmos DB change data feed in concert with the Azure Databricks Delta and Spark capabilities to enable a modern data warehouse solution that can be used to create risk reduction solutions for scoring transactions for fraud in an offline, batch approach and in a near real-time, request/response approach. https://github.com/Microsoft/MCW-Cosmos-DB-Real-Time-Advanced-Analytics Takeaway: How to leverage Azure Cosmos DB + Azure Databricks along with Spark ML for building innovative advanced analytics pipelines.
Inside Freshworks' Migration from Cassandra to ScyllaDB by Premkumar PatturajScyllaDB
Freshworks migrated from Cassandra to ScyllaDB to handle growing audit log data efficiently. Cassandra required frequent scaling, complex repairs, and had non-linear scaling. ScyllaDB reduced costs with fewer machines and improved operations. Using Zero Downtime Migration (ZDM), they bulk-migrated data, performed dual writes, and validated consistency.
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
Introduction to Databus - Linkedin's Change Data Capture Pipeline
https://github.com/linkedin/databus
as presented at Eventbrite - May 07 2013
Data & Analytics Forum: Moving Telcos to Real TimeSingleStore
MemSQL is a real-time database that allows users to simultaneously ingest, serve, and analyze streaming data and transactions. It is an in-memory distributed relational database that supports SQL, key-value, documents, and geospatial queries. MemSQL provides real-time analytics capabilities through Streamliner, which allows one-click deployment of Apache Spark for real-time data pipelines and analytics without batch processing. It is available in free community and paid enterprise editions with support and additional features.
This document discusses database as a service and cloud computing. It introduces concepts like software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). It also covers topics like virtualization, multi-tenancy, service level agreements, storage models, distributed storage, replication, and security in the context of database as a service. The document will be covering these topics in more depth throughout the seminar.
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applicat...Mydbops
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applications by Bhanu Jamwal, Head of Solution Engineering, PingCAP at the Mydbops Opensource Database Meetup 14.
This presentation discusses the challenges in choosing the right database for modern applications, focusing on MySQL alternatives. It highlights the growth of new applications, the need to improve infrastructure, and the rise of cloud-native architecture.
The presentation explores alternatives to MySQL, such as MySQL forks, database clustering, and distributed SQL. It introduces TiDB as a distributed SQL database for modern applications, highlighting its features and top use cases.
Case studies of companies benefiting from TiDB are included. The presentation also outlines TiDB's product roadmap, detailing upcoming features and enhancements.
This presentation discusses several high availability best practices from Oracle's Maximum Availability Architecture (MAA) team for minimizing planned and unplanned downtime. It provides examples of how features like Oracle Data Guard, database restore points, transportable tablespaces, and Real Application Clusters can be used to reduce downtime for maintenance activities like database upgrades and platform migrations. Specific tips are provided around tuning Data Guard configurations, using flashback technology to create database clones for testing, and leveraging SQL Apply to minimize downtime during upgrades. Real-world examples from Oracle customers like The Hartford are also presented.
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here?
In this webinar, we look at this foundational technology for modern Data Management and show how it evolved to meet the workloads of today, as well as when other platforms make sense for enterprise data.
Microsoft SQL Server - Reduce Your Cost and Improve your Agility PresentationMicrosoft Private Cloud
This document discusses server consolidation using SQL Server 2008 R2. It begins by describing the trend toward consolidation to reduce costs by combining underutilized servers onto fewer servers. Key enablers of consolidation include advances in software, hardware, virtualization and improved bandwidth. SQL Server 2008 R2 provides benefits for consolidation such as low TCO, security, manageability and support for virtualization. The document reviews options for consolidating servers using SQL Server 2008 R2, including multiple databases, multiple instances and virtualization. It also discusses management, high availability, security and reducing storage requirements when consolidating with SQL Server 2008 R2.
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...Continuent
Marketo uses Continuent Tungsten to solve key data management challenges at scale. Tungsten provides high availability, online maintenance, and parallel replication to allow Marketo to process over 600 million MySQL transactions per day across more than 7TB of data without downtime. Tungsten's innovative caching and sharding techniques help replicas keep up with Marketo's high transaction volumes and uneven tenant sizes. The solution has enabled fast failover, rolling maintenance, and scaling to thousands of customers.
Designing a Feedback Loop for Event-Driven Data Sharing With Teresa Wang | Cu...HostedbyConfluent
This document summarizes a presentation about designing an automated data transport system using Kafka Connect and KSQL. The system handles complex data dependencies across tables while respecting business rules. It establishes a feedback loop to notify the source system about successful deliveries and errors. The solution uses Kafka components like streams and tables, source and sink connectors, and microservices. Data is extracted from multiple source tables and written to topics. Sink connectors load the data to the destination. KSQL is used for aggregations to declare when an event is complete or needs alerting. The design leverages existing tools with minimal custom coding.
Best Practices in the Cloud for Data Management (US)Denodo
Watch here: https://bit.ly/2Npt82U
If you have data, you are engaged in data management—be sure to do it effectively.
As organizations are assessing how COVID-19 has impacted their operations, new possibilities and uncharted routes are becoming the norm for many businesses. While exploring and implementing different deployment and operational models, the question of data management naturally surfaces while considering how these changes impact your data. Is this the right time to focus on data management? The reality is that if you have data, you are engaged in data management and so the real question is, are you doing it well?
Join Brice Giesbrecht from Caserta and Mitesh Shah from Denodo to explore data management challenges and solutions facing data driven organizations.
The document discusses the benefits and challenges of running big data workloads on cloud native platforms. Some key points discussed include:
- Big data workloads are migrating to the cloud to take advantage of scalability, flexibility and cost effectiveness compared to on-premises solutions.
- Enterprise cloud platforms need to provide centralized management and monitoring of multiple clusters, secure data access, and replication capabilities.
- Running big data on cloud introduces challenges around storage, networking, compute resources, and security that systems need to address, such as consistency issues with object storage, network throughput reductions, and hardware variations across cloud vendors.
- The open source community is helping users address these challenges to build cloud native data architectures
Strategies For Migrating From SQL to NoSQL — The Apache Kafka WayScyllaDB
This document discusses strategies for migrating from SQL to NoSQL databases using Apache Kafka. It outlines the challenges of modernizing legacy databases, how Confluent can help with the migration process, and proposes a three-phase plan. The plan involves initially migrating data sources using connectors, then optimizing the data with stream processing in ksqlDB, and finally modernizing by sending the data to cloud databases. The document provides an overview of Confluent's technologies and services that can help accelerate and simplify the database migration.
Which Change Data Capture Strategy is Right for You?Precisely
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
Architecture matters. That's why today's innovators are taking a hard look at streaming data, an increasingly attractive option that can transform business in several ways: replacing aging data ingestion techniques like ETL; solving long-standing data quality challenges; improving business processes ranging from sales and marketing to logistics and procurement; or any number of activities related to accelerating data warehousing, business intelligence and analytics.
Register for this DM Radio Deep Dive Webinar to learn how streaming data can rejuvenate or supplant traditional data management practices. Host Eric Kavanagh will explain how streaming-first architectures can relieve data engineers from time-consuming, error-prone processes, ideally bidding farewell to those unpleasant batch windows. He'll be joined by Kevin Petrie of Attunity, who will explain why (with real-world story successes) streaming data solutions can keep the business fueled with trusted data in a timely, efficient manner for improved business outcomes.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
This talk provides an architecture overview of data-centric microservices illustrated with an example application. The following Microservices concepts are illustrated - domain driven design, event-driven services, Saga transactions, Application tracing and Health monitoring with different microservices using a variety of data types supported in the database - business data, documents, spatial, graph, and events. A running example of a mobile food delivery application (called GrubDash) is used, with a hands-on-lab that is available for attendees to work through on the Oracle Cloud after these sessions. The rest of the talks will build upon this Microservices architecture framework.
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pros and cons to each approach. While the data warehouse will give you strong data management with analytics, it doesn’t do well with semi-structured and unstructured data, has tightly coupled storage and compute, and carries expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Five ways database modernization simplifies your data lifeSingleStore
This document provides an overview of how database modernization with MemSQL can simplify a company's data life. It discusses five common customer scenarios where database limitations are impacting data-driven initiatives: 1) Slow event to insight delays, 2) High concurrency causing "wait in line" analytics, 3) Costly performance requiring specialized hardware, 4) Slow queries limiting big data analytics, and 5) Deployment inflexibility restricting multi-cloud usage. For each scenario, it provides an example customer situation and solution using MemSQL, highlighting benefits like real-time insights, scalable user access, cost efficiency, accelerated big data analytics, and deployment flexibility. The document also introduces MemSQL capabilities for fast data ingestion, instant
Creating a Modern Data Architecture for Digital TransformationMongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder of DataTorrent presented "Streaming Analytics with Apache Apex" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...ScyllaDB
With over a billion Indians set to shop online, Meesho is redefining e-commerce by making it accessible, affordable, and inclusive at an unprecedented scale. But scaling for Bharat isn’t just about growth—it’s about building a tech backbone that can handle massive traffic surges, dynamic pricing, real-time recommendations, and seamless user experiences. In this session, we’ll take you behind the scenes of Meesho’s journey in democratizing e-commerce while operating at Monster Scale. Discover how ScyllaDB plays a crucial role in handling millions of transactions, optimizing catalog ranking, and ensuring ultra-low-latency operations. We’ll deep dive into our real-world use cases, performance optimizations, and the key architectural decisions that have helped us scale effortlessly.
Navigating common mistakes and critical success factors
Is your team considering or starting a database migration? Learn from the frontline experience gained guiding hundreds of high-stakes migration projects – from startups to Google and Twitter. Join us as Miles Ward and Tim Koopmans have a candid chat about what tends to go wrong and how to steer things right.
We will explore:
- What really pushes teams to the database migration tipping point
- How to scope and manage the complexity of a migration
- Proven migration strategies and antipatterns
- Where complications commonly arise and ways to prevent them
Expect plenty of war stories, along with pragmatic ways to make your own migration as “blissfully boring” as possible.
Achieving Extreme Scale with ScyllaDB: Tips & TradeoffsScyllaDB
Explore critical strategies – and antipatterns – for achieving low latency at extreme scale
If you’re getting started with ScyllaDB, you’re probably intrigued by its potential to achieve predictable low latency at extreme scale. But how do you ensure that you’re maximizing that potential for your team’s specific workloads and technical requirements?
This webinar offers practical advice for navigating the various decision points you’ll face as you evaluate ScyllaDB for your project and move into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
Join us for an inside look at the lessons learned across thousands of real-world distributed database projects.
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...ScyllaDB
Cloudflare’s boot infrastructure dynamically generates and signs boot artifacts for nodes worldwide, ensuring secure, scalable, and customizable deployments. This talk dives into its architecture, scaling decisions, and how it enables seamless testing while maintaining a strong chain of trust.
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn IsarathamScyllaDB
Learn about Agoda's performance tuning strategies for ScyllaDB. Worakarn shares how they optimized disk performance, fine-tuned compaction strategies, and adjusted SSTable settings to match their workload for peak efficiency.
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd ColemanScyllaDB
Yieldmo processes hundreds of billions of ad requests daily with subsecond latency. Initially using DynamoDB for its simplicity and stability, they faced rising costs, suboptimal latencies, and cloud provider lock-in. This session explores their journey to ScyllaDB’s DynamoDB-compatible API.
There’s a common adage that it takes 10 years to develop a file system. As ScyllaDB reaches that 10 year milestone in 2025, it’s the perfect time to reflect on the last decade of ScyllaDB development – both hits and misses. It’s especially appropriate given that our project just reached a critical mass with certain scalability and elasticity goals that we dreamed up years ago. This talk will cover how we arrived at ScyllaDB X Cloud achieving our initial vision, and share where we’re heading next.
Reduce Your Cloud Spend with ScyllaDB by Tzach LivyatanScyllaDB
This talk will explore why ScyllaDB Cloud is a cost-effective alternative to DynamoDB, highlighting efficient design implementations like shared compute, local NVMe storage, and storage compression. It will also discuss new X Cloud features, better plans and pricing, and a direct cost comparison between ScyllaDB and DynamoDB
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence LiuScyllaDB
Terence share how Clearview AI's infra needs evolved and why they chose ScyllaDB after first-principles research. From fast ingestion to production queries, the talk explores their journey with Rust, embedded DB readers, and the ScyllaDB Rust driver—plus config tips for bulk ingestion and achieving data parity.
Vector Search with ScyllaDB by Szymon WasikScyllaDB
Vector search is an essential element of contemporary machine learning pipelines and AI tools. This talk will share preliminary results on the forthcoming vector storage and search features in ScyllaDB. By leveraging Scylla's scalability and USearch library's performance, we have designed a system with exceptional query latency and throughput. The talk will cover vector search use cases, our roadmap, and a comparison of our initial implementation with other vector databases.
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...ScyllaDB
Workload Prioritization is a ScyllaDB exclusive feature for controlling how different workloads compete for system resources. It's used to prioritize urgent application requests that require immediate response times versus others that can tolerate slighter delays (e.g., large scans). Join this session for a demo of how applying workload prioritization reduces infrastructure costs while ensuring predictable performance at scale.
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...ScyllaDB
Should you move code to data or data to code? Conventional wisdom favors the former, but cloud trends push the latter. This session by the creator of PACELC explores the shift, its risks, and the ongoing debate in data virtualization between push- and pull-based processing.
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...ScyllaDB
Scaling from 66M to 25B+ records in a core financial system is tough—every number must be right, and data must be fresh. In this session, Dmytro shares real-world strategies to balance accuracy with real-time performance and avoid scaling pitfalls. It's purely practical, no-BS insights for engineers.
Object Storage in ScyllaDB by Ran Regev, ScyllaDBScyllaDB
In this talk we take a look at how Object Storage is used by Scylla. We focus on current usage, namely for backup, and we look at the shift in implementation from an external tool to native Scylla. We take a close look at the complexity of backup and restore, mostly in the face of topology changes and token assignments. We also take a glimpse into the future and see how Scylla is going to use Object Storage as its native storage. We explore a few possible applications of it and understand the tradeoffs.
Lessons Learned from Building a Serverless Notifications System by Srushith R...ScyllaDB
Reaching your audience isn’t just about email. Learn how we built a scalable, cost-efficient notifications system using AWS serverless—handling SMS, WhatsApp, and more. From architecture to throttling challenges, this talk dives into key decisions for high-scale messaging.
A Dist Sys Programmer's Journey into AI by Piotr SarnaScyllaDB
This talk explores the culture shock of transitioning from distributed databases to AI. While AI operates at massive scale, distributed storage and compute remain essential. Discover key differences, unexpected parallels, and how database expertise applies in the AI world.
High Availability: Lessons Learned by Paul PreuveneersScyllaDB
How does ScyllaDB keep your data safe, and your mission critical applications running smoothly, even in the face of disaster? In this talk we’ll discuss what we have learned about High Availability, how it is implemented within ScyllaDB and what that means for your business. You’ll learn about ScyllaDB cloud architecture design, consistency, replication and even load balancing and much more.
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...ScyllaDB
Natura, a top global cosmetics brand with 3M+ beauty consultants in Latin America, processes massive data for orders, campaigns, and analytics. In this talk, Rodrigo Luchini & Marcus Monteiro share how Natura leverages ScyllaDB’s CDC Source Connector for real-time sales insights.
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...ScyllaDB
This is a case study on managing mutable big data: Exploring the evolution of the persistence layer in a processing graph, tackling design challenges, and refining key operational principles along the way.
How Can I use the AI Hype in my Business Context?Daniel Lehner
𝙄𝙨 𝘼𝙄 𝙟𝙪𝙨𝙩 𝙝𝙮𝙥𝙚? 𝙊𝙧 𝙞𝙨 𝙞𝙩 𝙩𝙝𝙚 𝙜𝙖𝙢𝙚 𝙘𝙝𝙖𝙣𝙜𝙚𝙧 𝙮𝙤𝙪𝙧 𝙗𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝙣𝙚𝙚𝙙𝙨?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...Fwdays
Why the "more leads, more sales" approach is not a silver bullet for a company.
Common symptoms of an ineffective Client Partnership (CP).
Key reasons why CP fails.
Step-by-step roadmap for building this function (processes, roles, metrics).
Business outcomes of CP implementation based on examples of companies sized 50-500.
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtLynda Kane
Slide Deck from Buckeye Dreamin' 2024 presentation Assessing and Resolving Technical Debt. Focused on identifying technical debt in Salesforce and working towards resolving it.
Hands On: Create a Lightning Aura Component with force:RecordDataLynda Kane
Slide Deck from the 3/26/2020 virtual meeting of the Cleveland Developer Group presentation on creating a Lightning Aura Component using force:RecordData.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights, with its integrated historic procurement industry archives, serves as a powerful complement, not a competitor, to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value- driven proprietary service offering here.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Big Data Analytics Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
2. About me
Christina Lin
Developer Advocate, Redpanda
Background: SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, microservices, EIP, K8s, agile integration, data mesh, ActiveMQ
Live data stack
- Resilience: handle failures and scale gracefully
- Elasticity: infrastructure that scales dynamically
- Decentralization: data ownership that empowers individual teams
- Performance: low latency and high throughput
- Autonomy: self-service access with team-defined quality
- Nimbleness: efficient data movement
- Distributed: distributed data processing for cloud-native environments
- Agility: respond quickly to changes in data
7. Schema Registry
Diagram: producer-to-consumer flow through the Schema Registry.
- A schema describes the data structure as {name: type} pairs and is encoded with Avro, Protobuf, or JSON.
- The producer serializes each record against a schema version downloaded from the Schema Registry.
- The record on the topic carries the key (binary), the schema ID, and the value (binary).
- The consumer looks up the schema by ID in the registry and deserializes the value.
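A minimal sketch of this flow using the Python confluent-kafka client: the producer serializes a record against a registered Avro schema and the consumer resolves the schema by ID to deserialize it. The broker and registry addresses, the topic name, and the Customer schema are assumptions for illustration.

```python
# Minimal sketch: produce and consume an Avro-encoded record through a schema registry.
# Broker/registry addresses, topic name, and schema are assumptions for illustration.
from confluent_kafka import Producer, Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer, AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

SCHEMA = """
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id",   "type": "string"},
    {"name": "name", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, SCHEMA)      # registers/fetches the schema, caches its ID
deserializer = AvroDeserializer(registry, SCHEMA)

producer = Producer({"bootstrap.servers": "localhost:9092"})
value = serializer({"id": "c-42", "name": "Ada"},
                   SerializationContext("customer", MessageField.VALUE))
producer.produce("customer", key=b"c-42", value=value)
producer.flush()

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "demo", "auto.offset.reset": "earliest"})
consumer.subscribe(["customer"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    record = deserializer(msg.value(), SerializationContext("customer", MessageField.VALUE))
    print(record)   # {'id': 'c-42', 'name': 'Ada'}
consumer.close()
```

The serialized value that lands on the topic follows the registry wire format described above: a magic byte, the 4-byte schema ID, then the encoded body.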
8. Schema Registry
Diagram: server-side validation.
- With server-side schema validation, the broker checks that the schema ID carried in each record (key binary, schema ID, value binary) refers to a valid registered schema before accepting the producer's write.
- The registry stores each subject as a series of versions (version 1, version 2, version 3, ...).
- Compatibility between versions is enforced according to the configured mode: backward, forward, full, or none.
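The compatibility mode can be set per subject through the registry's standard REST API; a small sketch with the Python requests library, where the registry URL and subject name are assumptions:

```python
# Sketch: set the compatibility mode for one subject via the Schema Registry REST API.
# Registry URL and subject name are assumptions for illustration.
import requests

REGISTRY = "http://localhost:8081"
subject = "customer-value"

# PUT /config/{subject} sets the compatibility level for that subject.
resp = requests.put(f"{REGISTRY}/config/{subject}",
                    json={"compatibility": "BACKWARD"})
resp.raise_for_status()

# GET /config/{subject} reads it back.
print(requests.get(f"{REGISTRY}/config/{subject}").json())
```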
9. Schema Registry in Redpanda
Diagram: the Schema Registry runs inside Redpanda itself. Each broker exposes it through a RESTful endpoint, and the schemas are persisted in the internal _schemas topic.
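Because the registry is just a RESTful endpoint on the broker, schemas can be registered and inspected with plain HTTP; a sketch, again with an assumed local address, subject name, and schema:

```python
# Sketch: register a schema and list subjects through the registry's REST endpoint.
# URL, subject, and schema contents are assumptions for illustration.
import json
import requests

REGISTRY = "http://localhost:8081"   # assumed local registry address

schema = {
    "type": "record", "name": "Customer",
    "fields": [{"name": "id", "type": "string"},
               {"name": "name", "type": "string"}],
}

# POST /subjects/{subject}/versions registers a new schema version and returns its ID.
resp = requests.post(f"{REGISTRY}/subjects/customer-value/versions",
                     json={"schema": json.dumps(schema)})
print(resp.json())                                   # e.g. {"id": 1}

print(requests.get(f"{REGISTRY}/subjects").json())   # all registered subjects
```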
10. Schema Registry
• Assign a default value to fields that you might remove in the future
• Do not rename an existing field; add an alias instead
• When using schema evolution, always provide a default value
• Never delete a required field
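As an illustration of these guidelines, a hypothetical evolution of the Customer schema from the earlier sketch: the new field carries a default so the change stays backward compatible, and a rename would be handled with an alias rather than by dropping the old field.

```python
# Sketch: a backward-compatible evolution of the assumed Customer schema.
# Field names and default values are illustrative assumptions.
import json

customer_v1 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id",   "type": "string"},
        {"name": "name", "type": "string"},
    ],
}

customer_v2 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id",   "type": "string"},
        {"name": "name", "type": "string"},
        # New field carries a default, so a v2 reader can still decode records
        # written with v1. A rename would keep the old name reachable via
        # "aliases" instead of deleting the field.
        {"name": "tier", "type": "string", "default": "standard"},
    ],
}

print(json.dumps(customer_v2, indent=2))
```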
When not to use the Schema Registry
• You’re certain the schema won’t change in the future
• Hardware resources are limited and low latency is critical, so the extra lookup and validation may hurt performance (e.g., for IoT)
• You want to serialize the data with an unsupported serialization scheme
12. In-broker validation – how it works
Diagram: records arrive in the source topic (customer, partition 1), are loaded into the broker's cache, validated against the schema, transformed, and written back to disk with DMA into a validated topic (customer validated, partition 1), which can then be replicated across clusters.
Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro
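Redpanda runs this step inside the broker as a WebAssembly data transform (the linked repo is an example of that); purely as a conceptual illustration of the same validate-and-reroute logic, here is a client-side Python sketch. The topic names, registry address, and the invalid-records topic are assumptions.

```python
# Conceptual sketch only: the validate-and-reroute logic expressed as a client-side
# consume/produce loop. In Redpanda the equivalent logic runs inside the broker as a
# WebAssembly data transform; addresses and topic names here are assumptions.
from confluent_kafka import Consumer, Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
deserialize = AvroDeserializer(registry)   # resolves the schema from the ID in each record

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "validator", "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["customer"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        # Validate: the value must decode against its registered schema.
        deserialize(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        producer.produce("customer_validated", key=msg.key(), value=msg.value())
    except Exception:
        # Reroute records that fail validation instead of dropping them.
        producer.produce("customer_invalid", key=msg.key(), value=msg.value())
    producer.poll(0)
```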
13. In-broker validation & transformation
When to use in-broker transformation:
• First-pass processing and quick filtering
• Simple rerouting decided from the ingested data itself
• Masking and schema validation
• Stateless, functional processing
When not to use in-broker transformation:
• When it requires external data dependencies
• Windowing or complex processing over multiple input streams
• When it needs to keep processing state
20. High throughput
• There is no one-size-fits-all configuration; many factors affect performance.
• More partitions allow more parallel processing and therefore higher throughput, but they come at a cost (avoid over-partitioning or under-partitioning).
• Experiment with acks settings and enable write caching.
• Explore how the producer batches messages: increasing batch.size and linger.ms can raise throughput by letting the producer pack more messages into one batch (see the sketch below).
• Explore consumer fetch frequency and message size.
• Start with a baseline configuration and make changes gradually, measuring the impact of each change on performance.
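A sketch of a throughput-oriented producer configuration with the Python confluent-kafka client: acks, batch.size, and linger.ms come from the list above, compression is an additional common knob, and every value here is just an assumed starting point to measure against your own workload.

```python
# Sketch: throughput-oriented producer settings. The exact values are assumptions
# to start tuning from, not recommendations; measure each change on your workload.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "1",                  # trade durability for latency; "all" is safest
    "linger.ms": 10,              # wait up to 10 ms to fill a batch
    "batch.size": 131072,         # larger batches, fewer requests (bytes)
    "compression.type": "lz4",    # smaller payloads on the wire
})

for i in range(100_000):
    producer.produce("customer", key=str(i), value=f"event-{i}")
    producer.poll(0)              # serve delivery callbacks without blocking
producer.flush()
```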
26. Summary
■ Use schemas to ensure the data shape consumers receive
■ When designing schemas, think about compatibility
■ Validation ensures consumers always get the correct format
■ In-broker transforms are great for simple, functional, stateless processing
■ Provision an appropriate number of partitions for your topics
■ Depending on your use case, always set the right acks and buffering for the producer
■ For stateful stream processing, use snapshots for fault tolerance