Co-Founder Peter Mattis presented this deck to the NYC PostgreSQL User Group on Nov. 4, 2015. It compares PostgreSQL and CockroachDB SQL: logical data structures, KV layers, data storage, online schema changes, and more.
19. @cockroachdb
INSERT INTO test VALUES (1, “ball”, 3.33);
PostgreSQL: Logical Data Storage
Tuple ID (Page# / Item#) Row
(0, 1) (1, “ball”, 3.33)
test (heap)
20. @cockroachdb
INSERT INTO test VALUES (1, “ball”, 3.33);
PostgreSQL: Logical Data Storage
Tuple ID (Page# / Item#) Row
(0, 1) (1, “ball”, 3.33)
Index Key Tuple ID
1 (0, 1)
test (heap), test_pkey (btree)
21. @cockroachdb
INSERT INTO test VALUES (1, “ball”, 3.33);
INSERT INTO test VALUES (2, “glove”, 4.44);
PostgreSQL: Logical Data Storage
Tuple ID (Page# / Item#) Row
(0, 1) (1, “ball”, 3.33)
(0, 2) (2, “glove”, 4.44)
Index Key Tuple ID
1 (0, 1)
2 (0, 2)
test (heap), test_pkey (btree)
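To make the heap/btree split above concrete, here is a minimal Python sketch (hypothetical class and field names, not PostgreSQL code): the heap is an unindexed map from (page#, item#) tuple IDs to rows, and the primary-key btree maps index keys to tuple IDs, so a lookup goes index -> tuple ID -> heap.

# Toy model of PostgreSQL logical storage: heap + primary-key index.
# Hypothetical sketch; a real heap manages pages, visibility, and free space.

class Table:
    def __init__(self):
        self.heap = {}        # tuple ID (page#, item#) -> row
        self.pkey_index = {}  # index key (pk columns) -> tuple ID
        self._next_item = 0

    def insert(self, row, pk_columns=(0,)):
        # Assign the next tuple ID; everything fits on page 0 in this toy.
        tid = (0, self._next_item + 1)
        self._next_item += 1
        self.heap[tid] = row
        key = tuple(row[i] for i in pk_columns)
        self.pkey_index[key] = tid
        return tid

    def lookup(self, key):
        # An index lookup returns a tuple ID, then the heap is consulted.
        tid = self.pkey_index[key]
        return self.heap[tid]

test = Table()
test.insert((1, "ball", 3.33))
test.insert((2, "glove", 4.44))
print(test.pkey_index)        # {(1,): (0, 1), (2,): (0, 2)}
print(test.lookup((2,)))      # (2, 'glove', 4.44)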
36. @cockroachdb
■ NULL indicates the value does not exist
■ NULL is weird: NULL != NULL
■ CockroachDB: NULL values are not explicitly stored
CockroachDB: NULL Column Values
37. @cockroachdb
INSERT INTO test VALUES (1, “ball”, NULL);
CockroachDB: NULL Column Values
Key: /<table>/<index>/<key>/<column> Value
/test/primary/1/name “ball”
38. @cockroachdb
INSERT INTO test VALUES (1, “ball”, NULL);
INSERT INTO test VALUES (2, NULL, NULL);
CockroachDB: NULL Column Values
Key: /<table>/<index>/<key>/<column> Value
/test/primary/1/name “ball”
??? ???
39. @cockroachdb
INSERT INTO test VALUES (1, “ball”, NULL);
INSERT INTO test VALUES (2, NULL, NULL);
CockroachDB: NULL Column Values
Key: /<table>/<index>/<key>[/<column>] Value
/test/primary/1 Ø
/test/primary/1/name “ball”
/test/primary/2 Ø
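A hedged sketch of the mapping slides 37-39 describe, using the same illustrative string keys (the real encoding is binary and lives below the SQL layer): each non-NULL, non-primary-key column becomes one key/value pair, NULL columns are simply omitted, and a sentinel key with an empty value keeps the row visible even when every other column is NULL.

# Illustrative row encoding: NULL columns produce no key/value pair,
# and a bare sentinel key marks the row's existence.
def encode_row(table, pk, columns):
    """columns: {column_name: value or None}; returns [(key, value), ...]."""
    prefix = f"/{table}/primary/{pk}"
    kvs = [(prefix, "")]  # sentinel: the row exists even if all columns are NULL
    for name, value in columns.items():
        if value is not None:          # NULLs are not stored explicitly
            kvs.append((f"{prefix}/{name}", value))
    return kvs

print(encode_row("test", 1, {"name": "ball", "price": None}))
# [('/test/primary/1', ''), ('/test/primary/1/name', 'ball')]
print(encode_row("test", 2, {"name": None, "price": None}))
# [('/test/primary/2', '')]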
40. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
■ Multiple table rows with equal indexed values are not allowed
CockroachDB: Unique Indexes
41. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
CockroachDB: Unique Indexes
Key: /<table>/<index>/<key> Value
/test/bar/”ball” 1
42. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
INSERT INTO test VALUES (2, “glove”, 3.33);
CockroachDB: Unique Indexes
Key: /<table>/<index>/<key> Value
/test/bar/”ball” 1
/test/bar/”glove” 2
43. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
INSERT INTO test VALUES (2, “glove”, 3.33);
INSERT INTO test VALUES (3, “glove”, 4.44);
CockroachDB: Unique Indexes
Key: /<table>/<index>/<key> Value
/test/bar/”ball” 1
/test/bar/”glove” 2
/test/bar/”glove” 3
45. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (3, NULL, NULL);
CockroachDB: Unique Indexes (NULL Values)
Key: /<table>/<index>/<key> Value
/test/bar/NULL 3
46. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (3, NULL, NULL);
INSERT INTO test VALUES (4, NULL, NULL);
CockroachDB: Unique Indexes (NULL Values)
Key: /<table>/<index>/<key> Value
/test/bar/NULL 3
/test/bar/NULL 4
47. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (3, NULL, NULL);
CockroachDB: Unique Indexes (NULL Values)
Key: /<table>/<index>/<key>[/<pkey>] Value
/test/bar/NULL/3 Ø
48. @cockroachdb
CREATE UNIQUE INDEX bar ON test (name);
INSERT INTO test VALUES (3, NULL, NULL);
INSERT INTO test VALUES (4, NULL, NULL);
CockroachDB: Unique Indexes (NULL Values)
Key: /<table>/<index>/<key>[/<pkey>] Value
/test/bar/NULL/3 Ø
/test/bar/NULL/4 Ø
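The slides above can be read as a small key-construction rule; this Python sketch (same toy key format, hypothetical names) shows it: a non-NULL value maps to a single /table/index/value key, so a second row with the same value collides with an existing key and the uniqueness violation is detected, while NULL values get the primary key appended so that multiple NULLs never conflict.

# Illustrative unique-index encoding with SQL's NULL semantics:
# non-NULL values map to one key each (a collision = uniqueness violation),
# NULL values get the primary key appended so multiple NULLs coexist.
class UniqueIndex:
    def __init__(self, table, name):
        self.prefix = f"/{table}/{name}"
        self.kv = {}  # key -> value (the primary key, or empty for NULL rows)

    def insert(self, value, pk):
        if value is None:
            key, val = f"{self.prefix}/NULL/{pk}", ""
        else:
            key, val = f"{self.prefix}/{value}", pk
            if key in self.kv:
                raise ValueError(f"duplicate value violates unique index: {value!r}")
        self.kv[key] = val

bar = UniqueIndex("test", "bar")
bar.insert("ball", 1)
bar.insert("glove", 2)
bar.insert(None, 3)
bar.insert(None, 4)         # fine: NULL != NULL
# bar.insert("glove", 5)    # would raise: key /test/bar/glove already exists
print(sorted(bar.kv))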
49. @cockroachdb
CREATE INDEX foo ON test (name);
■ Multiple table rows with equal indexed values are allowed
CockroachDB: Non-Unique Indexes
50. @cockroachdb
CREATE INDEX foo ON test (name);
■ Multiple table rows with equal indexed values are allowed
■ Primary key is a unique index
CockroachDB: Non-Unique Indexes
51. @cockroachdb
CREATE INDEX foo ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
CockroachDB: Non-Unique Indexes
Key: /<table>/<index>/<key>/<pkey> Value
/test/foo/”ball”/1 Ø
52. @cockroachdb
CREATE INDEX foo ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
INSERT INTO test VALUES (2, “glove”, 3.33);
CockroachDB: Non-Unique Indexes
Key: /<table>/<index>/<key>/<pkey> Value
/test/foo/”ball”/1 Ø
/test/foo/”glove”/2 Ø
53. @cockroachdb
CREATE INDEX foo ON test (name);
INSERT INTO test VALUES (1, “ball”, 2.22);
INSERT INTO test VALUES (2, “glove”, 3.33);
INSERT INTO test VALUES (3, “glove”, 4.44);
CockroachDB: Non-Unique Indexes
Key: /<table>/<index>/<key>/<pkey> Value
/test/foo/”ball”/1 Ø
/test/foo/”glove”/2 Ø
/test/foo/”glove”/3 Ø
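For a non-unique index, the primary key is always part of the key, so equal indexed values never collide and a lookup becomes a prefix scan. A short sketch under the same assumed key format:

# Illustrative non-unique index: the primary key is folded into the key,
# so equal values ("glove") produce distinct keys; lookups are prefix scans.
class NonUniqueIndex:
    def __init__(self, table, name):
        self.prefix = f"/{table}/{name}"
        self.keys = set()

    def insert(self, value, pk):
        self.keys.add(f"{self.prefix}/{value}/{pk}")

    def scan(self, value):
        prefix = f"{self.prefix}/{value}/"
        return sorted(k for k in self.keys if k.startswith(prefix))

foo = NonUniqueIndex("test", "foo")
foo.insert("ball", 1)
foo.insert("glove", 2)
foo.insert("glove", 3)
print(foo.scan("glove"))
# ['/test/foo/glove/2', '/test/foo/glove/3']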
54. @cockroachdb
■ Keys and values are strings
■ NULL column values
■ Unique indexes
■ Non-unique indexes
CockroachDB: Logical Data Storage
55. @cockroachdb
Logical Data Storage
PostgreSQL                    | CockroachDB
Keys are composite structures | Keys are strings
Heap storage for rows         | Required primary key
Per-table heap/indexes        | Monolithic map
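One consequence of the "monolithic map" row above is that reading a table is just an in-order scan of its key prefix; every table and index shares one sorted key space. An illustrative sketch (toy string keys again):

# One ordered map holds every table and index; reading table "test"
# is an in-order scan of keys beginning with "/test/primary/".
store = {
    "/test/primary/1": "", "/test/primary/1/name": "ball",
    "/test/primary/2": "", "/test/foo/ball/1": "",
    "/other/primary/7/name": "x",
}
prefix = "/test/primary/"
for key in sorted(store):              # real storage keeps keys sorted already
    if key.startswith(prefix):
        print(key, "->", repr(store[key]))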
59. @cockroachdb
Schema Change (the easy way)
1. Apologize for down time
2. Lock table
3. Adjust table data (add column, populate index, etc.)
4. Unlock table
60. @cockroachdb
Schema Change (the MySQL way)
1. Create new table with altered schema
2. Capture changes from source to the new table
3. Copy rows from the source to the new table
4. Synchronize source and new table
5. Swap/rename source and new table
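A rough Python outline of that shadow-table procedure, with an assumed execute() helper that only prints the SQL it would run, and column names (id, name, price) borrowed from the deck's example table; real tools such as pt-online-schema-change or gh-ost handle chunking, throttling, update/delete capture, and a safe cutover that this sketch glosses over.

# Hypothetical outline of the shadow-table approach; execute() just prints
# the SQL it would run. Not a substitute for pt-online-schema-change / gh-ost.
def execute(sql):
    print(sql)

def online_add_column(table="test", new_column="note TEXT"):
    new, old = f"_{table}_new", f"_{table}_old"
    # 1. Create a new table with the altered schema.
    execute(f"CREATE TABLE {new} LIKE {table}")
    execute(f"ALTER TABLE {new} ADD COLUMN {new_column}")
    # 2. Capture ongoing changes from the source (insert shown; updates/deletes too).
    execute(f"CREATE TRIGGER {table}_ai AFTER INSERT ON {table} FOR EACH ROW "
            f"REPLACE INTO {new} (id, name, price) VALUES (NEW.id, NEW.name, NEW.price)")
    # 3. Copy existing rows (done in chunks in practice).
    execute(f"INSERT IGNORE INTO {new} (id, name, price) SELECT id, name, price FROM {table}")
    # 4. Triggers plus the backfill keep the two tables converging.
    # 5. Swap names atomically, keeping the original around as a backup.
    execute(f"RENAME TABLE {table} TO {old}, {new} TO {table}")

online_add_column()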
67. @cockroachdb
CockroachDB: CREATE INDEX
CREATE INDEX foo ON TEST
1. Add index to TableDescriptor as delete-only
2. Wait for descriptor propagation
3. Mark index as write-only
4. Wait for descriptor propagation
5. Backfill index entries
6. Mark index as read-write
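Those six steps amount to a small state machine for the new index: in delete-only it only applies deletions, in write-only it also receives new writes (so the later backfill cannot miss rows written concurrently), and only after the backfill is it marked readable. A hedged Python model, ignoring descriptor propagation and transactions:

# Toy model of the delete-only -> write-only -> backfill -> read-write
# progression; descriptor propagation and transactional details are omitted.
DELETE_ONLY, WRITE_ONLY, READ_WRITE = "delete-only", "write-only", "read-write"

class Index:
    def __init__(self):
        self.state = DELETE_ONLY
        self.entries = {}              # indexed value -> primary key

    def on_insert(self, value, pk):
        if self.state in (WRITE_ONLY, READ_WRITE):
            self.entries[value] = pk   # delete-only indexes ignore new writes

    def on_delete(self, value):
        self.entries.pop(value, None)  # deletions apply in every state

    def lookup(self, value):
        assert self.state == READ_WRITE, "index not readable yet"
        return self.entries.get(value)

def create_index(table_rows, index):
    index.state = DELETE_ONLY          # 1-2: announce, wait for propagation
    index.state = WRITE_ONLY           # 3-4: new writes now maintain the index
    for pk, value in table_rows.items():
        index.on_insert(value, pk)     # 5: backfill entries for existing rows
    index.state = READ_WRITE           # 6: queries may now use the index

rows = {1: "ball", 2: "glove"}
foo = Index()
create_index(rows, foo)
print(foo.lookup("glove"))             # 2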
#4: This is a PostgreSQL meetup, so why should I care about CockroachDB?
The CockroachDB SQL grammar is based on the Postgres grammar.
Postgres is a SQL database. CockroachDB is a distributed SQL database.
#5: Layered abstractions make it possible to deal with complexity
Higher levels can treat lower levels as functional black boxes
#16: This is a brief overview of logical data storage in PostgreSQL. I’m using the term “logical” to refer to how SQL data is mapped down into PostgreSQL structures, as distinct from “physical” data storage, which is how those structures are actually implemented.
The heap structure is unindexed storage of row tuples. Think of it as a hash table where rows are given a unique id at insertion time, except that it is unfortunately not that simple.
Tuples (rows) are located by a tuple ID, which is composed of a page number and an item number within the page.
A Btree stores values sorted by key.
The Btree index key is a tuple of the indexed columns; the value is the row’s tuple ID.
#22: Tuple IDs just happen to be ordered in this example; in general they are not. Tuple IDs are an internal detail of tables and are not stable for external use.