pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data changes into PostgreSQL.
Whether the user needs to set up a permanent replica between MySQL and PostgreSQL or perform an engine migration, pg_chameleon is the perfect tool for the job.
The talk will cover the history, the current implementation, and the future releases.
The audience will learn how to set up a replica from MySQL to PostgreSQL in a few easy steps. It will also cover the lessons learned during the tool's development cycle.
This document discusses using ClickHouse for experimentation and metrics at Spotify. It describes how Spotify built an experimentation platform using ClickHouse to provide teams interactive queries on granular metrics data with low latency. Key aspects include ingesting data from Google Cloud Storage to ClickHouse daily, defining metrics through a centralized catalog, and visualizing metrics and running queries using Superset connected to ClickHouse. The platform aims to reduce load on notebooks and BigQuery by serving common queries directly from ClickHouse.
How to build a streaming Lakehouse with Flink, Kafka, and Hudi (Flink Forward)
Flink Forward San Francisco 2022.
With a real-time processing engine like Flink and a transactional storage layer like Hudi, it has never been easier to build end-to-end low-latency data platforms connecting sources like Kafka to data lake storage. Come learn how to blend Lakehouse architectural patterns with real-time processing pipelines using Flink and Hudi. We will dive deep into how Flink can leverage the newest features of Hudi, like multi-modal indexing that dramatically improves query and write performance, data skipping that reduces query latency by 10x for large datasets, and many more innovations unique to Flink and Hudi.
by Ethan Guo & Kyle Weller
Communication between microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, and service outages. We will walk through stability patterns like timeouts, circuit breakers, and bulkheads, and discuss how they improve the stability of microservices.
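To give a flavor of the patterns above, here is a minimal circuit-breaker sketch in Python (an illustration of the pattern, not code from the talk; thresholds and timeouts are made-up defaults):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    allow a trial call again after a cool-down period (half-open)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast instead of piling up on a dead service.
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: allow one trial call (half-open).
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Combined with a timeout on the wrapped call, this keeps one slow dependency from exhausting threads in every upstream service.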
ClickHouse on Kubernetes! By Robert Hodges, Altinity CEO (Altinity Ltd)
Slides from Webinar. April 16, 2019
Data services are the latest wave of applications to catch the Kubernetes bug. Altinity is pleased to introduce the ClickHouse operator, which makes it easy to run scalable data warehouses on your favorite Kubernetes distro. This webinar shows how to install the operator and bring up a new data warehouse in three simple steps. We also cover storage management, monitoring, making config changes, and other topics that will help you operate your data warehouse successfully on Kubernetes. There is time for demos and Q&A, so bring your questions. See you online!
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
All About JSON and ClickHouse - Tips, Tricks and New Features, 2022-07-26 (Altinity Ltd)
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
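The bloom filter indexing mentioned at the end can be illustrated with a toy filter (a generic sketch of the data structure; ClickHouse's actual skip indexes operate per granule and use their own hash functions and parameters):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: set k bit positions per added item.
    Lookups may return false positives, never false negatives,
    which is exactly what makes it usable to *skip* data blocks."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as a big int

    def _positions(self, item):
        # Derive k independent positions from a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```

If `might_contain` returns False for a JSON key, the corresponding block cannot contain it and can be skipped without reading it.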
Understanding ProxySQL internals and then interacting with some common features of ProxySQL such as query rewriting, mirroring, failovers, and ProxySQL Cluster
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given for 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
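The classic way memcached is used in a MySQL/PHP stack is the cache-aside pattern; a sketch in Python, with a stand-in client object (its get/set-with-TTL interface is an assumption modeled on typical memcached clients):

```python
class FakeMemcached:
    """Stand-in for a memcached client. Real clients expose a similar
    get/set interface; TTL handling is omitted in this toy version."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value, ttl=60):
        self._store[key] = value


def get_user(user_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the
    database on a miss, then populate the cache for the next reader."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = db[user_id]            # the expensive SQL query in real life
        cache.set(key, user, ttl=300)  # cache for subsequent requests
    return user
```

The database is only hit on a cache miss; every later read of the same key is served from memory.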
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud (Altinity Ltd)
Webinar. August 21, 2019
By Robert Hodges and Altinity Engineering Team
Simplified management is a prerequisite for running any data warehouse at scale. Altinity is developing a new web-based console for ClickHouse called the Altinity Cluster Manager. It's now in beta and offers simplified operation of ClickHouse installations for users. In this webinar we introduce the ACM and demonstrate use on Kubernetes as well as Amazon Web Services. Attendees are welcome to sign up as beta testers and provide feedback. Please join us to see the future of ClickHouse management!
ClickHouse Deep Dive, by Aleksei Milovidov (Altinity Ltd)
This document provides an overview of ClickHouse, an open source column-oriented database management system. It discusses ClickHouse's ability to handle high volumes of event data in real-time, its use of the MergeTree storage engine to sort and merge data efficiently, and how it scales through sharding and distributed tables. The document also covers replication using the ReplicatedMergeTree engine to provide high availability and fault tolerance.
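The MergeTree idea mentioned above, keep each part sorted by the primary key and merge parts in the background into bigger sorted parts, can be sketched in a few lines (a toy model of the principle, not ClickHouse's on-disk implementation):

```python
import heapq

def merge_parts(parts):
    """Toy MergeTree background merge: each part is a list of rows
    already sorted by the primary key (the first tuple element);
    merging produces one larger part that is still sorted, without
    ever re-sorting the whole data set."""
    return list(heapq.merge(*parts, key=lambda row: row[0]))
```

Because every part stays sorted, range queries on the primary key can binary-search inside each part, and merges are linear streaming passes.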
All about ZooKeeper and ClickHouse Keeper (Altinity Ltd)
ClickHouse clusters depend on ZooKeeper to handle replication and distributed DDL commands. In this Altinity webinar, we'll explain why ZooKeeper is necessary, how it works, and introduce the new built-in replacement named ClickHouse Keeper. You'll learn practical tips to care for ZooKeeper in sickness and in health. You'll also learn how and when to use ClickHouse Keeper, and we will share our recommendations for keeping it healthy as well.
Introduction to the Mysteries of ClickHouse Replication, by Robert Hodges and... (Altinity Ltd)
Presented at the webinar, July 31, 2019
Built-in replication is a powerful ClickHouse feature that helps scale data warehouse performance as well as ensure high availability. This webinar will introduce how replication works internally, explain configuration of clusters with replicas, and show you how to set up and manage ZooKeeper, which is necessary for replication to function. We'll finish off by showing useful replication tricks, such as utilizing replication to migrate data between hosts. Join us to become an expert in this important subject!
A Day in the Life of a ClickHouse Query - Webinar Slides (Altinity Ltd)
Why do queries run out of memory? How can I make my queries even faster? How should I size ClickHouse nodes for best cost-efficiency? The key to these questions and many others is knowing what happens inside ClickHouse when a query runs. This webinar is a gentle introduction to ClickHouse internals, focusing on topics that will help your applications run faster and more efficiently. We’ll discuss the basic flow of query execution, dig into how ClickHouse handles aggregation and joins, and show you how ClickHouse distributes processing within a single CPU as well as across many nodes in the network. After attending this webinar you’ll understand how to open up the black box and see what the parts are doing.
[Meetup] A successful migration from Elasticsearch to ClickHouse (Vianney Foucault)
Paris ClickHouse meetup 2019: how Contentsquare successfully migrated to ClickHouse!
Discover the subtleties of a migration to ClickHouse: what to check beforehand, then how to operate ClickHouse in production.
Ramazan Polat gives 10 good reasons to use ClickHouse, including that it has blazing fast inserts and selects that can handle billions of rows sub-second. It scales linearly across machines and compresses data effectively. ClickHouse is also production ready with features like fault tolerance, replication, and integration capabilities. It has powerful table functions like arrays, nested columns, and materialized views. ClickHouse also has a great SQL implementation and ecosystem.
Postgres Vision 2018: WAL: Everything You Want to Know (EDB)
The document is a presentation about PostgreSQL's Write-Ahead Log (WAL) system. It discusses what the WAL is, how it works, and how it is used for tasks like replication, backup and point-in-time recovery. The WAL logs all transactions to prevent data loss during crashes and ensures data integrity. It is critical for high availability and disaster recovery capabilities in PostgreSQL.
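The core WAL principle, append the change to a durable log before applying it so the state can always be rebuilt by replay after a crash, can be sketched as follows (a toy model of the mechanism, not PostgreSQL code):

```python
def apply(state, record):
    """Apply one logged operation to an in-memory state."""
    op, key, value = record
    if op == "set":
        state[key] = value
    elif op == "del":
        state.pop(key, None)


class WalStore:
    """Toy write-ahead log: every change hits the log first, then the
    data. In a real system the log append would be fsync'd to disk."""

    def __init__(self):
        self.log = []   # the durable, append-only WAL
        self.data = {}  # the current state (table heap, in PostgreSQL terms)

    def set(self, key, value):
        self.log.append(("set", key, value))  # log first...
        apply(self.data, self.log[-1])        # ...then apply

    def delete(self, key):
        self.log.append(("del", key, None))
        apply(self.data, self.log[-1])

    def recover(self):
        """Crash recovery: rebuild the state purely from the log."""
        state = {}
        for record in self.log:
            apply(state, record)
        return state
```

The same replay mechanism is what makes streaming replication and point-in-time recovery possible: a standby is simply a server that keeps replaying the primary's log.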
PostgreSQL + Kafka: The Delight of Change Data Capture (Jeff Klukas)
PostgreSQL is an open source relational database. Kafka is an open source log-based messaging system. Because both systems are powerful and flexible, they’re devouring whole categories of infrastructure. And they’re even better together.
In this talk, you’ll learn about commit logs and how that fundamental data structure underlies both PostgreSQL and Kafka. We’ll use that basis to understand what Kafka is, what advantages it has over traditional messaging systems, and why it’s perfect for modeling database tables as streams. From there, we’ll introduce the concept of change data capture (CDC) and run a live demo of Bottled Water, an open source CDC pipeline, watching INSERT, UPDATE, and DELETE operations in PostgreSQL stream into Kafka. We’ll wrap up with a discussion of use cases for this pipeline: messaging between systems with transactional guarantees, transmitting database changes to a data warehouse, and stream processing.
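The CDC idea the demo shows, every table mutation also becomes an event on an append-only stream, can be sketched as follows (a toy model of the concept; Bottled Water actually decodes PostgreSQL's WAL rather than intercepting writes):

```python
class ChangeCapturingTable:
    """Toy change data capture: each INSERT/UPDATE/DELETE on the table
    is also emitted as an event on an append-only stream, the way a
    CDC pipeline publishes row changes to a Kafka topic."""

    def __init__(self):
        self.rows = {}     # current table contents, keyed by primary key
        self.stream = []   # stands in for the Kafka topic

    def insert(self, pk, row):
        self.rows[pk] = row
        self.stream.append({"op": "insert", "pk": pk, "after": row})

    def update(self, pk, row):
        before = self.rows[pk]
        self.rows[pk] = row
        self.stream.append({"op": "update", "pk": pk,
                            "before": before, "after": row})

    def delete(self, pk):
        before = self.rows.pop(pk)
        self.stream.append({"op": "delete", "pk": pk, "before": before})


def replay(stream):
    """A downstream consumer (e.g. a warehouse loader) can rebuild the
    table state purely from the change stream."""
    state = {}
    for event in stream:
        if event["op"] in ("insert", "update"):
            state[event["pk"]] = event["after"]
        else:
            state.pop(event["pk"], None)
    return state
```

That `replay` function is the punchline of log-based CDC: any consumer that sees the full ordered stream can reconstruct the table, which is what makes database-to-warehouse pipelines and stream processing on table changes work.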
This is the presentation delivered by Karthik.P.R at MySQL User Camp Bangalore on 09th June 2017. ProxySQL is a high-performance MySQL load balancer designed to scale database servers.
This document discusses optimizing Spark write-heavy workloads on S3 object storage. It describes problems with eventual consistency, renames, and failures when writing to S3. It then presents several solutions implemented at Qubole to improve the performance of Spark writes to Hive tables. These optimizations include parallelizing renames, writing directly to the Hive warehouse location, and making partition recovery faster by using more efficient S3 listing. Performance improvements of up to 7x were achieved.
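The parallel-rename optimization can be sketched as follows (an illustrative sketch; `rename_one` is a hypothetical placeholder for the real object-store call, since on S3 a "rename" is a copy plus delete and per-object latency dominates):

```python
from concurrent.futures import ThreadPoolExecutor

def rename_all(renames, rename_one, max_workers=16):
    """Issue per-file renames concurrently instead of one by one.
    Each rename on S3 is a high-latency copy+delete, so running them
    in a thread pool gives a near-linear speedup for large commits.

    renames    -- iterable of (src, dst) pairs
    rename_one -- callable doing the actual object-store rename
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(rename_one, src, dst) for src, dst in renames]
        for f in futures:
            f.result()  # re-raise here if any rename failed
```

Waiting on every future (rather than fire-and-forget) matters: a commit must surface any individual failure instead of silently leaving the output half-moved.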
The document is a slide presentation on MongoDB that introduces the topic and provides an overview. It defines MongoDB as a document-oriented, open source database that provides high performance, high availability, and easy scalability. It also discusses MongoDB's use for big data applications, how it is non-relational and stores data as JSON-like documents in collections without a defined schema. The presentation provides steps for installing MongoDB and describes some basic concepts like databases, collections, documents and commands.
Deep Dive on ClickHouse Sharding and Replication, 2022-09-22 (Altinity Ltd)
Join the Altinity experts as we dig into ClickHouse sharding and replication, showing how they enable clusters that deliver fast queries over petabytes of data. We’ll start with basic definitions of each, then move to practical issues. This includes the setup of shards and replicas, defining schema, choosing sharding keys, loading data, and writing distributed queries. We’ll finish up with tips on performance optimization.
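Routing rows by a sharding key and then answering queries scatter-gather style can be sketched as follows (a toy model of the two ideas; ClickHouse Distributed tables hash their configured sharding expression and merge partial results with much more sophistication):

```python
def shard_for(key, num_shards):
    """Pick a shard from the sharding key. Python's hash() stands in
    for the real hash of the sharding expression; all rows with the
    same key always land on the same shard."""
    return hash(key) % num_shards


class ShardedTable:
    """Toy sharded table: inserts are routed by key, queries are run
    on every shard and the partial results are combined (scatter-gather)."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]

    def insert(self, key, row):
        self.shards[shard_for(key, len(self.shards))].append(row)

    def count_where(self, pred):
        # Scatter: evaluate on each shard. Gather: sum the partial counts.
        return sum(sum(1 for r in shard if pred(r)) for shard in self.shards)
```

Choosing the sharding key well is what makes this fast: a key that distributes rows evenly keeps every shard doing an equal share of the work.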
#ClickHouse #datasets #ClickHouseTutorial #opensource #ClickHouseCommunity #Altinity
-----------------
Join ClickHouse Meetups: https://www.meetup.com/San-Francisco-...
Check out more ClickHouse resources: https://altinity.com/resources/
Visit the Altinity Documentation site: https://docs.altinity.com/
Contribute to ClickHouse Knowledge Base: https://kb.altinity.com/
Join the ClickHouse Reddit community: https://www.reddit.com/r/Clickhouse/
----------------
Learn more about Altinity!
Site: https://www.altinity.com
LinkedIn: https://www.linkedin.com/company/alti...
Twitter: https://twitter.com/AltinityDB
pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data in PostgreSQL.
The talk will cover the history, the logic behind the available functions, and will give an interactive usage example.
The ninja elephant, scaling the analytics database in Transferwise (Federico Campoli)
Business intelligence and analytics are the core of any great company, and Transferwise is no exception.
The talk will start with a brief history on the legacy analytics implemented with MySQL and how we scaled up the performance using PostgreSQL. In order to get fresh data from the core MySQL databases in real time we used a modified version of pg_chameleon which also obfuscated the PII data.
The talk will also cover the challenges and the lessons learned by the developers and analysts when bridging MySQL with PostgreSQL.
PostgreSQL - backup and recovery with large databases (Federico Campoli)
Life on a rollercoaster, backup and recovery with large databases
Dealing with large databases is always a challenge.
The backup and HA procedures evolve as the database installation grows over time.
The talk will cover the problems solved by the DBA in four years of working with large databases, whose size increased from a 1.7 TB single cluster up to 40 TB in a multi-shard environment.
The talk will cover both disaster recovery with pg_dump and high availability with log shipping/streaming replication.
The presentation is based on a real story. The names are changed in order to protect the innocent.
The document discusses PostgreSQL's internal architecture and components. It describes the data area, which stores data files on disk, and key directories like pg_xlog for write-ahead logs. It explains the buffer cache and clock sweep algorithm for managing memory, and covers the multi-version concurrency control (MVCC) which allows simultaneous transactions. TOAST storage is also summarized, which stores large data values externally.
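The MVCC visibility idea summarized above, each row version carries the id of the transaction that created it and the one that deleted it, can be sketched as follows (a simplified model; PostgreSQL's real visibility rules also account for commit status, in-progress transactions, hint bits, and more):

```python
class MvccTable:
    """Toy MVCC: every row version stores xmin (creating transaction id)
    and xmax (deleting transaction id, or None if live). A snapshot
    taken by transaction T sees a version iff xmin <= T and
    (xmax is None or xmax > T). Deletes never overwrite data; they
    just stamp xmax, which is why concurrent readers are never blocked."""

    def __init__(self):
        self.versions = []  # each entry: [xmin, xmax, value]

    def insert(self, xid, value):
        self.versions.append([xid, None, value])

    def delete(self, xid, value):
        for v in self.versions:
            if v[2] == value and v[1] is None:
                v[1] = xid  # mark deleted as of transaction xid

    def snapshot(self, xid):
        """Rows visible to a transaction with id xid."""
        return [v[2] for v in self.versions
                if v[0] <= xid and (v[1] is None or v[1] > xid)]
```

Old versions whose xmax is behind every active snapshot are what VACUUM eventually reclaims.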
PostgreSQL is one of the finest database systems available.
The talk will cover the history, the basic concepts of the PostgreSQL architecture, and how the community behind "the most advanced open source database" works.
The document discusses PostgreSQL's physical storage structure. It describes the various directories within the PGDATA directory that stores the database, including the global directory containing shared objects and the critical pg_control file, the base directory containing numeric files for each database, the pg_tblspc directory containing symbolic links to tablespaces, and the pg_xlog directory which contains write-ahead log (WAL) segments that are critical for database writes and recovery. It notes that tablespaces allow spreading database objects across different storage devices to optimize performance.
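The layout described above can be captured in a small path helper (a sketch under stated assumptions: real paths under pg_tblspc also include a catalog-version directory such as PG_15_202209061, omitted here, and relations larger than 1 GB are split into numbered segment files):

```python
def relation_path(tablespace_oid, database_oid, relfilenode,
                  default_tablespace_oid=1663):
    """Toy mapping from a relation's identifiers to its file path
    relative to PGDATA. Relations in the default tablespace live
    under base/<database oid>/<relfilenode>; relations in a custom
    tablespace are reached through the pg_tblspc symlink directory."""
    if tablespace_oid in (0, default_tablespace_oid):
        return f"base/{database_oid}/{relfilenode}"
    return f"pg_tblspc/{tablespace_oid}/{database_oid}/{relfilenode}"
```

This is why moving a table to a tablespace on faster storage changes only which symlinked directory its files live under, not how the server addresses them.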
Slides from the Brighton PostgreSQL meetup presentation. An all around PostgreSQL exploration. The rocky physical layer, the treacherous MVCC’s swamp and the buffer manager’s garden.
- The document discusses pgpool, an open source connection pooler and replication manager for PostgreSQL.
- It describes the history and developers of pgpool, including the ongoing pgpool-II project. Key features of pgpool include connection pooling, synchronous replication, and load balancing of queries across backend PostgreSQL servers.
- The pgpool-II project aims to enhance pgpool with parallel query processing and improved management capabilities like supporting more than two database nodes and a GUI administration tool.
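The connection pooling feature listed above can be sketched in a few lines (a toy model of the idea; pgpool's real pooler multiplexes many clients over per-backend connection caches, and `make_conn` here is a placeholder for an actual PostgreSQL connect call):

```python
import queue

class ConnectionPool:
    """Toy connection pool: a fixed set of backend connections is
    created up front and handed out/returned, instead of paying the
    cost of a new database connection for every client request."""

    def __init__(self, make_conn, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_conn())  # pre-open the backend connections

    def acquire(self):
        # Blocks when all connections are checked out, which also
        # caps the number of concurrent backend sessions.
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)
```

Capping the pool size doubles as protection for the database server: no matter how many clients arrive, the backend never sees more than `size` sessions.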
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016 (Alexander Lisachenko)
Talk about solving cross-cutting concerns in PHP at DutchPHP Conference.
Discussed questions:
1) OOP features and limitations
2) OOP patterns for solving cross-cutting concerns
3) Aspect-Oriented approach for solving cross-cutting concerns
4) Examples of using AOP for real-life applications
Distributed System explained (with Java Microservices) (Mario Romano)
Since I've been working on distributed systems I always wanted to go back in time and teach myself what I know now, in order to avoid the silly mistakes I made. Things like vector clocks, the CAP theorem, how replication really works and why it's needed! This is the speech I needed when I wrote my first distributed system, and it's something you need to know if you fancy working in this area. In this talk you will understand these concepts using simple Java microservices talking to each other, using the three different architectures proposed by the CAP theorem. After a quick introduction to the theory, we will start looking at the code and running demos. Services will fail, the network will be partitioned... what will the winning architecture be?
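Vector clocks, one of the concepts mentioned, can be sketched in three small functions (a generic textbook sketch, not code from the talk):

```python
def vc_merge(a, b):
    """Merge two vector clocks: element-wise maximum. This is what a
    node does when it receives a message carrying the sender's clock."""
    keys = set(a) | set(b)
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in keys}


def vc_happened_before(a, b):
    """a -> b (a causally precedes b) iff a <= b element-wise and a != b."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def vc_concurrent(a, b):
    """Two events are concurrent when neither causally precedes the other;
    this is exactly the case where replicas hold conflicting updates."""
    return (a != b
            and not vc_happened_before(a, b)
            and not vc_happened_before(b, a))
```

Detecting concurrency is the practical payoff: a replica that receives an update whose clock is concurrent with its own knows it has a genuine conflict to resolve, not a stale copy to overwrite.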
JPA Week 3: Entity Mapping / Hexagonal Architecture (Covenant Ko)
The document discusses Hexagonal Architecture and its principles. It explains that the core domain layer should not depend on other layers like the data layer. It provides examples of package structures for Hexagonal Architecture and sample code that separates ports and adapters. Case studies are presented on how companies have implemented Hexagonal Architecture for microservices and APIs.
Infrastructure as code might be literally impossible, part 2 (ice799)
The document discusses various issues with infrastructure as code including complexities that arise from software licenses, bugs, and inconsistencies across tools and platforms. Specific examples covered include problems with SSL and APT package management on Debian/Ubuntu, Linux networking configuration difficulties, and inconsistencies in Python packaging related to naming conventions for packages containing hyphens, underscores, or periods. Potential causes discussed include legacy code, lack of time for thorough testing and bug fixing, and economic pressures against developing fully working software systems.
Managing your own PostgreSQL servers is sometimes a burden your business does not want. In this talk we will provide an overview of some of the public cloud offerings available for hosted PostgreSQL and discuss a number of strategies for migrating your databases with a minimum of downtime.
The document discusses PostgreSQL and its capabilities. It describes how PostgreSQL was created in 1982 and became open source in 1996. It discusses PostgreSQL's support for large databases, high-performance transactions using MVCC, ACID compliance, and its ability to run on most operating systems. The document also covers PostgreSQL's JSON and NoSQL capabilities and provides performance comparisons of JSON, JSONB and text fields.
PuppetConf 2015 - Puppet Reporting with Elasticsearch, Logstash and Kibana (pkill)
Answer deep questions about the health of configuration runs on your nodes with the popular Elasticsearch, Logstash and Kibana stack. While many questions about resources, catalogs and runtimes can be answered by using the Puppet Dashboard or Puppet Enterprise, there are limitations. Putting the reports and run metrics into Elasticsearch gives users full text search and filtering. Also, you can perform metrics and aggregations over resource numbers or run times. Kibana graphs are also a great way to supplement the dashboards available in Puppet Enterprise.
This presentation is an introduction to the latest DevOps activities at MySQL over the last two years related to setting up software repositories for Linux distro users. Now, users can configure MySQL software repositories and upgrade to the latest versions of MySQL products without having to upgrade the operating system.
pg_chameleon, MySQL to PostgreSQL replica made easy (Federico Campoli)
Federico Campoli developed pg_chameleon to replicate data from MySQL to PostgreSQL. He has been passionate about IT since 1982 and loves PostgreSQL. pg_chameleon version 2.0 allows replication of multiple MySQL schemas into a PostgreSQL database. It uses two subprocesses to concurrently read the changes from MySQL and replay them into PostgreSQL. The presentation covered pg_chameleon's history, how the replica works, setup instructions, and a demo. Future development plans include parallelizing the initial load to speed it up and adding logical replication from PostgreSQL.
The document discusses backup and recovery strategies in PostgreSQL. It describes logical backups using pg_dump, which takes a snapshot of the database and outputs SQL scripts or custom files. It also describes physical backups using write-ahead logging (WAL) archiving and point-in-time recovery (PITR). With WAL archiving enabled, PostgreSQL archives WAL files, allowing recovery to any point between backups by restoring the backup files and replaying the WAL logs. The document provides steps for performing PITR backups, including starting the backup, copying files, stopping the backup, and recovery by restoring files and using a recovery.conf file.
This document is an introduction to PostgreSQL presented by Federico Campoli to the Brighton PostgreSQL Users Group. It covers the history and development of PostgreSQL, its features including data types, JSON/JSONB support, and performance comparisons. The presentation includes sections on the history of PostgreSQL, its features and capabilities, NOSQL support using JSON/JSONB, and concludes with a wrap up on PostgreSQL and related projects.
This document discusses PostgreSQL point-in-time recovery (PITR). It explains that to enable PITR, the archive_mode must be enabled, WAL archiving must occur, and backups of the data directory and WAL archives are needed. During recovery, the data directory is restored, a recovery.conf file is created to set the restore_command and recovery target, and WAL files are replayed to recover to the desired point in time.
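The recovery step described above, restore the base backup and replay archived WAL up to a target point, can be sketched as follows (a toy model of the mechanism behind recovery_target_time, not PostgreSQL code; WAL records are simplified to timestamped key/value writes):

```python
def recover_to(base_backup, wal_records, target_time):
    """Toy point-in-time recovery: start from the base backup state and
    replay archived WAL records in order, stopping after the target
    timestamp. Everything written after target_time is discarded,
    which is how PITR rewinds past a bad deployment or a dropped table.

    base_backup -- dict snapshot taken at backup time
    wal_records -- iterable of (timestamp, key, value) writes
    target_time -- replay records with timestamp <= target_time
    """
    state = dict(base_backup)  # never mutate the backup itself
    for ts, key, value in sorted(wal_records):
        if ts > target_time:
            break  # recovery target reached; ignore later records
        state[key] = value
    return state
```

Choosing a different `target_time` against the same backup and archive yields any intermediate state, which is exactly why the WAL archive must be continuous between base backups.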
The document discusses PostgreSQL query planning and tuning. It covers the key stages of query execution including syntax validation, query tree generation, plan estimation, and execution. It describes different plan nodes like sequential scans, index scans, joins, and sorts. It emphasizes using EXPLAIN to view and analyze the execution plan for a query, which can help identify performance issues and opportunities for optimization. EXPLAIN shows the estimated plan while EXPLAIN ANALYZE shows the actual plan after executing the query.
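The plan-estimation step can be illustrated with toy planner arithmetic (an illustration only: the cost constants below are invented, while real PostgreSQL uses settings such as seq_page_cost, random_page_cost and cpu_tuple_cost together with table statistics):

```python
def estimate_cost(rows, selectivity, page_cost=1.0, cpu_cost=0.01,
                  rows_per_page=100, index_depth=3):
    """Toy cost model comparing two plan nodes for the same filter:
    - seq scan: read every page, apply the predicate to every row;
    - index scan: descend the index, then fetch only matching rows,
      paying a higher (random I/O) cost per fetched row."""
    pages = rows / rows_per_page
    seq_scan = pages * page_cost + rows * cpu_cost
    matching = rows * selectivity
    index_scan = index_depth * page_cost + matching * (page_cost * 4 + cpu_cost)
    return {"seq_scan": seq_scan, "index_scan": index_scan}


def choose_plan(rows, selectivity):
    """Pick the cheaper plan, as the planner does among its candidates."""
    costs = estimate_cost(rows, selectivity)
    return min(costs, key=costs.get)
```

Even this crude model reproduces the behavior EXPLAIN makes visible: highly selective predicates favor the index, while a predicate matching most of the table makes the sequential scan cheaper.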
PostgreSQL negli ultimi anni ha aggiunto funzionalita’ “nosql” ACID compliant e si propone con forza quale attore nell’era di big data.
Dopo una rapida introduzione ai dati schemaless HSTORE e JSON verranno illustrate le problematiche correlate usando un caso reale.
The paperback version is available on lulu.com there http://goo.gl/fraa8o
This is the first volume of the postgresql database administration book. The book covers the steps for installing, configuring and administering a PostgreSQL 9.3 on Linux debian. The book covers the logical and physical aspect of PostgreSQL. Two chapters are dedicated to the backup/restore topic.
cloudgenesis cloud workshop , gdg on campus mitasiyaldhande02
Step into the future of cloud computing with CloudGenesis, a power-packed workshop curated by GDG on Campus MITA, designed to equip students and aspiring cloud professionals with hands-on experience in Google Cloud Platform (GCP), Microsoft Azure, and Azure Al services.
This workshop offers a rare opportunity to explore real-world multi-cloud strategies, dive deep into cloud deployment practices, and harness the potential of Al-powered cloud solutions. Through guided labs and live demonstrations, participants will gain valuable exposure to both platforms- enabling them to think beyond silos and embrace a cross-cloud approach to
development and innovation.
Introducing FME Realize: A New Era of Spatial Computing and ARSafe Software
A new era for the FME Platform has arrived – and it’s taking data into the real world.
Meet FME Realize: marking a new chapter in how organizations connect digital information with the physical environment around them. With the addition of FME Realize, FME has evolved into an All-data, Any-AI Spatial Computing Platform.
FME Realize brings spatial computing, augmented reality (AR), and the full power of FME to mobile teams: making it easy to visualize, interact with, and update data right in the field. From infrastructure management to asset inspections, you can put any data into real-world context, instantly.
Join us to discover how spatial computing, powered by FME, enables digital twins, AI-driven insights, and real-time field interactions: all through an intuitive no-code experience.
In this one-hour webinar, you’ll:
-Explore what FME Realize includes and how it fits into the FME Platform
-Learn how to deliver real-time AR experiences, fast
-See how FME enables live, contextual interactions with enterprise data across systems
-See demos, including ones you can try yourself
-Get tutorials and downloadable resources to help you start right away
Whether you’re exploring spatial computing for the first time or looking to scale AR across your organization, this session will give you the tools and insights to get started with confidence.
Fully Open-Source Private Clouds: Freedom, Security, and ControlShapeBlue
In this presentation, Swen Brüseke introduced proIO's strategy for 100% open-source driven private clouds. proIO leverage the proven technologies of CloudStack and LINBIT, complemented by professional maintenance contracts, to provide you with a secure, flexible, and high-performance IT infrastructure. He highlighted the advantages of private clouds compared to public cloud offerings and explain why CloudStack is in many cases a superior solution to Proxmox.
--
The CloudStack European User Group 2025 took place on May 8th in Vienna, Austria. The event once again brought together open-source cloud professionals, contributors, developers, and users for a day of deep technical insights, knowledge sharing, and community connection.
Offshore IT Support: Balancing In-House and Offshore Help Desk Techniciansjohn823664
In today's always-on digital environment, businesses must deliver seamless IT support across time zones, devices, and departments. This SlideShare explores how companies can strategically combine in-house expertise with offshore talent to build a high-performing, cost-efficient help desk operation.
From the benefits and challenges of offshore support to practical models for integrating global teams, this presentation offers insights, real-world examples, and key metrics for success. Whether you're scaling a startup or optimizing enterprise support, discover how to balance cost, quality, and responsiveness with a hybrid IT support strategy.
Perfect for IT managers, operations leads, and business owners considering global help desk solutions.
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Lorenzo Miniero
Slides for my "Multistream support in the Janus SIP and NoSIP plugins" presentation at the OpenSIPS Summit 2025 event.
They describe my efforts refactoring the Janus SIP and NoSIP plugins to allow for the gatewaying of an arbitrary number of audio/video streams per call (thus breaking the current 1-audio/1-video limitation), plus some additional considerations on what this could mean when dealing with application protocols negotiated via SIP as well.
State-Dependent Conformal Perception Bounds for Neuro-Symbolic Verification o...Ivan Ruchkin
A poster presented by Thomas Waite and Radoslav Ivanov at the 2nd International Conference on Neuro-symbolic Systems (NeuS) in May 2025.
Paper: https://arxiv.org/abs/2502.21308
Abstract: It remains a challenge to provide safety guarantees for autonomous systems with neural perception and control. A typical approach obtains symbolic bounds on perception error (e.g., using conformal prediction) and performs verification under these bounds. However, these bounds can lead to drastic conservatism in the resulting end-to-end safety guarantee. This paper proposes an approach to synthesize symbolic perception error bounds that serve as an optimal interface between perception performance and control verification. The key idea is to consider our error bounds to be heteroskedastic with respect to the system's state -- not time like in previous approaches. These bounds can be obtained with two gradient-free optimization algorithms. We demonstrate that our bounds lead to tighter safety guarantees than the state-of-the-art in a case study on a mountain car.
For those who have ever wanted to recreate classic games, this presentation covers my five-year journey to build a NES emulator in Kotlin. Starting from scratch in 2020 (you can probably guess why), I’ll share the challenges posed by the architecture of old hardware, performance optimization (surprise, surprise), and the difficulties of emulating sound. I’ll also highlight which Kotlin features shine (and why concurrency isn’t one of them). This high-level overview will walk through each step of the process—from reading ROM formats to where GPT can help, though it won’t write the code for us just yet. We’ll wrap up by launching Mario on the emulator (hopefully without a call from Nintendo).
Adtran’s new Ensemble Cloudlet vRouter solution gives service providers a smarter way to replace aging edge routers. With virtual routing, cloud-hosted management and optional design services, the platform makes it easy to deliver high-performance Layer 3 services at lower cost. Discover how this turnkey, subscription-based solution accelerates deployment, supports hosted VNFs and helps boost enterprise ARPU.
Content and eLearning Standards: Finding the Best Fit for Your-TrainingRustici Software
Tammy Rutherford, Managing Director of Rustici Software, walks through the pros and cons of different standards to better understand which standard is best for your content and chosen technologies.
New Ways to Reduce Database Costs with ScyllaDBScyllaDB
How ScyllaDB’s latest capabilities can reduce your infrastructure costs
ScyllaDB has been obsessed with price-performance from day 1. Our core database is architected with low-level engineering optimizations that squeeze every ounce of power from the underlying infrastructure. And we just completed a multi-year effort to introduce a set of new capabilities for additional savings.
Join this webinar to learn about these new capabilities: the underlying challenges we wanted to address, the workloads that will benefit most from each, and how to get started. We’ll cover ways to:
- Avoid overprovisioning with “just-in-time” scaling
- Safely operate at up to ~90% storage utilization
- Cut network costs with new compression strategies and file-based streaming
We’ll also highlight a “hidden gem” capability that lets you safely balance multiple workloads in a single cluster. To conclude, we will share the efficiency-focused capabilities on our short-term and long-term roadmaps.
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 ProfessioKari Kakkonen
My slides at Professio Testaus ja AI 2025 seminar in Espoo, Finland.
Deck in English, even though I talked in Finnish this time, in addition to chairing the event.
I discuss the different motivations for testing to use AI tools to help in testing, and give several examples in each categories, some open source, some commercial.
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AIPeter Spielvogel
Explore how AI in SAP Fiori apps enhances productivity and collaboration. Learn best practices for SAPUI5, Fiori elements, and tools to build enterprise-grade apps efficiently. Discover practical tips to deploy apps quickly, leveraging AI, and bring your questions for a deep dive into innovative solutions.
Adtran’s SDG 9000 Series brings high-performance, cloud-managed Wi-Fi 7 to homes, businesses and public spaces. Built on a unified SmartOS platform, the portfolio includes outdoor access points, ceiling-mount APs and a 10G PoE router. Intellifi and Mosaic One simplify deployment, deliver AI-driven insights and unlock powerful new revenue streams for service providers.
"AI in the browser: predicting user actions in real time with TensorflowJS", ...Fwdays
With AI becoming increasingly present in our everyday lives, the latest advancements in the field now make it easier than ever to integrate it into our software projects. In this session, we’ll explore how machine learning models can be embedded directly into front-end applications. We'll walk through practical examples, including running basic models such as linear regression and random forest classifiers, all within the browser environment.
Once we grasp the fundamentals of running ML models on the client side, we’ll dive into real-world use cases for web applications—ranging from real-time data classification and interpolation to object tracking in the browser. We'll also introduce a novel approach: dynamically optimizing web applications by predicting user behavior in real time using a machine learning model. This opens the door to smarter, more adaptive user experiences and can significantly improve both performance and engagement.
In addition to the technical insights, we’ll also touch on best practices, potential challenges, and the tools that make browser-based machine learning development more accessible. Whether you're a developer looking to experiment with ML or someone aiming to bring more intelligence into your web apps, this session will offer practical takeaways and inspiration for your next project.
"AI in the browser: predicting user actions in real time with TensorflowJS", ...Fwdays
pg_chameleon MySQL to PostgreSQL replica made easy
1. pg chameleon
MySQL to PostgreSQL replica made easy
Federico Campoli
Transferwise
PGCon, Ottawa
01 Jun 2018
http://www.pgdba.org
@4thdoctor_scarf
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa, 01 Jun 2018 1 / 46
4. Few words about the speaker
Born in 1972
Passionate about IT since 1982
mostly because of the TRON movie
Joined the Oracle DBA secret society in 2004
In love with PostgreSQL since 2006
Devrim PostgreSQL tattoo’s copycat
Works at Transferwise as Data Engineer
8. Disclaimer
I’m not a developer
I’m a DBA...which means being hated by everybody and hating everybody
So, to put things in the right perspective...I use tabs
10. Table of contents
1 History
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
14. The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However phpBB didn't work very well with PostgreSQL.¹
The script is written in Python 2.6
It's a monolithic script
And it's slow, very slow
It's a good checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
¹ Opening a new connection for each query is not the smartest thing to do.
16. I'm not scared of using the ORMs
Years 2013/2015
First attempt at pg chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL's metadata
Proof of concept only
It was built during the years of the life on a roller coaster²
Therefore it was just a way to discharge frustration
Abandoned after a while
SQLAlchemy's limitations were frustrating as well (see slide 3)
And pgloader did the same job much, much better
² Recording available here: http://www.pgbrighton.uk/post/backup_recovery/
18. pg chameleon reborn
Year 2016
I needed to replicate the data from MySQL to PostgreSQL
http://tech.transferwise.com/scaling-our-analytics-database/
The amazing library python-mysql-replication allowed me to build a proof of concept
Evolved later into pg chameleon 1.x
Kudos to the python-mysql-replication team!
https://github.com/noplay/python-mysql-replication
20. pg chameleon 1.x
Developed on the London to Brighton commute
Released as stable on the 7th of May 2017
Followed by 8 bugfix releases
Compatible with CPython 2.7/3.3+
No more SQLAlchemy
The MySQL driver changed from MySQLdb to PyMySQL
Command line helper
Supports type override on the fly (danger!)
Installs in a virtualenv and system wide via pypi
Can detach the replica for minimal downtime migrations
23. pg chameleon version 1's limitations
All the affected tables are locked in read-only mode during the init replica process
During the init replica the data is not accessible
Tables to be replicated require primary keys
No daemon, the process always stays in the foreground
Single schema replica
One process per schema
Network inefficient
Read and replay are not concurrent, with a risk of high lag
The optional threaded mode is very inefficient and fragile
A single error in the replay process and the replica is broken
24. MySQL Replica in a nutshell
25. MySQL Replica
The MySQL replica is logical
When the replica is enabled the data changes are stored in the master's binary log files
The slave gets the changes from the master's binary log files
The slave saves the stream of data into local relay logs
The relay logs are replayed against the slave
27. Log formats
MySQL has three ways of storing the changes in the binary logs.
STATEMENT: logs the statements, which are replayed on the slave. It's the best solution for bandwidth. However, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. an insert with a column autogenerated by the uuid function).
ROW: deterministic. This format logs the row images.
MIXED: takes the best of both worlds. The master logs the statements unless a non-deterministic function is used; in that case it logs the row image.
All three formats always log the DDL as query events.
The python-mysql-replication library, and therefore pg chameleon, requires the ROW format to work properly.
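To illustrate why the ROW format matters, here is a small Python sketch (illustrative only, not pg chameleon code): replaying a statement with a non-deterministic function such as uuid() re-evaluates the function on each replica, while a row image ships the concrete values the master actually stored.

```python
import uuid

def replay_statement():
    """Simulate replaying "INSERT ... VALUES (uuid())" in STATEMENT format:
    each server re-evaluates uuid() and gets its own value."""
    return {"id": str(uuid.uuid4())}

def replay_row_image(row):
    """Simulate ROW format: the binlog carries the concrete row image,
    so the slave stores exactly what the master stored."""
    return dict(row)

master = replay_statement()
slave = replay_statement()
print(master != slave)  # the statement replay diverges

row = {"id": str(uuid.uuid4())}       # the row image written on the master
print(replay_row_image(row) == row)   # the row replay is deterministic
```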
28. A chameleon in the middle
29. pg chameleon
pg chameleon mimics a MySQL slave's behaviour
It performs the initial load for the replicated tables
It connects to the MySQL replica protocol
It stores the row images into a PostgreSQL table
A PL/pgSQL function decodes the rows and replays the changes
It can detach the replica for minimal downtime migrations
PostgreSQL acts as relay log and replication slave
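The store-then-replay design above can be sketched with a toy model, where an in-memory list stands in for the PostgreSQL log table and a Python function stands in for the PL/pgSQL replay step (the names and data model are hypothetical, not pg chameleon's internals):

```python
# Hypothetical sketch of the relay design described above.
relay_table = []   # stands in for the PostgreSQL table holding row images
target = {}        # stands in for the replicated table (primary key -> row)

def store_row_image(event_type, row):
    # the read process appends raw row images, exactly as received
    relay_table.append((event_type, row))

def replay():
    # the replay step decodes the queued images and applies the changes
    while relay_table:
        event_type, row = relay_table.pop(0)
        if event_type in ("insert", "update"):
            target[row["id"]] = row
        elif event_type == "delete":
            target.pop(row["id"], None)

store_row_image("insert", {"id": 1, "name": "igor"})
store_row_image("update", {"id": 1, "name": "Igor"})
store_row_image("delete", {"id": 1})
replay()
print(target)  # {}
```

Separating the store and replay steps is what lets the two subprocesses of version 2.0 read and apply concurrently.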
30. MySQL replica + pg chameleon
31. pg chameleon 2.0 #1
Developed at pgconf.eu 2017 and on the commute
Released as stable on the 1st of January 2018
Compatible with Python 3.3+
Installs in a virtualenv and system wide via pypi
Replicates multiple schemas from a single MySQL into a target PostgreSQL database
Conservative approach to the replica: tables which generate errors are automatically excluded from the replica
Daemonised replica process with two distinct subprocesses, for concurrent read and replay
32. pg chameleon 2.0 #2
Soft locking replica initialisation: the tables are locked only during the copy
Rollbar integration for simpler error detection and messaging
Experimental support for the PostgreSQL source type
The tables are loaded in a separate schema which is swapped with the existing one. This approach requires more space but it makes the init replica virtually painless, leaving the old data accessible until the init replica is complete.
The DDL is translated into the PostgreSQL dialect, keeping the schema in sync with MySQL automatically
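The load-then-swap approach can be sketched as follows (a toy model where a dict stands in for the database; the schema names and the swap steps are hypothetical, not the tool's actual implementation):

```python
# Hypothetical sketch of the load-then-swap initialisation described above.
schemas = {"sakila": {"film": ["old rows"]}}

def init_replica(schemas, target="sakila", loading="_sakila_tmp"):
    # copy into a separate schema while the old data stays readable
    schemas[loading] = {"film": ["freshly copied rows"]}
    # once the copy is complete, swap the schemas in one step
    schemas[f"{target}_old"] = schemas.pop(target)
    schemas[target] = schemas.pop(loading)

init_replica(schemas)
print(schemas["sakila"]["film"])  # ['freshly copied rows']
```

The cost is the extra disk space needed to hold both copies until the swap happens.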
33. Version 2.0's limitations
Tables to be replicated require primary or unique keys
When detaching the replica the foreign keys are always created ON DELETE/UPDATE RESTRICT
The PostgreSQL source type supports only the init replica process
35. Replica initialisation
The replica initialisation follows the same workflow as stated in the MySQL online manual.
Flush the tables with read lock
Get the master's coordinates
Copy the data
Release the locks
However...
pg chameleon flushes the tables with read lock one by one. The lock is held only during the copy.
The log coordinates are stored in the replica catalogue along with the table's name, and used by the replica process to determine whether the table's binlog data should be used or not.
The replica starts inconsistent and gains consistency over time.
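The per-table coordinate check can be modelled like this (a toy sketch, not the actual replica catalogue): each table stores the binlog coordinates captured when it was copied, and the replay process skips events that predate them.

```python
# Toy model of the per-table consistency check described above.
table_coords = {
    "film":  ("mysql-bin.000012", 1500),   # coordinates when film was copied
    "actor": ("mysql-bin.000013", 200),    # actor was copied later
}

def should_replay(table, log_file, log_pos):
    # binlog file names sort lexicographically, positions numerically,
    # so a tuple comparison orders the events correctly
    return (log_file, log_pos) > table_coords[table]

print(should_replay("film", "mysql-bin.000012", 900))   # False: before the copy
print(should_replay("film", "mysql-bin.000013", 100))   # True: after the copy
print(should_replay("actor", "mysql-bin.000013", 100))  # False: actor copied later
```

This is why the replica can start inconsistent: each table becomes consistent as soon as the stream passes its own coordinates.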
37. Fallback on failure
The data is pulled from MySQL in slices, using the CSV format. This approach prevents memory overload.
Once the file is saved it is pushed into PostgreSQL using the COPY command.
However...
COPY is fast but runs in a single transaction
One failure and the entire batch is rolled back
If this happens the procedure loads the same data using INSERT statements
Which can be very slow
The process attempts to clean the NUL markers, which are allowed by MySQL
If the row still fails on insert then it's discarded
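The fallback logic can be sketched as follows (a simplified model with injectable loader callables; the real tool works against actual MySQL and PostgreSQL connections):

```python
def clean_nul(row):
    # MySQL allows NUL (0x00) markers in strings; PostgreSQL text does not
    return tuple(v.replace("\x00", "") if isinstance(v, str) else v for v in row)

def load_slice(rows, bulk_copy, insert_row):
    """Try the fast COPY path first; on failure fall back to row-by-row
    INSERTs, cleaning NUL markers and discarding rows that still fail."""
    try:
        bulk_copy(rows)               # single transaction: all or nothing
        return list(rows)
    except ValueError:
        loaded = []
        for row in rows:
            for candidate in (row, clean_nul(row)):
                try:
                    insert_row(candidate)
                    loaded.append(candidate)
                    break
                except ValueError:
                    continue          # second failure: the row is discarded
        return loaded

# Stand-in loader that rejects any value containing a NUL marker
def strict(rows_or_row):
    rows = rows_or_row if isinstance(rows_or_row, list) else [rows_or_row]
    for row in rows:
        if any(isinstance(v, str) and "\x00" in v for v in row):
            raise ValueError("NUL marker")

rows = [("ok",), ("bad\x00",)]
print(load_slice(rows, strict, strict))  # [('ok',), ('bad',)]
```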
39. MySQL configuration
The mysql configuration file is usually stored in /etc/mysql/my.cnf
To enable the binary logging find the section [mysqld] and check that the
following parameters are set.
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
40. MySQL user for replica
Setup a replication user on MySQL
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
In our example we are using the sakila test database.
https://dev.mysql.com/doc/sakila/en/
41. PostgreSQL setup
Add a user on PostgreSQL capable of creating schemas and relations in the destination database
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
42. Install pg chameleon
Install pg chameleon and create the configuration files
pip install pip --upgrade
pip install pg_chameleon
chameleon set_configuration_files
cd ~/.pg_chameleon/configuration
cp config-example.yml default.yml
Edit the file default.yml setting the correct values for connection and source.
45. Configure global settings in default.yml
PostgreSQL connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "replica"
  database: "db_replica"
  charset: "utf8"
Rollbar configuration
rollbar_key: '<rollbar_long_key>'
rollbar_env: 'pgcon-demo'
Type override (optional)
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
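As a hypothetical illustration of how a type_override rule like the one above could be applied (the default mapping shown here is an assumption for the example, not pg chameleon's actual mapping table):

```python
# Hypothetical application of a type_override rule: map a MySQL column
# type to PostgreSQL, honouring per-table overrides ("*" means all tables).
type_override = {
    "tinyint(1)": {"override_to": "boolean", "override_tables": ["*"]},
}
# assumed defaults, for illustration only
default_map = {"tinyint(1)": "smallint", "varchar(255)": "character varying(255)"}

def pg_type(mysql_type, table):
    rule = type_override.get(mysql_type)
    if rule and ("*" in rule["override_tables"] or table in rule["override_tables"]):
        return rule["override_to"]
    return default_map.get(mysql_type, "text")

print(pg_type("tinyint(1)", "film"))    # boolean
print(pg_type("varchar(255)", "film"))  # character varying(255)
```

Overriding types on the fly is flagged "Danger!" in the 1.x feature list for a reason: a wrong override silently changes the semantics of the replicated data.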
48. Configure the mysql source
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "replica"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: loxodonta_africana
    limit_tables:
    skip_tables:
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "1 day"
    type: mysql
49. Add the source and initialise the replica
Add the source mysql and initialise the replica for it. We are using debug in order
to get the logging on the console.
chameleon create_replica_schema --debug
chameleon add_source --config default --source mysql --debug
chameleon init_replica --config default --source mysql --debug
51. Start the replica
Start the replica process
chameleon start_replica --config default --source mysql
Show the replica status
chameleon show_status --config default --source mysql
52. Time for a demo
Demo!
The demo will fail miserably for sure and you will hate this project forever.
54. Strictness is an illusion. MySQL doubly so
MySQL's lack of strictness is not a mystery.
The funny way MySQL manages defaults with NOT NULL can break the replica.
Therefore any field with NOT NULL added after the initialisation is always created as NULLable in PostgreSQL.
57. The DDL. A real pain in the back
I initially tried to use sqlparse for tokenising the DDL emitted by MySQL.
Unfortunately it didn't work as I expected.
So I decided to use regular expressions.
Some people, when confronted with a problem,
think "I know, I'll use regular expressions."
Now they have two problems.
-- Jamie Zawinski
MySQL, even in ROW format, emits the DDL as statements
The class sql_token uses regular expressions to tokenise the DDL
The tokenised data is used to build the DDL in the PostgreSQL dialect
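A toy illustration of the tokenise-and-rebuild approach (a deliberately tiny subset handling one ALTER TABLE form; the real sql_token class covers far more DDL shapes):

```python
import re

# Tokenise one DDL form with a regular expression, then rebuild it
# in the PostgreSQL dialect, mapping a couple of types for illustration.
ADD_COLUMN = re.compile(
    r"ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+ADD\s+COLUMN\s+"
    r"`?(?P<column>\w+)`?\s+(?P<type>\w+(?:\(\d+\))?)",
    re.IGNORECASE,
)

TYPE_MAP = {"int": "integer", "datetime": "timestamp without time zone"}

def translate_ddl(statement):
    match = ADD_COLUMN.match(statement.strip())
    if not match:
        return None  # unsupported statement in this toy version
    tokens = match.groupdict()
    pg_type = TYPE_MAP.get(tokens["type"].lower(), tokens["type"])
    return f'ALTER TABLE "{tokens["table"]}" ADD COLUMN "{tokens["column"]}" {pg_type}'

print(translate_ddl("ALTER TABLE `film` ADD COLUMN `rented` int"))
# ALTER TABLE "film" ADD COLUMN "rented" integer
```

Even this tiny example hints at why the quote above applies: every quoting style, type modifier, and clause ordering needs its own regex handling.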
59. To boldly go where no chameleon has gone before
Short term goals, version 2.0
Resync the tables automatically when they error on replay
Improve the replay speed and CPU efficiency
GTID support for the MySQL source
Medium term goals, version 2.1
Parallel copy and index creation in order to speed up the init replica process
Logical replica from PostgreSQL
Improve the default column handling
60. Igor, the green little guy
The chameleon logo has been developed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman's Igor, portrayed in the Young Frankenstein movie.
61. Feedback please!
Please report any issue on GitHub and follow pg chameleon on Twitter for the announcements.
https://github.com/the4thdoctor/pg_chameleon
@pg_chameleon
62. Did you say hire?
WE ARE HIRING!
https://transferwise.com/jobs/
63. That’s all folks!
Thank you for listening!
Any questions?
Please be very basic, I’m just an electrician after all.
64. Image credits
Palpatine, Dr. Evil disclaimer, It could work, Young Frankenstein: source memegenerator
MySQL Image source, WikiCommons
Hard Disk image, source WikiCommons
Tron image, source Tron Wikia
Twitter icon, source Open Icon Library
The PostgreSQL logo, copyright the PostgreSQL global development group
Boromir get rid of mysql, source imgflip
Morpheus, source imgflip
Keep calm chameleon, source imgflip
The dolphin picture - Copyright artnoose
Perseus, Framed - Copyright Federico Campoli
Pinkie Pie that’s all folks, Copyright by dan232323, used with permission
Doom, source RetroPie
65. License
This document is distributed under the terms of the Creative Commons
Attribution, Not Commercial, Share Alike