Slides from the Brighton PostgreSQL meetup presentation: an all-around exploration of PostgreSQL, covering the rocky physical layer, the treacherous MVCC swamp and the buffer manager's garden.
The hitchhiker's guide to PostgreSQL
1. The hitchhiker’s guide to PostgreSQL
Federico Campoli
6 May 2016
2. Table of contents
1 Don’t panic!
2 The Ravenous Bugblatter Beast of Traal
3 Time is an illusion. Lunchtime doubly so
4 The Pan Galactic Gargle Blaster
5 Mostly harmless
4. Don’t panic!
Don’t panic!
Copyright by Kreg Steppe - https://www.flickr.com/photos/spyndle/
7. The Ravenous Bugblatter Beast of Traal
The Ravenous Bugblatter Beast of Traal
Copyright by Federico Campoli
8. The Ravenous Bugblatter Beast of Traal
The data area
The PostgreSQL data area is the directory where the cluster stores its data on durable storage.
It is referenced by the environment variable $PGDATA on Unix or %PGDATA% on Windows.
The data area contains several directories. We'll take a look at the most important ones.
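A quick way to locate the data area from a live session, as a minimal sketch (the setting name is standard; the path returned depends on the installation):

    -- Ask the running cluster where its data area lives
    SHOW data_directory;

    -- The same information from the pg_settings catalogue view
    SELECT setting FROM pg_settings WHERE name = 'data_directory';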
9. The Ravenous Bugblatter Beast of Traal
The directory base
The default location used when a new database is created without the TABLESPACE clause.
Inside, each database has a directory with a numeric name.
An optional directory pgsql_tmp is used for the external sorts.
The base location is mapped as pg_default in the pg_tablespace system catalogue.
10. The Ravenous Bugblatter Beast of Traal
The directory base
Each database directory contains files with numeric names where PostgreSQL stores the relations' data.
The maximum size a data file can reach is 1 GB; past that, a new file is created with a numerical suffix.
Each data file is organised in fixed-size pages of 8192 bytes.
The data files are called file nodes. Their relationship with the logical relations is stored in the pg_class system catalogue.
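The mapping between a relation and its file node can be inspected with built-in functions; a minimal sketch, where the table name t_data is a made-up example:

    -- Path of the data file relative to $PGDATA, e.g. base/16384/16385
    SELECT pg_relation_filepath('t_data');

    -- The same information from the pg_class system catalogue
    SELECT relname, relfilenode
    FROM pg_class
    WHERE relname = 't_data';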
11. The Ravenous Bugblatter Beast of Traal
The directory pg_global
The directory pg_global contains the data files used by the relations shared across the cluster.
There is also a small 8 kB file named pg_control. This is a very critical file where PostgreSQL stores the cluster's vital data, like the last checkpoint location.
A corrupted pg_control prevents the cluster from starting.
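From PostgreSQL 9.6 onwards the content of pg_control can also be read from SQL (on older releases the pg_controldata shell utility reports the same data); a minimal sketch:

    -- Last checkpoint data as recorded in pg_control (PostgreSQL 9.6+)
    SELECT checkpoint_lsn, redo_lsn, timeline_id
    FROM pg_control_checkpoint();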
12. The Ravenous Bugblatter Beast of Traal
The directory pg_tblspc
Contains the symbolic links to the tablespaces.
Very useful to spread tables and indices across different physical devices.
Combined with logical volume management it can improve the performance dramatically...
or drive the project to a complete failure.
An object's tablespace can be safely changed, but this requires an exclusive lock on the affected object.
The pg_tablespace system catalogue maps the tablespace names and identifiers.
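A sketch of the typical workflow; the tablespace name, device path and table name are made up for the example:

    -- Create a tablespace on a dedicated device (the directory must
    -- exist and be owned by the postgres OS user)
    CREATE TABLESPACE fastdisk LOCATION '/mnt/ssd/pg_ts';

    -- Move a table; this takes an exclusive lock on the table
    -- while its files are copied over
    ALTER TABLE t_data SET TABLESPACE fastdisk;

    -- Names and identifiers of the existing tablespaces
    SELECT oid, spcname FROM pg_tablespace;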
13. The Ravenous Bugblatter Beast of Traal
The directory pg_xlog
The write-ahead logs are stored in this directory.
It is probably the most important and critical directory in the cluster.
Each WAL segment is 16 MB.
Each segment contains the records describing the tuples changed in the volatile memory.
In case of a crash or an unclean shutdown the WALs are replayed to restore the cluster's consistent state.
The number of segments is automatically managed by the database.
Putting this location on a dedicated and highly reliable device is vital for performance and reliability.
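The current WAL position can be observed from SQL; the function names below are the pre-10 ones matching this deck's era (they were renamed to pg_current_wal_lsn() and pg_walfile_name() in PostgreSQL 10):

    -- Current write-ahead log insert location
    SELECT pg_current_xlog_location();

    -- Name of the 16 MB segment in pg_xlog holding that location
    SELECT pg_xlogfile_name(pg_current_xlog_location());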
14. The Ravenous Bugblatter Beast of Traal
Data pages
Each block is structured almost the same way for tables and indices.
15. The Ravenous Bugblatter Beast of Traal
Page header
Each page starts with a 24-byte header followed by the tuple pointers. Those are usually 4 bytes each and point to the physical tuples, which are stored at the page's end.
16. The Ravenous Bugblatter Beast of Traal
The tuples
Now we can finally look at the physical tuples. Each tuple has a 27-byte header; the numbers in the diagram are the bytes used by the individual fields.
The user data can be either the data stream itself or a pointer to the out-of-line data stream.
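The page header and the tuple headers can be inspected with the contrib extension pageinspect; a minimal sketch against the hypothetical table t_data used earlier:

    CREATE EXTENSION pageinspect;

    -- The 24-byte header of block 0: LSN, flags, item offsets...
    SELECT * FROM page_header(get_raw_page('t_data', 0));

    -- The tuple pointers and tuple headers stored in the page,
    -- including t_xmin and t_xmax
    SELECT lp, lp_off, lp_len, t_xmin, t_xmax
    FROM heap_page_items(get_raw_page('t_data', 0));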
17. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
TOAST, the best thing since sliced bread
TOAST is the acronym for The Oversized Attribute Storage Technique.
An attribute is also known as a field.
TOAST can store up to 1 GB per value in the out-of-line storage (and free of charge).
18. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
Fixed-length data types like integer, date or timestamp are not TOASTable. Their data is stored right after the tuple header.
Varlena data types such as character varying without the upper bound, text or bytea are stored either in line or out of line.
The storage technique used depends on the data stream size and on the storage method assigned to the attribute.
Depending on the storage strategy, the data can be stored in external relations and/or compressed with the fast LZ-family pglz algorithm.
19. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
TOAST permits four storage strategies (shamelessly copied from the online manual).
PLAIN prevents either compression or out-of-line storage. This is the only possible strategy for columns of non-TOAST-able data types.
EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.
EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will make substring operations on wide text and bytea columns faster, at the penalty of increased storage space.
MAIN allows compression but not out-of-line storage. Actually, out-of-line storage will still be performed for such columns, but only as a last resort.
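The strategy can be changed per column; a sketch with made-up table and column names:

    -- Force out-of-line storage without compression for a bytea column
    ALTER TABLE t_data ALTER COLUMN payload SET STORAGE EXTERNAL;

    -- Find the TOAST table associated with the relation
    SELECT t.relname
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 't_data';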
20. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
When the out-of-line storage is used the data is encoded in bytea and, if necessary, split into multiple chunks.
A unique index over chunk_id and chunk_seq prevents duplicate data and speeds up the lookups.
Figure: TOAST table
22. Time is an illusion. Lunchtime doubly so
Time is an illusion. Lunchtime doubly so
Copyright by Federico Campoli
23. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
t_xmin contains the xid generated at tuple insert
t_xmax contains the xid generated at tuple delete
t_cid contains the internal command id to track the sequence inside the same transaction
24. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
PostgreSQL's consistency is achieved using MVCC, which stands for Multi-Version Concurrency Control.
The basic logic seems simple:
A 4-byte unsigned integer called xid is incremented by 1 and assigned to the current transaction.
Every committed xid whose value is smaller than the current xid is considered in the past, and therefore visible to the current transaction.
Every xid whose value is greater than the current xid is in the future, and therefore invisible to the current transaction.
The commit status is managed inside $PGDATA in the directory pg_clog, where small 8 kB files track the transaction statuses.
25. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
In this model there is no in-place UPDATE. Every time a row is updated, a new version is generated with the field t_xmin set to the current XID value.
The old row version is marked dead just by writing that same XID into its t_xmax.
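The hidden system columns can be queried directly; a minimal demo on a made-up table (txid_current() is the pre-10 function name, still available in later releases):

    CREATE TABLE mvcc_demo (id int, val text);
    INSERT INTO mvcc_demo VALUES (1, 'first');

    -- xmax = 0: the row version is live
    SELECT xmin, xmax, val FROM mvcc_demo;

    UPDATE mvcc_demo SET val = 'second';

    -- A new version: xmin is now the xid of the updating transaction;
    -- the old version, invisible here, carries that same xid in its xmax
    SELECT xmin, xmax, val FROM mvcc_demo;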
27. The Pan Galactic Gargle Blaster
The Pan Galactic Gargle Blaster
Copyright by Federico Campoli
28. The Pan Galactic Gargle Blaster
A little history
Back in the days, when the world was young, PostgreSQL's memory was managed by a simple LRU algorithm.
The 8.x development cycle introduced a powerful algorithm called Adaptive Replacement Cache (ARC), where two self-adapting memory pools managed the most recently used and the most frequently used buffers.
Because of a software patent on the algorithm, shortly after the 8.0 release the buffer manager was replaced by the two-queue (2Q) algorithm.
Release 8.1 adopted the clock sweep memory manager, still in use in the latest versions because of its flexibility.
29. The Pan Galactic Gargle Blaster
The clock sweep
The buffer manager's main goal is to keep the most recently used blocks cached in memory, adapting dynamically to the most frequently used blocks.
To do this a small memory portion is used as a free list for the buffers available for eviction.
Figure: Free list
30. The Pan Galactic Gargle Blaster
The clock sweep
Each buffer has a usage counter which is increased by one when the buffer is pinned, up to a small limit value.
Figure: Block usage counter
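The usage counters can be observed with the contrib extension pg_buffercache; a minimal sketch:

    CREATE EXTENSION pg_buffercache;

    -- Distribution of the usage counters across the shared buffers;
    -- a NULL usagecount marks a completely free buffer
    SELECT usagecount, count(*)
    FROM pg_buffercache
    GROUP BY usagecount
    ORDER BY usagecount;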
31. The Pan Galactic Gargle Blaster
The clock sweep
Shamelessly copied from the file src/backend/storage/buffer/README
There is a "free list" of buffers that are prime candidates for replacement. In particular, buffers that are completely free (contain no valid page) are always in this list.
To choose a victim buffer to recycle when there are no free buffers available, we use a simple clock-sweep algorithm, which avoids the need to take system-wide locks during common operations.
32. The Pan Galactic Gargle Blaster
The clock sweep
It works like this:
Each buffer header contains a usage counter, which is incremented (up to a small limit value) whenever the buffer is pinned. (This requires only the buffer header spinlock, which would have to be taken anyway to increment the buffer reference count, so it's nearly free.)
The "clock hand" is a buffer index, NextVictimBuffer, that moves circularly through all the available buffers. NextVictimBuffer is protected by the BufFreelistLock.
33. The Pan Galactic Gargle Blaster
It's bigger on the inside
The algorithm for a process that needs to obtain a victim buffer is:
1 Obtain BufFreelistLock.
2 If the buffer free list is nonempty, remove its head buffer. If the buffer is pinned or has a nonzero usage count, it cannot be used; ignore it and return to the start of step 2. Otherwise, pin the buffer, release BufFreelistLock, and return the buffer.
3 Otherwise, select the buffer pointed to by NextVictimBuffer, and circularly advance NextVictimBuffer for next time.
4 If the selected buffer is pinned or has a nonzero usage count, it cannot be used. Decrement its usage count (if nonzero) and return to step 3 to examine the next buffer.
5 Pin the selected buffer, release BufFreelistLock, and return the buffer.
(Note that if the selected buffer is dirty, we will have to write it out before we can recycle it; if someone else pins the buffer meanwhile we will have to give up and try another buffer. This however is not a concern of the basic select-a-victim-buffer algorithm.)
34. The Pan Galactic Gargle Blaster
It's bigger on the inside
35. The Pan Galactic Gargle Blaster
It's bigger on the inside
Since version 8.3 the buffer manager has the ring buffer strategy.
Operations which require a large amount of buffers in memory, like VACUUM or sequential scans of large tables, use a dedicated 256 kB ring buffer, small enough to fit in the processor's L2 cache.
38. Mostly harmless
The XID wraparound failure
XID is a 4-byte unsigned integer.
Every 4 billion transactions the value wraps around.
PostgreSQL uses the modulo-2^31 comparison method.
For each value, 2 billion XIDs are in the future and 2 billion in the past.
When a xid's age becomes too close to 2 billion, VACUUM freezes the xmin value to a hardcoded xid forever in the past.
39. Mostly harmless
The XID wraparound failure
If for any reason a xid gets within 10 million transactions of the wraparound failure, the database starts emitting scary messages:
WARNING: database "mydb" must be vacuumed within 5770099 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".
If a xid's age gets within 1 million transactions of the wraparound failure, the database simply shuts down and can be started only in single-user mode to perform the VACUUM.
Anyway, the autovacuum daemon, even if turned off, starts the required VACUUM long before this catastrophic scenario happens.
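The distance from the wraparound can be monitored per database with the built-in age() function; a minimal sketch:

    -- Transactions elapsed since each database's frozen xid;
    -- values approaching 2 billion call for an aggressive VACUUM
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;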
41. Mostly harmless
Boring legal stuff
The copyright of all the images is owned by the respective authors; the author's attribution is provided with a link alongside each image.
The section titles are quotes from Douglas Adams' The Hitchhiker's Guide to the Galaxy. No copyright infringement is intended.
42. Mostly harmless
Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons
43. Mostly harmless
The hitchhiker’s guide to PostgreSQL
Federico Campoli
6 May 2016