This talk covers the issues with parallel transactions in relational databases: anomalies, locks, isolation levels, and how to deal with them in JDBC and JPA.
https://github.com/kslisenko/tx-isolation
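For orientation before the slides: in plain JDBC a transaction is delimited by disabling auto-commit and then committing or rolling back. Below is a minimal sketch; the connection URL, credentials, and the customer table are placeholders for illustration, not code from the repository above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcTransactionSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL/credentials, adjust for your environment
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            con.setAutoCommit(false); // start an explicit transaction
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE customer SET balance = balance - ? WHERE name = ?")) {
                ps.setInt(1, 100);
                ps.setString(2, "tom");
                ps.executeUpdate();
                con.commit();   // publish the change to other transactions
            } catch (SQLException e) {
                con.rollback(); // undo everything since setAutoCommit(false)
                throw e;
            }
        }
    }
}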
4. Agenda
• Transaction phenomena and isolation levels
• Pessimistic and optimistic approaches
• Transaction isolation in MySQL
• Database-level locks in MySQL
• JPA features for locking
7. Transaction phenomena
PROBLEM
• Concurrent updates made by parallel transactions
• No problem if there are no concurrent updates
• Databases have protection
PHENOMENA
• Dirty read
• Non-repeatable read
• Phantom insert
8-11. Transaction phenomena: dirty read
PROBLEM
• Transactions can read each other's not committed (dirty) data
• The other transaction rolls back: a decision was made based on data that never existed
Timeline (customer tom, balance 1000; ~ marks not committed data):
• Transaction 1 and Transaction 2 begin
• Transaction 1 sets balance = 1100 but does not commit (tom ~1100~)
• Transaction 2 reads the not committed balance = 1100
• Transaction 1 rolls back: the balance is 1000 again, but Transaction 2 acted on 1100
DATABASES ARE PROTECTED AGAINST THIS IN REAL LIFE
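A dirty read can be reproduced from JDBC by running the two transactions on separate connections and lowering the reader's isolation level. A sketch, assuming a local MySQL with the customer(name, balance) table from the slides; URL and credentials are placeholders:

import java.sql.*;

public class DirtyReadDemo {
    static final String URL = "jdbc:mysql://localhost:3306/test";

    public static void main(String[] args) throws SQLException {
        try (Connection t1 = DriverManager.getConnection(URL, "user", "password");
             Connection t2 = DriverManager.getConnection(URL, "user", "password")) {
            // READ UNCOMMITTED is the only standard level that allows dirty reads
            t2.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            t1.setAutoCommit(false);
            t2.setAutoCommit(false);

            // T1 updates but does not commit yet
            t1.createStatement().executeUpdate(
                    "UPDATE customer SET balance = 1100 WHERE name = 'tom'");

            // T2 already sees the uncommitted 1100 -- a dirty read
            try (ResultSet rs = t2.createStatement().executeQuery(
                    "SELECT balance FROM customer WHERE name = 'tom'")) {
                rs.next();
                System.out.println("T2 sees: " + rs.getInt(1)); // 1100
            }

            // T1 rolls back: T2 made its decision on data that never existed
            t1.rollback();
            t2.commit();
        }
    }
}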
Transaction phenomena: non-repeatable read

PROBLEM
• One transaction updates data
• The other transaction reads the same data several times and gets different results

WHEN WE CAN LIVE WITH THIS
• We are fine with data that is not the most recent

Timeline:
  t1: Transaction 1 begin; Transaction 2 begin  -> tom 1000
  t2: Transaction 1: read, balance = 1000       -> tom 1000
  t3: Transaction 2: balance = 900; commit      -> tom 900
  t4: Transaction 1: read, balance = 900        -> tom 900
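The same scenario as a JDBC sketch, under the same table and URL assumptions as above; switching T1 to TRANSACTION_REPEATABLE_READ would make both reads return 1000:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NonRepeatableReadDemo {
    static int readBalance(Connection c) throws Exception {
        try (Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT balance FROM customer WHERE name = 'tom'")) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection t1 = DriverManager.getConnection(url, "user", "password");
             Connection t2 = DriverManager.getConnection(url, "user", "password")) {
            t1.setAutoCommit(false);
            t2.setAutoCommit(false);
            // At READ COMMITTED the second read can return a different value
            t1.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

            System.out.println("first read:  " + readBalance(t1)); // 1000

            try (Statement s2 = t2.createStatement()) {
                s2.executeUpdate("UPDATE customer SET balance = 900 WHERE name = 'tom'");
            }
            t2.commit(); // the committed change becomes visible to T1

            System.out.println("second read: " + readBalance(t1)); // 900: non-repeatable
            t1.commit();
        }
    }
}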
Transaction phenomena: phantom

PROBLEM
• One transaction inserts or deletes rows
• The other transaction runs the same query several times and gets a different number of rows

WHEN WE CAN LIVE WITH THIS
• We read single rows, not ranges
• We are fine with data that is not the most recent

Timeline:
  t1: Transaction 1 begin; Transaction 2 begin
  t2: Transaction 1: get all customers where balance < 2000 -> got 1 record (tom 1000)
  t3: Transaction 2: insert new customer (jerry 500); commit
  t4: Transaction 1: get all customers where balance < 2000 -> got 2 records (tom 1000, jerry 500)
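A JDBC sketch of the phantom under the same assumptions as the previous examples; at SERIALIZABLE, InnoDB turns plain selects into locking reads with range locks, so T2's insert would instead block until T1 finishes:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhantomReadDemo {
    static int countBelow2000(Connection c) throws Exception {
        try (Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM customer WHERE balance < 2000")) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection t1 = DriverManager.getConnection(url, "user", "password");
             Connection t2 = DriverManager.getConnection(url, "user", "password")) {
            t1.setAutoCommit(false);
            t2.setAutoCommit(false);
            // At READ COMMITTED the same range query can see newly committed rows
            t1.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

            System.out.println("first query:  " + countBelow2000(t1)); // 1 record

            try (Statement s2 = t2.createStatement()) {
                s2.executeUpdate("INSERT INTO customer (name, balance) VALUES ('jerry', 500)");
            }
            t2.commit();

            System.out.println("second query: " + countBelow2000(t1)); // 2 records: phantom
            t1.commit();
        }
    }
}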
Transaction isolation levels (standard)

• Defined in SQL92: ISO/IEC 9075:1992, Information technology -- Database languages -- SQL
  http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
• Trade-off between performance, scalability and data protection
• The same work performed with the same inputs may produce different answers, depending on the isolation level
• Implementations can be VERY DIFFERENT in different databases

                      READ UNCOMMITTED   READ COMMITTED   REPEATABLE READ   SERIALIZABLE
Dirty read            YES                NO               NO                NO
Non-repeatable read   YES                YES              NO                NO
Phantom               YES                YES              YES               NO
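Support for the standard levels varies per database (the per-database table later in this talk shows Oracle and PostgreSQL rejecting some of them), so it can be worth asking the JDBC driver up front. A minimal sketch; the connection URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class IsolationSupport {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection c = DriverManager.getConnection(url, "user", "password")) {
            DatabaseMetaData md = c.getMetaData();
            // Not every database supports every SQL92 level,
            // so JDBC lets us ask the driver before relying on one.
            int[] levels = {
                Connection.TRANSACTION_READ_UNCOMMITTED,
                Connection.TRANSACTION_READ_COMMITTED,
                Connection.TRANSACTION_REPEATABLE_READ,
                Connection.TRANSACTION_SERIALIZABLE };
            String[] names = {"READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE"};
            for (int i = 0; i < levels.length; i++) {
                System.out.println(names[i] + " supported: "
                    + md.supportsTransactionIsolationLevel(levels[i]));
            }
        }
    }
}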
Optimistic and pessimistic approaches

PESSIMISTIC
• Locking rows or ranges
• Like ReadWriteLock/synchronized in Java
• Concurrent transactions wait until the lock is released

OPTIMISTIC
• Multi-version concurrency control (MVCC)
• Doesn't lock anything
• Saves all versions of the data
• We work with data snapshots
• Like Git/SVN
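The optimistic idea can also be applied at the application level with a version column: update only if the version is still the one we read, otherwise report a conflict and retry. A sketch only; the version column on the customer table is an assumption, not part of the schema used elsewhere in this talk:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OptimisticUpdate {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection c = DriverManager.getConnection(url, "user", "password")) {
            int expectedVersion = 5; // version we saw when we read the row
            try (PreparedStatement ps = c.prepareStatement(
                    "UPDATE customer SET balance = ?, version = version + 1 "
                  + "WHERE name = ? AND version = ?")) {
                ps.setInt(1, 900);
                ps.setString(2, "tom");
                ps.setInt(3, expectedVersion);
                // 0 rows updated means somebody changed the row since we read it:
                // no lock was held, we detect the conflict instead of waiting.
                if (ps.executeUpdate() == 0) {
                    throw new IllegalStateException("Concurrent update detected, retry");
                }
            }
        }
    }
}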
Pessimistic locking

BY OWNERSHIP
• Shared lock – read lock, can have many owners at the same time
• Exclusive lock – write lock, exactly one owner; blocks both shared and exclusive requests

BY SCOPE
• Row lock – locks specific rows by index (if an index exists)
• Range lock – locks all records matching a condition

[Diagram: transactions T1, T2, T3 taking shared and exclusive row and range locks on the user table (tom 1000, jerry 1500); several shared locks can coexist on the same rows, while an exclusive lock makes all other transactions wait]
Optimistic multi-version concurrency control (MVCC)

HOW IT WORKS
• Transactions see the row versions with a version less than or equal to the transaction start time
• On update: a new version of the row is added
• On delete: the deleted column is set to the new version number

  updated | user | balance | deleted
  0       | tom  | 1000    | -
  1       | tom  | 1100    | 2

Transaction 1 (TS=0) READ sees version 0; Transaction 2 (TS=1) WRITE created version 1; Transaction 3 (TS=2) DELETE set deleted = 2.
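A toy model of this visibility rule in Java (an illustration only, not how any real database implements MVCC):

public class MvccVisibility {
    // Toy model of the versioned row above.
    record RowVersion(int created, Integer deleted, String user, int balance) {
        // A transaction with start timestamp ts sees this version if it was
        // created at or before ts and not yet deleted as of ts.
        boolean visibleTo(int ts) {
            return created <= ts && (deleted == null || deleted > ts);
        }
    }

    public static void main(String[] args) {
        RowVersion v0 = new RowVersion(0, 1, "tom", 1000); // superseded by the update
        RowVersion v1 = new RowVersion(1, 2, "tom", 1100); // deleted at version 2
        System.out.println(v0.visibleTo(0)); // true:  T1 (TS=0) reads balance 1000
        System.out.println(v1.visibleTo(1)); // true:  T2 (TS=1) sees its own write
        System.out.println(v1.visibleTo(2)); // false: as of TS=2 the row is deleted
    }
}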
Optimistic MVCC vs pessimistic locks

                                  MVCC (OPTIMISTIC)                     LOCKS (PESSIMISTIC)
Behavior                          1. Each transaction works with        1. The transaction which owns the
                                     its own version                       lock works with the data
                                  2. A concurrent transaction fails     2. Concurrent transactions wait
Locks                             NO                                    YES
Performance and scalability       GOOD                                  BAD
Deadlocks                         NO                                    POSSIBLE
Guarantee of recent data version  NO                                    YES
Extra disk space needed           YES                                   NO
Durability                        better (because of saved versions)
Transaction isolation levels in different databases

Oracle – concept: MVCC
• READ UNCOMMITTED: not supported
• READ COMMITTED (default): returns a new snapshot on each read
• REPEATABLE READ: not supported
• SERIALIZABLE: returns a snapshot of the data as of the beginning of the transaction; always reads snapshots, the transaction fails on a concurrent update
• Specifics: additional READ ONLY level – the transaction only sees data as of the moment it starts, writes are not allowed

MySQL (InnoDB) – concept: MVCC
• READ COMMITTED: returns a new snapshot on each read
• REPEATABLE READ (default): saves a snapshot at the first read and returns it for the next reads
• SERIALIZABLE: locks ranges for the transaction lifetime; shared lock on every select

MSSQL – concept: LOCKS
• READ UNCOMMITTED: double read phenomenon – able to read the same row twice while it is migrating to another place on disk
• READ COMMITTED (default): pessimistic LOCK mode locks rows for the statement lifetime; optional SNAPSHOT (optimistic) mode returns a new snapshot on each read
• REPEATABLE READ: locks rows for the transaction lifetime
• SERIALIZABLE: locks ranges for the transaction lifetime; selects take a shared range lock, updates take an exclusive lock
• Specifics: additional SNAPSHOT level – saves a snapshot at the first read, returns it for the next reads, transactions fail on a concurrent update (optimistic locking)

PostgreSQL – concept: MVCC
• READ UNCOMMITTED: not supported
• READ COMMITTED (default): returns a new snapshot on each read
• REPEATABLE READ: saves a snapshot at the first read and returns it for the next reads
• SERIALIZABLE: predicate locking (optimistic); always reads snapshots, the transaction fails on a concurrent update
Pessimistic locking of specific rows/ranges (MySQL)

IDEA
• Increase the isolation level for specific rows/ranges only
• Other rows/ranges can keep a lower isolation level

LOCKING SELECTS
• SELECT … LOCK IN SHARE MODE – shared (read) lock
• SELECT … FOR UPDATE – exclusive (write) lock
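A minimal JDBC sketch of a locking select used for a safe read-modify-write, with the same assumed customer table and connection URL as in the earlier examples:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SelectForUpdateDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection c = DriverManager.getConnection(url, "user", "password")) {
            c.setAutoCommit(false);

            // Exclusive lock on tom's row: concurrent transactions that try to
            // update it (or SELECT ... FOR UPDATE it) wait until we commit.
            try (PreparedStatement ps = c.prepareStatement(
                    "SELECT balance FROM customer WHERE name = ? FOR UPDATE")) {
                ps.setString(1, "tom");
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    int balance = rs.getInt("balance");
                    // Safe read-modify-write: nobody can change the row in between
                    try (PreparedStatement upd = c.prepareStatement(
                            "UPDATE customer SET balance = ? WHERE name = ?")) {
                        upd.setInt(1, balance + 100);
                        upd.setString(2, "tom");
                        upd.executeUpdate();
                    }
                }
            }
            c.commit(); // releases the lock
        }
    }
}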
Database deadlocks

[Diagram: Transaction 1 holds an exclusive lock on tom's row and waits for a lock on jerry's row, while Transaction 2 holds an exclusive lock on jerry's row and waits for tom's row – a cycle neither transaction can leave]

Database deadlocks happen because of bad application architecture design.

HOW TO PREVENT DEADLOCKS
• Take locks in the same order in every transaction
• Keep transactions as small as possible
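A sketch of the first prevention rule: a transfer between two customers that always takes its row locks in the same (alphabetical) order, so two concurrent transfers cannot form a lock cycle. Table and URL are the same assumptions as before:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LockOrdering {
    // Always lock rows in a fixed global order, regardless of transfer direction.
    static void transfer(Connection c, String from, String to, int amount) throws Exception {
        String first = from.compareTo(to) < 0 ? from : to;
        String second = first.equals(from) ? to : from;
        c.setAutoCommit(false);
        try {
            lockRow(c, first);   // both transfer(tom -> jerry) and transfer(jerry -> tom)
            lockRow(c, second);  // lock "jerry" before "tom", so one simply waits
            update(c, from, -amount);
            update(c, to, amount);
            c.commit();
        } catch (Exception e) {
            c.rollback();
            throw e;
        }
    }

    static void lockRow(Connection c, String name) throws Exception {
        try (PreparedStatement ps = c.prepareStatement(
                "SELECT balance FROM customer WHERE name = ? FOR UPDATE")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) { rs.next(); }
        }
    }

    static void update(Connection c, String name, int delta) throws Exception {
        try (PreparedStatement ps = c.prepareStatement(
                "UPDATE customer SET balance = balance + ? WHERE name = ?")) {
            ps.setInt(1, delta);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // assumed URL
        try (Connection c = DriverManager.getConnection(url, "user", "password")) {
            transfer(c, "tom", "jerry", 100);
        }
    }
}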
JPA features for locking

Enum LockModeType
• PESSIMISTIC_READ – shared lock
• PESSIMISTIC_WRITE – exclusive lock

EntityManager
• lock(Object entity, LockModeType lockMode) – issues an additional locking select just to lock the entity
• find(Class<T> entityClass, Object primaryKey, LockModeType lockMode) – issues a locking select when reading the entity
• refresh(Object entity, LockModeType lockMode) – issues a locking select when reloading the entity

NamedQuery
• @NamedQuery(name="myQuery", query="…", lockMode=LockModeType.PESSIMISTIC_READ) – allows requesting a locking select for any query

ADVANTAGES
• It is really simple
• Database-specific things are hidden from the developer
• Supports parent-child entities

DRAWBACKS
• Complex to manually lock entity relationships
• @NamedQuery is the only way to specify a lock for a query
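A minimal sketch of find() with a pessimistic lock; the persistence unit name "customers" and the Customer entity mapping are assumptions for illustration:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.Persistence;

@Entity
class Customer {
    @Id Long id;
    String name;
    int balance;
    int getBalance() { return balance; }
    void setBalance(int b) { balance = b; }
}

public class JpaPessimisticLock {
    public static void main(String[] args) {
        // "customers" is an assumed persistence-unit name
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("customers");
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();

            // The provider turns this into a locking select,
            // e.g. SELECT ... FOR UPDATE on MySQL/InnoDB
            Customer tom = em.find(Customer.class, 1L, LockModeType.PESSIMISTIC_WRITE);
            tom.setBalance(tom.getBalance() + 100);

            em.getTransaction().commit(); // releases the database lock
        } finally {
            em.close();
            emf.close();
        }
    }
}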
Transaction isolation and locking with JPA

• Repeatable reads come for free because of the EntityManager's cache
• Requests do not always go to the database

Behavior = EntityManager + 2nd level cache + database

[Diagram: client application -> EntityManager (1st level cache: tom ~1100~, not committed) -> 2nd level cache (tom 900) -> database (tom 1000, jerry 1500) – each layer can answer a read with a different value]
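A sketch of the first bullet, reusing the assumed Customer entity from the previous example: two find() calls for the same id inside one transaction return the same managed instance, so the read repeats even if the database row changed in between:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class FirstLevelCacheDemo {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("customers");
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();

            Customer first = em.find(Customer.class, 1L);  // goes to the database
            Customer second = em.find(Customer.class, 1L); // served from the 1st level cache

            // Both reads return the same managed instance, so reads are
            // repeatable inside the EntityManager even if another transaction
            // committed a new balance in between: the second request never
            // reached the database.
            System.out.println(first == second); // true

            em.getTransaction().commit();
        } finally {
            em.close();
            emf.close();
        }
    }
}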
Conclusion
• Do you have problems because of concurrent updates?
  – Same issues as in concurrent programming in Java
  – Sometimes we can tolerate the phenomena
• Transaction isolation is a trade-off between data protection and performance
• Two main approaches in database implementations:
  – Optimistic: no locks, data is versioned
  – Pessimistic: row and range locks
• JPA
  – Simplifies usage of pessimistic locking
  – Adds its own specific behavior because of caches
• For better performance:
  – Prefer smaller transactions: long transactions hold locks for a long time and can cause deadlocks
  – Be careful with declarative transaction management: it can produce heavy transactions