Database Refactoring
  An introduction to Refactoring Databases & Evolutionary Database Design (Ambler and Sadalage)
Agenda
  What is database refactoring about?
  Evolutionary database development techniques
  Refactoring strategies
  Classification of refactorings and examples
What is database refactoring about?
  Improving database design
  Making small, incremental changes to the schema
  Maintaining existing information and behaviour
  Functionality is neither added nor removed
  Not limited to the database itself; it also covers the applications that use it
A simple example…
  [Diagram: Customer (customerId <<PK>>, name, balance) accesses Account (accountId <<PK>>, customerId <<FK>>, balance). One balance column is marked {drop date = <date>}. Triggers SynchronizeAccountBalance and SynchronizeCustomerBalance {event = on update | on delete | on insert, drop date = <date>} keep the two balance columns in sync while App A and App B each call maintainbalance().]
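The diagram on this slide can be sketched roughly as follows, using Python's built-in sqlite3 module. This is a minimal illustration and not the deck's own code: it assumes one account per customer, handles only the "on update" event, and the choice of which table each trigger lives on is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT,
    balance    NUMERIC  -- old copy, to be dropped after the transition period
);
CREATE TABLE Account (
    accountId  INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES Customer(customerId),
    balance    NUMERIC  -- new home of the column
);

-- Assumption: this trigger sits on Customer and pushes changes to Account.
-- The "IS NOT" guard stops the two triggers from calling each other forever.
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF balance ON Customer
BEGIN
    UPDATE Account SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;

-- Assumption: this trigger sits on Account and pushes changes to Customer.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF balance ON Account
BEGIN
    UPDATE Customer SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;

INSERT INTO Customer VALUES (1, 'Ann', 100);
INSERT INTO Account  VALUES (10, 1, 100);
""")

# App A still writes the old column; the new column follows.
conn.execute("UPDATE Customer SET balance = 250 WHERE customerId = 1")
account_balance = conn.execute(
    "SELECT balance FROM Account WHERE accountId = 10").fetchone()[0]

# App B already writes the new column; the old column follows.
conn.execute("UPDATE Account SET balance = 75 WHERE accountId = 10")
customer_balance = conn.execute(
    "SELECT balance FROM Customer WHERE customerId = 1").fetchone()[0]
```

Once every application uses Account.balance, the old column and both triggers can be dropped on the agreed drop date.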
Why refactor?
  Data models built upfront tend to be complex and need cleaning
  Maintain consistency between application domain and data model
  Address performance requirements
  Identify and eliminate database smells
Database Smells
  Multipurpose column: e.g. one column holding a customer's date of birth and an employee's start date
  Multipurpose table: e.g. a Customer table holding both people and corporations
  Redundant data: the same information in different tables
  Table with too many columns: e.g. Customer with many addresses
  Table with too many rows
  Smart columns: e.g. data that has positional context
  Fear of change: the schema seems too risky to change, so it is time to refactor!
Evolutionary Database Development
  Evolve data models vs upfront design
  Database regression testing
  Configuration management of database artifacts
  Developer sandboxes
Database regression testing
  Test the schema
    Check logic in stored procedures and triggers
    Test check and referential constraints
    View definitions
    Default values and invariants
  Test application code
    Unit tests around application code which queries the db
  Test data migration
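As an illustration of testing default values and check constraints, here is a small sketch using Python's sqlite3 module. The Account table, its columns, and the function names are hypothetical and chosen only for this example.

```python
import sqlite3


def make_db():
    """Build a fresh in-memory schema so each test starts from a known state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Account (
            accountId INTEGER PRIMARY KEY,
            balance   NUMERIC NOT NULL DEFAULT 0 CHECK (balance >= 0)
        )""")
    return conn


def default_balance_is_zero():
    """Default value test: a row inserted without a balance gets 0."""
    conn = make_db()
    conn.execute("INSERT INTO Account (accountId) VALUES (1)")
    return conn.execute("SELECT balance FROM Account").fetchone()[0] == 0


def negative_balance_is_rejected():
    """Check constraint test: a negative balance must fail."""
    conn = make_db()
    try:
        conn.execute("INSERT INTO Account VALUES (2, -5)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Tests like these can be rerun after every refactoring to confirm that the schema's behaviour has not changed.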
Config management of DB Artifacts
  Schema creation scripts
  Data loading/migration scripts
  Reference data
  Stored procedures
  View definitions
  Test data
  Regression tests
Developer Sandboxes
Database Refactoring Strategies
  Apply small changes
    Small changes allow easy/early detection of errors.
  Identify individual refactorings
    Instead of doing “move column” and “rename column” in one go, version each individually.
  Create a database configuration table
    Helps identify the current version of the database and can be used in migrations.
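A database configuration table combined with individually versioned changes might look like the sketch below. The table name DatabaseConfiguration, the column schemaVersion, and the three migrations are assumptions made for illustration; real migration tools differ in detail.

```python
import sqlite3

# Each refactoring is its own numbered step (hypothetical examples).
MIGRATIONS = [
    (1, "CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE Customer ADD COLUMN balance NUMERIC DEFAULT 0"),
    (3, "CREATE INDEX Customer_name ON Customer (name)"),
]


def current_version(conn):
    """Read the schema version from the configuration table (0 if empty)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DatabaseConfiguration "
        "(schemaVersion INTEGER NOT NULL)")
    row = conn.execute(
        "SELECT MAX(schemaVersion) FROM DatabaseConfiguration").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every migration newer than the recorded version, one at a time."""
    applied = []
    version = current_version(conn)
    for number, statement in MIGRATIONS:
        if number > version:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO DatabaseConfiguration VALUES (?)", (number,))
            conn.commit()
            applied.append(number)
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies all pending steps
second_run = migrate(conn)  # nothing left to apply
```

Because each step is recorded separately, a failure points at a single small change rather than a bundle of them.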
Database Refactoring Strategies (contd.)
  Determine synchronization strategies during the transition period
    Triggers update in real time but might affect performance.
    Views do not move data but might not support updates.
    Batch synchronization can run during non-peak loads but might have to deal with multiple updates.
  Encapsulate database access
    Abstract database access, e.g. by using persistence frameworks.
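The view-based option can be illustrated with a renamed table, sketched here with Python's sqlite3 module. The table names Cust and Customer are hypothetical. In SQLite a plain view is read-only, which demonstrates the "might not support updates" caveat; other databases behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust (customerId INTEGER PRIMARY KEY, name TEXT);
INSERT INTO Cust VALUES (1, 'Ann');

-- The refactoring: rename the table, then expose the old name as a view
-- so applications that have not been updated yet can keep reading.
ALTER TABLE Cust RENAME TO Customer;
CREATE VIEW Cust AS SELECT customerId, name FROM Customer;
""")

# An old application still queries the old name; no data was copied.
old_reader_rows = conn.execute("SELECT name FROM Cust").fetchall()

# Writing through the plain view fails in SQLite.
try:
    conn.execute("INSERT INTO Cust VALUES (2, 'Bob')")
    view_is_writable = True
except sqlite3.OperationalError:
    view_is_writable = False
```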
Database Refactoring Classification
  Structural
  Data Quality
  Referential
  Architectural
  Method
Structural Refactorings
  Related to the structure of tables and views
  e.g. Move Column, Rename Table, Split Table, Merge Column
  Issues to consider when implementing:
    Cyclic triggers
    Broken views, procedures, triggers
    Transition period in multi-application setup
Introduce Surrogate Key
  “Replace an existing natural key with a surrogate key”
  Motivations
    Reduce coupling between schema and business domain
    Increase consistency by having a uniform key strategy
    Improve performance by having an index based on a simpler key
  Potential Tradeoffs
    Surrogate keys are not suitable for all situations
    Introducing a new key might require further key consolidation and more effort
Introduce Surrogate Key (contd.)
  [Diagram: Order contains OrderItem.
   Order: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderId <<PK>> <<surrogate>>
   OrderItem: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderItemNumber <<PK>>, orderId <<FK>> <<surrogate>>
   Trigger PopulateOrderId {event = on insert, drop date = <date>}. A {drop date = <date>} annotation marks what is removed after the transition.]
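A rough sketch of the transition state in this diagram, using Python's sqlite3 module. The sample rows, the use of SQLite's rowid to generate surrogate values, and the trigger body are assumptions for illustration. The sketch only adds and populates the orderId columns; switching the primary key over and dropping the natural key would be later steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "Order" (
    customerNumber INTEGER NOT NULL,
    storeId        INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId)          -- natural key
);
CREATE TABLE OrderItem (
    customerNumber  INTEGER NOT NULL,
    storeId         INTEGER NOT NULL,
    orderItemNumber INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId, orderItemNumber),
    FOREIGN KEY (customerNumber, storeId)
        REFERENCES "Order" (customerNumber, storeId)
);
INSERT INTO "Order" VALUES (7, 1), (7, 2);
INSERT INTO OrderItem VALUES (7, 1, 1), (7, 1, 2), (7, 2, 1);

-- Add the surrogate columns alongside the natural keys.
ALTER TABLE "Order" ADD COLUMN orderId INTEGER;
ALTER TABLE OrderItem ADD COLUMN orderId INTEGER;

-- Populate existing rows (assumption: rowid is good enough as a key value).
UPDATE "Order" SET orderId = rowid;
CREATE UNIQUE INDEX Order_orderId ON "Order" (orderId);
UPDATE OrderItem SET orderId = (
    SELECT o.orderId FROM "Order" o
    WHERE o.customerNumber = OrderItem.customerNumber
      AND o.storeId = OrderItem.storeId);

-- During the transition, fill orderId for applications that only
-- know the natural key.
CREATE TRIGGER PopulateOrderId
AFTER INSERT ON OrderItem
WHEN NEW.orderId IS NULL
BEGIN
    UPDATE OrderItem SET orderId = (
        SELECT o.orderId FROM "Order" o
        WHERE o.customerNumber = NEW.customerNumber
          AND o.storeId = NEW.storeId)
    WHERE customerNumber = NEW.customerNumber
      AND storeId = NEW.storeId
      AND orderItemNumber = NEW.orderItemNumber;
END;
''')

# An application that has not been updated inserts without orderId.
conn.execute(
    "INSERT INTO OrderItem (customerNumber, storeId, orderItemNumber) "
    "VALUES (7, 2, 2)")
new_item_order_id = conn.execute(
    "SELECT orderId FROM OrderItem "
    "WHERE customerNumber = 7 AND storeId = 2 AND orderItemNumber = 2"
).fetchone()[0]
expected_order_id = conn.execute(
    'SELECT orderId FROM "Order" WHERE customerNumber = 7 AND storeId = 2'
).fetchone()[0]
items_without_order_id = conn.execute(
    "SELECT COUNT(*) FROM OrderItem WHERE orderId IS NULL").fetchone()[0]
```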
Data Quality Refactorings
  Related to improving the quality of information in the db
  e.g. Add Lookup Table, Introduce Column Constraint, Introduce Common Format
  Issues to consider when implementing:
    Constraint violations
    Broken logic in procedures
    Broken where clauses in views
    Updating large amounts of data
Add Lookup Table
  “Create a lookup table for an existing column”
  Motivations
    Introduce referential integrity for a column
    Provide code lookup (move enum to the db)
    Replace column constraint with set of expected values in lookup table
  Potential Tradeoffs
    Identifying the data to populate (especially for multiple apps)
    Possible performance impact due to additional joins
Add Lookup Table (contd.)
  [Diagram: Address (Street, State <<FK>>, PostCode) references lookup table State (State <<PK>>, Name).]
  1. Identify the column
  2. Create lookup table
  3. Populate data
  4. Introduce FK constraint
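The four steps can be sketched with Python's sqlite3 module as below. The sample addresses and state names are made up. SQLite cannot add a foreign key to an existing column, so this sketch rebuilds the Address table for step 4; on other databases an ALTER TABLE statement would typically be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- 1. Identify the column: Address.State holds free-text state codes.
CREATE TABLE Address (Street TEXT, State TEXT, PostCode TEXT);
INSERT INTO Address VALUES ('1 High St', 'VIC', '3000'),
                           ('2 Low St',  'NSW', '2000');

-- 2. Create the lookup table.
CREATE TABLE State (State TEXT PRIMARY KEY, Name TEXT);

-- 3. Populate it from the values already in use (names are sample data).
INSERT INTO State (State) SELECT DISTINCT State FROM Address;
UPDATE State SET Name = 'Victoria'        WHERE State = 'VIC';
UPDATE State SET Name = 'New South Wales' WHERE State = 'NSW';

-- 4. Introduce the FK constraint by rebuilding the table.
CREATE TABLE Address_new (
    Street   TEXT,
    State    TEXT REFERENCES State (State),
    PostCode TEXT
);
INSERT INTO Address_new SELECT Street, State, PostCode FROM Address;
DROP TABLE Address;
ALTER TABLE Address_new RENAME TO Address;
""")

# A state code that is not in the lookup table is now rejected.
try:
    conn.execute("INSERT INTO Address VALUES ('3 Mid St', 'XX', '0000')")
    unknown_state_rejected = False
except sqlite3.IntegrityError:
    unknown_state_rejected = True

address_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
```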
Referential Integrity Refactorings
  Changes that improve the referential integrity of data
  e.g. Add Foreign Key Constraint, Introduce Cascading Delete, Introduce Trigger for History
  Issues to consider when implementing:
    Fix broken CRUD logic in procedures
    Data cleansing to make new constraints work
Introduce Cascading Delete
  “Delete the child record(s) when the parent is deleted”
  Motivations
    Preserve referential integrity of the parent/child rows
    Remove responsibility for child deletion from the application
  Potential Tradeoffs
    Possible deadlocks
    Accidental mass deletion when deleting root nodes
    Duplicate functionality is introduced when using persistence frameworks like Hibernate/Toplink
Introduce Cascading Delete (contd.)
  [Diagram: Policy (PolicyId <<PK>>) and Claim (ClaimId <<PK>>, PolicyId <<FK>>), with trigger DeleteClaim {event = on delete}.]
  1. Identify the column
  2. Choose cascading mechanism (triggers, or a cascade clause during constraint creation)
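The cascade-clause mechanism can be sketched with Python's sqlite3 module as follows. The sample rows are made up, and the trigger-based alternative (DeleteClaim in the diagram) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
CREATE TABLE Policy (PolicyId INTEGER PRIMARY KEY);
CREATE TABLE Claim (
    ClaimId  INTEGER PRIMARY KEY,
    PolicyId INTEGER NOT NULL
        REFERENCES Policy (PolicyId) ON DELETE CASCADE
);
INSERT INTO Policy VALUES (1), (2);
INSERT INTO Claim  VALUES (100, 1), (101, 1), (200, 2);
""")

# Deleting the parent removes its child rows; the application
# no longer has to delete claims itself.
conn.execute("DELETE FROM Policy WHERE PolicyId = 1")
remaining_claims = [
    row[0] for row in conn.execute("SELECT ClaimId FROM Claim ORDER BY ClaimId")
]
```

The same behaviour is what makes accidental mass deletion possible: removing a single root row silently removes everything beneath it.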
Architectural Refactorings
  Changes that improve performance and portability, and define the architecture within the database
  e.g. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce Trigger for History
  Issues to consider when implementing:
    Performance vs data redundancy
    Keeping business logic in the application vs the database
Introduce Index
  “Introduce a unique or non-unique index”
  Motivations
    Increase performance of read queries
  Potential Tradeoffs
    Too many indexes degrade performance during inserts/updates/deletes
    Existing data containing duplicates might need cleansing when introducing unique indexes
Introduce Index (contd.)
  [Diagram: Customer (CustomerId <<PK>>, TFN, Name); TFN gains an <<index>> and becomes TFN <<AK>>.]
  1. Determine type of index: unique vs non-unique
  2. Eliminate duplicate rows when using a unique index
  3. Add a new index
  4. Add more disk space for index maintenance
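Steps 1 to 3 for a unique index can be sketched with Python's sqlite3 module. The sample rows and the rule for choosing which duplicate to keep (lowest CustomerId) are assumptions; real data cleansing usually needs a business decision about which row survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, TFN TEXT, Name TEXT);
INSERT INTO Customer VALUES (1, '111', 'Ann'),
                            (2, '222', 'Bob'),
                            (3, '111', 'Ann (duplicate)');

-- 2. Eliminate duplicates (assumption: keep the lowest CustomerId per TFN).
DELETE FROM Customer
WHERE CustomerId NOT IN (
    SELECT MIN(CustomerId) FROM Customer GROUP BY TFN);

-- 3. Add the index; 1. it is unique because TFN is an alternate key.
CREATE UNIQUE INDEX Customer_TFN ON Customer (TFN);
""")

customer_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]

# From now on the database itself rejects a repeated TFN.
try:
    conn.execute("INSERT INTO Customer VALUES (4, '222', 'Bobby')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```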
Method Refactorings
  Changes that improve code representing stored procedures, functions and triggers
  e.g. Rename Method, Reorder Parameters, Replace Literal with Table Lookup
  Issues to consider when implementing:
    Broken triggers, procedures, functions
    Tool support
Refactoring Tools
  Schema migration: Rails Migration, Sundog
  Unit testing: JUnit, DBUnit
  Refactor stored procedures: SQLRefactor (SQL Server only)

Database Refactoring Sreeni Ananthakrishna 2006 Nov
