Chapter 6: Choosing Your Database

Choosing the right database involves evaluating factors such as data volume, user load, integration needs, scaling requirements, and support considerations. Key aspects include understanding the CAP theorem, schema flexibility, and the importance of keeping the architecture simple to avoid unnecessary complexity. Ultimately, the decision should align with the specific needs of the application and the expertise available within the organization.

Uploaded by

Lokesh Chaudhari

Choosing the Right Database

How do you make this decision when you're architecting a system? So many databases are available that picking one over another is a complicated decision. There is no real formula to follow, but there are a few things you should think about. It's not an easy decision, but people who are good at it make the big bucks.

First, set aside the idea that you are going to find the one true database that is better than everything else. Before considering a specific database, take some time to ask a few important questions about your project:

• How much data do you expect to store when the application is mature?
• How many users do you expect to handle simultaneously at peak load?
• What availability, scalability, latency, throughput, and data consistency does your application need?
• How often will your database schemas change?
• What is the geographic distribution of your user population?
• What is the natural "shape" of your data?
• Does your application need online transaction processing (OLTP), analytic queries (OLAP), or both?
• What ratio of reads to writes do you expect in production?
• What are your preferred programming languages?
• Do you have a budget? If so, will it cover licenses and support contracts?
• How strict are you about invalid data being sent to your database? (Ideally, you are very strict and do server-side data validation before persisting anything.)

Now let's talk about some key aspects that will help you answer these questions and choose the right database for your application.

1. Integration

The most important thing to consider when choosing a database is which systems you need to integrate. Make sure your database management system can be integrated with the other tools and services in your project. Different technologies ship connectors for different sets of other technologies.
For example, if you have a big analytics job that currently runs on Apache Spark, you probably want to limit yourself to external databases that can connect easily to Apache Spark. Or suppose you have a frontend system that depends on a SQL interface to its backend, and you're thinking about moving from a monolithic database to a non-relational database. That is only a good choice if the non-relational database offers some sort of SQL-like interface that your frontend application can be migrated to easily.

So think about the pieces that need to talk to each other in your system. Check whether they can actually do so with existing off-the-shelf components, and whether those components are well maintained and up to date. Another example is ArangoDB, which has excellent performance, but its libraries are still young and lack support. Using ArangoDB in combination with other tools may be risky, so some in the community suggest avoiding it for complex projects.

2. Scaling Requirement

It's important to know your scaling requirements before installing your production database. How much data are you really talking about? Is it really going to grow unbounded over time? If so, you need a database technology that is not limited to the data you can store on a single machine. Look at something like Cassandra, MongoDB, or HBase, where you can distribute the storage of your data across an entire cluster and scale horizontally instead of vertically. Many databases can't handle thousands of users querying terabytes or petabytes of data because of scaling issues.

While choosing a database, you also need to think about the transaction rate, or throughput: how many requests you expect to receive per second. Databases with high throughput can support many simultaneous users. If we are talking about thousands of requests per second, then again a single database server is not going to work out.
This is especially important when you are working on a big website where a lot of web servers are serving a lot of people at the same time. You will have to choose a database that is distributed and lets you spread the load of those transactions more evenly. In those situations, a NoSQL database is often a better choice than an RDBMS.

3. Support Consideration

Think about the support you might need for your database. Do you have the in-house expertise to spin up this new technology and configure it properly? It's going to be harder than you think, especially if you're running it in the real world or in any situation where personally identifiable information from your end users is in the mix. In that case, you need to think carefully about the security of your system.

The truth is that many NoSQL databases, if you run them with their default settings, provide little or no security at all. Anybody can connect to them, retrieve data, and write data into them. So make sure you have someone available who knows how to set this up in a secure manner.

If you are in a big organization that has these experts in-house, great. If you're in a smaller organization, you may have to choose a technology that offers professional paid support, with people who can guide you through the initial setup decisions and the ongoing administration of your server. You can also outsource administration. A more corporate solution like MongoDB has paid support, and for the Apache projects there are companies that offer paid professional support.

4. CAP Consideration

CAP stands for Consistency, Availability, and Partition tolerance. The theorem states that you cannot achieve all three properties at the best level in a single database, as there are natural trade-offs between them.
You can only pick two out of three at a time, and which two depends on how you prioritize your requirements. In a distributed system network partitions cannot be ruled out, so in practice the choice is between consistency and availability while a partition lasts. For example, if your system needs to be available and partition tolerant, then you must be willing to relax your consistency requirements and accept that some reads may return stale data for a while. Traditional relational databases are a natural fit for the CA side, whereas non-relational database engines mostly satisfy AP and CP requirements.

[Figure: CAP theorem categories. CP category: e.g. MongoDB, Bigtable (other names illegible in the source). CA category: e.g. RDBMS (Oracle, SQL Server, MySQL). AP category: e.g. Cassandra, Riak, CouchDB.]

• Consistency means that any read request will return the most recent write. Data consistency is usually "strong" for SQL databases; for NoSQL databases it may be anything from "eventual" to "strong".
• Availability means that every request to a node that has not failed receives a response in a reasonable amount of time. Not every application needs to run 24/7 with 99.999% availability, but most likely you will prefer a database with higher availability.
• Partition tolerance means the system continues to operate even when the network between nodes drops or delays messages, or when nodes fail.

The type of application determines what you want here, and only you know the actual requirements. Is it actually OK if your system goes down for a few seconds or a few minutes? If not, availability should be your prime concern. If you're dealing with real transactional information, such as stock trades or financial transactions, you might value consistency above all. Try to choose the technology that is best suited to the trade-offs you want to make.

5. Schemas or Data Model

Relational databases store data in a fixed, predefined structure. This means that when you start development you have to define your data schema in terms of tables and columns. You have to change the schema every time the requirements change.
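As a rough illustration of what a schema change involves, the sketch below uses Python's built-in sqlite3 module to add a column when a new requirement arrives, and contrasts it with storing free-form documents. The table names (users, user_docs), the helper add_column_if_missing, and the sample rows are hypothetical and chosen only for illustration. Storing JSON text in a column merely stands in for a document store here; it is an assumption made to keep the example self-contained, not a description of how any particular NoSQL product works.

```python
import json
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add a column unless it already exists (hypothetical helper).

    Identifiers are interpolated directly, so pass only trusted names.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


conn = sqlite3.connect(":memory:")

# Relational style: the structure is declared up front.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))

# New requirement: store an email address. The schema must change first.
add_column_if_missing(conn, "users", "email", "TEXT")
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Grace", "grace@example.com"),
)
relational_rows = conn.execute(
    "SELECT id, name, email FROM users ORDER BY id"
).fetchall()

# Document style (simulated): each record carries its own fields,
# so the new field needs no schema change.
conn.execute("CREATE TABLE user_docs (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
for doc in ({"name": "Ada"}, {"name": "Grace", "email": "grace@example.com"}):
    conn.execute("INSERT INTO user_docs (doc) VALUES (?)", (json.dumps(doc),))
document_rows = [
    json.loads(row[0])
    for row in conn.execute("SELECT doc FROM user_docs ORDER BY id")
]

print(relational_rows)
print(document_rows)
```

Note the trade-off the sketch makes visible: the relational table guarantees every row has the same columns (older rows simply get NULL for the new one), while the document version pushes the job of handling missing fields into the application code.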
Each schema change means creating new columns, defining new relations, reflecting the changes in your application, discussing them with your database administrators, and so on. A NoSQL database provides much more flexibility when it comes to handling data. There is no requirement to specify a schema before you start working on the application. A NoSQL database also doesn't restrict the types of data you can store together, so you can add new types as your needs change. When building applications, most developers prefer high coding velocity and great agility. NoSQL databases are often a better choice in that regard, especially for agile development, which requires fast implementation.

You really need to take care of all five points mentioned above, but the most important advice of all is to keep everything simple. Don't choose a database just because it is shiny and trendy in the market. If you don't need a highly complex NoSQL cluster, or something that needs a lot of maintenance like MongoDB or HBase, where external servers maintain the configuration, then don't set one up. Think about the minimum requirements of your system. If you don't need to deal with massive scale, there is no need to use a NoSQL database; you can run MySQL somewhere and it will be fine. There is no point in deploying a whole new system that your organization has no real expertise in unless you really need to.

Simple technologies and simple architectures are going to be a lot easier to maintain. After all, you're not going to be happy when you are woken up at 3:00 am because some random server went down in an overly complex database system that you set up for no good reason. So keep everything as simple as possible.
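One way to ground the "minimum requirements" advice is a back-of-envelope estimate built from the sizing questions at the start of the chapter: expected data volume and peak request rate. The sketch below is only that, a sketch. The Workload fields, the estimate function, and especially the two single-node limits are hypothetical placeholders rather than benchmarks; you would replace them with figures measured on your own hardware and database.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    records: int                      # expected records when the app is mature
    avg_record_bytes: int             # average stored size of one record
    peak_users: int                   # simultaneous users at peak load
    requests_per_user_per_sec: float  # average request rate per active user


# Assumed single-node limits (placeholders, not measurements).
SINGLE_NODE_STORAGE_BYTES = 2 * 10**12   # 2 TB
SINGLE_NODE_REQUESTS_PER_SEC = 5_000


def estimate(w: Workload) -> dict:
    """Return rough storage and throughput needs for a workload."""
    storage_bytes = w.records * w.avg_record_bytes
    peak_rps = w.peak_users * w.requests_per_user_per_sec
    fits = (
        storage_bytes <= SINGLE_NODE_STORAGE_BYTES
        and peak_rps <= SINGLE_NODE_REQUESTS_PER_SEC
    )
    return {
        "storage_bytes": storage_bytes,
        "peak_rps": peak_rps,
        "fits_single_node": fits,
    }


# A modest application: 50 million records of about 1 kB, 2,000 peak users.
small = estimate(Workload(50_000_000, 1_000, 2_000, 0.5))

# A very large one: 20 billion records, 100,000 peak users.
large = estimate(Workload(20_000_000_000, 1_000, 100_000, 0.5))

print(small)
print(large)
```

Under these assumed limits the first workload fits comfortably on one server, which is the case where a single MySQL instance is the simpler choice; only the second one points toward a horizontally scaled cluster.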
