Database Design 101: Strategies for Scale

Let's set the scene. Your backend is optimised, your API scales, your services are containerised, and your frontend is snappy. Then your marketing intern publishes a TikTok nobody on the team believed in, and suddenly traffic spikes: thousands of users hitting your app at once. Response times get ridiculously slow, and your database is the reason.

Databases are often the first system to crack when scale kicks in. And while the fix isn’t always to “just use NoSQL” or “shard it,” there are clear strategies senior engineers use to manage growth without over-engineering too soon.

In this article, we'll walk through practical techniques that help you handle massive scale on the database side, from read vs. write separation to caching, indexing, connection pooling, and more. Based on feedback on my previous articles, I'll do my best to keep this practical rather than purely academic: the focus is on tools and design patterns you can start applying today as you level up toward senior engineering.


Read vs. Write Patterns: Why Writes Are Always the Harder Problem

Let’s get this out of the way early: reads are easier to scale than writes.

Why? Because reads don’t usually change the state of your system — they just ask for information. That makes them easier to duplicate, cache, and distribute. Writes, on the other hand, have to change something — and that makes them way harder to scale cleanly.

Real-World Analogy: Taking Orders vs. Serving Meals

Think of your database like a restaurant. Taking a million food orders is easy if you have enough waiters, order kiosks, or an app. But actually cooking and serving a million meals? That’s where you hit bottlenecks. You can’t just throw more cooks in the kitchen without some serious coordination.

  • Reads = Taking orders (you can scale this out with more replicas or caches)

  • Writes = Cooking meals (requires consistency, coordination, and correctness)

Why Write Scaling Gets Complicated

  • Writes change the system’s state — and the system needs to stay consistent. You can’t just duplicate or defer them without consequences.

  • Concurrency issues can pop up when multiple users try to write to the same data. Think race conditions, locking, and data conflicts.

  • Durability matters: A failed write can lead to data loss or corruption, which is often unacceptable.

Quick Wins for Read Scalability

Since writes are harder to scale, a common strategy is to offload as many reads as possible from your primary database:

  • Read Replicas: These are copies of your database that only handle reads. They help spread the load without affecting write performance.

  • Caching: Store frequently-accessed data in fast, in-memory stores like Redis or Memcached. This drastically reduces DB pressure.

We’ll dive deeper into distributed databases and advanced write strategies (like sharding and consensus) in a future article. For now, focus on understanding why scaling reads is low-hanging fruit — and why write paths need more care.


Caching: Your First Line of Defence

One of the fastest ways to make your database happy is simple: just stop hitting it so much.

Caching is your first, most effective line of defence. It allows you to serve data faster, reduce database load, and absorb spikes in traffic — all without changing your core DB architecture.

Why Hitting the DB Less = Survival

Think of your database like a customer support rep. If every single user question goes to that rep directly, burnout is inevitable. But if you put the top 100 questions on a FAQ page (aka cache), the rep only gets involved when it really matters.

Less load on the database = more headroom for writes, complex queries, or growth.

Types of Caching

There are a few layers of caching worth understanding:

  1. Application-Level Caching: Store commonly accessed data in memory, right inside your app or via something like Redis. Typical candidates include hot user profiles, session data, and the results of frequent queries (see the cache-aside sketch after this list).

  2. Edge or CDN Caching: If you're serving public-facing data (like blog posts, product listings, or images), you can push this content to the edge using CDNs like Cloudflare or Fastly. That way, requests don’t even hit your app — let alone your DB.
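To make application-level caching concrete, here's a minimal cache-aside sketch using the redis-py client. The key format, the TTL, and the fetch_user_profile_from_db helper are illustrative assumptions rather than a prescribed design:

```python
import json

import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 300  # assumed TTL; tune to your data's staleness tolerance


def fetch_user_profile_from_db(user_id: int) -> dict:
    # Hypothetical placeholder for your real query (ORM, psycopg2, etc.).
    return {"id": user_id, "name": "example"}


def get_user_profile(user_id: int) -> dict:
    """Cache-aside read: check Redis first, fall back to the database."""
    key = f"user:{user_id}:profile"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip

    profile = fetch_user_profile_from_db(user_id)  # cache miss: hit the DB once
    # Store with a TTL so stale entries expire on their own.
    cache.set(key, json.dumps(profile), ex=CACHE_TTL_SECONDS)
    return profile
```

On a hit, the database is never touched; on a miss, the result is written back with a TTL so the cache heals itself as the underlying data changes.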

Don’t Cache Writes (Yet)

At this point, don’t try to cache writes. It sounds tempting, but caching write paths (like “user just updated their bio”) opens the door to tricky consistency issues. That’s a more advanced topic — one we’ll cover later when we dive into distributed systems and eventual consistency.

TL;DR: If you're not caching yet, you’re making your database work harder than it needs to. Start with what’s easy: popular reads, repeated queries, and public content.


Read Replicas and Load Distribution

Once caching has taken some pressure off, the next step is scaling your reads across multiple machines using read replicas.

What Are Read Replicas?

Read replicas are exact copies of your primary database that are continuously updated (usually asynchronously). Instead of sending every query to the same overworked primary DB, you can distribute read traffic to replicas.

Think of your primary DB as the chef and the replicas as food runners. The chef prepares the meal (writes), but the runners (replicas) can serve it to thousands of customers (read queries) without bothering the chef again.

When Do You Use Them?

Read replicas are great for:

  • Analytics dashboards

  • Internal reporting tools

  • Mobile or web clients fetching read-only data

Anything that doesn’t modify data is fair game to offload.

For example, if your app generates a heavy analytics report every hour, running that on the primary DB could slow things down for everyone else. But if you send it to a replica, the primary stays focused on writes and critical transactions.

Lag and Tradeoffs

Here’s the catch: replicas are usually eventually consistent. Since replication is often asynchronous, there’s a lag between when data is written to the primary and when it shows up on a replica.

That means:

  • If a user updates their profile and immediately views it, they might see stale data (depending on which DB they hit).

  • You’ll need to design your system to tolerate this — for example, routing reads to the primary DB immediately after a write (sketched below), or clearly communicating that a change might take a second to reflect.
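Here's one way "route to the primary right after a write" can look in application code. This is a minimal sketch under assumed names (ReplicaAwareRouter, PIN_TO_PRIMARY_SECONDS); many ORMs and database proxies offer equivalent routing hooks:

```python
import time

PIN_TO_PRIMARY_SECONDS = 5  # assumed window; set it above your typical replication lag


class ReplicaAwareRouter:
    """Route reads to replicas, except for users who just wrote."""

    def __init__(self, primary, replicas):
        self.primary = primary    # connection/engine for the primary
        self.replicas = replicas  # list of replica connections/engines
        self._last_write = {}     # user_id -> monotonic time of their last write

    def record_write(self, user_id):
        # Call this after any write performed on behalf of the user.
        self._last_write[user_id] = time.monotonic()

    def connection_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and time.monotonic() - wrote_at < PIN_TO_PRIMARY_SECONDS:
            return self.primary  # read-your-writes: recent writers see fresh data
        # Everyone else spreads across replicas (here: a stable hash pick).
        return self.replicas[hash(user_id) % len(self.replicas)]
```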

TL;DR: Read replicas buy you scale, but come with tradeoffs. Use them to reduce load, especially for non-critical reads — just be mindful of replication lag and eventual consistency. 


Connection Pooling and Limits

When scaling your application, one of the sneaky issues that can crop up is database connection overhead. As your user base grows, the number of simultaneous connections to your database can skyrocket, and managing them properly becomes crucial.

Each time your app opens a database connection, it's a resource-intensive operation. Databases also have a maximum connection limit (PostgreSQL, for example, defaults to 100 via its max_connections setting). If you don’t manage connections effectively, you’ll quickly hit that limit and new connection attempts will be rejected. Worse, this can cause timeouts, slow performance, or even application crashes.

As your app grows, it’s easy to imagine that scaling your database requires adding more and more connections. But in reality, opening thousands of connections isn’t the solution — it’s a performance bottleneck.

Connection Pooling: The Fix

Instead of creating new connections on the fly, you can use connection pooling to reuse a set of existing connections. Pooling reduces overhead and ensures that your app uses a manageable number of connections.

The connection pooler:

  1. Opens a set of connections to the database when the application starts.

  2. Reuses these connections for incoming requests, rather than opening new ones every time.

  3. Returns connections to the pool when a request finishes, and closes idle connections that are no longer needed.

This means less stress on the database, faster response times, and the ability to handle many more users with far fewer connections.

Pro Tip: Always set an upper limit for the number of connections that can be opened. It’s easy to accidentally let the connection pool grow too large, which can impact performance.
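As a concrete sketch, here's what pooling looks like with SQLAlchemy against PostgreSQL. The DSN and the numbers are illustrative assumptions; tune them to your workload:

```python
from sqlalchemy import create_engine, text

# pool_size + max_overflow caps the total connections this process can open,
# which enforces the upper limit from the pro tip above.
engine = create_engine(
    "postgresql+psycopg2://app:secret@localhost:5432/appdb",  # assumed DSN
    pool_size=10,       # connections kept open and reused
    max_overflow=5,     # extra connections allowed under bursts
    pool_timeout=30,    # seconds to wait for a free connection before erroring
    pool_recycle=1800,  # recycle connections periodically to avoid stale ones
)

with engine.connect() as conn:  # borrows from the pool, returns it on exit
    user_count = conn.execute(text("SELECT count(*) FROM users")).scalar_one()
```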


Denormalisation and Precomputed Views

As systems scale, database optimisations often require trade-offs. One such trade-off is denormalisation: duplicating data to reduce the complexity of queries and improve read performance. Though it may seem counterintuitive, denormalisation can be a powerful tool when used correctly.

In highly relational databases, normalisation (splitting data across multiple tables to reduce redundancy) is a best practice. However, as applications grow and traffic increases, complex joins across these normalised tables can become expensive in terms of latency and CPU usage.

At scale, these complex queries can slow down your system and create bottlenecks. This is where denormalisation comes into play: duplicating data in such a way that it’s optimised for faster reads, even at the cost of some storage overhead. By storing data that would otherwise require expensive joins, you can avoid repetitive database lookups and reduce response times.

For example:

  • Instead of doing a join between a users table and an orders table to get user order history, you might store a denormalised version of that history (or a summary, like an order count) directly on the users table. This makes the data retrieval faster but requires updating both places when changes occur (see the sketch below).
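A tiny before-and-after sketch, using an assumed schema where users carries a duplicated order_count column:

```python
# Normalised read: a join and an aggregate on every request.
NORMALISED_READ = """
SELECT u.id, u.name, count(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
"""

# Denormalised read: the count is duplicated onto users.order_count,
# so the hot path becomes a single-table lookup...
DENORMALISED_READ = "SELECT id, name, order_count FROM users WHERE id = %s;"

# ...but every new order must now also keep the duplicate in sync
# (ideally in the same transaction as the INSERT into orders).
ON_NEW_ORDER = "UPDATE users SET order_count = order_count + 1 WHERE id = %s;"
```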

Use Materialised Views for Expensive Joins or Aggregations

One of the best ways to handle expensive aggregations and joins without compromising on performance is by using materialised views.

  • Materialised Views are precomputed views that store the results of a query in a static table. Instead of running the query on-the-fly every time a user requests data, the results are precomputed and stored in the database. This means faster reads, especially for complex aggregations or calculations that are required repeatedly.

For example, an app that calculates monthly revenue for thousands of products would benefit from a materialised view that stores the precomputed sum for each product, so the system doesn’t need to run the aggregation query every time.

The downside? Materialised views require maintenance. They need to be refreshed periodically to reflect new data, which can add some complexity.
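Here's roughly how that looks in PostgreSQL, driven from Python with SQLAlchemy. The schema and names are assumptions for illustration:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://app:secret@localhost:5432/appdb")  # assumed DSN

# One-time setup: precompute monthly revenue per product.
CREATE_VIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS monthly_product_revenue AS
SELECT product_id,
       date_trunc('month', ordered_at) AS month,
       sum(amount) AS revenue
FROM orders
GROUP BY product_id, date_trunc('month', ordered_at);
"""

with engine.begin() as conn:
    conn.execute(text(CREATE_VIEW))


# Periodic maintenance, e.g. from a nightly job: refresh so the view reflects
# new orders. (REFRESH ... CONCURRENTLY avoids blocking readers, but requires
# a unique index on the view.)
def refresh_revenue_view():
    with engine.begin() as conn:
        conn.execute(text("REFRESH MATERIALIZED VIEW monthly_product_revenue"))
```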

Common Anti-Patterns: Over-Normalising in High-Traffic Apps

One common anti-pattern when working with databases at scale is over-normalisation. In an effort to reduce redundancy and maintain data integrity, some developers go too far in breaking up data into many tiny tables.

While normalisation is important for consistency, when dealing with high-traffic applications, excessive normalisation can lead to:

  1. Complex queries with multiple joins, slowing down response times.

  2. Increased overhead when managing relationships between tables, especially in high-concurrency environments.

  3. Scaling issues, as each additional join adds computational load.

Instead, consider denormalising some of the data that’s frequently queried together or that doesn’t need to be updated on every request. This is particularly useful for read-heavy systems.

TL;DR: Denormalisation can improve read performance by duplicating data for faster access. Materialised views can speed up expensive queries, but they need to be refreshed. Avoid over-normalising in high-traffic apps.


Avoiding Expensive Queries

As your database scales, query performance becomes crucial. Poorly optimised queries can quickly become a bottleneck, especially when they grow in complexity. Let’s take a look at a few strategies to avoid expensive queries that could otherwise undermine your system's performance.

The "N+1 Query Problem" Explained

One of the most common performance issues in databases is the N+1 query problem. This occurs when an application makes a query for a list of items, and then for each item, it makes a separate query to retrieve additional data. In essence, this turns one query into N+1 queries, where N is the number of items being retrieved.

For example, imagine you’re retrieving a list of users with their associated orders:

  1. First, you query for all users (let’s say 1000 users).

  2. Then, for each user, you make a separate query to get their orders.

This creates a situation where you’re performing 1 query to get users, and 1000 additional queries to get their orders — a total of 1001 queries. This is incredibly inefficient.

Solution: Always strive to batch queries. Instead of making a separate query for each related piece of data, retrieve them in one go using joins or more efficient bulk fetching strategies.
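Here's the difference in raw psycopg2 against an assumed users/orders schema: first the anti-pattern, then the single-join fix:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app")  # assumed DSN
cur = conn.cursor()

# N+1 anti-pattern: one query for users, then one query per user.
cur.execute("SELECT id, name FROM users")
users = cur.fetchall()
for user_id, name in users:
    cur.execute("SELECT id, total FROM orders WHERE user_id = %s", (user_id,))
    user_orders = cur.fetchall()  # 1000 users -> 1001 round trips

# Batched fix: one join returns users and their orders in a single round trip.
cur.execute("""
    SELECT u.id, u.name, o.id, o.total
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
""")
rows = cur.fetchall()
```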


Indexing 101: What to Index, and What Not to

Indexes are a powerful tool to speed up data retrieval. However, they come with trade-offs. Adding too many indexes can slow down your database’s write operations, as every insert or update needs to update the indexes as well.

Here’s a quick guide on what to index and what not to:

  • Index Columns Used in WHERE Clauses: If you frequently query by a specific column, it’s a good idea to index it. For example, if you’re often filtering by user_id or email, these columns should be indexed.

  • Index Columns Used in JOINs: Columns that are used in JOINs should also be indexed to speed up data matching across tables.

  • Be Wary of Indexing Low-Selectivity Columns: Columns with only a handful of distinct values (like a boolean flag or a status column) benefit little from a standard index, because matching rows don’t meaningfully narrow the search space. High-cardinality columns like email or user_id, by contrast, are exactly where indexes pay off.

  • Composite Indexes: When queries filter by multiple columns, consider using composite indexes (indexes on multiple columns). This can drastically improve performance for those specific queries.

Index strategically. Don’t index everything — it’s about finding a balance between speed for reads and impact on writes.
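As a concrete sketch, here are illustrative PostgreSQL index definitions for the cases above (the table and column names are assumptions):

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app")  # assumed DSN

INDEX_STATEMENTS = [
    # Column frequently filtered in WHERE clauses:
    "CREATE INDEX IF NOT EXISTS idx_users_email ON users (email)",
    # Foreign key used in JOINs:
    "CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders (user_id)",
    # Composite index for queries filtering on both columns; column order
    # matters, so lead with the column that always appears in the filter.
    "CREATE INDEX IF NOT EXISTS idx_orders_user_created ON orders (user_id, created_at)",
]

with conn, conn.cursor() as cur:  # connection context manager commits on success
    for ddl in INDEX_STATEMENTS:
        cur.execute(ddl)
```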


Write Optimisation Strategies

When it comes to scaling databases, write optimisation is just as important as optimising reads. Writes are often more resource-intensive and can introduce latency, especially as your system grows. Fortunately, there are strategies to minimise the load on your database during write-heavy operations.

Batching Writes: Buffering Frequent Inserts

One of the best ways to optimise writes is by batching them. Instead of inserting records one by one, you can group them together into a batch and insert them in a single operation. This reduces per-statement overhead and network round trips, leading to significant performance improvements.

For example, if you're logging user activity or inserting multiple items in an e-commerce order, batch the writes so that you're only hitting the database once instead of making a separate request for each item. The batch can be sent in intervals or triggered by reaching a certain size.

Buffering frequent inserts also gives you the flexibility to apply more complex logic before committing changes to the database, such as deduplication or validation, reducing the risk of errors.
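A minimal buffered-insert sketch using psycopg2's execute_values; the batch size, table name, and DSN are assumptions, and a production version would also flush on a timer and handle failures:

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=appdb user=app")  # assumed DSN

BATCH_SIZE = 500  # assumed threshold; tune for your write volume
_buffer = []      # in-memory buffer of pending rows


def log_event(user_id, action):
    _buffer.append((user_id, action))
    if len(_buffer) >= BATCH_SIZE:  # a timer-based flush is also common
        flush_events()


def flush_events():
    if not _buffer:
        return
    with conn.cursor() as cur:
        # One round trip for the whole batch instead of one INSERT per row.
        execute_values(
            cur,
            "INSERT INTO activity_log (user_id, action) VALUES %s",
            _buffer,
        )
    conn.commit()
    _buffer.clear()
```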

Background Jobs & Queues

Not all writes need to happen immediately. For example, user activity logs, email notifications, or analytics data often don’t need to be written to the database in real-time. By deferring these writes to background jobs or message queues, you can free up your database to handle more critical tasks.

Common approaches include:

  • Background Jobs: Using job processing frameworks like Celery (Python) to handle background tasks, such as sending emails or logging analytics events, without blocking user-facing operations.

  • Message Queues: Systems like RabbitMQ or Kafka can be used to queue up tasks that are then processed asynchronously. These queues can handle spikes in traffic, ensuring that your database is not overwhelmed by simultaneous writes.

The key is to recognise that not all writes need to block the user experience. Use asynchronous writes whenever possible to maintain a smooth user experience while offloading heavy operations to the background.
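For instance, a deferred analytics write with Celery might look like the following (the broker URL and task body are illustrative):

```python
from celery import Celery

# Broker URL is illustrative; point it at your RabbitMQ or Redis instance.
app = Celery("tasks", broker="amqp://guest@localhost//")


@app.task
def record_analytics_event(user_id, event_name):
    # Runs in a worker process, off the request path; do the DB write here.
    ...


# In the request handler: enqueue and return immediately.
record_analytics_event.delay(user_id=42, event_name="page_view")
```

The request handler enqueues the task and returns immediately; a worker performs the actual database write in the background.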


Wrapping Up

Handling millions of users without your database “exploding” requires a blend of thoughtful scaling strategies and practical optimisations. By focusing on the right patterns for read-heavy and write-heavy workloads, you can dramatically reduce the stress placed on your database.

Key Tactics to Remember:

  • Caching: Minimise the number of direct hits to your database, especially for frequently accessed or static data.

  • Read Replicas: Offload read traffic to replicas and improve database scalability.

  • Connection Pooling: Reduce connection overhead by reusing database connections and limiting the number of active connections.

  • Denormalisation: Don't shy away from some duplication to optimise complex queries or aggregations.

  • Write Optimisation: Batch writes, use background jobs for asynchronous processing, and ensure idempotent writes to handle retries gracefully.

While these strategies are effective in scaling your database to handle millions of users, eventually you’ll hit the limits of what a single, monolithic database can handle. That’s when distributed databases become the solution. In our next article, we’ll dive deep into how to manage database performance and consistency when your data is spread across multiple nodes, ensuring both scalability and availability.
