Designing a scalable AWS database service using serverless compute, tiered storage, and Redis caching.

Modern SaaS platforms often face a tricky challenge early on: how do you manage thousands of customer databases without exploding infrastructure costs?
The default answer is usually a managed database per tenant. But that approach quickly becomes expensive, operationally complex, and difficult to scale. Instead, I (along with my teammates at NYU) explored a different design: building a serverless, multi-tenant database platform using lightweight databases and cloud-native storage.
The goal was simple: design a system that provides strong tenant isolation, scalable migrations, and high availability while keeping costs dramatically lower than traditional managed databases.
Most SaaS platforms start with a simple architecture: one database per customer or one large shared database.
Both approaches have tradeoffs.
Provisioning a dedicated database instance per tenant offers strong isolation but becomes extremely expensive at scale. Running thousands of database instances means paying for compute, storage, and replication even when tenants are idle.
Shared databases reduce cost but introduce operational complexity:

- difficult schema migrations
- tenant isolation challenges
- noisy neighbor performance issues
Traditional cloud databases solve some of these problems, but the cost curve grows quickly. For example, provisioning managed databases for thousands of tenants can cost tens of thousands of dollars per month.
This led to an interesting question:
What if each tenant database were just a lightweight file, orchestrated by serverless infrastructure?
That idea became the foundation for the architecture.
The system is built entirely with AWS serverless components and distributed storage, focusing on three goals:

- tenant isolation
- low operational cost
- elastic scaling
Each tenant is provisioned as a separate SQLite database file (.db) stored in object storage, while orchestration is handled through serverless APIs and metadata services.
The architecture is built using a set of AWS services that together provide tenant provisioning, query execution, schema migrations, replication, and storage tiering.
API Gateway acts as the public entry point to the system. All client requests such as tenant creation, query execution, and schema migrations pass through API Gateway before being routed to backend Lambda functions.
Lambda functions serve as the compute layer of the platform. Different Lambda handlers are responsible for tasks such as executing queries, provisioning new tenant databases, performing schema migrations, managing replication, and handling storage tier transitions.
DynamoDB stores system metadata including tenant identifiers, API keys, schema versions, database locations, and migration state. This metadata layer allows the platform to orchestrate thousands of tenant databases efficiently.
S3 stores SQLite database files and schema templates. It also acts as cold storage for inactive tenants, allowing the system to store large numbers of databases cheaply while maintaining durability.
EFS is used as the hot storage layer for active tenant databases. Because it provides low-latency shared filesystem access, Lambda functions can read and write tenant database files directly during query execution.
Redis is used as an in-memory query result cache. Frequently executed read queries are temporarily cached, allowing the system to return results without repeatedly accessing the underlying database files.
SQS queues coordinate asynchronous workflows such as schema migrations and replication updates. FIFO queues ensure migrations for a tenant are processed sequentially, preventing concurrent modification of the same database.
SNS is used to broadcast database snapshot events after write operations. These events trigger downstream replication tasks, allowing multiple replicas to update asynchronously.
Route 53 manages DNS routing and health checks for the system’s API endpoints. If the primary endpoint fails, Route 53 automatically redirects traffic to a secondary region to maintain availability.
EventBridge triggers scheduled background jobs such as the cold storage manager, which periodically scans tenant activity and moves inactive databases from EFS to S3.
CloudWatch provides logging, metrics, and monitoring for the platform. It tracks system health, storage transitions, migration execution, and overall request latency.
Together, these services create a fully serverless control plane for database provisioning and management.
When a new tenant is created:

1. A request hits API Gateway
2. A Lambda function provisions a new SQLite database
3. Schema templates are applied
4. Metadata is stored in DynamoDB
5. The database file is uploaded to S3
This design provides complete tenant isolation, since each tenant operates on its own database file.
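The provisioning flow can be sketched in a few lines of Python. This is a simplified local sketch, not the project's actual handlers: a dict stands in for the DynamoDB metadata table, a temp directory stands in for storage, and the schema template and `provision_tenant` name are illustrative assumptions.

```python
import os
import sqlite3
import tempfile
import uuid

# Illustrative schema template; in the real system these live in S3.
SCHEMA_TEMPLATE = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT);
"""

metadata = {}  # stands in for the DynamoDB metadata table

def provision_tenant(storage_dir: str) -> str:
    """Create an isolated SQLite file for a new tenant and record its metadata."""
    tenant_id = uuid.uuid4().hex
    db_path = os.path.join(storage_dir, f"{tenant_id}.db")
    conn = sqlite3.connect(db_path)       # creates the .db file
    conn.executescript(SCHEMA_TEMPLATE)   # apply the schema template
    conn.close()
    metadata[tenant_id] = {"db_path": db_path, "schema_version": 1, "tier": "HOT"}
    return tenant_id

storage = tempfile.mkdtemp()
tid = provision_tenant(storage)
print(metadata[tid]["tier"])  # prints "HOT"
```

Because every tenant gets its own file, isolation falls out of the filesystem rather than row-level access control.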
One of the key challenges in multi-tenant systems is managing storage cost. If every tenant database lives on low-latency storage, the cost grows linearly with the number of tenants.
To solve this, the system uses two storage tiers:

| Storage | Purpose |
|---|---|
| Amazon EFS | Hot storage for active tenants |
| Amazon S3 | Cold storage for inactive tenants |
Active tenants are stored on EFS, which provides low-latency NFS access suitable for transactional workloads. Typical read latency is around 1–2 ms.
Inactive tenants are moved to S3, which provides durable and significantly cheaper object storage, but with higher access latency around 100–200 ms when databases need to be restored.
This separation allows the system to optimize both performance and cost.
One of the hardest problems in multi-tenant systems is schema evolution.
Updating schemas across thousands of databases can easily introduce downtime or inconsistent states.
To solve this, the system implements a queue-based migration pipeline.
Migration requests are first processed by a handler Lambda, which prepares migration tasks and sends them to a FIFO SQS queue. A separate worker Lambda consumes those tasks sequentially.
This architecture ensures:

- migrations are processed in order
- no two workers update the same tenant simultaneously
- failures are isolated per tenant
Supported migration operations include:

- creating tables
- renaming tables
- adding columns
- dropping tables
More complex operations are intentionally avoided to keep migrations safe and predictable.
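A worker consuming these tasks can translate each supported operation into a single SQLite DDL statement. The task dict shape and function names below are illustrative assumptions, not the system's actual message format:

```python
import sqlite3

def migration_sql(task: dict) -> str:
    """Translate one migration task into a SQLite DDL statement."""
    op = task["op"]
    if op == "create_table":
        cols = ", ".join(f"{name} {ctype}" for name, ctype in task["columns"])
        return f"CREATE TABLE {task['table']} ({cols})"
    if op == "rename_table":
        return f"ALTER TABLE {task['table']} RENAME TO {task['new_name']}"
    if op == "add_column":
        return f"ALTER TABLE {task['table']} ADD COLUMN {task['column']} {task['ctype']}"
    if op == "drop_table":
        return f"DROP TABLE {task['table']}"
    raise ValueError(f"unsupported migration op: {op}")

def apply_migrations(db_path: str, tasks: list[dict]) -> None:
    """Apply a tenant's migration tasks in order, in a single transaction."""
    conn = sqlite3.connect(db_path)
    with conn:  # one transaction: a failure rolls back and stays isolated per tenant
        for task in tasks:
            conn.execute(migration_sql(task))
    conn.close()
```

Keeping the operation set this small is what makes the translation a one-liner per case; anything requiring a table rebuild (e.g. dropping a column on older SQLite) would complicate the worker considerably.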
To ensure resilience, the system implements read replicas and automatic failover.
After every write operation:

1. A snapshot of the database is created
2. Metadata about the snapshot is published to an SNS topic
3. Multiple SQS queues fan out replication jobs
4. Lambda workers update read replicas asynchronously
This approach decouples replication from the write path, keeping write latency low while maintaining eventual consistency.
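The snapshot event only needs enough metadata for replica workers to fetch and apply the new copy. A minimal sketch of the fan-out, with plain lists standing in for SNS and the SQS queues (all field names here are illustrative assumptions):

```python
import json
import time

replica_queues = {"replica-1": [], "replica-2": []}  # stand-ins for SQS queues

def publish_snapshot_event(tenant_id: str, snapshot_key: str, version: int) -> dict:
    """Build a snapshot event and fan it out to every replica queue."""
    event = {
        "tenant_id": tenant_id,
        "snapshot_key": snapshot_key,  # e.g. an S3 object key for the snapshot
        "version": version,            # lets workers discard stale snapshots
        "published_at": time.time(),
    }
    message = json.dumps(event)
    for queue in replica_queues.values():  # SNS -> SQS fan-out
        queue.append(message)
    return event

publish_snapshot_event("tenant-42", "snapshots/tenant-42/v7.db", 7)
```

Carrying a monotonically increasing version in the event is what makes the asynchronous workers safe: a replica that sees version 7 after already applying version 8 can simply drop the message.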
To handle failures, Route 53 health checks monitor the primary API endpoint. If a failure occurs, traffic is automatically routed to a standby endpoint in another region.
The result is a system that remains available even during infrastructure outages.
Storage cost becomes a major issue when you manage millions of tenant databases.
Instead of storing everything in a single storage layer, the system uses tiered storage:

| Storage Layer | Purpose |
|---|---|
| S3 | Cold storage for inactive tenants |
| EFS | Hot storage for frequently accessed tenants |
| Redis | In-memory cache for repeated queries |
Inactive tenant databases are automatically moved to S3 to reduce cost. When they become active again, they are rehydrated back into EFS.
On top of that, a Redis caching layer stores results of frequently executed queries. Cache hits can return results in under one millisecond, reducing load on the storage layer.
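The caching layer follows the standard cache-aside pattern. In the sketch below a dict stands in for Redis and sqlite3 for the tenant database; a production version would use a Redis client such as redis-py with a TTL on each key.

```python
import json
import sqlite3

cache = {}  # stands in for Redis; a real client would also set a TTL per key

def cached_query(conn: sqlite3.Connection, tenant_id: str, sql: str):
    """Cache-aside read: check the cache first, fall back to the database."""
    key = f"{tenant_id}:{sql}"      # cache key scoped per tenant
    if key in cache:                # cache hit: skip the database entirely
        return json.loads(cache[key])
    rows = [list(r) for r in conn.execute(sql)]
    cache[key] = json.dumps(rows)   # cache miss: store the serialized result
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
cached_query(conn, "t1", "SELECT id, email FROM users")  # first call hits SQLite
cached_query(conn, "t1", "SELECT id, email FROM users")  # second call is served from cache
```

Note that writes would need to invalidate (or version) the tenant's cached keys, otherwise reads after a write could return stale results.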
A scheduled Lambda function periodically scans tenant metadata stored in DynamoDB. If a tenant database has not been accessed within a configurable threshold (for example 30 days), it is moved to cold storage.
The workflow looks like this:

1. A scheduled Lambda checks the last_accessed timestamp for each tenant.
2. Databases that exceed the inactivity threshold are uploaded from EFS → S3.
3. The database is removed from EFS to free up space.
4. Metadata is updated to mark the tenant as COLD.
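The demotion scan itself is a small piece of logic over tenant metadata. Below, a dict stands in for the DynamoDB table and the actual EFS→S3 copy is elided; the 30-day threshold is the configurable value mentioned above, and the field names are illustrative:

```python
from datetime import datetime, timedelta, timezone

INACTIVITY_THRESHOLD = timedelta(days=30)  # configurable, per the text

def find_cold_candidates(tenants: dict, now: datetime) -> list:
    """Return ids of hot tenants whose last access exceeds the threshold."""
    return [
        tid for tid, meta in tenants.items()
        if meta["tier"] == "HOT"
        and now - meta["last_accessed"] > INACTIVITY_THRESHOLD
    ]

now = datetime.now(timezone.utc)
tenants = {
    "t1": {"tier": "HOT", "last_accessed": now - timedelta(days=45)},
    "t2": {"tier": "HOT", "last_accessed": now - timedelta(days=2)},
    "t3": {"tier": "COLD", "last_accessed": now - timedelta(days=90)},
}
for tid in find_cold_candidates(tenants, now):
    # the real handler would copy the EFS file to S3 and delete it from EFS here
    tenants[tid]["tier"] = "COLD"
print(sorted(t for t, m in tenants.items() if m["tier"] == "COLD"))  # ['t1', 't3']
```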
Operational metrics such as tier transitions and reclaimed storage are tracked using CloudWatch dashboards.
When a request is made for a tenant stored in cold storage:

1. API Gateway invokes a Rehydrate Lambda.
2. The tenant database is downloaded from S3 → EFS.
3. Metadata is updated to mark the tenant as HOT.
4. Queries resume normally.
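Rehydration is the mirror image: before executing a query, a handler ensures the tenant file is present on the hot tier. A local sketch using two directories to stand in for S3 and EFS (the helper name `ensure_hot` is an illustrative assumption):

```python
import os
import shutil
import tempfile

def ensure_hot(tenant_id: str, meta: dict, s3_dir: str, efs_dir: str) -> str:
    """Copy a cold tenant database back to the hot tier and flip its metadata."""
    hot_path = os.path.join(efs_dir, f"{tenant_id}.db")
    if meta["tier"] == "COLD":
        # S3 -> EFS download, simulated here with a local copy
        shutil.copy(os.path.join(s3_dir, f"{tenant_id}.db"), hot_path)
        meta["tier"] = "HOT"  # queries now run against the hot copy
    return hot_path

s3_dir, efs_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(s3_dir, "t1.db"), "wb") as f:
    f.write(b"fake sqlite bytes")  # stand-in for an archived database file
meta = {"tier": "COLD"}
path = ensure_hot("t1", meta, s3_dir, efs_dir)
print(meta["tier"])  # prints "HOT"
```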
This design keeps hot tenants fast while allowing inactive tenants to be stored cheaply.
To evaluate the effectiveness of tiered storage, consider a system with:

- 1 million tenants
- 5 GB per tenant database
- 10 reads and 5 writes per day
If all databases were stored on EFS, annual infrastructure costs would exceed $1.7 million.
If everything were stored on S3, storage would be cheaper but request and transfer costs would dominate, reaching over $10 million annually.
A hybrid tiered strategy dramatically improves efficiency:

| Storage Strategy | Estimated Annual Cost |
|---|---|
| All EFS | ~$1.7M |
| All S3 | ~$10.7M |
| Tiered (EFS + S3) | ~$1.0M |
The hybrid approach ensures the system cost scales with tenant activity rather than total tenant count.
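A back-of-the-envelope model makes the shape of these curves concrete. Everything below is an assumption for illustration: the unit prices are representative list prices (they vary by region and change over time), each S3 access is modeled as moving the whole 5 GB file in 16 MB chunks (one request per chunk), and the hot fraction is a guess. The absolute numbers therefore differ from the estimates above; what the model shows is how each strategy scales.

```python
TENANTS = 1_000_000
GB_PER_TENANT = 5
READS_PER_DAY, WRITES_PER_DAY = 10, 5

# Assumed unit prices (representative, region-dependent -- check current pricing):
EFS_GB_MONTH = 0.30     # EFS standard storage, $/GB-month
S3_GB_MONTH = 0.023     # S3 standard storage, $/GB-month
S3_GET = 0.0004 / 1000  # $ per GET request
S3_PUT = 0.005 / 1000   # $ per PUT request
HOT_FRACTION = 0.10     # assumed share of tenants active at any time

total_gb = TENANTS * GB_PER_TENANT
reads_yr = TENANTS * READS_PER_DAY * 365
writes_yr = TENANTS * WRITES_PER_DAY * 365
parts_per_file = GB_PER_TENANT * 1024 // 16  # 16 MB chunks per full-file access

all_efs = total_gb * EFS_GB_MONTH * 12                  # storage cost only
all_s3 = (total_gb * S3_GB_MONTH * 12                   # storage
          + reads_yr * parts_per_file * S3_GET          # chunked reads
          + writes_yr * parts_per_file * S3_PUT)        # chunked writes
tiered = (HOT_FRACTION * total_gb * EFS_GB_MONTH * 12   # hot tier on EFS
          + (1 - HOT_FRACTION) * total_gb * S3_GB_MONTH * 12)  # cold tier on S3
          # (cold-tier request costs omitted for brevity)

assert tiered < all_s3 < all_efs  # tiering wins under this model
```

Under these assumptions all-EFS is dominated by storage, all-S3 by per-request costs on writes, and the tiered strategy pays hot-tier prices only for the active fraction.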
The architecture delivered strong improvements in both cost and scalability.
For tenant provisioning, the system reduces database cost dramatically compared with traditional managed instances:

| Approach | Monthly cost per tenant |
|---|---|
| Managed database per tenant | $7–8 |
| Shared database model | $0.10–0.20 |
| File-based architecture | < $0.01 |
This represents a cost reduction of over 99% compared with a managed database per tenant, while still maintaining strong tenant isolation.
Performance improvements were also significant:

| Access Path | Typical Latency |
|---|---|
| Redis cache hit | < 1 ms |
| EFS hot storage | 1–3 ms |
| S3 cold rehydration | 100–200 ms |
The tiered storage architecture also reduces storage costs by 60–80%, since only active tenants remain in the hot storage layer.
Building a scalable database service does not always require heavyweight distributed databases or expensive managed clusters.
By combining:

- serverless compute
- lightweight databases
- object storage
- queue-based orchestration
- caching layers

it *is* possible to build a cost-efficient, highly scalable multi-tenant database platform using simple cloud primitives.
This architecture shows that with the right design choices, systems can scale to massive workloads while keeping both cost and operational complexity under control. If you'd like to take a look at the implementation of such a service, check out the GitHub repository on my profile, and make sure to ⭐ star it as well!