The situation
A growth-stage e-commerce company was rebuilding its product catalogue service ahead of an anticipated traffic ramp. The team had operated on a monolithic PostgreSQL database for three years. The new service — a read-heavy catalogue API serving product listings, inventory status, and pricing to both the storefront and third-party integrations — needed to support a baseline of 8,000 RPS, weekly wave peaks of 40,000 RPS, and campaign-day spikes to 120,000 RPS.
The engineering lead had a working assumption: DynamoDB was the natural choice for a high-read, horizontally scalable catalogue service. The data model was largely key-value — product ID to product record — with limited relational complexity. The operational simplicity argument was compelling. There was no appetite for managing an RDS fleet at this stage of growth.
The question wasn't whether to use DynamoDB. It was which DynamoDB capacity mode — on-demand or provisioned — and whether Aurora Serverless v2 deserved a serious look given its ACU-based autoscaling and SQL compatibility for the more complex catalogue queries downstream analytics teams were already asking for.
We thought we knew the answer before we started. The simulation changed three things we were confident about. That's the whole point of running it.
The team ran the evaluation entirely in pinpole before provisioning a single resource. Two canvases. Three traffic patterns. Eight simulation runs. The decision was made, documented, and presented to the CTO in a single afternoon.
Canvas setup and architecture
The team built two canvases representing the same application topology — a serverless API stack — with only the database layer differing. Both canvases share the same critical path:
Canvas A: → DynamoDB (product-catalogue-table)
Canvas B: → Aurora Serverless v2 (catalogue-cluster) + RDS Proxy
Supporting: WAF · ElastiCache (read cache, both canvases) · CloudWatch · SQS (write path)
Canvas A was configured with DynamoDB in both on-demand and provisioned modes — toggled between simulation runs — to produce direct capacity mode comparisons under identical traffic. Canvas B used Aurora Serverless v2 with a minimum of 0.5 ACU and a maximum of 64 ACU, behind an RDS Proxy to manage Lambda connection pooling.
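Canvas B's database settings map directly onto the RDS API. A minimal sketch of the equivalent boto3 parameters, where the identifiers and engine choice are illustrative assumptions rather than the team's actual values:

```python
# Aurora Serverless v2 is created as a provisioned-mode cluster carrying a
# serverless scaling configuration, plus a db.serverless instance.
# Identifiers and engine below are illustrative assumptions.
cluster_params = {
    "DBClusterIdentifier": "catalogue-cluster",
    "Engine": "aurora-postgresql",
    "ServerlessV2ScalingConfiguration": {
        "MinCapacity": 0.5,   # the initial canvas setting
        "MaxCapacity": 64.0,
    },
}  # -> boto3.client("rds").create_db_cluster(**cluster_params)

instance_params = {
    "DBInstanceIdentifier": "catalogue-instance-1",
    "DBClusterIdentifier": "catalogue-cluster",
    "DBInstanceClass": "db.serverless",  # scales within the cluster's ACU range
    "Engine": "aurora-postgresql",
}  # -> boto3.client("rds").create_db_instance(**instance_params)
```

The RDS Proxy sits in front of this cluster; it is configured separately (via `create_db_proxy`) and is omitted here for brevity.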
Figure 1 — Canvas A: DynamoDB architecture. All service connections validated. DynamoDB configured in on-demand capacity mode for initial simulation run.
Figure 2 — Canvas B: Aurora Serverless v2 architecture. RDS Proxy node present to manage Lambda → Aurora connection pooling. All connections validated.
Before running any simulation, pinpole's connection validation surfaced one immediate finding on Canvas B: the initial RDS Proxy configuration had only a single availability zone selected. The canvas flagged this as a WARNING — a single-AZ proxy creates a single point of failure for the connection pool under Lambda burst conditions. The team added a second proxy endpoint in a second AZ before the first simulation ran. This was a five-minute canvas change that would have been an undiscovered production gap under the old workflow.
Canvas validation flagged single-AZ RDS Proxy before simulation ran. Under Lambda burst at 120K RPS, a single-AZ proxy creates connection routing contention that manifests as intermittent p99 latency spikes. A second proxy endpoint was added to a second AZ at canvas design time — not discovered in production.
Simulation methodology
Three traffic patterns were run against both canvases to reflect the real operational profile of the catalogue service. All simulations used the same RPS parameters to ensure the comparison was direct.
Constant — 8,000 RPS (baseline)
Simulates the steady-state weekday traffic profile. Used to establish baseline cost and validate all nodes are healthy at operating load before wave and spike patterns are applied.
Wave — 8,000 → 40,000 RPS (weekly traffic cycle)
Simulates the weekly traffic pattern: trough at 8K RPS Sunday night, peak at 40K RPS Friday afternoon. Exposes ACU scale-up latency in Aurora Serverless v2 and DynamoDB auto-scaling behaviour at sustained peak.
Spike — 8,000 → 120,000 RPS (campaign day burst)
Simulates a near-instantaneous 15× traffic spike during a promotional event. The most demanding pattern for both architectures. Surfaces throttling, connection pool exhaustion, and burst capacity limits.
Eight simulation runs were executed in total: Constant × 3 (DynamoDB on-demand, DynamoDB provisioned, Aurora Serverless v2), Wave × 2 (DynamoDB provisioned, Aurora Serverless v2), and Spike × 3 (DynamoDB on-demand, DynamoDB provisioned, Aurora Serverless v2). DynamoDB on-demand was excluded from Wave and Spike after the Constant run surfaced a cost finding that made further comparisons academic.
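The three patterns reduce to simple RPS-over-time profiles. A minimal Python sketch of those profiles follows; the ramp shapes, durations, and function names are assumptions for illustration, not pinpole's internal pattern definitions:

```python
import math

def constant(t_min: float, base: float = 8_000) -> float:
    """Steady-state baseline: flat RPS regardless of time."""
    return base

def wave(t_min: float, trough: float = 8_000, peak: float = 40_000,
         period_min: float = 7 * 24 * 60) -> float:
    """Weekly cycle: sinusoid from Sunday-night trough to Friday peak."""
    phase = 2 * math.pi * (t_min % period_min) / period_min
    return trough + (peak - trough) * (1 - math.cos(phase)) / 2

def spike(t_min: float, base: float = 8_000, peak: float = 120_000,
          spike_start: float = 60, spike_len: float = 30) -> float:
    """Campaign burst: a near-instantaneous 15x step, then back to base."""
    return peak if spike_start <= t_min < spike_start + spike_len else base
```

The near-vertical step in `spike` (rather than a ramp) is the property that matters: it is what exposes auto-scaler reaction time in the findings below.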
Finding 1 — DynamoDB on-demand is prohibitively expensive at sustained peak
The first simulation surprise came within minutes of running the Constant pattern at 8,000 RPS. DynamoDB on-demand showed healthy latency metrics — 2ms p50, 7ms p99 — but the live cost estimate in the pinpole panel was immediately alarming.
Figure 3 — Simulation output: DynamoDB on-demand at constant 8,000 RPS. Latency is excellent. Monthly cost estimate is not.
The $76,400/month estimate for DynamoDB on-demand at 8,000 RPS baseline was the first assumption-breaker. The team had budgeted approximately $12,000/month based on rough AWS Pricing Calculator estimates using static read/write unit counts — not simulated traffic load. The gap was a function of actual read and write request unit consumption under real traffic patterns, including the write path through SQS, which on-demand pricing captures in full.
Simulated estimate: $76,400/month. Team's Pricing Calculator estimate: $12,000/month. The delta is attributable to on-demand pricing capturing full write request consumption on the SQS-driven write path, which the static estimate had modelled at 10% of actual write volume. DynamoDB on-demand was removed from further consideration before the Wave or Spike simulations ran.
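The shape of that gap can be seen with a back-of-envelope request-unit model. The unit prices and per-request factors below are illustrative assumptions (circa us-east-1 list pricing), and the function does not attempt to reproduce pinpole's internal estimate:

```python
SECONDS_PER_MONTH = 730 * 3600  # ~2.63M seconds in an average month

def on_demand_monthly_cost(read_rps: float, write_rps: float,
                           rru_per_read: float = 1.0,   # items <= 4 KB, strongly consistent
                           wru_per_write: float = 1.0,  # items <= 1 KB
                           price_per_m_rru: float = 0.25,
                           price_per_m_wru: float = 1.25) -> float:
    """Monthly on-demand cost from sustained read/write request rates."""
    rrus = read_rps * rru_per_read * SECONDS_PER_MONTH
    wrus = write_rps * wru_per_write * SECONDS_PER_MONTH
    return (rrus * price_per_m_rru + wrus * price_per_m_wru) / 1e6

# Reads alone at the 8K RPS baseline are comparatively cheap:
print(round(on_demand_monthly_cost(8_000, 0)))      # 5256
# Writes cost 5x per request; sustained write volume dominates quickly,
# which is why under-modelling the SQS write path by 10x is ruinous:
print(round(on_demand_monthly_cost(8_000, 5_000)))  # 21681
```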
The recommendation surfaced at this point was direct: switch to DynamoDB provisioned capacity with auto-scaling. The recommendation included suggested starting RCU/WCU values based on the 8K RPS simulation profile and noted the hot partition key risk given the product ID access pattern.
Figure 4 — Recommendations panel: pinpole recommends switching to DynamoDB provisioned capacity with suggested RCU/WCU values and auto-scaling policy. Hot partition key warning included.
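Provisioned capacity with target-tracking auto-scaling is configured through Application Auto Scaling rather than the DynamoDB API itself. A sketch of the equivalent boto3 parameters, with the table name and capacity bounds as illustrative placeholders rather than pinpole's exact recommended values:

```python
# Parameters for DynamoDB provisioned-mode auto-scaling, to be passed to
# boto3's "application-autoscaling" client. Capacity bounds are illustrative.
TABLE = "product-catalogue-table"

scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": f"table/{TABLE}",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "MinCapacity": 4_000,   # floor sized to the 8K RPS baseline
    "MaxCapacity": 60_000,  # ceiling for wave peaks
}  # -> client.register_scalable_target(**scalable_target)

scaling_policy = {
    "PolicyName": f"{TABLE}-read-scaling",
    "ServiceNamespace": "dynamodb",
    "ResourceId": f"table/{TABLE}",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # scale out above 70% consumed/provisioned
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
}  # -> client.put_scaling_policy(**scaling_policy)
```

A matching pair of calls with `ScalableDimension="dynamodb:table:WriteCapacityUnits"` covers the write path.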
Finding 2 — Aurora Serverless v2 ACU cold-start latency under spike load
With DynamoDB on-demand eliminated, the comparison narrowed to DynamoDB provisioned vs Aurora Serverless v2. Both performed well under the Constant pattern. The Wave pattern at 8K → 40K RPS is where Aurora Serverless v2 revealed a behaviour the team had not accounted for.
Aurora Serverless v2's ACU scale-up is not instantaneous. When the Wave simulation ramped from 8K to 40K RPS over a ten-minute period, the p99 latency for Aurora rose from a baseline of 12ms to 47ms during the ACU scale-out window — a 3–5 minute period during which the cluster was operating beyond its provisioned capacity while new ACUs came online.
Figure 5 — Aurora Serverless v2 under Wave pattern: ACU scale-out latency spike to 47ms p99 visible during the ramp phase. Two warnings flagged by simulation.
The recommendation for this finding was to raise the minimum ACU to 4 (from 0.5) to ensure the cluster is always warm enough to absorb wave ramp load without a latency spike. This change increases the cost floor by approximately $280/month but eliminates the ACU cold-start window. A second recommendation advised enabling cluster read replicas to distribute read load during peak, reducing per-primary-instance load pressure during scale-out.
With minimum ACU set to 0.5, Aurora Serverless v2 exhibits a 3–5 minute latency degradation window during rapid traffic ramps. Setting minimum ACU to 4 eliminates this at a cost of approximately $280/month. For any traffic profile that includes wave or ramp patterns, minimum ACU should be sized to the trough-to-peak ramp rate, not the trough volume.
The same ACU minimum fix was applied to Canvas B, and the Wave simulation was re-run. With minimum ACU set to 4, the p99 latency during the ramp window dropped from 47ms to 14ms — a canvas change that took under two minutes and was validated by re-simulation before any infrastructure was provisioned.
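Outside the canvas, the same fix is a single cluster modification. Expressed as boto3 parameters for the RDS `modify_db_cluster` call (the cluster identifier is illustrative):

```python
# Raise the Aurora Serverless v2 capacity floor so the cluster stays warm
# through the wave ramp. Cluster identifier is an illustrative placeholder.
acu_fix = {
    "DBClusterIdentifier": "catalogue-cluster",
    "ServerlessV2ScalingConfiguration": {
        "MinCapacity": 4.0,   # was 0.5; eliminates the cold scale-out window
        "MaxCapacity": 64.0,
    },
    "ApplyImmediately": True,
}  # -> boto3.client("rds").modify_db_cluster(**acu_fix)
```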
Finding 3 — The campaign spike reveals the economic crossover
The Spike simulation at 120,000 RPS was where the decision became clear. Both architectures were now properly configured based on the Constant and Wave learnings. The Spike pattern revealed the economic crossover that the team had expected to find — but at a different threshold than anticipated.
| Configuration | p50 Latency | p99 (Spike Peak) | Throttle Events | Est. Monthly Cost | Simulation Result |
|---|---|---|---|---|---|
| DynamoDB on-demand | 2ms | 8ms | 0 | $136,000+ | ELIMINATED |
| DynamoDB provisioned (auto-scale) | 2ms | 31ms | ~1,200 | $9,800 | WARNING |
| DynamoDB provisioned (headroom config) | 2ms | 9ms | 0 | $12,400 | HEALTHY |
| Aurora Serverless v2 (min 4 ACU + replica) | 11ms | 28ms | 0 | $8,200 | HEALTHY |
DynamoDB provisioned with auto-scaling configured at 70% utilisation showed throttle events under the instantaneous 15× spike — the auto-scaler cannot respond fast enough to a near-vertical traffic ramp. The fix was to provision with explicit headroom: capacity configured at 150% of expected peak rather than relying on auto-scale to keep up with burst. This is a well-known DynamoDB operational pattern, but it is not discoverable without a spike simulation. The cost delta between the auto-scale config and the headroom config was $2,600/month — an acceptable trade for zero campaign-day throttling.
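The headroom pattern is simple arithmetic once the expected peak is known. The RCU-per-read factor below assumes eventually consistent reads of items up to 4 KB (0.5 RCU per read); adjust it for item size and consistency model:

```python
# Headroom provisioning for the campaign spike: capacity is held at 150% of
# expected peak so no auto-scaler reaction time is in the critical path.
EXPECTED_PEAK_RPS = 120_000
HEADROOM = 1.5             # provision at 150% of expected peak
RCU_PER_READ = 0.5         # eventually consistent, items <= 4 KB

provisioned_rcu = EXPECTED_PEAK_RPS * HEADROOM * RCU_PER_READ
print(provisioned_rcu)  # 90000.0 RCUs, billed whether consumed or not
```

Billing for that unused headroom at trough is exactly the cost the $2,600/month delta captures.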
Figure 6 — pinpole Execution History: version comparison between DynamoDB auto-scale and headroom configurations. Cost delta of $2,600/month visible alongside the throttle event and latency improvements.
Aurora Serverless v2 — with the minimum ACU correction and a read replica added — handled the 120K RPS spike cleanly. p99 under spike was 28ms, no throttle events, and the monthly cost estimate of $8,200 reflects ACU-based pricing that automatically scales down during trough periods. The provisioned DynamoDB headroom configuration costs $12,400/month because over-provisioned capacity units are always billed regardless of actual consumption.
For this traffic profile — 8K baseline, 40K weekly peak, 120K spike — Aurora Serverless v2 is $4,200/month cheaper than DynamoDB provisioned with spike headroom. The crossover is driven by Aurora's ACU-based pricing model, which scales to actual demand during trough periods rather than billing for provisioned capacity that isn't being used. Over 12 months, the Aurora Serverless v2 option saves approximately $50,400 at the traffic projections modelled.
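As a worked check of the crossover figures:

```python
aurora_monthly = 8_200    # Aurora Serverless v2, min 4 ACU + replica
dynamo_monthly = 12_400   # DynamoDB provisioned with 150% spike headroom

monthly_delta = dynamo_monthly - aurora_monthly
annual_delta = 12 * monthly_delta
print(monthly_delta, annual_delta)  # 4200 50400
```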
Cost comparison
| Architecture | Est. Monthly Cost |
|---|---|
| DynamoDB on-demand (eliminated) | $76,400 |
| Aurora Serverless v2 — wave/spike optimised | $8,200 |
The comparison above reflects the team's initial working assumption (DynamoDB on-demand) versus the selected architecture (Aurora Serverless v2, wave/spike optimised). The monthly cost difference of $68,200/month was identified entirely through pre-deployment simulation — no infrastructure provisioned, no AWS bill incurred during the evaluation.
The decision and how it was made
The final architecture selection was Aurora Serverless v2 — but not primarily on cost. The cost finding was significant, but the engineering lead's reasoning was more nuanced:
Why Aurora Serverless v2 was selected
- ACU-based pricing matches the traffic profile — trough periods are genuinely cheap; on-demand DynamoDB is not
- SQL compatibility preserves the downstream analytics query interface that the data team already uses
- Simulation validated that the ACU cold-start issue is solvable with minimum ACU configuration — not an architectural ceiling
- p99 of 28ms under 120K RPS spike is acceptable for a catalogue API; sub-10ms is not a hard requirement
- RDS Proxy provides the Lambda connection pooling solution; no additional operational complexity vs DynamoDB at this scale
When DynamoDB provisioned would have been selected
- If p99 latency under 10ms at all load tiers were a hard requirement
- If the traffic profile were constant rather than wave/spike — provisioned capacity pricing is more competitive at steady, sustained load
- If the data model had no relational query requirements and the analytics team had no existing SQL dependency
- If the team had higher confidence in traffic forecasting, making over-provisioning headroom a known and bounded cost
The decision was documented as a pinpole canvas version and presented to the CTO with the simulation history as the evidence base: two canvases, eight simulation runs, and an execution history showing every configuration iteration from the initial assumption to the final selected architecture.
Figure 7 — Final selected canvas: Aurora Serverless v2 with minimum 4 ACU, dual-AZ RDS Proxy, and read replica. All nodes healthy. Ready to deploy.
Outcome and post-deployment validation
The Aurora Serverless v2 architecture was deployed to the ST and UAT environments using pinpole's deployment workflow and promoted to production two weeks later. The first real AWS bill came in at $8,640/month — within 5.4% of the pinpole simulation estimate of $8,200.
On the first promotional campaign following go-live, the catalogue API hit 92,000 RPS — below the 120K spike ceiling modelled in simulation, but the first real data point validating that the architecture held under load. Aurora Serverless v2 scaled to 38 ACUs at peak. p99 latency was 24ms — slightly better than the 28ms simulation estimate, attributable to the cache-hit rate being higher in production than modelled. Zero throttle events. Zero incidents.
The simulation didn't just find a cost saving. It changed the conversation. Instead of debating architecture options in a design review, we were reviewing simulation evidence. That's a different kind of meeting.
A note on simulation methodology
Simulation results are design-time projections, not production measurements. The 5.4% cost variance in this case study is favourable, but teams should treat simulation estimates as directionally reliable guidance rather than operationally authoritative figures. The value of pre-deployment simulation is not precision — it is the ability to surface order-of-magnitude differences, catch architectural gaps like the single-AZ RDS Proxy issue, and eliminate options that are clearly unviable (DynamoDB on-demand at 8K+ RPS sustained) before a dollar is spent on infrastructure.
All architectures selected in simulation should still be promoted through ST and UAT environments and validated with post-deployment load testing before production. Pinpole narrows the decision space and eliminates the surprises. It does not replace the deployment pipeline.
Make your next database selection with simulation data, not assumptions.
Two canvases, three traffic patterns, eight simulation runs. One afternoon. The free tier includes 5 simulations per month — no credit card required. The DynamoDB on-demand cost finding alone is worth the session.
Start your free trial →

All cost estimates are simulation projections generated by pinpole pre-deployment traffic simulation. Actual AWS costs depend on region, reserved instance coverage, support plans, and other factors. Post-deployment variance of 5.4% is reported for this specific case study and is not a guaranteed accuracy level.
Tags: AWS · DynamoDB · Aurora · Aurora Serverless v2 · Database Selection · Pre-Deploy Simulation · Cost Optimisation · Lambda · RDS Proxy · pinpole