Cost Estimation · Serverless · Lambda · Pre-Deploy Simulation · Fintech · Growth Stage — Case Study · March 2026 · 12 min read

Pre-Deployment Serverless Cost Estimation: What We Predicted vs What We Paid

By a Senior AWS Solutions Architect at a growth-stage (Series B) fintech — Engineering Blog

At some point in your AWS career, you have sat in front of a real AWS bill and experienced a specific kind of dread — not because the number was catastrophic, but because it was entirely different from the number you had told your CTO it would be. The architecture looked right. The Pricing Calculator said one thing. The actual bill said another.

This is that story, and what we did to stop it from happening again.

Our team was preparing to launch a new payment processing API — a serverless stack running Lambda behind API Gateway, writing to DynamoDB, with SQS decoupling the write path and CloudFront absorbing the read load. Projected transaction volume at launch was 1,200 requests per second steady state, with a foreseeable spike to 8,000 RPS on payroll days.

Our previous cost estimation method was: open the AWS Pricing Calculator, enter the expected Lambda invocations per month, pick an average duration, add up the DynamoDB reads and writes, and call it done. The problem is that this method tells you nothing about what your architecture costs under spike conditions — and spike conditions are when serverless architectures behave most unlike the static model.

$41,200 — AWS Pricing Calculator estimate
$38,840 — pinpole simulation prediction
$39,510 — actual first-month AWS bill

The pinpole simulation came within 1.7% of actual. The AWS Pricing Calculator was off by 4.3%. More importantly: the pinpole simulation identified two architectural issues that would have cost an additional $8,400 per year — issues that the Pricing Calculator could not have found, because it does not simulate traffic, interaction effects, or per-service bottleneck behaviour.

The Architecture Under Test

The payment processing API is a read-heavy workload with a write-path that must never drop a transaction. The critical path — the path where every failure has financial consequence — is:

Read path: Route 53 → CloudFront → API Gateway → Lambda (read path) → ElastiCache → DynamoDB
Write path: API Gateway → Lambda (write path) → SQS → Lambda (processor) → DynamoDB

Supporting services include WAF in front of CloudFront and SNS fanning out transaction confirmation events to downstream notification services. The design had grown organically over four months of development and had never had its cost profile validated end-to-end under load.

🗂
PinPole App Screenshot
Canvas — Full Architecture (Payment Processing API)
Screenshot of the pinpole canvas with all services placed and connected: Route 53, CloudFront, WAF, API Gateway, Lambda (read + write), ElastiCache, SQS, Lambda processor, DynamoDB, SNS. Service nodes labelled, connections validated.

Fig. 1 — The complete payment processing architecture on the pinpole canvas, prior to first simulation run.

Why the AWS Pricing Calculator Under-Served Us

The Pricing Calculator is not a bad tool. It is a good tool being used to answer the wrong question. When an engineer opens it to estimate monthly spend, they are implicitly answering the question: what does this architecture cost at a static, average request rate? That is not the question we needed answered.

The questions we needed answered were:

- What does this architecture cost on a payroll day, when traffic spikes to 8,000 RPS?
- Which service saturates first under that spike, and what does the overflow cost?
- Which configuration choices — provisioned concurrency, cache sizing, DynamoDB capacity mode — only become cost-relevant under load?

None of these questions are answerable from a spreadsheet. They require a simulation that propagates load through the architecture and reports per-service metrics — which is exactly what pinpole provides.

⚠ The static model failure mode

AWS Pricing Calculator estimated our Lambda cost at approximately $5,400/month based on 1,200 RPS × 86,400 seconds × 30 days × average 200ms duration. What it did not model: provisioned concurrency cost ($2,100/month), the 47% spike overhead on payroll days, or the 310ms average duration increase when ElastiCache connections saturate and Lambda falls through to DynamoDB. These three effects account for the $2,360 monthly delta between the calculator estimate and the simulation-validated figure.
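The static model's arithmetic can be sketched in a few lines. This is an illustrative sketch only: the unit prices are assumed placeholders rather than current AWS list prices, and it deliberately omits the free tier, tiered pricing, provisioned concurrency, and spike behaviour — which is exactly the point.

```python
# Illustrative sketch of the static cost model a calculator applies to
# Lambda. Unit prices below are assumptions for illustration, not
# current AWS list prices; free tier and tiered pricing are ignored.
PRICE_PER_GB_SECOND = 0.0000166667    # assumed duration price (x86)
PRICE_PER_MILLION_REQUESTS = 0.20     # assumed request price

def static_lambda_monthly_cost(rps: float, avg_duration_s: float,
                               memory_mb: int, days: int = 30) -> float:
    """Monthly Lambda cost at a constant average request rate.

    This is the whole static model: it cannot see spikes, provisioned
    concurrency, or the duration drift caused by a saturated cache.
    """
    invocations = rps * 86_400 * days
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    duration_cost = gb_seconds * PRICE_PER_GB_SECOND
    request_cost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return duration_cost + request_cost
```

Everything the rest of this post is about lives outside this function's inputs.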

Setting Up the Simulation in pinpole

Before running any simulations, I spent about an hour on the canvas getting the architecture to match production intent precisely. This is not bookkeeping — the accuracy of cost estimation depends directly on the accuracy of the configuration inputs.

1
Canvas build — place and connect all services

Drag all services onto the canvas, wire the connections. pinpole validates compatibility and directionality in real time — invalid connections are blocked before they are created. This alone caught a misconfigured WAF placement that had been in our draw.io diagram for two months.

2
Configure each node to production-realistic values

Open the configuration panel for each service. Lambda: runtime Node.js 20, memory 1769MB, timeout 30s, reserved concurrency 500, provisioned concurrency 80. DynamoDB: on-demand mode (later tested provisioned). ElastiCache: cache.r7g.xlarge, 2 shards. API Gateway: Regional, 10K RPS burst limit. Each configuration value affects the cost model.

3
Run a Constant traffic baseline

First simulation: Constant pattern at 1,200 RPS — our expected steady-state load. This establishes the baseline cost and confirms node health at daily average load. This is the scenario most closely approximated by the Pricing Calculator, so it is also a calibration check against our static estimate.

4
Run a Spike simulation for payroll-day conditions

Second simulation: Spike pattern at 8,000 RPS peak. This is the scenario that matters most for cost and the scenario the Pricing Calculator cannot model. Lambda concurrency surges, DynamoDB RCUs and WCUs spike, SQS message throughput climbs. The cost estimate updates in real time as the spike progresses.

5
Request recommendations and iterate

After each simulation run, request recommendations. For a cost estimation session, the recommendations surfaced two findings that materially affected the cost model — findings we address in the next section. Each applied recommendation updates the canvas, and the cost estimate re-runs against the new configuration.

⚙️
PinPole App Screenshot
Lambda Service Configuration Panel
Screenshot of the Lambda node configuration drawer open in the pinpole canvas. Shows runtime selector (Node.js 20 selected), memory slider (1769 MB), timeout field (30s), reserved concurrency (500), provisioned concurrency (80), and the live monthly cost estimate for this node updating at the bottom of the panel.

Fig. 2 — Lambda read-path function configuration. Note provisioned concurrency set to 80 — this is one of the cost line items absent from our initial Pricing Calculator estimate.

Baseline Simulation: 1,200 RPS Constant

The baseline simulation ran at 1,200 RPS for a simulated 24-hour period using the Constant traffic pattern. All nodes reported healthy. The per-service cost breakdown from the simulation panel closely tracked the Pricing Calculator estimate for the same load — a useful sanity check that the canvas was modelling our architecture faithfully.

| Service | Simulated Monthly Cost | Pricing Calc Estimate | Delta | Notes |
|---|---|---|---|---|
| Lambda (read path) | $4,180 | $4,050 | +$130 | Provisioned concurrency accounted for in simulation, not in calc |
| Lambda (write path) | $1,210 | $1,190 | +$20 | Consistent; write path has minimal concurrency overhead |
| Lambda (SQS processor) | $880 | $860 | +$20 | Consistent |
| API Gateway | $3,110 | $3,110 | $0 | Per-request pricing; static model is accurate here |
| DynamoDB | $9,440 | $9,600 | −$160 | Simulation captured ElastiCache read-through effect |
| ElastiCache | $2,890 | $2,890 | $0 | Fixed instance cost; consistent |
| SQS | $640 | $620 | +$20 | Minor difference in message count modelling |
| CloudFront + WAF | $4,200 | $4,200 | $0 | Consistent |
| Total (baseline) | $26,550 | $26,520 | +$30 | Effectively identical at steady state — as expected |

The baseline result confirmed what we expected: at constant 1,200 RPS, the simulation and the static calculator converge. The divergence — and the value — becomes visible the moment you introduce spike traffic.

PinPole App Screenshot
Simulation Running — Constant 1,200 RPS with Live Cost Panel
Screenshot of the pinpole canvas mid-simulation with Constant traffic pattern active at 1,200 RPS. Shows the simulation controls panel (traffic pattern: Constant, current RPS: 1,200, elapsed time), per-node health indicators (all green), and the live cost estimate panel on the right showing the per-service cost breakdown updating in real time. Estimated monthly cost: $26,550 visible in the panel.

Fig. 3 — Baseline constant-load simulation at 1,200 RPS. All nodes healthy; monthly cost estimate tracking to $26,550. This result validates the canvas model before the spike test.

Spike Simulation: 8,000 RPS Payroll-Day Scenario

The spike simulation is where the cost model diverges from any static estimate — and where the architectural issues we had not anticipated became visible.

I configured the Spike pattern to: ramp from 1,200 RPS baseline to 8,000 RPS peak over 90 seconds, hold peak for 120 minutes (representing a payroll processing window), then return to baseline. This is a conservative model of our actual payroll-day traffic profile. The simulation ran for a full simulated 24-hour period with four spike events, approximating real payroll day density.
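The spike profile described above can be expressed as a simple piecewise function of time. This is a minimal sketch; it models the return to baseline as instantaneous, which the real traffic pattern need not do.

```python
# Minimal sketch of the payroll-day spike profile: ramp from 1,200 to
# 8,000 RPS over 90 s, hold the peak for 120 minutes, return to
# baseline. The instantaneous return is a simplification.
BASELINE_RPS = 1_200
PEAK_RPS = 8_000
RAMP_S = 90
HOLD_S = 120 * 60

def spike_rps(t: float) -> float:
    """Target request rate t seconds after the spike begins."""
    if t < 0:
        return BASELINE_RPS
    if t < RAMP_S:
        # linear ramp from baseline to peak
        return BASELINE_RPS + (PEAK_RPS - BASELINE_RPS) * (t / RAMP_S)
    if t < RAMP_S + HOLD_S:
        # hold at peak for the payroll processing window
        return PEAK_RPS
    return BASELINE_RPS
```

Repeating this profile four times across a simulated 24-hour day gives the payroll-day density used in the run.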

Two things happened that the Pricing Calculator had no mechanism to predict:

Issue 1 — ElastiCache connection saturation under spike

At 8,000 RPS, the Lambda read path was generating more concurrent connections to ElastiCache than the cache.r7g.xlarge cluster could service. The simulation reported ElastiCache health degrading to WARNING at approximately 6,200 RPS — well below our peak target. Requests that missed the cache were falling through to DynamoDB, which was now absorbing both the normal DynamoDB read load and the overflow from a saturated cache tier.

The consequence: DynamoDB read-unit consumption spiked from the baseline model's 9,440 RCU/month equivalent to an estimated 14,800 RCU/month equivalent on payroll days. The DynamoDB cost delta alone was $2,170 per month — invisible to any static estimator.
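The fall-through effect can be modelled as a drop in effective cache hit ratio, with the overflow reads landing on DynamoDB. The hit ratios and the per-read price below are illustrative assumptions, not values measured in the simulation; the sketch only shows the shape of the interaction.

```python
# Hedged model of the cache fall-through interaction: when the cache
# tier saturates, its effective hit ratio drops and the overflow reads
# land on DynamoDB. Hit ratios and per-read price are illustrative
# assumptions, not measured values.
ONDEMAND_READ_PRICE = 0.25 / 1_000_000  # assumed $ per on-demand read

def dynamodb_read_cost(read_rps: float, cache_hit_ratio: float,
                       seconds: float) -> float:
    """Cost of the reads the cache does NOT absorb over a time window."""
    reads_reaching_dynamodb = read_rps * (1 - cache_hit_ratio) * seconds
    return reads_reaching_dynamodb * ONDEMAND_READ_PRICE

window = 120 * 60                                    # one payroll window
healthy = dynamodb_read_cost(8_000, 0.90, window)    # cache keeping up
saturated = dynamodb_read_cost(8_000, 0.60, window)  # saturated cache
overflow_cost = saturated - healthy
```

Multiplied across every payroll window in a month, this is how a saturated cache tier quietly becomes a DynamoDB line item — the effect no static estimator can represent.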

Issue 2 — Lambda provisioned concurrency undersizing

With provisioned concurrency set to 80, the spike to 8,000 RPS created burst demand for far more concurrent Lambda environments than we had pre-warmed. The simulation surfaced Lambda latency spiking to 1,840ms p99 during the ramp-up phase as cold starts ran in parallel across the pool. The recommendation engine flagged this immediately and suggested raising provisioned concurrency to 200 — a monthly provisioned-concurrency cost of $1,650, but a latency improvement to under 200ms p99 during spike.

In the original Pricing Calculator model, we had entered zero for provisioned concurrency because we had not yet decided whether to use it. The $1,650 cost was entirely absent from our estimate.

⚡ Two hidden cost items — total $3,820/month

The spike simulation identified two architectural issues with concrete cost consequences that were invisible to the static Pricing Calculator model: ElastiCache connection saturation driving DynamoDB overflow ($2,170/month) and insufficient Lambda provisioned concurrency ($1,650/month at corrected sizing). Neither issue appeared at steady-state load. Both were identified, quantified, and resolved before a single AWS resource was provisioned.

PinPole App Screenshot
Spike Simulation Running — 8,000 RPS Peak with ElastiCache Warning
Screenshot of the pinpole canvas during the Spike simulation at peak 8,000 RPS. Shows the ElastiCache node highlighted with an amber WARNING status indicator. The DynamoDB node shows elevated RCU utilisation. Lambda latency panel shows p99 spike to 1,840ms. The live cost estimate panel shows the monthly cost climbing above the baseline as DynamoDB absorbs the cache overflow. Traffic pattern panel shows the spike ramp in progress.

Fig. 4 — Spike simulation at 8,000 RPS peak. ElastiCache connection saturation is visible as a WARNING on the node. DynamoDB is absorbing the overflow, driving the cost estimate above the static model. Lambda p99 latency has spiked to 1,840ms from cold starts on provisioned concurrency exhaustion.

Recommendations and Cost Recalculation

After the spike simulation, I requested recommendations. The pinpole recommendation engine returned four findings for this architecture, prioritised by severity. Two were cost-consequential:

1
Scale ElastiCache to cache.r7g.2xlarge + add read replica  WARNING

Upsizing from r7g.xlarge to r7g.2xlarge and adding one read replica provides enough connection headroom to serve the 8,000 RPS spike without saturation. Additional monthly cost: $1,480. Monthly saving from eliminating DynamoDB overflow: $2,170. Net monthly saving: $690, plus the latency improvement on the read path during spikes.

2
Increase Lambda provisioned concurrency to 200  WARNING

At 80 provisioned concurrent environments, the spike to 8,000 RPS requires Lambda to cold-start hundreds of environments in parallel. Raising provisioned concurrency to 200 eliminates the p99 latency spike during the initial burst. Monthly provisioned concurrency cost at 200: $1,650, up from $660 at 80. p99 latency under spike: reduced from 1,840ms to under 200ms. This cost must be in the model.

3
Enable DynamoDB auto-scaling  INFO

Running DynamoDB in on-demand mode during launch is the correct call — on-demand absorbs burst without capacity planning. As traffic patterns stabilise, switching to provisioned with auto-scaling will reduce costs by approximately $800–$1,200/month at our RCU/WCU profile. Flagged for a 60-day review post-launch.

4
Add CloudFront TTL for read-path API responses  INFO

A significant proportion of read-path Lambda invocations at 8,000 RPS are for balance enquiries — highly cacheable responses with a 30-second TTL that CloudFront can absorb. Estimated invocation reduction: 35% at peak. Monthly Lambda saving: $780. This also reduces provisioned concurrency requirement from 200 to approximately 150, saving a further $410/month.
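As a back-of-envelope check on recommendation 4, the 35% cacheable share (the figure from the recommendation; everything else here is plain arithmetic) translates into the request rate actually reaching Lambda at peak:

```python
# Back-of-envelope check on recommendation 4: a 30 s CloudFront TTL on
# cacheable balance-enquiry responses absorbs an estimated 35% of peak
# read traffic, so the rate reaching Lambda at peak drops accordingly.
PEAK_RPS = 8_000
CACHEABLE_SHARE = 0.35   # estimated cacheable fraction at peak

lambda_rps_at_peak = PEAK_RPS * (1 - CACHEABLE_SHARE)
print(f"Lambda sees ~{lambda_rps_at_peak:,.0f} RPS at peak")
```

A lower peak rate reaching Lambda is also why the provisioned concurrency requirement could drop from 200 to roughly 150.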

I applied all four recommendations directly from the recommendation panel to the canvas and re-ran the spike simulation with the updated configuration. The revised estimate stabilised at $38,840/month — including the corrected ElastiCache sizing, the increased provisioned concurrency, and the CloudFront TTL cache benefit.

PinPole App Screenshot
Recommendations Panel — 4 Findings Displayed
Screenshot of the pinpole Recommendations panel after the spike simulation. Shows 4 recommendation cards listed by severity (2 WARNING, 2 INFO). Each card shows the recommendation title, a severity badge, a brief description with estimated cost impact, and an "Apply to canvas" button. The ElastiCache and Lambda provisioned concurrency recommendations are expanded with full details visible. Bottom of panel shows "Monthly cost impact if all applied: −$690 net saving + performance improvement."

Fig. 5 — Recommendations panel after spike simulation. The two WARNING-severity findings drive the corrected cost model. Both recommendations include quantified monthly cost impact, not just qualitative guidance.

Before and After: The Architecture That Saved $8,400

The phrase "the architecture that saved $8,400" requires unpacking. The $8,400 is the annual cost of the architectural issues the static Pricing Calculator missed — the combined effect of ElastiCache overflow driving DynamoDB cost, undersized Lambda provisioned concurrency, and the Pricing Calculator's inability to model the payroll-day spike profile at all.

Before — Original Configuration

Architecture as designed (pre-simulation)

Lambda (all functions): $6,270
Provisioned concurrency (80): $660
API Gateway: $3,110
DynamoDB (on-demand + spike overflow): $11,610
ElastiCache r7g.xlarge: $2,890
SQS + SNS: $810
CloudFront + WAF: $4,200
Estimated monthly total: $41,200/mo
After — Simulation-Optimised Configuration

Architecture after recommendations applied

Lambda (all functions): $5,490
Provisioned concurrency (150 post-CF TTL): $1,240
API Gateway: $3,110
DynamoDB (on-demand, no overflow): $9,440
ElastiCache r7g.2xlarge + replica: $4,370
SQS + SNS: $810
CloudFront + WAF: $4,200
Simulated monthly total: $38,840/mo

The $2,360 monthly difference adds up to $28,320 over twelve months. But the more material number is this: the original configuration would have launched with a DynamoDB cost line that was 23% above forecast due to ElastiCache overflow — and the first month that the CTO asked why the AWS bill was higher than the model, there would have been no good answer. Because the model had not modelled the part that mattered.

Execution History: The Audit Trail for Architecture Decisions

Every simulation run is saved automatically to pinpole's Execution History. For this architecture, by the time we were ready to deploy, we had twelve simulation runs on record, including the baseline, four spike iterations with different ElastiCache configurations, three Lambda concurrency variations, two DynamoDB mode comparisons (on-demand vs. provisioned), and the final validated run.

The execution history serves a function that extends beyond cost estimation. When we presented the architecture to our CTO and CFO for sign-off, we did not present a spreadsheet. We shared the execution history view, which shows every simulation run, its configuration, its traffic pattern, its health status, and its estimated monthly cost. The optimisation journey is visible — the CFO could see precisely why we had sized ElastiCache the way we had, and what the cost consequence of the alternative had been.

For the first time, we had a cost model that was a direct output of the architecture, not a separate artefact that someone had to maintain in parallel. The architecture was the model.

🕐
PinPole App Screenshot
Execution History — 12 Simulation Runs
Screenshot of the pinpole Execution History panel showing 12 simulation runs logged chronologically. Each row shows: run number, timestamp, traffic pattern (Constant / Spike / Ramp), peak RPS, overall health status (green tick or amber warning), and estimated monthly cost. The cost column shows a visible optimisation journey: starting at $41,200 (Run 1, initial config), progressing through intermediate costs as configuration changes are applied, finishing at $38,840 (Run 12, validated final config). The final run row is highlighted with a green "Validated" badge.

Fig. 6 — Execution History showing 12 simulation runs. The cost column shows the optimisation journey from $41,200 (initial config) to $38,840 (validated config). Each run is a complete architecture snapshot — selecting any row opens the exact canvas state at that point.

Deploying the Validated Architecture

Once the spike simulation passed — all WARNING-severity recommendations addressed, estimated monthly cost at $38,840, p99 latency under 200ms at 8,000 RPS peak — we pushed through pinpole's deployment pipeline.

The deployment workflow ran: Canvas → ST (System Test) → UAT → PR (Production). The ST and UAT environments confirmed the architecture in live AWS accounts before production traffic was at risk. The real Lambda cold starts, real DynamoDB read/write latency, and real ElastiCache connection behaviour in ST and UAT all matched the simulation closely — a further validation of the simulation model's fidelity.

✓ First real AWS bill: $39,510

Month one production bill: $39,510. pinpole simulation prediction: $38,840. Delta: $670 (1.7%). The delta is attributable to slightly higher than modelled SNS fan-out volume from downstream notification services — a usage pattern we have since added to the canvas model for future simulations. Original Pricing Calculator estimate: $41,200. Difference from actual: $1,690 (4.3%), primarily due to unmodelled spike overhead on Lambda and DynamoDB.
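The headline accuracy figures follow directly from the three numbers above:

```python
# Sanity-checking the headline accuracy figures: error of each estimate
# relative to the actual month-one bill.
actual_bill = 39_510
simulation_estimate = 38_840
calculator_estimate = 41_200

sim_error_pct = abs(actual_bill - simulation_estimate) / actual_bill * 100
calc_error_pct = abs(actual_bill - calculator_estimate) / actual_bill * 100
print(f"simulation: {sim_error_pct:.1f}%, calculator: {calc_error_pct:.1f}%")
# -> simulation: 1.7%, calculator: 4.3%
```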

What Changed in How We Work

The technical outcome — a 1.7% cost prediction accuracy — is the headline. The operational change is more durable. Three things are now different: every new serverless architecture is simulated under spike conditions before any resource is provisioned; cost sign-off happens against the execution history rather than a hand-maintained spreadsheet; and the Pricing Calculator has been relegated to what it does well, rough steady-state ballparks early in design.

ℹ The AWS Pricing Calculator is not wrong

To be clear: the AWS Pricing Calculator is a useful tool for its intended purpose, which is rough monthly cost estimation at a static request rate. Its limitation is not accuracy at steady state — it is the inability to model spike behaviour, service interaction effects, and the architectural choices that only become cost-relevant under load. These are the most expensive architectural decisions you can get wrong. Pre-deployment simulation addresses them directly, before the bill arrives.

Pre-Deployment Cost Estimation Checklist

The workflow we now follow for every new serverless architecture before a single resource is provisioned:

1. Build the full architecture on the canvas and let connection validation catch wiring errors.
2. Configure every node to production-realistic values — the cost model is only as good as its inputs.
3. Run a Constant-pattern baseline and reconcile it against a static estimate as a calibration check.
4. Run a Spike-pattern simulation at the worst-case traffic profile.
5. Request recommendations, apply the cost-consequential ones, and re-run until no WARNING findings remain.
6. Keep the execution history as the audit trail for sign-off.

Know your AWS costs before you provision a single resource.

pinpole runs pre-deployment traffic simulation across your full serverless architecture — generating per-service cost breakdowns, spike-condition cost estimates, and recommendations with quantified monthly impact, all before you write a CloudFormation template or run a Terraform apply.

Start your first simulation free