At some point in your AWS career, you have sat in front of a real AWS bill and experienced a specific kind of dread — not because the number was catastrophic, but because it was entirely different from the number you had told your CTO it would be. The architecture looked right. The Pricing Calculator said one thing. The actual bill said another.
This is that story, and what we did to stop it from happening again.
Our team was preparing to launch a new payment processing API — a serverless stack running Lambda behind API Gateway, writing to DynamoDB, with SQS decoupling the write path and CloudFront absorbing the read load. Projected transaction volume at launch was 1,200 requests per second steady state, with a foreseeable spike to 8,000 RPS on payroll days.
Our previous cost estimation method was: open the AWS Pricing Calculator, enter the expected Lambda invocations per month, pick an average duration, add up the DynamoDB reads and writes, and call it done. The problem is that this method tells you nothing about what your architecture costs under spike conditions — and spike conditions are when serverless architectures behave most unlike the static model.
The pinpole simulation came within 1.7% of actual. The AWS Pricing Calculator was off by 4.3%. More importantly: the pinpole simulation identified two architectural issues that would have cost an additional $8,400 per month — issues that the Pricing Calculator could not have found, because it does not simulate traffic, interaction effects, or per-service bottleneck behaviour.
The Architecture Under Test
The payment processing API is a read-heavy workload with a write path that must never drop a transaction. The critical path — the path where every failure has financial consequence — is the write path:
Write path: API Gateway → Lambda (write path) → SQS → Lambda (processor) → DynamoDB
The read path, which carries the bulk of the traffic, runs CloudFront → API Gateway → Lambda (read path) → ElastiCache, falling through to DynamoDB on a cache miss.
Supporting services include WAF in front of CloudFront and SNS fanning out transaction confirmation events to downstream notification services. The design had grown organically over four months of development and had never had its cost profile validated end-to-end under load.
Fig. 1 — The complete payment processing architecture on the pinpole canvas, prior to first simulation run.
Why the AWS Pricing Calculator Under-Served Us
The Pricing Calculator is not a bad tool. It is a good tool being used to answer the wrong question. When an engineer opens it to estimate monthly spend, the question they are implicitly asking is: what does this architecture cost at a static, average request rate? That is not the question we needed answered.
The questions we needed answered were:
- What does this architecture cost on a payroll day, when DynamoDB receives 8,000 write requests per second for a two-hour window?
- What does our Lambda concurrency cost when the spike hits and provisioned concurrency is — or is not — configured?
- What happens to our DynamoDB cost if the Lambda read path exceeds ElastiCache's connection limit and we fall through to DynamoDB at full RPS?
- What is the actual per-service cost breakdown at real traffic so we can identify the dominant cost drivers before provisioning?
None of these questions are answerable from a spreadsheet. They require a simulation that propagates load through the architecture and reports per-service metrics — which is exactly what pinpole provides.
AWS Pricing Calculator estimated our Lambda cost at approximately $5,400/month based on 1,200 RPS × 86,400 seconds × 30 days × average 200ms duration. What it did not model: provisioned concurrency cost ($2,100/month), the 47% spike overhead on payroll days, or the 310ms average duration increase when ElastiCache connections saturate and Lambda falls through to DynamoDB. These three effects account for the $2,360 monthly delta between the calculator's estimate and the simulation-validated figure.
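The static model's structure is simple enough to sketch. The function below mirrors the shape of that calculation (invocations × duration × memory for compute, plus a per-request charge); the unit prices are left as parameters rather than hard-coded, since real rates vary by region, tier, and discount, and the dollar figures in this article reflect our own rates.

```python
SECONDS_PER_MONTH = 86_400 * 30

def static_lambda_cost(rps, avg_duration_s, memory_gb,
                       price_per_gb_second, price_per_million_requests):
    """Monthly Lambda cost at a constant request rate -- the static
    model, which ignores spikes, provisioned concurrency, and the
    duration drift that appears when downstream caches saturate."""
    invocations = rps * SECONDS_PER_MONTH
    gb_seconds = invocations * avg_duration_s * memory_gb
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests

# The article's baseline inputs: 1,200 RPS at a 200ms average.
invocations_per_month = 1_200 * SECONDS_PER_MONTH  # 3,110,400,000
```

The point is not the arithmetic but what is absent from it: no term for provisioned concurrency, no spike weighting, no duration increase under load.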
Setting Up the Simulation in pinpole
Before running any simulations, I spent about an hour on the canvas getting the architecture to match production intent precisely. This is not bookkeeping — the accuracy of cost estimation depends directly on the accuracy of the configuration inputs.
Canvas build — place and connect all services
Drag all services onto the canvas, wire the connections. pinpole validates compatibility and directionality in real time — invalid connections are blocked before they are created. This alone caught a misconfigured WAF placement that had been in our draw.io diagram for two months.
Configure each node to production-realistic values
Open the configuration panel for each service. Lambda: runtime Node.js 20, memory 1769MB, timeout 30s, reserved concurrency 500, provisioned concurrency 80. DynamoDB: on-demand mode (later tested provisioned). ElastiCache: cache.r7g.xlarge, 2 shards. API Gateway: Regional, 10K RPS burst limit. Each configuration value affects the cost model.
Run a Constant traffic baseline
First simulation: Constant pattern at 1,200 RPS — our expected steady-state load. This establishes the baseline cost and confirms node health at daily average load. This is the scenario most closely approximated by the Pricing Calculator, so it is also a calibration check against our static estimate.
Run a Spike simulation for payroll-day conditions
Second simulation: Spike pattern at 8,000 RPS peak. This is the scenario that matters most for cost and the scenario the Pricing Calculator cannot model. Lambda concurrency surges, DynamoDB RCUs and WCUs spike, SQS message throughput climbs. The cost estimate updates in real time as the spike progresses.
Request recommendations and iterate
After each simulation run, request recommendations. For a cost estimation session, the recommendations surfaced two findings that materially affected the cost model — findings we address in the next section. Each applied recommendation updates the canvas, and the cost estimate re-runs against the new configuration.
Fig. 2 — Lambda read-path function configuration. Note provisioned concurrency set to 80 — this is one of the cost line items absent from our initial Pricing Calculator estimate.
Baseline Simulation: 1,200 RPS Constant
The baseline simulation ran at 1,200 RPS for a simulated 24-hour period using the Constant traffic pattern. All nodes reported healthy. The per-service cost breakdown from the simulation panel closely tracked the Pricing Calculator estimate for the same load — a useful sanity check that the canvas was modelling our architecture faithfully.
| Service | Simulated Monthly Cost | Pricing Calc Estimate | Delta | Notes |
|---|---|---|---|---|
| Lambda (read path) | $4,180 | $4,050 | +$130 | Provisioned concurrency accounted for in simulation, not in calc |
| Lambda (write path) | $1,210 | $1,190 | +$20 | Consistent; write path has minimal concurrency overhead |
| Lambda (SQS processor) | $880 | $860 | +$20 | Consistent |
| API Gateway | $3,110 | $3,110 | $0 | Per-request pricing; static model is accurate here |
| DynamoDB | $9,440 | $9,600 | −$160 | Simulation captured ElastiCache read-through effect |
| ElastiCache | $2,890 | $2,890 | $0 | Fixed instance cost; consistent |
| SQS | $640 | $620 | +$20 | Minor difference in message count modelling |
| CloudFront + WAF | $4,200 | $4,200 | $0 | Consistent |
| Total (baseline) | $26,550 | $26,520 | +$30 | Effectively identical at steady state — as expected |
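As a sanity check, the table's totals can be reconciled mechanically (figures transcribed from the rows above):

```python
# Per-service monthly cost figures from the baseline table.
simulated = {
    "lambda_read": 4_180, "lambda_write": 1_210, "lambda_sqs": 880,
    "api_gateway": 3_110, "dynamodb": 9_440, "elasticache": 2_890,
    "sqs": 640, "cloudfront_waf": 4_200,
}
pricing_calc = {
    "lambda_read": 4_050, "lambda_write": 1_190, "lambda_sqs": 860,
    "api_gateway": 3_110, "dynamodb": 9_600, "elasticache": 2_890,
    "sqs": 620, "cloudfront_waf": 4_200,
}

# Totals and the $30 delta match the table's bottom row.
assert sum(simulated.values()) == 26_550
assert sum(pricing_calc.values()) == 26_520
assert sum(simulated.values()) - sum(pricing_calc.values()) == 30
```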
The baseline result confirmed what we expected: at constant 1,200 RPS, the simulation and the static calculator converge. The divergence — and the value — becomes visible the moment you introduce spike traffic.
Fig. 3 — Baseline constant-load simulation at 1,200 RPS. All nodes healthy; monthly cost estimate tracking to $26,550. This result validates the canvas model before the spike test.
Spike Simulation: 8,000 RPS Payroll-Day Scenario
The spike simulation is where the cost model diverges from any static estimate — and where the architectural issues we had not anticipated became visible.
I configured the Spike pattern to: ramp from 1,200 RPS baseline to 8,000 RPS peak over 90 seconds, hold peak for 120 minutes (representing a payroll processing window), then return to baseline. This is a conservative model of our actual payroll-day traffic profile. The simulation ran for a full simulated 24-hour period with four spike events, approximating real payroll day density.
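The profile can be written down as a simple piecewise function. The parameters below come straight from the simulation configuration; the one-second sampling used to total the extra load is an illustrative simplification:

```python
BASELINE_RPS, PEAK_RPS = 1_200, 8_000
RAMP_S, HOLD_S = 90, 120 * 60  # 90-second ramp, 120-minute hold

def rps_at(t):
    """Request rate t seconds after a spike event begins: linear ramp
    to peak, hold at peak, then a return to baseline."""
    if t < RAMP_S:
        return BASELINE_RPS + (PEAK_RPS - BASELINE_RPS) * t / RAMP_S
    if t < RAMP_S + HOLD_S:
        return PEAK_RPS
    return BASELINE_RPS

# Extra requests a single spike event adds over the constant-rate model:
extra_requests = sum(rps_at(t) - BASELINE_RPS for t in range(RAMP_S + HOLD_S))
# roughly 49.3M additional requests per event, four events per simulated day
```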
Two things happened that the Pricing Calculator had no mechanism to predict:
Issue 1 — ElastiCache connection saturation under spike
At 8,000 RPS, the Lambda read path was generating more concurrent connections to ElastiCache than the cache.r7g.xlarge cluster could service. The simulation reported ElastiCache health degrading to WARNING at approximately 6,200 RPS — well below our peak target. Requests that missed the cache were falling through to DynamoDB, which was now absorbing both the normal DynamoDB read load and the overflow from a saturated cache tier.
The consequence: DynamoDB read capacity consumption spiked from the baseline model's $9,440/month equivalent to an estimated $14,800/month equivalent on payroll days. Weighted across the month, the DynamoDB cost delta alone was $2,170 — invisible to any static estimator.
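The fall-through behaviour is easy to model. A minimal sketch, assuming a hard saturation point at the 6,200 RPS the simulation reported (the cache hit ratio is left as a parameter, since the article does not state it):

```python
CACHE_SATURATION_RPS = 6_200  # where ElastiCache degraded to WARNING

def dynamodb_read_rps(read_rps, cache_miss_rate):
    """Read traffic reaching DynamoDB: ordinary cache misses, plus
    everything above the saturation point, which bypasses the cache
    tier entirely and lands on DynamoDB."""
    served_by_cache = min(read_rps, CACHE_SATURATION_RPS)
    overflow = max(0, read_rps - CACHE_SATURATION_RPS)
    return served_by_cache * cache_miss_rate + overflow

# At the 8,000 RPS peak, 1,800 RPS of overflow skips the cache outright:
assert dynamodb_read_rps(8_000, cache_miss_rate=0.0) == 1_800
```

This is why the DynamoDB cost line moves with traffic even when the steady-state miss rate is stable: past saturation, every additional request is a DynamoDB request.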
Issue 2 — Lambda provisioned concurrency undersizing
With provisioned concurrency set to 80, the spike to 8,000 RPS created burst demand for far more concurrent Lambda environments than we had pre-warmed. The simulation surfaced Lambda latency spiking to 1,840ms p99 during the ramp-up phase as cold starts ran in parallel across the pool. The recommendation engine flagged this immediately and suggested raising provisioned concurrency to 200 — with a corresponding monthly cost increase of $1,650, but a latency improvement to under 200ms p99 during spike.
In the original Pricing Calculator model, we had entered zero for provisioned concurrency because we had not yet decided whether to use it. The $1,650 cost was entirely absent from our estimate.
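The sizing intuition here is Little's law: steady-state concurrency equals arrival rate times average execution duration. A sketch, where the share of spike traffic landing on any one function is an illustrative assumption, not a figure from the article:

```python
def required_concurrency(rps, avg_duration_s):
    """Little's law: concurrent executions = arrival rate x duration.
    A steady-state rule of thumb; burst dynamics demand more."""
    return rps * avg_duration_s

# Even if only a quarter of the 8,000 RPS peak reaches a single function
# (an assumed split), the 200ms average implies 400 warm environments,
# five times the 80 that were provisioned.
assert required_concurrency(8_000 * 0.25, 0.2) == 400.0
```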
The spike simulation identified two architectural issues with concrete cost consequences that were invisible to the static Pricing Calculator model: ElastiCache connection saturation driving DynamoDB overflow ($2,170/month) and insufficient Lambda provisioned concurrency ($1,650/month at corrected sizing). Neither issue appeared at steady-state load. Both were identified, quantified, and resolved before a single AWS resource was provisioned.
Fig. 4 — Spike simulation at 8,000 RPS peak. ElastiCache connection saturation is visible as a WARNING on the node. DynamoDB is absorbing the overflow, driving the cost estimate above the static model. Lambda p99 latency has spiked to 1,840ms from cold starts on provisioned concurrency exhaustion.
Recommendations and Cost Recalculation
After the spike simulation, I requested recommendations. The pinpole recommendation engine returned four findings for this architecture, prioritised by severity. Two carried WARNING severity and materially changed the cost model:
Scale ElastiCache to cache.r7g.2xlarge + add read replica WARNING
Upsizing from r7g.xlarge to r7g.2xlarge and adding one read replica adds enough connection headroom to serve the 8,000 RPS spike without saturation. Additional monthly cost: $1,480. Monthly savings from eliminating DynamoDB overflow: $2,170. Net monthly saving: $690, plus the latency improvement on the read path during spikes.
Increase Lambda provisioned concurrency to 200 WARNING
At 80 provisioned concurrent environments, the spike to 8,000 RPS requires Lambda to cold-start hundreds of environments in parallel. Raising provisioned concurrency to 200 eliminates the p99 latency spike during the initial burst. Monthly cost increase: $1,650. p99 latency under spike: reduced from 1,840ms to under 200ms. This cost must be in the model.
Enable DynamoDB auto-scaling INFO
Running DynamoDB in on-demand mode during launch is the correct call — on-demand absorbs burst without capacity planning. As traffic patterns stabilise, switching to provisioned with auto-scaling will reduce costs by approximately $800–$1,200/month at our RCU/WCU profile. Flagged for a 60-day review post-launch.
Add CloudFront TTL for read-path API responses INFO
A significant proportion of read-path Lambda invocations at 8,000 RPS are for balance enquiries — highly cacheable responses with a 30-second TTL that CloudFront can absorb. Estimated invocation reduction: 35% at peak. Monthly Lambda saving: $780. This also reduces provisioned concurrency requirement from 200 to approximately 150, saving a further $410/month.
I applied all four recommendations directly from the recommendation panel to the canvas and re-ran the spike simulation with the updated configuration. The revised estimate stabilised at $38,840/month — including the corrected ElastiCache sizing, the increased provisioned concurrency, and the CloudFront TTL cache benefit.
Fig. 5 — Recommendations panel after spike simulation. The two WARNING-severity findings drive the corrected cost model. Both recommendations include quantified monthly cost impact, not just qualitative guidance.
Before and After: The Architecture That Saved $8,400
The phrase "the architecture that saved $8,400" requires unpacking. The $8,400 is the monthly cost of the architectural issues the static Pricing Calculator missed — the combined effect of ElastiCache overflow driving DynamoDB cost, undersized Lambda provisioned concurrency, and the Pricing Calculator's inability to model the payroll-day spike profile at all.
Architecture as designed (pre-simulation)
Architecture after recommendations applied
The $2,360 monthly difference compounds over twelve months to $28,320. But the more material number is this: the original configuration would have launched with a DynamoDB cost line that was 23% above forecast due to ElastiCache overflow — and the first month that the CTO asked why the AWS bill was higher than the model, there would have been no good answer, because the model had not modelled the part that mattered.
Execution History: The Audit Trail for Architecture Decisions
Every simulation run is saved automatically to pinpole's Execution History. For this architecture, by the time we were ready to deploy, we had twelve simulation runs on record, including the baseline, four spike iterations with different ElastiCache configurations, three Lambda concurrency variations, two DynamoDB mode comparisons (on-demand vs. provisioned), and the final validated run.
The execution history serves a function that extends beyond cost estimation. When we presented the architecture to our CTO and CFO for sign-off, we did not present a spreadsheet. We shared the execution history view, which shows every simulation run, its configuration, its traffic pattern, its health status, and its estimated monthly cost. The optimisation journey is visible — the CFO could see precisely why we had sized ElastiCache the way we had, and what the cost consequence of the alternative had been.
For the first time, we had a cost model that was a direct output of the architecture, not a separate artefact that someone had to maintain in parallel. The architecture was the model.
Fig. 6 — Execution History showing 12 simulation runs. The cost column shows the optimisation journey from $41,200 (initial config) to $38,840 (validated config). Each run is a complete architecture snapshot — selecting any row opens the exact canvas state at that point.
Deploying the Validated Architecture
Once the spike simulation passed — all WARNING-severity recommendations addressed, estimated monthly cost at $38,840, p99 latency under 200ms at 8,000 RPS peak — we pushed through pinpole's deployment pipeline.
The deployment workflow ran: Canvas → ST (System Test) → UAT → PR (Production). The ST and UAT environments confirmed the architecture in live AWS accounts before production traffic was at risk. The real Lambda cold starts, real DynamoDB read/write latency, and real ElastiCache connection behaviour in ST and UAT all matched the simulation closely — a further validation of the simulation model's fidelity.
Month one production bill: $39,510. pinpole simulation prediction: $38,840. Delta: $670 (1.7%). The delta is attributable to slightly higher than modelled SNS fan-out volume from downstream notification services — a usage pattern we have since added to the canvas model for future simulations. Original Pricing Calculator estimate: $41,200. Difference from actual: $1,690 (4.3%), primarily due to unmodelled spike overhead on Lambda and DynamoDB.
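The two accuracy figures reconcile directly from the dollar amounts:

```python
actual_bill = 39_510
pinpole_estimate = 38_840
calculator_estimate = 41_200

def error_pct(estimate, actual):
    """Absolute estimation error as a percentage of the actual bill."""
    return round(abs(estimate - actual) / actual * 100, 1)

assert actual_bill - pinpole_estimate == 670
assert error_pct(pinpole_estimate, actual_bill) == 1.7     # simulation
assert error_pct(calculator_estimate, actual_bill) == 4.3  # static calculator
```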
What Changed in How We Work
The technical outcome — a cost prediction within 1.7% of the actual bill — is the headline. The operational change is more durable. Three things are now different:
- Cost estimation is part of the architecture review, not a separate spreadsheet exercise. The simulation runs as part of every architecture session. When a service is added or reconfigured, the cost estimate updates immediately in the canvas. There is no separate "cost model" to maintain.
- Spike conditions are part of the cost model by default. We run a Spike simulation on every new architecture and every major change. The steady-state cost is the floor; the spike-weighted monthly cost is the number we commit to.
- Architecture decisions are documented with their cost rationale. The execution history is the evidence trail. Every significant configuration change has a simulation run attached to it, with the cost consequence of the alternative visible. The CFO can ask why we made a decision — and there is an answer that does not depend on anyone's memory.
To be clear: the AWS Pricing Calculator is a useful tool for its intended purpose, which is rough monthly cost estimation at a static request rate. Its limitation is not accuracy at steady state — it is the inability to model spike behaviour, service interaction effects, and the architectural choices that only become cost-relevant under load. These are the most expensive architectural decisions you can get wrong. Pre-deployment simulation addresses them directly, before the bill arrives.
Pre-Deployment Cost Estimation Checklist
The workflow we now follow for every new serverless architecture before a single resource is provisioned:
- Build the full architecture on the canvas — including every service in the critical path and supporting services. Partial models produce partial cost estimates.
- Configure every node to production-realistic values — Lambda memory, timeout, reserved and provisioned concurrency. DynamoDB mode (on-demand vs. provisioned). ElastiCache instance size. These values are the inputs to the cost model.
- Run a Constant baseline at expected steady-state RPS — validates the canvas and establishes the cost floor. Cross-check against any existing Pricing Calculator estimate as a sanity check.
- Run a Spike simulation at your expected peak RPS — this is the critical run. Model your actual spike profile: what is the peak? How fast does it ramp? How long does it sustain? Payroll days, campaign drops, and launch traffic are all spike scenarios with different shapes and durations.
- Review per-node health during the spike — any WARNING or CRITICAL status during spike conditions is both a performance risk and a cost signal. Degraded nodes are often costing more than the model assumes because of fallback behaviour (like ElastiCache overflow to DynamoDB).
- Request recommendations and apply all WARNING-severity findings — the recommendations are prioritised. WARNING-severity findings have cost or performance consequences that affect whether the architecture is production-viable. Apply them to the canvas and re-run the spike simulation.
- Use the spike-weighted monthly cost as your budget commitment — not the steady-state cost. The number you take to your CTO or CFO should reflect the cost profile at peak load, not average load.
- Snapshot the validated run in execution history before deployment — this is the design artefact. The simulation run that precedes deployment is the documented justification for every configuration value in the architecture.
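The spike-weighted commitment in the checklist can be sketched as a blend of normal-day and spike-day run rates. The daily figures below are illustrative placeholders, not the article's numbers:

```python
def spike_weighted_monthly_cost(baseline_daily, spike_day_cost,
                                spike_days, days_in_month=30):
    """Blend ordinary days and spike days (e.g. payroll days) into a
    single monthly budget commitment."""
    normal_days = days_in_month - spike_days
    return baseline_daily * normal_days + spike_day_cost * spike_days

# e.g. 26 normal days at $900/day plus 4 payroll days at $1,400/day:
assert spike_weighted_monthly_cost(900, 1_400, spike_days=4) == 29_000
```

The number you commit to is the blended figure, never the 26-day floor.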
Know your AWS costs before you provision a single resource.
pinpole runs pre-deployment traffic simulation across your full serverless architecture — generating per-service cost breakdowns, spike-condition cost estimates, and recommendations with quantified monthly impact, all before you write a CloudFormation template or run a Terraform apply.
Start your first simulation free →