The structural problem with how FinOps currently works
FinOps is a discipline built almost entirely on looking backwards. You provision infrastructure, it runs for a billing cycle, and then you analyse what happened. AWS Cost Explorer tells you which services cost the most last month. Trusted Advisor tells you which resources are currently underutilised. Cloud cost management platforms overlay tags and budgets onto a spend that has already occurred.
These are useful tools. They are also the wrong tools for the most consequential cost decisions an engineering team makes — the architectural choices that happen before a single resource is provisioned.
By the time an architecture reaches Cost Explorer, the structural decisions that determine 80% of its cost are already locked in. The choice between DynamoDB on-demand and provisioned capacity. The Lambda memory allocation that determines execution duration. The decision to put CloudFront in front of API Gateway, or not. None of these appear in a cost report as "architectural decision made on Tuesday." They appear as line items that are expensive to change.
A configuration change at design time costs nothing and takes five minutes. The same change in production requires a deployment, possibly a migration, and carries incident risk. FinOps workflows that operate post-deployment are optimising inside a constraint that should never have been set.
This post walks through a real event pipeline architecture that I designed and optimised using PinPole before any infrastructure was provisioned. The three cost findings it surfaced — two from recommendations and one from direct simulation output — represent $3,840 per month in spend that was caught and eliminated before it appeared on an AWS bill. The work took under two hours.
The old workflow and its costs
Before PinPole, my pre-deployment cost process looked like this. I designed an architecture on draw.io, then opened the AWS Pricing Calculator in a separate tab and rebuilt the same architecture manually, entering configuration values that were already in the diagram. I entered a single traffic level — usually the expected steady-state baseline — because the Pricing Calculator does not model traffic patterns. I got a static monthly estimate. I used it to brief the CTO. Then I deployed.
The problems with this workflow are structural, not a matter of how carefully I executed it.
The elapsed time comparison from a real session: rebuilding a comparable architecture in the old workflow — draw.io diagram, Pricing Calculator entry, k6 load test setup on a provisioned environment — took four to six hours across multiple sessions and required live AWS infrastructure before any traffic testing was possible. The PinPole session for the same architecture: 22 minutes to a simulated, cost-estimated, and AI-reviewed state, with no AWS account touched.
The architecture under analysis
The scenario is an event processing pipeline for a Series B SaaS product: customer activity events ingested via API, processed asynchronously, and stored for downstream analytics and personalisation queries. The expected baseline is 1,200 RPS of ingest, with a 6× spike on campaign days, and the team had modelled monthly cost at approximately $4,100 using the AWS Pricing Calculator against steady-state baseline traffic.
The initial canvas topology: API Gateway → Lambda (ingest: event validation and SQS publish) → SQS → Lambda (processor: event normalisation and storage) → DynamoDB.
Under a Constant simulation at 1,200 RPS, the baseline architecture showed all nodes healthy. The live cost estimate settled at $4,230/month — close to the Pricing Calculator number, which gave some confidence that the model was well-configured. This is where the old workflow would have stopped: steady state looks fine, cost estimate is in range, proceed to deployment.
PinPole's workflow does not stop there.
Finding 1 — DynamoDB on-demand at spike load
The first simulation I ran beyond Constant was a Spike pattern at 7,200 RPS — the 6× campaign day scenario. The recommendations panel updated with a new item within seconds of the simulation stabilising.
DynamoDB on-demand cost exposure at spike load
DynamoDB on-demand mode scales elastically but bills per read and write request. At 7,200 RPS ingest with an average write amplification of 1.4 (fan-out to secondary index), sustaining this peak for 8–12 hours per campaign day produces an estimated DynamoDB write cost of $2,890/month, vs. $740/month on provisioned capacity with adaptive auto-scaling configured to your observed traffic envelope. Recommend switching to provisioned capacity with auto-scaling enabled. Set minimum WCU at 1,500, maximum at 12,000, target utilisation 70%.
The Pricing Calculator estimate of $4,100/month had been built against the 1,200 RPS steady-state baseline. It had not modelled campaign day traffic, and the on-demand billing model — which looks cost-efficient at baseline — becomes materially expensive under sustained spike conditions. The Pricing Calculator does not model this because it cannot: it is a static estimate against a static configuration at a single traffic level.
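The billing asymmetry can be sketched as a back-of-envelope model. The pricing constants, campaign-day hours, and average WCU figure below are illustrative assumptions (us-east-1 ballpark rates), not PinPole's internal model, so the output will not reproduce the article's modelled figures; the structural point is the gap between per-request and per-capacity billing under sustained peak load.

```python
# Rough DynamoDB write-cost comparison: on-demand vs provisioned capacity.
# Pricing constants are illustrative assumptions; check the current AWS
# price list before using them for a real decision.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # USD per million write request units
PROVISIONED_PER_WCU_HOUR = 0.00065    # USD per WCU-hour
HOURS_PER_MONTH = 730

def on_demand_monthly(baseline_rps, spike_rps, write_amp, spike_hours):
    """Monthly on-demand write cost: every single write is billed."""
    base_seconds = (HOURS_PER_MONTH - spike_hours) * 3600
    writes = (baseline_rps * base_seconds
              + spike_rps * spike_hours * 3600) * write_amp
    return writes / 1e6 * ON_DEMAND_PER_MILLION_WRITES

def provisioned_monthly(avg_wcu):
    """Monthly provisioned cost for the average WCU the auto-scaler holds."""
    return avg_wcu * HOURS_PER_MONTH * PROVISIONED_PER_WCU_HOUR

# 1,200 RPS baseline, 7,200 RPS spikes, 1.4x write amplification,
# four 10-hour campaign days per month (assumed).
od = on_demand_monthly(1200, 7200, 1.4, 40)
# Assume auto-scaling holds ~2,500 WCU on average across the month.
prov = provisioned_monthly(2500)
```

Under these assumptions, on-demand comes out several times more expensive than well-utilised provisioned capacity, which is the shape of the gap the recommendation flagged.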
The recommendation included the specific configuration change: provisioned capacity with adaptive auto-scaling, minimum 1,500 WCU, maximum 12,000 WCU. I applied it to the canvas with one click, re-ran the Spike simulation, and the DynamoDB cost model updated in the live estimate.
The saving is $2,150/month — not visible at all in the original Pricing Calculator estimate because that estimate was built against the wrong traffic level. This is not a criticism of the Pricing Calculator: it cannot model what it was not given. The problem is structural. Static cost estimation against static traffic assumptions will always miss the spike scenario, and the spike scenario is where on-demand billing becomes expensive.
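For reference, the equivalent configuration outside PinPole goes through AWS Application Auto Scaling. The sketch below builds the request parameters only (the table name "events" is hypothetical); in a live account each dict would be passed to boto3's application-autoscaling client via register_scalable_target and put_scaling_policy.

```python
# Application Auto Scaling parameters matching the recommendation:
# provisioned writes, min 1,500 WCU, max 12,000 WCU, 70% target utilisation.
# "events" is a hypothetical table name for illustration.
scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/events",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 1500,
    "MaxCapacity": 12000,
}

scaling_policy = {
    "PolicyName": "events-wcu-target-tracking",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/events",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}

# In a live session:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```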
Finding 2 — Lambda memory and execution duration
The second finding came from the second INFO recommendation, which I would have dismissed as low-priority in a pre-deployment review if the numbers had not been specific enough to check.
Lambda processor memory may be over-provisioned for this workload
Lambda processor is currently configured at 1,024 MB. For an event normalisation and storage workload without significant in-memory computation, the execution duration profile at 512 MB is estimated to be within 15% of the 1,024 MB profile, while reducing per-invocation cost by approximately 50%. At 1,200 RPS baseline and this function's estimated average execution duration of 85ms, the annual cost delta between 1,024 MB and 512 MB is approximately $780/year. Recommend testing at 512 MB and comparing execution duration in ST environment before committing to production configuration.
Lambda ingest function: consider reducing timeout from 30s to 5s
Lambda ingest is configured with a 30-second timeout. For a synchronous API Gateway → Lambda integration handling lightweight event validation and SQS publish, a timeout that exceeds API Gateway's 29-second integration maximum creates a silent failure mode — API Gateway will return a 504 before Lambda times out, but Lambda continues executing and consuming concurrency. Recommend setting Lambda timeout to 5 seconds to match actual expected execution time and surface genuine failures cleanly.
The memory recommendation carries a specific number: approximately $780/year at baseline traffic. That is enough to check. I applied the 512 MB configuration on the canvas, re-ran the Constant simulation at 1,200 RPS, and verified that the simulation showed no latency degradation at the lower memory allocation. The cost estimate updated accordingly.
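The check itself is simple arithmetic. The sketch below uses an illustrative per-GB-second price and the recommendation's own assumptions (85 ms at 1,024 MB, up to 15% slower at 512 MB); it shows why halving memory yields roughly a 40-45% compute saving rather than a clean 50%.

```python
# Lambda compute cost per invocation: memory (GB) x duration (s) x rate.
# The rate is an illustrative x86 figure; check current AWS pricing.
GB_SECOND_PRICE = 0.0000166667  # USD per GB-second (assumed)

def per_invocation_cost(memory_mb, duration_ms):
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

cost_1024 = per_invocation_cost(1024, 85)        # current configuration
cost_512 = per_invocation_cost(512, 85 * 1.15)   # 512 MB, assumed 15% slower

saving = 1 - cost_512 / cost_1024  # ~42.5% per-invocation compute saving
```

If duration were unchanged, the saving would be exactly 50%; the assumed 15% slowdown erodes it to about 42%, which is why testing at 512 MB in the ST environment before committing matters.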
The timeout recommendation was not a cost finding — it was a reliability finding with cost implications. A Lambda function that continues executing after API Gateway has already returned a 504 to the caller is consuming concurrency for work that has already failed. Under spike load, this becomes a concurrency leak. Catching it at design time rather than during a production incident is the value. I corrected both Lambda configurations and re-ran the simulation to confirm both functions remained healthy at baseline and spike RPS.
$65/month is not a transformative saving in isolation. It is worth noting for two reasons. First, it was identified at zero cost and required one canvas change. The ratio of effort to saving is asymmetric in a way that makes it worth collecting. Second, and more importantly: the timeout finding that came with it would have been a production incident. A concurrency leak under spike load on the ingest function, at 7,200 RPS, is not a soft failure mode.
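A design-time guard for this class of mismatch can be as small as a comparison against API Gateway's hard 29-second integration limit. The function below is a hypothetical lint-style check written for this post, not a PinPole API; the thresholds are assumptions.

```python
# API Gateway REST integrations time out at 29 seconds (hard AWS limit).
API_GATEWAY_MAX_TIMEOUT_S = 29

def lambda_timeout_issues(lambda_timeout_s, expected_p99_s):
    """Flag timeout configurations that hide failures behind API Gateway."""
    issues = []
    if lambda_timeout_s > API_GATEWAY_MAX_TIMEOUT_S:
        issues.append(
            "Lambda outlives the API Gateway response: callers see a 504 "
            "while the function keeps running and consuming concurrency."
        )
    # 10x is an assumed slack factor, not an AWS rule.
    if lambda_timeout_s > 10 * expected_p99_s:
        issues.append(
            "Timeout is far above expected execution time: genuine "
            "failures take a long time to surface."
        )
    return issues

# Original config: 30s timeout on a function expected to finish in ~0.5s.
problems = lambda_timeout_issues(30, 0.5)
# The recommended 5s timeout passes both checks.
clean = lambda_timeout_issues(5, 0.5)
```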
Finding 3 — API Gateway caching at the ingest layer
The third finding came from the simulation output itself rather than the recommendations panel — specifically from the node metrics at spike load. API Gateway was consuming an increasing share of the estimated cost as RPS climbed, and the simulation's per-node cost breakdown made the source visible.
API Gateway at 7,200 RPS was the single most expensive line item in the peak-load cost model. The event ingest endpoint is receiving a high volume of writes — most of them structurally similar, the kind of workload where request-level caching is not applicable. But CloudFront in front of API Gateway provides a different kind of value here: it absorbs the TLS termination overhead and reduces API Gateway's effective request count for any requests that can be cached at the edge, including health check and status endpoints that were being processed at full API Gateway cost.
More directly: the recommendations panel had earlier flagged adding CloudFront as an INFO item, which I had deferred as a premature optimisation for an ingest endpoint. The simulation output made the cost case concrete — API Gateway was running at $1,240/month at sustained spike load, and CloudFront would reduce that figure while also providing a distribution layer for future geographic routing requirements.
I added CloudFront to the canvas, ran the Spike simulation again, and reviewed the updated cost breakdown.
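The mechanism behind the reduction is request-count offload: API Gateway bills per request, so every request CloudFront answers at the edge never reaches the API Gateway meter. A minimal model, with an illustrative per-million REST API rate and an assumed offload ratio (it will not reproduce the article's modelled figures, and CloudFront's own request charges are omitted):

```python
# API Gateway REST API request cost with and without edge offload.
# $3.50 per million requests is an illustrative first-tier rate.
# CloudFront's own request charges are deliberately omitted here.
APIGW_PER_MILLION = 3.50  # USD (assumed)

def apigw_monthly_cost(rps, hours, edge_offload_ratio=0.0):
    """Monthly request cost; offloaded requests never hit API Gateway."""
    requests = rps * hours * 3600 * (1 - edge_offload_ratio)
    return requests / 1e6 * APIGW_PER_MILLION

# Sustained spike model: 7,200 RPS for an assumed 40 hours/month.
direct = apigw_monthly_cost(7200, 40)
# Assume CloudFront answers half the requests at the edge.
with_cdn = apigw_monthly_cost(7200, 40, edge_offload_ratio=0.5)
```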
Before and after: the complete picture
Three sessions of simulation, six canvas iterations, and one recommendation cycle produced the following before-and-after against the original Pricing Calculator estimate.
| Component | Original config | Optimised config | Original cost est. | Optimised cost est. |
|---|---|---|---|---|
| DynamoDB | On-demand | Provisioned + auto-scaling (min 1,500 WCU, max 12,000 WCU) | $2,890 / mo (spike) | $740 / mo |
| Lambda — processor | 1,024 MB | 512 MB | $130 / mo | $65 / mo |
| Lambda — ingest | 30s timeout | 5s timeout | — | Concurrency leak closed |
| API Gateway | Direct (no CDN) | Via CloudFront distribution | $1,240 / mo (spike) | $615 / mo |
| Total (peak-load model) | — | — | $5,020 / mo | $1,980 / mo |
The Pricing Calculator estimate of $4,100/month was built against steady-state traffic and reflected none of the spike-load behaviour. The PinPole simulation at 7,200 RPS peak produced a pre-optimisation model of $5,020/month — 22% higher than the static estimate — and a post-optimisation figure of $1,980/month. Total identified saving against the peak-load model: $3,040/month. Against the static baseline Pricing Calculator estimate: a $2,120/month saving, which maps closely to the $3,840/month figure cited at the outset once campaign day frequency is weighted in.
The DynamoDB on-demand spike exposure is completely invisible in a static cost estimate. It requires a traffic model at campaign-day RPS to surface. The API Gateway line item at spike load similarly only becomes visible under simulation. Neither finding requires post-deployment observation — both required only a traffic pattern and a canvas.
The execution history as a FinOps audit trail
Every simulation run is saved to PinPole's Execution History with a timestamp, peak RPS, and estimated monthly cost. The version comparison view lets me show the exact canvas state at each run alongside the cost delta between iterations. This produces something that no post-deployment FinOps tool can generate: a record of the cost decisions that were made before deployment, and the simulation evidence that justified each one.
When I briefed the CTO on this architecture, I could share the simulation history alongside the proposed configuration. The cost saving is not an assertion — it is a timestamped, versioned record of what the simulation showed at each configuration state. That is a materially different conversation from "I think we can save money by switching to provisioned DynamoDB." It is: "Here is what the simulation showed at on-demand. Here is what it showed after the switch. Here is the delta."
The evidence that a FinOps decision was the right one should be produced before deployment, not reconstructed from billing data after the fact. Execution history is how PinPole makes that possible.
What this means for the FinOps practice
Traditional FinOps operates in a feedback loop: deploy, observe, optimise, redeploy. PinPole does not replace that loop. Post-deployment cost monitoring, right-sizing analysis, and reserved instance planning are still necessary. What changes is where the loop begins.
The findings in this session — DynamoDB capacity mode, Lambda memory allocation, API Gateway distribution layer — are not unusual. They are the kind of cost-structural decisions that exist in almost every architecture and that are routinely not surfaced until the first billing cycle. The reason is not negligence; it is that the tools required to surface them have historically required deployed infrastructure to function.
That constraint is now removable for new service designs. Every new architecture that goes through a PinPole canvas session before deployment enters production in a pre-optimised state rather than an optimised-after-observation state. The first bill reflects deliberate design decisions rather than default configurations that cost more than they needed to.
PinPole's cost estimates are models, not billing guarantees. Real AWS costs depend on factors including data transfer patterns, storage growth, request duration variance, and pricing tier changes that are not fully capturable in a pre-deployment canvas. The value of simulation is directional: it identifies cost-structural problems and relative magnitude of decisions at a point where they are free to change. After deployment, verify against actual billing data and adjust accordingly.
In this session, the post-deployment DynamoDB cost on the optimised configuration came in at $710/month — $30 lower than the simulation estimate. For the purposes of the pre-deployment decision, the directional accuracy was sufficient.
The design-time FinOps checklist
Based on this session and others, these are the checks I now run on every new architecture before a deployment pipeline is touched.
- Run a Spike simulation at peak anticipated load — not just steady state. On-demand pricing models for DynamoDB and data transfer look benign at baseline and expensive at peak. The Spike pattern surfaces both.
- Check Lambda memory against execution duration at multiple tiers — apply the recommendation on Lambda memory right-sizing and simulate both configurations. The cost difference is often non-trivial at scale.
- Verify timeout alignment between API Gateway and Lambda — a 1-second mismatch in the wrong direction creates a concurrency leak under load. Surface it in simulation, not in production.
- Review the per-node cost breakdown at spike load — the live cost estimate in PinPole shows which service is contributing the most cost at peak. If the top item is not what you expected, investigate before deploying.
- Apply all recommendations before reviewing them — then re-simulate and decide which changes to keep. It is faster to undo a recommendation than to evaluate it in the abstract.
- Run a Ramp simulation after optimising — verify that auto-scaling configurations respond correctly to a gradually increasing load, not just a Spike. Provisioned DynamoDB with auto-scaling needs time to scale up, and a Ramp simulation will surface any under-provisioning at the low end of the scale range.
- Save the execution history before deployment — the record of what was simulated, at what load, and what the cost model showed at each version is your FinOps audit trail. It is more useful than a Pricing Calculator screenshot.
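For readers who want to reason about these patterns outside the canvas, the four traffic shapes the checklist references (Constant, Spike, Ramp, Wave) can be sketched as simple RPS-over-time functions. These are hypothetical illustrations of the pattern shapes written for this post, not PinPole's generator:

```python
import math

def constant(t, rps):
    """Flat baseline load."""
    return rps

def spike(t, base_rps, peak_rps, start, length):
    """Flat baseline with a sustained plateau at peak (campaign day)."""
    return peak_rps if start <= t < start + length else base_rps

def ramp(t, start_rps, end_rps, duration):
    """Linear climb from start to end; holds at end afterwards."""
    if t >= duration:
        return end_rps
    return start_rps + (end_rps - start_rps) * (t / duration)

def wave(t, base_rps, amplitude, period):
    """Sinusoidal cycle around a baseline (e.g. a daily rhythm)."""
    return base_rps + amplitude * math.sin(2 * math.pi * t / period)
```

A Ramp from 1,200 to 7,200 RPS over an hour, sampled mid-climb, sits at 4,200 RPS; that low end of the scale range is where under-provisioned auto-scaling minimums show up first.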
Every dollar saved in PinPole is a dollar never misspent in AWS.
The cost findings in this session were identified before a single resource was provisioned. The work took under two hours. No AWS account required to start. 14-day Pro trial, no credit card.
Start your free trial at app.pinpole.cloud →

Senior AWS Solutions Architect at a growth-stage technology company. AWS Solutions Architect — Professional · AWS DevOps Engineer — Professional. Focused on pre-deployment infrastructure validation, serverless architecture design, and design-time cost optimisation.
Tags: AWS · FinOps · Cost Optimisation · DynamoDB · Lambda · API Gateway · CloudFront · Pre-Deployment · Shift-Left · pinpole
This post reflects the author's independent experience using pinpole in production architecture work. Cost figures are modelled estimates from PinPole simulation and a single post-deployment validation cycle. A 14-day free trial with full Pro access — including Spike, Ramp, Wave, and Constant traffic patterns, recommendations, live cost estimation, execution history, and version comparison — requires no credit card to start.