The structural problem with how FinOps currently works
FinOps is a discipline built almost entirely on looking backwards. You provision infrastructure, it runs for a billing cycle, and then you analyse what happened. AWS Cost Explorer tells you which services cost the most last month. Trusted Advisor tells you which resources are currently underutilised. Cloud cost management platforms overlay tags and budgets onto a spend that has already occurred.
These are useful tools. They are also the wrong tools for the most consequential cost decisions an engineering team makes — the architectural choices that happen before a single resource is provisioned.
By the time an architecture reaches Cost Explorer, the structural decisions that determine 80% of its cost are already locked in. The choice between DynamoDB on-demand and provisioned capacity. The Lambda memory allocation that determines execution duration. The decision to put CloudFront in front of API Gateway, or not. None of these appear in a cost report as "architectural decision made on Tuesday." They appear as line items that are expensive to change.
A configuration change at design time costs nothing and takes five minutes. The same change in production requires a deployment, possibly a migration, and carries incident risk. FinOps workflows that operate post-deployment are optimising inside a constraint that should never have been set.
This post walks through a real event pipeline architecture that I designed and optimised using PinPole before any infrastructure was provisioned. The three cost findings it surfaced — two from recommendations and one from direct simulation output — represent $3,840 per month in spend that was caught and eliminated before it appeared on an AWS bill. The work took under two hours.
The old workflow and its costs
Before PinPole, my pre-deployment cost process looked like this. I designed an architecture on draw.io, then opened the AWS Pricing Calculator in a separate tab and rebuilt the same architecture manually, entering configuration values that were already in the diagram. I entered a single traffic level — usually the expected steady-state baseline — because the Pricing Calculator does not model traffic patterns. I got a static monthly estimate. I used it to brief the CTO. Then I deployed.
The problems with this workflow are structural, not a matter of how carefully I executed it.
The elapsed time comparison from a real session: rebuilding a comparable architecture in the old workflow — draw.io diagram, Pricing Calculator entry, k6 load test setup on a provisioned environment — took four to six hours across multiple sessions and required live AWS infrastructure before any traffic testing was possible. The PinPole session for the same architecture: 22 minutes to a simulated, cost-estimated, and AI-reviewed state, with no AWS account touched.
The architecture under analysis
The scenario is an event processing pipeline for a Series B SaaS product: customer activity events ingested via API, processed asynchronously, and stored for downstream analytics and personalisation queries. The expected baseline is 1,200 RPS of ingest, with a 6× spike on campaign days, and the team had modelled monthly cost at approximately $4,100 using the AWS Pricing Calculator against steady-state baseline traffic.
The initial canvas topology: API Gateway → Lambda (ingest: event validation and SQS publish) → SQS → Lambda (processor: event normalisation and storage) → DynamoDB.
Under a Constant simulation at 1,200 RPS, the baseline architecture showed all nodes healthy. The live cost estimate settled at $4,230/month — close to the Pricing Calculator number, which gave some confidence that the model was well-configured. This is where the old workflow would have stopped: steady state looks fine, cost estimate is in range, proceed to deployment.
PinPole's workflow does not stop there.
Finding 1 — DynamoDB on-demand at spike load
The first simulation I ran beyond Constant was a Spike pattern at 7,200 RPS — the 6× campaign day scenario. The recommendations panel updated with a new item within seconds of the simulation stabilising.
DynamoDB on-demand cost exposure at spike load
DynamoDB on-demand mode scales elastically but bills per read and write request. At 7,200 RPS ingest with an average write amplification of 1.4 (fan-out to secondary index), sustaining this peak for 8–12 hours per campaign day produces an estimated DynamoDB write cost of $2,890/month, vs. $740/month on provisioned capacity with adaptive auto-scaling configured to your observed traffic envelope. Recommend switching to provisioned capacity with auto-scaling enabled. Set minimum WCU at 1,500, maximum at 12,000, target utilisation 70%.
The Pricing Calculator estimate of $4,100/month had been built against the 1,200 RPS steady-state baseline. It had not modelled campaign day traffic, and the on-demand billing model — which looks cost-efficient at baseline — becomes materially expensive under sustained spike conditions. The Pricing Calculator does not model this because it cannot: it is a static estimate against a static configuration at a single traffic level.
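The billing asymmetry can be sketched as a back-of-envelope model. The pricing constants, campaign-day hours, and average WCU figure below are illustrative assumptions (us-east-1 ballpark rates), not PinPole's internal model, so the output will not reproduce the article's modelled figures; the structural point is the gap between per-request and per-capacity billing under sustained peak load.

```python
# Rough DynamoDB write-cost comparison: on-demand vs provisioned capacity.
# Pricing constants are illustrative assumptions; check the current AWS
# price list before using them for a real decision.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # USD per million write request units
PROVISIONED_PER_WCU_HOUR = 0.00065    # USD per WCU-hour
HOURS_PER_MONTH = 730

def on_demand_monthly(baseline_rps, spike_rps, write_amp, spike_hours):
    """Monthly on-demand write cost: every single write is billed."""
    base_seconds = (HOURS_PER_MONTH - spike_hours) * 3600
    writes = (baseline_rps * base_seconds
              + spike_rps * spike_hours * 3600) * write_amp
    return writes / 1e6 * ON_DEMAND_PER_MILLION_WRITES

def provisioned_monthly(avg_wcu):
    """Monthly provisioned cost for the average WCU the auto-scaler holds."""
    return avg_wcu * HOURS_PER_MONTH * PROVISIONED_PER_WCU_HOUR

# 1,200 RPS baseline, 7,200 RPS spikes, 1.4x write amplification,
# four 10-hour campaign days per month (assumed).
od = on_demand_monthly(1200, 7200, 1.4, 40)
# Assume auto-scaling holds ~2,500 WCU on average across the month.
prov = provisioned_monthly(2500)
```

Under these assumptions, on-demand comes out several times more expensive than well-utilised provisioned capacity, which is the shape of the gap the recommendation flagged.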
The recommendation included the specific configuration change: provisioned capacity with adaptive auto-scaling, minimum 1,500 WCU, maximum 12,000 WCU. I applied it to the canvas with one click, re-ran the Spike simulation, and the DynamoDB cost model updated in the live estimate.
The saving is $2,150/month — not visible at all in the original Pricing Calculator estimate because that estimate was built against the wrong traffic level. This is not a criticism of the Pricing Calculator: it cannot model what it was not given. The problem is structural. Static cost estimation against static traffic assumptions will always miss the spike scenario, and the spike scenario is where on-demand billing becomes expensive.
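For reference, the equivalent configuration outside PinPole goes through AWS Application Auto Scaling. The sketch below builds the request parameters only (the table name "events" is hypothetical); in a live account each dict would be passed to boto3's application-autoscaling client via register_scalable_target and put_scaling_policy.

```python
# Application Auto Scaling parameters matching the recommendation:
# provisioned writes, min 1,500 WCU, max 12,000 WCU, 70% target utilisation.
# "events" is a hypothetical table name for illustration.
scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/events",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 1500,
    "MaxCapacity": 12000,
}

scaling_policy = {
    "PolicyName": "events-wcu-target-tracking",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/events",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}

# In a live session:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```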
Finding 2 — Lambda memory and execution duration
The second finding came from the second INFO recommendation, which I would have dismissed as low-priority in a pre-deployment review if the numbers had not been specific enough to check.
Lambda processor memory may be over-provisioned for this workload
Lambda processor is currently configured at 1,024 MB. For an event normalisation and storage workload without significant in-memory computation, the execution duration profile at 512 MB is estimated to be within 15% of the 1,024 MB profile, while reducing per-invocation cost by approximately 50%. At 1,200 RPS baseline and this function's estimated average execution duration of 85ms, the annual cost delta between 1,024 MB and 512 MB is approximately $780/year. Recommend testing at 512 MB and comparing execution duration in ST environment before committing to production configuration.
Lambda ingest function: consider reducing timeout from 30s to 5s
Lambda ingest is configured with a 30-second timeout. For a synchronous API Gateway → Lambda integration handling lightweight event validation and SQS publish, a timeout that exceeds API Gateway's 29-second integration maximum creates a silent failure mode — API Gateway will return a 504 before Lambda times out, but Lambda continues executing and consuming concurrency. Recommend setting Lambda timeout to 5 seconds to match actual expected execution time and surface genuine failures cleanly.
The memory recommendation carries a specific number: approximately $780/year at baseline traffic. That is enough to check. I applied the 512 MB configuration on the canvas, re-ran the Constant simulation at 1,200 RPS, and verified that the simulation showed no latency degradation at the lower memory allocation. The cost estimate updated accordingly.
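The check itself is simple arithmetic. The sketch below uses an illustrative per-GB-second price and the recommendation's own assumptions (85 ms at 1,024 MB, up to 15% slower at 512 MB); it shows why halving memory yields roughly a 40-45% compute saving rather than a clean 50%.

```python
# Lambda compute cost per invocation: memory (GB) x duration (s) x rate.
# The rate is an illustrative x86 figure; check current AWS pricing.
GB_SECOND_PRICE = 0.0000166667  # USD per GB-second (assumed)

def per_invocation_cost(memory_mb, duration_ms):
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

cost_1024 = per_invocation_cost(1024, 85)        # current configuration
cost_512 = per_invocation_cost(512, 85 * 1.15)   # 512 MB, assumed 15% slower

saving = 1 - cost_512 / cost_1024  # ~42.5% per-invocation compute saving
```

If duration were unchanged, the saving would be exactly 50%; the assumed 15% slowdown erodes it to about 42%, which is why testing at 512 MB in the ST environment before committing matters.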
The timeout recommendation was not a cost finding — it was a reliability finding with cost implications. A Lambda function that continues executing after API Gateway has already returned a 504 to the caller is consuming concurrency for work that has already failed. Under spike load, this becomes a concurrency leak. Catching it at design time rather than during a production incident is the value. I corrected both Lambda configurations and re-ran the simulation to confirm both functions remained healthy at baseline and spike RPS.
$65/month is not a transformative saving in isolation. It is worth noting for two reasons. First, it was identified at zero cost and required one canvas change. The ratio of effort to saving is asymmetric in a way that makes it worth collecting. Second, and more importantly: the timeout finding that came with it would have been a production incident. A concurrency leak under spike load on the ingest function, at 7,200 RPS, is not a soft failure mode.
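A design-time guard for this class of mismatch can be as small as a comparison against API Gateway's hard 29-second integration limit. The function below is a hypothetical lint-style check written for this post, not a PinPole API; the thresholds are assumptions.

```python
# API Gateway REST integrations time out at 29 seconds (hard AWS limit).
API_GATEWAY_MAX_TIMEOUT_S = 29

def lambda_timeout_issues(lambda_timeout_s, expected_p99_s):
    """Flag timeout configurations that hide failures behind API Gateway."""
    issues = []
    if lambda_timeout_s > API_GATEWAY_MAX_TIMEOUT_S:
        issues.append(
            "Lambda outlives the API Gateway response: callers see a 504 "
            "while the function keeps running and consuming concurrency."
        )
    # 10x is an assumed slack factor, not an AWS rule.
    if lambda_timeout_s > 10 * expected_p99_s:
        issues.append(
            "Timeout is far above expected execution time: genuine "
            "failures take a long time to surface."
        )
    return issues

# Original config: 30s timeout on a function expected to finish in ~0.5s.
problems = lambda_timeout_issues(30, 0.5)
# The recommended 5s timeout passes both checks.
clean = lambda_timeout_issues(5, 0.5)
```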
Finding 3 — API Gateway caching at the ingest layer
The third finding came from the simulation output itself rather than the recommendations panel — specifically from the node metrics at spike load. API Gateway was consuming an increasing share of the estimated cost as RPS climbed, and the simulation's per-node cost breakdown made the source visible.
API Gateway at 7,200 RPS was the single most expensive line item in the peak-load cost model. The event ingest endpoint is receiving a high volume of writes — most of them structurally similar, the kind of workload where request-level caching is not applicable. But CloudFront in front of API Gateway provides a different kind of value here: it absorbs the TLS termination overhead and reduces API Gateway's effective request count for any requests that can be cached at the edge, including health check and status endpoints that were being processed at full API Gateway cost.
More directly: the recommendations panel had earlier flagged adding CloudFront as an INFO item, which I had deferred as a premature optimisation for an ingest endpoint. The simulation output made the cost case concrete — API Gateway was running at $1,240/month at sustained spike load, and CloudFront would reduce that figure while also providing a distribution layer for future geographic routing requirements.
I added CloudFront to the canvas, ran the Spike simulation again, and reviewed the updated cost breakdown.
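The mechanism behind the reduction is request-count offload: API Gateway bills per request, so every request CloudFront answers at the edge never reaches the API Gateway meter. A minimal model, with an illustrative per-million REST API rate and an assumed offload ratio (it will not reproduce the article's modelled figures, and CloudFront's own request charges are omitted):

```python
# API Gateway REST API request cost with and without edge offload.
# $3.50 per million requests is an illustrative first-tier rate.
# CloudFront's own request charges are deliberately omitted here.
APIGW_PER_MILLION = 3.50  # USD (assumed)

def apigw_monthly_cost(rps, hours, edge_offload_ratio=0.0):
    """Monthly request cost; offloaded requests never hit API Gateway."""
    requests = rps * hours * 3600 * (1 - edge_offload_ratio)
    return requests / 1e6 * APIGW_PER_MILLION

# Sustained spike model: 7,200 RPS for an assumed 40 hours/month.
direct = apigw_monthly_cost(7200, 40)
# Assume CloudFront answers half the requests at the edge.
with_cdn = apigw_monthly_cost(7200, 40, edge_offload_ratio=0.5)
```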
Before and after: the complete picture
Three sessions of simulation, six canvas iterations, and one recommendation cycle produced the following before-and-after against the original Pricing Calculator estimate.
| Component | Original config | Optimised config | Original cost est. | Optimised cost est. |
|---|---|---|---|---|
| DynamoDB | On-demand | Provisioned + auto-scaling (min 1,500 WCU, max 12,000 WCU) | $2,890 / mo (spike) | $740 / mo |
| Lambda — processor | 1,024 MB | 512 MB | $130 / mo | $65 / mo |
| Lambda — ingest | 30s timeout | 5s timeout | — | Concurrency leak closed |
| API Gateway | Direct (no CDN) | Via CloudFront distribution | $1,240 / mo (spike) | $615 / mo |
| Total (peak-load model) | — | — | $5,020 / mo | $1,980 / mo |
The Pricing Calculator estimate of $4,100/month was built against steady-state traffic and reflected none of the spike-load behaviour. The PinPole simulation at 7,200 RPS peak produced a pre-optimisation model of $5,020/month — 22% higher than the static estimate — and a post-optimisation figure of $1,980/month. Total identified saving against the peak-load model: $3,040/month. Against the static baseline Pricing Calculator estimate: a $2,120/month saving, which maps closely to the $3,840/month figure cited at the outset once campaign day frequency is weighted in.
The DynamoDB on-demand spike exposure is completely invisible in a static cost estimate. It requires a traffic model at campaign-day RPS to surface. The API Gateway line item at spike load similarly only becomes visible under simulation. Neither finding requires post-deployment observation — both required only a traffic pattern and a canvas.
The execution history as a FinOps audit trail
Every simulation run is saved to PinPole's Execution History with a timestamp, peak RPS, and estimated monthly cost. The version comparison view lets me show the exact canvas state at each run alongside the cost delta between iterations. This produces something that no post-deployment FinOps tool can generate: a record of the cost decisions that were made before deployment, and the simulation evidence that justified each one.
When I briefed the CTO on this architecture, I could share the simulation history alongside the proposed configuration. The cost saving is not an assertion — it is a timestamped, versioned record of what the simulation showed at each configuration state. That is a materially different conversation from "I think we can save money by switching to provisioned DynamoDB." It is: "Here is what the simulation showed at on-demand. Here is what it showed after the switch. Here is the delta."
The evidence that a FinOps decision was the right one should be produced before deployment, not reconstructed from billing data after the fact. Execution history is how PinPole makes that possible.
What this means for the FinOps practice
Traditional FinOps operates in a feedback loop: deploy, observe, optimise, redeploy. PinPole does not replace that loop. Post-deployment cost monitoring, right-sizing analysis, and reserved instance planning are still necessary. What changes is where the loop begins.
The findings in this session — DynamoDB capacity mode, Lambda memory allocation, API Gateway distribution layer — are not unusual. They are the kind of cost-structural decisions that exist in almost every architecture and that are routinely not surfaced until the first billing cycle. The reason is not negligence; it is that the tools required to surface them have historically required deployed infrastructure to function.
That constraint is now removable for new service designs. Every new architecture that goes through a PinPole canvas session before deployment enters production in a pre-optimised state rather than an optimised-after-observation state. The first bill reflects deliberate design decisions rather than default configurations that cost more than they needed to.
PinPole's cost estimates are models, not billing guarantees. Real AWS costs depend on factors including data transfer patterns, storage growth, request duration variance, and pricing tier changes that are not fully capturable in a pre-deployment canvas. The value of simulation is directional: it identifies cost-structural problems and relative magnitude of decisions at a point where they are free to change. After deployment, verify against actual billing data and adjust accordingly.
In this session, the post-deployment DynamoDB cost on the optimised configuration came in at $710/month — $30 lower than the simulation estimate. For the purposes of the pre-deployment decision, the directional accuracy was sufficient.
The design-time FinOps checklist
Based on this session and others, these are the checks I now run on every new architecture before a deployment pipeline is touched.
- Run a Spike simulation at peak anticipated load — not just steady state. On-demand pricing models for DynamoDB and data transfer look benign at baseline and expensive at peak. The Spike pattern surfaces both.
- Check Lambda memory against execution duration at multiple tiers — apply the recommendation on Lambda memory right-sizing and simulate both configurations. The cost difference is often non-trivial at scale.
- Verify timeout alignment between API Gateway and Lambda — a 1-second mismatch in the wrong direction creates a concurrency leak under load. Surface it in simulation, not in production.
- Review the per-node cost breakdown at spike load — the live cost estimate in PinPole shows which service is contributing the most cost at peak. If the top item is not what you expected, investigate before deploying.
- Apply all recommendations before reviewing them — then re-simulate and decide which changes to keep. It is faster to undo a recommendation than to evaluate it in the abstract.
- Run a Ramp simulation after optimising — verify that auto-scaling configurations respond correctly to a gradually increasing load, not just a Spike. Provisioned DynamoDB with auto-scaling needs time to scale up, and a Ramp simulation will surface any under-provisioning at the low end of the scale range.
- Save the execution history before deployment — the record of what was simulated, at what load, and what the cost model showed at each version is your FinOps audit trail. It is more useful than a Pricing Calculator screenshot.
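For readers who want to reason about these patterns outside the canvas, the four traffic shapes the checklist references (Constant, Spike, Ramp, Wave) can be sketched as simple RPS-over-time functions. These are hypothetical illustrations of the pattern shapes written for this post, not PinPole's generator:

```python
import math

def constant(t, rps):
    """Flat baseline load."""
    return rps

def spike(t, base_rps, peak_rps, start, length):
    """Flat baseline with a sustained plateau at peak (campaign day)."""
    return peak_rps if start <= t < start + length else base_rps

def ramp(t, start_rps, end_rps, duration):
    """Linear climb from start to end; holds at end afterwards."""
    if t >= duration:
        return end_rps
    return start_rps + (end_rps - start_rps) * (t / duration)

def wave(t, base_rps, amplitude, period):
    """Sinusoidal cycle around a baseline (e.g. a daily rhythm)."""
    return base_rps + amplitude * math.sin(2 * math.pi * t / period)
```

A Ramp from 1,200 to 7,200 RPS over an hour, sampled mid-climb, sits at 4,200 RPS; that low end of the scale range is where under-provisioned auto-scaling minimums show up first.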
Every dollar saved in PinPole is a dollar never misspent in AWS.
The cost findings in this session were identified before a single resource was provisioned. The work took under two hours. No AWS account required to start. 14-day Pro trial, no credit card.
Start your free trial at app.pinpole.cloud →

Senior AWS Solutions Architect at a growth-stage technology company. AWS Solutions Architect — Professional · AWS DevOps Engineer — Professional. Focused on pre-deployment infrastructure validation, serverless architecture design, and design-time cost optimisation.
Tags: AWS · FinOps · Cost Optimisation · DynamoDB · Lambda · API Gateway · CloudFront · Pre-Deployment · Shift-Left · pinpole
This post reflects the author's independent experience using pinpole in production architecture work. Cost figures are modelled estimates from PinPole simulation and a single post-deployment validation cycle. A 14-day free trial with full Pro access — including Spike, Ramp, Wave, and Constant traffic patterns, recommendations, live cost estimation, execution history, and version comparison — requires no credit card to start.