The Architecture That Got You to Series B Will Not Get You to Series C

What AWS's own research says about the systems that fail scaling teams, and why the signals appear long before the crisis does


AWS's Well-Architected Framework makes an observation that doesn't get enough attention outside of technical architecture circles.

Most system failures at scale are not caused by bad engineering. They are caused by good engineering applied to requirements that no longer exist. The system was built correctly for the stage the business was at when it was designed. The business moved. The architecture didn't.

This is not a niche problem. It is one of the most documented patterns in cloud infrastructure. The DORA State of DevOps report consistently identifies architectural constraints (specifically, tightly coupled systems and unclear service ownership) as among the strongest predictors of declining engineering performance as organisations scale. Not tooling. Not headcount. Architecture.

Understanding why this happens, and what the early signals look like, is what this article is about.

What the research says about systems under scaling stress

The AWS Well-Architected Framework defines six pillars that characterise systems built to scale: operational excellence, security, reliability, performance efficiency, cost optimisation, and sustainability. What's notable about this framework is what sits underneath all of them: the assumption that architectural decisions are revisited as the business evolves, not fixed at the point of initial deployment.

AWS documents this explicitly. Well-Architected Reviews, in which AWS or a certified partner evaluates an architecture against these pillars, identify an average of around 30 medium- to high-risk findings per workload in environments that haven't been reviewed since initial deployment. Not because the original architects were careless. Because the requirements changed and the architecture didn't follow.

The DORA research adds a behavioural dimension to this. Their data shows that elite engineering teams deploy significantly more frequently and recover from incidents significantly faster than low-performing teams, and that the primary differentiator is not the skill of the engineers but how loosely coupled the architecture is. Tightly coupled systems, regardless of the quality of the engineers working in them, produce slower deploys, more complex incidents, and higher cognitive load per change.

What this means in practice: an architecture that was appropriately designed for a smaller, simpler product becomes a source of engineering friction as the product grows. The friction is structural. It cannot be resolved by adding engineers or improving processes. It requires architectural change.
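One way to make coupling concrete is to count how many services are affected when a single one changes. The sketch below is illustrative only: the service names and the dependency map are hypothetical, and "blast radius" is a term chosen for this example, not a metric defined by AWS or DORA.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it calls directly.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["ledger"],
    "inventory": ["ledger"],
    "reporting": ["ledger"],
    "ledger": [],
}

def blast_radius(service, depends_on):
    """Return every service that directly or transitively depends on `service`."""
    # Invert the map so we can walk from a service to the services that call it.
    callers = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(name)

    # Breadth-first walk over callers, collecting everything reachable.
    affected, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected

print(sorted(blast_radius("ledger", DEPENDS_ON)))
# → ['checkout', 'inventory', 'payments', 'reporting']
```

In this made-up system, a change to the shared ledger touches every other service, while a change to checkout touches none. A shared component whose blast radius keeps growing as teams are added is the kind of structural friction that extra engineers do not remove.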

The specific patterns that indicate a system is scaling past its architecture

AWS's operational guidance and the Well-Architected Framework identify several consistent indicators that a system is under scaling strain.

Deployment frequency declining despite stable or growing headcount. When adding engineers produces slower rather than faster output, the constraint is almost always architectural: typically tight coupling between components, which means changes in one place require coordinated changes across many others.

Incident rate increasing without a corresponding increase in system complexity. The AWS reliability pillar identifies unclear failure domains as a primary driver of cascading incidents. Systems that were simple enough to understand holistically at an earlier stage become opaque as they grow, and the failure modes become harder to isolate.

Ownership ambiguity around shared components. As systems scale, components that were originally owned clearly by one team start being depended on by multiple teams. Without explicit architectural boundaries, this creates coordination overhead and change risk that scales faster than the team does.

Cost growing faster than usage. The AWS cost optimisation pillar documents this as a reliable indicator of architectural drift: patterns that were efficient at one scale become inefficient at another, and the inefficiency compounds silently until the billing makes it visible.

None of these are threshold events. They are gradual signals. The research consistently shows they appear six to twelve months before the architectural strain produces a significant incident or delivery failure.
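Because these signals are gradual, they are easier to see as ratios tracked over time than as raw totals. The following is a minimal sketch, assuming you already collect quarterly figures for deploys, headcount, spend, and usage. All numbers are invented for illustration, and the helper `ratio_trend` is a hypothetical name, not part of any AWS tool.

```python
def ratio_trend(numerators, denominators):
    """Return the per-period ratios and the fractional change from first to last period."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    change = (ratios[-1] - ratios[0]) / ratios[0]
    return ratios, change

# Hypothetical quarterly figures, invented for illustration only.
deploys = [120, 130, 125, 118]              # production deploys per quarter
engineers = [10, 14, 18, 22]                # engineering headcount
cost_gbp = [20_000, 27_000, 36_000, 48_000] # cloud spend per quarter
requests_m = [50, 60, 70, 80]               # millions of requests served

_, deploys_per_engineer_change = ratio_trend(deploys, engineers)
_, unit_cost_change = ratio_trend(cost_gbp, requests_m)

print(f"Deploys per engineer: {deploys_per_engineer_change:+.0%}")    # → -55%
print(f"Cost per million requests: {unit_cost_change:+.0%}")          # → +50%
```

In this example the raw deploy count looks roughly flat and usage is growing, so neither total raises an alarm on its own. The ratios show output per engineer halving and unit cost rising by half, which is the pattern described above.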

Why the Well-Architected Framework recommends continuous review, not point-in-time assessment

The framing most engineering teams use for architectural review is project-based. The architecture gets reviewed when something is being built or when something has gone wrong.

AWS's own recommendation is different. The Well-Architected Framework is explicitly designed for continuous use: AWS suggests reviewing workloads against the framework at least annually, and more frequently when significant changes are occurring in the business or the system.

The reasoning behind this is architectural entropy. Systems degrade against the pillars not because of active decisions to compromise them but because the requirements the pillars were designed to meet keep changing. A reliability configuration appropriate for 10,000 users may have significant gaps at 500,000. A cost structure that was efficient at one transaction volume becomes inefficient at another. Security controls that covered the original threat surface don't automatically extend to cover new services and integrations.

Continuous review exists because the gap between what an architecture was designed to do and what it is currently being asked to do opens gradually, not suddenly. Catching it early, when the gap can be addressed by targeted changes rather than significant rework, is consistently cheaper and less disruptive than catching it late.

What the research suggests about the cost of addressing this late

The AWS Well-Architected whitepaper on cost optimisation cites the principle that architectural decisions made without cost and performance modelling typically cost three to five times more to correct after deployment than to address during design. This is not specific to cost: the same compounding applies to reliability, security, and operational complexity.

Gartner's research on technical debt reaches a consistent conclusion: organisations that treat architectural review as a continuous discipline rather than a reactive one spend significantly less on infrastructure remediation and experience fewer delivery delays attributable to technical constraint.

The implication for engineering leaders is straightforward. The architectural signals that appear as a system scales past its original design (slower deploys, noisier incidents, growing coordination overhead, rising costs) are not problems to address individually. They are indicators of a gap between what the architecture was built to do and what the business now requires it to do. Addressing that gap proactively, at the point the signals appear, is what the research consistently identifies as the lower-cost path.

The alternative is waiting for the signals to become a crisis. At which point the work is the same, but the conditions are significantly worse.

AWS Well-Architected reviews are one of the core components of a SyncYourCloud membership: a certified solutions architect reviews your workloads against the framework's pillars on a continuous basis, not as a one-off project. From £2,950/month. See the membership tiers →
