Recently, I ran an experiment to understand this better. I set up an agentic AI to build the same marketplace system three times. Each version had the same functional scope. The only thing that changed was the architecture.

The results were consistent enough to make one thing clear:

AI needs much more architecture than a software developer does.
It also exposes wrong architectural decisions earlier, and their impact is far more painful for the project.

The setup (briefly)

The experiment was intentionally simple.

I took a real marketplace concept and broke it down into more than 150 user stories. Each was run by an AI agent without manual corrections. The scope, requirements, and tooling stayed the same across all runs.

The only difference was structure:

  • A modular monolith
  • A classic n-layer architecture
  • A flat project with no defined architecture

I expected small differences. Instead, each structure produced clearly different outcomes.

Speed is not the same as progress

The least structured version was the fastest and cheapest to generate. The most structured version took longer.

That part was predictable.

What mattered more was functional correctness. When I checked how much of the core business flow actually worked, the picture changed. The n-layer version passed the most functional tests. The flat version passed the fewest. The modular monolith landed in between.

In other words, speed alone did not translate into business value.

AI is very good at producing code quickly. It is far less reliable at producing coherent systems unless boundaries are explicit.

              Modular Monolith   N-Layer     No Architecture
Effort        72h 18min          51h 32min   43h 30min
Cost PR       $3.93              $3.17       $3.22
Cost GitHub   $34.70             $24.74      $20.88

Structure directly affects risk

I also looked at code quality and security signals using static analysis tools: NDepend for code quality and Snyk for security vulnerabilities. The exact numbers matter less than the pattern they revealed.

Structured architectures consistently produced fewer critical issues. The flat structure (no architecture) had more severe problems.

On the surface, all versions looked acceptable. Under inspection, their risk profiles were very different.

This is an important distinction for technology leaders. AI can generate code that looks clean on the surface. Risk tends to hide in complexity, integration points, and implicit assumptions.

When architectural rules are missing, AI tends to create longer, more complex methods that are difficult to understand. It does not feel the cognitive cost of complexity. Human teams do.
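
That cognitive cost can itself be turned into an enforceable rule. As a minimal, illustrative sketch (the branch threshold and node choices are my own assumptions, not numbers from the experiment), a CI step could reject overly branchy functions:

```python
import ast

# Branching constructs we count toward a function's complexity.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
MAX_BRANCHES = 10  # arbitrary threshold for this sketch

def too_complex(source: str) -> list[str]:
    """Return names of functions whose branch count exceeds the limit."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            if branches > MAX_BRANCHES:
                offenders.append(node.name)
    return offenders
```

A gate like this does crudely what the commercial analyzers do properly; the point is only that the rule runs automatically instead of living in a style guide the AI never reads.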

                  Modular Monolith   N-Layer     No Architecture
Effort            72h 18min          51h 32min   43h 30min
Cost              $38.63             $27.91      $24.10
Security          3M                 4M          6h 1m
Code Quality      13.3               13.9        14.6
Code Complexity   23.5               27.7        52.1
Business Value    53%                78%         48%

Across all architectures, most failures came from the same place: integrations.

Integrations with payment and email providers were frequent points of failure. Database migrations also failed regularly, and migrations conflicted with one another. Routing was another pain point: URLs existed in the code but did not work correctly. External systems required configuration the AI could not infer.

This reinforces a practical reality:

AI writes code. It does not own the environment.

Systems rarely fail in the middle. They fail at the edges.
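
One practical mitigation is to make those edges fail fast. As a minimal sketch, using hypothetical setting names (not taken from the experiment), a startup check can refuse to boot when integration configuration is missing instead of failing mid-request:

```python
import os
from collections.abc import Mapping

# Hypothetical settings the external integrations would need;
# the names below are illustrative only.
REQUIRED_SETTINGS = [
    "PAYMENT_API_KEY",  # payment provider
    "SMTP_HOST",        # email provider
    "DATABASE_URL",     # migrations target
]

def check_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the settings that are missing or blank."""
    return [name for name in REQUIRED_SETTINGS
            if not env.get(name, "").strip()]

# At startup:
#   missing = check_environment()
#   if missing:
#       raise SystemExit(f"Refusing to start, missing settings: {missing}")
```

A human operator usually adds this kind of guard instinctively; an AI agent rarely does unless the architecture demands it.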

Architecture becomes a control surface

The most important lesson from this experiment is not about which architecture “won”.

It is about what architecture represents when AI is involved.

When humans build systems, architecture helps teams understand and coordinate.
When AI builds systems, architecture becomes a set of executable constraints.

It defines:

  • Where the boundaries of the system lie
  • Which architectural decisions are fixed, and what may or may not be added
  • Where the system is explicitly limited
  • What quality is expected
  • Which integrations are expected

Without those constraints, AI still produces output. The outcomes are simply less predictable.
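
To make "executable constraints" concrete, here is a minimal sketch of one: an import-boundary check for a modular monolith. The module names and allowed dependencies are hypothetical, purely for illustration:

```python
import ast
from pathlib import Path

# Hypothetical module boundaries; which module may import which
# is an assumption for this sketch, not taken from the experiment.
ALLOWED_IMPORTS = {
    "orders":   {"catalog", "payments"},  # orders may use these
    "catalog":  set(),                    # catalog depends on nothing
    "payments": set(),                    # payments depends on nothing
}

def find_violations(src_root: str) -> list[str]:
    """Flag imports that cross module boundaries not listed above."""
    violations = []
    root = Path(src_root)
    for module, allowed in ALLOWED_IMPORTS.items():
        for py_file in (root / module).rglob("*.py"):
            tree = ast.parse(py_file.read_text())
            for node in ast.walk(tree):
                targets = []
                if isinstance(node, ast.Import):
                    targets = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    targets = [node.module]
                for target in targets:
                    top = target.split(".")[0]
                    if top in ALLOWED_IMPORTS and top != module and top not in allowed:
                        violations.append(f"{py_file}: {module} -> {top}")
    return violations
```

Run in CI, a check like this turns an architecture diagram into a rule the AI cannot quietly violate.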

What this means for tech leaders

AI changes where value sits in software development.

The architecture you choose for AI-assisted development matters, and communicating those decisions clearly matters just as much. To deliver high-quality code, you need an engineering approach and measurable quality based on proven metrics.

AI speeds up code generation, but without proper preparation, it just as quickly accelerates chaos. 

If you want a more technical breakdown of the experiment, including detailed metrics and tooling, I’ve shared a deeper analysis in a separate post on LinkedIn. 
