Modularity and tokens: when architecture also pays the AI bill 💸

Code was never just for machines 🤖
For years we repeated one almost sacred sentence: code should be written for humans, not for the compiler. The compiler is, after all, the most tolerant reader on the team: it does not complain about style, it does not ask why one function has three responsibilities, and it never suggests you should step outside after the third nested if.
Today, though, there is a new reader at the table: the AI agent. And this reader has one special trait, almost brutally honest: it counts tokens. Every request, every loaded file, every extra line becomes a concrete metric. Not an aesthetic debate, but cost, response time, and output quality.
Code is still a human language, but now it also lives inside a context economy.
Why modularity was born: reducing human complexity 🧠
Modularity did not emerge because engineers enjoy sorting folders like kitchen cutlery. It emerged to reduce cognitive load. When you separate responsibilities, you make one practical statement: no one should have to understand everything, all at once, all the time.
A module is a promise: within these boundaries, this thing happens, with these rules, without theatrical surprises. That promise is first for us, when we return to a project after two months and wonder who wrote that section. Usually, we did. Different topic.
Good modularity reduces noise, exposes dependencies, and breaks problems into manageable units. In other words, it turns hostile territory into a readable map.
Single responsibility as a clarity tool 🎯
The Single Responsibility Principle is not a conference slogan for purists. It is an operational survival rule: if a component does one thing, its mental surface shrinks, with fewer concepts to hold, fewer hidden dependencies, and fewer domino effects when you touch one line.
When one function validates input, calls external services, formats output, and, in its free time, makes coffee, that is not a component. It is a Russian novel in one page.
In practice, SRP is respect for whoever will read, change, and maintain that code, including your future self, who deserves at least basic compassion.
The new constraint: scarce context, real cost 🧾
This is where the game truly changes. With LLMs, constraints arrive that we used to ignore:
- context windows are limited;
- every token has an economic and computational cost;
- more context means higher latency;
- irrelevant context lowers answer quality.
In the past, informational noise was mainly a human cognitive problem. Today, it is also a direct, measurable, repeatable cost. To avoid missing anything, we often hand the agent the whole repository: it looks prudent, but it is usually elegant self-sabotage. If you load 20 files when 3 were enough, you are not only confusing the agent. You are paying to confuse it.
Confusion is no longer just an architecture smell. It is a line item.
Too much indiscriminate context usually creates four effects:
- higher token consumption;
- higher chance of wrong focus;
- longer inference time;
- worse signal-to-noise ratio.
A practical metaphor: answering a precise question by bringing the full company archive into the meeting. Technically complete, operationally absurd.
Context is not free. It is a consumable resource.
Modularity as context (and token) compression 🗜️
When a codebase is truly modular, something powerful happens: you do not need to tell everything to get a good answer. You tell only the right slice. Modularity becomes semantic compression.
That gives three direct advantages:
- less context per operation;
- fewer files to read;
- lower token usage without losing precision.
A well-designed module explains itself in its domain: clear name, explicit dependencies, single responsibility. That makes retrieval more precise, cuts unnecessary reads, and gets useful answers faster.
Refactoring does not only reduce complexity. It reduces the token bill.
If you already worked through Context engineering: la vera magia dietro l’AI, you know this pattern: the best context is not the largest one. It is the most relevant.
SRP and modularity as an economic lever ⚖️
Up to this point, none of this sounds revolutionary: clarity, boundaries, low coupling. The shift is that these principles no longer improve only human work. They also change operational cost when AI agents are used at scale.
| Principle | Human benefit | AI benefit |
|---|---|---|
| SRP | Clarity | Fewer tokens |
| Modularity | Understanding | Smaller context |
| Low coupling | Maintenance | More precise retrieval |
This table looks obvious, yet it captures a major shift: principles created for engineering quality now have measurable financial impact. That is not cynicism, but operational reality. If you use agents daily, architecture quality becomes an economic lever as much as a technical one.
The monolith problem: explosion of useless tokens 🧱
A monolith is not automatically evil. But a disordered, opaque, multi-responsibility monolith quickly becomes a waste amplifier, because:
- it forces loading much more code than needed;
- it increases average cost per task;
- it makes retrieval noisy and less reliable;
- it pushes the agent to infer too much from ambiguous signals.
In this sense, the modern monolith can become a token monolith: not just hard to maintain, but expensive to query.
To evaluate when architecture complexity stops being useful and becomes style theater, Meno architettura è meglio (finchè non fa male) remains relevant: simplicity, yes, but with readable boundaries.
AI-aware architectures: from principle to practice 🧭
If context is scarce, software should be designed in an AI-aware way. This is not trend chasing. It is architecture aligned with real agent workflows, down to everyday coding habits.
Choices that help immediately:
- small, explicit, semantically isolated APIs;
- self-contained modules with clear interfaces;
- minimal implicit dependencies;
- naming that improves navigation and automated retrieval.
The goal is not abstractly “beautiful” code. The goal is to maximize useful information per consumed token.
To strengthen this approach on prompt and model interaction, Prompt engineering is a strong practical complement.
In day-to-day work, that becomes a set of very concrete design habits:
- short, focused functions;
- explicit naming to reduce exploratory reading;
- files with one dominant responsibility;
- lower grep cost for humans and agents;
- clear boundaries between domain, infrastructure, and integrations.
One simple and powerful practice is introducing a token budget per development task, not as punishment but as a design tool:
- what context is truly necessary for this change?
- which files are essential, and which are merely “maybe useful”?
- can I achieve the same result with half the context?
A more pragmatic move is to avoid stuffing whole files into the input tokens when they may never be used: it is better to list the potentially useful file paths in the prompt and let the agent decide whether it actually needs to open them.
Teams adopting this discipline often see a useful side effect: human reviews improve too, because intentions, boundaries, and impacts become clearer.
A good exercise is to apply the same iterative mindset described in Question-Driven Specification: define boundaries first, expand only when needed.
Conclusion: from readable software to economical software 🧮
For a long time, we wrote code primarily for human readability. That was right, and still is, but today there is a second responsibility: writing code that does not waste context when AI agents read it.
Good architecture is not only elegant. It is also efficient, because it compresses context, reduces noise, and lowers AI operating cost in software work.
In the past, we aimed for understandable code. Today we need understandable and economical code.
The good news is that no new dogma is required. Old principles, applied with modern discipline, are often enough to produce less noise, more clarity, and fewer tokens burned for nothing.
For your next refactor, add one criterion beyond “cleanliness”: how many tokens will this save in future tasks? It may become your most pragmatic metric this year.