The New LOC: Why Token Over-Consumption is Not an AI Strategy

Some companies are repeating the software mistakes of the 1990s. Tracking lines of code (LOC) as a productivity metric rewarded bloated, unmaintainable software. Today, tracking total token volume or API request counts is producing bloated cloud bills and fragile AI systems.

“token consumption, prompt volume or adoption rates as if those metrics automatically translate into business value”
— Simon Wardley

The Core Problem: Adoption vs. Consumption

Too many organisations confuse AI adoption with AI consumption.

  • Theater: Encouraging teams to maximise daily API prompts.
  • Intelligence: Re-architecting workflows so a simple script, regex, or business rule handles 80% of the problem for fractions of a cent.
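
To make the "Intelligence" bullet concrete, here is a minimal Python sketch of a rules-first triage step, assuming a support-desk scenario: cheap regex rules answer the messages they recognise, and only unmatched messages reach a model. The patterns, the canned replies, and the `call_llm` callable are hypothetical placeholders, not a real ruleset or client API.

```python
import re
from typing import Callable, Optional

# Hypothetical rules: (pattern, reply builder). Real rules would be mined from
# the team's own ticket history; these two are placeholders.
RULES = [
    (re.compile(r"\b(reset|forgot)\b.*\bpassword\b", re.IGNORECASE),
     lambda m: "Use the self-service reset link on the sign-in page."),
    (re.compile(r"\border\s*#?(\d{6})\b", re.IGNORECASE),
     lambda m: f"Order {m.group(1)}: status lookup via the orders API."),
]


def handle_with_rules(message: str) -> Optional[str]:
    """Return a canned reply if a deterministic rule matches, else None."""
    for pattern, build_reply in RULES:
        match = pattern.search(message)
        if match:
            return build_reply(match)
    return None


def triage(message: str, call_llm: Callable[[str], str]) -> tuple[str, str]:
    """Try the cheap rules first; only unmatched messages reach the model."""
    reply = handle_with_rules(message)
    if reply is not None:
        return ("rule", reply)
    return ("llm", call_llm(message))  # call_llm is a hypothetical model client
```

The design choice is ordering: deterministic handlers run first because they cost almost nothing and behave predictably, so the model is reserved for the long tail of messages no rule covers.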

The human element in AI ROI cannot be ignored: redesigning AI/human workflows effectively is as much a behavioural challenge as a technical one.

The Enterprise “Token Taxes” Driving Up OpEx

When AI adoption is treated as a volume game, organisations introduce massive architectural inefficiencies. In traditional cloud-native design we optimise for network payloads, database connections, and compute. In AI engineering we must optimise for context efficiency.

Without strict technical governance, early enterprise AI systems incur three hidden “token taxes”:

1. The Model Context Protocol (MCP) Over-Fetch Tax
MCP is powerful for connecting external data to LLMs, but raw integrations do not scale. Agents often brute-force discovery, dumping entire database schemas, unpaginated payloads, and dozens of tool definitions into the context window on every turn.

Fix: Implement a data abstraction and orchestration layer. Pre-paginate enterprise APIs and dynamically inject tool schemas based on user intent rather than loading everything globally.
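
A minimal Python sketch of this fix, assuming a hypothetical in-memory tool registry with hand-written intent tags: only the tool schemas relevant to the current message are injected, and large result sets are handed over page by page. Real MCP servers expose their own tool listings, and a production system would more likely match intent with a classifier or embeddings than with the keyword overlap used here; the tool names and schemas below are illustrative only.

```python
from typing import Any, Iterator

# Hypothetical registry: tool name -> intent tags and a JSON-schema-style input spec.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {
    "search_invoices": {"tags": {"invoice", "billing", "payment"},
                        "schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}}},
    "get_ticket": {"tags": {"ticket", "support", "incident"},
                   "schema": {"type": "object", "properties": {"ticket_id": {"type": "string"}}}},
    "list_deployments": {"tags": {"deploy", "release", "pipeline"},
                         "schema": {"type": "object", "properties": {"service": {"type": "string"}}}},
}


def select_tools(user_message: str, max_tools: int = 2) -> dict[str, dict[str, Any]]:
    """Inject only the schemas whose tags overlap the message (a crude intent match)."""
    words = {w.strip(".,?!").lower() for w in user_message.split()}
    scored = [(len(meta["tags"] & words), name) for name, meta in TOOL_REGISTRY.items()]
    chosen = [name for score, name in sorted(scored, reverse=True) if score > 0][:max_tools]
    return {name: TOOL_REGISTRY[name]["schema"] for name in chosen}


def paginate(rows: list[Any], page_size: int = 50) -> Iterator[list[Any]]:
    """Yield fixed-size pages instead of handing the model one unpaginated payload."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]
```

A message that matches no tag injects no schemas at all, which is the point: the context window only pays for tools the current turn might actually use.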

2. Unbounded Retrieval-Augmented Generation (RAG) Pipelines
Many RAG systems shine in demos but collapse under real enterprise load. Large chunks of semi-relevant documents force the model to process thousands of irrelevant tokens.

Fix: Move beyond naive vector search. Adopt GraphRAG architectures by mapping corporate data into a Knowledge Graph. Structured entities and explicit relationships deliver more precise context, cut token burn, reduce hallucinations, and make system behaviour more predictable. (This is an area I’ve helped organisations with in complex data environments.)
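
The core idea can be illustrated with a toy Python sketch: corporate facts stored as (subject, relation, object) triples, and retrieval that walks outward from the entities named in a question, returning compact one-line facts instead of document chunks. This is not the API of any GraphRAG library; the triples, the substring-based entity matching, and the `hops` parameter are illustrative assumptions.

```python
from collections import defaultdict

# Toy knowledge graph; the entities and facts are hypothetical examples.
TRIPLES = [
    ("Acme Ltd", "has_contract", "MSA-2023-014"),
    ("MSA-2023-014", "governed_by", "English law"),
    ("MSA-2023-014", "renews_on", "2025-03-31"),
    ("Acme Ltd", "account_owner", "J. Patel"),
    ("Globex", "has_contract", "MSA-2022-101"),
]


def build_index(triples):
    """Map each entity to every triple it participates in (as subject or object)."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index


def graph_context(question, index, hops=2):
    """Collect triples reachable within `hops` of entities named in the question."""
    frontier = {e for e in index if e.lower() in question.lower()}
    visited, facts = set(), []
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier - visited:
            visited.add(entity)
            for triple in index[entity]:
                if triple not in facts:
                    facts.append(triple)
                    next_frontier.update({triple[0], triple[2]})
        frontier = next_frontier
    return "\n".join(f"{s} -[{r}]-> {o}" for s, r, o in facts)
```

For a question such as "When does the Acme Ltd contract renew?", the context is four short facts (including the renewal date), and the unrelated `Globex` contract never enters the window.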

3. Unbounded Agentic Execution Loops
Autonomous agents are powerful but dangerous without guardrails. An unmonitored agent hitting edge cases can enter recursive loops, burning rate limits and generating huge bills in hours.

Fix: Enforce deterministic boundaries: maximum loop counters, circuit breakers on token-velocity spikes, and mandatory human-in-the-loop review for ambiguous or low-confidence paths.
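
These three boundaries can be combined in one guard object that the agent loop consults after every step. The sketch below is a minimal Python illustration under stated assumptions: the thresholds are arbitrary placeholders, token usage per step is reported by the caller, and `confidence` is assumed to come from some upstream scoring step (obtaining a reliable confidence score is outside its scope). `AgentGuard` and `CircuitOpen` are hypothetical names.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


class CircuitOpen(RuntimeError):
    """Raised when a guardrail trips and the agent run must stop."""


@dataclass
class AgentGuard:
    # All thresholds are illustrative placeholders, not recommended values.
    max_steps: int = 10                    # hard loop counter
    max_tokens_per_window: int = 20_000    # token-velocity ceiling
    window_seconds: float = 60.0           # sliding window for that ceiling
    min_confidence: float = 0.7            # below this, hand off to a human
    clock: Callable[[], float] = time.monotonic
    steps: int = field(default=0, init=False)
    _usage: deque = field(default_factory=deque, init=False, repr=False)

    def check(self, tokens_used: int, confidence: float) -> str:
        """Record one agent step and return "continue" or "escalate".

        Raises CircuitOpen if the loop limit or the token-velocity ceiling is breached.
        """
        self.steps += 1
        if self.steps > self.max_steps:
            raise CircuitOpen(f"loop limit of {self.max_steps} steps exceeded")

        now = self.clock()
        self._usage.append((now, tokens_used))
        # Drop usage records that have fallen out of the sliding window.
        while self._usage and now - self._usage[0][0] > self.window_seconds:
            self._usage.popleft()
        if sum(tokens for _, tokens in self._usage) > self.max_tokens_per_window:
            raise CircuitOpen("token velocity spike: circuit breaker opened")

        # Ambiguous or low-confidence paths go to a human instead of another loop.
        return "escalate" if confidence < self.min_confidence else "continue"
```

The injectable `clock` keeps the sliding window testable; in production the default `time.monotonic` applies. Raising an exception, rather than returning a flag, forces the calling loop to stop instead of relying on it to inspect a return value.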

The Strategic Matrix: From Hype to Governance

| Architectural Pillar | Hype Approach (Naive Vector RAG) | Advanced Approach (GraphRAG / Knowledge Graphs) | Managed Approach (Context Tiering) |
|---|---|---|---|
| Context Payload Efficiency | Poor: large, unrefined chunks | Excellent: precise entity-relationship webs | High: pre-processing, prompt caching, and token ceilings |
| Cost Predictability | Unpredictable OpEx | High upfront cost, very low per-query cost | Balanced: simple tasks routed to lighter models |
| Data Relationship Handling | Weak | Superior | Deterministic via rules and caches |
| Primary System Value | Fast basic search | Reduces hallucinations in complex domains | Protects budgets with guardrails |
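
The "ceilings" in the Managed Approach column can be as simple as a hard token budget on retrieved context. A minimal Python sketch, assuming snippets arrive with relevance scores from an upstream retriever and using a rough four-characters-per-token estimate (an approximation; a model's own tokenizer should be used in practice). The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(snippets: list[tuple[float, str]], token_ceiling: int) -> list[str]:
    """Greedily keep the highest-scoring snippets that fit under a hard token ceiling."""
    packed, used = [], 0
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > token_ceiling:
            continue  # skip anything that would breach the ceiling
        packed.append(text)
        used += cost
    return packed
```

Greedy packing by score is a deliberately simple choice: it guarantees the ceiling is never exceeded, at the cost of occasionally skipping a large but relevant snippet.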

The Executive Playbook: Moving from AI Theater to AI Governance

Boards and technology leaders must shift from consumption metrics to architectural governance.

Before approving your next AI budget, check these four boxes:

  1. Define Value Delivery Over Consumption Volume
    Stop tracking prompt counts. Measure real outcomes: support resolution time, deployment velocity, error reduction. Use frameworks like The Decision Lab’s AI Adoption Diagnostic.

  2. Mandate an Infrastructure Pre-Processing Layer
    Never connect raw enterprise apps or databases directly to frontier models. Implement context tiering: route simple tasks to smaller or open-source models and reserve expensive ones for complex reasoning.

  3. Transition Data Silos into Knowledge Graphs
    Replace bloated RAG with GraphRAG to cut token waste and improve accuracy on complex corporate data.

  4. Enforce Strict Deterministic Guardrails
    Add circuit breakers, loop limits, and human oversight to prevent runaway costs.
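
The context-tiering step in item 2 can start as a small router in front of the model calls. A minimal Python sketch, assuming a crude keyword-and-length complexity heuristic; the tier names and thresholds are hypothetical placeholders, and a production router would more likely use a trained classifier or historical quality data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str            # hypothetical model identifier, not a real endpoint
    max_complexity: int  # highest complexity score this tier should handle


# Ordered cheapest-first; names and thresholds are placeholders.
TIERS = [
    Tier("small-open-model", max_complexity=1),
    Tier("mid-tier-model", max_complexity=3),
    Tier("frontier-model", max_complexity=10),
]

REASONING_MARKERS = ("why", "compare", "trade-off", "plan", "analyse", "analyze")


def complexity(task: str) -> int:
    """Crude score: one point per reasoning marker, plus one per 500 characters."""
    text = task.lower()
    score = sum(marker in text for marker in REASONING_MARKERS)
    score += len(task) // 500  # long inputs push toward stronger models
    return score


def route(task: str) -> str:
    """Return the cheapest tier whose ceiling covers the task's complexity."""
    score = complexity(task)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier.name
    return TIERS[-1].name
```

Ordering tiers cheapest-first means every task defaults to the least expensive model that is judged adequate, and the frontier model is only reached when the score demands it.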