Peregian Digital Hub | AI Adopted

Designing and deploying agents

What AI agents are, how they differ from chat assistants, what makes them reliable, and where they create bottlenecks in real organisations.

AI agents are software systems that execute tasks within workflows, not assistants that answer questions. The technology is immature but potentially enterprise-reshaping, comparable to cloud computing circa 2007. For organisations deploying them, the constraint has shifted from what models can do to what organisations can build: data infrastructure, access controls, and the expertise to wire agents into existing systems.

Read-write systems, not assistants

Aaron Levie, CEO of Box, describes the shift plainly: agents are "read-write" operations rather than "read-only" assistance. A chat assistant helps a human retrieve or create information; an agent accepts a task, performs work inside the organisation's systems, checks its own results, and reports back. The difference is decisive. Levie places agents at the maturity point cloud computing occupied in 2007: "very early" with potential to reshape enterprise work over a decade.

For a small manufacturing business or a law firm, the implication is concrete: agents could provide specialist capacity (legal research, sales support, operations analysis) that only larger competitors could previously afford. Levie argues this will erase the talent advantage that size confers. At the same time, agents could absorb mechanical work such as data entry, information extraction, and routine documentation, freeing staff to focus on customer interactions and judgment-intensive tasks.

Infrastructure as the constraint

When Levie speaks to CIOs about agents in 2026, the conversation is not about model capability. Instead, he identifies a structural gap: coding agents work well because technical users can fix mid-run failures, outputs are verifiable, and access controls are clean. Knowledge work—the workflows across most SME functions—fails for opposite reasons. Context for those workflows is scattered across 20 systems, access permissions are tangled, and there is no single source of truth. Levie calls this the "Bob and Sally problem": Bob has too much access, Sally has too little, and the agent either bounces off an entitlement wall or answers using data it should not have seen.
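One way to picture the "Bob and Sally problem" is as a filter that runs before any context reaches the agent. The sketch below is a minimal illustration under assumed names: `Document`, `ENTITLEMENTS`, and `context_for` are hypothetical and do not come from Levie or from any Box API. The agent acts on behalf of a user and sees only the documents that user's groups permit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    allowed_groups: frozenset  # groups permitted to read this document
    text: str


# Hypothetical entitlements: which groups each user belongs to.
ENTITLEMENTS = {
    "bob": {"finance", "legal", "hr"},  # over-provisioned
    "sally": {"sales"},                 # under-provisioned
}


def context_for(user, documents):
    """Return only the documents the requesting user is entitled to read."""
    groups = ENTITLEMENTS.get(user, set())
    return [d for d in documents if d.allowed_groups & groups]


docs = [
    Document("contract-1", frozenset({"legal"}), "Supplier contract"),
    Document("pipeline-q3", frozenset({"sales"}), "Sales pipeline"),
    Document("salaries", frozenset({"hr"}), "Salary bands"),
]

bob_view = [d.doc_id for d in context_for("bob", docs)]
sally_view = [d.doc_id for d in context_for("sally", docs)]
```

The filter enforces whatever entitlements it is given. An agent working for Bob still sees salary data, and one working for Sally cannot see the contract she may need, so the entitlements themselves have to be cleaned up before the agent behaves sensibly.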

Data is the deeper bottleneck. The problem long predates agents: contracts stored in five different places, roadmaps across 30 locations, inconsistent definitions of core metrics. When only a data science team needed those answers, humans compensated through tribal knowledge. When an agent needs to answer the same questions reliably, weak definitions become company-wide problems.

The corollary for agents specifically: there is no shortcut from model capability to stable business processes. Levie notes that both Anthropic and OpenAI have launched enterprise agent initiatives, recognising that the hard work is upgrading IT systems, provisioning agents with accurate context, redesigning workflows around human-agent handoffs, driving adoption, and managing change. This is not one-time setup. Each model upgrade creates fresh work: you either rework the setup to capture the gains of the new model or stay tied to the scaffolding the prior model required.

Self-improving loops through observation

When agents execute work repeatedly, their logs become a learning signal. The Peregian Digital Hub video on building CLIs for agents documents this pattern: when an agent performs an action, the action can be assessed, learnings can be fed back into the loop, and the system can iteratively improve. This requires structured output from tools—which is why CLIs outperform GUI-based tools for agent automation. A command-line interface returns JSON or tabular data that agents parse directly, saving tokens and reducing hallucination; every execution is logged and can be retrospectively analysed.
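A minimal sketch of that pattern, assuming a hypothetical `contracts` command and made-up data, might look like the following. It uses Python's standard-library `argparse`, returns JSON instead of free text, and appends one log record per execution. The `run` function, the `CONTRACTS` list, and the log record shape are illustrative assumptions, not the tooling shown in the video.

```python
import argparse
import io
import json
import time

# Hypothetical in-memory data standing in for a real system of record.
CONTRACTS = [
    {"id": "C-001", "party": "Acme", "status": "active"},
    {"id": "C-002", "party": "Globex", "status": "expired"},
]


def run(argv, log):
    """Parse arguments, return a JSON string, and append one log record."""
    parser = argparse.ArgumentParser(
        prog="contracts", description="Query contracts as JSON."
    )
    parser.add_argument("--status", help="filter by status, e.g. active")
    args = parser.parse_args(argv)

    # No --status means no filter; otherwise keep matching rows only.
    rows = [c for c in CONTRACTS if args.status in (None, c["status"])]
    result = {"ok": True, "count": len(rows), "rows": rows}

    # One JSON line per execution, so a later review can replay what happened.
    record = {"ts": time.time(), "argv": list(argv), "result": result}
    log.write(json.dumps(record) + "\n")
    return json.dumps(result)


log = io.StringIO()  # stands in for an append-only log file
output = run(["--status", "active"], log)
```

Because both the output and the log are JSON, the agent can parse the result without guessing at layout, and a later review can read the log line by line.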

The retrospection process is where improvement lives. When a batch of completed tasks is reviewed, patterns surface (a missing database field, a repeated query failure, a systemic misunderstanding) and the agent itself can name the new CLI method the system should build. Tom Blomfield, a General Partner at Y Combinator, describes YC's implementation: a monitoring agent watches queries, diagnoses failures, proposes fixes to tools and context, commits code, and deploys overnight. The next morning, the same query succeeds. This is not a productivity gain; it is self-improvement.
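The detection half of such a review can be sketched in a few lines. The example below assumes a hypothetical log format (one record per tool call with `tool`, `ok`, and `error` fields) and simply counts repeated failures. It does not attempt the diagnose, fix, and deploy steps Blomfield describes.

```python
from collections import Counter

# Hypothetical log records, one per agent tool call.
records = [
    {"tool": "crm.lookup", "ok": False, "error": "missing field: renewal_date"},
    {"tool": "crm.lookup", "ok": True, "error": None},
    {"tool": "crm.lookup", "ok": False, "error": "missing field: renewal_date"},
    {"tool": "billing.query", "ok": False, "error": "timeout"},
]


def recurring_failures(records, min_count=2):
    """Count failures by (tool, error); keep those seen at least min_count times."""
    counts = Counter((r["tool"], r["error"]) for r in records if not r["ok"])
    return [
        (tool, error, n)
        for (tool, error), n in counts.most_common()
        if n >= min_count
    ]


patterns = recurring_failures(records)
```

Here the repeated `missing field: renewal_date` failure is the kind of pattern a reviewer, human or agent, could turn into a proposal for a new field or CLI method.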

Blomfield frames this as a fundamental shift in company structure. When an organisation makes its knowledge legible to AI, hierarchical coordination becomes redundant; loops that self-improve transcend the traditional coordination problem. The constraint shifts: token budget becomes more precious than headcount, and data and comprehension become the permanent assets while software becomes ephemeral and regenerable.

The deployment gap

The gap between agent capability and deployed workflow is where the real work lives. Levie identifies the internal FDE (forward-deployed engineer) as the highest-demand hire in tech: a technical person embedded in a business function who maps workflows, wires agents to tools and data, manages permissions, and carries the knowledge forward through model upgrades. This is not a one-time engagement; each model upgrade creates fresh work.

The practical consequence: people tracking AI development closely are roughly two years ahead of their organisations' adoption cycles, creating a window for practitioners. For engineers, IT staff, and operations people, there is an opportunity to help organisations implement AI agents into working enterprise workflows, and that opportunity will persist. The need is structural, not temporary.

What an agent in production requires

Background agents—the largest source of token consumption and business value—require infrastructure most organisations lack. Levie argues the biggest agent use of tokens will come from always-on agents or workflow-triggered agents, not chat-based interaction. His example: Claude Managed Agents running overnight to review contracts uploaded to Box, extracting critical information and writing tasks into Linear. This same pattern applies to client onboarding, invoice processing, M&A due diligence, data extraction pipelines, and millions of similar workflows.

To operate at scale, background agents need long-running execution, safe code execution, tool access, compute sandboxes, and cross-system connectivity. For an SME, this means the agent must safely call APIs into the systems where the business already lives—accounting, CRM, document stores, project tracking. This is where MCP (Model Context Protocol) servers earn their place: they let agents discover and call APIs into existing systems without writing point-to-point integrations. Levie demonstrates this pattern: Claude uses Box and Linear MCP servers to turn product roadmap documents into trackable work issues, connecting a knowledge store to an execution system without manual transcription.

The lesson is simpler than the infrastructure: a deployed agent bridges a gap in the organisation. It does not need to be perfect. It needs to connect knowledge that exists in one place to tools where work gets tracked and executed in another.
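The shape of that bridge can be sketched without any real integration. The example below deliberately does not use an MCP SDK or the Box and Linear APIs: `FakeDocumentStore`, `FakeIssueTracker`, `extract_items`, and `bridge` are hypothetical stand-ins, and the extraction step a model would perform is replaced by a trivial line parser.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocumentStore:
    """Stand-in for a knowledge store (hypothetical; not the Box API)."""
    documents: dict

    def read(self, name):
        return self.documents[name]


@dataclass
class FakeIssueTracker:
    """Stand-in for an execution system (hypothetical; not the Linear API)."""
    issues: list = field(default_factory=list)

    def create_issue(self, title, source):
        self.issues.append({"title": title, "source": source})


def extract_items(text):
    """Placeholder for the model's extraction step: '- ' lines become items."""
    return [line[2:].strip() for line in text.splitlines() if line.startswith("- ")]


def bridge(store, tracker, doc_name):
    """Read one document, create one issue per extracted item, return the count."""
    items = extract_items(store.read(doc_name))
    for item in items:
        tracker.create_issue(title=item, source=doc_name)
    return len(items)


store = FakeDocumentStore({"roadmap.md": "Q3 roadmap\n- Ship SSO\n- Migrate billing\n"})
tracker = FakeIssueTracker()
created = bridge(store, tracker, "roadmap.md")
```

Each issue carries the name of its source document, so a person reviewing the tracker can trace an item back to where it came from.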

What works, what does not

Coding agents: why they work

Coding agents have cleared the hurdles that stall agents in the rest of the business. Levie identifies the structural reasons: coding has highly technical users who fix mid-run failures, verifiable outputs (code runs or it does not), a single source of context (the codebase), clean access controls, and purely digital work. None of those apply to most SME workflows. A law firm does not have a codebase; it has case files scattered across three systems. A manufacturing business does not have "users" in the technical sense; it has people tracking inventory in sheets, email, and a decade-old system.

Knowledge work: the structural gap

Knowledge work has context strewn across 20 systems (many of them not digital), scattered access permissions, and no single source of truth. The barrier is not capability but infrastructure. This is not news; it is a restatement of the data problem every organisation has faced. What changes with agents is that the problem becomes acute: when a human handled it, silence and workarounds were acceptable. When an agent touches it, bad data becomes a visible failure.

The implication: agent deployment in knowledge work requires data work first. Not perfect data, but data the agent can reliably use. Context the agent can find. Permissions the agent can understand. This is not model capability; it is operations.

Token costs and operational budgets

Token costs are reshaping enterprise budgets. Levie observes that a single agent run can cost $1,000, far above the $20-per-user-per-month ceiling that worked for chatbots. Labs have pricing power as capacity runs tight and frontier token prices keep rising. The cost dynamics are breaking the subscription model.

The consequence is structural: AI spending must escape the capped 3 to 7 per cent IT budget and flow into line-of-business allocations such as marketing, sales, and operations. This creates friction between finance, IT, and business owners over compute spend. For a small business, the picture is simpler: there is no IT budget to escape; the agent either delivers value faster than it costs, or it does not. But the scrutiny will be sharper, and the cost transparency is immediate.

Blomfield turns this into strategy: burn tokens, not headcount. If staff time is the constraint you face, trading it for token spend is often the right move. If token spend is the constraint, then you are optimising for the wrong thing.
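A rough worked example makes the trade-off concrete. The $1,000 run cost and the $20 seat price come from the figures above. The $80 hourly staff cost is an assumption chosen purely for illustration, and both helper functions are hypothetical.

```python
def seat_months_equivalent(run_cost, seat_price):
    """How many per-user monthly subscriptions one agent run costs."""
    return run_cost / seat_price


def break_even_hours(run_cost, hourly_staff_cost):
    """Staff hours a run must save before it pays for itself."""
    return run_cost / hourly_staff_cost


RUN_COST = 1000.0          # from the text: a single agent run
SEAT_PRICE = 20.0          # from the text: per user per month
HOURLY_STAFF_COST = 80.0   # assumed figure, for illustration only

seat_months = seat_months_equivalent(RUN_COST, SEAT_PRICE)   # 50.0
hours = break_even_hours(RUN_COST, HOURLY_STAFF_COST)        # 12.5
```

Under these assumptions, one run costs as much as 50 seat-months of a chatbot subscription and pays for itself only if it saves more than 12.5 hours of staff time. A different hourly figure moves the break-even point proportionally.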

Building agents that scale

Tool design and CLI efficiency

The tools an agent calls matter enormously to its cost and reliability. The CLI pattern is proving more efficient than GUI-based or MCP-only approaches for agent automation. CLIs enable agents to self-improve because every execution is logged and the agent can inspect what it tried, what succeeded, and what failed. They parse structured output directly—JSON, tabular data—saving tokens and reducing hallucination.

The practical implication: build tools with structured output. With a modern CLI framework such as Click for Python, or a runtime such as Bun for JavaScript and TypeScript, this takes only a few lines of code. The CLI's --help text becomes its API spec; agents call it first, read it, and learn what is available without separate documentation. The 90 seconds case study demonstrates this in production: background agents monitoring production logs autonomously diagnose and fix recurring issues, discovering things like shared email-provider reputation blocks and rotating credentials without human intervention.
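To show how help text can double as a specification, here is a minimal sketch using the standard-library `argparse` in place of Click, so that it stays dependency-free. The `tickets` command and its subcommands are hypothetical. The point is only that every command and option carries a description the agent can read before calling anything.

```python
import argparse


def build_parser():
    """Build a small CLI whose help text describes every command and option."""
    parser = argparse.ArgumentParser(
        prog="tickets",
        description="Query support tickets. All output is JSON.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    search = sub.add_parser("search", help="search tickets by keyword")
    search.add_argument("query", help="keyword to search for")
    search.add_argument(
        "--limit", type=int, default=10, help="maximum results (default: 10)"
    )

    show = sub.add_parser("show", help="show one ticket by id")
    show.add_argument("ticket_id", help="ticket identifier")
    return parser


parser = build_parser()
help_text = parser.format_help()  # what an agent would read first
args = parser.parse_args(["search", "refund", "--limit", "5"])
```

Running `tickets --help` would print the same text held in `help_text`, so keeping the help strings accurate is effectively keeping the agent's documentation accurate.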

The loop structure

Blomfield describes the systematic approach YC takes. A self-improving loop has five layers: sensors (data in), policy (what the AI can do without asking), tools (deterministic APIs), quality gate (checks and human review for high-risk actions), and learning (feedback and iteration). When minimal human intervention is required, the system compounds.
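The five layers can be arranged as a small dispatch routine. The sketch below is one possible reading of that structure, not YC's implementation: the `Loop` class, its field names, and the example actions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    auto_approved: set   # policy: actions allowed without asking
    tools: dict          # tools: action name -> deterministic function
    review_queue: list = field(default_factory=list)  # quality gate: held for a human
    lessons: list = field(default_factory=list)       # learning: outcomes fed back

    def handle(self, event):
        action = event["action"]  # sensors: the event arrives from outside
        if action not in self.tools:
            # A missing tool is itself a lesson: something the system should build.
            self.lessons.append({"event": event, "outcome": "missing_tool"})
            return "missing_tool"
        if action not in self.auto_approved:
            self.review_queue.append(event)
            return "needs_review"
        result = self.tools[action](event["payload"])
        self.lessons.append({"event": event, "outcome": "done", "result": result})
        return "done"


loop = Loop(
    auto_approved={"tag_ticket"},
    tools={
        "tag_ticket": lambda p: f"tagged:{p}",
        "issue_refund": lambda p: f"refunded:{p}",
    },
)

outcomes = [
    loop.handle({"action": "tag_ticket", "payload": "T-1"}),
    loop.handle({"action": "issue_refund", "payload": "T-2"}),
    loop.handle({"action": "close_account", "payload": "T-3"}),
]
```

In this arrangement the low-risk action runs immediately, the refund waits for a person, and the unknown action is recorded as a gap. Widening `auto_approved` over time is how the amount of human intervention would shrink.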

The discipline this requires is legibility. Record everything the organisation does (office hours, Slack, decisions, telemetry), then diarise, aggregate, and synthesise it into context the models can use. Blomfield captures the maxim: "If it is recorded, it happened to the AI. If it did not get recorded, it did not happen." This is not poetic; it is an operational constraint. The agent can only learn from and reason about what is in its context.

Headless software and the persistent GUI

Levie's forecast on software architecture is that every enterprise software vendor will operate a hybrid model within three years: a seat-based business model for the human UI and a consumption model for the agent caller. By volume, headless queries will dwarf human-interface interactions. But the GUI remains useful for complex document work, data rooms, and cases where a person wants to be hands-on. The agent does not replace the interface; it multiplies the capacity of the person using it.

The SME's feasible first step

The sources do not prescribe where to start, but the pattern is clear. Find a workflow where knowledge already exists in one place, work happens in another, and the gap is manual transcription or routing. Wire an agent to bridge that gap. Use MCP servers for the connections if the tools already exist. Start with a small batch and retrospectively improve. This is not a deployment infrastructure problem; it is a wiring problem.

Sources

Aaron Levie on enterprise AI in 2026: token shock, agent diffusion, and the rise of the internal FDE — Aaron Levie, CEO of Box

Rebuilding companies around self-improving AI loops — Tom Blomfield, General Partner at Y Combinator

Building agent self-improvement loops with command-line interfaces — Peregian Digital Hub

AI agents as specialist capacity for smaller companies — Aaron Levie

AI agents could narrow the specialist-talent gap for SMEs — Aaron Levie

The agent implementation gap for engineers, IT, and operations people — Aaron Levie

Claude turning Box roadmap documents into Linear issues via MCP — Aaron Levie

Agent deployment needs systems, context, workflow, and change work — Aaron Levie

Background agents as the next workflow pattern — Aaron Levie

AI agents as software that can act inside workflows — Aaron Levie

Workflow implementation is the AI-agent labour opportunity — Aaron Levie

MCP as a bridge between documents and work systems — Aaron Levie

There is no shortcut from model capability to stable process — Aaron Levie

Agent use cases hiding in document-heavy workflows — Aaron Levie