If you read enterprise AI architecture diagrams from 2024, you saw a lot of arrows. Most of them were request/response. Frontend asks the agent, agent asks the tool, tool asks the database, response walks back up the chain. Repeat for every user action, every five seconds, forever.
That topology was always going to break under multi-agent load. It is now demonstrably breaking, and the answer the industry is converging on isn’t new — it’s just newly relevant.
The pattern is older than AI
Event-driven architecture has been the standard answer to high-throughput, low-latency, loosely-coupled systems since at least the early 2000s. Stock exchanges run on it. Massively-multiplayer games run on it. The entire telecom signalling stack runs on it. Modern logistics, modern banking back-ends, real-time fraud detection — all event-driven, none of them asking each other questions in synchronous round trips.
The reasons are well understood: events scale better than requests, decouple producers from consumers, give you replayability for free, and make backpressure a property of the bus rather than a property of every endpoint.
What changed is that AI workloads — agentic, multi-step, latency-sensitive, with humans waiting — have started to look exactly like the workloads that pushed trading and gaming to event-driven architectures decades ago.
Why the AI request/response default broke
Three things happened at once:
- Multi-agent topologies became normal. A single user action now triggers 5–15 LLM calls across multiple agents. Synchronous chains pile latency on latency.
- Long-running operations became normal. “Generate a 40-page report” isn’t a request, it’s a process. Holding a connection open for it is wrong.
- External triggers became normal. Workflows now react to incoming emails, webhook events, scheduled timers, file drops, IoT signals. None of those are user-initiated requests.
Once any of these is true for your workload, request/response stops being a sufficient primitive. Once all three are true — and they are, for almost every interesting enterprise AI workload — you’re building an event-driven system whether you admit it or not.
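As a toy illustration of how synchronous chains pile latency on latency: the sketch below simulates five 50 ms hops run as a sequential request/response chain versus as decoupled consumers reacting concurrently. The hop count, latency, and asyncio simulation are illustrative assumptions, not measurements from any real deployment.

```python
import asyncio
import time

HOPS = 5
HOP_LATENCY = 0.05  # pretend each agent/tool hop takes 50 ms

async def hop():
    await asyncio.sleep(HOP_LATENCY)

async def synchronous_chain():
    # frontend -> agent -> tool -> database: each hop waits on the next,
    # so total latency is the *sum* of the hops
    for _ in range(HOPS):
        await hop()

async def decoupled_consumers():
    # consumers react to events independently, so total latency is
    # roughly the *slowest* hop, not the sum
    await asyncio.gather(*(hop() for _ in range(HOPS)))

async def main():
    t0 = time.monotonic()
    await synchronous_chain()
    chained = time.monotonic() - t0

    t0 = time.monotonic()
    await decoupled_consumers()
    evented = time.monotonic() - t0
    return chained, evented

chained, evented = asyncio.run(main())
```

With five hops the chained version cannot finish in under 250 ms, while the decoupled version finishes in roughly the time of one hop.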
What an event-driven agent runtime actually looks like
Concretely, our platform looks like this:
- A central durable event bus (NATS JetStream in our default deployment, Kafka in customer environments that already run it). Every meaningful state transition goes through it.
- Agents are consumers, not callers. They subscribe to event types they care about, emit new events when they finish, and have no idea who consumes those.
- Tool calls are events too. An agent emits tool.invoke; the tool service consumes it and emits tool.result. Same bus, same replay semantics, same observability.
- Workflows are subscribers to event patterns. The workflow engine watches for event sequences that match its topology and advances state accordingly.
- The UI is an event consumer too. It opens a websocket subscribed to the user’s event stream and renders incrementally. No polling, no “loading...” spinners hiding silent backend work.
// Schematic — not real code, but shape is real
publish('agent.task.started', { agent: 'research', task_id })
→ research-agent consumes
→ publishes 'tool.invoke' { tool: 'web.search', ... }
→ tool-router consumes
→ publishes 'tool.result' { ... }
→ research-agent consumes (resumes)
→ publishes 'agent.task.completed' { result }
→ workflow-engine consumes (advances)
→ UI subscribes to user-stream, renders updates
Notice what isn’t in there: synchronous waits, polling loops, request/response chains, timeouts hiding stuck operations. Each step is a transition in a durable log. Anything that goes wrong is replayable from the bus.
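The schematic above can be made concrete with a minimal in-memory sketch of the bus semantics: an append-only log, subscribers keyed by event type, and delivery only after the event is persisted. This is a toy stand-in for NATS JetStream or Kafka, not a real client; the names (EventBus, bus.log, the handlers) are invented for illustration, and bus.log is what replay would re-deliver.

```python
from collections import defaultdict

class EventBus:
    """Toy in-memory stand-in for a durable bus (NATS JetStream, Kafka)."""

    def __init__(self):
        self.log = []                        # append-only durable log
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        event = {"type": event_type, "payload": payload}
        self.log.append(event)               # persist first...
        for handler in self.subscribers[event_type]:
            handler(event)                   # ...then deliver

bus = EventBus()
trace = []  # records the order in which events were consumed

def research_agent(event):
    # consumes the task, emits a tool invocation; knows nothing downstream
    trace.append(event["type"])
    bus.publish("tool.invoke", {"tool": "web.search"})

def tool_router(event):
    trace.append(event["type"])
    bus.publish("tool.result", {"hits": 3})

def on_result(event):
    trace.append(event["type"])

bus.subscribe("agent.task.started", research_agent)
bus.subscribe("tool.invoke", tool_router)
bus.subscribe("tool.result", on_result)

bus.publish("agent.task.started", {"agent": "research", "task_id": "t1"})
```

One publish cascades through the whole chain, and every intermediate transition lands in the log before any consumer sees it.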
The five wins you only get with events
- Backpressure for free. If a tool service is slow, events queue. The system degrades gracefully. With request/response, slow services cause timeout cascades.
- Replayability. Bug in production? Replay the event log against the fixed code. We do this almost weekly.
- Observability that’s actually useful. Every event has a correlation ID. Tracing a user action through 12 agents is grepping a log, not stitching together 12 distributed traces.
- Decoupled deployments. Adding a new agent that reacts to an existing event type requires zero changes to existing services. They emit; the new one subscribes.
- External triggers are first-class. An incoming email hitting an SMTP listener is just a publisher of email.received. The rest of the workflow doesn’t care that it wasn’t a user click.
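To make the correlation-ID point concrete, here is a hedged sketch under the assumption that every published event carries the correlation ID of the originating user action. Tracing that action across every agent and tool then reduces to a single filter over the log; all function and field names are illustrative.

```python
import uuid

event_log = []  # stand-in for the bus's durable event log

def publish(event_type, correlation_id, **payload):
    # every event carries the correlation ID of the user action that caused it
    event_log.append({"type": event_type,
                      "correlation_id": correlation_id, **payload})

def handle_user_action():
    cid = str(uuid.uuid4())
    publish("agent.task.started", cid, agent="research")
    for i in range(3):  # pretend three tools are invoked along the way
        publish("tool.invoke", cid, tool=f"tool-{i}")
        publish("tool.result", cid, tool=f"tool-{i}")
    publish("agent.task.completed", cid)
    return cid

cid = handle_user_action()
handle_user_action()  # unrelated noise from a second user action

# one filter recovers the full causal chain; no trace-stitching required
trace = [e["type"] for e in event_log if e["correlation_id"] == cid]
```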
Why most agent frameworks haven’t converged here yet
The popular agent frameworks were optimised for the demo. The demo is one user, one prompt, one chain, one output. That is a request/response shape, and request/response APIs are easier to learn and easier to demo. That doesn’t make them right for production.
The frameworks that have converged on event-driven primitives — LangGraph being the most prominent — tend to look more complex on first encounter and more obvious on third. Once you’re running real workloads, the complexity moves to where you actually need it (the bus) and out of where you don’t (every individual call site).
The lesson the rest of distributed systems learned 20 years ago is now hitting AI workloads. The good news: the playbook is well-documented, the tools are mature, the failure modes are known.
Where to start if you’re still on synchronous chains
- Pick a bus. NATS for greenfield. Kafka if your enterprise already runs it.
- Move long-running operations off the request path first. “Generate report” is the obvious first candidate. The user gets immediate acknowledgement and a stream of progress events.
- Make tool calls go through the bus. Even if the tool service still implements the work synchronously internally, putting the bus in front gives you queueing and observability immediately.
- Treat the UI as just another subscriber. WebSocket or SSE, subscribed to the user’s relevant event types. Stop polling.
- Resist the urge to rewrite everything at once. Strangler-fig migration: the bus and the synchronous code can coexist for years if needed.
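The third step above, putting the bus in front of a still-synchronous tool, can be sketched with a plain queue: callers enqueue tool.invoke events instead of calling the tool directly, and the tool service drains the queue at its own pace. The queue here is a toy stand-in for a bus subject, and every name is an illustrative assumption.

```python
from collections import deque

def sync_web_search(query):
    # the existing synchronous tool implementation, unchanged
    return f"results for {query!r}"

invoke_queue = deque()  # stand-in for a bus subject like tool.invoke
result_log = []         # stand-in for published tool.result events

def publish_invoke(tool, query):
    # callers enqueue instead of calling the tool directly: a slow tool
    # now means a deeper queue, not a timeout cascade upstream
    invoke_queue.append({"tool": tool, "query": query})

def tool_worker():
    # the tool service drains the queue at its own pace
    while invoke_queue:
        event = invoke_queue.popleft()
        result = sync_web_search(event["query"])
        result_log.append({"type": "tool.result", "result": result})

publish_invoke("web.search", "event-driven agents")
publish_invoke("web.search", "NATS JetStream")
tool_worker()
```

Nothing about the tool's internals changes; the queue in front is what buys backpressure and a visible backlog on day one.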
None of this is new. None of this is exotic. It’s the same architecture the trading floor and the MMO server have used for decades. AI workloads finally need it. The companies that adopt it now are setting themselves up for an order-of-magnitude scale advantage over the ones still wiring agents together with synchronous calls.