Mycelium and AI Agents: A Surprisingly Useful Metaphor for MAS Architecture

Forests share resources through underground fungal networks. Multi-agent systems share state through Kafka. The analogy is closer than it sounds — and it changes how we think about resource brokerage, backpressure, and resilience.

Biologists have a name for the underground network that connects trees in a forest: the “wood-wide web.” Networks of fungal hyphae — the mycelium — thread between root systems, exchange sugars and minerals, transmit chemical alerts, and broker resources between species that, on the surface, look like solitary individuals.

Trees that look independent are not. They’re nodes in a substrate that does the actual coordination work. The substrate is invisible, mostly; you’d miss it entirely if you only looked at the canopy.

Multi-agent systems, the good ones, look exactly like this. The agents are the trees. The bus is the mycelium. The work the bus does is what makes the architecture more than the sum of the agents.

Why metaphors matter for system design

You design what you can imagine. The dominant mental model for “multi-agent system” is still a directed acyclic graph: agent A calls agent B, B calls C, C returns to B, B returns to A. Hierarchical, request/response, finite. That’s a useful starting point and a terrible final architecture.

The mycelium metaphor pushes you toward something different: agents as peers connected to a shared substrate, the substrate doing brokerage and routing, no single agent “in charge” of the others, and resilience emerging from the network, not from any node. That’s the architecture we’ve converged on after eighteen months of running real customer deployments.

Five places the metaphor actually pays off

1. Resource brokerage

In a forest, a tree with surplus carbon (a strong producer) can “send” sugars to a tree under stress (a sapling, a wounded tree) through the fungal network. The network does the brokerage; no tree decides “I will help that one specifically.”

In our agent runtime, the same pattern shows up around GPU time, rate-limited APIs, and downstream-service capacity. An idle agent emits availability events; a task router consumes them and assigns work where the capacity is. No agent has a model of all other agents. The bus does the brokerage. We let the substrate do the bookkeeping.
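The brokerage pattern above can be sketched in a few lines. This is an illustration under stated assumptions, not the production runtime: the `Broker` class, its method names, and the agent and task names are all hypothetical, and an in-memory pair of queues stands in for the bus.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Broker:
    """In-memory stand-in for the bus: pairs availability events with queued tasks."""
    idle_agents: deque = field(default_factory=deque)    # agent ids that announced spare capacity
    pending_tasks: deque = field(default_factory=deque)  # tasks waiting for capacity
    assignments: list = field(default_factory=list)      # (task, agent_id) pairs, in match order

    def on_availability(self, agent_id: str) -> None:
        """An agent announces it is idle; it knows nothing about other agents."""
        self.idle_agents.append(agent_id)
        self._match()

    def on_task(self, task: str) -> None:
        """A task arrives; it is not addressed to any specific agent."""
        self.pending_tasks.append(task)
        self._match()

    def _match(self) -> None:
        # Pair the oldest waiting task with the agent that has been idle longest.
        while self.idle_agents and self.pending_tasks:
            self.assignments.append(
                (self.pending_tasks.popleft(), self.idle_agents.popleft())
            )


broker = Broker()
broker.on_task("summarise-report")
broker.on_availability("gpu-agent-1")
broker.on_availability("gpu-agent-2")
broker.on_task("embed-documents")
print(broker.assignments)
# [('summarise-report', 'gpu-agent-1'), ('embed-documents', 'gpu-agent-2')]
```

Neither the task publisher nor the agents reference each other; the only shared knowledge is the two event types.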

2. Backpressure as a property of the network

Forests under drought signal stress through the network: chemical alerts flow, downstream trees alter their behaviour, the system adapts without any tree having a global view. That’s real, measurable, and well-documented.

An event bus does the same. When a downstream consumer is slow, events queue. The queue depth itself is the signal. Upstream producers can watch that depth and slow down. No coordinator is involved, and nobody has to wait for a timeout to learn that something is wrong. Backpressure emerges as a property of the substrate, which makes cascading failures much less likely.
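To make "queue depth is the signal" concrete, here is a minimal sketch of a producer that throttles itself from depth alone. The function name `send_delay`, the soft limit, and the linear ramp are assumptions chosen for illustration; in a Kafka deployment the analogous number would typically be consumer-group lag rather than the length of an in-memory queue.

```python
from collections import deque


def send_delay(queue_depth: int, soft_limit: int, max_delay_s: float = 1.0) -> float:
    """Return how long a producer should pause before its next publish.

    At or below the soft limit the producer runs flat out; above it, the pause
    grows linearly with the excess depth and is capped at max_delay_s.
    """
    if queue_depth <= soft_limit:
        return 0.0
    excess = queue_depth - soft_limit
    return min(max_delay_s, max_delay_s * excess / soft_limit)


# Simulated ticks: the producer wants to publish 3 events per tick,
# the slow consumer drains only 1.
queue = deque()
delays = []
for tick in range(6):
    delay = send_delay(len(queue), soft_limit=4)
    delays.append(delay)
    if delay == 0.0:
        queue.extend(f"event-{tick}-{i}" for i in range(3))  # publish at full rate
    if queue:
        queue.popleft()  # consumer handles one event per tick

print(delays)
# [0.0, 0.0, 0.0, 0.5, 0.25, 0.0]
```

The producer backs off on ticks 3 and 4 and resumes once the depth falls back to the soft limit, without any component telling it to.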

Concretely: this is why we don’t try to put a coordinator on top of the agents. A coordinator becomes a single point of failure, a bottleneck, and a place where complexity concentrates. The substrate does the coordination job better, if you let it. The hardest part of designing MAS architecture is restraining yourself from adding the central manager that the metaphor tells you you don’t need.

3. Resilience by redundancy, not by orchestration

Old-growth forests survive storms because they’re networks. Lose a tree, the network reroutes resources around the gap. The resilience isn’t in any individual tree being especially robust; it’s in the substrate’s ability to keep delivering.

Translate to MAS: lose an agent (it crashed, it’s being redeployed, its GPU is reallocated), the bus keeps the events flowing, another instance picks up the subscription, the system continues. The agent is replaceable; the substrate is not. The substrate is what you invest in.
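The failover described above depends on progress being recorded in the substrate rather than in the agent. A rough sketch, assuming an in-memory `EventLog` with one committed offset per consumer group (the class, the group name, and the instance names are hypothetical stand-ins for what a real bus provides):

```python
class EventLog:
    """In-memory stand-in for a durable topic with one committed offset per group."""

    def __init__(self, events):
        self.events = list(events)
        self.committed = {}  # group name -> offset of the next event to process

    def poll(self, group: str):
        """Return (offset, event) for the group's next event, or None when caught up."""
        offset = self.committed.get(group, 0)
        if offset >= len(self.events):
            return None
        return offset, self.events[offset]

    def commit(self, group: str, offset: int) -> None:
        """Record that everything up to and including offset has been handled."""
        self.committed[group] = offset + 1


def run_instance(log, group, name, handled, crash_after=None):
    """Process events until caught up, or stop abruptly after crash_after events."""
    count = 0
    while (item := log.poll(group)) is not None:
        if crash_after is not None and count == crash_after:
            return  # simulated crash: the in-flight event is never committed
        offset, event = item
        handled.append((name, event))
        log.commit(group, offset)
        count += 1


log = EventLog(["e0", "e1", "e2", "e3"])
handled = []
run_instance(log, "billing-agents", "instance-a", handled, crash_after=2)
run_instance(log, "billing-agents", "instance-b", handled)
print(handled)
# [('instance-a', 'e0'), ('instance-a', 'e1'), ('instance-b', 'e2'), ('instance-b', 'e3')]
```

The second instance needs no handover from the first; the committed offset in the log is the only state that matters.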

4. Heterogeneous specialists, shared substrate

A forest mycelial network connects oaks, pines, ferns, and fungi of other species. They all speak the same chemical/sugar protocols on the substrate, even though above ground they look completely different. The network doesn’t care what species you are. It cares whether you can produce or consume sugars.

In our runtime, the bus doesn’t care whether an agent is a Python LangGraph node, a Rust microservice, an SQL stored procedure wrapped in a tool, a human in Slack approving a workflow, or an external SaaS triggered via webhook. They all publish and consume events with the same envelope. The substrate is genuinely species-agnostic.
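A shared envelope might look something like the sketch below. The field names (`event_type`, `source`, `payload`, `event_id`, `occurred_at`) and the JSON encoding are assumptions made for illustration; the post does not specify the actual envelope schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Envelope:
    """Hypothetical common envelope; the field names are illustrative."""
    event_type: str  # e.g. "approval.granted"
    source: str      # who published it: a service, a human, a webhook
    payload: dict    # event-specific body, opaque to the bus
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_bytes(self) -> bytes:
        """Serialise to UTF-8 JSON so any language can read it."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Envelope":
        """Rebuild an envelope from its serialised form."""
        return cls(**json.loads(raw.decode("utf-8")))


from_python_agent = Envelope("invoice.extracted", "python-agent", {"invoice_id": "inv-1"})
from_human = Envelope("approval.granted", "slack:reviewer", {"invoice_id": "inv-1"})

for message in (from_python_agent, from_human):
    decoded = Envelope.from_bytes(message.to_bytes())
    print(decoded.event_type, "from", decoded.source)
```

A consumer only needs to understand the envelope to route, log, or replay an event; the payload stays the publisher's business.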

5. Memory and history live in the substrate

Mycelium has been shown to retain “memory” of past stresses — the network adapts based on history, not just current state. The history is in the substrate, not in any single tree.

For us this maps onto the durable event log. Past agent decisions, past tool invocations, past user interactions — they all live in the bus, replayable, queryable, auditable. Any new agent that joins the system can reconstruct the history it needs from the substrate. No agent has to remember anything important on its own.
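Reconstructing state from the log is essentially a fold over events. A minimal sketch, assuming a hypothetical ticket-handling agent and made-up event types; a plain list stands in for the durable log:

```python
def replay(log, apply, state=None):
    """Fold every event in the log into a state, oldest first."""
    state = {} if state is None else state
    for event in log:
        apply(state, event)
    return state


def apply_ticket_event(state, event):
    """Projection for a hypothetical ticket agent: latest status per ticket."""
    if event["type"] == "ticket.opened":
        state[event["ticket"]] = "open"
    elif event["type"] == "ticket.assigned":
        state[event["ticket"]] = f"assigned:{event['agent']}"
    elif event["type"] == "ticket.closed":
        state[event["ticket"]] = "closed"
    # Unknown event types are ignored so other agents can share the same log.


event_log = [
    {"type": "ticket.opened", "ticket": "T-1"},
    {"type": "ticket.opened", "ticket": "T-2"},
    {"type": "ticket.assigned", "ticket": "T-1", "agent": "triage"},
    {"type": "tool.invoked", "tool": "search"},
    {"type": "ticket.closed", "ticket": "T-2"},
]

state = replay(event_log, apply_ticket_event)
print(state)
# {'T-1': 'assigned:triage', 'T-2': 'closed'}
```

An agent that joins later runs the same replay and arrives at the same state, which is why the agent itself does not need to persist anything.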

Why this metaphor leads to Kafka (or NATS)

Once you accept that the substrate is doing the coordination work, you start asking what properties the substrate needs. The shopping list is unsurprising:

  • Durable — the substrate has to outlive any individual node
  • Multi-subscriber — many agents care about the same event types
  • Replayable — new agents joining must be able to catch up
  • Ordered (per partition) — sequence matters for correctness
  • Backpressure-aware — queue depth is a first-class signal
  • Cheap to subscribe — adding an agent shouldn’t require coordination with existing ones

That list is, basically, Kafka’s feature set. Or NATS JetStream. Or Redpanda. Pick your flavour. The key insight is that the substrate is now the most important architectural decision in the system: more important than which LLM you use, which framework you use, even which orchestration model you adopt.

The mycelium is what makes the forest a system. The bus is what makes your agents a system.
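Of the properties listed above, "ordered (per partition)" is the one that most often surprises people, so here is a small sketch of what it means. The CRC32 hash, the three partitions, and the conversation keys are assumptions for illustration only; real clients use their own partitioning functions.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32, purely for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> events in arrival order
events = [
    ("conversation-7", "user.message"),
    ("conversation-9", "user.message"),
    ("conversation-7", "agent.reply"),
    ("conversation-9", "tool.invoked"),
    ("conversation-7", "user.message"),
]
for key, event_type in events:
    partitions[partition_for(key, num_partitions=3)].append((key, event_type))

# Every event with the same key lands in the same partition,
# so each conversation keeps its own events in their original order.
conversation_7 = [
    event_type
    for partition in partitions.values()
    for key, event_type in partition
    if key == "conversation-7"
]
print(conversation_7)
# ['user.message', 'agent.reply', 'user.message']
```

Ordering across different keys is not guaranteed, which is usually acceptable: what matters is that events about the same entity are seen in sequence.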

Where the metaphor breaks (because every metaphor does)

Forests are slow. Sugars and minerals propagate over hours and days. Agent systems work on milliseconds. The brokerage logic in your bus has to be optimised for that reality — partition keys, consumer groups, exactly-once semantics, dead-letter queues for poison messages. None of that is in the biology.
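As one example of that engineering layer, a dead-letter queue for poison messages can be sketched as follows. The `consume` function, the retry count, and the `parse_amount` handler are hypothetical; a real consumer would publish failures to a separate topic rather than append to a list.

```python
def consume(events, handler, max_attempts=3):
    """Run handler over events; park events that keep failing in a dead-letter list."""
    processed, dead_letters = [], []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(event))
                break  # success: move on to the next event
            except Exception as exc:  # broad on purpose: any failure counts as an attempt
                if attempt == max_attempts:
                    dead_letters.append(
                        {"event": event, "error": repr(exc), "attempts": attempt}
                    )
    return processed, dead_letters


def parse_amount(event):
    """Hypothetical handler: fails on payloads that are not numeric."""
    return float(event["amount"])


events = [{"amount": "10.5"}, {"amount": "not-a-number"}, {"amount": "3"}]
processed, dead_letters = consume(events, parse_amount)
print(processed)
# [10.5, 3.0]
print(len(dead_letters), dead_letters[0]["event"])
# 1 {'amount': 'not-a-number'}
```

The poison message is set aside with its error attached, and the events behind it are not blocked.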

Forests are also cooperative-by-default. Multi-agent systems have to be designed for adversarial cases (compromised agent, runaway loop, cost-blowing tool call). Your substrate needs guardrails that biology doesn’t.
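One possible guardrail against runaway loops and cost blowouts is a budget that travels with each chain of events. The sketch below is an assumption about how that could look, not a description of any particular runtime; `Budget`, `GuardrailViolation`, and the limits are invented for the example.

```python
from dataclasses import dataclass


class GuardrailViolation(Exception):
    """Raised when an event chain exceeds its hop or cost budget."""


@dataclass(frozen=True)
class Budget:
    """Hypothetical per-chain limits carried alongside each event."""
    hops_left: int        # how many more agent-to-agent handoffs are allowed
    cost_left_usd: float  # remaining spend for tool and model calls

    def spend(self, cost_usd: float) -> "Budget":
        """Return the budget for the next hop, or raise if a limit would be crossed."""
        if self.hops_left <= 0:
            raise GuardrailViolation("hop limit reached: possible runaway loop")
        if cost_usd > self.cost_left_usd:
            raise GuardrailViolation("cost limit reached")
        return Budget(self.hops_left - 1, self.cost_left_usd - cost_usd)


budget = Budget(hops_left=3, cost_left_usd=1.00)
hops = 0
message = ""
try:
    while True:  # two agents that keep handing work back to each other
        budget = budget.spend(cost_usd=0.10)
        hops += 1
except GuardrailViolation as violation:
    message = str(violation)

print(f"stopped after {hops} hops: {message}")
# stopped after 3 hops: hop limit reached: possible runaway loop
```

Because the budget is part of the event rather than held by a supervisor, the check works without reintroducing a central coordinator.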

So: take the metaphor for what it gives you (the architecture intuition) and rely on engineering rigour for the rest.

You don’t design a forest by deciding which tree is in charge. You design it by tending the soil. Multi-agent systems are the same.

Practical takeaways

  • Spend more time on your bus than on your agents. The bus is the durable architecture; agents come and go.
  • Treat agents as peers, not as a hierarchy. Every coordinator you add is a future bottleneck.
  • Let backpressure be emergent. Slow consumers should slow producers via queue depth, not via timeouts.
  • Make the event log durable from day one. The replay capability is what saves you in week 14.
  • Optimise for “new agents can join cheaply.” If adding an agent requires coordination with existing ones, your topology will calcify.

The forest didn’t need an architect. It needed soil good enough to support a network that did the rest. We’re building agent systems the same way.
