These are my notes from https://cognition.ai/blog/dont-build-multi-agents
Context is EVERYTHING. The entire context is full of implicit decisions the LLM is making, and if you don’t share the full context across the task, the chance of different submodules/agents diverging increases significantly
Think of it this way: if you could get away with doing some work yourself, you would, because communication has overhead; a larger team in the human world only makes sense because humans get tired (LLMs don’t)
So, do not import concepts from the human world into the LLM world
Also, an LLM’s latent state communicates way more than just the sampled tokens (so when you share only sampled tokens with another module/agent, you risk losing a significant amount of information)
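Rough sketch of the difference (my illustration, not from the post) — `llm` here is just an assumed prompt-to-completion callable:

```python
from typing import Callable

def delegate_lossy(last_output: str, subtask: str, llm: Callable[[str], str]) -> str:
    # Risky: the sub-call sees only sampled tokens, not the trail of
    # implicit decisions that produced them.
    return llm(f"{subtask}\n\nContext: {last_output}")

def delegate_full_context(trace: list[str], subtask: str, llm: Callable[[str], str]) -> str:
    # Safer: the entire trace so far travels with the subtask.
    return llm(f"{subtask}\n\nFull trace so far:\n" + "\n".join(trace))
```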
So, if you can get away with building a single, linear system using LLMs, always prefer that over a mix of agents
For complex tasks, you will run into the context window not being long enough, but solve that as a separate issue (e.g. use a separate compression module for passing context)
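A minimal sketch of what that could look like (assumptions: a generic `llm` callable, word count as a token proxy, a "DONE" stop convention — none of these are from the post), keeping one linear agent with a separate compression step:

```python
from typing import Callable

MAX_CONTEXT_TOKENS = 8000  # assumed budget for illustration
KEEP_RECENT = 10           # most recent turns kept verbatim

def approx_tokens(text: str) -> int:
    # Crude word-count proxy for token count (illustration only)
    return len(text.split())

def compress(older_turns: list[str], llm: Callable[[str], str]) -> str:
    # Separate compression call: squeeze older turns into a dense summary
    # that preserves decisions, constraints, and open questions
    prompt = ("Summarize this agent trace, keeping every decision, "
              "constraint, and open question:\n\n" + "\n".join(older_turns))
    return llm(prompt)

def run_linear_agent(task: str, llm: Callable[[str], str], max_steps: int = 50) -> str:
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        if approx_tokens("\n".join(history)) > MAX_CONTEXT_TOKENS:
            # One continuous context: compress the old, keep the recent
            summary = compress(history[:-KEEP_RECENT], llm)
            history = [f"SUMMARY OF EARLIER WORK: {summary}"] + history[-KEEP_RECENT:]
        step = llm("\n".join(history))  # next thought/action/answer
        history.append(step)
        if step.startswith("DONE"):     # assumed stop convention
            return step
    return history[-1]
```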
Multiple agents make sense a) when you need speed, AND b) when tasks are de-coupled and divergence doesn’t matter
E.g. explore things in parallel (but even there, a single agent exploring sequentially is likely to be more diverse than parallel exploration, unless parallel explorations are prompted very differently)
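Sketch of that narrow case — decoupled explorations, prompted very differently per worker to keep them diverse; again `llm` is an assumed callable and the angle prompts are my own:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

ANGLES = [  # deliberately different framings, one per worker
    "List edge cases and failure modes for:",
    "Propose three unconventional approaches to:",
    "Find prior art and existing tools relevant to:",
]

def explore_in_parallel(task: str, llm: Callable[[str], str]) -> list[str]:
    prompts = [f"{angle} {task}" for angle in ANGLES]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(llm, prompts))
```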
Most current systems are single agents sharing the entire context
Gemini CLI, Claude Code
Even the editing models in IDEs like Cursor work the same way as the chat models, since a shared, continuous context helps with precise editing