The Agent Ops Stack Is Crystallizing
Six signals landed in 24 hours — skills architecture, swarm scaling, MCP token waste, memory infrastructure, attention serving, and collaboration frameworks.
The agent operations stack is crystallizing. Last week, six independent signals landed in a single 24-hour window: skills architecture, swarm scaling, MCP token waste, memory infrastructure, attention serving, and collaboration frameworks. Separately, they're interesting. Together, they form a stack that nobody has named yet.
Layer 1: Skills — What Agents Do
Anthropic published their internal Claude Code skills taxonomy: nine categories, from API reference to CI/CD. Core insight: skills are folders, not markdown files — scripts, assets, and configs live alongside instructions. A skill that spans multiple categories confuses the agent. Pick one. Progressive disclosure keeps context lean.
But even skills hit budget caps at scale. When you have dozens of skills competing for context window space, you need a strategy for what loads when. The naive approach — load everything — burns tokens before the agent does any work. The production approach is progressive disclosure: load the minimum, expand on demand.
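A minimal sketch of "load the minimum, expand on demand", assuming a token budget and a crude 4-characters-per-token estimate. The `Skill` class, `estimate_tokens`, and `build_context` are hypothetical names for illustration, not part of Claude Code or any published API.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    summary: str  # one-line description that is always in context
    body: str     # full instructions, loaded only when requested


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def build_context(skills, requested, budget):
    """Include every summary, then add requested bodies while the budget allows."""
    parts = [f"{s.name}: {s.summary}" for s in skills]
    used = sum(estimate_tokens(p) for p in parts)
    by_name = {s.name: s for s in skills}
    loaded = []
    for name in requested:
        skill = by_name.get(name)
        if skill is None:
            continue  # unknown skill name, nothing to load
        cost = estimate_tokens(skill.body)
        if used + cost > budget:
            continue  # skip a body that would overflow the budget
        parts.append(skill.body)
        used += cost
        loaded.append(name)
    return "\n".join(parts), loaded, used


skills = [
    Skill("deploy", "Ship to prod", "x" * 400),
    Skill("review", "Review PRs", "y" * 4000),
]
context, loaded, used = build_context(skills, ["deploy", "review"], budget=200)
print(loaded, used)
```

Here the agent always sees both one-line summaries, but only the `deploy` body fits the 200-token budget. The oversized `review` body stays on disk until a larger budget or a narrower request makes room for it.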
Layer 2: Swarms — How Many, How They Coordinate
Sermakarevich ran Claude Code swarms at 10–15 concurrent agents and published the postmortem. CLAUDE.md loads unconditionally and wastes tokens — "a terrible abstraction." Skills with progressive disclosure are better but still capped.
The real move: hierarchical knowledge base, task routing by model cost, centralized orchestration via SQLite. Production swarm patterns are emerging from the trenches, not from papers. The gap between "we ran 15 agents" and "we ran 15 agents reliably" is entirely in the coordination layer.
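One way to picture centralized orchestration via SQLite with cost-based routing is a single task table that workers claim rows from. This is a sketch under stated assumptions, not the setup from the postmortem: the schema, the `MODEL_TIERS` names, and the integer difficulty score are all hypothetical.

```python
import sqlite3

# Hypothetical model tiers, cheapest first. Names are placeholders.
MODEL_TIERS = ["small", "medium", "large"]


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               prompt TEXT NOT NULL,
               difficulty INTEGER NOT NULL,
               model TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               worker TEXT
           )"""
    )
    conn.commit()


def route_model(difficulty):
    # Map a difficulty score onto a tier, clamped to the available range.
    index = min(max(difficulty, 0), len(MODEL_TIERS) - 1)
    return MODEL_TIERS[index]


def add_task(conn, prompt, difficulty):
    cur = conn.execute(
        "INSERT INTO tasks (prompt, difficulty, model) VALUES (?, ?, ?)",
        (prompt, difficulty, route_model(difficulty)),
    )
    conn.commit()
    return cur.lastrowid


def claim_task(conn, worker):
    """Claim the oldest pending task, or return None if there is none."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, prompt, model FROM tasks "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means a row already taken by another worker
        # is not claimed a second time.
        updated = conn.execute(
            "UPDATE tasks SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row[0]),
        ).rowcount
        return row if updated == 1 else None


conn = sqlite3.connect(":memory:")
init_db(conn)
add_task(conn, "fix typo in README", 0)
add_task(conn, "refactor auth module", 2)
first = claim_task(conn, "worker-1")
second = claim_task(conn, "worker-2")
print(first, second)
```

The routing decision is made once, at enqueue time, so cheap tasks never reach an expensive model by accident. With a file-backed database, every worker process can open the same file and call `claim_task`.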
Layer 3: MCP Design — The Protocol Layer
Same backend API, same model, same 40 test prompts. One MCP server burned 637K input tokens. The other: 3.1 million. Same success rate. The difference? Incomplete tool returns force extra round-trips. Raw API dumps pollute context. Too many tools increase choice complexity.
Bad protocol design has a concrete price: a roughly 5× token multiplier (3.1M / 637K ≈ 4.9×). Every tool description and return shape you ship is infrastructure cost. This is a line item on your API bill, and it tracks directly with how well your MCP tools are designed.
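The arithmetic behind that multiplier, worked through as a short script. The two token counts come from the comparison above; the price per million input tokens is a hypothetical placeholder, so substitute your provider's actual rate.

```python
EFFICIENT_TOKENS = 637_000     # input tokens, well-designed MCP server
WASTEFUL_TOKENS = 3_100_000    # input tokens, poorly designed MCP server
PRICE_PER_MILLION_USD = 3.00   # hypothetical input price, not from the source


def input_cost(tokens, price_per_million=PRICE_PER_MILLION_USD):
    """Dollar cost of a given number of input tokens."""
    return tokens / 1_000_000 * price_per_million


multiplier = WASTEFUL_TOKENS / EFFICIENT_TOKENS
extra_cost = input_cost(WASTEFUL_TOKENS) - input_cost(EFFICIENT_TOKENS)

print(f"multiplier: {multiplier:.2f}x")                        # multiplier: 4.87x
print(f"extra cost per 40-prompt run: ${extra_cost:.2f}")      # $7.39 at the assumed price
```

At the assumed price the gap is small for one 40-prompt run, but it scales linearly with traffic.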
Layer 4: Memory — The Persistence Layer
The first systems characterization of agent memory dropped on arXiv. Four-axis taxonomy spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Ten systems benchmarked across write-path and read-path costs.
The paper closes with ten recommendations, covering topics such as construction scheduling, capability floors, amortization by query volume, freshness-latency tradeoffs, and fleet-scale management. Memory isn't just RAG anymore. It's a tier with its own cost curves, design tradeoffs, and failure modes. If you're treating memory as a simple vector store, you're ignoring half the problem.
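Amortization by query volume can be sketched with a simple cost model: a one-time write-path cost spread over the queries served, plus a per-query read-path cost. This is an illustrative model, not the paper's formulation, and every number below is hypothetical.

```python
def cost_per_query(write_cost, read_cost, queries):
    """Amortized cost: one-time write cost spread over queries, plus read cost."""
    if queries <= 0:
        raise ValueError("queries must be positive")
    return write_cost / queries + read_cost


def break_even_queries(cheap_write, costly_write):
    """Query count at which the costly-write design starts to win.

    Each argument is a (write_cost, read_cost) pair, and costly_write is
    assumed to have the higher write cost. Returns None if its reads are
    not cheaper, since it then never catches up.
    """
    (w_a, r_a), (w_b, r_b) = cheap_write, costly_write
    if r_b >= r_a:
        return None
    return (w_b - w_a) / (r_a - r_b)


# Hypothetical token costs: flat retrieval is cheap to build but expensive
# to read; a consolidated fact store is the reverse.
flat_retrieval = (1_000, 500)
consolidated_store = (50_000, 150)

threshold = break_even_queries(flat_retrieval, consolidated_store)
print(threshold)  # 140.0
```

Below the threshold the cheap-to-build design wins. Above it, the upfront extraction work pays for itself, which is why expected query volume belongs in the design decision.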
Layer 5: Attention Serving — The Inference Layer Catching Up
Vortex (CMU / B200-scale) gives agents a programmable sparse attention frontend integrated into vLLM-style serving stacks. Agents auto-generate attention patterns, with a reported 3.46× throughput over full attention and zero accuracy loss. On MLA-based architectures: 4.7×. On a 229B-parameter model: 1.37×.
Agent workloads are bursty, multi-step, and context-heavy, so their attention patterns differ from the chatbot traffic that most serving infrastructure was designed for. The inference layer is starting to notice.
Layer 6: Collaboration — The Coordination Layer
CollabSim (Microsoft Research) tests multi-agent collaborative competence through CSCW theory: common ground establishment, shared task understanding, incentive balancing, misalignment repair. Tested across four LLMs.
Finding: agents don't fail because they lack individual capability. They fail because they can't coordinate like a team. This is the layer everyone building multi-agent systems discovers the hard way. You can have the best individual agents in the world, and your system will still fail if they can't hand off work, resolve conflicts, and maintain shared context.
Running This Stack in Production
I run this stack in production. Hermes is the orchestrator. Kanban workers are the swarm. MCP is the protocol layer — and yes, bad tool design costs real money at scale. gbrain is the hierarchical knowledge base and memory tier. Each layer has real infrastructure, real cost curves, and real failure modes that you feel in your token bill before you read about them in a paper.
The Convergence Is the Story
Skills → Swarms → Protocol → Memory → Inference → Collaboration. Every layer has active research and active production usage right now. The gap between paper and practice is closing in weeks, not years. If you're deploying agent systems, you're already making architecture decisions at every one of these layers — whether you've named them or not.
The agent ops stack doesn't have a brand name yet. But it's being built in production right now by teams running real workloads. If you're operating at any of these layers, I want to see your stack. What did you build? What broke first? What did you learn the hard way that papers don't cover?