Building Agent Teams with Claude Opus 4.6
March 15, 2026 | 8 minutes
Single agents are powerful, but they eventually hit a ceiling. When tasks span multiple codebases, require different kinds of expertise, or need parallel execution, a single agent starts to struggle with context limits, conflicting instructions, and slow sequential processing.
I've been building multi-agent systems with Claude Opus 4.6, and the results have been a marked improvement over what I was getting with single-agent architectures. In this post, I'll walk through how to design agent teams, the coordination patterns that work, common pitfalls, and practical TypeScript examples you can adapt for your own systems.
Why Agent Teams?
The case for multi-agent systems comes down to three things:
Specialization. A single agent juggling code review, test generation, documentation, and deployment is like a developer who's also the PM, QA lead, and DevOps engineer. It can technically do all of it, but the quality drops as responsibilities pile up. Specialized agents with focused system prompts perform significantly better at their individual tasks.
Parallelism. Sequential execution is slow. If your code review agent doesn't need to wait for the test agent to finish, why force them into a queue? Agent teams let you run independent work streams concurrently.
Context management. Opus 4.6 has a large context window, but filling it with code, docs, test results, deployment configs, and other information degrades performance. Splitting responsibilities across agents means each one gets a focused, relevant context.
Designing Your Team Architecture
Before writing any code, think about how your agents will be organized. I've found three patterns that work well in practice.
The Orchestrator Pattern
With the orchestrator pattern, one agent acts as the coordinator and delegates tasks to specialist agents. The coordinator then receives responses from the specialists and synthesizes their results. This is the most common pattern and the easiest to reason about.
This pattern works best when tasks have clear boundaries and the subtasks are mostly independent. Since the coordinator agent doesn't perform the tasks itself, it doesn't need deep expertise in any one area. All it needs is to be good at decomposition and synthesis.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

interface AgentResult {
  agent: string;
  output: string;
  status: "success" | "error";
}

async function runSpecialist(
  role: string,
  systemPrompt: string,
  task: string,
): Promise<AgentResult> {
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    system: systemPrompt,
    messages: [{ role: "user", content: task }],
  });

  // Type predicate so TypeScript narrows to text blocks
  const text = response.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");

  return { agent: role, output: text, status: "success" };
}

// Example agent types and prompts
function getSystemPrompt(agent: string): string {
  const prompts: Record<string, string> = {
    reviewer:
      "You are a code reviewer. Identify bugs, security issues, and style problems.",
    tester:
      "You are a test engineer. Write thorough unit and integration tests.",
    documenter:
      "You are a technical writer. Write clear, concise documentation.",
  };
  return (
    prompts[agent] ?? `You are a helpful assistant specializing in ${agent}.`
  );
}

async function orchestrate(task: string): Promise<AgentResult[]> {
  // Orchestrator breaks down the task
  const planResponse = await anthropic.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 2048,
    system: `You are a task planner. Break down the given task into subtasks
and assign each to one of these specialists: reviewer, tester, documenter.
Respond with JSON: { "subtasks": [{ "agent": string, "task": string }] }`,
    messages: [{ role: "user", content: task }],
  });

  const plan = JSON.parse(
    planResponse.content
      .filter((block): block is Anthropic.TextBlock => block.type === "text")
      .map((block) => block.text)
      .join(""),
  );

  // Run independent subtasks in parallel
  const results = await Promise.all(
    plan.subtasks.map((subtask: { agent: string; task: string }) =>
      runSpecialist(
        subtask.agent,
        getSystemPrompt(subtask.agent),
        subtask.task,
      ),
    ),
  );

  return results;
}
```
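One fragility worth handling: the `JSON.parse` above assumes the planner returns bare JSON, but models sometimes wrap JSON in markdown fences. A small defensive helper like the hypothetical `stripJsonFences` below (not part of the SDK) makes the parse more robust:

```typescript
// Strip a leading ```json (or plain ```) fence and a trailing fence,
// if present, so the remaining string can be fed to JSON.parse.
function stripJsonFences(raw: string): string {
  return raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");
}
```

Calling `JSON.parse(stripJsonFences(text))` instead of `JSON.parse(text)` handles both fenced and bare replies.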
The Pipeline Pattern
For the pipeline pattern, you arrange agents in a sequence where each agent's output feeds into the next. This works well for workflows with natural stages.
This pattern is intuitive but has an obvious weakness: it's sequential. Each stage blocks the next, and a failure early in the pipeline cascades. Use it when the stages genuinely depend on each other.
```typescript
async function pipeline(code: string): Promise<string> {
  // Stage 1: Review
  const review = await runSpecialist(
    "reviewer",
    "You are a code reviewer. Identify issues and suggest improvements.",
    `Review this code:\n\n${code}`,
  );

  // Stage 2: Refactor based on review
  const refactored = await runSpecialist(
    "refactorer",
    "You are a code refactoring specialist. Apply the suggested changes.",
    `Original code:\n${code}\n\nReview feedback:\n${review.output}`,
  );

  // Stage 3: Generate tests for the refactored code
  const tests = await runSpecialist(
    "tester",
    "You are a test engineer. Write comprehensive tests.",
    `Write tests for this code:\n\n${refactored.output}`,
  );

  return tests.output;
}
```
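Once you have more than a couple of stages, a generic composer keeps the wiring tidy. This is a sketch; `runPipeline` is a hypothetical helper, and each stage would typically wrap a `runSpecialist` call:

```typescript
// Each stage takes the previous stage's output as its input.
type Stage = (input: string) => Promise<string>;

async function runPipeline(stages: Stage[], input: string): Promise<string> {
  let current = input;
  for (const stage of stages) {
    current = await stage(current); // a failure here halts the whole pipeline
  }
  return current;
}
```

The sequential `await` in the loop is the pattern's weakness made explicit: every stage blocks the next.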
The Debate Pattern
With the debate pattern, you have two or more agents independently tackle the same problem and a judge agent evaluates their approaches. This is surprisingly effective for design decisions and code architecture.
When I'm genuinely unsure about the right approach, I use the debate pattern. Disagreements between agents often surface tradeoffs that a single agent might gloss over.
```typescript
async function debate(question: string): Promise<string> {
  // Two agents independently propose solutions
  const [approachA, approachB] = await Promise.all([
    runSpecialist(
      "architect-a",
      "You favor simple, pragmatic solutions. Minimize abstraction.",
      question,
    ),
    runSpecialist(
      "architect-b",
      "You favor robust, extensible solutions. Plan for scale.",
      question,
    ),
  ]);

  // Judge evaluates both approaches
  const judgment = await runSpecialist(
    "judge",
    `You are a senior engineer evaluating two proposed solutions.
Consider tradeoffs, maintainability, and fitness for the stated requirements.
Pick the better approach or synthesize the best parts of both.`,
    `Question: ${question}

Approach A (pragmatic): ${approachA.output}

Approach B (robust): ${approachB.output}`,
  );

  return judgment.output;
}
```
Coordination and State Management
The hardest part of multi-agent systems is getting the agents to work together coherently. Here are a few patterns I've landed on.
Shared Context via Structured Messages
Passing raw text between agents introduces ambiguity and leads to communication breakdowns. Structured formats let each agent extract exactly what it needs without brittle parsing.
```typescript
interface SharedContext {
  task: string;
  codebase: { path: string; content: string }[];
  decisions: { agent: string; decision: string; reasoning: string }[];
  constraints: string[];
}

function buildAgentPrompt(context: SharedContext, role: string): string {
  const relevantDecisions = context.decisions
    .map((d) => `${d.agent}: ${d.decision} (${d.reasoning})`)
    .join("\n");

  return `Current task: ${context.task}
Prior decisions:\n${relevantDecisions}
Constraints:\n${context.constraints.join("\n")}

Your role: ${role}`;
}
```
Error Boundaries and Retries
Failures are inevitable in any system that makes network calls to language models. A well-designed agent team handles them gracefully.
Retrying the same agent with the same input rarely helps unless the failure was a transient server error. If an agent fails, adjust the prompt, reduce the task's scope, or route to a different agent entirely.
```typescript
async function runWithFallback(
  primary: () => Promise<AgentResult>,
  fallback: () => Promise<AgentResult>,
  maxRetries = 2,
): Promise<AgentResult> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await primary();
    } catch (error) {
      // Only fall back once the retry budget is exhausted
      if (attempt === maxRetries) {
        console.warn("Primary agent failed, falling back", error);
        return fallback();
      }
    }
  }
  // Unreachable, but satisfies the compiler's return-path analysis
  return fallback();
}
```
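For the transient-server-error case, spacing out the retries matters. A common approach is exponential backoff with jitter; the helper below is a sketch (the `backoffDelayMs` name and default values are my own, not from any SDK):

```typescript
// Delay doubles each attempt up to a cap, with jitter in [cap/2, cap)
// so that many clients retrying at once don't all fire together.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  const exp = Math.min(baseMs * 2 ** attempt, capMs);
  return exp / 2 + Math.random() * (exp / 2);
}
```

You'd sleep for `backoffDelayMs(attempt)` between the retry attempts in `runWithFallback`-style loops.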
Lessons Learned
After several months of building agent teams in production, here's what I wish I'd known earlier.
Start with one agent and split when you feel pain. Don't design a six-agent system on day one. Start with a single agent, identify where it struggles, and extract that responsibility into a specialist. Premature decomposition is a lot like premature abstraction.
Opus 4.6 is great as the orchestrator with lighter models for specialists. Not every agent needs to be the most capable model. I use Opus 4.6 for the orchestrator and planning-heavy roles, then Sonnet or Haiku for well-scoped specialist tasks. This cuts costs significantly without sacrificing quality where it matters.
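In practice this tiering can be as simple as a lookup table. A minimal sketch, where the role names and the Haiku model ID are my assumptions for illustration:

```typescript
// Route planning-heavy roles to the most capable model and
// well-scoped specialist roles to cheaper tiers.
const MODEL_BY_ROLE: Record<string, string> = {
  orchestrator: "claude-opus-4-6",
  reviewer: "claude-sonnet-4-6",
  tester: "claude-sonnet-4-6",
  documenter: "claude-haiku-4-5",
};

function pickModel(role: string): string {
  return MODEL_BY_ROLE[role] ?? "claude-sonnet-4-6"; // sensible default tier
}
```

`runSpecialist` could then take its `model` from `pickModel(role)` instead of hardcoding one.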
Log everything between agents. When something goes wrong in a multi-agent system, you need to trace the full chain of decisions to find the issue. Log every inter-agent message, every planning step, and every tool call. You'll thank yourself during debugging.
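The trace log doesn't need to be fancy to be useful. A minimal in-memory sketch (the `AgentTrace` class is hypothetical; adapt the fields to whatever your pipeline actually passes around):

```typescript
interface TraceEntry {
  timestamp: number;
  from: string;
  to: string;
  payload: string;
}

// Records every inter-agent message so the full decision chain
// can be replayed when debugging.
class AgentTrace {
  private entries: TraceEntry[] = [];

  record(from: string, to: string, payload: string): void {
    this.entries.push({ timestamp: Date.now(), from, to, payload });
  }

  dump(): string[] {
    return this.entries.map((e) => `${e.from} -> ${e.to}: ${e.payload}`);
  }
}
```

In production you'd persist these entries rather than keep them in memory, but the shape is the same.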
Set explicit output formats. Free-form text between agents invites parsing errors and misunderstandings. Define clear schemas for inter-agent communication and validate outputs before passing them downstream.
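Validation can be hand-rolled for a shape as small as the planner's output; the sketch below does that (in a real system a schema library would be less error-prone):

```typescript
interface Subtask {
  agent: string;
  task: string;
}

// Throws with a specific message if the planner output doesn't match
// the expected { subtasks: [{ agent, task }] } shape.
function validateSubtasks(value: unknown): Subtask[] {
  if (typeof value !== "object" || value === null) {
    throw new Error("Expected an object");
  }
  const subtasks = (value as { subtasks?: unknown }).subtasks;
  if (!Array.isArray(subtasks)) throw new Error("Missing 'subtasks' array");
  return subtasks.map((s, i) => {
    if (typeof s?.agent !== "string" || typeof s?.task !== "string") {
      throw new Error(`Subtask ${i} is missing 'agent' or 'task'`);
    }
    return { agent: s.agent, task: s.task };
  });
}
```

Running this between the planner and the specialists turns a silent garbage-in failure into an immediate, traceable error.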
Keep system prompts short and focused. A specialist agent with a 200-word system prompt outperforms one with a 2000-word prompt that tries to cover every edge case. Constraints should come from the task description, not the system prompt. Even with larger context windows, agents perform best on focused inputs: shorter prompts are easier to follow and keep the agent on task.
When Not to Use Agent Teams
Agent teams add complexity. Don't use them when:
- A single agent can handle the task within its context window
- You can't afford the additional latency of coordination
- The individual subtasks are too tightly coupled to separate cleanly
A well-prompted single agent will beat a poorly designed multi-agent system every time. Agent teams are a tool for scaling complexity, not a default architecture.
What's Next
Multi-agent systems are still early. The patterns are evolving fast, and what works today might look different in six months. But the core principles of specialization, clear interfaces, and structured communication are likely to hold.
If you're just getting started, pick one workflow that's hitting the limits of a single agent and try splitting it into two. You'll learn more from that one experiment than from any architecture diagram.
For more on popular agentic patterns, check out agentic-patterns.com.