MCP & A2A Agent Orchestration — Technical Deep Dive

01

Project Overview

This project demonstrates how to build, deploy, and orchestrate agent systems using the A2A (Agent-to-Agent) v1.0 protocol on AWS Bedrock AgentCore. The primary focus is a fully layered A2A research service — from protocol-compliant AgentCard discovery through a LangGraph pipeline with quantized ONNX models — with an MCP fashion tool server included for protocol comparison.

What makes this different

Protocol as first-class architecture: A2A isn't bolted on — it shapes every layer from server routes to task lifecycle to artifact delivery
Five cleanly separated layers: Starlette server, DefaultRequestHandler, ResearchAgentExecutor, LangChain agent, LangGraph pipeline
Production-grade deployment: SigV4 signing, AWS Secrets Manager, Docker multi-stage builds, Bedrock AgentCore runtime
Practical performance work: Background model warmup, producer-consumer scraping, quantized ONNX models, FAISS dedup

System topology

USER { research query }

↓

🤖 Google ADK Local Agent Orchestrator

↓

🔐 A2A Client + AWS SigV4 Auth A2A

↓

☁️ Bedrock AgentCore Runtime AWS

↓

📡 A2A Starlette Server A2A v1.0

↓

⚙️ ResearchAgentExecutor Bridge

↓

🔗 LangChain Tool-Calling Agent Agent

↓

🔬 LangGraph 3-Node Pipeline DAG

↓

OUTPUT { research_summary artifact }

02

AgentCard Construction

The AgentCard is the A2A protocol's discovery mechanism — a structured JSON document served at /.well-known/agent-card.json (RFC 8615) that describes the agent's capabilities, skills, supported input/output modes, and endpoint URL.

AgentSkill definition

AgentSkill(
    id="research",
    name="Deep Research",
    description="Search, scrape, deduplicate, and summarize web content on any topic.",
    tags=["research", "web", "summarization"],
    examples=[
        "Explain quantum computing in simple terms",
        "What are the latest advances in renewable energy?",
    ],
    input_modes=["text/plain"],
    output_modes=["text/plain"],
)

AgentCard assembly

AgentCard(
    name="Research Agent",
    description="LangGraph research pipeline: search -> scrape -> dedup -> summarize.",
    url=BASE_URL,                                  # http://localhost:8080
    version="1.0.0",
    default_input_modes=["text/plain"],
    default_output_modes=["text/plain"],
    capabilities=AgentCapabilities(streaming=False),
    skills=[skill],
)

Field	Value	Purpose
name	Research Agent	Display name for discovery UIs
url	BASE_URL	Endpoint where the agent accepts JSON-RPC requests
version	1.0.0	Semantic versioning for capability negotiation
capabilities	streaming=False	Declares sync message/send support (no SSE streaming)
skills	[research]	List of AgentSkill objects with examples and tags
default_input_modes	["text/plain"]	MIME types the agent accepts
default_output_modes	["text/plain"]	MIME types the agent produces

Discovery is automatic. The A2AStarletteApplication from the a2a-sdk automatically intercepts GET requests to /.well-known/agent-card.json and returns the AgentCard as JSON. No manual route registration is needed — the SDK handles RFC 8615 compliance.

03

A2A Server Architecture

The server is built as a Starlette ASGI application using the official a2a-sdk, with three integration points: the SDK's request handler, a health endpoint, and an AgentCore compatibility bridge.

Server assembly

# 1. Wire executor + task store into the A2A request handler
request_handler = DefaultRequestHandler(
    agent_executor=ResearchAgentExecutor(),
    task_store=InMemoryTaskStore(),
)

# 2. Build the Starlette app with AgentCard + handler
server = A2AStarletteApplication(
    agent_card=agent_card,
    http_handler=request_handler,
)
app = server.build()

# 3. Add health check + AgentCore /invocations bridge
app.add_route("/ping", ping, methods=["GET"])
app.add_route("/invocations", server._handle_requests, methods=["POST"])

Routes exposed

Endpoint	Method	Handler	Purpose
/.well-known/agent-card.json	GET	A2AStarletteApplication	RFC 8615 discovery — returns AgentCard JSON
/	POST	_handle_requests	JSON-RPC 2.0 endpoint (message/send, message/stream)
/invocations	POST	_handle_requests	AgentCore compatibility — mirrors the JSON-RPC handler
/ping	GET	ping()	Health check for Docker/AgentCore probes

The /invocations bridge is key to Bedrock AgentCore. AgentCore routes POST /runtimes/{arn}/invocations to the container's /invocations path. By mirroring the JSON-RPC handler at both / and /invocations, the same code works locally and on AgentCore with zero changes.

DefaultRequestHandler responsibilities

Parses incoming JSON-RPC 2.0 requests and validates method names
Extracts RequestContext (task_id, context_id, message) from the payload
Routes to the ResearchAgentExecutor's execute() method
Manages InMemoryTaskStore for task lifecycle persistence
Serializes events from EventQueue back to the client (buffered or SSE)

04

ResearchAgentExecutor & Task Lifecycle

The executor is the critical bridge between the A2A protocol layer and the LangChain agent. It implements the AgentExecutor interface from the a2a-sdk and manages the full task lifecycle through the EventQueue.

Execution sequence

Step 1 Create or retrieve Task from context.current_task, emit initial Task event to EventQueue

Step 2 Emit TaskStatusUpdateEvent with state=TaskState.working and final=False — signals processing started

Step 3 Extract user query from context.message.parts[0].root.text — A2A Message Part oneOf pattern

Step 4 Offload blocking LangChain agent to thread pool: await loop.run_in_executor(None, run_agent, query)

Step 5 Emit TaskArtifactUpdateEvent with new_text_artifact(name="research_summary", text=result) and last_chunk=True

Step 6 Emit final TaskStatusUpdateEvent with state=TaskState.completed and final=True

On Error Catch exception, emit TaskStatusUpdateEvent with state=TaskState.failed and error message

Method signature

class ResearchAgentExecutor(AgentExecutor):
    """Bridges the A2A v1.0 protocol to the LangChain research agent."""

    async def execute(
        self,
        context: RequestContext,   # task_id, context_id, message, current_task
        event_queue: EventQueue,   # async queue for streaming events back
    ) -> None:

    async def cancel(self, context, event_queue) -> None:
        raise Exception("cancel not supported")

Event streaming pattern

The executor never returns directly to the client. Instead:

Events are enqueued to the EventQueue as processing progresses
DefaultRequestHandler dequeues and serializes them
For message/send: events are buffered and returned after the final event
For message/stream: events are streamed as Server-Sent Events (SSE) in real-time

Thread pool bridging is essential. The LangChain agent is synchronous, but the A2A server is async (Starlette). The executor wraps the blocking run_agent() call in loop.run_in_executor() to avoid blocking the async event loop — a critical detail for production deployments.

05

A2A Client & SigV4 Authentication

The client supports three modes: discovery, synchronous message/send, and streaming message/stream. For production deployments, all requests are signed with AWS SigV4.

Discovery

def discover(base_url: str) -> dict:
    url = f"{base_url}/.well-known/agent-card.json"
    resp = httpx.get(url, timeout=30.0)
    return resp.json()

message/send — synchronous JSON-RPC request

payload = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "message/send",
    "params": {
        "message": {
            "messageId": str(uuid.uuid4()),
            "role": "user",
            "parts": [{"kind": "text", "text": query}],
        }
    },
}
resp = httpx.post(f"{base_url}/", json=payload, timeout=300)

message/stream — SSE streaming

with httpx.stream("POST", url, json=payload,
    headers={"Accept": "text/event-stream"}
) as resp:
    for line in resp.iter_lines():
        if line.startswith("data:"):
            yield json.loads(line[5:].strip())

SigV4 signing for production

# AWS SigV4 authentication for Bedrock AgentCore
_session     = boto3.Session()
_credentials = _session.get_credentials()

aws_req = AWSRequest(
    method="POST",
    url=f"{DP_ENDPOINT}/runtimes/{AGENTCORE_ARN}/invocations?qualifier=DEFAULT",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": session_id,
    },
)
SigV4Auth(_credentials.get_frozen_credentials(),
          "bedrock-agentcore", AWS_REGION).add_auth(aws_req)

Pragmatic protocol adaptation. In production, the A2A client constructs the AgentCard directly when the runtime ARN is already known — rather than assuming AgentCore will expose card discovery over GET. This avoids a brittle assumption while staying protocol-compliant.

06

A2A Protocol Deep Dive

The A2A v1.0 protocol defines a structured way for agents to discover each other, exchange messages, and manage task lifecycles. Here's how each concept is implemented.

Task state machine

Created

→

Working

→

Artifact

→

Completed

Working

→

Failed

Event types used

Event	When Emitted	Key Fields
Task	Task created or retrieved	task_id, context_id, message
TaskStatusUpdateEvent	State transitions (working, completed, failed)	state, final, message
TaskArtifactUpdateEvent	Research results available	artifact.name, artifact.parts, last_chunk

Message structure

# A2A Message uses the Part oneOf pattern
message = {
    "messageId": "uuid-here",
    "role": "user",               # or "agent"
    "parts": [
        {"kind": "text", "text": "Research quantum computing"}
    ],
    "contextId": "ctx-uuid",      # optional, for multi-turn
}

# Artifact structure
artifact = new_text_artifact(
    name="research_summary",     # identifier
    text=result_string              # content
)

JSON-RPC 2.0 throughout. Every A2A request follows the JSON-RPC 2.0 spec with jsonrpc, id, method, and params fields. Methods include message/send (synchronous) and message/stream (SSE). The SDK handles serialization, deserialization, and error responses.

07

LangGraph Research Pipeline

The core pipeline is a 3-node LangGraph StateGraph with Pydantic-typed state (ResearchState). Each node has a single responsibility.

Graph construction

g = StateGraph(ResearchState)

g.add_node("research",   research.run)
g.add_node("dedup",      dedup.run)
g.add_node("summarizer", summarizer.run)

g.set_entry_point("research")
g.add_edge("research", "dedup")
g.add_edge("dedup", "summarizer")
g.add_edge("summarizer", END)

compiled_graph = g.compile()

Pipeline state model

class ResearchState(BaseModel):
    user_query: str           # Raw user query string
    topic: str                # Extracted research topic
    desired_length: int       # Target word count (default: 500)
    raw_research_data: str    # Assembled corpus from web scraping
    ranked_chunks: List[Dict]  # Top-k deduplicated, reranked chunks
    research_output: str      # Final summarized research
    last_error: Optional[str] # Latest error message

Research Node

Query Planner: LLM generates 3-5 diverse search queries from the topic
Parallel Search: Google Custom Search (primary) + Tavily (fallback), all queries in ThreadPoolExecutor
Producer-Consumer Scraping: as each search completes, URLs feed into a scrape pool of 15 workers — scraping starts before all searches finish
Content Extraction: BeautifulSoup preprocessing + Trafilatura, minimum 300 characters per page
Background Warmup: kicks off ONNX model loading while scraping runs, saving ~8 seconds

Summarizer Node

Uses a higher-tier model (gpt-5.4-mini vs nano) for quality
Constructs corpus from ranked chunks, sends with system prompt enforcing target word count
Logs input/output token counts and estimated cost per call
Produces the final research_output string returned as an A2A artifact

08

Semantic Dedup & ONNX Inference

The dedup node is where ML meets engineering — quantized ONNX models running on CPU without PyTorch, with FAISS-based deduplication and CrossEncoder reranking.

🧠Embedding Model
all-MiniLM-L6-v2 quantized to int8 ONNX. 384-dim embeddings via onnxruntime CPUExecutionProvider. Pre-bundled in Docker image.

🎯Reranker Model
cross-encoder/ms-marco-MiniLM-L-6-v2 quantized to int8 ONNX. Scores every chunk against the research topic for relevance.

📊

FAISS Deduplication

IndexFlatIP for cosine similarity. Threshold 0.85 removes near-duplicates while preserving diverse perspectives. Prefers longer chunks.

✂️

Chunking

RecursiveCharacterTextSplitter: 500-word chunks, 100-word overlap, min 50 words. Returns top 25 chunks after reranking.

Processing pipeline

# In-memory dedup pipeline
raw_corpus (str)
    → chunks (List[Dict])           # 500-word overlapping segments
    → embeddings (np.ndarray)       # ONNX all-MiniLM-L6-v2 (int8)
    → faiss_index (IndexFlatIP)     # Cosine similarity, threshold 0.85
    → unique_chunks (List)          # Near-duplicates removed
    → ranked_chunks (List)          # CrossEncoder reranking vs topic
    → state["ranked_chunks"]       # Top 25 with relevance scores

Performance optimizations

1️⃣Background Model Warmup
Research node kicks off ONNX model loading in a separate thread while scraping runs. Both embedder and reranker are warm by the time dedup starts — saves ~8 seconds of cold I/O.

2️⃣Producer-Consumer Scraping
Search and scrape tasks are decoupled. URLs enter the scrape pool as each search completes — scraping starts before all searches finish.

3️⃣Background Reranker
Dedup node starts loading CrossEncoder in background while embeddings compute. Reranker is ready by the time it's needed.

No PyTorch, no GPU required. Both models are pre-exported to quantized ONNX and bundled in the Docker image. This eliminates ~2 GB of PyTorch dependencies, enables CPU-only containers, and cuts cold-start time dramatically.

09

MCP Fashion Server (Protocol Comparison)

The MCP track demonstrates the tool-provider pattern alongside A2A — showing when to use each protocol. MCP is for tools; A2A is for autonomous agents.

A2A When to use

Remote side is an autonomous agent with task lifecycle, status transitions, and structured artifacts.

MCP When to use

Remote side is a tool provider — capabilities are listed, discovered, and called individually via JSON-RPC.

MCP tools exposed

Tool	Purpose
get_product_tags	Look up product attributes and tags from catalog
search_catalog	Search products by query, gender, category, price
generate_description	LLM-powered marketing copy generation

Dynamic tool discovery

The local fashion agent doesn't hardcode tool wrappers. It calls tools/list, parses JSON schemas, and dynamically builds Google ADK FunctionTool wrappers at runtime — adapting to whatever tools are deployed.

# Dynamic MCP tool wrapping
for tool in mcp_client.tools_list():
    schema = tool["inputSchema"]
    params = build_params_from_schema(schema)
    adk_tool = FunctionTool(func=make_handler(tool["name"]))
    # Agent can now call this tool naturally

10

Deployment & Infrastructure

The same code runs in three environments with configuration changes only.

💻

Local Development

uvicorn on localhost:8080, .env files, direct HTTP. Fastest iteration.

🐳

Docker Container

Multi-stage build with uv (14 parallel installs), ARM64-native, pre-compiled bytecode, ONNX models bundled.

☁️

Bedrock AgentCore

ECR-hosted container, managed runtime, SigV4 auth, CloudWatch logging. deploy.sh automates build → push → update.

🔒

Secrets Management

Dual-mode: env vars locally, AWS Secrets Manager in production. get_secret() abstracts both behind a single interface.

Deployment pipeline

Source + ONNX

→

Docker Build

→

Amazon ECR

→

AgentCore Runtime

→

CloudWatch

11

Tech Stack

Protocol

A2A v1.0

AgentCard discovery, JSON-RPC 2.0, task lifecycle, event streaming

Protocol

MCP v1.0

Tool interoperability with dynamic discovery and SSE transport

Orchestrator

Google ADK

Local agents with LiteLLM, tool calling, and delegation

Pipeline

LangGraph

3-node deterministic DAG with Pydantic typed state

A2A Server

Starlette + a2a-sdk

ASGI app with automatic AgentCard serving and /invocations bridge

MCP Server

FastMCP

Fashion tool registration with input schemas

Inference

ONNX Runtime

Quantized int8 embedding + reranking (no PyTorch)

Vector Index

FAISS

In-memory cosine similarity for semantic dedup

LLM

OpenAI

Research (nano) and summarization (mini) with cost tracking

Search

Google CSE + Tavily

Primary search with automatic fallback

Auth

AWS SigV4

botocore-based request signing for all remote calls

Cloud

Bedrock AgentCore

Managed runtime, ECR, Secrets Manager, CloudWatch

12

Design Principles

Principle 01

Protocol-First Architecture

A2A shapes every layer — AgentCard discovery, typed requests via DefaultRequestHandler, task lifecycle through EventQueue. Protocol isn't bolted on; it's the foundation.

Principle 02

Five-Layer Separation

Server (Starlette) → Handler (a2a-sdk) → Executor (bridge) → Agent (LangChain) → Pipeline (LangGraph). Each layer can be tested and replaced independently.

Principle 03

Thin Orchestrators

Local ADK agents are lightweight coordinators. All domain logic lives in remote services. The orchestrator decides what to call, not how to do it.

Principle 04

Dynamic Discovery

MCP consumer discovers tools at runtime via tools/list. A2A consumer discovers capabilities via AgentCard. No hardcoded knowledge of remote surfaces.

Principle 05

Local-to-Cloud Parity

Same code runs locally (uvicorn), in Docker (compose), and on AgentCore (ECR). The /invocations bridge is the only cloud-specific addition.

Principle 06

Security by Default

SigV4 signing for all production calls. Secrets abstracted behind get_secret() with env-var fallback. .env files excluded from Git and Docker context.

Principle 07

Performance-Aware ML

Quantized ONNX models (int8, no PyTorch), background warmup, producer-consumer scraping, parallel ThreadPoolExecutors throughout the pipeline.

Principle 08

Progressive Infrastructure

Local env vars now, AWS Secrets Manager for production, Bedrock Memory service module prepared for future activation. Built to grow without rewrites.

A2A Agent Orchestration
on AWS Bedrock AgentCore

Project Overview

What makes this different

System topology

AgentCard Construction

AgentSkill definition

AgentCard assembly

A2A Server Architecture

Server assembly

Routes exposed

DefaultRequestHandler responsibilities

ResearchAgentExecutor & Task Lifecycle

Execution sequence

Method signature

Event streaming pattern

A2A Client & SigV4 Authentication

Discovery

message/send — synchronous JSON-RPC request

message/stream — SSE streaming

SigV4 signing for production

A2A Protocol Deep Dive

Task state machine

Event types used

Message structure

LangGraph Research Pipeline

Graph construction

Pipeline state model

Research Node

Summarizer Node

Semantic Dedup & ONNX Inference

Processing pipeline

Performance optimizations

MCP Fashion Server (Protocol Comparison)

MCP tools exposed

Dynamic tool discovery

Deployment & Infrastructure

Deployment pipeline

Tech Stack

Design Principles

Explore the Source

A2A Agent Orchestrationon AWS Bedrock AgentCore

Project Overview

What makes this different

System topology

AgentCard Construction

AgentSkill definition

AgentCard assembly

A2A Server Architecture

Server assembly

Routes exposed

DefaultRequestHandler responsibilities

ResearchAgentExecutor & Task Lifecycle

Execution sequence

Method signature

Event streaming pattern

A2A Client & SigV4 Authentication

Discovery

message/send — synchronous JSON-RPC request

message/stream — SSE streaming

SigV4 signing for production

A2A Protocol Deep Dive

Task state machine

Event types used

Message structure

LangGraph Research Pipeline

Graph construction

Pipeline state model

Research Node

Summarizer Node

Semantic Dedup & ONNX Inference

Processing pipeline

Performance optimizations

MCP Fashion Server (Protocol Comparison)

MCP tools exposed

Dynamic tool discovery

Deployment & Infrastructure

Deployment pipeline

Tech Stack

Design Principles

Explore the Source

A2A Agent Orchestration
on AWS Bedrock AgentCore