A deep dive into building a production-grade A2A (Agent-to-Agent) research service — from AgentCard
construction and Starlette server setup, through the ResearchAgentExecutor bridge, to a 3-node
LangGraph pipeline with ONNX-based semantic dedup. Deployed on AWS Bedrock AgentCore with SigV4
authentication. Includes an MCP fashion tool server for protocol comparison.
This project demonstrates how to build, deploy, and orchestrate agent systems using the
A2A (Agent-to-Agent) v1.0 protocol on AWS Bedrock AgentCore.
The primary focus is a fully layered A2A research service — from protocol-compliant AgentCard
discovery through a LangGraph pipeline with quantized ONNX models — with an MCP fashion tool
server included for protocol comparison.
What makes this different
Protocol as first-class architecture: A2A isn't bolted on — it shapes every layer from server routes to task lifecycle to artifact delivery
The AgentCard is the A2A protocol's discovery mechanism — a structured JSON document
served at /.well-known/agent-card.json (RFC 8615) that describes the agent's capabilities,
skills, supported input/output modes, and endpoint URL.
AgentSkill definition
```python
skill = AgentSkill(
    id="research",
    name="Deep Research",
    description="Search, scrape, deduplicate, and summarize web content on any topic.",
    tags=["research", "web", "summarization"],
    examples=[
        "Explain quantum computing in simple terms",
        "What are the latest advances in renewable energy?",
    ],
    input_modes=["text/plain"],
    output_modes=["text/plain"],
)
```
Key AgentCard fields:

| Field | Value | Purpose |
|---|---|---|
| url | (deployment-specific) | Endpoint where the agent accepts JSON-RPC requests |
| version | 1.0.0 | Semantic versioning for capability negotiation |
| capabilities | streaming=False | Declares sync message/send support (no SSE streaming) |
| skills | [research] | List of AgentSkill objects with examples and tags |
| default_input_modes | ["text/plain"] | MIME types the agent accepts |
| default_output_modes | ["text/plain"] | MIME types the agent produces |
Discovery is automatic. The A2AStarletteApplication from the a2a-sdk
automatically intercepts GET requests to /.well-known/agent-card.json and returns the
AgentCard as JSON. No manual route registration is needed — the SDK handles RFC 8615 compliance.
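For orientation, a minimal sketch of how the card itself might be assembled around the skill defined above, using the a2a-sdk types; the name, description, and URL here are placeholders rather than the project's actual values:

```python
from a2a.types import AgentCapabilities, AgentCard

# Hedged sketch: field names follow a2a.types.AgentCard, but the metadata
# values below are illustrative placeholders, not the project's real card.
agent_card = AgentCard(
    name="Deep Research Agent",
    description="Searches, scrapes, deduplicates, and summarizes web content.",
    url="http://localhost:8080/",
    version="1.0.0",
    capabilities=AgentCapabilities(streaming=False),
    skills=[skill],  # the AgentSkill shown earlier
    default_input_modes=["text/plain"],
    default_output_modes=["text/plain"],
)
```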
03
A2A Server Architecture
The server is built as a Starlette ASGI application using the official a2a-sdk, with three integration points: the SDK's request handler, a health endpoint, and an AgentCore compatibility bridge.
Server assembly
```python
# 1. Wire executor + task store into the A2A request handler
request_handler = DefaultRequestHandler(
    agent_executor=ResearchAgentExecutor(),
    task_store=InMemoryTaskStore(),
)

# 2. Build the Starlette app with AgentCard + handler
server = A2AStarletteApplication(
    agent_card=agent_card,
    http_handler=request_handler,
)
app = server.build()

# 3. Add health check + AgentCore /invocations bridge
app.add_route("/ping", ping, methods=["GET"])
# AgentCore compatibility — mirrors the JSON-RPC handler
app.add_route("/invocations", server._handle_requests, methods=["POST"])
```
| Route | Method | Handler | Purpose |
|---|---|---|---|
| /ping | GET | ping() | Health check for Docker/AgentCore probes |
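A handler consistent with that route can be as small as the following sketch; the response body is illustrative, only the route and method come from the table above:

```python
from starlette.requests import Request
from starlette.responses import JSONResponse

async def ping(request: Request) -> JSONResponse:
    # Lightweight liveness probe for Docker healthchecks and AgentCore
    return JSONResponse({"status": "ok"})
```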
The /invocations bridge is key to Bedrock AgentCore. AgentCore routes
POST /runtimes/{arn}/invocations to the container's /invocations path.
By mirroring the JSON-RPC handler at both / and /invocations, the same
code works locally and on AgentCore with zero changes.
DefaultRequestHandler responsibilities
- Parses incoming JSON-RPC 2.0 requests and validates method names
- Extracts RequestContext (task_id, context_id, message) from the payload
- Routes to the ResearchAgentExecutor's execute() method
- Manages InMemoryTaskStore for task lifecycle persistence
- Serializes events from EventQueue back to the client (buffered or SSE)
04
ResearchAgentExecutor & Task Lifecycle
The executor is the critical bridge between the A2A protocol layer and the LangChain agent. It implements the AgentExecutor interface from the a2a-sdk and manages the full task lifecycle through the EventQueue.
Execution sequence
Step 1: Create or retrieve Task from context.current_task, emit initial Task event to EventQueue
Step 2: Emit TaskStatusUpdateEvent with state=TaskState.working and final=False — signals processing started
Step 3: Extract user query from context.message.parts[0].root.text — A2A Message Part oneOf pattern
Step 4: Run the synchronous LangChain agent off the event loop via loop.run_in_executor() (see the threading note below)
Step 5: Emit TaskArtifactUpdateEvent with new_text_artifact(name="research_summary", text=result) and last_chunk=True
Step 6: Emit final TaskStatusUpdateEvent with state=TaskState.completed and final=True
On error: Catch the exception, emit TaskStatusUpdateEvent with state=TaskState.failed and the error message
Method signature
```python
class ResearchAgentExecutor(AgentExecutor):
    """Bridges the A2A v1.0 protocol to the LangChain research agent."""

    async def execute(
        self,
        context: RequestContext,    # task_id, context_id, message, current_task
        event_queue: EventQueue,    # async queue for streaming events back
    ) -> None:
        ...

    async def cancel(self, context, event_queue) -> None:
        raise Exception("cancel not supported")
```
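Putting the sequence together, a condensed sketch of what the execute() body might look like; run_agent() is a stand-in for the project's synchronous LangChain entry point, and the event field names follow the a2a-sdk's Python types:

```python
import asyncio

from a2a.types import (
    TaskArtifactUpdateEvent,
    TaskState,
    TaskStatus,
    TaskStatusUpdateEvent,
)
from a2a.utils import new_task, new_text_artifact

async def execute(self, context, event_queue) -> None:  # method sketch, error path omitted
    task = context.current_task or new_task(context.message)        # Step 1
    await event_queue.enqueue_event(task)
    await event_queue.enqueue_event(TaskStatusUpdateEvent(          # Step 2
        task_id=task.id, context_id=task.context_id,
        status=TaskStatus(state=TaskState.working), final=False,
    ))
    query = context.message.parts[0].root.text                      # Step 3
    loop = asyncio.get_running_loop()                               # Step 4: keep the loop free
    result = await loop.run_in_executor(None, run_agent, query)
    await event_queue.enqueue_event(TaskArtifactUpdateEvent(        # Step 5
        task_id=task.id, context_id=task.context_id, last_chunk=True,
        artifact=new_text_artifact(name="research_summary", text=result),
    ))
    await event_queue.enqueue_event(TaskStatusUpdateEvent(          # Step 6
        task_id=task.id, context_id=task.context_id,
        status=TaskStatus(state=TaskState.completed), final=True,
    ))
```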
Event streaming pattern
The executor never returns directly to the client. Instead:
- Events are enqueued to the EventQueue as processing progresses
- DefaultRequestHandler dequeues and serializes them
- For message/send: events are buffered and returned after the final event
- For message/stream: events are streamed as Server-Sent Events (SSE) in real time
Thread pool bridging is essential. The LangChain agent is synchronous, but the A2A
server is async (Starlette). The executor wraps the blocking run_agent() call in
loop.run_in_executor() to avoid blocking the async event loop — a critical detail
for production deployments.
05
A2A Client & SigV4 Authentication
The client supports three modes: discovery, synchronous message/send, and streaming message/stream. For production deployments, all requests are signed with AWS SigV4.
Streaming client (message/stream)

```python
import json
import httpx

# Inside the client's streaming generator
with httpx.stream(
    "POST", url,
    json=payload,
    headers={"Accept": "text/event-stream"},
) as resp:
    for line in resp.iter_lines():
        if line.startswith("data:"):        # SSE frames carry JSON after "data:"
            yield json.loads(line[5:].strip())
```
Pragmatic protocol adaptation. In production, the A2A client constructs the
AgentCard directly when the runtime ARN is already known — rather than assuming AgentCore will
expose card discovery over GET. This avoids a brittle assumption while staying protocol-compliant.
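A sketch of the signing step using botocore's SigV4Auth directly; the service name ("bedrock-agentcore") and region are assumptions to adapt to your deployment:

```python
import json

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

def signed_headers(url: str, payload: dict, region: str = "us-east-1") -> dict:
    """Produce SigV4-signed headers for a POST to the AgentCore runtime."""
    creds = boto3.Session().get_credentials()
    request = AWSRequest(
        method="POST",
        url=url,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    # Sign in place; the signed headers are then copied onto the httpx request
    SigV4Auth(creds, "bedrock-agentcore", region).add_auth(request)
    return dict(request.headers)
```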
06
A2A Protocol Deep Dive
The A2A v1.0 protocol defines a structured way for agents to discover each other, exchange messages, and manage task lifecycles. Here's how each concept is implemented.
JSON-RPC 2.0 throughout. Every A2A request follows the JSON-RPC 2.0 spec with
jsonrpc, id, method, and params fields.
Methods include message/send (synchronous) and message/stream (SSE).
The SDK handles serialization, deserialization, and error responses.
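As an illustration, a message/send request body looks roughly like this; the part structure follows the A2A spec, while the id, messageId, and text are placeholders:

```python
# Hedged example of a JSON-RPC 2.0 envelope for message/send
payload = {
    "jsonrpc": "2.0",
    "id": "1",
    "method": "message/send",
    "params": {
        "message": {
            "role": "user",
            "messageId": "msg-001",  # client-generated message id
            "parts": [
                {"kind": "text", "text": "Explain quantum computing in simple terms"}
            ],
        }
    },
}
```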
07
LangGraph Research Pipeline
The core pipeline is a 3-node LangGraph StateGraph with Pydantic-typed state (ResearchState). Each node has a single responsibility.
```python
from typing import Dict, List, Optional

from pydantic import BaseModel

class ResearchState(BaseModel):
    user_query: str                # Raw user query string
    topic: str                     # Extracted research topic
    desired_length: int            # Target word count (default: 500)
    raw_research_data: str         # Assembled corpus from web scraping
    ranked_chunks: List[Dict]      # Top-k deduplicated, reranked chunks
    research_output: str           # Final summarized research
    last_error: Optional[str]      # Latest error message
```
Research Node
- Query Planner: LLM generates 3-5 diverse search queries from the topic
- Parallel Search: Google Custom Search (primary) + Tavily (fallback), all queries in a ThreadPoolExecutor
- Producer-Consumer Scraping: as each search completes, URLs feed into a scrape pool of 15 workers — scraping starts before all searches finish
- Background Warmup: kicks off ONNX model loading while scraping runs, saving ~8 seconds
Summarizer Node
- Uses a higher-tier model (gpt-5-mini vs nano) for quality
- Constructs the corpus from ranked chunks and sends it with a system prompt enforcing the target word count
- Logs input/output token counts and estimated cost per call
- Produces the final research_output string returned as an A2A artifact
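A plausible wiring of the three nodes, for orientation; the node names and handler functions here are assumptions, not the project's exact identifiers:

```python
from langgraph.graph import END, StateGraph

# Hypothetical 3-node DAG over the Pydantic-typed ResearchState defined above
graph = StateGraph(ResearchState)
graph.add_node("research", research_node)      # search + scrape + warmup
graph.add_node("dedup", dedup_node)            # chunk + embed + FAISS dedup + rerank
graph.add_node("summarize", summarizer_node)   # corpus -> final summary
graph.set_entry_point("research")
graph.add_edge("research", "dedup")
graph.add_edge("dedup", "summarize")
graph.add_edge("summarize", END)
pipeline = graph.compile()
```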
08
Semantic Dedup & ONNX Inference
The dedup node is where ML meets engineering — quantized ONNX models running on CPU without PyTorch, with FAISS-based deduplication and CrossEncoder reranking.
🧠
Embedding Model
all-MiniLM-L6-v2 quantized to int8 ONNX. 384-dim embeddings via onnxruntime CPUExecutionProvider. Pre-bundled in Docker image.
🎯
Reranker Model
cross-encoder/ms-marco-MiniLM-L-6-v2 quantized to int8 ONNX. Scores every chunk against the research topic for relevance.
📊
FAISS Deduplication
IndexFlatIP for cosine similarity. Threshold 0.85 removes near-duplicates while preserving diverse perspectives. Prefers longer chunks.
✂️
Chunking
RecursiveCharacterTextSplitter: 500-word chunks, 100-word overlap, min 50 words. Returns top 25 chunks after reranking.
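The dedup step could be sketched like this under the parameters above (384-dim embeddings, IndexFlatIP, 0.85 threshold, longer-chunk preference); the function and variable names are illustrative:

```python
import faiss
import numpy as np

def dedup_chunks(chunks: list[str], embeddings: np.ndarray,
                 threshold: float = 0.85) -> list[str]:
    """Greedy near-duplicate removal; cosine similarity as inner product on unit vectors."""
    embeddings = embeddings.astype(np.float32)
    faiss.normalize_L2(embeddings)                    # cosine == IP on normalized vectors
    index = faiss.IndexFlatIP(embeddings.shape[1])    # 384-dim for all-MiniLM-L6-v2
    kept: list[str] = []
    # Visit longer chunks first so the longer of two near-duplicates survives
    for i in sorted(range(len(chunks)), key=lambda i: -len(chunks[i])):
        vec = embeddings[i : i + 1]
        if index.ntotal > 0:
            scores, _ = index.search(vec, 1)          # similarity to nearest kept chunk
            if scores[0][0] >= threshold:
                continue                              # near-duplicate: drop
        index.add(vec)
        kept.append(chunks[i])
    return kept
```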
1️⃣
Background Warmup
Research node kicks off ONNX model loading in a separate thread while scraping runs. Both embedder and reranker are warm by the time dedup starts — saves ~8 seconds of cold I/O.
2️⃣
Producer-Consumer Scraping
Search and scrape tasks are decoupled. URLs enter the scrape pool as each search completes — scraping starts before all searches finish.
3️⃣
Background Reranker
Dedup node starts loading CrossEncoder in background while embeddings compute. Reranker is ready by the time it's needed.
No PyTorch, no GPU required. Both models are pre-exported to quantized ONNX and
bundled in the Docker image. This eliminates ~2 GB of PyTorch dependencies, enables CPU-only
containers, and cuts cold-start time dramatically.
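For concreteness, a sketch of CPU-only embedding inference with onnxruntime; the model path is hypothetical, and the input names and mean-pooling follow the usual all-MiniLM-L6-v2 export rather than anything confirmed by the source:

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer  # tokenizer only; no PyTorch required

session = ort.InferenceSession(
    "models/all-MiniLM-L6-v2-int8.onnx",             # hypothetical bundled path
    providers=["CPUExecutionProvider"],
)
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def embed(texts: list[str]) -> np.ndarray:
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    token_embeddings = session.run(None, dict(enc))[0]   # (batch, seq, 384)
    mask = enc["attention_mask"][..., None]              # zero out padding tokens
    return (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)  # mean pool
```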
09
MCP Fashion Server (Protocol Comparison)
The MCP track demonstrates the tool-provider pattern alongside A2A — showing when to use each protocol. MCP is for tools; A2A is for autonomous agents.
A2A: when to use
The remote side is an autonomous agent with task lifecycle, status transitions, and structured artifacts.
MCP: when to use
The remote side is a tool provider — capabilities are listed, discovered, and called individually via JSON-RPC.
MCP tools exposed
| Tool | Purpose |
|---|---|
| get_product_tags | Look up product attributes and tags from catalog |
| search_catalog | Search products by query, gender, category, price |
| generate_description | LLM-powered marketing copy generation |
Dynamic tool discovery
The local fashion agent doesn't hardcode tool wrappers. It calls tools/list, parses JSON schemas, and dynamically builds Google ADK FunctionTool wrappers at runtime — adapting to whatever tools are deployed.
```python
from google.adk.tools import FunctionTool

# Dynamic MCP tool wrapping: one ADK FunctionTool per discovered MCP tool
for tool in mcp_client.tools_list():
    schema = tool["inputSchema"]
    params = build_params_from_schema(schema)          # JSON schema -> parameter spec
    adk_tool = FunctionTool(func=make_handler(tool["name"]))
    # Agent can now call this tool naturally
```
10
Deployment & Infrastructure
The same code runs in three environments with configuration changes only.
💻
Local Development
uvicorn on localhost:8080, .env files, direct HTTP. Fastest iteration.
🐳
Docker
docker compose running the same container image locally; parity with the AgentCore build.
☁️
Bedrock AgentCore
Container pushed to ECR and run on the managed runtime, with SigV4 auth and Secrets Manager.
11
Technology Stack

| Layer | Technology | Role |
|---|---|---|
| MCP Client | MCP over SSE | Tool interoperability with dynamic discovery and SSE transport |
| Orchestrator | Google ADK | Local agents with LiteLLM, tool calling, and delegation |
| Pipeline | LangGraph | 3-node deterministic DAG with Pydantic typed state |
| A2A Server | Starlette + a2a-sdk | ASGI app with automatic AgentCard serving and /invocations bridge |
| MCP Server | FastMCP | Fashion tool registration with input schemas |
| Inference | ONNX Runtime | Quantized int8 embedding + reranking (no PyTorch) |
| Vector Index | FAISS | In-memory cosine similarity for semantic dedup |
| LLM | OpenAI | Research (nano) and summarization (mini) with cost tracking |
| Search | Google CSE + Tavily | Primary search with automatic fallback |
| Auth | AWS SigV4 | botocore-based request signing for all remote calls |
| Cloud | Bedrock AgentCore | Managed runtime, ECR, Secrets Manager, CloudWatch |
12
Design Principles
Principle 01
Protocol-First Architecture
A2A shapes every layer — AgentCard discovery, typed requests via DefaultRequestHandler, task lifecycle through EventQueue. Protocol isn't bolted on; it's the foundation.
Principle 02
Five-Layer Separation
Server (Starlette) → Handler (a2a-sdk) → Executor (bridge) → Agent (LangChain) → Pipeline (LangGraph). Each layer can be tested and replaced independently.
Principle 03
Thin Orchestrators
Local ADK agents are lightweight coordinators. All domain logic lives in remote services. The orchestrator decides what to call, not how to do it.
Principle 04
Dynamic Discovery
MCP consumer discovers tools at runtime via tools/list. A2A consumer discovers capabilities via AgentCard. No hardcoded knowledge of remote surfaces.
Principle 05
Local-to-Cloud Parity
Same code runs locally (uvicorn), in Docker (compose), and on AgentCore (ECR). The /invocations bridge is the only cloud-specific addition.
Principle 06
Security by Default
SigV4 signing for all production calls. Secrets abstracted behind get_secret() with env-var fallback. .env files excluded from Git and Docker context.
Principle 07
Performance-Aware ML
Quantized ONNX models (int8, no PyTorch), background warmup, producer-consumer scraping, parallel ThreadPoolExecutors throughout the pipeline.
Principle 08
Progressive Infrastructure
Local env vars now, AWS Secrets Manager for production, Bedrock Memory service module prepared for future activation. Built to grow without rewrites.
Explore the Source
Full A2A server, AgentCard, executor, LangGraph pipeline, ONNX models, MCP tools, and deployment scripts.