
A2A Agent Orchestration on AWS Bedrock AgentCore

A deep dive into building a production-grade A2A (Agent-to-Agent) research service — from AgentCard construction and Starlette server setup, through the ResearchAgentExecutor bridge, to a 3-node LangGraph pipeline with ONNX-based semantic dedup. Deployed on AWS Bedrock AgentCore with SigV4 authentication. Includes an MCP fashion tool server for protocol comparison.

A2A v1.0 · MCP v1.0 · Google ADK · LangGraph · AWS Bedrock · ONNX Runtime · Starlette · SigV4
Contents
  1. Project Overview & Architecture
  2. AgentCard Construction
  3. A2A Server Architecture
  4. ResearchAgentExecutor & Task Lifecycle
  5. A2A Client & SigV4 Authentication
  6. A2A Protocol Deep Dive
  7. LangGraph Research Pipeline
  8. Semantic Dedup & ONNX Inference
  9. MCP Fashion Server (Protocol Comparison)
  10. Deployment & Infrastructure
  11. Tech Stack
  12. Design Principles

This project demonstrates how to build, deploy, and orchestrate agent systems using the A2A (Agent-to-Agent) v1.0 protocol on AWS Bedrock AgentCore. The primary focus is a fully layered A2A research service — from protocol-compliant AgentCard discovery through a LangGraph pipeline with quantized ONNX models — with an MCP fashion tool server included for protocol comparison.


System topology

USER { research query }
  ↓
🤖 Google ADK Local Agent Orchestrator
  ↓
🔐 A2A Client + AWS SigV4 Auth
  ↓  A2A v1.0
☁️ Bedrock AgentCore Runtime (AWS)
  ↓
📡 A2A Starlette Server
  ↓
⚙️ ResearchAgentExecutor (bridge)
  ↓
🔗 LangChain Tool-Calling Agent
  ↓
🔬 LangGraph 3-Node Pipeline (DAG)
  ↓
OUTPUT { research_summary artifact }

The AgentCard is the A2A protocol's discovery mechanism — a structured JSON document served at /.well-known/agent-card.json (RFC 8615) that describes the agent's capabilities, skills, supported input/output modes, and endpoint URL.

AgentSkill definition

from a2a.types import AgentSkill

skill = AgentSkill(
    id="research",
    name="Deep Research",
    description="Search, scrape, deduplicate, and summarize web content on any topic.",
    tags=["research", "web", "summarization"],
    examples=[
        "Explain quantum computing in simple terms",
        "What are the latest advances in renewable energy?",
    ],
    input_modes=["text/plain"],
    output_modes=["text/plain"],
)

AgentCard assembly

from a2a.types import AgentCapabilities, AgentCard

agent_card = AgentCard(
    name="Research Agent",
    description="LangGraph research pipeline: search -> scrape -> dedup -> summarize.",
    url=BASE_URL,  # e.g. http://localhost:8080
    version="1.0.0",
    default_input_modes=["text/plain"],
    default_output_modes=["text/plain"],
    capabilities=AgentCapabilities(streaming=False),
    skills=[skill],
)

| Field | Value | Purpose |
|---|---|---|
| name | Research Agent | Display name for discovery UIs |
| url | BASE_URL | Endpoint where the agent accepts JSON-RPC requests |
| version | 1.0.0 | Semantic versioning for capability negotiation |
| capabilities | streaming=False | Declares sync message/send support (no SSE streaming) |
| skills | [research] | List of AgentSkill objects with examples and tags |
| default_input_modes | ["text/plain"] | MIME types the agent accepts |
| default_output_modes | ["text/plain"] | MIME types the agent produces |
Discovery is automatic. The A2AStarletteApplication from the a2a-sdk automatically intercepts GET requests to /.well-known/agent-card.json and returns the AgentCard as JSON. No manual route registration is needed — the SDK handles RFC 8615 compliance.

The server is built as a Starlette ASGI application using the official a2a-sdk, with three integration points: the SDK's request handler, a health endpoint, and an AgentCore compatibility bridge.

Server assembly

from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore

# 1. Wire executor + task store into the A2A request handler
request_handler = DefaultRequestHandler(
    agent_executor=ResearchAgentExecutor(),
    task_store=InMemoryTaskStore(),
)

# 2. Build the Starlette app with AgentCard + handler
server = A2AStarletteApplication(
    agent_card=agent_card,
    http_handler=request_handler,
)
app = server.build()

# 3. Add health check + AgentCore /invocations bridge
app.add_route("/ping", ping, methods=["GET"])
app.add_route("/invocations", server._handle_requests, methods=["POST"])

Routes exposed

| Endpoint | Method | Handler | Purpose |
|---|---|---|---|
| /.well-known/agent-card.json | GET | A2AStarletteApplication | RFC 8615 discovery: returns AgentCard JSON |
| / | POST | _handle_requests | JSON-RPC 2.0 endpoint (message/send, message/stream) |
| /invocations | POST | _handle_requests | AgentCore compatibility: mirrors the JSON-RPC handler |
| /ping | GET | ping() | Health check for Docker/AgentCore probes |
The /invocations bridge is the key to Bedrock AgentCore compatibility. AgentCore routes POST /runtimes/{arn}/invocations to the container's /invocations path. Because the JSON-RPC handler is mirrored at both / and /invocations, the same code works locally and on AgentCore with zero changes.

DefaultRequestHandler responsibilities

The SDK's DefaultRequestHandler sits between the ASGI routes and the executor. It parses and validates incoming JSON-RPC 2.0 requests (message/send, message/stream), creates or resumes Tasks in the configured task store (here, InMemoryTaskStore), invokes the AgentExecutor, relays its EventQueue events back to the client, and serializes results and protocol errors per the JSON-RPC 2.0 spec.

The executor is the critical bridge between the A2A protocol layer and the LangChain agent. It implements the AgentExecutor interface from the a2a-sdk and manages the full task lifecycle through the EventQueue.

Execution sequence

Step 1: Create or retrieve the Task from context.current_task; emit the initial Task event to the EventQueue.
Step 2: Emit TaskStatusUpdateEvent with state=TaskState.working and final=False to signal that processing has started.
Step 3: Extract the user query from context.message.parts[0].root.text (the A2A Message Part oneOf pattern).
Step 4: Offload the blocking LangChain agent to a thread pool: await loop.run_in_executor(None, run_agent, query).
Step 5: Emit TaskArtifactUpdateEvent with new_text_artifact(name="research_summary", text=result) and last_chunk=True.
Step 6: Emit the final TaskStatusUpdateEvent with state=TaskState.completed and final=True.
On error: Catch the exception and emit TaskStatusUpdateEvent with state=TaskState.failed and the error message.

Method signature

from a2a.server.agent_execution import AgentExecutor, RequestContext
from a2a.server.events import EventQueue

class ResearchAgentExecutor(AgentExecutor):
    """Bridges the A2A v1.0 protocol to the LangChain research agent."""

    async def execute(
        self,
        context: RequestContext,    # task_id, context_id, message, current_task
        event_queue: EventQueue,    # async queue for streaming events back
    ) -> None:
        ...                         # see the execution sequence above

    async def cancel(self, context, event_queue) -> None:
        raise Exception("cancel not supported")

Event streaming pattern

The executor never returns a value to the client. Instead, it enqueues events on the EventQueue; the request handler drains that queue, streaming each event over SSE for message/stream and folding the final status and artifacts into a single JSON-RPC response for message/send.

Thread pool bridging is essential. The LangChain agent is synchronous, but the A2A server is async (Starlette). The executor wraps the blocking run_agent() call in loop.run_in_executor() to avoid blocking the async event loop — a critical detail for production deployments.
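
Putting the sequence, signature, and threading notes together, here is a condensed sketch of the execute() body. The helper names (new_task, new_text_artifact) come from a2a.utils, but exact signatures and attribute casing vary by a2a-sdk version; treat this as illustrative rather than verbatim repo code.

import asyncio

from a2a.types import (
    TaskArtifactUpdateEvent,
    TaskState,
    TaskStatus,
    TaskStatusUpdateEvent,
)
from a2a.utils import new_task, new_text_artifact

async def execute(self, context, event_queue) -> None:
    task = context.current_task or new_task(context.message)         # Step 1
    await event_queue.enqueue_event(task)
    await event_queue.enqueue_event(TaskStatusUpdateEvent(           # Step 2
        task_id=task.id, context_id=task.context_id,
        status=TaskStatus(state=TaskState.working), final=False,
    ))
    try:
        query = context.message.parts[0].root.text                   # Step 3
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, run_agent, query)  # Step 4
        await event_queue.enqueue_event(TaskArtifactUpdateEvent(     # Step 5
            task_id=task.id, context_id=task.context_id, last_chunk=True,
            artifact=new_text_artifact(name="research_summary", text=result),
        ))
        await event_queue.enqueue_event(TaskStatusUpdateEvent(       # Step 6
            task_id=task.id, context_id=task.context_id,
            status=TaskStatus(state=TaskState.completed), final=True,
        ))
    except Exception:                                                # On error
        await event_queue.enqueue_event(TaskStatusUpdateEvent(
            task_id=task.id, context_id=task.context_id,
            status=TaskStatus(state=TaskState.failed), final=True,
        ))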

The client supports three modes: discovery, synchronous message/send, and streaming message/stream. For production deployments, all requests are signed with AWS SigV4.

Discovery

import httpx

def discover(base_url: str) -> dict:
    """Fetch the AgentCard from the RFC 8615 well-known path."""
    url = f"{base_url}/.well-known/agent-card.json"
    resp = httpx.get(url, timeout=30.0)
    return resp.json()

message/send — synchronous JSON-RPC request

import uuid

payload = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "message/send",
    "params": {
        "message": {
            "messageId": str(uuid.uuid4()),
            "role": "user",
            "parts": [{"kind": "text", "text": query}],
        }
    },
}
resp = httpx.post(f"{base_url}/", json=payload, timeout=300)

message/stream — SSE streaming

import json

def stream_message(url: str, payload: dict):
    """Yield parsed SSE events from a message/stream request."""
    with httpx.stream(
        "POST", url, json=payload,
        headers={"Accept": "text/event-stream"},
    ) as resp:
        for line in resp.iter_lines():
            if line.startswith("data:"):
                yield json.loads(line[5:].strip())

SigV4 signing for production

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# AWS SigV4 authentication for Bedrock AgentCore
_session = boto3.Session()
_credentials = _session.get_credentials()

aws_req = AWSRequest(
    method="POST",
    url=f"{DP_ENDPOINT}/runtimes/{AGENTCORE_ARN}/invocations?qualifier=DEFAULT",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": session_id,
    },
)
SigV4Auth(
    _credentials.get_frozen_credentials(),
    "bedrock-agentcore",
    AWS_REGION,
).add_auth(aws_req)
Pragmatic protocol adaptation. In production, the A2A client constructs the AgentCard directly when the runtime ARN is already known — rather than assuming AgentCore will expose card discovery over GET. This avoids a brittle assumption while staying protocol-compliant.
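
After add_auth() stamps the Authorization and X-Amz-Date headers onto the AWSRequest, the request still has to be dispatched as ordinary HTTP. A minimal sketch of that hand-off using the same httpx client as above; the header copy is the only non-obvious step:

# Dispatch the signed request: copy the SigV4 headers onto a plain POST
resp = httpx.post(
    aws_req.url,
    content=payload,                 # must be byte-identical to what was signed
    headers=dict(aws_req.headers),   # includes Authorization + X-Amz-Date
    timeout=300,
)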

The A2A v1.0 protocol defines a structured way for agents to discover each other, exchange messages, and manage task lifecycles. Here's how each concept is implemented.

Task state machine

Created → Working → Artifact → Completed   (happy path)
Created → Working → Failed                 (on error)

Event types used

| Event | When Emitted | Key Fields |
|---|---|---|
| Task | Task created or retrieved | task_id, context_id, message |
| TaskStatusUpdateEvent | State transitions (working, completed, failed) | state, final, message |
| TaskArtifactUpdateEvent | Research results available | artifact.name, artifact.parts, last_chunk |

Message structure

# A2A Message uses the Part oneOf pattern
message = {
    "messageId": "uuid-here",
    "role": "user",                # or "agent"
    "parts": [
        {"kind": "text", "text": "Research quantum computing"}
    ],
    "contextId": "ctx-uuid",       # optional, for multi-turn
}

# Artifact structure
artifact = new_text_artifact(
    name="research_summary",   # identifier
    text=result_string,        # content
)
JSON-RPC 2.0 throughout. Every A2A request follows the JSON-RPC 2.0 spec with jsonrpc, id, method, and params fields. Methods include message/send (synchronous) and message/stream (SSE). The SDK handles serialization, deserialization, and error responses.
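
For reference, a successful message/send round-trip returns the completed Task as the JSON-RPC result. The shape below is illustrative; the values are placeholders, not captured output:

# Illustrative message/send response envelope
response = {
    "jsonrpc": "2.0",
    "id": "<request-id>",            # echoes the request id
    "result": {                      # the completed Task
        "id": "<task-id>",
        "contextId": "<ctx-uuid>",
        "kind": "task",
        "status": {"state": "completed"},
        "artifacts": [{
            "name": "research_summary",
            "parts": [{"kind": "text", "text": "..."}],
        }],
    },
}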

LangGraph Research Pipeline

The core pipeline is a 3-node LangGraph StateGraph with Pydantic-typed state (ResearchState). Each node has a single responsibility.

Graph construction

from langgraph.graph import END, StateGraph

g = StateGraph(ResearchState)
g.add_node("research", research.run)
g.add_node("dedup", dedup.run)
g.add_node("summarizer", summarizer.run)
g.set_entry_point("research")
g.add_edge("research", "dedup")
g.add_edge("dedup", "summarizer")
g.add_edge("summarizer", END)
compiled_graph = g.compile()

Pipeline state model

from typing import Dict, List, Optional

from pydantic import BaseModel

class ResearchState(BaseModel):
    user_query: str                   # Raw user query string
    topic: str = ""                   # Extracted research topic
    desired_length: int = 500         # Target word count (default: 500)
    raw_research_data: str = ""       # Assembled corpus from web scraping
    ranked_chunks: List[Dict] = []    # Top-k deduplicated, reranked chunks
    research_output: str = ""         # Final summarized research
    last_error: Optional[str] = None  # Latest error message
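
A usage sketch: assuming the defaults above (only user_query is required), LangGraph accepts a plain dict for a Pydantic state schema and returns the final state values as a dict.

# Run the pipeline end to end (illustrative query)
final_state = compiled_graph.invoke(
    {"user_query": "Explain quantum computing in simple terms"}
)
print(final_state["research_output"])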

Research Node

Parses the user query into topic and desired_length, runs web searches (Google CSE with Tavily fallback), and scrapes the results into raw_research_data. Scraping overlaps with searching and with ONNX model warmup (see Performance optimizations below).

Summarizer Node

Condenses the top-ranked chunks into research_output, targeting roughly desired_length words with the OpenAI summarization model.

Semantic Dedup & ONNX Inference

The dedup node is where ML meets engineering — quantized ONNX models running on CPU without PyTorch, with FAISS-based deduplication and CrossEncoder reranking.

🧠
Embedding Model
all-MiniLM-L6-v2 quantized to int8 ONNX. 384-dim embeddings via onnxruntime CPUExecutionProvider. Pre-bundled in the Docker image (inference sketched after these cards).
🎯
Reranker Model
cross-encoder/ms-marco-MiniLM-L-6-v2 quantized to int8 ONNX. Scores every chunk against the research topic for relevance.
📊
FAISS Deduplication
IndexFlatIP for cosine similarity. Threshold 0.85 removes near-duplicates while preserving diverse perspectives. Prefers longer chunks.
✂️
Chunking
RecursiveCharacterTextSplitter: 500-word chunks, 100-word overlap, min 50 words. Returns top 25 chunks after reranking.
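
To make the embedding card concrete, here is a minimal sketch of CPU-only int8 inference. The file paths, input names, and mean-pooling step are assumptions based on the standard all-MiniLM-L6-v2 ONNX export, not code lifted from this repo:

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tok = Tokenizer.from_file("models/embedder/tokenizer.json")      # assumed path
sess = ort.InferenceSession("models/embedder/model_int8.onnx",   # assumed path
                            providers=["CPUExecutionProvider"])

def embed(texts: list[str]) -> np.ndarray:
    """Mean-pooled, L2-normalized 384-dim sentence embeddings."""
    enc = [tok.encode(t) for t in texts]
    n = max(len(e.ids) for e in enc)
    ids = np.array([e.ids + [0] * (n - len(e.ids)) for e in enc], dtype=np.int64)
    mask = np.array(
        [e.attention_mask + [0] * (n - len(e.attention_mask)) for e in enc],
        dtype=np.int64,
    )
    hidden = sess.run(None, {
        "input_ids": ids,
        "attention_mask": mask,
        "token_type_ids": np.zeros_like(ids),
    })[0]                                             # (batch, seq, 384)
    m = mask[..., None].astype(np.float32)            # zero out padding tokens
    pooled = (hidden * m).sum(axis=1) / np.clip(m.sum(axis=1), 1, None)
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)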

Processing pipeline

# In-memory dedup pipeline
raw_corpus (str)
  → chunks (List[Dict])           # 500-word overlapping segments
  → embeddings (np.ndarray)       # ONNX all-MiniLM-L6-v2 (int8)
  → faiss_index (IndexFlatIP)     # Cosine similarity, threshold 0.85
  → unique_chunks (List)          # Near-duplicates removed
  → ranked_chunks (List)          # CrossEncoder reranking vs topic
  → state["ranked_chunks"]        # Top 25 with relevance scores
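
A sketch of the dedup core under those settings. Sorting longest-first implements the "prefers longer chunks" behavior; the chunk dict's "text" key and the function boundaries are assumptions:

import faiss
import numpy as np

def dedup_chunks(chunks: list[dict], embeddings: np.ndarray,
                 threshold: float = 0.85) -> list[dict]:
    """Greedy near-duplicate removal via cosine similarity on IndexFlatIP."""
    order = sorted(range(len(chunks)),
                   key=lambda i: len(chunks[i]["text"]), reverse=True)
    emb = np.ascontiguousarray(embeddings[order], dtype=np.float32)
    faiss.normalize_L2(emb)                  # inner product == cosine after L2 norm
    index = faiss.IndexFlatIP(emb.shape[1])  # 384 dims for all-MiniLM-L6-v2
    kept = []
    for row, i in enumerate(order):
        vec = emb[row : row + 1]
        if index.ntotal:
            sims, _ = index.search(vec, 1)   # similarity to the closest kept chunk
            if sims[0, 0] >= threshold:      # near-duplicate of a longer chunk: drop
                continue
        index.add(vec)
        kept.append(chunks[i])
    return kept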

Performance optimizations

1️⃣
Background Model Warmup
Research node kicks off ONNX model loading in a separate thread while scraping runs. Both embedder and reranker are warm by the time dedup starts, saving ~8 seconds of cold I/O (pattern sketched after this list).
2️⃣
Producer-Consumer Scraping
Search and scrape tasks are decoupled. URLs enter the scrape pool as each search completes — scraping starts before all searches finish.
3️⃣
Background Reranker
Dedup node starts loading CrossEncoder in background while embeddings compute. Reranker is ready by the time it's needed.
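
A minimal sketch of the warmup pattern behind items 1 and 3; load_embedder and load_reranker are hypothetical loader names, not functions from the repo:

from concurrent.futures import ThreadPoolExecutor

_warmup = ThreadPoolExecutor(max_workers=2)

# Submitted by the research node while scraping is still running
_embedder_future = _warmup.submit(load_embedder)   # hypothetical ONNX session loader
_reranker_future = _warmup.submit(load_reranker)   # hypothetical CrossEncoder loader

def get_embedder():
    # Blocks only if warmup hasn't finished; a no-op in the common case
    return _embedder_future.result()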
No PyTorch, no GPU required. Both models are pre-exported to quantized ONNX and bundled in the Docker image. This eliminates ~2 GB of PyTorch dependencies, enables CPU-only containers, and cuts cold-start time dramatically.

MCP Fashion Server (Protocol Comparison)

The MCP track demonstrates the tool-provider pattern alongside A2A — showing when to use each protocol. MCP is for tools; A2A is for autonomous agents.

When to use A2A: the remote side is an autonomous agent with a task lifecycle, status transitions, and structured artifacts.

When to use MCP: the remote side is a tool provider whose capabilities are listed, discovered, and called individually via JSON-RPC.

MCP tools exposed

| Tool | Purpose |
|---|---|
| get_product_tags | Look up product attributes and tags from the catalog |
| search_catalog | Search products by query, gender, category, price |
| generate_description | LLM-powered marketing copy generation |

Dynamic tool discovery

The local fashion agent doesn't hardcode tool wrappers. It calls tools/list, parses JSON schemas, and dynamically builds Google ADK FunctionTool wrappers at runtime — adapting to whatever tools are deployed.

from google.adk.tools import FunctionTool

# Dynamic MCP tool wrapping
for tool in mcp_client.tools_list():
    schema = tool["inputSchema"]
    params = build_params_from_schema(schema)
    adk_tool = FunctionTool(func=make_handler(tool["name"]))
    # The agent can now call this tool naturally
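
A hypothetical sketch of what make_handler() might look like; the mcp_client method name is an assumption (the wire-level MCP method is tools/call):

def make_handler(tool_name: str):
    """Build a plain function that forwards keyword args to the MCP server."""
    def handler(**kwargs):
        # JSON-RPC tools/call with the discovered tool name (client method assumed)
        return mcp_client.tools_call(tool_name, arguments=kwargs)
    handler.__name__ = tool_name   # ADK derives the tool's name from the function
    return handler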

Deployment & Infrastructure

The same code runs in three environments with configuration changes only.

💻
Local Development
uvicorn on localhost:8080, .env files, direct HTTP. Fastest iteration.
🐳
Docker Container
Multi-stage build with uv (14 parallel installs), ARM64-native, pre-compiled bytecode, ONNX models bundled.
☁️
Bedrock AgentCore
ECR-hosted container, managed runtime, SigV4 auth, CloudWatch logging. deploy.sh automates build → push → update.
🔒
Secrets Management
Dual-mode: env vars locally, AWS Secrets Manager in production. get_secret() abstracts both behind a single interface.
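
A minimal sketch of the dual-mode get_secret() described in the Secrets Management card, assuming secrets share names across both backends; the Secrets Manager call is the standard boto3 API:

import os

import boto3

def get_secret(name: str) -> str:
    """Env var locally; AWS Secrets Manager in production."""
    if (value := os.environ.get(name)) is not None:
        return value
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=name)["SecretString"]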

Deployment pipeline

Source + ONNX → Docker Build → Amazon ECR → AgentCore Runtime → CloudWatch

Tech Stack

| Layer | Technology | Role |
|---|---|---|
| Protocol | A2A v1.0 | AgentCard discovery, JSON-RPC 2.0, task lifecycle, event streaming |
| Protocol | MCP v1.0 | Tool interoperability with dynamic discovery and SSE transport |
| Orchestrator | Google ADK | Local agents with LiteLLM, tool calling, and delegation |
| Pipeline | LangGraph | 3-node deterministic DAG with Pydantic typed state |
| A2A Server | Starlette + a2a-sdk | ASGI app with automatic AgentCard serving and /invocations bridge |
| MCP Server | FastMCP | Fashion tool registration with input schemas |
| Inference | ONNX Runtime | Quantized int8 embedding + reranking (no PyTorch) |
| Vector Index | FAISS | In-memory cosine similarity for semantic dedup |
| LLM | OpenAI | Research (nano) and summarization (mini) with cost tracking |
| Search | Google CSE + Tavily | Primary search with automatic fallback |
| Auth | AWS SigV4 | botocore-based request signing for all remote calls |
| Cloud | Bedrock AgentCore | Managed runtime, ECR, Secrets Manager, CloudWatch |

Design Principles

Principle 01
Protocol-First Architecture
A2A shapes every layer — AgentCard discovery, typed requests via DefaultRequestHandler, task lifecycle through EventQueue. Protocol isn't bolted on; it's the foundation.
Principle 02
Five-Layer Separation
Server (Starlette) → Handler (a2a-sdk) → Executor (bridge) → Agent (LangChain) → Pipeline (LangGraph). Each layer can be tested and replaced independently.
Principle 03
Thin Orchestrators
Local ADK agents are lightweight coordinators. All domain logic lives in remote services. The orchestrator decides what to call, not how to do it.
Principle 04
Dynamic Discovery
MCP consumer discovers tools at runtime via tools/list. A2A consumer discovers capabilities via AgentCard. No hardcoded knowledge of remote surfaces.
Principle 05
Local-to-Cloud Parity
Same code runs locally (uvicorn), in Docker (compose), and on AgentCore (ECR). The /invocations bridge is the only cloud-specific addition.
Principle 06
Security by Default
SigV4 signing for all production calls. Secrets abstracted behind get_secret() with env-var fallback. .env files excluded from Git and Docker context.
Principle 07
Performance-Aware ML
Quantized ONNX models (int8, no PyTorch), background warmup, producer-consumer scraping, parallel ThreadPoolExecutors throughout the pipeline.
Principle 08
Progressive Infrastructure
Local env vars now, AWS Secrets Manager for production, Bedrock Memory service module prepared for future activation. Built to grow without rewrites.

Explore the Source

Full A2A server, AgentCard, executor, LangGraph pipeline, ONNX models, MCP tools, and deployment scripts.

View on GitHub