MCP Agent Attestation Extension
MCP authenticates users but not agents. When a server receives a connection claiming to be Claude, there's no cryptographic proof that it actually is. I identified the gap, designed an attestation protocol extension, and built working implementations in Python and TypeScript.
The problem
MCP handles user authorization via OAuth 2.1 but has no agent identity verification. A server can't distinguish a legitimate Claude instance from a compromised variant, can't verify who deployed the agent, and can't confirm the provenance chain is intact. As agents gain real privileges — file access, code execution, API calls — this becomes a security liability, not a theoretical concern.
Two recent systems show what this looks like in practice. Moltbook launched in January 2026 as a social network exclusively for AI agents and reported 1.65 million registered agents in its first week. The entire trust model was requesting an API key and posting a code on X. Wiz found that the Supabase backend was misconfigured without Row Level Security, exposing 1.5 million API tokens and 35,000 email addresses with full read/write access. The agents behind those tokens turned out to belong to roughly 17,000 humans averaging 88 bots each, and nothing prevented anyone from spawning millions more in a loop.
Researchers found that 2.6% of posts contained hidden prompt injection payloads: agents instructing other agents to leak credentials, delete their own accounts, or run crypto pump schemes. Because agents fetch remote skill files and execute them, a single compromised post could propagate across the network. Researchers also documented time-shifted attacks, in which payloads stored in agent memory activated days later when conditions aligned.
Coinbase's x402 protocol exposes the payment side of the same gap. It lets agents make autonomous micropayments over HTTP and has processed more than 75 million transactions, with backing from Cloudflare, Vercel, and Google. But x402 explicitly doesn't handle identity: a payment proves wallet authority, not agent legitimacy. A compromised agent making unauthorized payments looks identical to a legitimate one.
Both expose the same structural problem. The ecosystem is building execution capabilities — communication, payments — faster than governance capabilities. Identity is the missing layer.
What I built
Protocol extension. JWT-based attestation tokens with Ed25519 signatures, delivered through the experimental field of MCP's capability negotiation, so no core protocol changes are required. SPIFFE-compatible subject identifiers for enterprise interoperability. JWKS key distribution so servers can verify tokens independently.
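A minimal Python sketch of what such a token might look like, assuming standard JWS compact serialization (base64url header, payload, and signature joined by dots) with alg EdDSA. The signer is passed in as a function because the Python standard library has no Ed25519; a real deployment would wrap an Ed25519 private key from a crypto library. The model claim, the agentAttestation capability key, and its version field are hypothetical names, not taken from the project's spec.

```python
import base64
import json
import time
import uuid
from typing import Callable


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_attestation_token(
    sign: Callable[[bytes], bytes],  # assumed to wrap an Ed25519 private key
    kid: str,
    issuer: str,
    spiffe_id: str,
    audience: str,
    model: str,
    ttl_seconds: int = 300,
) -> str:
    """Assemble header.payload.signature for a short-lived attestation JWT."""
    now = int(time.time())
    header = {"alg": "EdDSA", "typ": "JWT", "kid": kid}
    claims = {
        "iss": issuer,
        "sub": spiffe_id,          # SPIFFE ID, e.g. spiffe://<trust-domain>/<path>
        "aud": audience,           # the MCP server this token is meant for
        "iat": now,
        "exp": now + ttl_seconds,
        "jti": str(uuid.uuid4()),  # unique token ID, used for replay detection
        "model": model,            # hypothetical claim name
    }
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = sign(signing_input.encode("ascii"))
    return signing_input + "." + b64url(signature)


def attestation_capabilities(token: str) -> dict:
    """Client capabilities carrying the token under MCP's experimental field.

    The "agentAttestation" key and its shape are hypothetical.
    """
    return {"experimental": {"agentAttestation": {"version": "0.1", "token": token}}}
```

For a quick structural check, a placeholder signer such as lambda data: bytes(64) produces a well-formed token whose signature is meaningless; only a real Ed25519 signer yields something a server can verify against the issuer's JWKS.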
Dual implementation. Python (140 tests) and TypeScript (68 tests) libraries with full MCP SDK integration: AttestingClientSession injects tokens during the initialization handshake, and AttestingServer verifies them with configurable policy enforcement (required, preferred, or optional). Production concerns are addressed with a Redis-backed distributed replay cache, circuit breakers on JWKS fetching, and Prometheus-compatible observability.
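The three policy modes and the JWKS circuit breaker can be sketched in a few lines of Python. The admission semantics below are an assumption (a token that is present but invalid is always rejected, and a missing token is rejected only under required), and the breaker is a generic consecutive-failure design, not the project's actual implementation.

```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class Policy(Enum):
    REQUIRED = "required"
    PREFERRED = "preferred"
    OPTIONAL = "optional"


def admit(policy: Policy, token_present: bool, token_valid: bool) -> Tuple[bool, str]:
    """Return (admitted, reason) for one incoming connection.

    Assumed semantics: a present-but-invalid token is rejected under every
    policy; a missing token is rejected only under REQUIRED.
    """
    if token_present and not token_valid:
        return False, "invalid attestation"
    if token_present:
        return True, "attested"
    if policy is Policy.REQUIRED:
        return False, "attestation required"
    if policy is Policy.PREFERRED:
        return True, "unattested (flagged)"  # admitted, but marked for logging
    return True, "unattested"


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; while open, calls fail fast
    until `cooldown` seconds pass, after which a trial call is let through."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fetch: Callable[[], dict]) -> dict:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: skipping JWKS fetch")
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = fetch()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:  # a failed trial call re-opens
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```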
Attack harness. 8 vectors covering model spoofing, provenance forgery, token replay, tampering, issuer typosquatting, downgrade attacks, audience mismatch, and safety level manipulation. 100% detection rate across both implementations.
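A sketch of the claims-level half of such a harness, assuming the Ed25519 signature has already been verified so that only validly signed but misused tokens remain. It covers five of the eight vectors (treating "downgrade" as an alg downgrade, which is an assumption) plus an expired-token case, each as a mutation of a known-good token; replay, tampering, and provenance forgery are left to the replay cache and the signature check. The issuer, audience, leeway, and safety_level claim are hypothetical placeholders.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical configuration for one MCP server.
TRUSTED_ISSUERS = {"https://attest.provider.example"}
SERVER_AUDIENCE = "mcp://files.example"
LEEWAY = 30           # seconds of tolerated clock skew (assumed)
MIN_SAFETY_LEVEL = 1  # hypothetical policy floor


def check_claims(header: Dict, claims: Dict, claimed_model: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return a rejection reason, or None if the token's claims pass.

    Assumes the Ed25519 signature was verified before this runs.
    """
    now = time.time() if now is None else now
    if header.get("alg") != "EdDSA":               # e.g. an alg "none" downgrade
        return "unexpected alg"
    if claims.get("iss") not in TRUSTED_ISSUERS:   # exact match defeats typosquats
        return "untrusted issuer"
    if claims.get("aud") != SERVER_AUDIENCE:
        return "audience mismatch"
    if now > claims.get("exp", 0) + LEEWAY:
        return "expired"
    if claims.get("iat", now) > now + LEEWAY:
        return "issued in the future"
    if claims.get("model") != claimed_model:       # connection and token disagree
        return "model mismatch"
    if claims.get("safety_level", 0) < MIN_SAFETY_LEVEL:
        return "safety level below policy"
    return None


def baseline(now: float) -> Tuple[Dict, Dict]:
    """A known-good header and claim set to mutate."""
    header = {"alg": "EdDSA", "typ": "JWT", "kid": "k1"}
    claims = {"iss": "https://attest.provider.example", "aud": SERVER_AUDIENCE,
              "iat": now, "exp": now + 300, "model": "model-a", "safety_level": 2}
    return header, claims


# Each mutation models one attack against the baseline token.
MUTATIONS: Dict[str, Callable[[Dict, Dict], None]] = {
    "model spoofing": lambda h, c: c.update(model="model-b"),
    "issuer typosquatting": lambda h, c: c.update(iss="https://attest.pr0vider.example"),
    "audience mismatch": lambda h, c: c.update(aud="mcp://other.example"),
    "alg downgrade": lambda h, c: h.update(alg="none"),
    "safety level manipulation": lambda h, c: c.update(safety_level=0),
    "expired token": lambda h, c: c.update(exp=c["iat"] - 3600),
}


def run_harness(now: float) -> List[str]:
    """Return the names of mutations that were NOT rejected (ideally none)."""
    h, c = baseline(now)
    assert check_claims(h, c, "model-a", now) is None, "baseline must pass"
    missed = []
    for name, mutate in MUTATIONS.items():
        h, c = baseline(now)
        mutate(h, c)
        if check_claims(h, c, "model-a", now) is None:
            missed.append(name)
    return missed
```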
Spec and threat analysis. Full technical specification, security audit, and a threat analysis that's honest about what attestation doesn't solve — behavioral verification, key compromise detection, malicious operators. The limitations section matters more than the features list.
What this solves and what it doesn't
Attestation gives you connection-time identity verification. A Moltbook server running this could have verified that an agent claiming to be Claude was actually an attested Claude instance, rejected agents with no provenance chain, and exposed the 88:1 bot farms, since the provenance chain records who deployed each agent. An x402 service could layer attestation on top of payment verification, knowing not just that a wallet paid but that the agent behind it is who it claims to be.
But attestation wouldn't have stopped the prompt injection attacks. An agent could pass attestation at connection time and then have its behavior modified by reading a malicious post — attestation confirms identity, not runtime behavior. The time-shifted attacks on Moltbook, where payloads sat in memory for days before activating, operate entirely below the identity layer. Attestation is one layer of a governance stack that doesn't fully exist yet.
And none of it matters unless model providers actually participate. The attestation protocol requires inference providers — Anthropic, OpenAI, Google — to sign tokens attesting that a given agent is running their model through their infrastructure. Without that signature, there's nothing to verify. This is the cold-start problem at the center of the design.
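What "nothing to verify" means mechanically: a server can only check a token if it trusts the issuer and can find the signing key in that issuer's JWKS. The Python sketch below assumes a trust store mapping issuer URLs to JWKS documents and uses the RFC 8037 encoding for Ed25519 keys (kty OKP, crv Ed25519, base64url-encoded x). The header and issuer are read unverified only to select a key; the signature check itself would then run with an Ed25519 library, which is not shown.

```python
import base64
import json
from typing import Dict, Optional


def b64url_decode(s: str) -> bytes:
    """Decode base64url, restoring the padding JWS strips."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def select_key(trust_store: Dict[str, Dict], token: str) -> Optional[bytes]:
    """Return the raw 32-byte Ed25519 public key for the token's issuer and kid.

    Returns None when the issuer is not a participating provider or no matching
    key exists; in either case there is nothing to verify the token against.
    """
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))     # unverified: key selection only
    claims = json.loads(b64url_decode(payload_b64))
    jwks = trust_store.get(claims.get("iss"))
    if jwks is None:
        return None  # issuer has not published keys this server trusts
    for jwk in jwks.get("keys", []):
        if (jwk.get("kid") == header.get("kid")
                and jwk.get("kty") == "OKP" and jwk.get("crv") == "Ed25519"):
            key = b64url_decode(jwk["x"])
            return key if len(key) == 32 else None  # Ed25519 public keys are 32 bytes
    return None
```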
What makes it interesting is that providers have real incentive to do this. Attestation is a moat. If Anthropic signs attestation tokens and OpenAI doesn't, any platform that cares about agent identity — enterprise MCP deployments, regulated x402 payment flows, anything with compliance requirements — has a reason to prefer Anthropic-backed agents. The signature becomes a trust signal that differentiates providers in a market where model quality is converging. It's the same logic as code signing certificates or verified app stores: the provider who establishes the trust infrastructure captures the ecosystem lock-in.
The flip side is that providers also have reasons to hesitate. Signing attestation tokens creates liability — if an attested Claude instance gets prompt-injected and does damage, the attestation signature becomes evidence that Anthropic vouched for it. There's a scope question providers have to answer: are you attesting to identity (this is Claude), capability (this Claude instance can do X), or behavior (this Claude instance is acting within bounds)? Each level carries different risk. The protocol as designed only attests to identity, which is the safest scope for providers to adopt, but the pressure to expand scope will come quickly once the infrastructure exists.
What I learned
The interesting question isn't implementation; it's scope. Ed25519 and JWTs are solved problems. The hard part is deciding what you're attesting to. Identity at connection time is useful but insufficient: it says nothing about behavior during the session. Moltbook showed this. Even if every agent had been attested, the prompt injection and context poisoning attacks would still have worked.
Attack harnesses are better than threat models. Writing the 8-vector harness exposed edge cases, such as clock skew interacting with replay windows and Redis failover creating distributed replay gaps, that I wouldn't have found from the defensive side alone. The harness is the real design document. Moltbook itself was an unintentional attack harness at scale: 506 documented injection attacks surfaced failure modes that controlled testing wouldn't have.
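The clock-skew edge case is easy to state in code. If a validator accepts tokens until exp plus a skew leeway, but the replay cache forgets a jti at exp, the same token can be replayed for leeway seconds. The sketch below is an in-memory stand-in, assuming a shared store such as Redis in production, where the equivalent atomic step would be a SET with the NX and EX options.

```python
import time
from typing import Callable, Dict


class ReplayCache:
    """In-memory stand-in for a shared replay cache keyed by jti.

    Entries live until exp + leeway, matching the validator's acceptance
    window; evicting at exp would leave a `leeway`-second replay gap.
    """

    def __init__(self, leeway: float = 30.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.leeway = leeway
        self.clock = clock
        self._seen: Dict[str, float] = {}  # jti -> eviction time

    def _evict(self, now: float) -> None:
        for jti in [j for j, evict_at in self._seen.items() if evict_at <= now]:
            del self._seen[jti]

    def check_and_store(self, jti: str, exp: float) -> bool:
        """Return True on first use of jti, False on a replay.

        With Redis, the equivalent atomic step is SET <jti> 1 NX EX <ttl>,
        with ttl = exp + leeway - now.
        """
        now = self.clock()
        self._evict(now)
        if jti in self._seen:
            return False
        self._seen[jti] = exp + self.leeway
        return True
```

With exp at 1010 and a 30-second leeway, a replay at 1020 is still inside the validator's acceptance window, so the cache must still hold the jti; only after 1040 can the entry safely go, because the validator itself rejects the token as expired from then on.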
Protocol timing matters. MCP is young enough that the experimental field exists specifically for extensions like this. Moltbook went from zero to 1.65 million agents in a week. x402 has processed 75 million transactions. Governance infrastructure is lagging everywhere. Contributing research now has more leverage than contributing code later. The provider incentive alignment — attestation as competitive moat — is strongest during the current land-grab phase where ecosystem norms are still forming.
Built with