Your agent doesn’t need to scrape another listicle.
AI Demos serves AI-tool evaluations as structured, sourced evidence — over MCP.
Most AI-tool content is written for humans to read — so an agent has to scrape the page, summarize the prose, and infer a conclusion it can’t check. AI Demos exposes its evaluations as structured intelligence instead: your agent queries the actual observations, scores, evidence, and artifacts directly — sourced, reproducible answers it can reason over, not paragraphs it has to re-summarize and hope it got right.
AI Demos stores observations, not pages.
We’re building a structured intelligence platform for AI tools. Instead of publishing research only as web pages, we store every evaluation as a structured observation — a tool tested on a specific scenario, against a specific criterion, with the real input and captured output attached — that humans, LLMs, and agents can all query.
Pages, rankings, comparisons, use cases, MCP responses, and future APIs are all generated from the same underlying observations and evidence — one source of truth, many views over it.
Extract the evidence once. Expose it many ways.
A single observation can power every surface below — so we capture the evidence once instead of recreating the same knowledge in article after article.
Built for both humans and machines from the same source of truth. The advantage isn’t that we ignore humans; it’s that the same evidence powers both the human-readable research and the machine-readable interfaces. MCP, the evidence graph, and Markdown twins are the surfaces over that evidence. Below is how your agent reaches them.
Most “best AI tools for X” content is prose written to rank on Google.
An LLM that consumes it inherits its problems: claims with no provenance, stale tests, marketing language, and no way to tell a measured result from an opinion. AI Demos is the opposite by construction.
Evidence, not assertions
Every score and recommendation traces back to an observation — a tool tested on a specific scenario against a specific criterion, with the real input and output artifact captured. Your agent can pull that proof, not just the conclusion.
Structured, not scraped
You query typed fields — pricing, scores, verdicts, criteria, relationships — through function calls. No HTML parsing, no guessing which <div> holds the price.
Honest comparisons, enforced structurally
Same-input head-to-head results are flagged as such and kept separate from "comparable but tested on different inputs." Nothing is compared that wasn’t actually tested together.
Re-tested on a cadence
Observations are dated. The substrate is re-run over time, so freshness is a property of the data — not a publish date you have to trust.
The result: an agent grounded on AI Demos can cite a verdict and the artifact behind it — and a downstream user can verify it.
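To make the "honest comparisons" and "re-tested on a cadence" ideas concrete, here is a minimal sketch of how an agent might apply them on its own side. The `Cell` shape, the `compare` and `isFresh` helpers, and the pairing rule (same scenario and same criterion) are assumptions made for illustration; they are not the AI Demos implementation.

```typescript
// Hypothetical, trimmed cell shape (assumed for this sketch).
interface Cell {
  tool: string;       // tool slug, e.g. "llamaparse"
  scenario: string;   // scenario group_tag
  criterion: string;
  verdict: string;    // worked | mixed | struggled | failed
  tested_at: string;  // ISO date, e.g. "2026-05-22"
}

interface Comparison {
  headToHead: Array<{ a: Cell; b: Cell }>; // same scenario and criterion
  differentInputs: Cell[];                 // tested, but never on shared input
}

function cellKey(cell: Cell): string {
  return `${cell.scenario}::${cell.criterion}`;
}

// Pair two tools only where both were tested on the same scenario and criterion.
function compare(cells: Cell[], toolA: string, toolB: string): Comparison {
  const a: Record<string, Cell> = {};
  const b: Record<string, Cell> = {};
  for (const cell of cells) {
    if (cell.tool === toolA) a[cellKey(cell)] = cell;
    else if (cell.tool === toolB) b[cellKey(cell)] = cell;
  }
  const result: Comparison = { headToHead: [], differentInputs: [] };
  for (const key of Object.keys(a)) {
    if (b[key] !== undefined) result.headToHead.push({ a: a[key], b: b[key] });
    else result.differentInputs.push(a[key]);
  }
  for (const key of Object.keys(b)) {
    if (a[key] === undefined) result.differentInputs.push(b[key]);
  }
  return result;
}

// Freshness is derived from the observation date, not from a publish date.
function isFresh(cell: Cell, now: Date, maxAgeDays: number): boolean {
  const ageMs = now.getTime() - new Date(cell.tested_at).getTime();
  return ageMs <= maxAgeDays * 24 * 60 * 60 * 1000;
}
```

An agent could refuse to state "A beats B" unless `headToHead` is non-empty, and could drop cells that fail `isFresh` before citing them.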
Pick the surface that fits how your agent thinks.
Structured fields and evidence over MCP, clean narrative as Markdown twins, and one evidence model underneath them both.
MCP server
Connect any MCP-compatible client — Claude Code, Claude Desktop, Cursor, or your own agent — over Streamable HTTP and query the catalogue and the evidence directly.
https://mcp.aidemos.com/api/mcp
Markdown twins
Every published page has a clean .md twin — same content, stripped of nav and markup, ready for a context window. Use it when you want the narrative.
/tools/llamaparse.md
Structured evidence model
Underneath both: AI Demos stores observations, not pages. Every cell is a tool × scenario × criterion — pages are views, MCP verbs are queries, over the same source of truth.
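One way to picture "pages are views, queries are views" is a single array of cells with several small functions derived from it. The sketch below is illustrative only: the `Cell` shape, `toolPageView`, and `rankingView` are hypothetical names, and ranking by mean score is an assumption for the example, not how AI Demos ranks tools.

```typescript
// Hypothetical, trimmed cell shape: one row per tool × scenario × criterion.
interface Cell {
  tool: string;      // tool slug
  scenario: string;  // scenario group_tag
  criterion: string;
  score: number;
}

// "Page" view: every cell for one tool, grouped by criterion.
function toolPageView(cells: Cell[], tool: string): Record<string, Cell[]> {
  const byCriterion: Record<string, Cell[]> = {};
  for (const cell of cells) {
    if (cell.tool !== tool) continue;
    (byCriterion[cell.criterion] = byCriterion[cell.criterion] || []).push(cell);
  }
  return byCriterion;
}

// "Ranking" view: tools ordered by mean score on one criterion.
// Mean score is an assumption for this sketch, not AI Demos' ranking method.
function rankingView(
  cells: Cell[],
  criterion: string
): Array<{ tool: string; meanScore: number }> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const cell of cells) {
    if (cell.criterion !== criterion) continue;
    const entry = totals[cell.tool] || (totals[cell.tool] = { sum: 0, count: 0 });
    entry.sum += cell.score;
    entry.count += 1;
  }
  return Object.keys(totals)
    .map((tool) => ({ tool, meanScore: totals[tool].sum / totals[tool].count }))
    .sort((x, y) => y.meanScore - x.meanScore);
}
```

Both functions read the same array and neither stores a copy, so a corrected observation would show up in every view at once.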
Enumerate, fetch detail, then reach the ground truth.
The server exposes its tools in three layers — from listing every published page, to fetching a full structured-plus-Markdown envelope, to querying the observation cells the rest of the web can’t give you.
Discovery
enumerate & traverse
Paged lists of every published page, the taxonomy in use, and graph traversal. Each result carries id, slug, and a full url.
Detail
JSON + Markdown
The full page as a structured-JSON + Markdown-content envelope: identity, pricing, per-feature scores, fit, FAQ and relationships as JSON; editorial prose as clean Markdown. An optional fields projection controls token cost.
Evidence graph
the part you can’t scrape
Query the observation cells directly: filter by tool(s), scenario, criterion, verdict or evidence state, or get an evidence-aligned, honesty-enforced comparison of two tools.
The verdict, the score — and the URLs of the screenshots that prove it.
A call returns the observation cells — one per (tool × scenario × criterion). Your agent gets the answer and can show its work.
// get_evidence({ tool: "llamaparse", criterion: "table extraction" })
{
  "tool": { "name": "LlamaParse", "slug": "llamaparse", "url": "https://aidemos.com/tools/llamaparse" },
  "scenario": { "name": "Scanned research paper with tables", "group_tag": "scanned-research-paper" },
  "criterion": { "name": "Table extraction" },
  "verdict": "worked",           // worked | mixed | struggled | failed
  "score": 4,
  "evidence_state": "verified",  // verified | observed | scored-only
  "note": "Reconstructed the multi-row header correctly; merged cells preserved.",
  "tested_at": "2026-05-22",
  "artifacts": [
    { "url": "https://.../input.png", "role": "input", "caption": "Source page" },
    { "url": "https://.../output.png", "role": "output", "caption": "Parsed table" }
  ]
}
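For an agent written in TypeScript, the cell above could be typed and turned into a citation roughly as follows. The field names mirror the sample response; the `Observation` interface, the `citable` filter, and the `cite` formatter are hypothetical helpers, and treating only "verified" cells with an output artifact as citable is an assumption for the sketch.

```typescript
type Verdict = "worked" | "mixed" | "struggled" | "failed";
type EvidenceState = "verified" | "observed" | "scored-only";

interface Artifact {
  url: string;
  role: string;     // "input" or "output" in the sample response
  caption: string;
}

// Mirrors the sample cell; not an official schema.
interface Observation {
  tool: { name: string; slug: string; url: string };
  scenario: { name: string; group_tag: string };
  criterion: { name: string };
  verdict: Verdict;
  score: number;
  evidence_state: EvidenceState;
  note: string;
  tested_at: string;
  artifacts: Artifact[];
}

// Assumption: only verified cells with at least one output artifact are cited.
function citable(cells: Observation[]): Observation[] {
  return cells.filter(
    (cell) =>
      cell.evidence_state === "verified" &&
      cell.artifacts.some((artifact) => artifact.role === "output")
  );
}

// One-line citation carrying the verdict and the artifact URLs behind it.
function cite(cell: Observation): string {
  const proof = cell.artifacts.map((a) => `${a.role}: ${a.url}`).join(", ");
  return (
    `${cell.tool.name} ${cell.verdict} on "${cell.criterion.name}" ` +
    `(${cell.scenario.name}), score ${cell.score}, tested ${cell.tested_at}. ` +
    `Evidence: ${proof}`
  );
}
```

A downstream user can follow the artifact URLs in the citation to check the claim, which is the point of returning evidence alongside the verdict.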
Fetch the clean .md of any page — no client needed.
Append .md to the URL
https://aidemos.com/tools/llamaparse → …/tools/llamaparse.md. Works today for tool pages (/tools/…), ranking pages (/best/…) and use-case pages (/use-cases/…).
Or content-negotiate
Send Accept: text/markdown to the canonical URL and get the twin back. The HTML page advertises it with a Link header carrying rel="alternate" and type="text/markdown", so crawlers and agents can discover it.
When to use which
Twins when you want the narrative — the full review or how-to as text. MCP when you want structured fields and evidence.
https://aidemos.com/tools/llamaparse
# LlamaParse
## Our take
Strong on complex tables and scanned
documents; the multi-row header
reconstruction held up across our…
## Pricing
Free tier: 1,000 pages/day…
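If your agent holds canonical page URLs, the "append .md" rule described above can be wrapped in a small helper. This is a sketch under stated assumptions: `markdownTwinUrl` is a hypothetical name, the three path prefixes come from the list of page types that work today, and query strings and fragments are simply dropped.

```typescript
// Page types that currently have twins, per the text above.
const TWIN_PREFIXES = ["/tools/", "/best/", "/use-cases/"];

// Returns the Markdown twin URL for a page, or null when the page type has none.
function markdownTwinUrl(pageUrl: string): string | null {
  // Split into origin and path; ignore any query string or fragment.
  const match = /^(https?:\/\/[^\/?#]+)(\/[^?#]*)?/.exec(pageUrl);
  if (!match) return null;
  const origin = match[1];
  const path = (match[2] || "/").replace(/\/+$/, ""); // drop trailing slashes

  const supported = TWIN_PREFIXES.some((prefix) => path.indexOf(prefix) === 0);
  if (!supported) return null;

  // Already a twin URL: return it unchanged.
  if (path.slice(-3) === ".md") return origin + path;
  return `${origin}${path}.md`;
}
```

Returning null for unsupported paths lets the caller fall back to content negotiation or to the MCP server instead of requesting a twin that may not exist.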
One MCP connection, or one curl.
Add the MCP server to your client — Claude Code
$ claude mcp add --transport http aidemos https://mcp.aidemos.com/api/mcp
Claude Desktop / Cursor / any mcpServers config:
"mcpServers": {
  "aidemos": {
    "type": "http",
    "url": "https://mcp.aidemos.com/api/mcp"
  }
}
Or call it from your own agent — MCP SDK over Streamable HTTP
TypeScript or Python — or plain JSON-RPC 2.0 over HTTP POST (initialize → tools/list → tools/call). Each call returns its result as a JSON string in content[0].text.
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '…/client/streamableHttp.js'

// Connect to the AI Demos MCP server over Streamable HTTP.
const transport = new StreamableHTTPClientTransport(new URL('https://mcp.aidemos.com/api/mcp'))
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport)

// Query the evidence graph for one tool and one criterion.
const res = await client.callTool({
  name: 'get_evidence',
  arguments: { tool: 'llamaparse', criterion: 'table extraction' },
})

// The result arrives as a JSON string in content[0].text.
const evidence = JSON.parse(res.content[0].text)
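For the plain JSON-RPC 2.0 route, the three requests and the result unwrapping might look like the sketch below. It only builds and parses messages and makes no network calls. The helper names are hypothetical, and the `protocolVersion` value is an assumption that should match whatever the server negotiates. A complete client also sends a `notifications/initialized` message after `initialize` and handles the transport's headers, which the SDK does for you.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Builds the initialize → tools/list → tools/call sequence as request bodies.
function buildRequests(toolName: string, args: Record<string, unknown>): JsonRpcRequest[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumption: use the version your server supports
        capabilities: {},
        clientInfo: { name: "my-agent", version: "1.0.0" },
      },
    },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
    { jsonrpc: "2.0", id: 3, method: "tools/call", params: { name: toolName, arguments: args } },
  ];
}

interface ToolCallResponse {
  result?: { content?: Array<{ type: string; text?: string }> };
}

// Each call returns its result as a JSON string in content[0].text.
function unwrapToolResult(response: ToolCallResponse): unknown {
  const first = response.result && response.result.content && response.result.content[0];
  if (!first || typeof first.text !== "string") {
    throw new Error("No text content in tool result");
  }
  return JSON.parse(first.text);
}
```

Each request body would be sent as an HTTP POST to the MCP endpoint, and `unwrapToolResult` applied to the parsed response of the final `tools/call`.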
Or just fetch a Markdown twin — no client needed
$ curl https://aidemos.com/best/resume-parsing-api.md
# or, by content negotiation:
$ curl -H 'Accept: text/markdown' https://aidemos.com/tools/llamaparse
Stop teaching your agent to read listicles.
Point it at evidence it can trace, compare, and verify — one MCP connection away.