# AI Demos for AI Agents
> Your agent doesn't need to scrape another listicle. AI Demos serves AI-tool
> evaluations as structured, sourced evidence — over MCP.
**MCP endpoint:** `https://mcp.aidemos.com/api/mcp` — 16 tools · read-only · no API key · protocol 2025-06-18
Most AI-tool content is written for humans to read, so an agent has to scrape the page, summarize the
prose, and infer a conclusion it can't check. AI Demos exposes its evaluations as **structured
intelligence** instead: your agent queries the actual observations, scores, evidence, and artifacts
directly — sourced, reproducible answers it can reason over, not paragraphs it has to re-summarize and
hope it got right.
## AI Demos stores observations, not pages
We're building a structured intelligence platform for AI tools. Instead of publishing research only as
web pages, we store every evaluation as a structured **observation** — a tool tested on a specific
scenario, against a specific criterion, with the real input and captured output attached — that humans,
LLMs, and agents can all query.
- **Traditional review sites publish pages. AI Demos stores observations.**
- One observation can power every surface: a ranking page, a tool page, a use-case recommendation, a
comparison, a Markdown twin, and an MCP response — one source of truth, many views over it.
- Built for both humans and machines from the same source of truth: the same evidence powers the
human-readable research **and** the machine-readable interfaces.
## Why agents should use AI Demos
Most "best AI tools for X" content is prose written to rank on Google. An LLM that consumes it inherits
its problems: claims with no provenance, stale tests, marketing language, and no way to tell a measured
result from an opinion. AI Demos is the opposite by construction:
- **Evidence, not assertions.** Every score and recommendation traces back to an observation — a tool
tested on a specific scenario against a specific criterion, with the real input and output artifact
captured. Your agent can pull that proof, not just the conclusion.
- **Structured, not scraped.** You query typed fields (pricing, scores, verdicts, criteria,
relationships) through function calls. No HTML parsing, no guessing which HTML element holds the
price.
- **Honest comparisons, enforced structurally.** Same-input head-to-head results are flagged as such
and kept separate from "comparable but tested on different inputs." Nothing is compared that wasn't
actually tested together.
- **Re-tested on a cadence.** Observations are dated. The substrate is re-run over time, so freshness
is a property of the data — not a publish date you have to trust.
The result: an agent grounded on AI Demos can cite a verdict **and** the artifact behind it — and a
downstream user can verify it.
## Three ways to consume
1. **MCP server** — connect any MCP-compatible client (Claude Code, Claude Desktop, Cursor, or your own
agent) over Streamable HTTP and query the catalogue and the evidence directly.
`https://mcp.aidemos.com/api/mcp` — 16 tools, no API key, read-only.
2. **Markdown twins** — every published page has a clean `.md` twin: same content, stripped of nav and
markup, ready for a context window. Use it when you want the narrative.
e.g. `https://aidemos.com/tools/llamaparse.md`
3. **Structured evidence model** — underneath both, AI Demos stores observations, not pages. Every cell
is a `tool × scenario × criterion` (with `verdict`, `score`, `artifact`, `tested_at`). Pages are
views, MCP verbs are queries, over the same source of truth.
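One way to picture a cell on the client side is as a typed record keyed by its three coordinates. The sketch below is an assumption built from the field names used in this document, not a published schema; `ObservationCell`, `cellKey`, and `isFresh` are hypothetical names, and the freshness window is whatever the caller chooses.

```typescript
// Hypothetical client-side shape for one observation cell.
// Field names follow this document's examples; the real schema may differ.
type Verdict = 'worked' | 'mixed' | 'struggled' | 'failed'

interface ObservationCell {
  tool: { slug: string }
  scenario: { name: string }
  criterion: { name: string }
  verdict: Verdict
  score: number
  tested_at: string // ISO date, e.g. "2026-05-22"
}

// Build a stable lookup key from the tool × scenario × criterion coordinate.
function cellKey(cell: ObservationCell): string {
  return [cell.tool.slug, cell.scenario.name, cell.criterion.name]
    .map((part) => part.trim().toLowerCase())
    .join(' × ')
}

// True when the cell was tested no more than `maxAgeDays` before `now`.
// An unparseable date yields NaN, so the comparisons fail and the cell counts as not fresh.
function isFresh(cell: ObservationCell, maxAgeDays: number, now: Date): boolean {
  const testedAt = new Date(cell.tested_at + 'T00:00:00Z')
  const ageDays = (now.getTime() - testedAt.getTime()) / (24 * 60 * 60 * 1000)
  return ageDays >= 0 && ageDays <= maxAgeDays
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test and to replay against archived responses.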
## The 16 MCP tools
The server exposes its tools in three layers — from listing every published page, to fetching a full
structured-plus-Markdown envelope, to querying the observation cells the rest of the web can't give you.
**Discovery (enumerate & traverse)** — paged lists of every published page, the taxonomy in use, and
graph traversal; each result carries `id`, `slug`, and a full `url`:
`list_tools`, `list_rankings`, `list_use_cases`, `list_compares`, `list_toolkits`, `list_personas`,
`list_categories`, `search`, `tools_in_ranking`, `rankings_for_tool`, `get_persona`
**Detail (JSON + Markdown)** — the full page as a structured-JSON + Markdown-content envelope: identity,
pricing, per-feature scores, fit, FAQ and relationships as JSON; editorial prose as clean Markdown. An
optional `fields` projection controls token cost:
`get_tool`, `get_ranking`, `get_use_case`
**Evidence graph (the part you can't scrape)** — query the observation cells directly (filter by
tool(s), scenario, criterion, verdict or evidence state), or get an evidence-aligned, honesty-enforced
comparison of two tools:
`get_evidence`, `compare_tools`
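If an agent should only be allowed to call part of this surface, the three layers can be kept as a small lookup table. This is a minimal sketch: the tool names are copied from the lists above, while the layer keys (`discovery`, `detail`, `evidence`) and the `layerOf` helper are this example's own labels, not identifiers the server returns.

```typescript
// Tool names copied from the lists above; the layer keys are this sketch's own labels.
const TOOL_LAYERS = {
  discovery: [
    'list_tools', 'list_rankings', 'list_use_cases', 'list_compares',
    'list_toolkits', 'list_personas', 'list_categories', 'search',
    'tools_in_ranking', 'rankings_for_tool', 'get_persona',
  ],
  detail: ['get_tool', 'get_ranking', 'get_use_case'],
  evidence: ['get_evidence', 'compare_tools'],
} as const

type Layer = keyof typeof TOOL_LAYERS

// Flat list of every tool name in the table.
const ALL_TOOL_NAMES: string[] = [
  ...TOOL_LAYERS.discovery,
  ...TOOL_LAYERS.detail,
  ...TOOL_LAYERS.evidence,
]

// Return the layer a tool belongs to, or undefined for an unknown name.
function layerOf(toolName: string): Layer | undefined {
  const layers: Layer[] = ['discovery', 'detail', 'evidence']
  for (const layer of layers) {
    const names: readonly string[] = TOOL_LAYERS[layer]
    if (names.indexOf(toolName) !== -1) return layer
  }
  return undefined
}
```

Comparing `ALL_TOOL_NAMES` against the names a live `tools/list` call returns is a cheap way to notice when the server's tool set has changed.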
## What `get_evidence` returns
A call returns the observation cells, one per `(tool × scenario × criterion)`; the example below shows
a single cell. Your agent gets the answer **and** can show its work.
```jsonc
// get_evidence({ tool: "llamaparse", criterion: "table extraction" })
{
"tool": {
"name": "LlamaParse",
"slug": "llamaparse",
"url": "https://aidemos.com/tools/llamaparse"
},
"scenario": {
"name": "Scanned research paper with tables",
"group_tag": "scanned-research-paper"
},
"criterion": { "name": "Table extraction" },
"verdict": "worked", // worked | mixed | struggled | failed
"score": 4,
"evidence_state": "verified", // verified | observed | scored-only
"note": "Reconstructed the multi-row header correctly; merged cells preserved.",
"tested_at": "2026-05-22",
"artifacts": [
{
"url": "https://.../input.png",
"role": "input",
"caption": "Source page"
},
{
"url": "https://.../output.png",
"role": "output",
"caption": "Parsed table"
}
]
}
```
`evidence_state` tells your agent exactly how strong each cell is: **verified** = artifact-backed ·
**observed** = noted without an artifact · **scored-only** = a number only. Your agent can answer
**and** hand a user the real input/output screenshots behind each verdict.
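An agent can turn these three states into a simple strength ordering and keep only cells above a chosen threshold. The sketch below assumes the response shape shown in the example above; `atLeast` and `artifactUrls` are hypothetical helper names, and treating `artifacts` as optional is a defensive assumption for cells that carry no artifact.

```typescript
type EvidenceState = 'verified' | 'observed' | 'scored-only'

interface Artifact {
  url: string
  role: string // "input" or "output" in the example above
  caption: string
}

// Assumed subset of a get_evidence cell; only the fields this sketch reads.
interface EvidenceCell {
  verdict: string
  score: number
  evidence_state: EvidenceState
  artifacts?: Artifact[]
}

// Higher number means stronger evidence, following the definitions above.
const STATE_STRENGTH: Record<EvidenceState, number> = {
  'scored-only': 0,
  observed: 1,
  verified: 2,
}

// Keep only cells whose evidence is at least as strong as `minimum`.
function atLeast(cells: EvidenceCell[], minimum: EvidenceState): EvidenceCell[] {
  return cells.filter(
    (cell) => STATE_STRENGTH[cell.evidence_state] >= STATE_STRENGTH[minimum]
  )
}

// Collect the artifact URLs with a given role so they can be shown to a user.
function artifactUrls(cell: EvidenceCell, role: 'input' | 'output'): string[] {
  return (cell.artifacts || [])
    .filter((artifact) => artifact.role === role)
    .map((artifact) => artifact.url)
}
```

For example, `atLeast(cells, 'verified')` leaves only the cells an agent can back with an artifact when a user asks for proof.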
## Markdown twins — fetch the clean `.md` of any page, no client needed
1. **Append `.md` to the URL.** `https://aidemos.com/tools/llamaparse` →
`https://aidemos.com/tools/llamaparse.md`. Works for tool pages (`/tools/…`), ranking pages
(`/best/…`), use-case pages (`/use-cases/…`), and comparisons (`/compare/…`).
2. **Or content-negotiate.** Send `Accept: text/markdown` to the canonical URL and get the twin back.
The HTML page advertises the twin in a `Link` response header with `rel="alternate"` and
`type="text/markdown"`, so crawlers and agents can discover it.
3. **When to use which.** Twins when you want the narrative — the full review or how-to as text. MCP
when you want structured fields and evidence.
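Both routes to a twin can be sketched in a few lines. `markdownTwinUrl` and `fetchTwin` are hypothetical helper names; the prefix list is limited to the four page types named above, which may be narrower than what the site actually serves, and `fetchTwin` assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
// Page types listed above as having Markdown twins.
const TWIN_PREFIXES = ['/tools/', '/best/', '/use-cases/', '/compare/']

// Route 1: derive the twin URL by appending ".md" to the page path.
// Returns null for paths outside the listed page types.
function markdownTwinUrl(pageUrl: string): string | null {
  const url = new URL(pageUrl)
  const path = url.pathname.replace(/\/+$/, '') // drop any trailing slash
  const supported = TWIN_PREFIXES.some((prefix) => path.startsWith(prefix))
  if (!supported) return null
  url.pathname = path.endsWith('.md') ? path : path + '.md'
  return url.toString()
}

// Route 2: request the canonical URL and ask for Markdown via content negotiation.
async function fetchTwin(pageUrl: string): Promise<string> {
  const res = await fetch(pageUrl, { headers: { Accept: 'text/markdown' } })
  if (!res.ok) throw new Error(`twin request failed with status ${res.status}`)
  return res.text()
}
```

Deriving the URL is the simpler choice when an agent wants a link it can cite; content negotiation suits a client that already holds the canonical URL.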
## Get started — one MCP connection, or one curl
**1. Add the MCP server to your client (Claude Code):**
```
$ claude mcp add --transport http aidemos https://mcp.aidemos.com/api/mcp
```
Claude Desktop / Cursor / any `mcpServers` config:
```json
{
  "mcpServers": {
    "aidemos": { "type": "http", "url": "https://mcp.aidemos.com/api/mcp" }
  }
}
```
**2. Or call it from your own agent (MCP SDK over Streamable HTTP)** — TypeScript or Python, or plain
JSON-RPC 2.0 over HTTP POST (`initialize` → `tools/list` → `tools/call`). Each call returns its result
as a JSON string in `content[0].text`:
```ts
import { StreamableHTTPClientTransport } from '…/client/streamableHttp.js'
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
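// Open a Streamable HTTP transport to the public endpoint (no API key required).
const transport = new StreamableHTTPClientTransport(
  new URL('https://mcp.aidemos.com/api/mcp')
)
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport) // performs the MCP initialize handshake

// Ask for the observation cells for one tool and one criterion.
const res = await client.callTool({
  name: 'get_evidence',
  arguments: { tool: 'llamaparse', criterion: 'table extraction' },
})

// The result arrives as a JSON string in the first content item.
const evidence = JSON.parse(res.content[0].text)
```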
```
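For the plain JSON-RPC route, the three request bodies can be built without the SDK. This is a sketch under stated assumptions, not a complete client: the helper names are hypothetical, the message shapes follow the MCP specification for protocol 2025-06-18 as I understand it, and a real client would also send the `notifications/initialized` notification after `initialize`, handle responses that arrive as an event stream, and echo back any session header the server issues.

```typescript
const PROTOCOL_VERSION = '2025-06-18'

interface JsonRpcRequest {
  jsonrpc: '2.0'
  id: number
  method: string
  params?: Record<string, unknown>
}

let nextId = 1

// Build a JSON-RPC 2.0 request with a fresh numeric id.
function rpcRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const base = { jsonrpc: '2.0' as const, id: nextId++, method }
  return params === undefined ? base : { ...base, params }
}

// Step 1: initialize, declaring the protocol version and client identity.
const initializeRequest = () =>
  rpcRequest('initialize', {
    protocolVersion: PROTOCOL_VERSION,
    capabilities: {},
    clientInfo: { name: 'my-agent', version: '1.0.0' },
  })

// Step 2: tools/list, to discover the available tools.
const listToolsRequest = () => rpcRequest('tools/list')

// Step 3: tools/call, naming the tool and passing its arguments.
const callToolRequest = (name: string, args: Record<string, unknown>) =>
  rpcRequest('tools/call', { name, arguments: args })

// Headers for each HTTP POST; the server may answer with JSON or an event stream.
function mcpHeaders(): Record<string, string> {
  return {
    'Content-Type': 'application/json',
    Accept: 'application/json, text/event-stream',
  }
}

// Parse the JSON string carried in content[0].text of a tools/call result.
function parseToolText(result: { content?: Array<{ text?: string }> }): unknown {
  const first = result.content && result.content[0]
  if (!first || typeof first.text !== 'string') {
    throw new Error('tool result has no text content')
  }
  return JSON.parse(first.text)
}
```

Each body would be sent with `JSON.stringify` as the POST payload to `https://mcp.aidemos.com/api/mcp`. Keeping the builders free of network code makes the message shapes easy to inspect and test in isolation.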
**3. Or just fetch a Markdown twin — no client needed:**
```
$ curl https://aidemos.com/best/resume-parsing-api.md
# or, by content negotiation:
$ curl -H 'Accept: text/markdown' https://aidemos.com/tools/llamaparse
```
---
**Full developer guide:** https://aidemos.com/docs/mcp.md ·
**Browse the catalogue:** https://aidemos.com/tools ·
**LLM index:** https://aidemos.com/llms.txt
Stop teaching your agent to read listicles. Point it at evidence it can trace, compare, and verify —
one MCP connection away.