# AI Demos for AI Agents

> Your agent doesn't need to scrape another listicle. AI Demos serves AI-tool
> evaluations as structured, sourced evidence — over MCP.

**MCP endpoint:** `https://mcp.aidemos.com/api/mcp` — 16 tools · read-only · no API key · protocol 2025-06-18

Most AI-tool content is written for humans to read, so an agent has to scrape the page, summarize the
prose, and infer a conclusion it can't check. AI Demos exposes its evaluations as **structured
intelligence** instead: your agent queries the actual observations, scores, evidence, and artifacts
directly — sourced, reproducible answers it can reason over, not paragraphs it has to re-summarize and
hope it got right.

## AI Demos stores observations, not pages

We're building a structured intelligence platform for AI tools. Instead of publishing research only as
web pages, we store every evaluation as a structured **observation** — a tool tested on a specific
scenario, against a specific criterion, with the real input and captured output attached — that humans,
LLMs, and agents can all query.

- **Traditional review sites publish pages. AI Demos stores observations.**
- One observation can power every surface: a ranking page, a tool page, a use-case recommendation, a
  comparison, a Markdown twin, and an MCP response — one source of truth, many views over it.
- Built for both humans and machines: the same evidence powers the human-readable research **and**
  the machine-readable interfaces.

## Why agents should use AI Demos

Most "best AI tools for X" content is prose written to rank on Google. An LLM that consumes it inherits
its problems: claims with no provenance, stale tests, marketing language, and no way to tell a measured
result from an opinion. AI Demos is the opposite by construction:

- **Evidence, not assertions.** Every score and recommendation traces back to an observation — a tool
  tested on a specific scenario against a specific criterion, with the real input and output artifact
  captured. Your agent can pull that proof, not just the conclusion.
- **Structured, not scraped.** You query typed fields — pricing, scores, verdicts, criteria,
  relationships — through function calls. No HTML parsing, no guessing which `<div>` holds the price.
- **Honest comparisons, enforced structurally.** Same-input head-to-head results are flagged as such
  and kept separate from "comparable but tested on different inputs." Nothing is compared that wasn't
  actually tested together.
- **Re-tested on a cadence.** Observations are dated, and the evaluations are re-run over time, so
  freshness is a property of the data, not a publish date you have to trust.

The result: an agent grounded on AI Demos can cite a verdict **and** the artifact behind it — and a
downstream user can verify it.
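
To picture the "honest comparisons" rule, here is a minimal sketch of how a client could separate same-input pairs from merely comparable ones. The `Cell` shape, the `pairCells` and `headToHead` helpers, and the idea that a matching scenario tag means a shared input are all assumptions made for illustration; this is not AI Demos' implementation.

```ts
// Minimal cell shape for this sketch (hypothetical, flattened for brevity).
interface Cell {
  tool: string // tool slug
  scenario: string // scenario tag; ASSUMED to identify the test input
  criterion: string
  score: number
}

type Pairing = 'same-input' | 'comparable'

interface Pair {
  criterion: string
  kind: Pairing
  a: Cell
  b: Cell
}

// Pair up cells from two tools. Cells are never paired across criteria.
// Pairs on the same scenario are labelled 'same-input'; all others 'comparable'.
function pairCells(a: Cell[], b: Cell[]): Pair[] {
  const pairs: Pair[] = []
  for (const x of a) {
    for (const y of b) {
      if (x.criterion !== y.criterion) continue
      const kind: Pairing = x.scenario === y.scenario ? 'same-input' : 'comparable'
      pairs.push({ criterion: x.criterion, kind, a: x, b: y })
    }
  }
  return pairs
}

// Only same-input pairs can support a head-to-head claim.
function headToHead(pairs: Pair[]): Pair[] {
  return pairs.filter((p) => p.kind === 'same-input')
}
```

Keeping the label on every pair, rather than silently dropping the weaker ones, lets an agent still mention a comparable result while stating that the inputs differed.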

## Three ways to consume

1. **MCP server** — connect any MCP-compatible client (Claude Code, Claude Desktop, Cursor, or your own
   agent) over Streamable HTTP and query the catalogue and the evidence directly.
   `https://mcp.aidemos.com/api/mcp` — 16 tools, no API key, read-only.
2. **Markdown twins** — every published page has a clean `.md` twin: same content, stripped of nav and
   markup, ready for a context window. Use it when you want the narrative.
   e.g. `https://aidemos.com/tools/llamaparse.md`
3. **Structured evidence model** — underneath both, AI Demos stores observations, not pages. Every cell
   is one `tool × scenario × criterion` combination (with `verdict`, `score`, `artifacts`,
   `tested_at`). Pages are views and MCP tools are queries over the same source of truth.
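
The evidence model can be sketched as a TypeScript type. Field names follow the sample `get_evidence` response in this document; the exact schema, which fields are optional, and the `cellKey` helper are assumptions for illustration, not a published contract.

```ts
// Hypothetical types mirroring the sample `get_evidence` response.
// The real schema may carry more fields or different optionality.
type Verdict = 'worked' | 'mixed' | 'struggled' | 'failed'
type EvidenceState = 'verified' | 'observed' | 'scored-only'

interface Artifact {
  url: string
  role: string // the sample response shows 'input' and 'output'
  caption: string
}

interface Observation {
  tool: { name: string; slug: string; url: string }
  scenario: { name: string; group_tag: string }
  criterion: { name: string }
  verdict: Verdict
  score: number
  evidence_state: EvidenceState
  note: string
  tested_at: string // date string, e.g. '2026-05-22'
  artifacts: Artifact[]
}

// Illustrative helper (not part of the API): one key per
// tool × scenario × criterion cell, handy for de-duplicating results.
function cellKey(o: Observation): string {
  return [o.tool.slug, o.scenario.group_tag, o.criterion.name.toLowerCase()].join('|')
}
```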

## The 16 MCP tools

The server exposes its tools in three layers — from listing every published page, to fetching a full
structured-plus-Markdown envelope, to querying the observation cells the rest of the web can't give you.

**Discovery (enumerate & traverse)** — paged lists of every published page, the taxonomy in use, and
graph traversal; each result carries `id`, `slug`, and a full `url`:
`list_tools`, `list_rankings`, `list_use_cases`, `list_compares`, `list_toolkits`, `list_personas`,
`list_categories`, `search`, `tools_in_ranking`, `rankings_for_tool`, `get_persona`

**Detail (JSON + Markdown)** — the full page as a structured-JSON + Markdown-content envelope: identity,
pricing, per-feature scores, fit, FAQ and relationships as JSON; editorial prose as clean Markdown. An
optional `fields` projection controls token cost:
`get_tool`, `get_ranking`, `get_use_case`

**Evidence graph (the part you can't scrape)** — query the observation cells directly (filter by
tool(s), scenario, criterion, verdict or evidence state), or get an evidence-aligned, honesty-enforced
comparison of two tools:
`get_evidence`, `compare_tools`
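
This page does not spell out each tool's argument names, so one way to find them is to read the `inputSchema` that MCP servers attach to every entry in `tools/list`. The sketch below summarizes those arguments; `ToolInfo` is a trimmed-down stand-in for the MCP `Tool` object, and `describeArgs` is a hypothetical helper, not part of any SDK.

```ts
// Subset of the MCP `Tool` object returned by `tools/list`.
interface ToolInfo {
  name: string
  description?: string
  inputSchema: {
    type: 'object'
    properties?: Record<string, unknown>
    required?: string[]
  }
}

interface ArgSummary {
  name: string
  required: boolean
}

// Summarize the arguments a tool accepts, as declared in its JSON Schema.
function describeArgs(tools: ToolInfo[], toolName: string): ArgSummary[] {
  const tool = tools.find((t) => t.name === toolName)
  if (!tool) throw new Error(`Unknown tool: ${toolName}`)
  const required = new Set(tool.inputSchema.required ?? [])
  return Object.keys(tool.inputSchema.properties ?? {}).map((name) => ({
    name,
    required: required.has(name),
  }))
}

// Usage with a connected MCP SDK client:
//   const { tools } = await client.listTools()
//   console.log(describeArgs(tools, 'get_tool'))
```

Reading the schema at runtime avoids hard-coding argument names such as the `fields` projection before confirming how the server defines them.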

## What `get_evidence` returns

A call returns the observation cells — one per `(tool × scenario × criterion)`. Your agent gets the
answer **and** can show its work.

```json
// get_evidence({ tool: "llamaparse", criterion: "table extraction" })
{
  "tool": {
    "name": "LlamaParse",
    "slug": "llamaparse",
    "url": "https://aidemos.com/tools/llamaparse"
  },
  "scenario": {
    "name": "Scanned research paper with tables",
    "group_tag": "scanned-research-paper"
  },
  "criterion": { "name": "Table extraction" },
  "verdict": "worked", // worked | mixed | struggled | failed
  "score": 4,
  "evidence_state": "verified", // verified | observed | scored-only
  "note": "Reconstructed the multi-row header correctly; merged cells preserved.",
  "tested_at": "2026-05-22",
  "artifacts": [
    {
      "url": "https://.../input.png",
      "role": "input",
      "caption": "Source page"
    },
    {
      "url": "https://.../output.png",
      "role": "output",
      "caption": "Parsed table"
    }
  ]
}
```

`evidence_state` tells your agent exactly how strong each cell is: **verified** = artifact-backed ·
**observed** = noted without an artifact · **scored-only** = a number only. For a verified cell, your
agent can answer **and** hand a user the real input/output screenshots behind that answer.
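
One way an agent could act on `evidence_state` is to filter cells by strength before citing them. This is a minimal sketch: the `EvidenceCell` shape is trimmed from the sample response, and the numeric weights and both helper names are illustrative assumptions.

```ts
type EvidenceState = 'verified' | 'observed' | 'scored-only'

// Trimmed cell shape; only the fields this sketch needs.
interface EvidenceCell {
  verdict: string
  evidence_state: EvidenceState
  artifacts: { url: string; role: string; caption: string }[]
}

// Illustrative ordering: artifact-backed > noted without artifact > number only.
const STRENGTH: Record<EvidenceState, number> = {
  verified: 2,
  observed: 1,
  'scored-only': 0,
}

// Keep only cells at least as strong as `min`.
function atLeast(cells: EvidenceCell[], min: EvidenceState): EvidenceCell[] {
  return cells.filter((c) => STRENGTH[c.evidence_state] >= STRENGTH[min])
}

// Artifact URLs a user could open to check a cell; empty unless the cell is verified.
function proofUrls(cell: EvidenceCell): string[] {
  return cell.evidence_state === 'verified' ? cell.artifacts.map((a) => a.url) : []
}
```

An agent might cite only `atLeast(cells, 'observed')` in its answer and attach `proofUrls` for the verified ones.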

## Markdown twins — fetch the clean `.md` of any page, no client needed

1. **Append `.md` to the URL.** `https://aidemos.com/tools/llamaparse` →
   `https://aidemos.com/tools/llamaparse.md`. Works for tool pages (`/tools/…`), ranking pages
   (`/best/…`), use-case pages (`/use-cases/…`), and comparisons (`/compare/…`).
2. **Or content-negotiate.** Send `Accept: text/markdown` to the canonical URL and get the twin back.
   The HTML page advertises the twin in a `Link` header with `rel="alternate"` and
   `type="text/markdown"`, so crawlers and agents can discover it.
3. **When to use which.** Twins when you want the narrative — the full review or how-to as text. MCP
   when you want structured fields and evidence.
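
The "append `.md`" rule is easy to get wrong with trailing slashes or query strings, so here is a small sketch that derives the twin URL. `markdownTwinUrl` is a hypothetical helper; limiting it to the four page families listed above and tolerating a trailing slash are conservative assumptions, not documented server behavior.

```ts
// Page families listed above as having Markdown twins.
const TWIN_PREFIXES = ['/tools/', '/best/', '/use-cases/', '/compare/']

// Return the `.md` twin of a canonical page URL, or null when the path
// is not in one of the listed families.
function markdownTwinUrl(pageUrl: string): string | null {
  const url = new URL(pageUrl)
  const path = url.pathname.replace(/\/+$/, '') // tolerate a trailing slash
  if (!TWIN_PREFIXES.some((prefix) => path.startsWith(prefix))) return null
  url.pathname = path.endsWith('.md') ? path : `${path}.md`
  return url.toString()
}
```

The returned string can be passed to any HTTP client; the alternative is to request the canonical URL with an `Accept: text/markdown` header.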

## Get started — one MCP connection, or one curl

**1. Add the MCP server to your client (Claude Code):**

```
$ claude mcp add --transport http aidemos https://mcp.aidemos.com/api/mcp
```

Claude Desktop / Cursor / any `mcpServers` config:

```json
{
  "mcpServers": {
    "aidemos": { "type": "http", "url": "https://mcp.aidemos.com/api/mcp" }
  }
}
```

**2. Or call it from your own agent (MCP SDK over Streamable HTTP)** — TypeScript or Python, or plain
JSON-RPC 2.0 over HTTP POST (`initialize` → `tools/list` → `tools/call`). Each call returns its result
as a JSON string in `content[0].text`:

```ts
import { StreamableHTTPClientTransport } from '…/client/streamableHttp.js'
import { Client } from '@modelcontextprotocol/sdk/client/index.js'

const transport = new StreamableHTTPClientTransport(
  new URL('https://mcp.aidemos.com/api/mcp')
)
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport)

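// Call a tool by name; the arguments object is passed through to the server.
const res = await client.callTool({
  name: 'get_evidence',
  arguments: { tool: 'llamaparse', criterion: 'table extraction' },
})
// The result payload arrives as a JSON string in the first content item.
const evidence = JSON.parse(res.content[0].text)
```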
```
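
For the plain JSON-RPC route, the message sequence can be sketched without any SDK. The block below only builds the request bodies and parses a result; it makes no network calls. The `notifications/initialized` step and the `Accept` header reflect the MCP lifecycle and Streamable HTTP transport as specified for protocol 2025-06-18. The helper names are hypothetical. Whether this server issues an `Mcp-Session-Id` header to echo on later requests is not stated here.

```ts
const PROTOCOL_VERSION = '2025-06-18' // version stated at the top of this document

interface JsonRpcMessage {
  jsonrpc: '2.0'
  id?: number // omitted for notifications
  method: string
  params?: Record<string, unknown>
}

// Messages for one session, in the order they would be POSTed.
function sessionMessages(toolName: string, args: Record<string, unknown>): JsonRpcMessage[] {
  return [
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion: PROTOCOL_VERSION,
        capabilities: {},
        clientInfo: { name: 'my-agent', version: '1.0.0' },
      },
    },
    // Sent once `initialize` has succeeded; a notification, so it has no id.
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    { jsonrpc: '2.0', id: 2, method: 'tools/list' },
    { jsonrpc: '2.0', id: 3, method: 'tools/call', params: { name: toolName, arguments: args } },
  ]
}

// Headers for each POST under Streamable HTTP.
const POST_HEADERS = {
  'Content-Type': 'application/json',
  Accept: 'application/json, text/event-stream',
}

// Shape of a `tools/call` result: JSON text in the first content item.
interface ToolResult {
  content: { type: string; text?: string }[]
  isError?: boolean
}

// Decode the JSON string carried in `content[0].text`.
function parseToolResult<T>(result: ToolResult): T {
  if (result.isError) throw new Error('Tool call reported an error')
  const first = result.content[0]
  if (!first || first.type !== 'text' || first.text === undefined) {
    throw new Error('Expected a text content item')
  }
  return JSON.parse(first.text) as T
}
```

Each message would be sent as the body of its own HTTP POST to the MCP endpoint with `POST_HEADERS`, and the `result` of the `tools/call` response handed to `parseToolResult`.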

**3. Or just fetch a Markdown twin — no client needed:**

```
$ curl https://aidemos.com/best/resume-parsing-api.md
# or, by content negotiation:
$ curl -H 'Accept: text/markdown' https://aidemos.com/tools/llamaparse
```

---

**Full developer guide:** https://aidemos.com/docs/mcp.md ·
**Browse the catalogue:** https://aidemos.com/tools ·
**LLM index:** https://aidemos.com/llms.txt

Stop teaching your agent to read listicles. Point it at evidence it can trace, compare, and verify —
one MCP connection away.