# AI Demos for AI Agents

> Your agent doesn't need to scrape another listicle. AI Demos serves AI-tool
> evaluations as structured, sourced evidence over MCP.

**MCP endpoint:** `https://mcp.aidemos.com/api/mcp` · 16 tools · read-only · no API key · protocol 2025-06-18

Most AI-tool content is written for humans to read, so an agent has to scrape the page, summarize the prose, and infer a conclusion it can't check. AI Demos exposes its evaluations as **structured intelligence** instead: your agent queries the actual observations, scores, evidence, and artifacts directly and gets sourced, reproducible answers it can reason over, not paragraphs it has to re-summarize and hope it got right.

## AI Demos stores observations, not pages

We're building a structured intelligence platform for AI tools. Instead of publishing research only as web pages, we store every evaluation as a structured **observation** (a tool tested on a specific scenario, against a specific criterion, with the real input and captured output attached) that humans, LLMs, and agents can all query.

- **Traditional review sites publish pages. AI Demos stores observations.**
- One observation can power every surface: a ranking page, a tool page, a use-case recommendation, a comparison, a Markdown twin, and an MCP response. One source of truth, many views over it.
- Built for both humans and machines from the same source of truth: the same evidence powers the human-readable research **and** the machine-readable interfaces.

## Why agents should use AI Demos

Most "best AI tools for X" content is prose written to rank on Google. An LLM that consumes it inherits its problems: claims with no provenance, stale tests, marketing language, and no way to tell a measured result from an opinion.

AI Demos is the opposite by construction:

- **Evidence, not assertions.** Every score and recommendation traces back to an observation: a tool tested on a specific scenario against a specific criterion, with the real input and output artifact captured. Your agent can pull that proof, not just the conclusion.
- **Structured, not scraped.** You query typed fields (pricing, scores, verdicts, criteria, relationships) through function calls. No HTML parsing, no guessing which element holds the price.
- **Honest comparisons, enforced structurally.** Same-input head-to-head results are flagged as such and kept separate from "comparable but tested on different inputs." Nothing is compared that wasn't actually tested together.
- **Re-tested on a cadence.** Observations are dated. The substrate is re-run over time, so freshness is a property of the data, not a publish date you have to trust.

The result: an agent grounded on AI Demos can cite a verdict **and** the artifact behind it, and a downstream user can verify it.

## Three ways to consume

1. **MCP server.** Connect any MCP-compatible client (Claude Code, Claude Desktop, Cursor, or your own agent) over Streamable HTTP and query the catalogue and the evidence directly. `https://mcp.aidemos.com/api/mcp`: 16 tools, no API key, read-only.
2. **Markdown twins.** Every published page has a clean `.md` twin: same content, stripped of nav and markup, ready for a context window. Use it when you want the narrative. Example: `https://aidemos.com/tools/llamaparse.md`
3. **Structured evidence model.** Underneath both, AI Demos stores observations, not pages. Every cell is a `tool × scenario × criterion` (with `verdict`, `score`, `artifact`, `tested_at`). Pages are views and MCP verbs are queries over the same source of truth.

## The 16 MCP tools

The server exposes its tools in three layers: listing every published page, fetching a full structured-plus-Markdown envelope, and querying the observation cells the rest of the web can't give you.

**Discovery (enumerate & traverse).** Paged lists of every published page, the taxonomy in use, and graph traversal; each result carries `id`, `slug`, and a full `url`:

`list_tools`, `list_rankings`, `list_use_cases`, `list_compares`, `list_toolkits`, `list_personas`, `list_categories`, `search`, `tools_in_ranking`, `rankings_for_tool`, `get_persona`

**Detail (JSON + Markdown).** The full page as a structured-JSON + Markdown-content envelope: identity, pricing, per-feature scores, fit, FAQ and relationships as JSON; editorial prose as clean Markdown. An optional `fields` projection controls token cost:

`get_tool`, `get_ranking`, `get_use_case`

**Evidence graph (the part you can't scrape).** Query the observation cells directly (filter by tool(s), scenario, criterion, verdict or evidence state), or get an evidence-aligned, honesty-enforced comparison of two tools:

`get_evidence`, `compare_tools`

## What `get_evidence` returns

A call returns the observation cells, one per `(tool × scenario × criterion)`. Your agent gets the answer **and** can show its work.

```json
// get_evidence({ tool: "llamaparse", criterion: "table extraction" })
{
  "tool": { "name": "LlamaParse", "slug": "llamaparse", "url": "https://aidemos.com/tools/llamaparse" },
  "scenario": { "name": "Scanned research paper with tables", "group_tag": "scanned-research-paper" },
  "criterion": { "name": "Table extraction" },
  "verdict": "worked",           // worked | mixed | struggled | failed
  "score": 4,
  "evidence_state": "verified",  // verified | observed | scored-only
  "note": "Reconstructed the multi-row header correctly; merged cells preserved.",
  "tested_at": "2026-05-22",
  "artifacts": [
    { "url": "https://.../input.png", "role": "input", "caption": "Source page" },
    { "url": "https://.../output.png", "role": "output", "caption": "Parsed table" }
  ]
}
```

`evidence_state` tells your agent exactly how strong each cell is: **verified** = artifact-backed · **observed** = noted without an artifact · **scored-only** = a number only. Your agent can answer **and** hand a user the real input/output screenshots behind the call.

## Markdown twins: fetch the clean `.md` of any page, no client needed

1. **Append `.md` to the URL.** `https://aidemos.com/tools/llamaparse` → `https://aidemos.com/tools/llamaparse.md`. Works for tool pages (`/tools/…`), ranking pages (`/best/…`), use-case pages (`/use-cases/…`), and comparisons (`/compare/…`).
2. **Or content-negotiate.** Send `Accept: text/markdown` to the canonical URL and get the twin back. The HTML page advertises it with a `Link: rel="alternate"; type="text/markdown"` header, so crawlers and agents can discover it.
3. **When to use which.** Twins when you want the narrative: the full review or how-to as text. MCP when you want structured fields and evidence.

## Get started: one MCP connection, or one curl

**1. Add the MCP server to your client (Claude Code):**

```
$ claude mcp add --transport http aidemos https://mcp.aidemos.com/api/mcp
```

Claude Desktop / Cursor / any `mcpServers` config:

```json
"mcpServers": {
  "aidemos": { "type": "http", "url": "https://mcp.aidemos.com/api/mcp" }
}
```

**2. Or call it from your own agent (MCP SDK over Streamable HTTP):** use TypeScript or Python, or plain JSON-RPC 2.0 over HTTP POST (`initialize` → `tools/list` → `tools/call`). Each call returns its result as a JSON string in `content[0].text`:

```ts
import { StreamableHTTPClientTransport } from '…/client/streamableHttp.js'
import { Client } from '@modelcontextprotocol/sdk/client/index.js'

// Open a Streamable HTTP transport to the public endpoint (no API key needed).
const transport = new StreamableHTTPClientTransport(
  new URL('https://mcp.aidemos.com/api/mcp')
)
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport)

// Ask for the observation cells for one tool and one criterion.
const res = await client.callTool({
  name: 'get_evidence',
  arguments: { tool: 'llamaparse', criterion: 'table extraction' },
})

// The tool result arrives as a JSON string in the first content item.
const evidence = JSON.parse(res.content[0].text)
```

**3. Or just fetch a Markdown twin, no client needed:**

```
$ curl https://aidemos.com/best/resume-parsing-api.md
# or, by content negotiation:
$ curl -H 'Accept: text/markdown' https://aidemos.com/tools/llamaparse
```

---

**Full developer guide:** https://aidemos.com/docs/mcp.md · **Browse the catalogue:** https://aidemos.com/tools · **LLM index:** https://aidemos.com/llms.txt

Stop teaching your agent to read listicles. Point it at evidence it can trace, compare, and verify, one MCP connection away.
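
One way an agent might act on the `evidence_state` distinction described earlier is to cite only artifact-backed cells. The sketch below is illustrative, not part of the AI Demos API: the names `EvidenceCell`, `parseEvidence`, `citableCells`, and `cite` are hypothetical, the field names are copied from the example `get_evidence` response, and it assumes the JSON string in `content[0].text` holds either a single cell or an array of cells.

```ts
// Hypothetical types mirroring the fields in the example get_evidence response.
type Verdict = 'worked' | 'mixed' | 'struggled' | 'failed'
type EvidenceState = 'verified' | 'observed' | 'scored-only'

interface Artifact {
  url: string
  role: string
  caption: string
}

interface EvidenceCell {
  tool: { name: string; slug: string; url: string }
  scenario: { name: string; group_tag: string }
  criterion: { name: string }
  verdict: Verdict
  score: number
  evidence_state: EvidenceState
  note: string
  tested_at: string
  artifacts: Artifact[]
}

// Parse the JSON string from content[0].text.
// Assumption: the payload is one cell or an array of cells; no shape validation is done.
function parseEvidence(text: string): EvidenceCell[] {
  const parsed: unknown = JSON.parse(text)
  return (Array.isArray(parsed) ? parsed : [parsed]) as EvidenceCell[]
}

// Keep only cells that are artifact-backed and actually carry an artifact.
function citableCells(cells: EvidenceCell[]): EvidenceCell[] {
  return cells.filter(
    (cell) => cell.evidence_state === 'verified' && cell.artifacts.length > 0
  )
}

// Build a one-line citation that pairs the verdict with the output artifact, if any.
function cite(cell: EvidenceCell): string {
  const output = cell.artifacts.find((artifact) => artifact.role === 'output')
  const proof = output ? output.url : 'no output artifact'
  return `${cell.tool.name} ${cell.verdict} on "${cell.criterion.name}" (score ${cell.score}, tested ${cell.tested_at}); proof: ${proof}`
}
```

With the SDK call shown above, usage would be along the lines of `citableCells(parseEvidence(res.content[0].text)).map(cite)`. Cells marked `observed` or `scored-only` are dropped here by design; an agent that wants to mention them could report them separately with a weaker claim.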