AI Demos for AI Agents

Your agent doesn’t need to scrape another listicle.

AI Demos serves AI-tool evaluations as structured, sourced evidence — over MCP.

Most AI-tool content is written for humans to read — so an agent has to scrape the page, summarize the prose, and infer a conclusion it can’t check. AI Demos exposes its evaluations as structured intelligence instead: your agent queries the actual observations, scores, evidence, and artifacts directly — sourced, reproducible answers it can reason over, not paragraphs it has to re-summarize and hope it got right.

MCP → https://mcp.aidemos.com/api/mcp · 16 tools · read-only · no API key


A structured intelligence platform — not a blog

AI Demos stores observations, not pages.

We’re building a structured intelligence platform for AI tools. Instead of publishing research only as web pages, we store every evaluation as a structured observation — a tool tested on a specific scenario, against a specific criterion, with the real input and captured output attached — that humans, LLMs, and agents can all query.

Traditional review sites publish pages.

AI Demos stores observations.

Pages, rankings, comparisons, use cases, MCP responses, and future APIs are all generated from the same underlying observations and evidence — one source of truth, many views over it.

Extract the evidence once. Expose it many ways.

A single observation can power every surface below — so we capture the evidence once instead of recreating the same knowledge in article after article.

1 observation → Ranking page · Tool page · Use-case rec · Comparison · Markdown twin · MCP response

Built for both humans and machines from the same source of truth. The advantage isn’t that we ignore humans; it’s that the same evidence powers both the human-readable research and the machine-readable interfaces. MCP, the evidence graph, and Markdown twins are the surfaces over that evidence, and the sections below show how your agent reaches them.

Why agents should use AI Demos

Most “best AI tools for X” content is prose written to rank on Google.

An LLM that consumes it inherits its problems: claims with no provenance, stale tests, marketing language, and no way to tell a measured result from an opinion. AI Demos is the opposite by construction.

Evidence, not assertions

Every score and recommendation traces back to an observation — a tool tested on a specific scenario against a specific criterion, with the real input and output artifact captured. Your agent can pull that proof, not just the conclusion.

Structured, not scraped

You query typed fields — pricing, scores, verdicts, criteria, relationships — through function calls. No HTML parsing, no guessing which <div> holds the price.

Honest comparisons, enforced structurally

Same-input head-to-head results are flagged as such and kept separate from "comparable but tested on different inputs." Nothing is compared that wasn’t actually tested together.

Re-tested on a cadence

Observations are dated. The substrate is re-run over time, so freshness is a property of the data — not a publish date you have to trust.

The result: an agent grounded on AI Demos can cite a verdict and the artifact behind it — and a downstream user can verify it.

Three ways to consume

Pick the surface that fits how your agent thinks.

Structured fields and evidence over MCP, clean narrative as Markdown twins, and one evidence model underneath them both.

MCP server

Connect any MCP-compatible client — Claude Code, Claude Desktop, Cursor, or your own agent — over Streamable HTTP and query the catalogue and the evidence directly.

https://mcp.aidemos.com/api/mcp

16 tools · no API key · read-only · proto 2025-06-18

Markdown twins

Every published page has a clean .md twin — same content, stripped of nav and markup, ready for a context window. Use it when you want the narrative.

/tools/llamaparse.md

/tools · /best · /use-cases · Accept: text/markdown

Structured evidence model

Underneath both: AI Demos stores observations, not pages. Every cell is a tool × scenario × criterion — pages are views, MCP verbs are queries, over the same source of truth.

verdict · score · artifact · tested_at
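As a rough illustration of "pages are views over observations", the sketch below derives two views from a single list of cells. The `Observation` shape borrows the field names used on this page; the tool names, sample rows, and both helper functions are hypothetical and are not part of the AI Demos API.

```typescript
// Hypothetical, simplified observation cell: one tool x scenario x criterion.
type Verdict = "worked" | "mixed" | "struggled" | "failed";

interface Observation {
  tool: string;
  scenario: string;
  criterion: string;
  verdict: Verdict;
  score: number;
  tested_at: string; // ISO date, e.g. "2026-05-22"
}

interface RankedTool {
  tool: string;
  mean: number;
}

// View 1: a ranking, computed here as the mean score per tool for one criterion.
function rankingFor(cells: Observation[], criterion: string): RankedTool[] {
  const totals = new Map<string, { sum: number; n: number }>();
  for (const c of cells) {
    if (c.criterion !== criterion) continue;
    const t = totals.get(c.tool) ?? { sum: 0, n: 0 };
    totals.set(c.tool, { sum: t.sum + c.score, n: t.n + 1 });
  }
  return Array.from(totals, ([tool, t]) => ({ tool, mean: t.sum / t.n }))
    .sort((a, b) => b.mean - a.mean);
}

// View 2: a tool page, i.e. every cell recorded for a single tool.
function toolPage(cells: Observation[], tool: string): Observation[] {
  return cells.filter((c) => c.tool === tool);
}

// Illustrative rows only; these are not real AI Demos results.
const cells: Observation[] = [
  { tool: "tool-a", scenario: "scanned-paper", criterion: "table extraction",
    verdict: "worked", score: 4, tested_at: "2026-05-22" },
  { tool: "tool-a", scenario: "invoice", criterion: "table extraction",
    verdict: "mixed", score: 3, tested_at: "2026-05-22" },
  { tool: "tool-b", scenario: "scanned-paper", criterion: "table extraction",
    verdict: "struggled", score: 2, tested_at: "2026-05-20" },
];

console.log(rankingFor(cells, "table extraction"));
console.log(toolPage(cells, "tool-a").length);
```

Both views read the same array, so a corrected or re-tested cell would show up in each of them without any page being rewritten.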

The 16 MCP tools

Enumerate, fetch detail, then reach the ground truth.

The server exposes its tools in three layers — from listing every published page, to fetching a full structured-plus-Markdown envelope, to querying the observation cells the rest of the web can’t give you.

Discovery

enumerate & traverse

Paged lists of every published page, the taxonomy in use, and graph traversal — each result carries id, slug, and a full url.

list_tools · list_rankings · list_use_cases · list_compares · list_toolkits · list_personas · list_categories · search · tools_in_ranking · rankings_for_tool · get_persona

Detail

JSON + Markdown

The full page as a structured-JSON + Markdown-content envelope: identity, pricing, per-feature scores, fit, FAQ and relationships as JSON; editorial prose as clean Markdown. An optional fields projection limits which fields come back, which keeps token cost under control.

get_tool · get_ranking · get_use_case

Evidence graph

the part you can’t scrape

Query the observation cells directly — filter by tool(s), scenario, criterion, verdict or evidence state — or get an evidence-aligned, honesty-enforced comparison of two tools.

get_evidence · compare_tools

What get_evidence returns

The verdict, the score — and the URLs of the screenshots that prove it.

A call returns the observation cells — one per (tool × scenario × criterion). Your agent gets the answer and can show its work.

get_evidence: observation cell · application/json

// get_evidence({ tool: "llamaparse", criterion: "table extraction" })
{
  "tool":      { "name": "LlamaParse", "slug": "llamaparse",
                "url": "https://aidemos.com/tools/llamaparse" },
  "scenario":  { "name": "Scanned research paper with tables",
                "group_tag": "scanned-research-paper" },
  "criterion": { "name": "Table extraction" },
  "verdict":       "worked",            // worked | mixed | struggled | failed
  "score":         4,
  "evidence_state": "verified",         // verified | observed | scored-only
  "note":  "Reconstructed the multi-row header correctly; merged cells preserved.",
  "tested_at": "2026-05-22",
  "artifacts": [
    { "url": "https://.../input.png", "role": "input", "caption": "Source page" },
    { "url": "https://.../output.png", "role": "output", "caption": "Parsed table" }
  ]
}

evidence_state tells your agent exactly how strong each cell is: verified = artifact-backed · observed = noted without an artifact · scored-only = a number only. It can answer and hand a user the real input/output screenshots behind the call.
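A client might use evidence_state to decide how much weight a cell deserves before it answers. The sketch below is one way to do that, assuming cells arrive as an array with the fields shown in the sample response above. The type names, the `STRENGTH` ordering, and both helpers are illustrative rather than part of the server's API, and the artifact URL is a placeholder.

```typescript
type EvidenceState = "verified" | "observed" | "scored-only";

interface Artifact {
  url: string;
  role: string;
  caption: string;
}

// Subset of the observation cell fields shown in the sample response.
interface EvidenceCell {
  verdict: "worked" | "mixed" | "struggled" | "failed";
  score: number;
  evidence_state: EvidenceState;
  tested_at: string;
  artifacts: Artifact[];
}

// Assumed ordering: artifact-backed > noted without artifact > number only.
const STRENGTH: Record<EvidenceState, number> = {
  verified: 2,
  observed: 1,
  "scored-only": 0,
};

// Keep only the cells at or above a minimum evidence strength.
function atLeast(cells: EvidenceCell[], min: EvidenceState): EvidenceCell[] {
  return cells.filter((c) => STRENGTH[c.evidence_state] >= STRENGTH[min]);
}

// Collect the artifact links an agent could cite next to its answer.
function citations(cell: EvidenceCell): string[] {
  return cell.artifacts.map((a) => `${a.role}: ${a.url}`);
}

// Placeholder cells for illustration; not real AI Demos data.
const sample: EvidenceCell[] = [
  { verdict: "worked", score: 4, evidence_state: "verified", tested_at: "2026-05-22",
    artifacts: [{ url: "https://example.com/input.png", role: "input", caption: "Source page" }] },
  { verdict: "mixed", score: 3, evidence_state: "scored-only", tested_at: "2026-05-22",
    artifacts: [] },
];

for (const cell of atLeast(sample, "observed")) {
  console.log(cell.verdict, cell.score, citations(cell));
}
```

Filtering first and citing second means an answer never quotes a score-only cell as if it had proof behind it.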

Markdown twins

Fetch the clean .md of any page — no client needed.

Append `.md` to the URL

https://aidemos.com/tools/llamaparse → …/tools/llamaparse.md. Works today for tool pages (/tools/…), ranking pages (/best/…) and use-case pages (/use-cases/…).
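If an agent already holds canonical page URLs, the twin URL can be derived without a request. The helper below is a minimal sketch of that rule; `twinUrl` and `TWIN_PREFIXES` are hypothetical names, and the trailing-slash handling is an assumption rather than documented behaviour.

```typescript
// Path prefixes this page says have Markdown twins today.
const TWIN_PREFIXES = ["/tools/", "/best/", "/use-cases/"];

// Return the .md twin URL for a canonical page URL, or null when the
// path is outside the supported prefixes or carries no slug.
function twinUrl(pageUrl: string): string | null {
  const url = new URL(pageUrl);
  // Assumption: a trailing slash is not part of the slug.
  const path = url.pathname.replace(/\/+$/, "");
  const supported = TWIN_PREFIXES.some(
    (prefix) => path.startsWith(prefix) && path.length > prefix.length,
  );
  if (!supported) return null;
  // Already a twin: leave the path alone.
  if (!path.endsWith(".md")) url.pathname = `${path}.md`;
  return url.toString();
}

console.log(twinUrl("https://aidemos.com/tools/llamaparse"));
console.log(twinUrl("https://aidemos.com/about"));
```

Returning null for unsupported paths lets the caller fall back to the HTML page instead of requesting a twin that may not exist.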

Or content-negotiate

Send Accept: text/markdown to the canonical URL and get the twin back. The HTML page advertises it with a Link: rel="alternate"; type="text/markdown" header, so crawlers and agents can discover it.
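For discovery, a crawler could read that Link header and pick out the Markdown alternate. The sketch below assumes the header follows the usual `<url>; rel="..."; type="..."` layout from RFC 8288. The exact header AI Demos sends is not reproduced on this page, so the sample value and the `markdownAlternate` helper are hypothetical.

```typescript
// Find the target of a rel="alternate" link whose type is text/markdown
// in a Link header value. Returns null when no such link is present.
function markdownAlternate(linkHeader: string): string | null {
  // Split on commas that begin a new <target>. This simple split assumes
  // no target URL itself contains ", <".
  for (const part of linkHeader.split(/,\s*(?=<)/)) {
    const target = /^\s*<([^>]*)>/.exec(part);
    if (!target) continue;
    const rel = /;\s*rel="?([^";]+)"?/i.exec(part);
    const type = /;\s*type="?([^";]+)"?/i.exec(part);
    // rel may hold several space-separated relation types.
    const rels = rel ? rel[1].toLowerCase().split(/\s+/) : [];
    const isMarkdown = type !== null && type[1].trim().toLowerCase() === "text/markdown";
    if (rels.includes("alternate") && isMarkdown) return target[1];
  }
  return null;
}

// Hypothetical header value, for illustration only.
const header =
  '<https://aidemos.com/style.css>; rel="preload", ' +
  '<https://aidemos.com/tools/llamaparse.md>; rel="alternate"; type="text/markdown"';

console.log(markdownAlternate(header));
```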


When to use which

Twins when you want the narrative — the full review or how-to as text. MCP when you want structured fields and evidence.

curl

$ curl -H 'Accept: text/markdown' \
https://aidemos.com/tools/llamaparse

# LlamaParse
## Our take
Strong on complex tables and scanned
documents; the multi-row header
reconstruction held up across our…

## Pricing
Free tier: 1,000 pages/day…

Get started

One MCP connection, or one curl.

Add the MCP server to your client — Claude Code

$ claude mcp add --transport http aidemos https://mcp.aidemos.com/api/mcp

Claude Desktop / Cursor / any mcpServers config:

{
  "mcpServers": {
    "aidemos": { "type": "http", "url": "https://mcp.aidemos.com/api/mcp" }
  }
}

Or call it from your own agent — MCP SDK over Streamable HTTP

TypeScript or Python — or plain JSON-RPC 2.0 over HTTP POST (initialize → tools/list → tools/call). Each call returns its result as a JSON string in content[0].text.

import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '…/client/streamableHttp.js'

const transport = new StreamableHTTPClientTransport(new URL('https://mcp.aidemos.com/api/mcp'))
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport)

const res = await client.callTool({
  name: 'get_evidence',
  arguments: { tool: 'llamaparse', criterion: 'table extraction' },
})
const evidence = JSON.parse(res.content[0].text)
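For agents that skip the SDK, the plain JSON-RPC route mentioned above might look roughly like the message sequence below. This is a sketch, not the server's documented contract: the method names and the `notifications/initialized` step come from the MCP specification, while `rpc`, `handshake`, and `parseToolText` are hypothetical helper names. The code only builds and decodes messages; sending each one as an HTTP POST body is left to your client.

```typescript
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build a JSON-RPC 2.0 request with a fresh numeric id.
function rpc(method: string, params?: Record<string, unknown>): JsonRpcMessage {
  const message: JsonRpcMessage = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) message.params = params;
  return message;
}

// The initialize -> tools/list -> tools/call sequence as message bodies.
function handshake(tool: string, args: Record<string, unknown>): JsonRpcMessage[] {
  return [
    rpc("initialize", {
      protocolVersion: "2025-06-18",
      capabilities: {},
      clientInfo: { name: "my-agent", version: "1.0.0" },
    }),
    // Notification (no id) that MCP expects after a successful initialize.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    rpc("tools/list"),
    rpc("tools/call", { name: tool, arguments: args }),
  ];
}

// Tool results carry a JSON string in content[0].text; decode it.
function parseToolText(result: { content: { type: string; text?: string }[] }): unknown {
  const first = result.content[0];
  if (!first || first.type !== "text" || first.text === undefined) {
    throw new Error("expected a text content item");
  }
  return JSON.parse(first.text);
}

const messages = handshake("get_evidence", {
  tool: "llamaparse",
  criterion: "table extraction",
});
console.log(messages.map((m) => JSON.stringify(m)).join("\n"));
```

Keeping message construction separate from transport makes the sequence easy to log and test before any request leaves the agent.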

Or just fetch a Markdown twin — no client needed

$ curl https://aidemos.com/best/resume-parsing-api.md
# or, by content negotiation:
$ curl -H 'Accept: text/markdown' https://aidemos.com/tools/llamaparse

Stop teaching your agent to read listicles.

Point it at evidence it can trace, compare, and verify — one MCP connection away.

MCP → https://mcp.aidemos.com/api/mcp · 16 tools · read-only · no API key

