# AI Demos MCP — Developer Guide

**How AI agents use the AI Demos MCP server to make grounded tool-choice decisions.**

This is the practical deep-dive companion to [AI Demos for AI Agents](https://aidemos.com/for-ai-agents)
(the platform overview — what the evidence model is and why it exists). This document goes
deeper: the full API reference with real responses, multi-call workflows, and how an agent chains
the verbs to answer questions like *"which resume parser should I use, and prove it."*

Every example response below was produced by calling the live endpoint
(`https://mcp.aidemos.com/api/mcp`) on 2026-07-02 and truncating (`…` marks elided content).
Nothing is invented. Where the data has gaps, the gaps are stated.

- **Endpoint:** `https://mcp.aidemos.com/api/mcp` (prod) · `https://mcp-staging.aidemos.com/api/mcp` (staging)
- **Transport:** Streamable HTTP (JSON-RPC 2.0 over POST, stateless — no session required)
- **Auth:** none. Public, read-only, published data only.
- **16 tools** ("verbs" below, to avoid confusion with the AI *tools* in the catalogue).

---

## Table of contents

1. [Introduction — why this exists](#1-introduction--why-this-exists)
2. [Architecture](#2-architecture)
3. [API Reference (16 verbs)](#3-api-reference)
4. [Practical Workflows](#4-practical-workflows)
5. [AI Agent Examples](#5-ai-agent-examples)
6. [Best Practices](#6-best-practices)

---

## 1. Introduction — why this exists

AI Demos publishes hands-on testing of AI tools: rankings ("best X"), tool reviews, head-to-head
comparisons, and how-to use cases — all backed by an **evidence substrate** of real test
observations with screenshots. The MCP server exposes all of it as structured functions.

### The problem it solves

An agent asked *"what's the best resume-parsing API and why?"* has two options:

1. **Scrape aidemos.com HTML** — fetch a rendered page, burn thousands of tokens on nav/CSS/JSX
   noise, regex out scores that were designed for human eyes, and hope the layout doesn't change.
2. **Call the MCP** — one `get_ranking({"slug": "resume-parsing-api"})` returns ranked tools with
   badges, per-metric scores, the testing methodology, and per-tool worked/struggled breakdowns as
   typed JSON.

Structured beats scraping for agents because:

- **Deterministic fields.** `winner.slug`, `tools[n].rank`, `pricing.plans[n].price` are stable
  keys, not CSS selectors. No parsing heuristics, no layout drift.
- **Scores with provenance.** Every ranking carries its `methodology` (what was tested, on which
  inputs, against which criteria). Every score is traceable to observation cells with researcher
  notes and `tested_at` timestamps.
- **Artifact-backed observations.** Evidence cells carry the actual input/output screenshots
  (`artifacts[].url`) — an agent can cite *the image of the failed table extraction*, not just a
  claim about it.
- **Token-efficient projections.** `get_tool({"slug": "affinda", "fields": ["pricing"]})` returns
  ~1 KB instead of the full ~8 KB envelope. Lists return light refs, not full documents.

### What decisions become easier

| Decision | Verbs |
|---|---|
| "Which tool should I use for X?" | `search` → `get_ranking` → `get_tool` |
| "A vs B — which is better, provably?" | `compare_tools` (same-input evidence, structurally honest) |
| "Why is X ranked above Y?" | `get_ranking` (breakdown) + `get_evidence` (raw cells) |
| "Does it have an API / what does it cost?" | `get_tool` with `fields: ["pricing"]` / `["fit"]` / `["features"]` |
| "Where does this tool break?" | `get_evidence` with `verdict: "failed"` |
| "What exists for my kind of user?" | `get_persona`, `list_*`, `search` with `persona` filter |

---

## 2. Architecture

### 2.1 Three layers + search

```
                         ┌──────────────────────────────┐
                         │            search             │  keyword / semantic / hybrid
                         │  (one query across all kinds) │  → ranked light refs
                         └──────────────┬───────────────┘
                                        │ slugs / ids
  ┌─────────────────────────────────────▼─────────────────────────────────────┐
  │ LAYER 1 · DISCOVERY — what exists, how it links (light refs, cheap)       │
  │   list_use_cases · list_rankings · list_tools · list_compares             │
  │   list_toolkits · list_personas · list_categories                         │
  │   tools_in_ranking · rankings_for_tool · get_persona                      │
  └─────────────────────────────────────┬─────────────────────────────────────┘
                                        │ slug
  ┌─────────────────────────────────────▼─────────────────────────────────────┐
  │ LAYER 2 · DETAIL ENVELOPES — the full page, JSON structure + Markdown     │
  │   get_tool · get_ranking · get_use_case      (all support `fields`)       │
  └─────────────────────────────────────┬─────────────────────────────────────┘
                                        │ tool slug / scenario / criterion
  ┌─────────────────────────────────────▼─────────────────────────────────────┐
  │ LAYER 3 · EVIDENCE GRAPH — the raw research behind the scores             │
  │   get_evidence · compare_tools                                            │
  │   observation cell = tool × scenario × criterion                          │
  │     → verdict + note + evidence_state + artifacts (screenshots)           │
  └───────────────────────────────────────────────────────────────────────────┘
```

### 2.2 The data model

**Catalogue** (what the public site shows):

```
persona ──┐                    ┌── tool (review page: pricing, features, fit, our-take)
category ─┼─ tags on ─────────▶├── ranking ("best X": ranked tools + methodology + breakdown)
          │  every page type   ├── use_case (how-to guide: step_guide, pros/cons, FAQ)
          │                    ├── compare (tool A vs tool B page)
          └────────────────────└── toolkit (curated bundle)

ranking ⟷ tool     : tools_in_ranking / rankings_for_tool (rank + badge per pairing)
persona → all types: get_persona (mirrors a persona landing page)
```

**Evidence substrate** (why the catalogue says what it says):

```
observation cell = (tool × scenario × criterion)
  ├── verdict        : worked | mixed | struggled | failed
  ├── score          : nullable number (+ score_total)   ← currently null on live cells
  ├── note           : the researcher's observation, one specific claim
  ├── evidence_state : verified (artifact-backed) | observed (noted, no artifact)
  │                    | scored-only | null
  ├── source         : e.g. "first-party" (we ran the test)
  ├── tested_at      : timestamp
  └── artifacts[]    : { url, alt, role, caption } — real input/output screenshots (CDN)

scenario = a concrete test input ("Scanned Research Paper", "RAG Ingestion Pipeline")
  └── group_tag : cross-run tag linking scenarios that probe the same thing
criterion = the dimension judged ("Table Preservation", "Branching & Logic")
```

Two cells on the **same scenario** = the tools saw the **same input** — that is what makes
`compare_tools.head_to_head` a provable comparison rather than marketing prose.

### 2.3 Conventions (hold across all 16 verbs)

- Every entity carries `id`, `slug` (or `name`/`title`), and a full public `url` on
  `https://aidemos.com`.
- Related entities are **light refs** — `{ id, name|title, slug }` (sometimes + `url`) — traverse
  by calling another verb with the slug/id.
- **Empty results return `[]` (or `{count: 0, cells: []}`), never an error.** Don't retry on empty.
- **Unknown detail slug returns `null`** (`get_tool` / `get_ranking` / `get_use_case`).
  `compare_tools` with an unknown tool returns `{"error": "Tool not found: … (published tools only)"}`.
- **Prefer slugs over ids.** IDs are not type-stable across verbs (`774` in `list_tools`, `"774"`
  in `tools_in_ranking`; category ids are UUIDs or `null`). Slugs are the reliable join key —
  and even they can be `null` on ranked entries whose tool page isn't linked yet (see §3.13).
- Results are **published data only**; drafts never appear.
- Every `tools/call` result is a JSON **string** inside `result.content[0].text` — parse it.

### 2.4 Which verb do I need?

| You have… | You want… | Call |
|---|---|---|
| a vague need ("parse invoices") | candidate pages | `search` (semantic or hybrid) |
| an exact product name | its page | `search` mode `keyword`, or `get_tool` if you can guess the slug |
| nothing | the lay of the land | `list_personas`, `list_categories`, `list_rankings` |
| a user type ("founder") | everything for them | `get_persona` |
| a ranking slug | full ranking with scores | `get_ranking` |
| a ranking id | just the ranked tool list | `tools_in_ranking` |
| a tool slug | pricing / features / review | `get_tool` (+ `fields`) |
| a tool id | every ranking it appears in | `rankings_for_tool` |
| a use-case slug | the how-to guide | `get_use_case` |
| a tool | the raw test observations | `get_evidence` |
| two tools | a provable comparison | `compare_tools` |
| a claim to verify ("X fails on Y") | supporting/refuting cells | `get_evidence` with `verdict` / `criterion` filters |

---

## 3. API Reference

### Calling conventions — setup once

Client installation and MCP-client registration (Claude Code, Claude Desktop, Cursor,
`mcp-remote` bridge) are covered in [`AIDEMOS_MCP.md`](./AIDEMOS_MCP.md#how-to-add-it-to-your-program)
— not repeated here. Below is the programmatic setup each per-verb snippet assumes.

**curl** — the endpoint is plain JSON-RPC 2.0 over POST; a bare `tools/call` works without
`initialize` (the server is stateless):

```bash
MCP=https://mcp.aidemos.com/api/mcp
# generic call shape — swap name/arguments per verb:
curl -s $MCP -H "Content-Type: application/json" -d '{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": { "name": "list_personas", "arguments": {} }
}'
# response: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"<JSON string>"}]}}
# extract the payload:
#   ... | jq -r '.result.content[0].text' | jq .
```

**Python** (`pip install mcp`) — full setup once; every per-verb snippet below is a `one(...)`
call inside this session:

```python
import asyncio, json
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("https://mcp.aidemos.com/api/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            async def one(name: str, args: dict | None = None):
                res = await session.call_tool(name, args or {})
                return json.loads(res.content[0].text)

            # per-verb snippets go here, e.g.:
            personas = await one("list_personas")

asyncio.run(main())
```

**JavaScript / TypeScript** (`npm install @modelcontextprotocol/sdk`):

```ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'

const transport = new StreamableHTTPClientTransport(new URL('https://mcp.aidemos.com/api/mcp'))
const client = new Client({ name: 'my-agent', version: '1.0.0' })
await client.connect(transport)

const one = async (name: string, args: Record<string, unknown> = {}) => {
  const res = await client.callTool({ name, arguments: args })
  return JSON.parse((res.content as any)[0].text)
}

// per-verb snippets go here, e.g.:
const personas = await one('list_personas')
```

**Error behavior** (verified live):

| Situation | What you get |
|---|---|
| Unknown verb name | JSON-RPC error `-32602` `"Unknown tool: …"` |
| Missing required argument | tool result with `isError: true`, text `"Error in get_tool: 'slug'"` |
| Unknown slug on `get_*` detail verbs | `null` (a valid result, not an error) |
| Unknown tool on `compare_tools` | `{"error": "Tool not found: … (published tools only)"}` |
| Unknown tool on `get_evidence` | `{"cells": [], "note": "No published tool matched: …"}` |
| Valid filter, no matches | `[]` / `{"count": 0, "cells": []}` |

Catalogue size at time of writing (live counts): **128 tools · 21 rankings · 17 use cases ·
2 comparisons · 3 toolkits · 6 personas**.

---

### 3.1 `search`

Search the published catalogue across use cases, rankings, tools, comparisons, and toolkits.
Returns ranked light refs — then call `get_tool` / `get_ranking` / `get_use_case` for detail.

| Param | Type | Required | Description |
|---|---|---|---|
| `query` | string | yes | Free-text query. |
| `mode` | `"keyword"` \| `"semantic"` \| `"hybrid"` | no | Default `hybrid`. `keyword` = substring; `semantic` = embedding similarity over each page's Markdown (finds pages by what they *cover*); `hybrid` = reciprocal-rank fusion of both. |
| `type` | string[] of `use_case` \| `ranking` \| `tool` \| `compare` \| `toolkit` | no | Restrict kinds (default: all). |
| `persona` | string | no | Persona slug filter. |
| `category` | string | no | Category filter (case-insensitive). |
| `limit` | integer 1–50 | no | Default 20. |

**Response shape:** `{ query, mode_used, count, results[{ kind, id, title, slug, url, snippet, score, meta }] }`.
`meta` is per-kind: tools carry `domain/category/personas/sub_category/content_type`; rankings
carry `winner/tools_count/tested_as_of`; etc. **Score scales differ by mode** — semantic ≈ cosine
similarity (0–1), hybrid = small RRF fusion values (~0.01–0.04), keyword = integer match weight.
Scores rank results *within one response*; never compare them across calls or modes.

Live example — semantic, tools only:

```json
// search {"query": "parse resumes into structured JSON", "mode": "semantic", "type": ["tool"], "limit": 5}
{
  "query": "parse resumes into structured JSON",
  "mode_used": "semantic",
  "count": 5,
  "results": [
    {
      "kind": "tool", "id": 781, "title": "LlamaParse", "slug": "llamaparse",
      "url": "https://aidemos.com/tools/llamaparse",
      "snippet": "LlamaParse Review: AI Resume Parser & Schema Extraction Tested (2026)",
      "score": 0.46,
      "meta": { "domain": "llamaindex.ai", "category": "Developer Tools & APIs",
                "personas": ["founders", "editors"], "sub_category": "APIs", "…": "…" }
    },
    { "kind": "tool", "id": 778, "title": "Extracta Labs", "slug": "extracta-labs", "score": 0.452, "…": "…" },
    { "kind": "tool", "id": 794, "title": "Hireability", "slug": "hireability", "score": 0.413, "…": "…" },
    { "kind": "tool", "id": 774, "title": "Affinda", "slug": "affinda", "score": 0.409, "…": "…" },
    { "kind": "tool", "id": 780, "title": "Hrflow", "slug": "hrflow", "score": 0.394, "…": "…" }
  ]
}
```

The same query in default `hybrid` mode also surfaces the **ranking** page
(`resume-parsing-api`, `meta.winner: "Affinda"`) — usually the better entry point than any single
tool. Keyword mode on `"llamaparse"` returns exactly one hit.

```bash
curl -s $MCP -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
  "params":{"name":"search","arguments":{"query":"resume parsing","limit":5}}}' \
  | jq -r '.result.content[0].text' | jq '.results[] | {kind, slug, score}'
```

```python
hits = await one("search", {"query": "resume parsing", "limit": 5})
```

```ts
const hits = await one('search', { query: 'resume parsing', limit: 5 })
```

---

### 3.2 `list_use_cases`

List published use-case pages (how-to guides).

| Param | Type | Required | Description |
|---|---|---|---|
| `limit` | integer 1–500 | no | Max items. Omit for all. |
| `offset` | integer ≥ 0 | no | Paging. |

**Response shape:** `[{ id, title, slug, url, persona{…}, category, updated_at }]`

```json
// list_use_cases {"limit": 2}
[
  {
    "id": 64,
    "title": "Turn Academic PDFs Into Exam-Ready Revision Notes Using AI",
    "slug": "turn-academic-pdfs-into-revision-notes",
    "url": "https://aidemos.com/use-cases/turn-academic-pdfs-into-revision-notes",
    "persona": { "id": 6, "name": "Teachers", "slug": "teachers", "url": "https://aidemos.com/for/teachers" },
    "category": "education-research",
    "updated_at": "2026-06-18T17:57:58.444Z"
  },
  { "id": 70, "title": "Convert Lecture Recordings into Structured Exam-Ready Notes Using AI",
    "slug": "convert-lecture-recordings-into-exam-ready-notes", "…": "…" }
]
```

> ⚠️ **Known gap:** 8 of the 17 listed use cases are legacy prose-only pages that predate the
> structured data model — `get_use_case` returns `null` for them (verified live:
> `access-google-veo-3`, `create-talking-baby-videos`, `identify-crop-diseases-from-images`,
> `meeting-notes-to-tasks`, `generate-copyright-safe-background-music`,
> `turn-scripts-into-short-videos`, `create-ai-influencer`, `turn-long-videos-into-short-clips`).
> Handle `null` and fall back to the page `url` if you must have content.

```python
ucs = await one("list_use_cases", {"limit": 50})
```

```ts
const ucs = await one('list_use_cases', { limit: 50 })
```

---

### 3.3 `list_rankings`

List published ranking pages ("best X").

| Param | Type | Required | Description |
|---|---|---|---|
| `limit` | integer 1–500 | no | Max items. Omit for all. |
| `offset` | integer ≥ 0 | no | Paging. |

**Response shape:**
`[{ id, title, slug, url, use_case, persona{…}, category, tools_count, winner{id,name,slug}, tested_as_of, updated_at }]`

```json
// list_rankings {"limit": 2}
[
  {
    "id": 173,
    "title": "Best AI Tools to Generate Diagram Animations from Text Descriptions",
    "slug": "best-ai-tools-to-generate-diagram-animations-from-text-descriptions",
    "url": "https://aidemos.com/best/best-ai-tools-to-generate-diagram-animations-from-text-descriptions",
    "use_case": null,
    "persona": { "id": 7, "name": "Founders", "slug": "founders", "url": "https://aidemos.com/for/founders" },
    "category": null,
    "tools_count": 5,
    "winner": { "id": "698", "name": "EasyMotion", "slug": "easymotion" },
    "tested_as_of": "2026-04",
    "updated_at": "2026-06-25T06:10:55.867Z"
  },
  { "id": 176, "title": "Best AI Tools for Outreach Prospecting and Personalized Cold Email",
    "slug": "cold-email-tools", "winner": { "name": "Apollo", "slug": "apollo", "…": "…" }, "…": "…" }
]
```

Notes from live data: `use_case` is `null` until a ranking is tagged to its use case in the CMS;
`category` can be `null`; `tested_as_of` is a free-form string (`"2026-04"`, `"June 2026"`, or
`null`) — display it, don't parse it.

```python
rankings = await one("list_rankings")
```

```ts
const rankings = await one('list_rankings')
```

---

### 3.4 `list_tools`

List published AI tool pages.

| Param | Type | Required | Description |
|---|---|---|---|
| `limit` | integer 1–500 | no | Max items. Omit for all (128 today). |
| `offset` | integer ≥ 0 | no | Paging. |

**Response shape:** `[{ id, name, slug, url, domain, personas[{…}], categories[] }]`

```json
// list_tools {"limit": 2}
[
  {
    "id": 718, "name": "Academa AI", "slug": "academa-studio",
    "url": "https://aidemos.com/tools/academa-studio",
    "domain": "studio.academa.ai",
    "personas": [ { "id": 6, "name": "Teachers", "slug": "teachers", "…": "…" }, "…" ],
    "categories": ["video-generator"]
  },
  { "id": 841, "name": "Adobe Podcast Enhance", "slug": "adobe-podcast-enhance",
    "domain": "podcast.adobe.com", "…": "…" }
]
```

`domain` can be `null` (Affinda's is, today). `categories` is an array of category slugs.

```python
tools = await one("list_tools", {"limit": 200})
```

```ts
const tools = await one('list_tools', { limit: 200 })
```

---

### 3.5 `list_compares`

List published head-to-head comparison pages.

| Param | Type | Required | Description |
|---|---|---|---|
| `limit` | integer 1–500 | no | Max items. Omit for all (2 today). |
| `offset` | integer ≥ 0 | no | Paging. |

**Response shape:**
`[{ id, title, slug, url, tool_a{…}, tool_b{…}, personas[], shared_use_cases[], updated_at }]`

```json
// list_compares {}
[
  {
    "id": 50, "title": "YouLearn vs NoteGPT", "slug": "youlearn-ai-vs-notegpt",
    "url": "https://aidemos.com/compare/youlearn-ai-vs-notegpt",
    "tool_a": { "id": 723, "name": "YouLearn AI", "slug": "youlearn" },
    "tool_b": { "id": 703, "name": "NoteGPT", "slug": "notegpt" },
    "personas": [ { "id": 3, "name": "Students", "slug": "students", "…": "…" } ],
    "shared_use_cases": [
      { "id": 64, "title": "Turn Academic PDFs Into Exam-Ready Revision Notes Using AI",
        "slug": "turn-academic-pdfs-into-revision-notes" }, "…"
    ],
    "updated_at": "2026-05-14T19:03:30.819Z"
  },
  { "id": 6, "title": "InVideo AI vs Revid AI", "slug": "invideo-ai-vs-revid-ai",
    "tool_a": null, "tool_b": null, "shared_use_cases": [], "…": "…" }
]
```

`tool_a`/`tool_b` can be `null` on legacy comparisons (second entry above) — the page exists but
isn't wired to tool records. Only 2 comparison *pages* exist today; for evidence-based comparisons
of **any** two tested tools use `compare_tools` (§3.16), which doesn't need a comparison page.

```python
compares = await one("list_compares")
```

```ts
const compares = await one('list_compares')
```

---

### 3.6 `list_toolkits`

List published toolkit pages (curated bundles).

| Param | Type | Required | Description |
|---|---|---|---|
| `limit` | integer 1–500 | no | Max items. Omit for all (3 today). |
| `offset` | integer ≥ 0 | no | Paging. |

**Response shape:** `[{ id, title, slug, url, use_case, category }]`

```json
// list_toolkits {}
[
  { "id": "4", "title": "3 Powerful AI Tools Every Content Creator Needs in 2025",
    "slug": "ai-content-creator", "url": "https://aidemos.com/toolkits/ai-content-creator",
    "use_case": null, "category": null },
  { "id": "3", "title": "AI Avatar tools for faceless youtube channels",
    "slug": "ai-avatar-generator", "…": "…" },
  { "id": "2", "title": "5 Must Try Chatgpt-Image-Generation-Prompts", "…": "…" }
]
```

Toolkits are the thinnest page type — light refs only, no `get_toolkit` detail verb exists.

```python
toolkits = await one("list_toolkits")
```

```ts
const toolkits = await one('list_toolkits')
```

---

### 3.7 `list_personas`

List personas with published-page counts per type. No parameters. The slugs here are the valid
input for `get_persona` and for `search`'s `persona` filter.

**Response shape:** `[{ id, slug, name, counts{ use_cases, rankings, tools, compares, toolkits } }]`

```json
// list_personas {}
[
  { "id": 5, "slug": "creators", "name": "Creators",
    "counts": { "use_cases": 12, "rankings": 11, "tools": 48, "compares": 1, "toolkits": 0 } },
  { "id": 9, "slug": "editors",  "name": "Editors",  "counts": { "use_cases": 2, "rankings": 1,  "tools": 13, "…": 0 } },
  { "id": 7, "slug": "founders", "name": "Founders", "counts": { "use_cases": 3, "rankings": 8,  "tools": 46, "…": 0 } },
  { "id": 8, "slug": "marketing","name": "Marketing","counts": { "use_cases": 9, "rankings": 10, "tools": 32, "…": 0 } },
  { "id": 3, "slug": "students", "name": "Students", "counts": { "use_cases": 2, "rankings": 3,  "tools": 12, "…": 0 } },
  { "id": 6, "slug": "teachers", "name": "Teachers", "counts": { "use_cases": 2, "rankings": 5,  "tools": 14, "…": 0 } }
]
```

```python
personas = await one("list_personas")
```

```ts
const personas = await one('list_personas')
```

---

### 3.8 `list_categories`

List the category vocabulary in use, with per-type counts and a `source` flag, sorted by total.
No parameters.

**Response shape:** `[{ id, slug, name, source, counts{…} }]` — `source` is `"collection"`
(a real category record) or `"derived"` (a value in use that has no record yet).

```json
// list_categories {}
[
  {
    "id": "edc52cd8-aa5c-43ea-b331-3f0df7f8b68d",
    "slug": "video-generator", "name": "Video Generation", "source": "collection",
    "counts": { "use_cases": 10, "rankings": 5, "tools": 31, "compares": 1, "toolkits": 0 }
  },
  { "id": "3c4ad1a1-…", "slug": "audio-speech",    "name": "Audio & Speech",        "source": "collection", "…": "…" },
  { "id": "d10ed9e0-…", "slug": "developer-tools", "name": "Developer Tools & APIs", "source": "collection", "…": "…" },
  "…"
]
```

> ⚠️ Category `id`s are **UUIDs**, not integers — and can be `null` for `derived` entries. Key
> everything on the `slug`.

```python
cats = await one("list_categories")
```

```ts
const cats = await one('list_categories')
```

---

### 3.9 `tools_in_ranking`

Given a ranking id, return its ranked tools as light refs — the cheap alternative to a full
`get_ranking` when you only need the ordering.

| Param | Type | Required | Description |
|---|---|---|---|
| `ranking_id` | string \| integer | yes | Ranking id (from `list_rankings`). |

**Response shape:** `{ ranking{id,title,slug}, tools[{ id, name, slug, url, rank, badge }] }`.
`badge` ∈ `Best` · `Usable` · `Needs work` · `Unstable` · `Failed`.

```json
// tools_in_ranking {"ranking_id": 168}
{
  "ranking": { "id": "168", "title": "Best AI Tools for Parsing Resumes via API (2026)", "slug": "resume-parsing-api" },
  "tools": [
    { "id": "774", "name": "Affinda",       "slug": "affinda",       "url": "https://aidemos.com/tools/affinda",       "rank": 1, "badge": "Best" },
    { "id": null,  "name": "Airparser",     "slug": "airparser",     "url": "https://aidemos.com/tools/airparser",     "rank": 2, "badge": "Best" },
    { "id": "781", "name": "LlamaParse",    "slug": "llamaparse",    "…": "…",                                          "rank": 3, "badge": "Best" },
    { "id": "778", "name": "Extracta Labs", "slug": "extracta-labs", "…": "…",                                          "rank": 4, "badge": "Usable" },
    { "id": "780", "name": "Hrflow",        "slug": "hrflow",        "…": "…",                                          "rank": 5, "badge": "Needs work" }
  ]
}
```

Note the real-world wrinkle: Airparser's `id` is `null` (its ranking entry isn't wired to a tool
record) while its `slug`/`url` still work. Traverse by slug.

```python
ranked = await one("tools_in_ranking", {"ranking_id": 168})
```

```ts
const ranked = await one('tools_in_ranking', { ranking_id: 168 })
```

---

### 3.10 `rankings_for_tool`

Given a tool id, return every ranking it appears in — the reverse edge of `tools_in_ranking`.

| Param | Type | Required | Description |
|---|---|---|---|
| `tool_id` | string \| integer | yes | Tool id (from `list_tools`). |

**Response shape:** `{ tool{id,name,slug}, rankings[{ id, title, slug, url, use_case, rank, badge }] }`

```json
// rankings_for_tool {"tool_id": 774}
{
  "tool": { "id": 774, "name": "Affinda", "slug": "affinda" },
  "rankings": [
    { "id": "168", "title": "Best AI Tools for Parsing Resumes via API (2026)",
      "slug": "resume-parsing-api", "url": "https://aidemos.com/best/resume-parsing-api",
      "use_case": null, "rank": 1, "badge": "Best" }
  ]
}
```

```python
where = await one("rankings_for_tool", {"tool_id": 774})
```

```ts
const where = await one('rankings_for_tool', { tool_id: 774 })
```

---

### 3.11 `get_persona`

Everything tagged with a persona — mirrors a persona landing page. One call replaces five
filtered lists.

| Param | Type | Required | Description |
|---|---|---|---|
| `slug` | string | yes | Persona slug (from `list_personas`), e.g. `"founders"`. |

**Response shape:** `{ persona{id,slug,name}, use_cases[], rankings[], compares[], toolkits[], tools[] }`
— all light refs; rankings include their `winner`.

```json
// get_persona {"slug": "founders"}   (truncated — full response ~9 KB)
{
  "persona": { "id": 7, "slug": "founders", "name": "Founders" },
  "use_cases": [
    { "id": 10, "title": "Convert Meeting Notes into Actionable Project Tasks Automatically",
      "slug": "meeting-notes-to-tasks", "url": "…", "category": "productivity" }, "…"
  ],
  "rankings": [
    { "id": "165", "title": "Best AI Customer Support Chatbots for Automated Support Workflows",
      "slug": "ai-customer-support-chatbots", "url": "…", "use_case": null,
      "winner": { "id": "785", "name": "Fre…", "…": "…" } },
    "… 8 rankings total"
  ],
  "compares": [],
  "toolkits": [],
  "tools": [ { "id": "774", "name": "Affinda", "slug": "affinda", "url": "…", "domain": null }, "… 46 tools" ]
}
```

```python
founders = await one("get_persona", {"slug": "founders"})
```

```ts
const founders = await one('get_persona', { slug: 'founders' })
```

---

### 3.12 `get_tool`

Full tool detail as a JSON + Markdown envelope. `null` for unknown slugs.

| Param | Type | Required | Description |
|---|---|---|---|
| `slug` | string | yes | Tool `seo_slug`, e.g. `"affinda"` (from `list_tools` / `search`). |
| `fields` | string[] | no | Projection: return ONLY these top-level fields (identity `id/name/slug/url` always included). |

Available `fields` (from the live input schema): `id, name, slug, url, heading, website, domain,
category, sub_category, content_type, personas, tags, rating, testing_history, features, pricing,
fit, use_case_track_record, related_pages, related_reads, similar_tools, faq, demo_video,
our_take, in_depth_review_md`.

**Envelope highlights** (Affinda, live):

- `features[]` — per-feature test results: `{ name, score, score_total, verdict, bottom_line }`.
  Scores are populated where the testing was scored (Affinda: `"Clean Resume Parsing": 9/10`) and
  `null` where it was qualitative (Apollo's features all have `score: null` with a prose
  `verdict`). Never assume the numbers exist.
- `pricing` — `{ heading, subheading, plans[{plan, price, notes, highlight, tested}], footnote }`
  with a "pricing checked <date>" footnote. `tested: true` marks the plan the research ran on.
- `fit` — `{ use_if[], skip_if[] }` decision bullets.
- `our_take.body_md`, `in_depth_review_md` — editorial Markdown.
- `use_case_track_record[]` — `{ rank, title, url, note }` per use case tested.
- ⚠️ `rating` is `null` and `testing_history` is `[]` on current live tools — reserved fields,
  don't build on them yet.

```json
// get_tool {"slug": "affinda", "fields": ["pricing"]}
{
  "id": 774, "name": "Affinda", "slug": "affinda", "url": "https://aidemos.com/tools/affinda",
  "pricing": {
    "heading": "Pricing & Access",
    "subheading": "Plans as of May 2026. Tested on the free plan",
    "plans": [
      { "plan": "Basic Testing", "price": "Free",
        "notes": "14-day free trial with all features, parsing limit of 200 documents, expires after 1 month",
        "highlight": true, "tested": true },
      { "plan": "Advanced Testing", "price": "$80 one-time", "notes": "3-month trial period…", "highlight": false, "tested": false },
      { "plan": "Tier 1", "price": "$800/year", "notes": "6,000 parses per year, all features included, API access", "…": "…" },
      { "plan": "Higher Tiers", "price": "Custom pricing", "…": "…" }
    ],
    "footnote": "Pricing checked May 2026. We re-check quarterly. Visit affinda.com for current enterprise pricing."
  }
}
```

```bash
curl -s $MCP -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
  "params":{"name":"get_tool","arguments":{"slug":"affinda","fields":["pricing"]}}}' \
  | jq -r '.result.content[0].text' | jq .pricing
```

```python
pricing = await one("get_tool", {"slug": "affinda", "fields": ["pricing"]})
```

```ts
const pricing = await one('get_tool', { slug: 'affinda', fields: ['pricing'] })
```

---

### 3.13 `get_ranking`

Full ranking detail. `null` for unknown slugs.

| Param | Type | Required | Description |
|---|---|---|---|
| `slug` | string | yes | Ranking slug, e.g. `"resume-parsing-api"` (from `list_rankings`). |
| `fields` | string[] | no | Projection (identity always included). |

Available `fields`: `id, title, slug, url, abstract, read_time, tested_date, category, use_case,
personas, tags, winner, methodology, tools, breakdown, final_take`.

**Envelope highlights:**

- `tools[]` — `{ rank, name, slug, url, badge, tagline, verdict_md, scores[{metric, score, max}] }`.
- `methodology` — `{ title, description_md, criteria[{name, description}] }`: what inputs were
  used and what each score means. This is what makes the numbers citable.
- `breakdown[]` — per-tool `{ name, slug, url, description_md, worked[], struggled[] }` — the
  "why this rank" bullets, including explicit failure notes.
- `winner` — the rank-1 entry.
- `verdict_md` / `final_take` — editorial Markdown.

```json
// get_ranking {"slug": "resume-parsing-api", "fields": ["winner", "tools"]}   (truncated)
{
  "id": 168, "title": "Best AI Tools for Parsing Resumes via API (2026)",
  "slug": "resume-parsing-api", "url": "https://aidemos.com/best/resume-parsing-api",
  "winner": { "rank": 1, "name": "Affinda", "slug": "affinda", "url": "…", "badge": "Best" },
  "tools": [
    {
      "rank": 1, "name": "Affinda", "slug": "affinda", "badge": "Best",
      "tagline": "Most Reliable End-to-End Parser",
      "verdict_md": "Affinda is a professional-grade resume-parsing API with 100+ configurable fields…",
      "scores": [
        { "metric": "Field Extraction Accuracy",    "score": 5, "max": 5 },
        { "metric": "Multi-Format Layout Handling", "score": 5, "max": 5 }
      ]
    },
    { "rank": 2, "name": "Airparser",  "slug": "airparser",  "badge": "Best",       "…": "…" },
    { "rank": 3, "name": "LlamaParse", "slug": "llamaparse", "badge": "Best",       "…": "…" },
    { "rank": 4, "name": "Extracta Labs", "slug": "extracta-labs", "badge": "Usable", "…": "…" },
    { "rank": 5, "name": "Hrflow",     "slug": "hrflow",     "badge": "Needs work", "…": "…" }
  ]
}
```

And the methodology (same ranking, `fields: ["methodology"]`, truncated):

```json
{
  "methodology": {
    "title": "How We Tested",
    "description_md": "Each tool was evaluated as a production-grade parsing API — not on its feature list, but on what it actually returned for three deliberately different resumes. We fixed the success and failure criteria before testing began…\n\nInput 1 — Clean single-column resume… Input 2 — Multi-column sidebar resume… Input 3 — Messy real-world resume…",
    "criteria": [
      { "name": "Field Extraction Accuracy",
        "description": "how correctly the parser pulled the core fields: name, email, phone, work experience, education, skills, and certifications." },
      "…"
    ]
  }
}
```

> ⚠️ Ranked entries whose tool page isn't linked can have `slug: null` (in the diagram-animations
> ranking, "Claude AI", "Vismo Studio" and "Academa AI" have `slug: null` at ranks 2–4 while
> EasyMotion and Kodisc are linked). When `slug` is null, fall back to `search` by `name` if you
> need the tool page — the evidence graph may still know the tool under its own slug
> (`claude-review` exists even though the ranking entry isn't wired to it).

```python
ranking = await one("get_ranking", {"slug": "resume-parsing-api", "fields": ["winner", "tools", "breakdown"]})
```

```ts
const ranking = await one('get_ranking', { slug: 'resume-parsing-api', fields: ['winner', 'tools', 'breakdown'] })
```

---

### 3.14 `get_use_case`

Full use-case (how-to guide) detail. `null` for unknown slugs **and for the 8 legacy prose
use cases** (§3.2).

| Param | Type | Required | Description |
|---|---|---|---|
| `slug` | string | yes | Use-case slug (from `list_use_cases`). |
| `fields` | string[] | no | Projection (identity always included). |

Available `fields`: `id, title, slug, url, abstract, read_time, category, audience, personas,
tags, tools_used, step_guide, what_to_expect, faq, full_md`.

```json
// get_use_case {"slug": "turn-academic-pdfs-into-revision-notes", "fields": ["step_guide", "tools_used"]}
{
  "id": 64, "title": "Turn Academic PDFs Into Exam-Ready Revision Notes Using AI",
  "slug": "turn-academic-pdfs-into-revision-notes", "url": "…",
  "tools_used": [],
  "step_guide": {
    "heading": "Convert PDF into Exam-Ready Notes Using StudyFetch",
    "steps": [
      { "number": 1, "title": "Upload the Complete Chapter PDF",
        "description_md": "Upload the full academic chapter PDF instead of splitting it into sections…",
        "tip": "Always upload the entire chapter to preserve context and improve output quality." },
      { "number": 2, "title": "Generate Structured Notes", "…": "…" },
      "…"
    ]
  }
}
```

`what_to_expect` carries pros/cons; `full_md` is the entire narrative guide as one Markdown
string (the token-heavy field — only request it when you'll actually read it). `tools_used` can
be `[]` even on structured pages.

```python
guide = await one("get_use_case", {"slug": "turn-academic-pdfs-into-revision-notes", "fields": ["step_guide", "faq"]})
```

```ts
const guide = await one('get_use_case', { slug: 'turn-academic-pdfs-into-revision-notes', fields: ['step_guide', 'faq'] })
```

---

### 3.15 `get_evidence`

Query the evidence graph: observation cells (tool × scenario × criterion) with verdict, note,
evidence state, and the real artifacts (screenshots) that prove it. This is the ground truth
behind every ranking.

| Param | Type | Required | Description |
|---|---|---|---|
| `tool` | string | no | Tool slug or name, e.g. `"llamaparse"`. Names are matched case-insensitively against the tool table. |
| `tools` | string[] | no | Several tools at once. |
| `scenario` | string | no | Scenario slug, cross-run `group_tag` (e.g. `"scanned-research-paper"`), or name fragment. |
| `criterion` | string | no | Criterion slug or name fragment, e.g. `"table"`. |
| `verdict` | `"worked"` \| `"mixed"` \| `"struggled"` \| `"failed"` | no | Verdict filter. |
| `evidence` | `"verified"` \| `"observed"` \| `"scored-only"` | no | Evidence-state filter. |
| `limit` | integer 1–200 | no | Max cells (default 50). |

**Response shape:** `{ count, cells[] }` where each cell is:

```json
{
  "id": "f90ab0f6-95ec-4cf6-9cd7-417c8f246f3b",
  "tool": { "name": "Llamaparse", "slug": "llamaparse", "url": "https://aidemos.com/tools/llamaparse" },
  "scenario": {
    "name": "Scanned Research Paper",
    "slug": "ocr-applied-scanned-research-paper",
    "group_tag": "scanned-research-paper",
    "description": "A scanned research-paper archetype built from an OCR-bearing original and rebuilt as an image-only PDF… Designed to stress OCR, multi-column reading order, figure/chart handling, table reconstruction…"
  },
  "criterion": { "name": "Advanced Features (Bonus)", "slug": "advanced-features-bonus" },
  "verdict": "worked",
  "score": null,
  "score_total": null,
  "note": "Extracts chart data into a structured table while preserving the legend-to-value mapping.",
  "evidence_state": "verified",
  "source": "first-party",
  "tested_at": "2026-06-20T12:50:49.514000+00:00",
  "artifacts": [
    { "url": "https://d3epheqghktydj.cloudfront.net/llamaparse-mountain-beetle-tree-mortality-chart-source.png",
      "role": null, "alt": "Bar chart of tree mortality by year and treatment…", "caption": null }
  ]
}
```

**Nullable fields, verified live:** `score`/`score_total` are `null` on all current cells (the
substrate carries verdict + note + artifacts today; scores live at the ranking/feature level).
`scenario`, `criterion`, and `evidence_state` can each be `null` on some cells — guard before
dereferencing. `evidence_state: "verified"` cells carry ≥1 artifact.

> ⚠️ **An unfiltered `get_evidence` does not enumerate the whole graph.** The server fetches the
> 200 most recent raw cells and then drops cells whose tool has no published page — an unfiltered
> call today returns 81 cells covering only 4 recently-tested tools (`easymotion`, `kodisc`,
> `futuresmart-ai-animator`, `claude-review`), while `get_evidence {"tool": "llamaparse"}` alone
> returns 141 cells. **Always filter by `tool`/`tools`/`scenario` when you want coverage of a
> specific tool**; use the unfiltered call only to sample what's been tested lately.

Real filtered example — where did LlamaParse fail?

```json
// get_evidence {"tool": "llamaparse", "verdict": "failed", "limit": 5}  → count: 5, cells e.g.:
// - "Sumitomo Heavy Industries Consolidated Financial Report" × "Table Preservation" (verified, 1 artifact):
//   "Extracts the table of contents as sequential text: entries and page numbers survive, but the
//    TOC relationships are not reconstructed as a structured table."
// - "Scanned Research Paper" × "Table Preservation" (verified, 2 artifacts):
//   "Fails to faithfully reconstruct grouped headers in a scanned table, making parent-child
//    column relationships ambiguous."
```

```bash
curl -s $MCP -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
  "params":{"name":"get_evidence","arguments":{"tool":"llamaparse","verdict":"failed","limit":5}}}' \
  | jq -r '.result.content[0].text' | jq '.cells[] | {scenario: .scenario.name, criterion: .criterion.name, note}'
```

```python
fails = await one("get_evidence", {"tool": "llamaparse", "verdict": "failed", "limit": 5})
```

```ts
const fails = await one('get_evidence', { tool: 'llamaparse', verdict: 'failed', limit: 5 })
```

---

### 3.16 `compare_tools`

Evidence-aligned comparison of two tools with honesty enforced structurally: what was tested on
the **same input** is separated from what merely looks comparable.

| Param | Type | Required | Description |
|---|---|---|---|
| `tool_a` | string | yes | First tool slug, e.g. `"llamaparse"`. |
| `tool_b` | string | yes | Second tool slug, e.g. `"landing-ai"`. |
| `criterion` | string | no | Restrict to one criterion (slug or name fragment). |

**Response shape** (note: the live keys are `unique_to_a` / `unique_to_b`):

```json
{
  "head_to_head":           [ { "same_input": true,  "<slug_a>": {cell}, "<slug_b>": {cell} }, "…" ],
  "related_not_same_input": [ { "same_input": false, "note": "Comparable dimension, but NOT tested on the same input.",
                                "<slug_a>": {cell}, "<slug_b>": {cell} }, "…" ],
  "unique_to_a":            [ {cell}, "…" ],
  "unique_to_b":            [ {cell}, "…" ]
}
```

Each pair entry is keyed by the **actual tool slugs** (e.g. `"llamaparse"` and `"landing-ai"`),
and each cell has the full `get_evidence` cell shape (verdict, note, artifacts, …).

How to read the four buckets:

| Bucket | Meaning | How much to trust it |
|---|---|---|
| `head_to_head` | Both tools ran the **same scenario** (same input document/prompt). | A provable same-input comparison — cite it freely. |
| `related_not_same_input` | Same criterion / scenario group, but **different runs on different inputs**. | Directional only. The `note` field flags this explicitly — never present it as a same-input result. |
| `unique_to_a` / `unique_to_b` | Cells only one tool has. | Coverage asymmetry, not superiority. 116 unique cells for Landing AI vs 8 for LlamaParse mostly means Landing AI was probed on more criteria. |

Live example (truncated):

```json
// compare_tools {"tool_a": "llamaparse", "tool_b": "landing-ai", "criterion": "table"}
// → head_to_head: 54 pairs, related_not_same_input: 0, unique_to_a: 4, unique_to_b: 30
{
  "head_to_head": [
    {
      "same_input": true,
      "llamaparse": {
        "scenario": { "name": "Scanned Research Paper", "…": "…" },
        "criterion": { "name": "Table Preservation", "…": "…" },
        "verdict": "worked",
        "note": "Preserves the multi-level column organization of a scanned table and its grouped header/data relatio…",
        "evidence_state": "verified", "artifacts": ["…"]
      },
      "landing-ai": {
        "scenario": { "name": "Scanned Research Paper", "…": "…" },
        "verdict": "worked",
        "note": "Preserves nested table structure and values in a scanned document, including multi-level layouts.",
        "…": "…"
      }
    },
    "… 53 more pairs"
  ],
  "related_not_same_input": [],
  "unique_to_a": ["… 4 cells"],
  "unique_to_b": ["… 30 cells"]
}
```

Unknown tool → `{"error": "Tool not found: no-such-tool (published tools only)"}` (verified live).

```python
cmp = await one("compare_tools", {"tool_a": "llamaparse", "tool_b": "landing-ai", "criterion": "table"})
```

```ts
const cmp = await one('compare_tools', { tool_a: 'llamaparse', tool_b: 'landing-ai', criterion: 'table' })
```

---

## 4. Practical Workflows

Multi-call walkthroughs, all executed live. Python snippets assume the `one(...)` helper from §3.

### 4.1 Find the best AI resume parser for a use case

**Goal:** user says *"I need to parse resumes into structured JSON via an API."*

```python
# 1) Semantic search — meaning, not keywords. Hybrid (default) also surfaces the ranking page.
hits = await one("search", {"query": "parse resumes into structured JSON", "limit": 8})
# → ranking "resume-parsing-api" (meta.winner: "Affinda") + tools llamaparse/extracta-labs/affinda/…

# 2) The ranking is the right entry point — it compared the candidates on the same inputs.
ranking = await one("get_ranking", {"slug": "resume-parsing-api", "fields": ["winner", "tools", "methodology"]})
# → winner Affinda (Best), then Airparser, LlamaParse (Best), Extracta Labs (Usable), Hrflow (Needs work)
# → methodology: 3 fixed inputs (clean single-column / multi-column sidebar / messy real-world resume)

# 3) Project only what you need from the winner's page.
pricing = await one("get_tool", {"slug": ranking["winner"]["slug"], "fields": ["pricing", "fit"]})
# → free 14-day plan (tested), $800/yr Tier 1 w/ API access; fit.use_if: "building an ATS… structured JSON output"
```

Three calls, ~6 KB total, and the recommendation carries a methodology and a pricing footnote.

### 4.2 Compare two tools with evidence

**Goal:** *"EasyMotion vs Kodisc for text-to-diagram animation — which, and prove it."*

```python
cmp = await one("compare_tools", {"tool_a": "easymotion", "tool_b": "kodisc"})
# live: head_to_head: 8 · related_not_same_input: 9 · unique_to_a: 10 · unique_to_b: 9
```

Interpretation discipline:

- **`head_to_head` (8 pairs)** — e.g. both ran the "RAG Ingestion Pipeline" scenario; EasyMotion
  `worked` ("Generates all six requested RAG-ingestion stages… with the final split to two storage
  targets explicitly drawn", verified + artifact) and Kodisc `worked` on the same input. These
  pairs are your citable comparison.
- **`related_not_same_input` (9 pairs)** — e.g. EasyMotion on "AI Workflow with Human-in-the-Loop"
  vs Kodisc on "Multi-stage AI content moderation pipeline": same `group_tag`
  (`conditional-flow`), different inputs. Each pair carries
  `"note": "Comparable dimension, but NOT tested on the same input."` — quote the caveat if you
  use these at all.
- **`unique_to_a`/`unique_to_b`** — coverage gaps. Don't score them against each other.

Narrow to one dimension with `criterion` (`{"criterion": "branching"}` → 2 head-to-head pairs on
"Branching & Logic").

### 4.3 Retrieve the evidence behind a ranking

**Goal:** the diagram-animations ranking says EasyMotion wins — show me the receipts.

```python
r = await one("get_ranking", {
    "slug": "best-ai-tools-to-generate-diagram-animations-from-text-descriptions",
    "fields": ["winner", "tools", "breakdown"]})
# winner: EasyMotion (Best). Ranked: EasyMotion, Claude AI, Vismo Studio, Academa AI, Kodisc.

for t in r["tools"]:
    slug = t["slug"]                       # ⚠️ null for Claude AI / Vismo / Academa (unlinked entries)
    if not slug:
        continue                           # or search by t["name"] — evidence may exist under its own slug
    ev = await one("get_evidence", {"tool": slug, "evidence": "verified", "limit": 20})
    # ev["cells"][n]["artifacts"][0]["url"] → CloudFront PNG of the actual generated diagram
```

Live sample cell (EasyMotion × "RAG Ingestion Pipeline" × "Text-to-animation accuracy"):
verdict `worked`, `evidence_state: "verified"`, artifact
`https://d3epheqghktydj.cloudfront.net/best-ai-tools-to-generate-diagram-animations-from--rag-ingestion-pipeline-thumbnail.png`.

### 4.4 "Why did Tool A rank above Tool B?"

**Goal:** EasyMotion is #1, Kodisc #5 in the same ranking — why?

```python
r = await one("get_ranking", {
    "slug": "best-ai-tools-to-generate-diagram-animations-from-text-descriptions",
    "fields": ["tools", "breakdown"]})
```

Two complementary layers of the answer, both live data:

1. **Scores** — `tools[]`: EasyMotion 5/5 Ease of use, 4.5/5 Visual clarity, 5/5 Generation time;
   Kodisc 4/5, 3.5/5, 4/5. Taglines say it directly: EasyMotion = "Best practical MP4 generator
   for linear and branched flows"; Kodisc = "Browser-based Manim without setup, but weakest
   readability".
2. **Breakdown bullets** — `breakdown[0]` (EasyMotion) `worked`: "…included all core stages, used
   a readable horizontal layout… correctly branched the Embeddings step into separate Vector
   Database and Metadata Store outputs… produced a direct MP4"; `struggled`: "EasyMotion's
   limitation was conditional logic…".

Then drop to raw cells for the specific weakness:

```python
ev = await one("get_evidence", {"tool": "easymotion", "verdict": "struggled"})
# → 1 cell: "Multi-stage AI content moderation pipeline" × "Branching & Logic":
#   "…decision-diamond nodes and loop-back arrows are not fully supported"
```

The agent's answer is now three-deep: score → editorial reason → raw observation.

### 4.5 Fetch observations where a tool struggled or failed

**Goal:** *"Show me where FutureSmart AI Diagram Animator breaks."*

```python
ev = await one("get_evidence", {"tool": "futuresmart-ai-animator", "verdict": "failed", "limit": 10})
```

Live result (truncated):

```text
- "Multi-stage AI content moderation pipeline" × "Branching & Logic" (verified):
  "Fails to render the requested parallel branch; the output stays as a single linear chain…"
- "RAG Ingestion Pipeline" × "Branching & Logic" (verified):
  "The tool collapses a workflow that needs a storage split into a single linear chain…"
```

Run the same query per verdict (`failed`, `struggled`, `mixed`) to build a complete weakness
profile. Note the asymmetry matters: `failed` = the capability isn't there; `struggled`/`mixed` =
partial. Live counts for LlamaParse: 88 worked / 24 failed / 21 struggled / 8 mixed out of 141
cells — a tool with many `failed` cells can still rank `Best` overall if the failures are on
bonus criteria.

### 4.6 Retrieve screenshots / artifacts

**Goal:** get the actual test screenshots for a claim.

```python
ev = await one("get_evidence", {
    "tool": "llamaparse", "scenario": "scanned-research-paper",   # group_tag works here
    "criterion": "table", "evidence": "verified"})
for c in ev["cells"]:
    for a in c["artifacts"]:
        print(a["url"], "—", a["alt"])
# → https://d3epheqghktydj.cloudfront.net/llamaparse-….png — "Bar chart of tree mortality by year and treatment…"
```

Rules of thumb: filter `evidence: "verified"` when you need images (verified ⇒ ≥1 artifact);
`observed` cells have a researcher note but no artifact; artifact `role`/`caption` are frequently
`null` — `alt` is the reliable description. URLs are public CloudFront PNGs, directly embeddable.

### 4.7 Navigate persona → rankings → tools

**Goal:** *"What should a founder look at?"*

```python
p = await one("get_persona", {"slug": "founders"})       # slug from list_personas
# → 8 rankings (each with winner), 46 tools, 3 use cases

# pick a ranking ref and go deep:
r = await one("get_ranking", {"slug": "cold-email-tools", "fields": ["winner", "tools"]})
# → 1 Apollo (Best) · 2 Clay · 3 Hunter · 4 Snov.io · 5 Saleshandy (all Usable)
```

One `get_persona` call replaces five filtered list calls and mirrors the public
`/for/founders` landing page.

### 4.8 Pricing / API-support check across candidates

**Goal:** shortlist cold-email tools that have API access, with prices — minimum tokens.

```python
r = await one("get_ranking", {"slug": "cold-email-tools", "fields": ["tools"]})
for t in r["tools"]:
    d = await one("get_tool", {"slug": t["slug"], "fields": ["pricing", "fit"]})
    # d["pricing"]["plans"] → prices; d["fit"]["use_if"] → API mentions
```

Live signal from two of the five: Apollo `fit.use_if` includes *"You need Search API and
Enrichment API access for internal prospecting or enrichment automations"*, and its pricing shows
Free $0 (75 credits) → Basic $49/seat/mo → Professional $79 → Organization $119. Snov.io's ranking
tagline is *"Balanced prospecting stack with strong API coverage"*. Where `fit`/`pricing` don't
mention APIs, check `fields: ["features"]` — Apollo has a feature literally named
*"API search and enrichment access"* with verdict *"Automation-friendly."*

Each projected call is ~1–2 KB versus ~8 KB+ for a full envelope — a 5-tool loop costs less than
two unprojected `get_tool` calls.

---

## 5. AI Agent Examples

How an LLM agent with the 16 verbs registered chains them. Each example gives the reasoning
chain, the calls, and how the final answer is grounded in returned fields (every quoted value
below came back from the live server).

### 5.1 "Recommend the best outreach tool with API support and explain why"

**Reasoning chain:**

1. The constraint is *task* ("outreach") + *requirement* ("API support") → start with `search`
   (hybrid) rather than guessing slugs; filter to the requesting user's persona if known.
2. A ranking beats individual tool hits — same-input testing, explicit winner.
3. Verify the API requirement on the winner from `fit`/`features`, not from the ranking tagline.

**Calls:**

```python
hits = await one("search", {"query": "personalized cold email outreach", "persona": "founders", "limit": 5})
# → snov-io, saleshandy, … (tools) — and the agent spots they share the ranking cold-email-tools
r    = await one("get_ranking", {"slug": "cold-email-tools", "fields": ["winner", "tools"]})
# → winner: Apollo (badge Best, tagline "Best all-around outreach prospecting platform")
tool = await one("get_tool", {"slug": "apollo", "fields": ["fit", "pricing", "features"]})
```

**Grounded answer the agent can now write:**

> Recommend **Apollo** — it ranks #1 of 5 in AI Demos' cold-email ranking (badge: Best,
> "Best all-around outreach prospecting platform"). API support is confirmed on its tool page:
> *"You need Search API and Enrichment API access for internal prospecting or enrichment
> automations"* (`fit.use_if`) and the tested feature *"API search and enrichment access —
> Automation-friendly"*. Pricing starts at $0 (75 credits), Basic $49/seat/mo billed annually.
> Caveat from the same testing: *"Prospect relevance still needs manual validation — plausible
> matches are not the same as confirmed prospects."* Source: https://aidemos.com/best/cold-email-tools

Note the caveat: the structured data includes the negative findings, so the agent doesn't have to
choose between recommending and being honest.

### 5.2 "Compare OCR tools for invoice/document extraction"

**Reasoning chain:** semantic search (the need is a meaning, not a product name) → the ranking it
surfaces → `compare_tools` on the two finalists, restricted to the criterion that matters.

```python
hits = await one("search", {"query": "extract tables and text from scanned invoices", "mode": "semantic", "limit": 5})
# → upstage-ai (0.428), landing-ai (0.421), ranking "pdf-to-markdown-apis" (0.417), mistral-ai, extend-ai
r    = await one("get_ranking", {"slug": "pdf-to-markdown-apis", "fields": ["winner", "tools"]})
# → 1 Extend AI (Best) · 2 LlamaParse (Best, "Best for chart-to-table conversion")
#   · 3 Landing AI (Best) · 4 Mistral AI · … · 8 Nutrient.io (Needs work)
cmp  = await one("compare_tools", {"tool_a": "llamaparse", "tool_b": "landing-ai", "criterion": "table"})
# → 54 same-input head_to_head pairs, 0 related_not_same_input
```

**Grounding:** with 54 same-input pairs the agent compares verdict distributions per scenario. On
"Scanned Research Paper" × "Table Preservation" both `worked` — but LlamaParse also has a `failed`
cell on that same scenario/criterion: *"Fails to faithfully reconstruct grouped headers in a
scanned table, making parent-child column relationships ambiguous"* (verified, 2 artifacts), while
Landing AI's worst same-vein result is `mixed` (*"Text placed between columns is only partially
reflected…"*). The agent reports: both preserve nested tables; LlamaParse is stronger at
chart-to-table conversion (its ranking tagline + a verified cell: *"Extracts chart data into a
structured table while preserving the legend-to-value mapping"*); Landing AI degrades more
gracefully on grouped headers. Every clause maps to a cell id and artifact URL.

### 5.3 "Show all observations where Tool X failed"

**Reasoning chain:** this is a direct evidence-graph query — no search, no detail page needed.

```python
fails = await one("get_evidence", {"tool": "llamaparse", "verdict": "failed", "limit": 200})
```

**Grounding:** present each cell as *scenario × criterion → note*, tagging `evidence_state`.
Example output row: *Sumitomo Heavy Industries Consolidated Financial Report × Table Preservation
(verified, screenshot attached): "Extracts the table of contents as sequential text: entries and
page numbers survive, but the TOC relationships are not reconstructed as a structured table."*
If the user's tool has no cells, the response is `{count: 0, cells: []}` — the agent should say
"not yet tested in the evidence graph", not "no failures" (absence of evidence ≠ evidence of
absence; check `get_tool` → `features`/ranking `breakdown` for scored findings instead).

### 5.4 "Retrieve relevant context via semantic search before answering"

**Reasoning chain:** a RAG-style pre-step. Before answering any tool-selection question, one
semantic `search` (cheap, ~2 KB) tells the agent whether AI Demos has tested this space at all —
and which pages to pull.

```python
ctx = await one("search", {"query": user_question, "mode": "semantic", "limit": 8})
kinds = {x["kind"] for x in ctx["results"]}
# ranking present  → get_ranking(slug) is the backbone of the answer
# only tools       → get_tool per candidate, note there is no same-input ranking
# nothing relevant → say so; do NOT force a recommendation from thin results
```

Scores discriminate: in the live semantic call of §5.2 the relevant hits scored 0.41–0.43. A top
hit far below that (≈0.2–0.3) usually means the catalogue doesn't cover the topic — the honest
move is to say so rather than stretch a weak match.

### 5.5 Worked end-to-end transcript

**User:** *"I need to auto-generate an animated architecture diagram from a text description.
What tool should I use? It must handle branching logic."*

```text
AGENT: search {"query": "generate animated diagrams from text descriptions", "mode": "semantic", "limit": 5}
  ← ranking "best-ai-tools-to-generate-diagram-animations-from-text-descriptions" among top hits

AGENT: get_ranking {"slug": "best-ai-tools-to-generate-diagram-animations-from-text-descriptions",
                    "fields": ["winner", "tools", "breakdown"]}
  ← winner: EasyMotion (Best) — "Best practical MP4 generator for linear and branched flows"
  ← scores: Ease of use 5/5, Visual clarity 4.5/5, Generation time 5/5
  ← breakdown[EasyMotion].worked: "…correctly branched the Embeddings step into separate Vector
     Database and Metadata Store outputs…"
  ← breakdown[EasyMotion].struggled: "EasyMotion's limitation was conditional logic…"
     — user needs branching → probe this before recommending.

AGENT: get_evidence {"tool": "easymotion", "verdict": "struggled"}
  ← 1 cell: "Multi-stage AI content moderation pipeline" × "Branching & Logic" (observed):
    "…decision-diamond nodes and loop-back arrows are not fully supported"

AGENT: get_evidence {"tool": "easymotion", "scenario": "branching-pipeline", "evidence": "verified"}
  ← verified cell, RAG Ingestion Pipeline: "Generates all six requested RAG-ingestion stages in one
    animated flowchart… with the final split to two storage targets explicitly drawn."
    + artifact: https://d3epheqghktydj.cloudfront.net/best-ai-tools-to-generate-diagram-animations-
    from--rag-ingestion-pipeline-thumbnail.png

AGENT (final answer):
  "Use EasyMotion — it won AI Demos' 5-tool diagram-animation test (tested 2026-04) with 5/5 ease
   of use and direct MP4 export, and there is a verified, screenshot-backed observation of it
   correctly rendering a branching pipeline (one step splitting into two storage targets) [artifact
   link]. One caveat that matters for you: on a complex *conditional* workflow the researchers
   noted decision-diamond nodes and loop-back arrows are not fully supported. If your diagrams
   need multi-level if/else logic rather than structural branching, the same ranking placed
   Claude AI #2 specifically as 'Best for decision nodes, retries, and complex workflow logic'."
```

Every load-bearing claim in the final answer maps to a specific returned field: rank, score,
verdict, note, or artifact URL. That's the standard to hold agents to.

---

## 6. Best Practices

### Choosing a search mode

- **`hybrid` (default)** — safe general choice; keyword anchoring plus semantic recall. Use unless
  you have a reason not to.
- **`semantic`** — "what covers this need?" questions: paraphrased needs, no product names
  (*"remove background from product photos"*). Finds pages by content, not title. Slightly slower
  (embeds the query).
- **`keyword`** — exact names and slugs (*"llamaparse"*). Precise, zero fuzz — the live keyword
  call returned exactly 1 result where hybrid returned 5.

### Structured lookups beat search

If you already have a slug, call `get_*` directly — `search` is for *finding* slugs, not fetching
known entities. Same for traversal: `rankings_for_tool` / `tools_in_ranking` / `get_persona` are
single deterministic calls; don't reconstruct those joins from lists.

### Combine layers in order

The canonical pattern is **discovery → detail → evidence**: cheap refs to pick a target, one
projected envelope for the facts, evidence cells only for the claims you'll actually cite. Going
straight to `get_evidence` for "which tool is best" wastes tokens (cells are raw observations, not
conclusions); stopping at the ranking means you can't show receipts.

### Token efficiency

- **Use `fields` on all three detail verbs.** `get_tool` full envelope ≈ 8 KB; `fields: ["pricing"]`
  ≈ 1 KB. In loops over candidates this is the difference between 5 KB and 40 KB.
- **Set `limit`** on lists and search; `list_tools` unbounded returns all 128.
- Request Markdown fields (`full_md`, `in_depth_review_md`, `final_take`) only when you will read
  them — they're the heaviest fields in the envelopes.
- `compare_tools` has no limit parameter and can be very large (measured live: llamaparse ×
  landing-ai ≈ **513 KB** unscoped; still ≈ 187 KB with `criterion: "table"`; easymotion × kodisc
  ≈ 58 KB). Always pass `criterion` for heavily-tested pairs, and prefer `get_evidence` with
  tight filters + `limit` when you only need one side's cells.

### Common mistakes (all observed or verified live)

| Mistake | Reality |
|---|---|
| Scraping aidemos.com HTML | Everything on the page is available structured; scraping costs ~10× the tokens and breaks on redesigns. |
| Treating `related_not_same_input` as a same-input comparison | It is explicitly flagged `same_input: false` with a note saying so. Only `head_to_head` is provable. |
| Ignoring `evidence_state` | `verified` has artifacts; `observed` is a note without proof image; the field can also be `null`. Cite accordingly. |
| Assuming category ids are stable ints | They're UUIDs, or `null` on `derived` categories. Use slugs. |
| Assuming entity ids are one type | `774` (int) in one verb, `"774"` (string) in another, `null` on unlinked ranking entries. Compare loosely, join on slugs. |
| Assuming every ranked tool has a `slug` | Unlinked entries have `slug: null` (e.g. "Claude AI" in the diagram ranking). Fall back to `search` by name. |
| Retrying on `[]` | Empty is a valid, final answer — `[]`, `{count: 0, cells: []}`, and `null` (unknown slug) are not transient errors. |
| Treating unfiltered `get_evidence` as the whole graph | It returns only the ~200 most recent raw cells (81 after the published-tool filter, today). Filter by `tool` for real coverage. |
| Expecting `score` on evidence cells | All current live cells have `score: null` — verdict + note is the signal. Numeric scores live on ranking `tools[].scores` and tool `features[]` (where populated; Apollo's features are score-less by design). |
| Building on `rating` / `testing_history` | `null` / `[]` on current live tools — reserved, not populated. |

### Performance

- The endpoint is **stateless** — you can fire a bare `tools/call` without `initialize` (curl
  workflows rely on this). SDK clients do the handshake automatically.
- The first call after idle can be slower: a small server-side cache (≈60 s TTL) warms on use.
  Interactive latency after warm-up is sub-second for discovery verbs; `search` in semantic/hybrid
  mode adds an embedding round-trip.
- Responses are JSON **strings inside MCP content blocks** (`result.content[0].text`) — budget for
  the parse and don't assume the block is already an object.

### Freshness

- **Published-only.** Drafts and unpublished work never appear; the search index is rebuilt on
  reindex, so a just-published page may lag briefly.
- Every list/detail entity carries `updated_at`; evidence cells carry `tested_at`; rankings carry
  a human-readable `tested_as_of` (free-form string — `"2026-04"`, `"June 2026"`, or `null`);
  pricing blocks carry their own "checked <month>" footnotes. Surface these dates — a 2026-04 test
  of a fast-moving tool is a fact *about April 2026*.

---

*Verified against the live production endpoint on 2026-07-02 (74 live calls; server v2.0.0,
reading the `sip` canonical store). For endpoint/client setup and change history, see
[`AIDEMOS_MCP.md`](./AIDEMOS_MCP.md).*