Developer Tools & APIs

Firecrawl

Reliable on JavaScript-heavy and bot-protected pages, but its Markdown output usually needs a cleanup step.

Tested on 3 live URLs · Strong JS rendering · Cloudflare bypass · Noisy Markdown

Strong access, weak extraction cleanup

Firecrawl performed well at the hard part of web extraction: it scraped a static recipe page, a JavaScript-hydrated Nike product page, and a Glassdoor jobs page behind anti-bot protections, all without manual selectors. The tradeoff is that its output behaved more like a flattened DOM dump than a semantically cleaned extraction, repeatedly mixing useful content with navigation, footer links, filters, and other page chrome. It looks best suited to pipelines that already include a downstream LLM or parser to clean the Markdown.
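As a rough illustration of the kind of downstream cleanup this implies, the sketch below drops lines that consist only of Markdown links, a common signature of menus and footers. It is a minimal heuristic written for this review, not part of Firecrawl; the function name `strip_link_only_lines` and the sample text are made up for illustration.

```python
import re

# A line made up only of Markdown links (optionally bulleted), for example
# "- [Home](/) [Recipes](/recipes)". Such lines usually come from menus.
LINK_ONLY = re.compile(r"^\s*(?:[-*+]\s+)?(?:!?\[[^\]]*\]\([^)]*\)[\s|,·]*)+$")


def strip_link_only_lines(markdown: str) -> str:
    """Drop lines that contain nothing but links; keep everything else."""
    kept = [line for line in markdown.splitlines() if not LINK_ONLY.match(line)]
    return "\n".join(kept)


# Hypothetical sample, not actual Firecrawl output.
sample = "\n".join([
    "- [Home](/) [Recipes](/recipes)",
    "# Chewy Chocolate Chip Cookies",
    "Mix the [butter](/butter) and sugar.",
    "[Privacy](/privacy) | [Terms](/terms)",
])
cleaned = strip_link_only_lines(sample)
print(cleaned)
```

Lines that mix prose with links are kept, so inline references inside the article body survive. A real pipeline would likely need additional rules for filters, review blocks, and login fields.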

Screen recording of the Firecrawl playground used during the hands-on evaluation.

In-Depth Review

Our detailed analysis of Firecrawl — features, performance, and real-world testing.

AI Demos Team

Expert Reviewer

Verified Review

Feature-by-Feature Breakdown

Single-URL page scraping to Markdown

It preserved article text and markdown structure, but did not meaningfully filter site boilerplate.

Test Summary

Feature tested: Single-URL page scraping to Markdown

Result: Passed — It preserved article text and markdown structure, but did not meaningfully filter site boilerplate.

Firecrawl can take a public URL and return a Markdown version of the page without manual CSS selection. This was tested on Sally’s Baking Addiction’s chewy chocolate chip cookies page, where it preserved the article’s textual structure, including headings, lists, and linked sections, but also pulled large amounts of navigation, sidebar, review, and footer content into the same output.

INPUT

Sally’s Baking Addiction chewy chocolate chip cookies page, tested as a noisy static article to see whether Firecrawl could isolate the main content in clean Markdown.

OUTPUT

Firecrawl returned a successful Markdown scrape of the Sally’s Baking Addiction recipe page. The output preserved the page title, heading structure, links, ingredients, and recipe flow, but it also included major site-wide navigation items, category links, and other boilerplate instead of isolating only the main article body.

Bottom Line

Good raw Markdown conversion from a public URL, weak semantic cleanup.

JavaScript-rendered page extraction

Hydration worked, but cleanup did not.

Test Summary

Feature tested: JavaScript-rendered page extraction

Result: Partial — Hydration worked, but cleanup did not.

Renders client-side JavaScript before extraction. On a Nike single-page product experience, Firecrawl waited for the page to hydrate and successfully captured the product title, price, and the full dynamically loaded size range from M 5 / W 6.5 through M 18 / W 19.5. The output still included raw code artifacts, media attachment matrices, localization links, and image URL trees.

INPUT

Dynamic Nike product page used to test asynchronous client-side rendering and extraction after JavaScript hydration.

OUTPUT

Firecrawl processed the dynamic link through its normal flow and handled browser rendering server-side. It extracted critical product state, including title, pricing, and the complete size menu, showing that JavaScript execution completed successfully. At the same time, the Markdown contained raw backend/code artifacts such as %ESI_AUDIENCE_SEGMENTATION%, plus global localization links, background asset tags, raw media attachment matrices, and image URL trees.
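Artifacts like the ones described here can often be removed with simple pattern matching. The sketch below is one possible approach, assuming the placeholders follow the `%UPPER_SNAKE_CASE%` shape of the `%ESI_AUDIENCE_SEGMENTATION%` token seen in this test and that images appear in standard Markdown image syntax. The function name and sample input are hypothetical.

```python
import re

# Unresolved template tokens such as %ESI_AUDIENCE_SEGMENTATION%.
# At least one underscore is required so that percent-encoded URL bytes
# (for example "%E2%80%99") are not mistaken for placeholders.
PLACEHOLDER = re.compile(r"%[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+%")
# Standard Markdown image syntax: ![alt](url)
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def drop_render_artifacts(markdown: str) -> str:
    """Remove placeholder tokens and images, dropping lines left empty."""
    out = []
    for line in markdown.splitlines():
        cleaned_line = IMAGE.sub("", PLACEHOLDER.sub("", line))
        # Keep lines that still have content, plus lines that were
        # already blank in the input (paragraph separators).
        if cleaned_line.strip() or not line.strip():
            out.append(cleaned_line.rstrip())
    return "\n".join(out)


# Hypothetical sample, not actual Firecrawl output.
sample = "\n".join([
    "# Nike Air Force 1 '07",
    "%ESI_AUDIENCE_SEGMENTATION%",
    "![hero](https://example.com/hero.png)",
    "## Size & Fit",
])
result = drop_render_artifacts(sample)
print(result)
```

Localization links and media attachment matrices would need their own rules; this only covers the two patterns that are easy to identify by syntax alone.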

Bottom Line

Reliable for pulling data that only appears after hydration, but the returned Markdown is still noisy.

JavaScript-rendered page extraction

It successfully waited for client-side rendering and captured dynamic product details, but the final Markdown remained cluttered.

Test Summary

Feature tested: JavaScript-rendered page extraction

Result: Passed — It successfully waited for client-side rendering and captured dynamic product details, but the final Markdown remained cluttered.

Firecrawl can scrape pages that rely on client-side JavaScript hydration. This was tested on a Nike Air Force 1 product page, where it captured dynamic product details and the full size run after rendering, showing that the tool waited for the page’s JavaScript state to load before extracting content.

INPUT

Nike Air Force 1 ’07 men’s shoes page, tested as a JavaScript-heavy product page to check whether Firecrawl could wait for hydration and capture dynamic content.

OUTPUT

Firecrawl returned a successful Markdown scrape of the Nike product page with the rendered product title and key sections such as Size & Fit and Shipping & Returns. The researcher also observed that the full menu of dynamically loaded size variations was captured, confirming JavaScript execution. However, the output still contained extra localization links, asset references, and raw media-related clutter instead of a tightly cleaned product extract.
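A check like the researcher's can be automated. The sketch below looks for size labels in scraped Markdown and confirms that both ends of the range reported in this review (M 5 / W 6.5 through M 18 / W 19.5) are present. The exact label format in the output is an assumption, as are the function names and the sample text.

```python
import re

# Assumed label shape: "M 5 / W 6.5" (men's size, slash, women's size).
SIZE = re.compile(r"\bM\s*(\d+(?:\.5)?)\s*/\s*W\s*(\d+(?:\.5)?)")


def size_labels(markdown: str) -> list[tuple[float, float]]:
    """Return every (men's, women's) size pair found in the text."""
    return [(float(m), float(w)) for m, w in SIZE.findall(markdown)]


def covers_range(markdown: str, first: tuple[float, float],
                 last: tuple[float, float]) -> bool:
    """True if both endpoints of the expected size run were captured."""
    found = size_labels(markdown)
    return first in found and last in found


# Hypothetical excerpt, not actual Firecrawl output.
sample = "M 5 / W 6.5\nM 5.5 / W 7\nM 18 / W 19.5"
hydrated = covers_range(sample, (5.0, 6.5), (18.0, 19.5))
print(hydrated)
```

If the endpoints are missing, the page was probably scraped before the size selector finished loading, which is a useful signal to retry.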

Bottom Line

Strong JS rendering support, but not strong content cleanup.

Anti-bot protected page access

It got through the wall, but not cleanly.

Test Summary

Feature tested: Anti-bot protected page access

Result: Partial — It got through the wall, but not cleanly.

Accesses protected public pages with built-in proxy rotation and user-agent handling. On a Glassdoor job listing, Firecrawl bypassed the Cloudflare edge layer and returned live job content including title, employer, salary estimates, and technical skill requirements. The extracted text was still broken up by UI elements such as search controls, action buttons, internal links, and login fields.

INPUT

Public Glassdoor job page used to test anti-bot resistance and extraction quality behind an interstitial/protected environment.

OUTPUT

Firecrawl got past the Cloudflare protection layer and returned text from the protected page without manual intervention. It pulled active software engineering job listings, company names, salary estimates, and required skills. However, the output mixed this core content with UI text such as apply and search buttons, search filter blocks, internal page links, and login-related fields.

Bottom Line

Strong access layer for protected public pages, but the extracted Markdown still needs post-processing to become usable.

Anti-bot page access

It got through a protected jobs page and pulled useful job data, but the extracted text was still mixed with interface noise.

Test Summary

Feature tested: Anti-bot page access

Result: Passed — It got through a protected jobs page and pulled useful job data, but the extracted text was still mixed with interface noise.

Firecrawl can access and extract content from pages protected by anti-bot layers. This was tested on a Glassdoor software engineer jobs page, where it successfully returned live jobs-page content despite Cloudflare-style protections and heavy page chrome.

INPUT

Glassdoor software engineer jobs listing page, tested to see whether Firecrawl could get past anti-bot protections and extract usable content from a noisy jobs interface.

OUTPUT

Firecrawl returned a successful Markdown scrape of the Glassdoor jobs page, including jobs-page text and navigation links. The researcher reported that it bypassed the page’s protection layer and pulled active job listings, company names, salary information, and technical skill details, but the resulting text was broken up by navigation controls, filters, internal links, and login-related layout elements.

Bottom Line

Very good at getting data out of protected pages, but the output is not clean enough to use as-is.

Zero-selector Markdown extraction

Accurate text capture, but poor semantic filtering.

Test Summary

Feature tested: Zero-selector Markdown extraction

Result: Partial — Accurate text capture, but poor semantic filtering.

Converts a public URL into Markdown without manual DOM selection. On a noisy recipe blog, Firecrawl preserved the article structure, ingredients, and step-by-step baking instructions with strong textual fidelity, but it also scraped the full primary navigation tree, sidebar/history components, thousands of user review nodes, and the footer into the same Markdown output.

INPUT

Static but boilerplate-heavy recipe blog page used to test noise reduction and clean Markdown extraction.

OUTPUT

The URL processed successfully in the interface with zero custom CSS selection. Markdown formatting was structurally correct, including headings and lists, and the core article layout, ingredients table, and baking workflow were extracted accurately. However, semantic filtering was effectively absent: the output included multi-level primary navigation links, historical sidebar modules, thousands of review entries, and footer content mixed into the article body.
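Because the heading structure came through correctly, one cheap way to recover the main content is to keep only the sections whose headings you care about. The sketch below assumes the recipe sections are titled "Ingredients" and "Instructions"; those titles, the function name, and the sample text are illustrative assumptions rather than details from the test output.

```python
import re

# ATX-style Markdown heading: one to six "#" characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def keep_sections(markdown: str, wanted: set[str]) -> str:
    """Keep only heading-delimited sections whose title is in `wanted`.

    Titles are compared case-insensitively. Heading levels are ignored,
    so a subsection with an unlisted title is dropped even when it sits
    under a wanted section.
    """
    kept, keeping = [], False
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            keeping = match.group(2).strip().lower() in wanted
        if keeping:
            kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical sample, not actual Firecrawl output.
sample = "\n".join([
    "## Menu",
    "- [Home](/)",
    "## Ingredients",
    "- flour",
    "## Instructions",
    "1. Mix the dough.",
    "## Reviews",
    "Great!",
])
recipe = keep_sections(sample, {"ingredients", "instructions"})
print(recipe)
```

This works only when the page's headings are predictable. For arbitrary sites, a downstream LLM or a content-extraction library is likely a better fit.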

Bottom Line

Good at flattening page text into Markdown; not good at separating main content from site chrome.

Pricing

Subscription plans based on monthly API credits.

Free

$0/month

1,000 credits per month.

Hobby

$16/month

5,000 credits per month.

Standard

$83/month

100,000 credits per month.

Growth

$333/month

500,000 credits per month.

Scale

$599/month

1,000,000 credits per month.

Enterprise

Custom

Unlimited credits and a dedicated SLA.

Pricing was reported directly in the research notes.
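To compare the paid tiers, it helps to normalize them to a cost per 1,000 credits. The short calculation below uses only the prices and credit counts listed above. Free and Enterprise are left out because one is $0 and the other is custom. What a single credit buys is not documented in this report, so these figures compare plans with each other rather than giving a per-page cost.

```python
# (monthly price in USD, monthly credits), as listed in this review.
PLANS = {
    "Hobby": (16, 5_000),
    "Standard": (83, 100_000),
    "Growth": (333, 500_000),
    "Scale": (599, 1_000_000),
}


def usd_per_1k_credits(price_usd: int, credits: int) -> float:
    """Monthly price divided by credits, scaled to 1,000 credits."""
    return price_usd * 1000 / credits


rates = {name: usd_per_1k_credits(*plan) for name, plan in PLANS.items()}
for name, rate in rates.items():
    print(f"{name}: ${rate:.3f} per 1,000 credits")
```

On these numbers, Hobby works out to $3.20 per 1,000 credits, Standard to $0.83, Growth to about $0.67, and Scale to about $0.60, so the largest drop in unit cost comes from moving from Hobby to Standard.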

Is This Right For You?

A side-by-side guide based on our hands-on testing.

✓ Use This If

● You need a scraper that can handle JavaScript-rendered pages without writing selectors.

● You need access to bot-protected public pages like job boards and can tolerate noisy output.

● You already have a downstream LLM, parser, or cleanup step that can remove navigation and layout clutter from Markdown.

✕ Skip This If

● You need near-ready clean Markdown with navbars, footer links, review blocks, and filters already stripped out.

● You want semantic main-content extraction rather than a broad DOM flattening approach.

● You need validated evidence for schema-driven structured JSON extraction, because that path was not tested in this report.

Firecrawl handles JavaScript-rendered pages. In the Nike product-page test, it waited for client-side rendering and captured dynamic product information, including the rendered title, size guidance, shipping details, and the full loaded size range.

Firecrawl got past anti-bot protection in this test. On a Glassdoor software engineer jobs page, it returned a successful result, and the researcher reported that it bypassed the protection layer well enough to extract live jobs-page content, company names, salary information, and skill details.

The Markdown formatting itself was strong, but the semantic cleanup was weak across all three tests. Firecrawl repeatedly included navigation menus, footer links, filters, sidebars, review blocks, localization links, and other layout noise alongside the main content.

No custom selectors were needed. The researcher ran all three tests zero-shot through Firecrawl's interface without custom CSS selectors or manual field mapping.

Structured JSON extraction was not tested. The broader benchmark includes structured-data extraction in scope, and the Firecrawl interface showed a JSON option, but this hands-on report only documented Markdown scraping results. There is no direct evidence here for schema-driven JSON extraction quality.

The researcher observed raw Markdown being available directly in Firecrawl's interface through copyable result panels. The report did not document a separate file export workflow.

The report lists a Free plan at $0/month for 1,000 credits, Hobby at $16/month for 5,000 credits, Standard at $83/month for 100,000 credits, Growth at $333/month for 500,000 credits, Scale at $599/month for 1,000,000 credits, and Enterprise with custom pricing.