Developer Tools & APIs

Nutrient.io

A developer-first PDF-to-markdown API that handles straightforward OCR and hierarchy well, but loses fidelity on complex tables, charts, and handwritten visual content.


Tested on 3 PDF types · API workflow · Good on clean OCR + hierarchy · Complex tables and charts weak

Good text extraction, weak document fidelity

Nutrient.io worked as a hosted API for turning mixed PDFs into markdown, and it did a respectable job preserving headings, section structure, and readable OCR on straightforward pages. But in this use case, the hard parts were exactly where it slipped: complex financial tables lost header relationships, charts were flattened into text or garbled OCR, handwritten signatures were omitted, and one scanned title page came back in the wrong reading order. It looks usable for developer workflows that mostly need text and basic structure, but not for high-trust markdown conversion of complex PDFs without cleanup.

Walkthrough of PDF to Markdown workflow

In-Depth Review

Our detailed analysis of Nutrient.io — features, performance, and real-world testing.

AI Demos Team

Expert Reviewer

Verified Review

Feature-by-Feature Breakdown

Programmatic PDF-to-markdown extraction

Works via API, but the tested workflow depended on code rather than the web UI.

Test Summary

Feature tested: Programmatic PDF-to-markdown extraction

Result: Passed

Verdict: Works via API, but the tested workflow depended on code rather than the web UI.

Expected behavior: Nutrient accepts multi-page PDFs and returns markdown output files programmatically. In testing, it processed an 84-page hybrid earnings report, an 18-page table-heavy financial report, and a scanned research paper. The researcher noted that the web UI timed out, so the successful path was the API-key workflow shown in Nutrient's documentation.

Test case 1: PDF document → Text/code file

Input: 84-page hybrid earnings report with native text, charts, tables, and a scanned signatures page. (llamaparse-hybrid-earnings-pdf-1.pdf)

Output: Nutrient returned a markdown file for the hybrid earnings report. The report notes that the web UI timed out, so this result was obtained through the API workflow instead. (nutrient-io-nutrient-hybrid-earningspdf-output-2.md)

Test case 2: PDF document → Text/code file

Input: 18-page table-heavy financial report. (llamaparse-sumitomo-financial-pdf-1.pdf)

Output: Nutrient returned parsed markdown for the table-heavy financial report, which the researcher could copy or download. (nutrient-io-nutrient-financialpdf-output-1.md)

Test case 3: PDF document → Text/code file

Input: Scanned research paper used to test OCR and layout handling. (llamaparse-scanned-research-pdf-1.pdf)

Output: Nutrient returned parsed markdown for the scanned research report, again through the API-based workflow. (nutrient-io-nutrient-scannedpdf-output-1.md)

Bottom Line

If you are comfortable calling an API, Nutrient can return markdown for mixed PDFs. If you need a dependable browser flow, this research did not show one: the UI timed out and the tested path was code-first.
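A code-first batch run like the one tested could be organized roughly as follows. This is a minimal sketch, not Nutrient's client: `convert_all` and the `convert` callable are hypothetical names, and `convert` stands in for whatever authenticated request Nutrient's documentation prescribes. The review does not show the endpoint or its parameters, so none are assumed here.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_all(
    pdf_paths: Iterable[Path],
    convert: Callable[[bytes], str],
    out_dir: Path,
) -> dict[str, str]:
    """Run `convert` on each PDF and write one .md file per input.

    `convert` is a placeholder for the real API call (assumption): it takes
    raw PDF bytes and returns markdown text. Failures are recorded per file
    so one bad document does not abort the whole batch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, str] = {}
    for pdf_path in pdf_paths:
        try:
            markdown = convert(pdf_path.read_bytes())
        except Exception as exc:  # for example a timeout or an HTTP error
            results[pdf_path.name] = f"failed: {exc}"
            continue
        target = out_dir / (pdf_path.stem + ".md")
        target.write_text(markdown, encoding="utf-8")
        results[pdf_path.name] = str(target)
    return results
```

Keeping the API call behind a single callable means the same loop can be exercised with a stub before any credits are spent, and a per-file failure record makes timeouts like the one seen in the web UI easy to spot.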

Reading order, hierarchy, and OCR text recovery

Good on straightforward pages, but not fully reliable on complex scanned layouts.

Test Summary

Feature tested: Reading order, hierarchy, and OCR text recovery

Result: Partial

Verdict: Good on straightforward pages, but not fully reliable on complex scanned layouts.

Expected behavior: Nutrient was strongest when the task was recovering readable text with section structure intact. It preserved heading-to-body relationships on a native-digital annual-report page, recovered dense prose and numeric details from a financial-report page, and handled a scanned two-column research section cleanly. The main weakness was page-level ordering on a more complex scanned first page, where abstract material appeared before the title block.

Test case 1: Image → Image

Input: Target 2015 Annual Report page titled 'A Growth Story Again' with heading, paragraph text, and bullets. (landing-ai-target-annual-report-growth-story-page.png)

Output: On the annual-report page titled 'A Growth Story Again,' Nutrient preserved the page heading, introductory paragraph, and bullet hierarchy in readable order, so the section stayed structurally coherent in the extracted output. (nutrient-io-target-annual-report-parsed-document-hierarchy.png)

Test case 2: Image → Image

Input: Financial-report prose page covering assets, liabilities, net assets, and cash flow. (nutrient-io-financial-summary-condition-page-9.png)

Output: On the financial-report page about assets, liabilities, net assets, and cash flow, Nutrient recovered the numbered sections and key JPY amounts as readable text blocks, showing that it can preserve dense report prose and section boundaries. (nutrient-io-financial-summary-ocr-hierarchy-page-8.png)

Test case 3: Image → Image

Input: Scanned two-column page headed 'STUDY AREA'. (landing-ai-scanned-two-column-text-study-area.png)

Output: On the scanned two-column 'STUDY AREA' page, Nutrient kept the section heading attached to its content and converted the visible column text into coherent paragraphs instead of interleaving both columns. (nutrient-io-study-area-parsed-section-hierarchy.png)

Test case 4: Image → Image

Input: Scanned research-note first page with title, authors, abstract start, and margin notes. (nutrient-io-usda-research-note-title-page.png)

Output: On the scanned research-note first page, Nutrient placed the ABSTRACT and keywords before the title and author block. That makes the text readable, but it is a real reading-order error for a page that mixes title matter, abstract, and body content. (nutrient-io-ocr-first-page-abstract-text.png)

Bottom Line

Nutrient can produce clean, usable text from both digital and scanned pages when the layout is straightforward. But the title-page ordering miss means you should still spot-check complex scanned layouts before trusting downstream ingestion.
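The title-page miss is the kind of error a small automated spot-check can catch before ingestion. The sketch below is illustrative only: `title_precedes_abstract` is a hypothetical helper, it assumes you already know the expected title string, and it relies on plain substring matching, so a title garbled by OCR would also be reported as a failure.

```python
def first_position(markdown: str, needle: str) -> int:
    """Return the index of the first case-insensitive match, or -1."""
    return markdown.lower().find(needle.lower())


def title_precedes_abstract(
    markdown: str, title: str, marker: str = "abstract"
) -> bool:
    """True when the title text appears before the abstract marker.

    Returns False if either one is missing, so pages with lost title
    matter are flagged for manual review as well.
    """
    title_at = first_position(markdown, title)
    marker_at = first_position(markdown, marker)
    if title_at == -1 or marker_at == -1:
        return False
    return title_at < marker_at
```

A False result does not prove the page is wrong; it only marks it as one to open and inspect by hand.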

Table extraction

Mixed to weak: simpler tables survive, but complex financial and scanned tables lose important structure.

Test Summary

Feature tested: Table extraction

Result: Partial

Verdict: Mixed to weak: simpler tables survive, but complex financial and scanned tables lose important structure.

Expected behavior: Nutrient can preserve the rough shape of simpler tables, including one scanned table with grouped columns, but it struggled as complexity increased. Across the hybrid earnings report, the table-heavy financial report, and the scanned research paper, the recurring failure mode was loss of row/column alignment and multi-level header relationships. The result was markdown that still contained many values, but often not in a form a human or pipeline could trust without cleanup.

Test case 1: Image → Image

Input: Scanned table showing original and post-harvest diameters across four treatments. (mistral-ai-scanned-treatment-diameter-table.png)

Output: For the scanned treatment table, Nutrient preserved the basic grouped columns and row labels well enough for the table to remain mostly readable. It still introduced OCR mistakes in the first numeric column, turning 7.8, 7.7, 7.4, and 7.5 into 78, 77, 74, and 75. (nutrient-io-parsed-table-stand-structure-before-after-cutting.png)

Test case 2: Image → Image

Input: Target annual-report financial summary table with year columns and multiple financial line items. (landing-ai-target-annual-report-financial-summary-table-2.png)

Output: On the Target financial summary, Nutrient recovered many row labels and values, but the table was not faithfully reconstructed. Currency markers and columns became uneven, and the relationship between rows and values weakened enough that the output read more like a flattened grid than a clean financial table. (nutrient-io-target-annual-report-parsed-complex-table.png)

Test case 3: Image → Image

Input: Segment-performance table with multi-level headers and adjustment columns. (nutrient-io-financial-segment-table-cropped.png)

Output: On the segment table, Nutrient preserved some cell values but lost the source table's multi-level header organization. Parent-child column relationships were no longer explicit, which makes the extracted structure harder to trust for analysis. (nutrient-io-segment-financial-table-by-business-unit.png)

Test case 4: Image → Image

Input: Quarterly consolidated income-statement table comparing previous and present first-quarter periods. (nutrient-io-quarterly-consolidated-income-statement-table.png)

Output: On the quarterly income-statement table, Nutrient captured the heading and early line items, but the comparison columns and later content were only partially represented. The result is a truncated, simplified version of the source table rather than a faithful markdown reconstruction. (nutrient-io-parsed-quarterly-income-statement-text.png)

Test case 5: Image → Image

Input: Scanned table titled 'Trees killed per acre by cutting block, year, cause, and diameter.' (nutrient-io-table-trees-killed-per-acre.png)

Output: On the complex scanned table, Nutrient lost structural boundaries as table complexity increased. Rows were clipped, some labels were misread, and the relationships between treatment, year, cause, diameter classes, and totals no longer held together. (nutrient-io-parsed-table-trees-killed-per-acre-1.png)

Bottom Line

Nutrient is acceptable for simpler tables, but it was not dependable on the exact table-heavy cases this use case cares about most: financial summaries, multi-level headers, and dense scanned matrices.
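Two of the table failures described here, lost column alignment and dropped decimal points, can be screened for mechanically. The following is a minimal sketch under stated assumptions: it expects pipe-delimited markdown tables without escaped pipes, the helper names are hypothetical, and the plausible value range passed to `out_of_range` is something you would have to supply from domain knowledge.

```python
import re


def split_row(line: str) -> list[str]:
    """Split one pipe-delimited table row into trimmed cells.

    Only one leading and one trailing pipe are removed, so empty edge
    cells are kept. Escaped pipes inside cells are not handled.
    """
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def ragged_rows(table_md: str) -> list[int]:
    """Return 1-based row numbers whose cell count differs from the header."""
    lines = [ln for ln in table_md.splitlines() if ln.strip()]
    if not lines:
        return []
    width = len(split_row(lines[0]))
    return [
        number
        for number, ln in enumerate(lines[1:], start=2)
        if len(split_row(ln)) != width
    ]


def out_of_range(table_md: str, column: int, low: float, high: float) -> list[str]:
    """Return numeric cells in `column` that fall outside [low, high].

    A dropped decimal point (7.8 read as 78) shows up as a tenfold outlier.
    """
    flagged = []
    for ln in table_md.splitlines():
        cells = split_row(ln)
        if column < len(cells) and re.fullmatch(r"-?\d+(\.\d+)?", cells[column]):
            if not low <= float(cells[column]) <= high:
                flagged.append(cells[column])
    return flagged
```

Neither check recovers multi-level headers; they only tell you which tables need a human look before the numbers are used.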

Chart and visual-content handling

Weak: charts lose their semantics, and handwritten visual content is not retained.

Test Summary

Feature tested: Chart and visual-content handling

Result: Failed

Verdict: Weak: charts lose their semantics, and handwritten visual content is not retained.

Expected behavior: Nutrient did not meaningfully preserve non-text visuals in this research. For charts, it sometimes recovered some labels or values, but not the axes, series relationships, or chart structure that make the figure interpretable. For a scanned signatures page, it extracted surrounding text and signer details but did not capture the handwritten signature marks themselves.

Test case 1: Image → Image

Input: Waterfall chart showing SG&A rate movement from 2013 to 2015. (llamaparse-sga-rate-waterfall-chart-1.png)

Output: Nutrient recovered some numbers and labels from the waterfall chart, but did not reconstruct axes, legend relationships, or chart type information. The chart was reduced to ordinary text rather than preserved as a meaningful visual representation. (nutirent_hybrid_earningspdf_parsed_waterfall_chart.png)

Test case 2: Image → Image

Input: Scanned line graph of average radial growth by cutting-block treatment from 1972 to 1981. (nutrient-io-figure-3-average-radial-growth-by-treatment.png)

Output: For the scanned line graph, Nutrient produced mostly garbled OCR text. The figure caption remained partly recognizable, but the plotted relationships and chart layout were not preserved in usable form. (nutrient-io-parsed-chart-forest-growth-cutting-blocks.png)

Test case 3: Image → Image

Input: Scanned signatures page with handwritten signatures plus printed names and titles. (landing-ai-target-annual-report-signatures-page-2.png)

Output: On the scanned signatures page, Nutrient captured the heading, signer names, titles, and dates, but not the handwritten signature marks themselves. The output also repeated some structured lines, reducing completeness and cleanliness. (nutrient-io-target-signatures-ocr-extraction.png)

Bottom Line

If charts, figures, or handwritten marks matter to the fidelity of your markdown, Nutrient did not preserve them well enough in this test set.
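Missing visuals cannot be restored after the fact, but the repeated lines seen on the signatures page are easy to detect during cleanup. A minimal sketch, assuming the markdown is available as a string; `repeated_lines` and its `min_length` threshold are illustrative choices, not part of any Nutrient tooling.

```python
from collections import Counter


def repeated_lines(markdown: str, min_length: int = 8) -> dict[str, int]:
    """Map each non-trivial line that occurs more than once to its count.

    Lines are whitespace-normalized first. Short lines such as rules or
    empty cells are skipped because they repeat legitimately.
    """
    normalized = (" ".join(line.split()) for line in markdown.splitlines())
    counts = Counter(line for line in normalized if len(line) >= min_length)
    return {line: n for line, n in counts.items() if n > 1}
```

Some repeats are genuine (two signers can share a title), so the result is a review list rather than a list of lines to delete.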

Pricing & Access

TESTED

Free: 5,000 credits/month

Starter: $59/month, 25,000 credits/month

Pro: $500, 500,000 credits/month

Custom: custom credit volume, volume discounts, dedicated support
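For comparing the paid tiers, the listed prices can be reduced to a cost per 1,000 credits. This is simple arithmetic on the figures above, with one assumption flagged: the Pro price is shown only as "$500", and the sketch treats it as a monthly price like Starter. The review does not say how many credits a page consumes, so no per-page cost is derived.

```python
# (price in USD, credits per month) as listed on the page.
# Assumption: the Pro price of $500 is billed monthly, like Starter.
plans = {
    "Starter": (59.0, 25_000),
    "Pro": (500.0, 500_000),
}


def cost_per_thousand(price_usd: float, credits: int) -> float:
    """Return the price of 1,000 credits, rounded to cents."""
    return round(price_usd / credits * 1000, 2)


rates = {name: cost_per_thousand(price, credits) for name, (price, credits) in plans.items()}
print(rates)  # {'Starter': 2.36, 'Pro': 1.0}
```

Under that assumption, Pro costs less than half as much per credit as Starter, which only matters if your volume actually approaches the larger allowance.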

Is This Right For You?

A side-by-side guide based on our hands-on testing.

✓ Use This If

● You need a hosted API that returns markdown files for mixed PDFs and you are comfortable working from an API key and code instead of relying on the web UI.

● Your documents are mostly straightforward report pages where readable OCR text and basic heading hierarchy matter more than perfect reconstruction of tables or charts.

● You can tolerate manual review of scanned title pages and visually complex sections before sending the markdown downstream.

✕ Skip This If

● You need complex financial tables preserved with reliable multi-level headers and row-to-value alignment.

● You need charts retained as meaningful visual elements instead of flattened labels or garbled OCR text.

● You need handwritten signatures or other non-text visuals preserved as part of the extracted document.

● You need flawless reading order on complex scanned layouts without spot-checking.


Nutrient can process scanned PDFs. In this research, it accepted a scanned research paper and returned markdown output, and it also ran OCR on scanned pages inside a hybrid earnings report. The quality was mixed: straightforward scanned text came through reasonably well, but page-order mistakes, chart failures, and complex-table errors remained.

On reading order and hierarchy, it did well on several straightforward pages. The annual-report page titled 'A Growth Story Again' kept its heading, paragraph, and bullets in order, and a scanned two-column 'STUDY AREA' page was turned into coherent paragraphs with the heading preserved. But on a scanned research-note first page, Nutrient placed the abstract before the title and authors, so reading order is not fully reliable on complex layouts.

Tables are preserved only inconsistently. A simpler scanned treatment table remained mostly readable, though OCR changed values like 7.8 to 78. Harder tables were weaker: the hybrid earnings-report financial summary lost alignment, the segment table lost multi-level header relationships, the quarterly income statement came back partially truncated, and a dense scanned mortality table broke down badly.

Charts were not handled well in this test. The hybrid waterfall chart reportedly came back as flattened text without chart semantics, and the scanned line graph produced mostly garbled OCR text. In both cases, labels or values may survive, but the visual structure does not.

For handwritten signatures, only the surrounding text is extracted. On the scanned signatures page, Nutrient extracted the section heading, signer names, titles, and dates, but it did not extract the handwritten signature marks themselves.

The researcher's tested path was the API, not the web UI. The report says the UI hit a timeout, so the markdown output was produced through an API key and code snippet from Nutrient's documentation.

Multilingual and degraded-scan handling were not tested. This Nutrient section includes a hybrid earnings report, a table-heavy financial report, and a scanned research paper, but it does not include a multilingual input or a degraded-scan stress test.