Image Generation

Leonardo AI Review: Tested Hands-On (2026)

Name: Leonardo AI
Author: AI Demos

A simple reference-image generator that creates polished new scenes, but it did not keep the same character reliably in this test.

6 outputs tested · Single-image workflow · Strong scenes, weak identity · Expression misses

TL;DR: our verdict

Updated May 2026 · 9 test artifacts

Beautiful scenes, weak character lock

Where it wins

You want attractive environments, outfits, and compositions from a single reference image.
You can tolerate a polished lookalike instead of exact facial continuity.
You value a simple upload-prompt-download workflow more than strict identity preservation.

Main limitation

You need the same face to stay clearly recognizable across multiple scenes.

Pricing (verified plans)

Free version: 150 credits/day · Essential: $10/month · Premium: $24/month · Ultimate: $48/month

Strongest test artifacts

Output: Cinematic animated clip
Output: Animated 3D cinematic scene

Feature scores on this page: 5.4/10 (5 scored features)
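
The 5.4/10 figure is consistent with a plain unweighted mean of the five feature scores given in the feature-by-feature breakdown on this page (4, 3, 7.5, 6.5 and 6). The sketch below is only a sanity check under that assumption; the page does not state how the overall score is computed, and the dictionary name is illustrative.

```python
# Feature scores as listed in the feature-by-feature breakdown on this page.
feature_scores = {
    "Reference-based character consistency": 4.0,
    "Expression and atmosphere control": 3.0,
    "Image-to-Cinematic Video Generation (2D)": 7.5,
    "Cinematic Animation from 3D Scenes": 6.5,
    "Realistic Image Animation": 6.0,
}

# Assumption: the page-level score is an unweighted mean of the feature scores.
average = sum(feature_scores.values()) / len(feature_scores)
print(f"{average:.1f}/10 across {len(feature_scores)} scored features")
```

The two identity-focused features pull the mean down, while the three animation features all score 6 or higher.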

Our take

Leonardo AI was the weakest tool in this research for consistent characters. It reliably produced attractive environments, outfits, and compositions from a single reference image, but it repeatedly beautified and altered the face enough to break identity. The interrogation tests also exposed a tool-level weakness with emotional prompting: both "angry, guarded" scenes came back calm and neutral, and the near-profile stress test rotated the face so far away that identity could not be verified at all.

Hands-on workflow recording from the Leonardo AI test.

In-Depth Review

Our detailed analysis of Leonardo AI — features, performance, and real-world testing.

AI Demos Team

Expert Reviewer

Verified Review

Feature-by-Feature Breakdown

Reference-based character consistency

Leonardo could restage a reference image into new scenes, but it did not preserve the same face reliably.

4/10

Test Summary

Feature tested: Reference-based character consistency

Result: Failed (4/10) — Leonardo could restage a reference image into new scenes, but it did not preserve the same face reliably.

Expected behavior: Leonardo's core capability here is generating new images from a single uploaded reference photo while changing the scene, pose, outfit, or environment. It was tested with a frontal portrait across a warm café close-up, desert horse-riding scene, and interrogation room; with a 3/4 restaurant portrait across interrogation room and street market scenes; and with a near-profile portrait in a rooftop golden-hour stress test.

Test case: Image → Image

Input type: Image

Input used: Input 1, a full frontal portrait with all facial features clearly visible. Prompted scene: a cozy warm café close-up with a braid, sweater, and natural seated pose.

Input artifact: Input 1

Observed output: Leonardo rendered the café environment, warm mood, sweater, braid, and pose cleanly, but the face changed enough to read as a lookalike rather than the same person.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-warm-cafe.jpg

Test case: Image → Image

Input type: Image

Input used: The same Input 1 frontal portrait. Prompted scene: a desert horse-riding image at sunset with action, outfit change, and a cinematic environment.

Input artifact: Input 1-1.Input 1

Observed output: Leonardo produced a detailed desert setting, correct horse-riding action, and strong cinematic composition, but the subject no longer resembled the reference in face shape, proportions, or overall identity.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-horseride.jpg

Test case: Image → Image

Input type: Image

Input used: Input 2, a 3/4 restaurant portrait with softer lighting and partially hidden facial detail. Prompted scene: a crowded street market with a sari, walking pose, and lively environment.

Input artifact: Input 2-3.Input 2

Observed output: Leonardo handled the market environment, sari styling, walking pose, and overall realism well. Hair volume stayed closer to the reference here than in other scenes, but the face is turned away enough that full identity verification is difficult.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input2-market.jpg

Test case: Image → Image

Input type: Image

Input used: Input 3, a near-profile portrait with one eye partly occluded and the face turned roughly 80-90 degrees. Prompted scene: rooftop golden hour, black top, beige trousers, full-body pose with both arms raised.

Input artifact: input 3.webp

Observed output: Leonardo followed the rooftop setting, outfit, skyline, golden-hour lighting, and full-body pose instructions, but it rotated the face too far away from camera.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input3-rooftop.jpg

Why it matters / Conclusion: Leonardo was good at making attractive scene variations from one image, but not at keeping the same person recognizably intact across those variations.
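
The reference-and-scene combinations described under "Expected behavior" can be laid out as a small test matrix. The sketch below is only a tally of what this review describes; the variable names are illustrative and not part of any Leonardo API. It shows how three references across these scenes account for the six outputs tested. The two interrogation-room cases are scored under the expression feature rather than this one.

```python
# Reference images and the scenes each was tested with, as described in this review.
test_matrix = {
    "Input 1 (frontal portrait)": [
        "warm café close-up",
        "desert horse-riding",
        "interrogation room",
    ],
    "Input 2 (3/4 restaurant portrait)": [
        "interrogation room",
        "street market",
    ],
    "Input 3 (near-profile portrait)": [
        "rooftop golden hour",
    ],
}

# Flatten into (reference, scene) pairs: one pair per generated output.
test_cases = [
    (ref, scene) for ref, scenes in test_matrix.items() for scene in scenes
]

print(len(test_matrix), "references,", len(test_cases), "outputs")
for ref, scene in test_cases:
    print(f"{ref} -> {scene}")
```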

INPUT

Input 1 was a full frontal portrait with all facial features clearly visible. Prompted scene: a cozy warm café close-up with a braid, sweater, and natural seated pose.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-warm-cafe.jpg)

Leonardo rendered the café environment, warm mood, sweater, braid, and pose cleanly, but the face changed enough to read as a lookalike rather than the same person. Natural curls from the reference were replaced with smoother, more stylized hair, and the skin was polished into a commercial-photo look.

INPUT

The same Input 1 frontal portrait was used. Prompted scene: a desert horse-riding image at sunset with action, outfit change, and a cinematic environment.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-horseride.jpg)

Leonardo produced a detailed desert setting, correct horse-riding action, and strong cinematic composition, but the subject no longer resembled the reference in face shape, proportions, or overall identity. The result reads as a different fantasy character rather than the same person in a new scene.

INPUT

Input 2 was a 3/4 restaurant portrait with softer lighting and partially hidden facial detail. Prompted scene: a crowded street market with a sari, walking pose, and lively environment.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input2-market.jpg)

Leonardo handled the market environment, sari styling, walking pose, and overall realism well. Hair volume stayed closer to the reference here than in other scenes, but the face is turned away enough that full identity verification is difficult, so this was only a partial success for character consistency.

INPUT

Input 3 was a near-profile portrait with one eye partly occluded and the face turned roughly 80-90 degrees. Prompted scene: rooftop golden hour, black top, beige trousers, full-body pose with both arms raised.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input3-rooftop.jpg)

Leonardo followed the rooftop setting, outfit, skyline, golden-hour lighting, and full-body pose instructions, but it rotated the face too far away from camera. Almost no usable facial detail remained, so the test's core requirement—keeping a difficult near-profile identity still recognizable—was unmet.

Bottom Line

Leonardo was good at making attractive scene variations from one image, but not at keeping the same person recognizably intact across those variations.

Expression and atmosphere control

Leonardo repeatedly missed emotionally intense prompts and defaulted to calm, polished portraits.

3/10

Test Summary

Feature tested: Expression and atmosphere control

Result: Failed (3/10) — Leonardo repeatedly missed emotionally intense prompts and defaulted to calm, polished portraits.

Expected behavior: This capability was tested by giving Leonardo the same interrogation-room prompt with two different reference images. The goal was to see whether it could preserve identity while also delivering an angry, guarded expression and harsh institutional lighting.

Test case: Image → Image

Input type: Image

Input used: Input 1, a full frontal portrait. Prompted scene: interrogation room with formal clothing, harsh overhead lighting, and an angry, guarded expression.

Input artifact: Input 1-2.Input 1

Observed output: This was Leonardo's best identity result from Input 1: the face remained broadly recognizable. But the core emotional instruction failed. The subject looks calm and neutral rather than angry or guarded.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-interrogation.jpg

Test case: Image → Image

Input type: Image

Input used: Input 2, a 3/4 warm indoor portrait. Prompted scene: the same interrogation-room setup with the same angry, guarded expression request.

Input artifact: Input 2-4.Input 2

Observed output: Leonardo again returned a neutral expression instead of the requested intensity, confirming the miss was not specific to one reference image. The room reads more like a bright office than an interrogation setting.

Output artifact: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input2-interrogation.jpg

Why it matters / Conclusion: Across two different references, Leonardo failed the same emotional prompt in the same way, which points to a tool-level limitation in expression control.

INPUT

Input 1 was a full frontal portrait. Prompted scene: interrogation room with formal clothing, harsh overhead lighting, and an angry, guarded expression.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input1-interrogation.jpg)

This was Leonardo's best identity result from Input 1: the face remained broadly recognizable. But the core emotional instruction failed. The subject looks calm and neutral rather than angry or guarded, the room is too neat, and the lighting lacks the harsh institutional feel described in the prompt.

INPUT

Input 2 was a 3/4 warm indoor portrait. Prompted scene: the same interrogation-room setup with the same angry, guarded expression request.

OUTPUT (image: best-ai-tools-to-generate-consistent-characters-ac-leonardo-input2-interrogation.jpg)

Leonardo again returned a neutral expression instead of the requested intensity, confirming the miss was not specific to one reference image. The room reads more like a bright office than an interrogation setting, and the reference's dense natural curls were flattened into straighter, oilier-looking hair.

Bottom Line

Across two different references, Leonardo failed the same emotional prompt in the same way, which points to a tool-level limitation in expression control.

Image-to-Cinematic Video Generation (2D)

Strong — smooth motion and stable rendering

7.5/10

Test Summary

Feature tested: Image-to-Cinematic Video Generation (2D)

Result: Passed (7.5/10) — Strong — smooth motion and stable rendering

Transforms static 2D illustrations into cinematic animated clips with smooth camera movement and visually stable rendering.

IMAGE

Slow cinematic close-up of a smiling girl holding clover leaves in a spring garden. Her hair flows in the breeze as cherry blossom petals drift around her and soft sunlight flickers through the trees. Warm, dreamy Studio Ghibli atmosphere with smooth, natural motion.

VIDEO

Bottom Line

The animation holds through the clip, but pause at the 4-second mark to check the hands against what the prompt asked for. Watch the environmental motion with sound on as well: the silence behind the bush movement is what actually lowers the cinematic feel here.

Cinematic Animation from 3D Scenes

Average — smooth movement but realism inconsistency

6.5/10

Test Summary

Feature tested: Cinematic Animation from 3D Scenes

Result: Passed (6.5/10) — Average — smooth movement but realism inconsistency

Animates cinematic 3D scenes with environmental motion and cinematic transitions.

IMAGE

Slow cinematic dolly through a lively street market at sunset. People interact naturally, a donkey cart moves through the center, palm trees sway, birds fly overhead, and the setting sun casts warm golden light and long shadows.

VIDEO

Bottom Line

The sunset sky stays visible throughout, but the birds mentioned in the prompt never appear. Crowd motion looks natural overall, yet the two figures on the left footpath walk in a way that breaks the scene during the first 3 seconds, a flaw the wider crowd masks. Pause on the cart rider's face to see where facial features are lost.

Realistic Image Animation

Moderate — stable motion but weak realism

6/10

Test Summary

Feature tested: Realistic Image Animation

Result: Passed (6/10) — Moderate — stable motion but weak realism

Generates cinematic animation from realistic wildlife and photography-style images.

IMAGE

Slow cinematic push-in on a majestic tiger walking toward a rock at sunset. After settling into a dominant pose, the tiger breathes calmly, then lets out a powerful roar as clouds drift and trees sway in the warm evening breeze.

VIDEO

Bottom Line

Compare the realistic input image to the output, then watch the first 3 seconds to see where the realism falls short.

Free version tested

The report only documented Leonardo's free-credit usage, not full paid-plan pricing.

TESTED

Free version

150 credits/day

Tested tier. Each generation used 40 credits.

Essential

$10/month

8,500 monthly tokens, 25.5k token bank, private generations, train 10 models/month, enhanced quality, 2 concurrent jobs, queue up to 5, unlimited collections.

Premium

$24/month

25k monthly tokens, 75k token bank, unlimited relaxed image generation, train 20 models/month, 3 concurrent jobs, queue up to 10, up to 6 reference images.

Ultimate

$48/month

60k monthly tokens, 180k token bank, unlimited image & video generation, train 50 models/month, 6 concurrent jobs, queue up to 20, Ultra quality access.

Leo for Teams

Custom Pricing

Shared token pool, unlimited team generations, shared collections, enterprise security, priority support, centralized billing and workspaces.
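
To compare the paid tiers on a like-for-like basis, the listed prices can be converted to a cost per 1,000 monthly tokens. This is a rough sketch using only the figures shown above. It assumes the listed price is what is actually billed each month, and it ignores the "unlimited" generation perks on Premium and Ultimate, which a per-token figure cannot capture.

```python
# Paid plans as listed on this page: (monthly price in USD, monthly tokens, token bank).
plans = {
    "Essential": (10, 8_500, 25_500),
    "Premium": (24, 25_000, 75_000),
    "Ultimate": (48, 60_000, 180_000),
}

for name, (price, tokens, bank) in plans.items():
    usd_per_1k = price / tokens * 1000  # price for each 1,000 monthly tokens
    bank_ratio = bank / tokens          # token bank relative to the monthly allowance
    print(f"{name}: ${usd_per_1k:.2f} per 1k tokens, bank = {bank_ratio:.0f}x monthly")
```

On these numbers the per-token cost falls from about $1.18 (Essential) to $0.96 (Premium) and $0.80 (Ultimate), and the token bank is three times the monthly allowance on every tier.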

Research notes: 150 credits per day on the free version, with 40 credits per generation.
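
In practical terms, the free allowance covers only a few runs per day. The sketch below works that out from the two figures in the research notes. It assumes every generation costs the same 40 credits observed in this test; the cost may differ with other models or settings, which the report does not cover.

```python
DAILY_CREDITS = 150          # free version allowance reported in the research notes
CREDITS_PER_GENERATION = 40  # cost observed per generation in this test

# Whole generations that fit in one day's allowance, plus the unused remainder.
generations_per_day, leftover = divmod(DAILY_CREDITS, CREDITS_PER_GENERATION)
print(generations_per_day, "generations per day,", leftover, "credits left over")
```

That works out to three generations per day with 30 credits unused.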

Is This Right For You?

A side-by-side guide based on our hands-on testing.

✓ Use This If

● You want attractive environments, outfits, and compositions from a single reference image.

● You can tolerate a polished lookalike instead of exact facial continuity.

● You value a simple upload-prompt-download workflow more than strict identity preservation.

✕ Skip This If

● You need the same face to stay clearly recognizable across multiple scenes.

● You need angry, guarded, or other high-intensity expressions to follow the prompt.

● You plan to use difficult angles like near-profile references and still need verifiable facial consistency.

Not reliably in this test. Leonardo produced six outputs from three references, and the identity usually drifted once the scene changed. Even the best result was only moderately faithful; the warm café image looked like a polished lookalike, the horse-riding scene became a different character, and the rooftop stress test turned the face so far away that identity could not be verified.

The interrogation-room result from Input 1 was the strongest identity match. The character was still recognizable, but the face was visibly refined and over-smoothed, so it was still not a perfect preservation of the reference.

Poorly in this research. Both interrogation-room prompts explicitly asked for an angry, guarded expression, and both outputs came back calm and neutral. Because the same failure happened on two different reference images, the report treats it as a tool-level limitation rather than an input-quality problem.

Not consistently. The report found hair texture was often the first identity cue Leonardo overrode. In the warm café scene, natural curls were replaced with smoother, more stylized hair, and in the Input 2 interrogation output, dense curls were flattened into straighter-looking hair.

Leonardo followed the rooftop setting, outfit, skyline, and warm lighting, but it rotated the face beyond the intended near-profile angle until almost no facial detail remained visible. That made the core goal of verifying the same identity impossible.

Yes. Leonardo accepted a single uploaded reference image in every tested scene without errors. The workflow was straightforward: upload the image, enter the prompt, choose aspect ratio and generation count, generate, and download the resulting images.

The researcher tested Leonardo's free version, which provided 150 credits per day. Each generation used 40 credits. The report did not include full paid-plan pricing.