Text to Image Prompt Accuracy — 7 AI Tools, One Prompt, Real Scores | TechScribe.in
🔬 ORIGINAL RESEARCH

Text to Image Prompt
Accuracy.
7 Tools. One Prompt. Real Scores.

Text to image prompt accuracy is the real measure of an AI image tool — not how pretty the output looks, but how precisely it follows your instructions. We wrote the most demanding photorealistic prompt we could construct — 400+ words, 11 hard constraints — and ran it through 7 AI image tools. Every output was scored using DeepEval across 5 dimensions. Here is what the data actually says.

🧪 7 Tools Tested
📊 DeepEval Scoring
🏆 One Clear Winner
📐 5 Evaluation Dimensions
🧪 HOW WE TESTED

How we measured text to image prompt accuracy:
same prompt, same evaluation, no subjectivity.

Text to image prompt accuracy cannot be judged by eye alone. Every tool received the exact same prompt. Every output was evaluated using DeepEval — an open-source AI evaluation framework — across 5 independent dimensions. No human opinion. No aesthetic preference. Just measurable compliance with every instruction in the prompt.

🎯 Semantic Alignment: Did it include every required element (people, objects, text, spatial positions)?
🎨 Stylistic Quality: Photorealistic? Correct lighting, palette, and DSLR-quality rendering?
🛡️ Integrity: Were forbidden elements (illustration style, extra objects, blurry text) absent?
🔢 Consistency: Exact counts, exact text on screens and whiteboards, exact clock time?
👁️ Perceptual Quality: Natural anatomy, no artifacts, realistic depth of field and skin texture?
📋 THE PROMPT

This is what every tool
was asked to generate.

400+ words. 11 hard constraints. Exact text, exact counts, exact spatial relationships. No room for interpretation. This is the prompt — verbatim.

📋 Test Prompt — Identical across all 7 tools
Create a photorealistic office scene inside a modern AI research company.
Follow ALL instructions exactly.

PEOPLE
There must be exactly 3 people in the image:
1. A woman wearing a green blazer working on a laptop.
2. A man wearing a red hoodie writing on a tablet.
3. A woman wearing a yellow shirt standing near a whiteboard.
Do not include any additional people.

SPATIAL RELATIONSHIPS
* The man in the red hoodie must be sitting to the left of the woman using the laptop.
* The woman in the yellow shirt must be standing behind both seated people.
* All three people must be clearly visible.

WHITEBOARD
The whiteboard must contain exactly this text:
AI Tool Evaluation Dashboard
The text should be clearly readable.

LAPTOP SCREEN
The laptop screen must display a dashboard with exactly these metrics:
Accuracy: 92%
Latency: 1.2s
Cost: $0.04
The text should be readable.

OBJECTS
Include exactly:
* 2 coffee mugs
* 1 indoor plant
* 1 wall clock
The wall clock must show 10:15.
Do not include additional mugs, plants, or clocks.

ENVIRONMENT
* Modern technology office
* Realistic furniture
* Professional workspace
* Natural daylight coming through windows
* Clean desk setup

STYLE REQUIREMENTS
* Photorealistic
* Natural colors
* DSLR-quality photography
* Sharp focus
* Realistic skin textures
* Realistic lighting
* No cartoon style
* No illustration style
* No CGI look
* No glossy AI-art look
* No fantasy elements

NEGATIVE CONSTRAINTS
Do NOT include:
* Extra people
* Extra coffee mugs
* Extra plants
* Extra clocks
* Watermarks / Logos
* Floating objects
* Distorted hands
* Blurry text
* Cropped subjects
* Duplicate objects

SUCCESS CRITERIA
The final image should satisfy ALL of the following simultaneously:
✓ Exactly 3 people with correct clothing colors
✓ Correct left/right and front/back positioning
✓ Exact whiteboard text: "AI Tool Evaluation Dashboard"
✓ Exact dashboard metrics: Accuracy 92%, Latency 1.2s, Cost $0.04
✓ Exactly 2 mugs, 1 plant, 1 clock showing 10:15
✓ Fully photorealistic office photograph
Why this prompt? We designed it to be deliberately hard, combining exact text rendering, precise object counts, spatial relationships, and style constraints in a single request. A tool that scores well here is likely to handle complex real-world prompts reliably.
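To make the "hard constraint" idea concrete, here is a minimal Python sketch of how the prompt's count and clock requirements could be encoded and checked mechanically. It assumes a hypothetical detection step has already produced per-object counts and a clock reading; the names `REQUIRED_COUNTS`, `check_counts`, and `check_clock` are illustrative and are not part of DeepEval or of the evaluation we ran. The sample input uses the counts reported in the Krea breakdown on this page.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of the prompt's exact-count and clock constraints.
REQUIRED_COUNTS = {"people": 3, "coffee mug": 2, "indoor plant": 1, "wall clock": 1}
REQUIRED_CLOCK = "10:15"


def check_counts(observed: Dict[str, int]) -> List[str]:
    """Return one message per object whose observed count is not exactly the required count."""
    violations = []
    for item, required in REQUIRED_COUNTS.items():
        found = observed.get(item, 0)  # an object that was never detected counts as 0
        if found != required:
            violations.append(f"{item}: found {found}, required exactly {required}")
    return violations


def check_clock(observed_time: Optional[str]) -> List[str]:
    """Return a message if the clock reading is missing or differs from the required time."""
    if observed_time != REQUIRED_CLOCK:
        shown = observed_time if observed_time is not None else "no clock"
        return [f"wall clock: shows {shown}, required {REQUIRED_CLOCK}"]
    return []


# Sample input: counts and clock reading as reported in the Krea breakdown.
krea_counts = {"people": 3, "coffee mug": 3, "indoor plant": 1, "wall clock": 1}
for message in check_counts(krea_counts) + check_clock("10:10"):
    print(message)
```

Every requirement is an exact equality, which is why a single extra mug fails the constraint outright rather than costing partial credit.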
🏆 THE SCOREBOARD

Text to image prompt accuracy scores —
all 7 tools ranked.

Text to image prompt accuracy scored out of 100. Evaluated across 5 dimensions using DeepEval. Higher is better. Pass = 70 or above per dimension.

Rank | Tool | Alignment | Consistency | Stylistic | Perceptual | Integrity | Overall | Verdict
---|---|---|---|---|---|---|---|---
1 | Kittl | 80 ✓ | 80 ✓ | 100 ✓ | 80 ✓ | 41 ✗ | 74 | Mostly Compliant
2 | Krea | 90 ✓ | 60 ✗ | 90 ✓ | 80 ✓ | 44 ✗ | 72 | Mostly Compliant
3 | OpenART AI | 70 ✓ | 60 ✗ | 94 ✓ | 70 ✓ | 50 ✗ | 68 | Partially Compliant
4 | Ideogram | 76 ✓ | 60 ✗ | 62 ✗ | 70 ✓ | 60 ✗ | 66 | Partially Compliant
4 | Photoroom | 66 ✗ | 40 ✗ | 94 ✓ | 70 ✓ | 62 ✗ | 66 | Partially Compliant
6 | Canva | 45 ✗ | 60 ✗ | 100 ✓ | 70 ✓ | 48 ✗ | 61 | Partially Compliant
7 | Adobe Express | 47 ✗ | 20 ✗ | 12 ✗ | 70 ✓ | 8 ✗ | 29 | Non-Compliant
The universal failure point: no tool rendered the wall clock at the required 10:15. Five tools drew the wrong time, with readings ranging from 9:10 to 2:10; Canva's clock could not be verified, and Adobe Express rendered no clock at all. Exact time rendering remains an unsolved problem across all tested AI image generators.
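The pass rule is simple enough to apply by hand, but a short Python sketch makes it explicit. The scores below are copied from the tool-by-tool breakdowns on this page; the dictionary layout and the `passed_dimensions` helper are our own illustration. How the five scores are weighted into the Overall column is not described in this article, so the sketch stops at pass counts.

```python
# Per-dimension scores copied from the tool-by-tool breakdowns on this page.
SCORES = {
    "Kittl":         {"semantic": 80, "stylistic": 100, "integrity": 41, "consistency": 80, "perceptual": 80},
    "Krea":          {"semantic": 90, "stylistic": 90,  "integrity": 44, "consistency": 60, "perceptual": 80},
    "OpenART AI":    {"semantic": 70, "stylistic": 94,  "integrity": 50, "consistency": 60, "perceptual": 70},
    "Ideogram":      {"semantic": 76, "stylistic": 62,  "integrity": 60, "consistency": 60, "perceptual": 70},
    "Photoroom":     {"semantic": 66, "stylistic": 94,  "integrity": 62, "consistency": 40, "perceptual": 70},
    "Canva":         {"semantic": 45, "stylistic": 100, "integrity": 48, "consistency": 60, "perceptual": 70},
    "Adobe Express": {"semantic": 47, "stylistic": 12,  "integrity": 8,  "consistency": 20, "perceptual": 70},
}
PASS_THRESHOLD = 70  # a dimension passes at 70 or above


def passed_dimensions(scores):
    """Return the names of the dimensions that meet the pass threshold."""
    return [dim for dim, value in scores.items() if value >= PASS_THRESHOLD]


for tool, scores in SCORES.items():
    passed = passed_dimensions(scores)
    print(f"{tool}: {len(passed)}/{len(scores)} passed ({', '.join(passed)})")
```

Running it shows Kittl as the only tool that clears 4 of 5 dimensions; Adobe Express passes only on perceptual quality.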
🔍 TOOL BY TOOL BREAKDOWN

What each tool got right —
and where each one failed.

Full per-tool analysis with the actual generated image alongside the 5-dimension score breakdown and key findings.

1. Kittl (WINNER)
Mostly Compliant: strongest overall balance of accuracy and quality
Score: 74/100
Semantic: 80 ✓ Pass
Stylistic: 100 ✓ Pass
Integrity: 41 ✗ Fail
Consistency: 80 ✓ Pass
Perceptual: 80 ✓ Pass
[Image: Kittl's output for the test prompt, the highest-scoring result in our benchmark]
✓ What it got right
All 3 people with correct clothing colors (green blazer, red hoodie, yellow shirt). Correct spatial positioning — man left of woman, woman in yellow standing behind. Whiteboard text "AI Tool Evaluation Dashboard" present and readable. Dashboard metrics (Accuracy: 92%, Latency: 1.2s, Cost: $0.04) visible. 2 coffee mugs. 1 plant. 1 clock. Photorealistic office environment with natural lighting. Perfect stylistic score — 100/100.
✗ Where it failed
Clock shows approximately 10:10 instead of the required 10:15. Dashboard metrics appear on a separate device in the foreground rather than clearly on the laptop screen used by the woman in green. The Integrity score (41) was pulled down by deviations in the evaluator's output format: it added "Text Rendering" and "Forbidden Elements" sections that were not part of the defined evaluation steps.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · windows · desk · furniture
⚠️ Attribute Mismatches
Wall clock time: Shows ~10:10, required 10:15
Dashboard location: On separate foreground device, not clearly on the laptop screen
📐 Spatial Relationships
Man left of woman: ✓ Verified present
Woman in yellow behind: ✓ Verified present
All three visible: ✓ Confirmed
🎨 Style & Perceptual
Style: Photorealistic with slight commercial polish. Natural lighting, sharp focus, natural colors — strong match to requested style.
Lighting: Soft, diffused natural lighting. Natural window light from right side.
Perceptual: Strong naturalness, appropriate depth of field. Minor AI generation smoothness on facial features and hair.
🚩 Issues Flagged
blurry text (minor) · artifact · clock time off by 5 min
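Why does a 5-minute miss count as a failure? On an analog face the minute hand moves 6° per minute and the hour hand 30° per hour plus 0.5° per minute, so 10:10 and 10:15 look clearly different. This small Python sketch applies that standard arithmetic to the readings reported in this test; `hand_angles` is a hypothetical helper, not part of any evaluation tool.

```python
from typing import Tuple


def hand_angles(hours: int, minutes: int) -> Tuple[float, float]:
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12 o'clock."""
    minute_angle = 6.0 * minutes                      # 360 degrees / 60 minutes
    hour_angle = 30.0 * (hours % 12) + 0.5 * minutes  # 360 / 12 hours, plus drift as minutes pass
    return hour_angle, minute_angle


# Required time vs. readings reported in this test.
for reading in ("10:15", "10:10", "2:10"):
    h, m = (int(part) for part in reading.split(":"))
    hour_angle, minute_angle = hand_angles(h, m)
    print(f"{reading}: hour hand {hour_angle:.1f}°, minute hand {minute_angle:.1f}°")
```

At 10:15 the minute hand points straight at 3 (90°); at 10:10 it sits 30° short of that, while the hour hand moves only 2.5°. The 10:10 readings and OpenART AI's 2:10 share the same minute-hand position.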
2. Krea
Mostly Compliant: highest semantic score, failed spatial positioning
Score: 72/100
Semantic: 90 ✓ Pass
Stylistic: 90 ✓ Pass
Integrity: 44 ✗ Fail
Consistency: 60 ✗ Fail
Perceptual: 80 ✓ Pass
[Image: Krea's output for the test prompt, an AI research office scene]
✓ What it got right
Highest semantic score of all 7 tools — 90/100. All 3 people in correct clothing (green blazer, red hoodie, yellow shirt). Whiteboard text "AI Tool Evaluation Dashboard" correct. Laptop dashboard metrics (Accuracy: 92%, Latency: 1.2s, Cost: $0.04) accurate. 1 plant (correct). Photorealistic stock photography style. Strong perceptual naturalness — faces clear, body proportions correct, no significant distortions.
✗ Where it failed
Critical spatial failure — man in red hoodie positioned on the RIGHT of the woman with laptop, not the required LEFT. 3 coffee mugs visible (beige mug by man, dark mug on table, mug held by woman in yellow) instead of exactly 2. Clock shows approximately 10:10 instead of 10:15. Forbidden element: extra coffee mug. Warm saturated accent colors conflict with "natural colors" requirement.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · desk · chair · window
⚠️ Attribute Mismatches
Wall clock time: Shows ~10:10, required 10:15
Coffee mugs: 3 visible, required exactly 2
Man positioning: Right of woman, required left
📐 Spatial Relationships
Man left of woman: ✗ FAILED — man is on the right
Woman in yellow behind: ✓ Verified present
All three visible: ✓ Confirmed
🎨 Style & Perceptual
Style: Photorealistic stock photography. Soft natural lighting, warm saturated accents — aligns with requested style but accent colors violate "natural colors".
Lighting: Soft natural lighting, diffused daylight — strong match to requested natural daylight.
Perceptual: Strong naturalness, natural poses. Very minor texture inconsistencies. No watermarks, distortions, or anomalies.
🚩 Issues Flagged
extra coffee mug (forbidden) · wrong spatial position · clock time off by 5 min · artifact (minor)
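Krea's spatial failure is the kind of check that can be automated once people are localized. Here is a minimal Python sketch, assuming a hypothetical detector that returns pixel bounding boxes and assuming "left of" means the viewer's left: compare the horizontal centers of the two boxes. The `Box` class and the coordinates are illustrative, not measured from Krea's image.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detector output: a bounding box in image pixels (x grows to the right)."""
    label: str
    x_min: float
    x_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2


def is_left_of(a: Box, b: Box) -> bool:
    """True if box a's horizontal center lies to the viewer's left of box b's."""
    return a.center_x < b.center_x


# Illustrative coordinates mimicking Krea's layout: the man ends up on the right.
man = Box("man in red hoodie", x_min=700, x_max=900)
woman = Box("woman in green blazer", x_min=300, x_max=520)
print("Man left of woman:", "pass" if is_left_of(man, woman) else "FAIL")
```

The prompt does not say whose left it means; viewer's left is the usual reading, but a stricter evaluator might also consider the subject's own left.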
3. OpenART AI
Partially Compliant: strong visuals, failed clock time and plant count
Score: 68/100
Semantic: 70 ✓ Pass
Stylistic: 94 ✓ Pass
Integrity: 50 ✗ Fail
Consistency: 60 ✗ Fail
Perceptual: 70 ✓ Pass
[Image: OpenART AI's output for the test prompt, an AI research office scene]
🧩 Model used: Seedream. OpenART AI hosts 100+ models, and we tested Seedream specifically for this benchmark. Different models within OpenART AI will produce different results. Read our full OpenART AI review covering multiple models.
✓ What it got right
All 3 people in correct clothing colors (green blazer, red hoodie, yellow shirt). Correct spatial positioning — man left of woman, woman in yellow standing behind both. Whiteboard text "AI Tool Evaluation Dashboard" correct and readable. Laptop displays Accuracy: 92% and Cost: $0.04 correctly. Latency: 1.2s present (formatting acceptable). 2 coffee mugs (correct). 1 plant (correct). Photorealistic quality with strong stylistic score — 94/100.
✗ Where it failed
Clock shows approximately 2:10 instead of the required 10:15, one of the worst clock failures in the test. Two plants visible instead of exactly one. Vibrant, saturated primary color palette (bold red, bright yellow, vivid green) directly contradicts the "natural colors" requirement. Minor hand anatomy artifact on woman in yellow. Laptop screen dashboard graphics appear somewhat artificial on close inspection.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · windows · desk · furniture
⚠️ Attribute Mismatches
Wall clock time: Shows ~2:10, required 10:15 — significant failure
Plants: 2 visible, required exactly 1
Color palette: Vibrant saturated primaries vs required natural colors
📐 Spatial Relationships
Man left of woman: ✓ Verified present
Woman in yellow behind: ✓ Verified present
All three visible: ✓ Confirmed
🎨 Style & Perceptual
Style: Photorealistic, DSLR-quality, sharp focus — matches requested style. Soft, natural lighting from windows with even illumination.
Lighting: Natural daylight from windows, soft diffused — good match but lacks explicit score comparison.
Perceptual: Good naturalness, appropriate lighting and shadows. Minor unnatural hand/arm positioning on woman in yellow. Faces well-rendered, proportions mostly correct.
🚩 Issues Flagged
clock time severely wrong (2:10 vs 10:15) · extra plant (forbidden) · saturated palette vs natural colors · artifact (minor)
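The evaluator's "natural colors" judgment was qualitative. One rough, hypothetical way to put a number on a saturated palette is the average HSV saturation of sampled pixels, using Python's standard `colorsys` module. The RGB samples below are illustrative stand-ins for "bold red, bright yellow, vivid green" and for muted office tones; they are not pixels taken from OpenART AI's image.

```python
import colorsys
from statistics import mean
from typing import List, Tuple


def mean_saturation(pixels: List[Tuple[int, int, int]]) -> float:
    """Average HSV saturation of 8-bit RGB pixels: 0.0 is gray, 1.0 is fully saturated."""
    return mean(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels)


# Illustrative samples only, not pixels measured from any tested image.
vivid_primaries = [(220, 30, 30), (250, 220, 20), (30, 200, 60)]  # bold red, bright yellow, vivid green
muted_office = [(120, 110, 100), (200, 195, 185), (90, 100, 95)]  # warm gray, off-white, sage

print(f"vivid: {mean_saturation(vivid_primaries):.2f}")
print(f"muted: {mean_saturation(muted_office):.2f}")
```

Any cutoff for "natural" would still be a judgment call; the number only ranks palettes from gray (0) to fully saturated (1).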
4. Ideogram
Partially Compliant: blurry text and multiple object count failures
Score: 66/100
Semantic: 76 ✓ Pass
Stylistic: 62 ✗ Fail
Integrity: 60 ✗ Fail
Consistency: 60 ✗ Fail
Perceptual: 70 ✓ Pass
[Image: Ideogram's output for the test prompt, an AI research office scene]
✓ What it got right
All required entities present — all 3 people in correct clothing colors (green blazer, red hoodie, yellow shirt), laptop, tablet, whiteboard, coffee mug, indoor plant, wall clock, windows, desk, furniture. Correct spatial relationships — man left of woman, woman in yellow standing behind both. Whiteboard text and dashboard metrics (Accuracy: 92%, Latency: 1.2s) present. Photorealistic style with natural office environment.
✗ Where it failed
Blurry text, a forbidden element, was explicitly identified as present, and garbled text was also noted in the logo and on the screen. Clock shows approximately 9:10–12:00, not 10:15. Multiple plants visible instead of exactly one. The cost metric appears twice, as both '$0.04' and '0.4', creating confusion and duplication. 3–4 coffee mugs visible instead of exactly 2. Man in red hoodie positioned behind and to the left of the woman rather than clearly seated to her left.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · windows · desk · furniture
⚠️ Attribute Mismatches
Wall clock time: Shows ~9:10–12:00, required 10:15
Coffee mugs: 3–4 visible, required exactly 2
Plants: Multiple visible, required exactly 1
Cost metric: Duplicate display — '$0.04' and '0.4'
📐 Spatial Relationships
Man left of woman: ✓ Present but man is positioned behind/left — not clearly seated beside
Woman in yellow behind: ✓ Verified present
All three visible: ✓ Confirmed
🎨 Style & Perceptual
Style: Photorealistic with slight commercial polish. Natural lighting, sharp focus, natural colors — style match is strong but evaluation lacks completeness in checking all prompt constraints.
Lighting: Soft, diffused natural lighting with studio enhancement — natural window light from right side. Studio enhancement not requested — slight deviation.
Perceptual: Good naturalness, believable depth of field. Minor AI generation smoothness on facial features and hair. Text on laptop screen shows minor inconsistencies.
🚩 Issues Flagged
blurry text (forbidden element) · extra coffee mugs (forbidden) · extra plants (forbidden) · broken/garbled screen text · artifact
4. Photoroom
Partially Compliant: confused role assignments, wrong clock, misspelled metrics
Score: 66/100
Semantic: 66 ✗ Fail
Stylistic: 94 ✓ Pass
Integrity: 62 ✗ Fail
Consistency: 40 ✗ Fail
Perceptual: 70 ✓ Pass
[Image: Photoroom's output for the test prompt, an AI research office scene]
✓ What it got right
3 people present with correct clothing colors (green blazer, red hoodie, yellow shirt). Whiteboard text "AI Tool Evaluation Dashboard" correct and readable. Photorealistic style with strong stylistic score — 94/100. 2 coffee mugs (correct). 1 plant. 1 clock visible. Modern office setting with professional appearance. All forbidden elements confirmed absent — clean integrity on negative constraints.
✗ Where it failed
People's roles completely swapped — woman in green is standing (should be working on laptop), woman in yellow is seated at laptop (should be standing near whiteboard). Man in red hoodie appears on the left but not clearly writing on a tablet. Laptop screen misspells "Latency" as "Letency" and shows "$90.04" instead of "$0.04". Clock shows approximately 12:05, not 10:15. No windows visible — absent from scene. Lowest consistency score of all photorealistic tools — 40/100. Vibrant jewel tone color palette (emerald green, coral/orange, mustard yellow, magenta/purple) violates "natural colors" requirement.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · desk · furniture · windows ✗ absent
⚠️ Attribute Mismatches
Roles swapped: Green blazer woman standing, yellow shirt woman seated — both wrong
Laptop text: "Letency" (misspelled), "$90.04" (wrong cost)
Wall clock: Shows ~12:05, required 10:15
Windows: Absent from scene
Color palette: Jewel tones vs required natural colors
📐 Spatial Relationships
Man left of woman: ✓ Approximately present
Woman in yellow behind: ✗ FAILED — she is seated, not standing behind
Woman in green on laptop: ✗ FAILED — she is standing, not on laptop
🎨 Style & Perceptual
Style: Photorealistic, DSLR-quality — correctly identified and matches request. Strong 94/100 stylistic score.
Lighting: Soft diffused natural with studio supplementation — primary source is natural daylight, strong match. Studio supplementation not requested — minor deviation.
Perceptual: Professional office scene, generally natural lighting, realistic spatial relationships. Minor artifacts: woman's hand holding tablet slightly unnatural, some sharpness inconsistency between foreground and background.
🚩 Issues Flagged
roles completely swapped · misspelled metric (Letency) · wrong cost ($90.04) · clock time wrong (12:05) · artifact (blur, minor) · windows absent
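Screen and whiteboard text only counts if it matches exactly, but a similarity ratio helps separate a typo like "Letency" from text that is missing or unrelated. Here is a minimal sketch using Python's standard `difflib`. The rendered strings are reconstructed from the Photoroom findings and assume the rest of each label matched; the 0.8 near-miss cutoff is an arbitrary choice for illustration.

```python
import difflib
from typing import Optional

NEAR_MISS_CUTOFF = 0.8  # arbitrary illustrative threshold


def classify(required: str, rendered: Optional[str]) -> str:
    """Label rendered text as exact, a near miss (likely typo), wrong, or missing."""
    if rendered is None:
        return "missing"
    if rendered == required:
        return "exact"
    ratio = difflib.SequenceMatcher(None, required, rendered).ratio()
    return f"near miss ({ratio:.2f})" if ratio >= NEAR_MISS_CUTOFF else "wrong"


# Photoroom's laptop text as described in its findings (assumed label layout).
for required, rendered in [("Latency: 1.2s", "Letency: 1.2s"), ("Cost: $0.04", "Cost: $90.04")]:
    print(f"{required!r} vs {rendered!r}: {classify(required, rendered)}")
```

Both Photoroom strings come out as near misses, and both still fail the prompt: a similarity score can explain a failure, but only an exact match satisfies the constraint.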
6. Canva
Partially Compliant: perfect visual quality but failed on prompt specifics
Score: 61/100
Semantic: 45 ✗ Fail
Stylistic: 100 ✓ Pass
Integrity: 48 ✗ Fail
Consistency: 60 ✗ Fail
Perceptual: 70 ✓ Pass
[Image: Canva's output for the test prompt, an AI research office scene]
✓ What it got right
3 people in correct clothing colors (green blazer, red hoodie, yellow shirt). Correct spatial positioning — man left of woman, woman in yellow standing behind. Whiteboard with correct text "AI Tool Evaluation Dashboard" present. Modern office environment with photorealistic quality. Natural lighting from large windows. 1 plant. 1 clock. Perfect stylistic score — 100/100. All forbidden elements confirmed absent — no cartoon style, no extra people, no watermarks, no distorted hands.
✗ Where it failed
Whiteboard metrics critically wrong: shows '$3s' and '90¢' instead of the required 'Latency: 1.2s' and 'Cost: $0.04'. Laptop screen dashboard not clearly readable with the required metrics. 3 coffee mugs visible instead of exactly 2. Clock time cannot be verified as showing 10:15. Man holding a tablet but not clearly writing on it. Lowest semantic score of all 7 tools, 45/100, below even Adobe Express (47). Vibrant emerald green, mustard yellow, and coral/orange accent colors are not strictly "natural colors".

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · tablet · whiteboard · coffee mug · indoor plant · wall clock · windows · desk · furniture
⚠️ Attribute Mismatches
Whiteboard metrics: Shows '$3s' and '90¢' — completely wrong values
Laptop dashboard: Not clearly readable with required metrics
Coffee mugs: 3 visible, required exactly 2
Clock time: Unverifiable — cannot confirm 10:15
Man with tablet: Holding but not clearly writing
📐 Spatial Relationships
Man left of woman: ✓ Verified present
Woman in yellow behind: ✓ Verified present
All three visible: ✓ Confirmed
Note: Spatial checks pass but no numeric compliance score provided by evaluator
🎨 Style & Perceptual
Style: Photorealistic, professional staging, soft natural lighting — perfect 100/100 stylistic score. Tied with Kittl for best visual quality in the test.
Lighting: Soft, diffused natural lighting from large windows — strong match to requested natural daylight.
Perceptual: Good naturalness, realistic depth of field. Minor hand anatomy inconsistencies. Some facial features appear slightly soft. Presentation board text and composition well-executed.
🚩 Issues Flagged
wrong whiteboard metrics ($3s, 90¢) · extra coffee mug (forbidden) · unreadable laptop dashboard · clock time unverifiable · artifact (blur, minor)
7. Adobe Express
Non-Compliant: generated illustration, not a photograph
Score: 29/100
Semantic: 47 ✗ Fail
Stylistic: 12 ✗ Fail
Integrity: 8 ✗ Fail
Consistency: 20 ✗ Fail
Perceptual: 70 ✓ Pass
[Image: Adobe Express's output for the test prompt, a generated illustration rather than a photo]
✓ What it got right
3 people present with approximately correct clothing colors (green blazer, red hoodie, yellow shirt). Correct spatial positioning — man left of woman, woman in yellow behind. 1 plant possibly visible. Perceptual quality score (70) — well-executed as an illustration, clean with minimal distortions. Fingers slightly elongated but hair and form are coherent. No major anatomy failures. Composition and spatial relationships are logical.
✗ Where it failed
Fundamental disqualifying failure — generated a digital illustration/vector art style with clean gradient shading and polished commercial quality. The prompt explicitly forbade cartoon, illustration, and CGI styles. No coffee mugs present. No wall clock present. Whiteboard text illegible, does not show "AI Tool Evaluation Dashboard". No laptop dashboard metrics visible. Man using a laptop, not writing on a tablet. Spatial positioning of woman in green is in foreground rather than to the right as required. Multiple forbidden elements present: illustration style, cartoon look, glossy AI-art look, distorted hands, blurry text.

✓ Entities Detected
woman in green blazer · man in red hoodie · woman in yellow shirt · laptop · whiteboard · indoor plant · windows · desk · furniture · tablet ✗ absent · coffee mugs ✗ zero · wall clock ✗ absent
⚠️ Attribute Mismatches
Style: Illustration/vector art — fundamentally wrong
Man's device: Laptop, not a tablet (wrong)
Coffee mugs: Zero present, required exactly 2
Wall clock: Absent, required 1 showing 10:15
Whiteboard text: Illegible — not matching required text
Laptop metrics: Not visible
📐 Spatial Relationships
Man left of woman: ✓ Approximately present
Woman in yellow behind: ✓ Approximately present
Woman in green position: ✗ In foreground rather than to right — ambiguous
Note: Spatial logic undermined by wrong style rendering
🎨 Style & Perceptual
Style: Digital illustration/vector art with semi-realistic approach — directly contradicts photorealistic DSLR requirement. Score: 12/100. This alone disqualifies the output.
Lighting: Multiple sources including overhead fluorescent and warm backlighting — deviates from single natural daylight source requested.
Perceptual: Well-executed as an illustration. Consistent lighting, appropriate shadows. Minor hand elongation. Hair rendering simplified. Clean with no major distortions — but wrong format entirely.
🚩 Issues Flagged
illustration style (forbidden, disqualifying) · cartoon/CGI look (forbidden) · glossy AI-art look (forbidden) · zero coffee mugs (required 2) · no wall clock (required 1) · blurry text · distorted hands · artifact, unnatural
📊 KEY FINDINGS

What this text to image prompt accuracy test
reveals about AI image generators.

1. Clock time is universally broken. No tool rendered 10:15 correctly: five showed the wrong time (readings from 9:10 to 2:10), Canva's clock could not be verified, and Adobe Express had no clock at all. Exact analog clock time rendering is an unresolved failure point across all tested generators, and a reliable benchmark for text to image prompt accuracy.
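2. Object counting is unreliable. "Exactly 2 coffee mugs" should be trivial, yet only 3 of 7 tools (Kittl, OpenART AI, Photoroom) got it right: Krea, Ideogram, and Canva generated 3–4 mugs, and Adobe Express generated none. Across mugs, plants, and clocks, 5 of 7 tools missed at least one exact count, one of the clearest signals of low text to image prompt accuracy.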
3. Visual quality ≠ prompt accuracy. Canva scored a perfect 100 on stylistic quality but only 61 overall. Beautiful output that ignores your instructions is still unusable output. Text to image prompt accuracy is what separates useful tools from impressive-looking ones.
4. Style compliance is binary. Adobe Express proves this. An otherwise coherent scene is completely unusable when it ignores the fundamental style requirement. Photorealistic means photorealistic — not vector art.
5. Kittl wins on consistency. The gap between Kittl (74) and the field comes primarily from the consistency dimension, the one that most directly reflects text to image prompt accuracy and matters most for real-world use. Kittl scored 80 on consistency; every other tool scored between 20 and 60.
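The object-count finding can be tallied directly from the per-tool breakdowns. In this Python sketch the counts come from each tool's findings on this page. Where a breakdown reports no problem with an object, the required count is assumed; Ideogram's "3–4 mugs" and "multiple plants" are recorded at their lower bounds (3 and 2), and Adobe Express's "possibly visible" plant is counted as 1. `count_failures` is an illustrative helper.

```python
# Object counts per tool, taken from the per-tool findings on this page.
# Assumptions: where a breakdown reports no problem with an object, the required
# count is used; Ideogram's "3-4 mugs" and "multiple plants" are recorded as 3 and 2;
# Adobe Express's "possibly visible" plant is counted as 1.
REQUIRED = {"mugs": 2, "plants": 1, "clocks": 1}
OBSERVED = {
    "Kittl":         {"mugs": 2, "plants": 1, "clocks": 1},
    "Krea":          {"mugs": 3, "plants": 1, "clocks": 1},
    "OpenART AI":    {"mugs": 2, "plants": 2, "clocks": 1},
    "Ideogram":      {"mugs": 3, "plants": 2, "clocks": 1},
    "Photoroom":     {"mugs": 2, "plants": 1, "clocks": 1},
    "Canva":         {"mugs": 3, "plants": 1, "clocks": 1},
    "Adobe Express": {"mugs": 0, "plants": 1, "clocks": 0},
}


def count_failures(counts):
    """Return the objects whose count differs from the exact requirement."""
    return [obj for obj, need in REQUIRED.items() if counts[obj] != need]


exact_mugs = [tool for tool, counts in OBSERVED.items() if counts["mugs"] == REQUIRED["mugs"]]
any_failure = [tool for tool, counts in OBSERVED.items() if count_failures(counts)]
print(f"Exactly 2 mugs: {len(exact_mugs)}/{len(OBSERVED)} {exact_mugs}")
print(f"Missed at least one exact count: {len(any_failure)}/{len(OBSERVED)} {any_failure}")
```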

Looking for a tool purpose-built for text with images? See our review of Ideogram 2.0, the best AI tool for readable text in images. For design-first image generation, our full Kittl review covers every feature in depth. For deeper dives into two more of the tested tools, see our Krea AI review and OpenART AI review.

🏆 THE VERDICT

Kittl wins the
text to image prompt accuracy benchmark.

74/100. The highest text to image prompt accuracy score in our test. The only tool that passed 4 of 5 dimensions. The strongest consistency score. Perfect visual quality. It still fell short on the clock, like everyone else, but it finished ahead of the field where it counts.

Total Score: 74
Visual Quality: 100
Semantic Alignment: 80
Consistency: 80
Dimensions Passed: 4/5
Try Kittl Free: The Prompt Accuracy Winner →
❓ FREQUENTLY ASKED QUESTIONS

Text to image prompt accuracy —
your questions answered.

Based on our DeepEval benchmark test, Kittl scored highest at 74/100, followed closely by Krea at 72/100. Kittl combined a perfect visual quality score (100/100) with the highest consistency score in the test (80/100) and strong semantic alignment (80/100), making it the most accurate tool for following detailed text-to-image prompts.
We gave all 7 AI image tools the exact same complex photorealistic prompt and evaluated each output using DeepEval across 5 dimensions: Semantic Alignment, Stylistic quality, Integrity, Consistency, and Perceptual Quality. Each dimension was scored independently and combined into a total score out of 100.
Adobe Express scored 29/100 because it generated an illustration-style image rather than the photorealistic output explicitly required. The prompt specifically forbade cartoon, illustration, and CGI styles. It also missed several required objects including coffee mugs and the wall clock.
DeepEval is an open-source evaluation framework whose metrics use an AI model as the judge. In this test it scored each image output against the original prompt across 5 dimensions: Semantic Alignment, Stylistic match, Integrity, Consistency, and Perceptual Quality. Every tool was judged against the same written criteria, giving consistent, criteria-based scores rather than subjective opinion.
Kittl scored 74/100 versus Canva's 61/100. Kittl outperformed Canva primarily on semantic alignment (80 vs 45) and consistency (80 vs 60). Canva scored full marks on visual quality (100) but failed on prompt specifics including incorrect whiteboard metrics and wrong coffee mug count.
The hardest constraint was the wall clock time. No tool rendered 10:15: five tools drew the wrong time (readings ranged from 9:10 to 2:10), Canva's clock could not be verified, and Adobe Express drew no clock at all. Object count accuracy (exactly 2 coffee mugs, exactly 1 plant) was also a consistent failure point across most tools.
No tool achieved a perfect score. The highest total was Kittl at 74/100. Kittl and Canva both scored 100/100 on visual quality, but no tool satisfied all constraints simultaneously, particularly exact clock time, object counts, and precise spatial relationships.
Prompt accuracy determines whether the tool actually does what you tell it to. For creators and marketers who need specific scenes, exact text on screens or whiteboards, or controlled composition, a tool that ignores instructions produces unusable output regardless of how visually appealing it looks.
Krea scored 72/100, just 2 points behind Kittl, and showed the highest semantic alignment score (90/100) of all tools. However, a spatial positioning error (man placed on the wrong side) and 3 coffee mugs instead of 2 were the critical gaps. Krea is a strong close second, but Kittl's overall consistency made it the winner.
The test prompt is a 400+ word brief for a photorealistic office scene requiring exactly 3 people in specific clothing, precise spatial relationships, a whiteboard with exact text, a laptop screen showing specific metrics, exactly 2 coffee mugs, 1 indoor plant, and 1 wall clock showing 10:15. The full prompt is displayed above on this page.
