Evaluating Stable Diffusion 3 (SD3), SDXL, and SD1.5 with Prompt Challenges

Summary

[View all images generated in this test]

In this article we evaluate how well SD3, SDXL, and the base SD1.5 model handle different prompt challenges. Our results show that for most challenges, SD3 outperforms SDXL and SD1.5.

To conduct the test we used a subset of Parti Prompts, prompts designed by the engineers at Google to evaluate image generation models.

We tested over 100 prompts, each targeting a specific challenge. We then rated all of the images and gathered showcases; the results follow.
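As a rough illustration of how per-image ratings could be rolled up into a per-challenge comparison, here is a minimal sketch. The 1-5 rating scale, the `ratings` rows, and the helper names are all assumptions made for this example; the scores are placeholders, not results from this test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (challenge, model, score on an assumed 1-5 scale).
# These numbers are placeholders for illustration, not results from this test.
ratings = [
    ("Basic", "SD3", 5),
    ("Basic", "SDXL", 3),
    ("Basic", "SD1.5", 3),
    ("Quantity", "SD3", 4),
    ("Quantity", "SDXL", 2),
    ("Quantity", "SD1.5", 1),
    ("Quantity", "SD3", 3),
]


def mean_score_by_challenge(rows):
    """Return {challenge: {model: mean score}} from (challenge, model, score) rows."""
    buckets = defaultdict(lambda: defaultdict(list))
    for challenge, model, score in rows:
        buckets[challenge][model].append(score)
    return {
        challenge: {model: mean(scores) for model, scores in per_model.items()}
        for challenge, per_model in buckets.items()
    }


def best_model(per_model_scores):
    """Return the model with the highest mean score for one challenge."""
    return max(per_model_scores, key=per_model_scores.get)


summary = mean_score_by_challenge(ratings)
for challenge, per_model in summary.items():
    print(challenge, per_model, "best:", best_model(per_model))
```

Averaging per challenge rather than over all prompts keeps challenges with many prompts from dominating the comparison.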

List of Challenges:

Basic
Simple Details
Complex
Fine Grained Detail
Imagination
Linguistics
Perspective
Properties and Positioning
Quantity
Style & Format
Writing & Symbols
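One way to organize such a test is to select the prompts for a given challenge and then expand them into one generation job per model and seed. The sketch below assumes a tab-separated prompt file with `Prompt` and `Challenge` columns; the column names, the seed values, and the helper functions are hypothetical, and the actual image generation step is left out.

```python
import csv
import io
from itertools import product

# A few rows in an assumed TSV layout. The prompts appear in this article;
# the column names are an assumption made for this sketch.
SAMPLE_TSV = (
    "Prompt\tChallenge\n"
    "a walnut\tBasic\n"
    "A green heart\tSimple Details\n"
    "ten red apples\tQuantity\n"
    "two red boxes\tQuantity\n"
)

MODELS = ["SD1.5", "SDXL", "SD3"]
SEEDS = [0, 1]  # hypothetical seeds, chosen only for illustration


def prompts_for(tsv_text, challenge):
    """Return the prompts whose Challenge column matches the given challenge."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["Prompt"] for row in reader if row["Challenge"] == challenge]


def build_jobs(prompts, models, seeds):
    """Expand prompts into one job per (prompt, model, seed) combination."""
    return [
        {"prompt": p, "model": m, "seed": s}
        for p, m, s in product(prompts, models, seeds)
    ]


quantity_prompts = prompts_for(SAMPLE_TSV, "Quantity")
jobs = build_jobs(quantity_prompts, MODELS, SEEDS)
print(len(jobs), "jobs for the Quantity challenge")
```

Fixing the seeds across models means each prompt is compared under the same random starting conditions, which makes side-by-side grids easier to interpret.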

Grid: Challenge: Basic

Insights

SD1.5 and SDXL perform adequately, but SD3 clearly surpasses both.

The Basic challenge in Parti Prompts primarily tests how effectively image generation models can interpret and execute simple tasks. These tasks often consist of basic shapes, colors, and compositions. They serve as a litmus test of how well a model understands fundamental visual concepts and can reproduce them accurately.

Grid seeds: 234, 433

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A bowl of Pho	No Image	No Image	No Image
Salvador Dalí	No Image	No Image	No Image
The Starry Night	No Image	No Image	No Image
U.S. 101	No Image	No Image	No Image
a fall landscape	No Image	No Image	No Image
a kitchen	No Image	No Image	No Image
a shiba inu	No Image	No Image	No Image
a walnut	No Image	No Image	No Image
an F1	No Image	No Image	No Image
an espresso machine	No Image	No Image	No Image
bond	No Image	No Image	No Image
parallel lines	No Image	No Image	No Image

Grid: Challenge: Simple Details

Insights

SD3 and SDXL perform great. SD1.5 delivers satisfactory outcomes.

The Simple Details test in Parti Prompts is designed to evaluate how well an image generation model handles tasks that are slightly more intricate than the basics. These tasks typically involve rendering detailed objects, textures, or patterns, or carrying out instructions that have several steps or components. The test gives an idea of the model's ability to handle complexity while maintaining accuracy in image generation.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A bowl of Chicken Pho	No Image	No Image	No Image
A green heart	No Image	No Image	No Image
A living area with a television and a table	No Image	No Image	No Image
A shiny VW van parked on grass.	No Image	No Image	No Image
A van parked on grass	No Image	No Image	No Image
Siberian husky playing the piano.	No Image	No Image	No Image
a baby daikon radish	No Image	No Image	No Image
a farm scene with cows, ducks and a tractor.	No Image	No Image	No Image
a lavender backpack with a triceratops stuffed animal head on top	No Image	No Image	No Image
a team playing baseball	No Image	No Image	No Image

Grid: Challenge: Complex

Insights

Although SD3 leaves room for improvement, it is a significant advance over SDXL and SD1.5, both of which handle complex prompts poorly.

This test involves tasks that are highly intricate or require a high level of detail. The complexity may lie in intricate designs or in multi-part instructions that the model must follow.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A dignified beaver wearing glasses, a vest, and colorful neck tie. He stands next to a tall stack of books in a library.	No Image	No Image	No Image
A photo of a hamburger fighting a hot dog in a boxing ring. The hot dog is tired and up against the ropes.	No Image	No Image	No Image
A photograph of the inside of a subway train. There are frogs sitting on the seats. One of them is reading a newspaper. The window shows the river in the background.	No Image	No Image	No Image
A robot painted as graffiti on a brick wall. The words "Fly an airplane" are written on the wall. A sidewalk is in front of the wall, and grass is growing out of cracks in the concrete.	No Image	No Image	No Image
A set of 2x2 emoji icons with happy, angry, surprised and sobbing faces. The emoji icons look like dogs. All of the dogs are wearing blue turtlenecks.	No Image	No Image	No Image
A solitary figure shrouded in mists peers up from the cobble stone street at the imposing and dark gothic buildings surrounding it. an old-fashioned lamp shines nearby. oil painting.	No Image	No Image	No Image
A wall in a royal castle. There are two paintings on the wall. The one on the left a detailed oil painting of the royal raccoon king. The one on the right a detailed oil painting of the royal raccoon queen. A cute dog looking at the two paintings, holding a sign saying 'plz conserve'	No Image	No Image	No Image
Greek statue of a man comforting a cat. The cat has a big head. The man looks angry.	No Image	No Image	No Image
Horses pulling a carriage on the moon's surface, with the Statue of Liberty and Great Pyramid in the background. The Planet Earth can be seen in the sky.	No Image	No Image	No Image
a photograph of a fiddle next to a basketball on a ping pong table	No Image	No Image	No Image
a tree reflected in the hood of a blue car	No Image	No Image	No Image

Grid: Challenge: Fine Grained Detail

Insights

SD3 exhibits notable improvements and impresses with its capabilities. For fine-grained details, SDXL also serves as an excellent base model.

This test deals with tasks that require the model to generate an image with a high level of detail. This could mean creating very realistic images, replicating detailed textures, or composing images with many elements.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A bare kitchen has wood cabinets and white appliances	No Image	No Image	No Image
A punk rock squirrel in a studded leather jacket shouting into a microphone while standing on a stump	No Image	No Image	No Image
A sunken ship becomes the homeland of fish.	No Image	No Image	No Image
A table full of food. There is a plate of chicken rice, a bowl of bak chor mee, and a bowl of laksa.	No Image	No Image	No Image
A teddy bear wearing a motorcycle helmet and cape is standing in front of Loch Awe with Kilchurn Castle behind him	No Image	No Image	No Image
a baby daikon radish in a tutu walking a dog	No Image	No Image	No Image
a kids' book cover with an illustration of white dog driving a red pickup truck	No Image	No Image	No Image
a photograph of the mona lisa drinking coffee as she has her breakfast. her plate has an omelette and croissant.	No Image	No Image	No Image
a young badger delicately sniffing a yellow rose, richly textured oil painting	No Image	No Image	No Image
fairy cottage with smoke coming up chimney and a squirrel looking from the window	No Image	No Image	No Image
purple lego dollhouse with a pool and a swing	No Image	No Image	No Image

Grid: Challenge: Imagination

Insights

SD3 often fails in certain areas, but it succeeds notably in others. Overall, it handles hypothetical scenarios quite well.

This test involves imaginative tasks in which the image generation model must create unique, nonexistent concepts based on the instruction given.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A giant cobra snake made from corn	No Image	No Image	No Image
A group of farm animals (cows, sheep, and pigs) made out of cheese and ham, on a wooden board. There is a dog in the background eyeing the board hungrily.	No Image	No Image	No Image
A horse sitting on an astronaut's shoulders.	No Image	No Image	No Image
A large city fountain that has milk instead of water. Several cats are leaning into the fountain.	No Image	No Image	No Image
A shiny robot wearing a race car suit and black visor stands proudly in front of an F1 race car. The sun is setting on a cityscape in the background. comic book illustration.	No Image	No Image	No Image
A television made of water that displays an image of a cityscape at night.	No Image	No Image	No Image
A tornado made of sharks crashing into a skyscraper. painting in the style of Hokusai.	No Image	No Image	No Image
The 1970s logo for a london-area football club called "The Rumbury Wanderers"	No Image	No Image	No Image
The collision of two black holes in the center of a galaxy.	No Image	No Image	No Image
a baby daikon radish in a tutu	No Image	No Image	No Image
a dump truck filled with soccer balls scuba diving in a coral reef.	No Image	No Image	No Image
a small kitchen with a white goat in it	No Image	No Image	No Image

Grid: Challenge: Linguistics

Insights

This was a disappointing challenge: SD3 still struggles to handle simple linguistic tasks.

This test involves tasks that require understanding of language. The model would need to recognize words, phrases or even sentences and generate relevant images accordingly.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A bird gives an apple to a squirrel	No Image	No Image	No Image
An aerial view of Ha Long Bay without any boats	No Image	No Image	No Image
The raft floated down the river sank.	No Image	No Image	No Image
a bookshelf without any books on it	No Image	No Image	No Image
a concert without any fans	No Image	No Image	No Image
a kitchen without a refrigerator	No Image	No Image	No Image
a plate that has no bananas on it. there is a glass without orange juice next to it.	No Image	No Image	No Image
a street without vehicles	No Image	No Image	No Image
a summer tree without any leaves	No Image	No Image	No Image
supercalifragilisticexpialidocious	No Image	No Image	No Image

Grid: Challenge: Perspective

Insights

SD3 shows a notable improvement, outperforming both SDXL and SD1.5.

This test involves tasks that require understanding of perspective and depth. The image generation model would need to generate images that show a clear understanding of dimensions, distance and perspective.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A robot with a black visor and the number 42 on its chest. It stands proudly in front of an F1 race car. The sun is setting on a cityscape in the background. wide-angle view. comic book illustration.	No Image	No Image	No Image
A smiling sloth is wearing a leather jacket, a cowboy hat, a kilt and a bowtie. The sloth is holding a quarterstaff and a big book. The sloth is standing on grass a few feet in front of a shiny VW van with flowers painted on it. wide-angle lens from below.	No Image	No Image	No Image
Saturn rises on the horizon.	No Image	No Image	No Image
Three-quarters front view of a blue 1977 Corvette coming around a curve in a mountain road and looking over a green valley on a cloudy day.	No Image	No Image	No Image
Zoomed out view of a giraffe and a zebra in the middle of a field covered with colorful flowers	No Image	No Image	No Image
a close up of a handpalm with leaves growing from it	No Image	No Image	No Image
a close-up of a bloody mary cocktail	No Image	No Image	No Image
a cross-section view of a walnut	No Image	No Image	No Image
long shards of a broken mirror reflecting the eyes of a great horned owl	No Image	No Image	No Image

Grid: Challenge: Properties and Positioning

Insights

All models struggle with this challenge. However, SD3 represents a significant improvement and a step in the right direction.

This test focuses on the model's ability to understand the properties of different objects and their positioning in the generated image.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A photo of a Persian Metal Engraving vase sitting to the left of a bunch of orange flowers.	No Image	No Image	No Image
a green pepper to the left of a red pepper	No Image	No Image	No Image
a stack of three red cubes with a blue sphere on the right and two green cones on the left	No Image	No Image	No Image
a white flag with a red circle next to a solid blue flag	No Image	No Image	No Image

Grid: Challenge: Quantity

Insights

Although not perfect, SD3 demonstrates significant improvement compared to SDXL and SD1.5.

This test involves tasks that check the model's ability to count and to distinguish between different quantities.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
Four dragons surrounding a dinosaur	No Image	No Image	No Image
Times Square with thousands of dogs running around	No Image	No Image	No Image
a bunch of laptops piled on a sofa	No Image	No Image	No Image
ten red apples	No Image	No Image	No Image
the hands of a single person holding a basketball	No Image	No Image	No Image
three airplanes parked in a row at a terminal	No Image	No Image	No Image
three green peppers	No Image	No Image	No Image
two baseballs to the left of three tennis balls	No Image	No Image	No Image
two parallel chemtrails in blue sky	No Image	No Image	No Image
two red boxes	No Image	No Image	No Image

Grid: Challenge: Style & Format

Insights

SD3 and SDXL perform well, although some of the results were disappointing.

The Style & Format test is designed to check the model's ability to understand and replicate various styles and formats in the generated images.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A photo of a dragonfly made of water.	No Image	No Image	No Image
A photo of a lotus flower made of water.	No Image	No Image	No Image
A rusty spaceship blasts off in the foreground. A city with tall skyscrapers is in the distance, with a mountain and ocean in the background. A dark moon is in the sky. realistic high-contrast anime illustration.	No Image	No Image	No Image
Oil painting generated by artificial intelligence	No Image	No Image	No Image
a painting of a white country home with a wrap-around porch	No Image	No Image	No Image
a painting of the food of china	No Image	No Image	No Image
a satellite image of a costal french city there is a large park on the west side and a mountain to the north. There is a cloud covering part of the image	No Image	No Image	No Image
an abstract oil painting in deep red and black with a thick patches of white	No Image	No Image	No Image
close-up portrait of a smiling businesswoman holding a cell phone, oil painting in the style of Rembrandt	No Image	No Image	No Image

Grid: Challenge: Writing & Symbols

Insights

This is one of the most interesting challenges. SD3 shows a significant improvement, although it misses a few characters here and there.

This test involves interpreting written instructions or generating images that contain some form of writing or symbols.

Prompt	Stable Diffusion 1.5	Stable Diffusion 3 Large	Stable Diffusion XL
A glass of red wine tipped over on a couch, with a stain that writes "OOPS" on the couch.	No Image	No Image	No Image
A green sign that says "Very Deep Learning" and is at the edge of the Grand Canyon.	No Image	No Image	No Image
A sign that says Deep Learning	No Image	No Image	No Image
Portrait of a tiger wearing a train conductor's hat and holding a skateboard that has a yin-yang symbol on it. charcoal sketch	No Image	No Image	No Image
Two cups of coffee, one with latte art of a lovely princess. The other has latte art of a frog.	No Image	No Image	No Image
a boat with 'BLUE GROOVE' written on its hull	No Image	No Image	No Image
graffiti spelling BE KIND on white subway tile	No Image	No Image	No Image
the saying "BE EXCELLENT TO EACH OTHER" on a rough wall with a graffiti image of a green alien wearing a tuxedo.	No Image	No Image	No Image