Black Forest Labs released a new version of Flux, 1.1 Pro, last week, claiming it outperforms all other AI image generators on most benchmarks. In my early tests its realism was impressive, it was fast, and its prompt adherence was exceptional.
However, Midjourney has long held the 'most realistic AI image model' crown, not just in my tests but more widely. The release of v6.1 in July further improved skin texture on people and atmospheric rendering.
To find out whether Midjourney had finally been toppled, I devised a series of tests for both Midjourney 6.1 and Flux 1.1 Pro, with Flux running through Freepik (one of our AI image platforms of the year).
Here's how our Midjourney vs Flux comparison turned out.
Creating the prompts
I created 7 conceptual ideas to put Flux 1.1 Pro and Midjourney 6.1 through their paces, then used best-practice tips for each model to craft customized prompts. Because the two models respond to slightly different prompting techniques, this gives each of them the best chance to shine.
For example, when prompting Flux you want to focus on detailed elements such as the subject, style, composition, lighting and color palette. You should also include technical details such as camera settings and atmosphere to improve realism.
For Midjourney, you want to use descriptive keywords, adjectives and examples. Even more so than with Flux, you should focus on camera types, settings and environmental descriptions to improve realism, while avoiding terms like "photorealistic" or "hyperrealistic," as Midjourney responds better to detailed visual descriptions.
For the test, I largely left the settings at their defaults, although I switched Midjourney to RAW style and turned off personalization. This was to better match Freepik's implementation of Flux 1.1 Pro.
Scoring the tests
I judged the two models on five criteria: detail and texture, lighting and shadows, composition and depth, color palette, and overall realism. You can read the full rubric, including the breakdown of points for each of the five criteria, in a Google Doc.
The first four criteria are technical and carry the most points. They are judged by ChatGPT using its impressive vision capabilities, although I have the final say. The fifth, overall realism, is subjective and is judged by me without AI involvement.
On default settings, Freepik creates just one image with Flux 1.1 Pro, so to keep things fair I always chose the first of the four images Midjourney produces.
1. A mountain scene
First up, we're going to see how well each model handles a landscape: specifically, a mountain at sunrise. The prompt specifies the camera angle and asks for high dynamic range.
Flux 1.1 Pro Prompt: A sweeping mountain landscape at sunrise, captured from a high-angle perspective using a wide-angle lens. The early morning light casts long shadows across the rugged terrain, with mist rolling over the valleys. The scene features sharp detail in the rocks, lush greenery, and clouds forming over distant peaks. Warm oranges and pinks dominate the sky, creating a dramatic and serene atmosphere. High dynamic range (HDR) captures the subtle transitions between light and shadow.
Midjourney 6.1 Prompt: Shot using a wide-angle lens, a dramatic mountain range at sunrise, the golden light spilling over the ridges, casting long, defined shadows. The camera is positioned high up, creating depth with layers of mist weaving through the valleys. The texture of the jagged rocks contrasts with the softness of the clouds and greenery below. Soft orange and pink hues fill the sky, bringing warmth to the scene. Clear focus on the details of the terrain with subtle mist movement.
ChatGPT analysis: "The Midjourney image had a more natural composition and believable depth, but both models struggled with textures that felt too smooth, leading to a slight 'game-like' appearance. Midjourney scored higher due to better lighting and depth management."
I largely agreed with ChatGPT here. Both were compelling images, and in some respects Flux had more realistic elements, but overall the Midjourney image 'felt' more real.
2. A city in a blur
For the second test, we're moving from the mountains to the city for a more complex scene, where the AI has to depict cars and pedestrians, different light sources and even water.
Flux 1.1 Pro Prompt: A busy city street at dusk, shot with a medium focal length lens to capture the motion of cars and pedestrians. Neon signs glow brightly, reflecting off wet pavement from a recent rain. Motion blur highlights the speed of passing vehicles, while people walking on the sidewalk are crisp and detailed. The lighting creates a mix of cool blues and vibrant reds, giving the scene a fast-paced, energetic vibe. Street-level composition emphasizes the urban density, with buildings towering in the background.
Midjourney 6.1 Prompt: Captured with a 50mm lens at street level, a bustling city street at dusk, where motion blur from cars zipping by contrasts with sharply defined pedestrians walking on the wet pavement. Neon lights from signs and storefronts reflect vivid colours on the ground, creating a dynamic mix of cool blues and bright reds. The composition highlights the movement, with buildings stretching upward, showing the depth and energy of the city.
ChatGPT analysis: "Despite both images having an artificial quality, Midjourney’s lighting and reflections felt more grounded, whereas Flux’s version felt overly digital and lacked realism in texture and blur effects."
I think the AI may have been a little harsh on Flux, but what really sold Midjourney was how it handled the parts of the image that weren't blurred, producing a more realistic depiction of the scene.
3. Human emotion
With this test we’re going full human. Here each of the models needs to capture emotion, specifically a middle-aged man in deep sorrow, shot with an 85mm lens.
Flux 1.1 Pro Prompt: A close-up portrait of a middle-aged man, illuminated by soft, natural window light. His face is worn with deep lines, and his eyes are filled with emotion, reflecting both sorrow and contemplation. Shot with a 85mm lens to focus sharply on the subject’s facial features while softly blurring the background. The lighting is gentle, creating subtle highlights and shadows that emphasize the texture of his skin. The muted colour palette of warm browns and greys enhances the intimate and reflective mood.
Midjourney 6.1 Prompt: Shot with an 85mm lens, a close-up portrait of an older man, his weathered face illuminated by soft, diffused natural light from a nearby window. His eyes carry deep emotion, expressing a mix of sorrow and reflection. The camera focuses tightly on his facial features, with the background blurred into soft, warm browns and greys. The lighting enhances the details of his skin texture, creating a quiet, intimate atmosphere.
ChatGPT analysis: "Midjourney’s detail on skin texture and facial expressions felt more lifelike compared to Flux, which had a slightly artificial sheen to the subject. Midjourney had better lighting and emotional depth, making it the clear winner."
I have seen both Midjourney and Flux create significantly better and more realistic depictions of humans than we see here. Midjourney is better, and by some margin, but I suspect that if the subject had been a younger person, Flux would have won.
4. A look to the future
Every AI model loves a good futuristic scene, and most lean heavily on neon lights. Here I wanted something more subtle: the image has to capture a high-tech concept and holograms.
Flux 1.1 Pro Prompt: A sleek, futuristic laboratory, filled with advanced holographic displays and robotic arms. The scene is captured using a 35mm lens with a shallow depth of field, focusing on a floating, transparent hologram in the centre. Cool blue and white lighting floods the room, reflecting off shiny metallic surfaces, while soft bokeh effects blur the background. The composition draws attention to the cutting-edge technology, with smooth reflections and sharp details in the illuminated interfaces.
Midjourney 6.1 Prompt: Shot with a 35mm lens at a low angle, a futuristic laboratory brimming with advanced holographic displays and robotic systems. The focus is on a floating, translucent hologram at the centre of the frame, with the background softly blurred. Cool blue and white lights reflect off sleek metallic surfaces, casting a futuristic glow across the scene. Sharp details in the holograms and the tech interfaces create a high-tech, cutting-edge feel.
ChatGPT analysis: "While both images were highly stylized, Midjourney achieved a more balanced composition and natural lighting, whereas Flux appeared flat and artificial in comparison."
This was the most subjective of the tests, as both images look good. For me, though, the Midjourney image looks like it 'may' have been taken by some future camera capable of perfectly capturing a hologram in a still photograph, so it won on realism.
5. The fashion show
I will admit that associating realism with the fashion industry might be a stretch, but with this prompt that is exactly what we're doing. Set in a studio under intense light, the AI has to capture the shadows, the reflections and, of course, the beautiful gown.
Flux 1.1 Pro Prompt: A high-fashion runway show in a luxurious studio, shot with a 70-200mm telephoto lens to compress the scene and focus on the elegant model walking down the runway. Studio lights with high contrast illuminate the shimmering fabric of her gown, highlighting its intricate details and texture. The camera captures her confident stride with a soft focus on the audience in the background. The lighting is crisp, creating dramatic shadows and reflections off the polished runway floor.
Midjourney 6.1 Prompt: Captured with a 70-200mm telephoto lens, a luxury fashion runway, where a model in an intricate gown struts confidently under bright studio lights. The fabric catches the light, showcasing detailed patterns and textures. The composition focuses on the model, with a soft blur on the background audience. Sharp lighting contrasts emphasise the richness of the scene, with reflections bouncing off the polished runway.
ChatGPT analysis: "Both images exhibited a plastic-like shine, but Midjourney’s lighting and details were more polished. However, both lacked the necessary realism to feel like actual photographs."
The results here demonstrate why it's worth taking a second look at an image. I liked both and was ready to give this one to Flux for having the model facing forward, but on closer inspection her face is distorted and her skin looks like plastic.
6. An alien landscape
What better way to use the creative powers of artificial intelligence than to depict something that doesn't exist, couldn't exist, or is unlikely ever to be seen for real? Let's go for a deep purple planet with multiple moons, shot on a 16mm lens.
Flux 1.1 Pro Prompt: A vast, alien desert under a deep purple sky, lit by three moons. The landscape features jagged, towering rock formations and a shimmering crystal lake in the distance. Shot with a 16mm wide-angle lens to capture the expansive, surreal environment. The lighting casts long, eerie shadows across the landscape, with glowing bioluminescent plants dotting the ground. A vivid colour palette of purples, greens, and blues gives the scene a dreamlike, otherworldly atmosphere.
Midjourney 6.1 Prompt: Shot with a wide-angle 16mm lens, a vast alien landscape under a purple, star-filled sky with three moons. Jagged rock formations rise sharply, while a shimmering crystal lake glows in the distance. The lighting casts eerie shadows across the land, highlighting bioluminescent plants scattered throughout. The rich colours of purple, blue, and green give the scene an ethereal, dreamlike quality, capturing the scale and mystery of the environment.
ChatGPT analysis: "Midjourney created a more immersive scene with better depth and lighting, though it still retained a digital quality. Flux's rendition was far less convincing, lacking the subtle interactions of light with the environment."
I'm not sure I even need to comment on this one, as the difference is obvious. I actually think ChatGPT was fairer to Flux than it deserved. Its image doesn't just look like AI; it looks like early AI from before realism was possible.
7. The moving vehicle
Finally, a test that challenges real-world photographers: fast-moving cars. For this prompt, we're asking each model to use a 50mm lens to capture a vehicle speeding along a mountain road, blurring the background while keeping the car in focus.
Flux 1.1 Pro Prompt: A sleek, red sports car speeding along a winding mountain road at sunset, shot using a panning technique with a 50mm lens to create a motion blur in the background while keeping the car sharply in focus. The sun is low on the horizon, casting golden light across the car’s reflective surface, while the rocky cliffs and trees blur in dynamic streaks of motion. The vivid colours and precise detailing on the car’s body create a sense of speed and power.
Midjourney 6.1 Prompt: Captured with a 50mm lens using a panning technique, a bright red sports car races down a winding mountain road at sunset. The motion blur of the background contrasts with the sharp, focused details of the car, as golden light reflects off its sleek body. The rocky cliffs and blurred trees emphasise the sense of speed and movement, while the car itself remains crisp and powerful against the dynamic, vibrant backdrop.
ChatGPT analysis: "Despite both images feeling artificial due to their 'too perfect' composition, Midjourney's scene was more believable, while Flux's image looked like a generation behind in terms of rendering quality."
Midjourney and Flux both created images that looked like they were from a game rather than a real photograph. The textures were too perfect. The difference is that the Flux image looks like it's from a PlayStation 3 game, and the Midjourney image like it's from a PS5 game.
Winner: Midjourney
I kept one section of the rubric judged entirely by me because image appreciation, particularly around aesthetics, is largely personal. It isn't easily quantified and comes down to a feeling, which AI can't offer.
I gave ChatGPT my judgment after having it analyze each image against the first four sections of the rubric. I then reviewed its 'thinking' and neither disagreed with its findings nor noticed any glaring errors in its analysis.
Midjourney 6.1, with RAW style, outperforms the base Flux 1.1 Pro model with no LoRAs or additional customization. Flux 1.1 Pro is still relatively new, so it's possible that prompting it differs more from earlier versions of Flux than I thought, but the gap is large enough that I'm happy to give the win to Midjourney.
The first thing that stood out to me was just how fast Flux 1.1 Pro is. Each image took less than a second to produce, whereas Midjourney's took several seconds, although Midjourney images are higher resolution by default.
The second thing I noticed was how different the two models are in their output. I've had similar results in previous tests, even when using different prompts to get the same result. As the models improve, they seem to be diverging further.
This wasn't a particularly close race between the two models. I suspect Ideogram and the latest Stable Diffusion models would have given similar results to Midjourney on some of the prompts, and that I need to work on my Flux prompting.
What this has taught me, though, is the value of tailoring prompts to each model to play to its strengths. It also suggests that as the AI image sector evolves, people might prefer to stick with a product they know, much as they do with other software.