
Midjourney Changed My Mind on AI-Generated Art


Experimenting with Midjourney to create concept art

June 24, 2022 Writing Tools AI Art Midjourney

img

When the initial news broke about DALL-E and there was some uproar on Twitter about The End of Illustrators, I was ambivalent. I work in games and have worked directly with illustrators for a long time, and I generally know what computers are capable of. Yeah, a computer could generate an image, but being able to talk to someone who made that image, and have them bring their own creativity and expertise to bear to create the right image - that’s really hard to replace. Feedback is really important. Not only that, but illustrators excel at consistency. Being able to generate the “correct” image that fits in as part of a pre-existing sequence of other images is a big deal, and a problem computers still haven’t totally cracked.

TBH I wouldn’t be surprised if Google starts generating images in search

A month or so passed, that thread on DALL-E 2 and Kermit went viral, and I was still mostly ambivalent. My main thinking was that this is just a “better” version of Google image search. The internet is already so full of images that the idea that an AI could generate “a brown bear on a bicycle” wasn’t exactly exciting. I could probably just google the same thing and find a passable version.

image-20220624163012639

More complicated prompts that DALL-E could handle just felt like… whatever? I don’t care if I can get an accurate picture of “An astronaut riding a horse in a photorealistic style”. I don’t really ever want that. It’s clip art, or just a more energy-intensive way to find an image I’m sure already exists. It’s just DALL-E trying to show off. I also think humans are bad at producing the thing DALL-E “wants” - i.e. how am I supposed to generate a “new” image if all I have at my disposal is my “old” words?


My general ambivalence towards the whole endeavor recently shifted. I was talking to another game developer about their project and their pitching process, and they mentioned how they were trying to use these new tools to generate art for their pitch deck. My first understanding was that they were using it to generate “final” art, but they clarified that they were using it to create concept shots.

A light clicked on in my head when I heard this. Of course. Instead of using DALL-E or any other tool to generate the final art, you’re using it to generate the half-step to that final piece. Usually, this process is characterized by handing an illustrator a giant moodboard of images, saying “something like a mixture of these 500 things?”, and then working with them to come up with even just an approximation of your thought.

Actual moodboard I’ve handed someone. Cantata’s was way bigger.

image-20220624164045174

What if instead we could skip that first step and generate actual style frames? A company I used to work for would pay actual thousands of dollars for people to generate style frames for projects. These were often very exact and very good, but they were never used as-is in final projects. They were a guide to drive the actual style of the in-game art.

If we can generate that guideline, we can skip a lot of the initial messy process (and $$$ cost) and get to the good stuff quicker, with images that are more in line with what we want. To be clear, this will still result in moodboards, but instead of starting with 500 loosely connected references of other things with lots of noise, you can show a handful of options that (if the AI is “good”) are dead-on for what you’re looking for.

So after our conversation, I immediately signed up for DALL-E 2 and Midjourney. A few weeks after that, I got access to Midjourney. And damn, things are never going to be the same. Midjourney (or some version of it) is going to be part of my creative process from here on out.

Creative Collaboration

I don’t really know how DALL-E’s interface works, but below is a screenshot of what it looks like to generate an image with Midjourney (for now). They are brokering the whole beta program through Discord, and you use bot commands to dispatch prompts to their servers. The bot auto-updates your prompt message with higher-res thumbnails until the render completes. A prompt will give you four sample outputs.
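For reference, dispatching a prompt is just a slash command to the Midjourney bot in a Discord channel. My exact wording varied (and I’m keeping my real project prompt to myself), but the shape of it is something like:

/imagine prompt: a game developer writing a blog post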

image-20220624165647023

So this, at a baseline, is what you probably “expect” from AI-generated art. The lo-fi DALL-E memes going around the past few weeks show something similar, with a 3x3 grid showing possible options for that prompt:

image-20220624165847992

The fidelity of Midjourney here is definitely “better” than bad DALL-E, but it’s still, you know, bad. It looks “generated”, even from the thumbnail.

But here’s where it gets interesting. Those buttons below the image in the Midjourney post? They stand for [U]pscale and [V]ariant. The little Redo button will rerun the whole prompt and generate new images. My experience with these buttons for the past few days has felt like the future.

See, when I thought about “AI-Generated Art”, I assumed you got a direct mapping from your prompt to a piece of art. Learn how to craft the perfect prompt, and you can craft the perfect piece of art. However, after my time with Midjourney, I know that isn’t the case. Or at least not the full story. With those buttons, you reorient: you aren’t trying to nail the exact perfect prompt the first time, but are instead able to refine the interpretation of your prompt towards a desired end goal.

So looking at the output above, I kind of like the despondent image in 3. I like that it looks like there is a laptop there, which makes sense for “a game developer”. I also like the composition and the light on the wall. Let’s see what I get when I click V3.

img

Okay, so the computer has decided the light is maybe actually a whiteboard. That’s interesting. I kind of like what’s going on in 2 and 3, so I’ll generate variants of both of those.

2 Variant

img

3 Variant

img

I’ll generate the final image for the end of this post by continuing to step through options here (it’s also the post’s thumbnail), but you can see that even in just a few minutes I’m already in a creative flow state, trying to home in on my own loosely formed idea.

And the thing about this feedback cycle that is so special is that it feels like an actual communication with the computer. You, the human, understand certain visual cues that you want the computer to dial in on. The computer has formed some gestalt of what it thinks the desired goal of your prompts and previous choices is. By clicking the variant button and targeting a specific image, you are basically saying “more like this, less like that”. It feels like you’re moving through a dichotomous key chart towards your imagined end state, even though you aren’t really sure what you’re looking for to get there.

In case you easily get your bears and frogs messed up

Free Editable Dichotomous Key Examples | EdrawMax Online

It’s pretty incredible. The computer truly feels like a creative collaborator here in a way I’ve never experienced. The closest parallel I can think of is something like modular synthesizers, where you’re working with this machine, and it’s feeding back on itself, in order to try and make something new.

Generative Creativity

Beyond this, the computer started to pick up on and elevate traits in images that I wasn’t even necessarily looking for, but that I was subconsciously finding desirable. The main prompt I was working with last night had the word “mysterious” in it. After the first few iterations, I forgot that the word “mysterious” was in the prompt. However, with every new generation, the computer was still keeping that in its gestalt, and was trying different ways of manifesting it. What started to happen (in a way I can’t exactly show, because I want to keep the identity of the thing I’m working with a secret) was that other stuff started manifesting in my generated images, things I think could easily be related to the word “mysterious”. Not only this, but they were manifesting in integrated ways, like 50 iterations deep into a prompt, in a way that felt consistent with the rest of the idea of the image itself. Frankly, I was astounded. Not only had Midjourney helped me find the image I was looking for, it went above and beyond the initial prompt to give me something even greater. It felt like an actual collaborator.

This feels like a paradigm shift in the field, enabling non-art people like me to help come up with art that is closer to what I’m looking for and can easily be used for pitching. There will also definitely be people that use AI art as-is, but at least for now I’m not sure that’s a big issue. I imagine that set of people weren’t necessarily looking to hire illustrators or concept artists in the first place.

If I were ever to share these images publicly as part of a funded project, I would 100% have an actual artist touch them up (if not redo them altogether) in their own style. The edges and details still fall off, but the suggestion of the idea is strong enough for certain use cases, and I suspect we’ll all start using stuff like this way more often.

So finally, a game developer writing a blog post:

img

A final note. In a Photogrammetry Report I saw a while ago, tech artists mentioned how, all things considered, photogrammetry didn’t represent a massive decrease in asset creation time (assuming you’re capturing novel assets and not re-using stuff). You still had to import all the source photos, align the images, clean up the point clouds, retopo, etc. This does feel a bit similar. I’ve seen artists speed-paint some truly amazing concept work in less than 10 minutes. I was probably working on the prompt last night for about 20-30 minutes until I got somewhere I thought was “good”. The Rockwell piece above probably took about 5-10 minutes of moving through lots of variations.

Great example of what “prompt interfaces” could look like, from @graycrawford

Image

That said, I think this will get faster, ideally at the speed of thought. Being able to mark up variations to point out desired characteristics or places of emphasis would help me not have to cycle through tons of variations to get non-ghastly eyes. Being able to describe stuff “as you see it” would help guide further variations, etc. Right now it feels like lots of raw material, and we’re only starting to develop the tools.






