kory bieg on text-to-image generators & the future of AI


Artificial intelligence, architecture and design

 

Artificial intelligence is dramatically transforming the nature of creative processes, with computers already playing key roles in countless fields including architecture, design, and fine arts. Indeed, the advent of text-to-image generating software such as DALL-E 2, Imagen, and Midjourney, shows that AI machines and computers are already being used as brushes and canvases. While some choose to focus on the benefits of artificial intelligence, like the increased speed and efficiency it offers, others worry that it will take eventually replace designers as a whole. In any case, one thing is certain: AI is here to stay, and it will radically impact the way we create — how it will do so remains to be seen.

 

Kory Bieg, Principal at architecture, design, and research office OTA+ and Program Director UT Austin School of Architecture, is one of the several creatives that have experimented with AI generative tools. Mainly using Midjourney, Bieg has been producing architectural designs that have taken over Instagram with their striking visuals. To learn more about text-to-image AI generators, Midjourney, and the future of AI, designboom spoke with Kory Bieg. Read the interview in full below.

interview with kory bieg on text-to-image generators & the future of AI in design
camouflage series

all images courtesy of Kory Bieg | @kory_bieg

 

 

interview with Kory Bieg 

 

designboom (DB): How would you describe Midjourney to someone studying/practicing architecture who has never used it?

 

Kory Bieg (KB): Midjourney is a text-to-image Artificial Intelligence that uses your text (called a prompt) to construct an image of anything you want; buildings, landscapes, people, machines, really, anything. It pulls from a huge database of pictures that are tagged with a bunch of parameters, like object names, style, and even the type of view or whether it’s day, night, or raining — whatever you can think of to describe an image. Once you submit your prompt, Midjourney searches its database to find pictures that match your text, and then combines the relevant parts into a collage-like cloud of pixels that it thinks represents what you are after. You are given four images with the option to run it again, or vary any of the images to produce four new options. You can also upscale any or all of the images, which increases the resolution of the images while also adding detail. You continue to vary and upscale images until you are happy with the results.

interview with kory bieg on text-to-image generators & the future of AI in design
camouflage series

 

DB: Can you describe how you create parameters, visual vocabularies? Do you edit and refine the result afterwards? What does this process look like?

 

KB: AI is amazing at constructing images in almost any style you choose. You can draw portraits in the style of Vincent Van Gogh and design buildings in the style of Zaha Hadid. And that is why I completely avoid the use of style descriptors in all of my prompts! If you are interested in emulating the style of someone else, I have found it to be much more productive and interesting to use words that someone like Van Gogh or Hadid might use to describe their own work. Words like ‘organic’ will start to produce results that might have the same qualitative elements as a project designed by Hadid, while not limiting the database of pictures from which the AI is pulling by using her name as a descriptor. By describing your desired image in more generic terms, you will uncover more interesting tangents that lead to new design territory.

 

As an architect, I am also very interested in form and how to gain some control over it using AI systems like Midjourney. In my Vault Series, I used common typological and architectural descriptors to achieve some of the formal attributes I was hoping for in the output. I also used words associated with natural patterns and structures that might relate in some way to the architectural terms I was using in order to form synthetic hybrids that are not entirely natural or architectural, but reside somewhere in between. In my Text Series, I experimented with using letters to control form. Letters are made up of straights, angles, curves, arches, dots and bends, so by asking Midjourney to create images of buildings in the form of letters, you can expect certain formal conditions to appear in the imagery. Maybe it’s the academic in me, but every time I start a new prompt, I have an agenda and a set of questions I am trying to answer.

 

One drawback of Midjourney is that you can’t edit the prompt once you’ve started a thread. To change something or add additional text, you would have to edit the original prompt and run a new series. However, the images produced by Midjourney don’t require additional post-production, because things like lighting, saturation, and mood are all embedded in the images it creates. That said, you can easily manipulate these standard image files in any photo editing software.

interview with kory bieg on text-to-image generators & the future of AI in design
vault series

 

DB: Speaking of visual vocabularies, did you have a rough vision of what this series would look like before starting? Were you surprised by the results?

 

KB: Midjourney is a creative AI, so the results will never be exactly what you expect. This is partly the result of this type of AI, which understands the context objects are typically found in, what parameters are often associated with certain object types, or even what other objects an object in your prompt is usually found with. So if you ask Midjourney to make an image of a building, it will likely add doors, windows, a street in the foreground, and a sky, without you having to include that detail in the prompt. Regardless of what you include in your text, Midjourney will always be filling in some blanks and adding detail that you might not have intended. This is also where it can be very exciting. By combining elements in the image output that you might not have considered with whatever you feed the AI, you often find yourself on tangents that are incredibly rich and often outside your comfort zone (usually a good thing). That said, if you want to be very specific about the building’s context (like placing it in a forest, rather than a city), you can include that information in the prompt. If you want it to be a cloudy sky, then specify that. If you want the doors to be made of spider webs, it can do that, you just have to type it in.

 

When I think about what parameters to include in my prompts, I do consider what objects Midjourney might add and whether I want to be more specific about them in my text. At the same time, I think about how these standard elements might be replaced with nonstandard alternatives to create new overlaps between otherwise unrelated elements. For example, in one series I described the roof of a building as a sponge. The output was exactly that; a building with a sponge for the roof. What Midjourney does so well, is that it doesn’t just erase the roof and plop a picture of a sponge in its place. Rather, it maps the structure of a sponge into the form of the roof, even molding it around things like windows and roof vents, so the two synthetically integrate with the overall form of the building.

interview with kory bieg on text-to-image generators & the future of AI in design
vault series

 

DB: How does midjourney differ from DALL-E, which generates an image based on a short text input?

 

KB: DALL-E and Midjourney (and there are others — Disco Diffusion, Imagen, Stability AI) all use the same diffusion-based approach to generating images, but they also have very different features, and the output can vary dramatically. DALL-E, for example, has a great feature that allows you to erase part of an image and replace it with a new prompt. So, if you have a chair in your image and you want a clown sitting on it, you can erase a portion of the image, add text describing the clown, and DALL-E adds in a clown. DALL-E also lets you upload an image (Midjourney can do this too, but not as well), and with the click of a button, it will return three image variations. As a test, I uploaded a photograph of a real, built house I designed, and DALL-E gave me three very convincing variations with different roof forms and with windows and doors in new locations. Each image looked just like the original photograph. On the other hand, Midjourney gives you more control over the image-making process. It allows you to define the size and proportion of the image, how much leeway you want to give the AI (you can increase or decrease how much ‘style’ it gives the image, and you can control the amount of ‘chaos,’ or how far the AI deviates from your text). Midjourney is great as a tool for sketching and discovering entirely new design territory, while DALL-E is great for varying and editing an existing image. DALL-E, of course, also generates entirely new images from your text, but it doesn’t seem to be as geared toward speculative buildings and architecture, yet.

interview with kory bieg on text-to-image generators & the future of AI in design
vault series

 

DB: Are there any similarities to parametric/algorithmic modeling tools like a Grasshopper plug-in?

 

KB: There is no doubt that these text-to-image AI’s will become an integral part of the design process, but they are an entirely new category of tool and don’t replace any of the tools we currently use as designers. I think the only overlap between Midjourney and parametric modeling tools is if a Midjourney user wants to create an image that has parametric qualities. You can include words like modular, repetitive, and parametric and they will affect the output to look somewhat like a building designed using a plug-in like Grasshopper. One interesting side note is that Midjourney does recognize images in its database that were created by various software programs, like VRay or Octane Render, and it will attempt to use the style of images made with them for the images it returns.

interview with kory bieg on text-to-image generators & the future of AI in design
vault series

 

DB: Is the process more useful as a ‘working partner’ or a tool for creating rough and quick iterations?

 

KB: I often start a prompt with only a few words, just to see the effect one or two words might have on the output. I then add a few more words and run the prompt again. I do that until I am happy with the way certain words are mixing and how they are influencing the output. Once I settle on the text and other parameters of the image, like aspect ratio and the type of view (all words to include in the prompt), I vary and upscale over and over again. I probably do more variations than most people, but in my experience, the most compelling results are often discovered when following one thread for a long time. Moreover, once you get to a certain point in the lineage, it seems to start producing gem after gem, and it is well worth the time it takes to get there.

 

So yes, Midjounrey is great for creating very quick iterations and testing ideas out before spending too much time on any one thread, but the images I post are usually days, or even weeks in the making. Of course, when compared to the sort of design process we are all used to, a few days is no time at all.





Source link