Midjourney CEO David Holz: AI should be an extension of ourselves.

On July 7, David Holz, CEO of Midjourney, spoke at the 2023 World Artificial Intelligence Conference, saying that AI will become a new vehicle and engine for creativity and imagination: through AI, we have the potential to amplify the raw imagination of the entire human race. On the company’s name, Holz explained that it comes from the idea of the middle way in the Taoist classic Zhuangzi, and that he believes classical Chinese literature contains some of the most beautiful and profound ideas.

Midjourney is currently developing version 5.3, and version 6 will add the ability to zoom and pan across generated images, automatically producing new images from different angles. The randomness of generation can also be controlled, helping the author strike a balance between eerie beauty and confusing imagery. In the future, Midjourney’s goal is to generate images that are three-dimensional, real-time, and dynamically adjustable.

He is uncertain which direction the technology will take, but model fusion (combining image and text models) strikes him as a likely path. He believes the potential of this advance in AI has not yet been fully realized, and that progress ten or even a hundred times beyond today’s is inevitable.

He believes that most technological advances so far have aimed at making people better and amplifying their abilities. AGI may therefore not be necessary; AI as an extension of humans, empowering humans, is the better choice.

The following is a transcript of his speech:

Hello, everyone. I am David Holz, CEO and founder of Midjourney. I am honored to be invited by the Shanghai municipal government to participate in the World Artificial Intelligence Conference, and I look forward to joining today’s event.

One of the most critical technologies in the world is the engine. An engine is a machine that generates, transfers, or amplifies power. We use engines throughout our factories and to build all kinds of transportation, such as cars, planes, and ships. Now it is time to see artificial intelligence as a new kind of engine.

At Midjourney, we are trying to use this engine to create a new kind of vehicle, not a means of transportation, but a vehicle that carries our thoughts and imagination.

Just as a vehicle can carry you around the world, you still need your legs to play football. We hope to create a new kind of vehicle, one you can use for imagination, not just for movement. Before we create anything, we must first imagine what we can become, where we can go, and what is possible. More than anything else, the tools we make are focused on amplifying the primal power of imagination. We have the opportunity to amplify not just any one individual, but the imagination of the entire human race.

I visited China many times with Leap Motion (a maker of gesture-recognition devices), and Leap Motion’s first office was in Shanghai. Shanghai has a special feeling that I really like. It seems to combine San Francisco, Los Angeles, New York, and some old European cities: it has the weight of ancient history and culture, together with an unpolished feeling of the future. That is really cool, and those are my two favorite things.

In fact, I’m basically a sci-fi fan, and the wildest settings I have seen come from classical Chinese literature. I think ancient Chinese literature holds some of the most beautiful and profound thoughts in human history. The name “Midjourney” actually comes from a translation of my favorite ancient Taoist text, the Zhuangzi. I love its stories, such as “Zhuangzi Dreams of the Butterfly,” “You Are Not a Fish” (Zi Fei Yu), “Cook Ding Carves the Ox” (Pao Ding Jie Niu), “The Useless Tree,” and “The Empty Boat.” The reason I like the name Midjourney is that people sometimes forget the past and may feel lost and uncertain about the future. But I feel instead that we are in the middle of a journey: we come from a rich and beautiful past, and ahead lies a wilderness and an incredible future.

We recently released Midjourney version 5.2, and we are now developing version 5.3. After that, I hope to release a major update, which I would like to call version 6. The latest feature we introduced is zooming out of an image: as you zoom out, you can create different stories and environments and vary them around the central subject. This week we will release a similar feature that lets you pan the camera, changing the prompt as the camera moves horizontally to tell a story. We also released a quirky new control that combines with these features to give better control over image generation.

You can also combine it with style control. “Style control” can be a bit confusing, but the idea is that you tell the AI how much beauty you want it to produce, and how much risk you want it to take to create that beauty. Even when the results are unconventional, chaotic, and quirky, sometimes they are truly outstanding.

Sometimes you need to take risks. These controls let people balance risk against beauty, or decide how much weight to give an image’s ordinary, common beauty. We also introduced something we call “turbo mode.” Turbo mode uses as many GPUs as possible to generate the image very quickly, speeding up generation by four to five times. It feels as if you were generating images with 64 or more than 100 GPUs; to have that much computing power yourself, your computer would probably have to cost around $500,000. That sounds a bit crazy, and we are still developing crazier technology. Although most of it is still brewing, we believe Midjourney will grow beyond creating two-dimensional images into three-dimensional and dynamic images, where you can even interact with the pixels themselves. In the future, perhaps you will be able to flow backward in real time and reshape what you have drawn.
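For readers who want to try the controls described above, here is an illustrative prompt using Midjourney’s published parameters from the 5.2 era (--stylize for style control, --weird and --chaos for riskier, more unusual results, --turbo for the fast GPU pool); the prompt text itself is only an example:

```
/imagine prompt: an empty boat drifting on a misty river, ink-wash painting --v 5.2 --stylize 600 --weird 500 --chaos 20 --turbo
```

Higher --stylize values (0–1000) lean harder on Midjourney’s trained aesthetic, while --weird (0–3000) and --chaos (0–100) trade predictability for the kind of eerie, unconventional results Holz describes.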

In the future, people may need just one large AI processor that can dream up all kinds of different worlds, dreams that interact with our thinking. We would, in a sense, be dreaming together with the AI, and that would be really cool. The successive discoveries of the diffusion model, the Transformer model, and the CLIP model are what actually allowed AI to enter the image space. About two years ago, when no image-AI service was yet available, all of our researchers were talking in San Francisco, and I remember saying that these models, especially the diffusion model, would definitely bring something completely different from generative adversarial networks, the basic technology people had used for image generation before.

I just remember everyone immediately nodding in an unusual way, saying that the diffusion model really was different. The atmosphere at the time was very serious, and I had a strong feeling that I had to get involved and bring a more humane user interface to this technology.

But as for the future, it is hard to know how the technology will develop. Sometimes we talk about turning language models into diffusion models, that is, using diffusion models to generate text; or image models may become more like language models, an approach whose technical name is the autoregressive Transformer; or AI may evolve toward hybrid models. It is really hard to say. I think we are just at the beginning of this transformation, but I am 100 percent sure there is much more progress to be made. Progress ten times, or even a hundred times, what we have now is likely inevitable.
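To make the two model families concrete, here is a minimal, hypothetical sketch (not Midjourney’s code; predict_noise and next_token_logits are stand-ins for trained networks) of the core loop each family repeats: a diffusion model refines a whole sample a little at a time, while an autoregressive Transformer extends a sequence one token at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Stand-in for a trained denoising network (e.g., a U-Net).
    return 0.1 * x

def next_token_logits(tokens):
    # Stand-in for a trained Transformer; scores over a toy 16-token vocabulary.
    return rng.normal(size=16)

# Diffusion: start from pure noise and repeatedly remove predicted noise.
x = rng.normal(size=(8, 8))          # a tiny 8x8 "image"
for t in reversed(range(50)):
    x = x - predict_noise(x, t)      # simplified reverse step, no noise schedule

# Autoregression: grow a sequence one token at a time.
tokens = [0]                         # start token
for _ in range(10):
    tokens.append(int(np.argmax(next_token_logits(tokens))))  # greedy decoding
```

A hybrid or fused model, in the sense Holz speculates about here, would be a single network able to play both roles, for example denoising images while also generating text tokens.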

This progress will be reflected not only in performance but also in the user interfaces and products that let us make better use of these technologies, so that individuals and groups can create really cool things and solve problems better. Douglas Engelbart created one of the first text editors. Initially, people programmed computers by punching holes in cards. But Douglas began to think: what if we used computers to program computers? It sounded crazy at the time. His idea was that programming computers on computers would speed up the cycle, make us better, make computers stronger, and amplify everything. That idea eventually came true. Despite our different cultures, such as AI, human-machine interfaces, and intelligent applications, I think most of the technological advances so far have come from trying to make people better, trying to amplify human capabilities.

We haven’t really seen the AI age arrive yet, the age in which independent AIs solve problems on their own. But if we focus too much on developing in that direction, we may miss many of the opportunities in today’s technology. I think not only about what AI can do, but also about how to create fluidity and entanglement between different things, because a tool shouldn’t feel like a person; it should feel like an extension of yourself, your body, your mind. I am thinking about how to build these technologies so that people and AIs are woven together, so it feels less like you are collaborating with an artist and more like you are imagining something and it appears on the screen. Many people describe Midjourney as feeling like a part of their own thinking. I think that is what most AI should be like: an extension of ourselves.

So I want to thank Mr. Chen and the entire audience once again. WAIC is really cool, and I hope I can take part in person in the future. I look forward to more cooperation with China; I remember all the wonderful firsthand experiences I had there, and I hope everyone there can enjoy the fun of this interaction as well. Thank you.