
David Holz says AI image generation is like water, and will become a driving force for human civilization

  • joy
  • 2022-08-04 14:54:22
  • 341 reads

Key points

  • 1. There are only dozens of top image-generation AIs in the world; they are expensive to develop and require large amounts of data for training.

  • 2. A small company named Midjourney has developed an AI image generator of the same name, and hopes to explore AI image generation in depth over the next ten years.

  • 3. Midjourney sees AI image generation as an "engine of imagination" that creates images through an understanding of language.

  • 4. The founder of Midjourney believes AI image generation is more like water: sometimes dangerous, but still a driving force for the development of human civilization.

Artificial intelligence (AI)-generated art is quietly starting to reshape culture. The ability of machine learning (ML) systems to generate images from text prompts has improved dramatically in quality, accuracy, and expressiveness over the past few years. Now these tools are moving from research labs into the hands of everyday users. While they create a new language of visual expression, they may also bring new trouble.

Currently, there are only dozens of top image-generation AIs in the world. They are tricky and expensive to develop, requiring access to millions of images to train the system (which finds patterns in pictures and learns to replicate them) and a great deal of computation (costs vary, but can run into millions of dollars).

Now, when these system-generated images appear on magazine covers or are used to create memes, they are mostly seen as novelties. But artists and designers are integrating this type of software into their workflows, and in a very short time, AI-generated and AI-enhanced art will become ubiquitous. Questions about copyright (who owns the image and who created it) and potential dangers (biased output or AI-generated misinformation) must be dealt with quickly.

As the technology goes mainstream, though, a 10-person research lab called Midjourney can take some of the credit. The lab developed the eponymous AI image generator, which runs through the Discord chat service. The name may not be familiar to many, but you have probably seen the Midjourney system's output on social media. All you have to do is join Midjourney's Discord, enter a prompt, and the system will generate a picture for you.

David Holz, founder of Midjourney, said in an interview: "A lot of people ask us why we don't make an iOS app that generates images. But people want to co-create content, and to do that on iOS you'd have to build your own social network. That's hard. If you want a social experience, Discord is really great."

Sign up for a free account and you get 25 credits; all pictures are generated in public chat rooms. After that, you pay $10 or $30 a month, depending on how many images you want to make and whether you want them kept private. This week, though, Midjourney expanded access to its models, allowing anyone to add its AI image generator to their own Discord server. As Holz puts it: "We're going from a Midjourney universe to a Midjourney multiverse." He thinks the results will be incredible: the burst of AI-enhanced creativity so far is just the tip of the iceberg.

Holz recently gave an exclusive interview about his ambitions for Midjourney, including why he wants to create an "engine of imagination" and why he thinks AI is more like water than a tiger.

The following is the full text of the interview:

Q: Let's start with you introducing yourself and Midjourney. What is your background? How did you get involved? And what is Midjourney, a company or a community? How would you describe it?

Holz: My name is David Holz, and I consider myself a serial entrepreneur. Briefly: I worked in design in high school and studied mathematics and physics in college. I was doing a PhD in fluid mechanics while working at NASA and a Max Planck lab. At one point I felt lost and set everything aside. I moved to San Francisco and, around 2011, started a tech company called Leap Motion. We sold hardware devices that tracked hand motion, opening up a lot of space for gestural interfaces.

I started Leap Motion and ran it for 12 years, but eventually I started looking for a different environment from a big VC-backed company, so I left and started Midjourney. Right now our company is still small: about 10 people, no investors, and no financial pressure. There's no pressure to sell or go public. It's just a place to do some cool projects over the next 10 years.

We're working on a lot of different projects; we're more like a research lab with a wide variety of work. But there are themes: things like reflection, imagination, and coordination. We became known for image creation. We don't think of it as being about art or deepfakes; the question is how we expand the human imagination. What does that mean? What does it mean when computers are better at visual imagination than 99 percent of humans? It doesn't mean we'll stop imagining. Cars are faster than humans, but that doesn't mean we stopped walking. When we move large quantities of things over long distances, we need engines, whether in a plane, a boat, or a car. We see this technology as an "engine of the imagination," so it's a very positive and meaningful thing to build.

Q: Many labs and companies are working on similar techniques for turning text into images. Google has Imagen, OpenAI has DALL-E, and some small projects like Craiyon. Where did this technology come from, where do you see it going in the future, and how does Midjourney's vision differ from others in this space?

Holz: There have been two major breakthroughs in AI that led to these image-generation tools: one is understanding language, the other is the ability to create images. When you combine them, you can create images through an understanding of language. We're seeing these technologies emerge, and the trend is that they will become better than humans at making images, and very fast. Within the next year or two, you'll be able to produce content in real time: 30 frames per second, high resolution. Expensive, but possible. Then, in 10 years, you'll be able to buy an Xbox with a huge AI processor inside, and all the games will come from dreams.

From a raw technology standpoint, these are facts and there's no way around them. But what does it actually mean, from a human point of view, that all games come from dreams, that everything is malleable, that we'll have AR headsets? The human factor is the unfathomable part. And the software that would make this genuinely usable doesn't exist yet; I think that's our focus.

Last September, we started testing the raw technology, and we quickly discovered something truly different: most people don't know what they want. When you tell them, "Here's a machine you can imagine anything with, what do you want?" they say, "a dog." Really? "A pink dog." So you give them a picture of a dog, they say "okay" and go off to something else.

However, if you put them in a group, one person will ask for a "dog," someone else for a "space dog," then an "Aztec space dog," and all of a sudden people understand the possibilities. You create this augmented imagination: an environment where people can learn and test this new capability. We found that people really like imagining together, so we built Midjourney's social features. We have a huge Discord community, one of the largest, with about a million people imagining things together in a shared space.

Q: Do you see this human collective as a parallel to a machine collective? As a counterbalance to these AI systems?

Holz: There is no real collective of machines. Every time you ask the AI to draw a picture, it doesn't remember or know anything it has ever drawn. It has no will, no purpose, no intention, no storytelling ability. All the ego, will, and stories are on our side. Like an engine, it can't decide on its own where to go, but people can. It's a bit like a human hive mind with technological superpowers.

In this community, there are a million people making pictures, and they are all riffing on each other. By default, everyone can see everyone else's pictures; you have to pay extra to opt out, and if you do, it usually means you're some type of business user. So everyone is influencing each other, and all these new aesthetics are emerging. It's almost like aesthetic accelerationism. They aren't AI aesthetics; they're new, interesting, human aesthetics, and I think they're going to spread around the world.

Q: Does this openness also help ensure safety? Because there is a lot of talk about AI image generators being used to generate potentially harmful things like gory violent images and misinformation. How do you stop this from happening?

Holz: Yeah, it's amazing: when someone's name is attached to every picture they make, they're much more careful about how they use the tool. That helps a lot. But unfortunately, we still run into problems from time to time. For example, someone who makes a living causing outrage on social media pays for privacy, spends a month creating the most shocking image possible, and then tries to tweet it. Then we have to say firmly: this is not our purpose; this is not the kind of community we want.

Whenever we see those kinds of images, we take immediate action and, if necessary, ban the user. We also maintain a long list of inappropriate words and prohibit similar things.

Q: What about real faces? That's another vector for creating misinformation. Can the model generate real faces?

Holz: It will generate celebrity faces and things like that. But we have a default style and look, full of artistry, so pretty that it's hard to shake off, which means you can't really force it into a deepfake right now. Maybe if you spent 100 hours you could find the right combination of words to make something look very real, but you'd have to work really hard to make it look like a photo. Personally, I don't think the world needs more deepfakes, but it does need more beautiful things, so we're focused on making everything beautiful.

Q: Where did you get the training data for the model?

Holz: Our training data comes from the internet, much like every other company's. Almost every large AI model ingests all the data it can get, both text and images. From a scientific point of view, we're in the early days of this field: everyone has access to roughly the same amount of data, it's all put together in one giant file, and then you fire it up to train something huge, and no one really knows which parts of that pile of data actually matter.

For example, our most recent update looks better, and you might think we did it by adding lots of paintings to the training data. We didn't; we just used data about which outputs users preferred. None of it was curated for art. Scientifically, we're still at a very, very early stage. The entire studio has probably trained only about 24 of these models. So this is still experimental science.

Q: How much did it cost to train your model?

Holz: I can't reveal the exact cost of training models in this area, but I can speak generally. Right now, it costs about $50,000 to train an image model once. You can't get it right the first time, so you might try 3, 10, or 20 times. Given that, training costs are quite high: more than most universities could afford, but not so much that you'd need a billion dollars or your own supercomputer.
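Taken at face value, those figures imply a rough total budget per finished model. Here is a back-of-envelope sketch based only on the numbers Holz gives in the interview (the per-run cost and attempt counts are his approximations, not disclosed accounting):

```python
# Back-of-envelope training cost estimate, using the figures from the interview:
# roughly $50,000 per training run, and 3 to 20 attempts before a model works.
cost_per_run = 50_000              # USD per training run (approximate)
min_attempts, max_attempts = 3, 20  # attempts Holz says are typically needed

low_estimate = min_attempts * cost_per_run
high_estimate = max_attempts * cost_per_run

print(f"Total training cost: ${low_estimate:,} to ${high_estimate:,}")
# With these assumptions: $150,000 to $1,000,000 per finished model
```

That range squares with his characterization: well beyond a typical university budget, but far short of supercomputer territory.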

I believe training and maintenance costs will come down, but the cost of running the service is actually quite high right now. Every image costs money; each one is generated on $20,000 servers that we rent by the minute. I don't think there has ever been a consumer service that casually burns through this much computation in 15 minutes. I'd say it's more computing power than the average consumer has ever touched, probably 10 times as much. It's actually kind of crazy.

Q: When it comes to training data, a contentious aspect is ownership. Current U.S. law says you can't copyright AI-generated art, but it's less clear whether copyrighted images can be used as training data. Artists and designers work hard to develop their own distinctive styles; what happens when their work can be replicated by an AI? Have you had many discussions about this?

Holz: We do have a lot of artists in our community, and I'd say they're generally positive about the tool; they think it makes them more efficient and greatly improves their lives. We keep asking them, "How are you doing? How do you feel?" We also hold office hours, sitting with a thousand people for four hours and answering questions.

It's really interesting that a lot of well-known artists who use the platform say the same thing: "Midjourney feels like an art student. It has its own style, and when you invoke my name to create an image, it's like an art student taking inspiration from my art to create something. As an artist, I've always wanted people to be inspired by my work."

Q: But there's a huge self-selection bias there, because the artists who are active on the Midjourney Discord are, by definition, excited about it. What about those who say, "This is crap, I don't want my art eaten by these giant machines"? Would you let those people remove themselves from your system?

Holz: We don't have that process yet, but we're open to it. So far, I'd say, not that many artists have asked; it's not a deep dataset. And those who have reached us told us, "We're not intimidated by this." It's still early days, and I think it makes more sense to adapt and stay engaged. So we've been talking to people. In fact, the first request we get from artists right now is that they want it to steal their style better, so they can use it as part of their own artistic process. That surprised me.

Other AI image generators may differ because they try to make things look completely real. But we lean more on default styles, so it really does look like an art student inspired by something else. The reason is that you always need defaults: if you say "dog," we could give you a plain picture of a dog, but that's boring. From a human point of view, why would you do that when you could just go to Google Image Search? So we try to make things look more artistic.

Q: You've mentioned Midjourney's default art style a few times in this conversation, and I'm fascinated by the idea that each AI image generator is a little culture of its own, with its own preferences and expressions. How would you describe Midjourney's distinctive style, and how consciously did you develop it?

Holz: This is something special! We tried a lot of things, and every time we tried something new, we rendered a thousand images. Realism was never the intent; it should just look pretty. It should respond to concrete things and to vague things, and we definitely didn't want it to just look like a photo. We might make a realistic version at some point in the future, but we don't want that to be the default. Perfectly photographic output makes me a little uncomfortable right now, although I can see why you might want it.

I think its style is a little whimsical, abstract, and weird: it tends to blend things in ways you might not have asked for, in surprising and beautiful ways. It tends to use lots of blues and oranges, and it has its own favorite colors and faces. If you give it very vague instructions, it will definitely apply its own favorites. For example, it likes to paint one particular woman's face, and we don't know where she came from, so we just call her "Miss Journey." There's also a man's face, a little square and very imposing, who keeps appearing, but he has no name. It's like an artist with its own preferred faces and colors.

Q: Speaking of defaults, a big challenge in image generation is bias. Research has shown that if you ask an AI image model for a CEO, you may always get a white man; ask it for a nurse and the output is dominated by women, and often people of color. How do you approach this challenge? Is it a big problem for Midjourney, or mainly a concern for companies that want to profit from these systems?

Holz: "Miss Journey" is definitely a problem, not a feature, and we're working on something right now that will try to break down those faces in order to bring more variety. But this also has downsides. For example, we have a version that totally destroys "Miss Journey," but if you really want Arnold Schwarzenegger to play Danny DeVito, then it also totally destroys destroy this requirement. The tricky thing is getting it to work without breaking all the expressive style. Because it's easy to have a switch that adds variety, but it's hard to turn it on only when it's supposed to. What I can say is that it has never been easier to make a picture with any variety you want.

Q: Let's step back a bit. You've said several times that your motivations at Midjourney aren't narrowly practical. I mean, obviously you're personally invested, but your motivations are more abstract, mostly about the relationship between humans and AI, about using AI in this humanistic way, as you call it. Some in the AI field tend to think of the technology in its grandest terms: they liken it to a god, to a sentient being. How do you feel about that?

Holz: For a while, I've been trying to figure out what Midjourney's AI image generator really is. You could say it's an "imagination engine," but there's more to it. The first temptation is to look at it through the lens of art and ask: is this like the invention of photography? When photography was invented, painting got weirder, because anyone could capture a face, so why paint one?

Is it like that? No. This is definitely something new. At first it feels like the invention of the engine: you're making a batch of images every minute, tumbling down a path of imagination, and it feels good. But if you go a step further, and instead of making four images at a time you make a thousand or ten thousand, it's different. One day I actually did this: I made 40,000 images in a few minutes, and suddenly I had this vast expanse in front of me. It took me four hours to look through them all, and along the way I felt like I was drowning. I felt like a child looking into the deep end of a pool, knowing I couldn't swim but sensing the depth of the water. Suddenly, Midjourney felt less like an engine and more like a torrent. It took me weeks to digest. I thought and thought, and then I realized: you know what? It's actually like water.

Right now, people completely misunderstand what AI really is. They see it as a ferocious, dangerous tiger that might eat us. Of course water has dangers too; you can drown in it. But the danger of a flowing river is nothing like the danger of a tiger. You can swim in water, you can build boats, you can dam it to generate electricity. Water is dangerous, but it is also a driving force of civilization, and humanity lives better for understanding how to coexist with it and harness its power. It's an opportunity. It has no will, no malice, and yes, you could drown in it, but that doesn't mean we should shut off the water. Discovering a new source of water is genuinely a good thing.

Q: Is Midjourney a new water source?

Holz: Of course! I think as a species we've discovered a new source of water, and what Midjourney is trying to figure out is: how do we bring it to people? How do we teach people to swim? How do we build boats? How do we dam it? How do we go from people afraid of drowning to kids surfing in the future? We're making surfboards, not water. I think there's something deep here worth exploring.

