Midjourney, an image-generating company, has overcome a problem common to other image generators: getting every hand in its images to have five fingers. While this may seem insignificant, it makes the difference between professional images with commercial value and average Instagram pictures.
For $10 a month, Midjourney's latest image generator could give you access to incredibly life-like photos of events that could never happen or are very hard to imagine.
How about Putin and Zelensky having a drink together? How will Israel look in 2050? Would you like to see Donald Trump under arrest? It's all possible.
"It looks very realistic," says Uri Bejarano, a prominent Israeli image AI specialist. "It can generate these photos in seconds."
Not everyone is a fan of the realism conveyed by Midjourney's latest version. Bejarano thinks it comes at the expense of the artistic qualities that the previous version excelled at. "I and some others think that version 5 is too realistic, so many still prefer using version 4."
Another concern about this technology is the potential for misuse. In the hands of someone with malicious intent, it could significantly sway public opinion, threaten the stability of nations, and even influence electoral outcomes.
With these generators, users type a description of the photo they want. The more specific the input, the more closely the generated photo will match it. However, Midjourney gives users limited control over the result.
For instance, if you input "a cow in space," the resulting photo composition may not match your exact expectations. The cow could be a different size or located in a different position within the photo. To achieve your envisioned result, you would need to input several additional prompts.
"Generative models are wild animals," says Ofir Bibi, VP of Research at Lightricks, which specializes in developing image-generator apps. "It's not like ChatGPT, where you can easily ask for corrections. That's why Midjourney could take multiple attempts before getting the desired result."
Lightricks has developed a feature called AI Scenes, which enables processing photos without losing the original composition. This allows for a whole new level of image manipulation.
Another issue is personal photo modification. For instance, if you want to paste your head onto a medieval knight's body, the models have never been trained on your picture before, so they are unlikely to make it fit the way you want. However, there is a workaround.
Google has developed DreamBooth, which allows users to provide 20-30 images of themselves, after which the algorithm seamlessly fits them onto an existing image.
The next frontier is enabling a video generator. A company called Runway has already presented a model capable of generating 3-second video clips. "Initially, companies went with text-to-video," says Bibi.
"The outcomes now resemble a GIF more than an actual video, which has led to the new thinking regarding video-to-video. This approach allows for the manipulation of existing videos, such as making a regular clip appear as if it were filmed in a post-apocalyptic world."
"Despite today's limitations, there's little doubt that text-to-video apps are coming," says Sahar Mor, a product manager at Stripe. "Probably within six months."
"I think video is going to really blow up soon," says Bejarano. "You'll be able to produce a great video just by typing what you wish to see happen. I don't know if it will make human labor obsolete, but it will render all those unable to use it irrelevant. The technology won't replace people, but people who are proficient will replace those who aren't."