@skarlow181

skarlow181@lemmy.world · edited 10 months ago

Or they can skip the first part and just draw something from their imagination

Which is why blind people are so amazing at drawing…

crazy foam monster while having never seen one.

You are recombining patterns you have seen before: “crazy”, “foam”, “monster”. Those all have a certain look that your brain got trained on, and you are simply remixing them. An AI can do exactly the same. The fact that there are words for those concepts should be enough to tell you that those ideas are not original.

skarlow181@lemmy.world · 10 months ago

That creates a problem of where the first piece of art came from.

They looked at nature and crudely copied that. They didn’t start drawing Mickey Mouse on day one.

skarlow181@lemmy.world · 10 months ago

Utter nonsense. Have you ever looked at the history of art? It’s all a slow incremental crawl based on previous efforts. Nothing comes from nothing.

skarlow181@lemmy.world · 10 months ago

and a limited FOV.

Not for movies. Modern VR headsets have around 100° of FOV, while a comfortable movie viewing distance only needs 30-60° (plus a couple of degrees for head movement). Resolution is the far bigger problem, with VisionPro being the first one that can do about 1080p at 50° FOV. Most other headsets are stuck at 720p or below when they emulate a 2D display.
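To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes pixels are spread evenly across the field of view (real headset lenses are not uniform), and the function name `required_ppd` is made up for illustration. It estimates how many pixels per degree a headset has to deliver for a virtual screen of a given pixel width to fit into a given FOV.

```python
def required_ppd(horizontal_pixels: int, screen_fov_deg: float) -> float:
    """Pixels per degree needed to show `horizontal_pixels` across
    `screen_fov_deg` degrees of the viewer's field of view.

    Simplifying assumption: pixel density is uniform across the FOV.
    """
    if screen_fov_deg <= 0:
        raise ValueError("screen_fov_deg must be positive")
    return horizontal_pixels / screen_fov_deg


# A 1080p virtual screen is 1920 pixels wide, a 720p one is 1280 pixels wide.
print(required_ppd(1920, 50))
print(required_ppd(1280, 50))
```

By this estimate a 1080p-wide screen at 50° needs roughly 38 pixels per degree, and a 720p one roughly 26.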

Also, VR can effortlessly do 3D movies, and Apple is the first to actually offer them out of the box. Finding those for other headsets has always been a huge struggle (i.e. piracy or ripping them yourself).

One thing I haven’t yet seen on VisionPro is whether it has any form of multiplayer. Watching movies together with other people (VRChat, BigScreen) was one of the more interesting things VR can do, but so far VisionPro looks like a single-player device. Outside of video calls, I have seen no indication that it has full avatars, or of how it behaves when multiple people in the same room wear a VisionPro.

skarlow181@lemmy.world · 10 months ago

The crux is that they went “draw me a cartoon mouse” and Midjourney went “here is Disney’s Mickey Mouse™”. A simple prompt should not be able to generate that specific of an image. If you want something specific, you should need to specify it, otherwise the AI failed to generalize or is somehow heavily biased towards existing images.

skarlow181@lemmy.world · 10 months ago

It’s a bit different for MidjourneyV6. Previous AI models would create their own original images based on patterns learned from the data. MidjourneyV6, on the other hand, reproduces the original images to such a degree that they look identical to the originals for the average observer; you have to see them side by side to even spot the differences at all. DALLE3 has that problem as well, but to a much lesser degree.

That means there is something going wrong in the training, e.g. some images end up being duplicated so often in the training data that the AI remembers them completely. Normally that should be reduced or avoided by filtering out duplicate images, but that seems to not be happening or the images slip through due to small changes (e.g. size or crop will be different on different websites).
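As a rough illustration of how duplicates that differ only in size could still be caught, here is a minimal sketch of an average-hash style fingerprint. I have no idea what Midjourney's data pipeline actually does, so this is an assumed, generic technique and all function names are made up for the example. Images are plain 2D lists of grayscale values whose dimensions are multiples of the grid size.

```python
def average_hash(pixels, size=4):
    """Shrink the image to a size x size grid of block averages, then
    emit 1 for cells brighter than the grid mean and 0 otherwise."""
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // size, width // size  # block height and width
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]


def hamming(a, b):
    """Number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def upscale(pixels, factor):
    """Nearest-neighbour upscale, standing in for 'same image, other size'."""
    return [[v for v in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]


# Hypothetical 8x8 test image: a horizontal gradient with a slight vertical ramp.
original = [[x * 16 + y for x in range(8)] for y in range(8)]
resized = upscale(original, 2)                        # same picture at 16x16
inverted = [[255 - v for v in row] for row in original]  # a different picture

print(hamming(average_hash(original), average_hash(resized)))
print(hamming(average_hash(original), average_hash(inverted)))
```

The resized copy gets the same fingerprint as the original, while the inverted image differs in every bit. A filter would treat pairs below some small distance threshold as duplicates. Crops are harder than plain resizes and would need something more robust than this sketch.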

Note this doesn’t just impact exact duplication, it also impacts remixing, e.g. when you tell it to draw Joker doing some task, you’ll get Joaquin Phoenix’s Arthur Fleck, not some random guy with clown features.

All of this happens with very simple prompts that do not contain all those very specific details.

In AI’s defense: All the examples I have seen so far are from press releases of movie stills. So they naturally end up getting copied all over the place and claiming copyright violation for your own material that you released to be reused by the press wouldn’t fly either. But either way, Midjourney is still misbehaving here and needs to be fixed.

More broadly speaking, I think it would be a good time to move away from training those AIs almost exclusively on images and start training them on video. Not just to be able to reproduce video, but so that the AI gets a more holistic understanding of how the world works. At the moment all its knowledge is based on deliberate photo moments, and there are very large gaps in its understanding.