The folks with access to this must be looking at some absolutely fantastic porn right now!
Oh it's going to be fantastic all right.
Fantastical chimera monster porn, at least for the beginning.
Honestly, let’s make it mainstream. Get it to a point where it’s more profitable to mass-produce AI porn than to exploit young women from god knows where.
I don’t think they would make a model like this uncensored.
‘obama giving birth’, ‘adam sandler with big feet’, ‘five nights at freddy’s but everyone’s horny’
possibilities are endless
YouTube is about to get flooded by the weirdest meme videos. We thought it was bad already, we ain’t seen nothing yet.
Shit is going too far, as expected, and governments don’t give a fuck about society. Only in the EU are there a few human-centered moves.
This is so much better than all text-to-video models currently available. I’m looking forward to reading the paper, but I’m afraid they won’t say much about how they did this. Even if the examples are cherry-picked, this is mind blowing!
I’m looking forward to reading the paper
You mean the 100 page technical report
Just get ChatGPT to summarize it. Big brain time.
Can I get sora to create a video from the summary?
Full circle.
Eventually, the internet will just be AI criticizing itself to create a better version of itself…
Hang on…
How do you know you’re not AI?
Doo^doo doodoo doo^doo doodoo doo^doo doodoo
The quality is really superior to what was shown with Lumiere. Even if this is cherry-picking, it seems miles above the competition.
The first one is easy, as you don’t need coherence between reflected and non-reflected stuff: only the reflection is visible. The second one has lots of inconsistencies: it works kinda well if the reflected thing and the reflection are close together in the image, and it does tend to copy over uniformly-coloured tall lights, but OTOH it also invents completely new things.
Do people notice? Well, it depends. People do notice screen-space reflections being off in traditional rendering pipelines (not always, but it happens), and those AI reflections are the same kind of “mostly there in most situations, but let’s cheap out to make it computationally feasible” type of deal. Ultimately, processing information, tracking the influence of one piece of data throughout the whole scene, comes with a minimum amount of required computational complexity, and neither AI nor SSR does it.
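The shared failure mode of SSR and these models can be shown in a toy sketch (everything here is made up for illustration: a 1D “depth buffer” and a single lookup instead of a real ray march, but the point is the same): a reflected ray that leaves the frame has nothing to sample, because the information was never computed.

```python
# Toy "screen-space reflection" lookup: samples only exist inside the
# frame, so anything reflected from off-screen geometry is simply
# unavailable. (Illustrative sketch, not a real renderer.)

def ssr_lookup(depth_buffer, x):
    """Return the stored depth at screen column x, or None when the
    reflected ray leaves the frame and there is nothing to sample."""
    if 0 <= x < len(depth_buffer):
        return depth_buffer[x]
    return None  # off-screen: SSR must fall back, fade out, or invent

depth = [1.0, 0.8, 0.6, 0.9]        # a tiny 4-pixel depth buffer

on_screen = ssr_lookup(depth, 2)    # visible pixel: a usable sample
off_screen = ssr_lookup(depth, 7)   # reflected point outside the frame
```

Real engines hide the `None` case by fading the reflection or falling back to cubemaps; a generative model hides it by hallucinating something plausible instead.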
Yeah, we won’t be needing proper raytracing with this kind of tech. It’s mind blowing.
Only the third most confusing entry in the Kingdom Hearts series
Lol And KH4 is gonna be about Sora being in the real world. This storyline is getting out of hand.
The demo looks pretty good, yes - but I won’t believe it till I try it!
I wonder if in the 1800s people saw the first photograph and thought… “well, that’s the end of painters.” Others probably said “look! it’s so shitty it can’t even reproduce colors!!!”.
What it was the end of was talentless painters who were just copying what they saw. Painting stopped being for service and started being for art. That is where software development is going.
I have worked with hundreds of software developers in the last 20 years; half of them were copy-pasters who got into software because they tricked people into thinking it was magic. In the future we will still code, we just won’t bother with the things the Prompt Engineer can do in 5 seconds.
The hardest part of coding is managing the project, not writing the content of one function. By the time LLMs can do that it’s not just programming jobs that will be obsolete, it will be all office jobs.
I think that’s a bad analogy because of the whole being able to think part.
I’ll be interested in seeing what (if anything) humans will be able to do better.
What it was the end of was talentless painters who were just copying what they saw. Painting stopped being for service and started being for art. That is where software development is going.
I think a better way of saying this is: people who were just doing it as a job, not out of any great talent or passion for painting.
But doing something just because it is a job is what a lot of people have to do to survive. Not everyone can have a profession that they love and have a passion for.
That’s where the problem comes in when it comes to these generative AI.
And then the problem here is capitalism and NOT AI art. The capitalists are ALWAYS looking for ways to not pay us; if it wasn’t AI art, it was always going to be something else.
It was exactly the same as with AI art. The same histrionics about the end of art and the dangers to society. It’s really embarrassing how unoriginal all this is.
Charles Baudelaire, father of modern art criticism, in 1859:
As the photographic industry was the refuge of every would-be painter, every painter too ill-endowed or too lazy to complete his studies, this universal infatuation bore not only the mark of a blindness, an imbecility, but had also the air of a vengeance. I do not believe, or at least I do not wish to believe, in the absolute success of such a brutish conspiracy, in which, as in all others, one finds both fools and knaves; but I am convinced that the ill-applied developments of photography, like all other purely material developments of progress, have contributed much to the impoverishment of the French artistic genius, which is already so scarce.
What it was the end of was talentless painters who were just copying what they saw. Painting stopped being for service and started being for art.
This attitude is not new, either. He addressed it thus:
I know very well that some people will retort, “The disease which you have just been diagnosing is a disease of imbeciles. What man worthy of the name of artist, and what true connoisseur, has ever confused art with industry?” I know it; and yet I will ask them in my turn if they believe in the contagion of good and evil, in the action of the mass on individuals, and in the involuntary, forced obedience of the individual to the mass.
After seeing the horrific stuff my demented friends have made dall-e barf out I’m excited and afraid at the same time.
The example videos are both impressive (insofar as they exist) and dreadful. Two-legged horses everywhere, lots of random half-human-half-horse hybrids, walls change materials constantly, etc.
It really feels like all this does is generate 60 DALL-E images per second and little else.
This would work very well with a text adventure game, though. A lot of them are already set in fantasy worlds with cosmic horrors everywhere, so this would fit well to animate what’s happening in the game
I mean, it took a couple months for AI to mostly figure out that hand situation. Video is, I’d assume, a different beast, but I can’t imagine it won’t improve almost as fast.
For the limitations visual AI tends to have, this is still better than what I’ve seen. Objects and subjects seem pretty stable from frame to frame, even if those objects are quite nightmarish.
I think “will Smith eating spaghetti” was only like a year ago
This is the best summary I could come up with:
Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post.
The company also notes that the model can understand how objects “exist in the physical world,” as well as “accurately interpret props and generate compelling characters that express vibrant emotions.”
Many have some telltale signs of AI — like a suspiciously moving floor in a video of a museum — and OpenAI says the model “may struggle with accurately simulating the physics of a complex scene,” but the results are overall pretty impressive.
A couple of years ago, it was text-to-image generators like Midjourney that were at the forefront of models’ ability to turn words into images.
But recently, video has begun to improve at a remarkable pace: companies like Runway and Pika have shown impressive text-to-video models of their own, and Google’s Lumiere figures to be one of OpenAI’s primary competitors in this space, too.
It notes that the existing model might not accurately simulate the physics of a complex scene and may not properly interpret certain instances of cause and effect.
The original article contains 395 words, the summary contains 190 words. Saved 52%. I’m a bot and I’m open source!
The cat video is funny, the cat has 5 legs :D
Seeing the 5 legged cat was the moment I started to believe this stuff really was AI generated.
I’m really impressed by the demo, but yes, let’s see how well it works when it’s made public.
People who don’t think AI will take a lot of jobs may have to rethink…
Her legs rotate around themselves and flip sides at 16s in. It’s still very impressive, but …yeah.
Wow didn’t see that the first time
Imagine VR giving you an AI-generated world. It would be Ready Player One irl.
I recently played a game where people found immortality and each individual just lived in their own personal virtual reality for thousands of years. It’s kinda creepy seeing the recent advances in technology today lining up to that, minus the immortality part.
What game was that?
The compute power it would take to do that in realtime at the framerates required for VR to be comfortable would be absolutely beyond insane. But at the rate hardware improves and the breakneck speed these AI models are developing maybe it’s not as far off as I think.
An AI-generated VR world would be a single map environment, generated while you wait at a loading screen, the same way it works when a game starts or you move to an entirely new map.
A text-to-3D game asset AI wouldn’t regenerate a new 3D world on every frame, in the same way you wouldn’t ask AI to draw a picture of an orange cat and then ask it to draw another picture of an orange cat shifted one pixel to the left if you wanted the cat moved a pixel. The result would be a totally different picture.
I think we’re talking about different kinds of implementations.
One being an AI-generated ‘video’ that is interactive, generating new frames continuously to simulate a 3D space that you can move around in. That seems pretty hard to accomplish for the reasons you’re describing. These models are not particularly stable or consistent between frames, and the software has no understanding of physical rules, just of how a scene might look based on its training data.
Another and probably more plausible approach is likely to come from the same frame generation technology in use today with things like DLSS and FSR. I’m imagining a sort of post-processing that can draw details on top of traditional 3d geometry. You could classically render a simple scene and allow ai to draw on top of the geometry to sort of fake higher levels of detail. This is already possible, but it seems reasonable to imagine that these tools could get more creative and turn a simple blocky undetailed 3d model into a photo-realistic object. Still insanely computationally expensive but grounding the AI with classic rendering could be really interesting.
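That second approach can be sketched in miniature (everything here is hypothetical: `render_blocky_scene` stands in for a classic rasterizer, and `enhance` stands in for a learned detail pass, which in a real DLSS-style pipeline would be a neural network): render simple geometry first, then let a second pass add detail only where geometry actually exists, so the AI is grounded by the classic render.

```python
import random

def render_blocky_scene(w, h):
    """Stand-in for a classic rasterizer: a flat grey block on black."""
    return [[0.5 if h // 4 <= y < 3 * h // 4 and w // 4 <= x < 3 * w // 4
             else 0.0
             for x in range(w)] for y in range(h)]

def enhance(frame):
    """Placeholder for a learned detail pass (hypothetical). It only
    perturbs pixels that contain geometry, leaving the background
    untouched, which is the 'grounding' the comment describes."""
    rng = random.Random(0)  # deterministic for the sketch
    return [[min(1.0, max(0.0, p + rng.uniform(-0.05, 0.05))) if p > 0.0
             else 0.0
             for p in row] for row in frame]

low = render_blocky_scene(8, 8)   # cheap, blocky classic render
final = enhance(low)              # "AI" adds detail on top of geometry
```

The key design point is that the enhancement pass never invents geometry where the renderer drew none, which is what keeps the result stable between frames in a way a pure frame-by-frame generator isn’t.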