Reddit has struck a $60m deal with Google that lets the search giant train AI models on its posts

mesamune@lemmy.world · 6 months ago

Asafum@feddit.nl · edit-2 6 months ago

Google: “I’m looking to make an AI that’s incredibly opinionated, confidently incorrect, and prone to circlejerk behaviors.”

Spez: “I mean OpenAI pretty much did that already, but if you want to pay me to recreate that then I got you.”

GorGor@startrek.website · 6 months ago

One thing that doesn’t seem to get brought up is the AI porn angle. Gonewild is pretty big on reddit still. A lot of OnlyFans creators, and general perverts (respect). Reddit wants to commodify this content, so those people are selling the images of their bodies so AI can make porn of random strangers. Kinda fucked when you think about it.

Michael Ten@lemmy.world · 6 months ago

Minimum royalty laws should exist.

henfredemars@infosec.pub · 6 months ago

They can have my drunk shitposts. Joke’s on them for buying my garbage.

Endorkend@kbin.social · 6 months ago

And this is how Skynet was born.

That one Microsoft Twitter bot turned into a full blown Nazi in just one day.

I can’t even imagine how fucked up and depraved one trained on Reddit data will get.

AwkwardLookMonkeyPuppet@lemmy.world · edit-2 6 months ago

They have a series of safeguards against that now. They’ve actually taken it to the opposite extreme, where it can’t give you anything without injecting diversity in there somewhere.

Here’s an example. This is what it produced when asked for an image of a German soldier in 1943.

Endorkend@kbin.social · 6 months ago

They may be able to alter what it says, but they can’t alter what it thinks.

Rayspekt@lemmy.world · 6 months ago

Is it just me or is 60 million a ridiculously small price for that whole dataset?

bobburger@fedia.io · 6 months ago

To be fair it’s a pretty terrible dataset. The AI is just going to say “this” to every question you ask

jkrtn@lemmy.ml · 6 months ago

Hey, now, be fair. There are some Top 40s song lyrics in there too.

lol@discuss.tchncs.de · edit-2 6 months ago

You’re exaggerating of course, but I don’t think it’s terrible at all; the opposite really. It’s likely incredibly useful for creating LLMs with specific knowledge or behavior.

The categorization into subreddits alone opens up so many possible applications. Imagine for example training a conversational AI with data from specific subreddits like science, askscience, biology, physics, astronomy,… or posts by users that frequent such subreddits in order to create sort of an academic AI.

You could do the same for all sorts of topics: Want a sports commentator AI, use sports related subreddits; an AI that supports you in writing a novel, use creative writing subreddits etc. Don’t want your AI to spew political opinions, exclude political subreddits from your data; don’t want it to use offensive language, only use well-moderated subreddits etc.
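As a rough sketch of that idea, here is how a subreddit allow-list and block-list might be applied before training. The `Post` record and `select_training_posts` helper are made up for illustration; they are not Reddit's or Google's actual data format or pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record; a real dump would carry many more fields."""
    subreddit: str
    author: str
    body: str


def select_training_posts(posts, include, exclude=frozenset()):
    """Keep posts whose subreddit is in `include` and not in `exclude`.

    Subreddit names are compared case-insensitively.
    """
    include = {name.lower() for name in include}
    exclude = {name.lower() for name in exclude}
    return [
        post
        for post in posts
        if post.subreddit.lower() in include
        and post.subreddit.lower() not in exclude
    ]


posts = [
    Post("askscience", "a", "Why is the sky blue?"),
    Post("politics", "b", "Hot take incoming."),
    Post("Physics", "c", "Rayleigh scattering, mostly."),
]

# An "academic" subset: only the science-flavoured subreddits survive.
academic = select_training_posts(
    posts, include={"science", "askscience", "physics"}
)
```

Filtering by users who frequent those subreddits would work the same way, just keyed on `author` instead of `subreddit`.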

Adderbox76@lemmy.ca · 6 months ago

This presumes that Reddit is populated by so-called experts answering questions and posting in those subs.

But the vast overwhelming truth is that most people pretending to be experts are just regurgitating the answers they heard from another reddit post, and so on, and so on.

You might as well just train your AI on the “confidently incorrect” sub and call it a day.

MBM@lemmings.world · 6 months ago

It’s always an eye-opener when you look at an ELI5 thread where you’re actually knowledgeable about the topic

OfCourseNot@fedia.io · 6 months ago

Yeah, and Google already has everything scraped and indexed

AwkwardLookMonkeyPuppet@lemmy.world · 6 months ago

Or “just Google it”.

GBU_28@lemm.ee · 6 months ago

Ai:

😭 I’m trying

captainlezbian@lemmy.world · 6 months ago

My heel turn as a mod back in the day was having automod remove lmgtfy links

rarkgrames@lemmy.world · 6 months ago

This.

Altima NEO@lemmy.zip · 6 months ago

and “and my axe”

Shimon@slrpnk.net · 6 months ago

and “rock and stone”

InFerNo@lemmy.ml · 6 months ago

& Knuckles, featuring Dante from the Devil May Cry series

AwkwardLookMonkeyPuppet@lemmy.world · 6 months ago

“Reject humanity. Return to monke.”

GBU_28@lemm.ee · 6 months ago

How quickly you forget that half of it is just “I also choose this guy’s wife” and “the narwhal bacon’s at midnight”

trolololol@lemmy.world · 6 months ago

Considering it’s all full of Nazis and bots, and if you get to filter all of them out you’re left with reposts and low quality memes followed by comments that represent the hostile side of each of us… I’d say anything over $5 is a good deal for spez.

Now, I hope Google uses this data exclusively for detecting inappropriate answers. Can you imagine it giving answers based on the endless threads of “I’m not your mate, bro; I’m not your bro, dude…”?

Zaktor@sopuli.xyz · 6 months ago

I’m personally curious whether Reddit actually has any ability to protect that database. I don’t remember Reddit TOS, but usually those things give them license to use and copy the data, maybe even to sell it, but not actually the copyright on it. So if someone made a Reddit scraper and copied the comments, wouldn’t only the actual commenter be able to sue?

$60M may be reflecting that, in that it’s more a convenience fee to shield Google against individual Redditors going after them than something that Reddit itself could actually sue over.

Ebby@lemmy.ssba.com · 6 months ago

Perhaps, but not worth buying if you can’t make profit or keep it from your competition.

60M is for almost 20 years of data, but once it’s ingested, Google will only want new content. Next year, it’ll be more like 3M if the dataset isn’t poisoned by bots or the AI fad hasn’t collapsed. Reddit will struggle with finances again and users will suffer. At least that’s my prediction.

empireOfLove2@lemmy.dbzer0.com · 6 months ago

Spez has already grifted his money out of the initial stock pump so it literally doesn’t matter. Reddit could shut down tomorrow and he’d be happy as a clam.

Ebby@lemmy.ssba.com · edit-2 6 months ago

Yeah, what a load. Though now they can boot his arse and save.

Edited to remove number.

AwkwardLookMonkeyPuppet@lemmy.world · 6 months ago

I doubt he’s getting 120M per year. I think that big compensation package was a 1 time deal. That’s more than Satya Nadella makes.

AwkwardLookMonkeyPuppet@lemmy.world · 6 months ago

the AI fad

LOL. Do you realize that makes you sound like Boomers talking about the internet in the late 90’s and early 00’s?

Barbarian@sh.itjust.works · 6 months ago

It currently looks very much like a bubble. After the dot com bubble, the internet didn’t go away, but most companies died off and all the stupid monetisation went bankrupt.

We may be seeing something similar

Ebby@lemmy.ssba.com · 6 months ago

Haha! Wow I guess so. I’ll keep some shelf space available in the geezer museum next to 3D TV’s, deep fakes, fidget spinners, and my pogs. :D

qjkxbmwvz@startrek.website · 6 months ago

I wonder if Google’s unlimited legal budget plays a role. Not a lawyer, so probably way off here…

But, for example, reddit’s success in part depends on Google ingesting their data — reddit shows up in Google searches all the time, which can only happen if Google uses reddit’s content. So reddit telling Google “you can’t use our content” doesn’t work, and they need to say something like, “you can use our content for search results but you can’t consume it as training data.”

This is a pretty straightforward statement/request/demand, but one could imagine Google lawyers maliciously complying and throwing their hands up dramatically, claiming “well we use some amount of AI in our search results, so if we can’t use your content for AI training then we can’t risk using it for search results.” Which would, I imagine, really, really hurt reddit (no Google results would be catastrophic I suspect).

So, perhaps the “low” 60M figure is just Google using their leverage.

Or not. As a random person on the Internet, I can say I’m probably not contributing anything meaningful here…

ivanafterall@kbin.social · 6 months ago

As part of the deal, spez will personally train the AI Jailbait Model.

ThyTTY@lemmy.world · 6 months ago

Can’t wait to see an AI chatbot in my Google searches that behaves like a typical redditor.

8ender@lemmy.world · 6 months ago

I mean one of the most popular search types on Google is <topic + Reddit> so not much would change

bobburger@fedia.io · 6 months ago

Everything you google is just going to direct you to a Let Me Google That For You link

Ilovethebomb@lemm.ee · 6 months ago

I love that site.

qjkxbmwvz@startrek.website · 6 months ago

This.

Asafum@feddit.nl · 6 months ago

“Hey Google AI, could you help me find a way to do _____?”

“Why the hell would you want that? Are you stupid? There’s like 15 better ways to accomplish what I think your peanut brain is trying to accomplish.”

“… nevermind Google.”

QuadratureSurfer@lemmy.world · 6 months ago

Just wait till the LLM starts “singing” randomly to you.

trolololol@lemmy.world · 6 months ago

– Hey Google/reddit, what does xxxxxx mean?

–Wtf is people so lazy, Google it yourself it’s only 5 seconds!

–But but, you are Google, are you not?

–Buahaha , haha!

cabron_offsets@lemmy.world · 6 months ago

AI be like “stfu regard”

Ben Hur Horse Race@lemm.ee · 6 months ago

AI be like there things are over they’re

🇰 🌀 🇱 🇦 🇳 🇦 🇰 ℹ️@yiffit.net · 6 months ago

Welp… Time to nuke it all.

Granite@kbin.social · 6 months ago

Grateful this is no longer my problem

SonicDeathTaco@lemm.ee · 6 months ago

*your posts

SharkAttak@kbin.social · 6 months ago

Oh no, my thousands of identical messages!

InFerNo@lemmy.ml · 6 months ago

You sir are a scholar and a gentleman.

I also choose this man’s wife.

trolololol@lemmy.world · 6 months ago

And my axe

Graphy@lemmy.world · 6 months ago

This

Elsie@lemmy.ml · 6 months ago

Scrolled too far down to find this

HopingForBetter@lemmy.today · 6 months ago

ChatGPT4: tl;dr The universe is bigger than we thought.

ChatGPT5: fuck spez

Player2@lemm.ee · 6 months ago

Wrong company, it would be Gemini in this case

garbagebagel@lemmy.world · 6 months ago

Can someone point me the way of that bot or whatever that changes all your old Reddit posts before deleting them? I thought I had it saved somewhere but I can’t find it now and have no idea what it’s called.

InFerNo@lemmy.ml · 6 months ago

They keep copies of posts because people who mass edited their posts saw them reverted or have people reply still as if they were not edited.

GBU_28@lemm.ee · 6 months ago

Plus they can easily just detect mass edits, and ship the state prior to that event.
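Purely as an illustration of how that could work (nobody outside Reddit knows what they actually do), here is a sketch that flags a burst of edits by one user and rebuilds the comments as they stood just before it. The `Edit` record, the threshold of 3 comments, and the one-hour window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit-log entry for a single user's comment."""
    comment_id: str
    timestamp: float  # seconds since epoch
    old_body: str
    new_body: str


def find_mass_edit_start(edits, min_comments=3, window_seconds=3600.0):
    """Return the timestamp where the earliest edit burst begins, or None.

    A burst is `min_comments` distinct comments edited within
    `window_seconds` of each other (both thresholds are made up).
    """
    ordered = sorted(edits, key=lambda e: e.timestamp)
    for i, first in enumerate(ordered):
        touched = {
            e.comment_id
            for e in ordered[i:]
            if e.timestamp - first.timestamp <= window_seconds
        }
        if len(touched) >= min_comments:
            return first.timestamp
    return None


def state_before_mass_edit(current, edits, **thresholds):
    """Rebuild comment bodies as they were just before the burst."""
    start = find_mass_edit_start(edits, **thresholds)
    restored = dict(current)
    if start is None:
        return restored
    # Undo newest first, so the oldest in-burst edit has the final say.
    for e in sorted(edits, key=lambda e: e.timestamp, reverse=True):
        if e.timestamp >= start:
            restored[e.comment_id] = e.old_body
    return restored


edits = [
    Edit("c1", 100.0, "hello", "hello!"),            # ordinary typo fix
    Edit("c1", 1_000_000.0, "hello!", "fuck spez"),  # burst begins
    Edit("c2", 1_000_010.0, "orig2", "fuck spez"),
    Edit("c3", 1_000_020.0, "orig3", "fuck spez"),
]
current = {"c1": "fuck spez", "c2": "fuck spez", "c3": "fuck spez"}
restored = state_before_mass_edit(current, edits)
```

Note that the ordinary typo fix on `c1` survives, since it happened long before the burst.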

BassTurd@lemmy.world · 6 months ago

I had read that with some people, it was a delay from their server instance between read/write and in the end the changes did end up sticking, but I don’t know if that was true. A lot of people were mass editing at the same time, and since editing isn’t something that happens super frequently, it might have had less priority in the stack and caused backups.

Grimy@lemmy.world · 6 months ago

They change it on their website but the data that’s collected and sold isn’t changed.

It still devalues their Google search results, though, and it also makes it harder to scrape data for free and ups the value of what they are selling.

Blaze@dormi.zone · 6 months ago

https://shreddit.com/ ?

stanleytweedle@lemmy.world · edit-2 6 months ago

I deleted my comment history after the API exodus. I’m sure they could dig it up if they wanted but at least they’ll have to click like 3 more buttons if they want to train AI on my nonsense.