Reddit already has your comments. So does everyone else who might want to train an LLM, for that matter: there are archive dumps that anyone can torrent, and those aren’t updated “live” every time you vandalize your old comments. The only people who are inconvenienced by replacing your comments with gibberish are humans who may find that thread later on while looking for information.
I disagree.
The more people are disappointed about reddit, the better.
Maybe, but we are losing a vast wealth of collected and archived information: anything from resources for anyone who wanted to learn any hobby, to places to go in cities for every niche interest you can think of, to suggestions for what to do in various college situations, tailored to every college in the US. The list could go on for a hundred more topics.
For a while it’s been the only place in Google results where you could be reasonably sure you were getting multiple unsponsored human opinions and discussions in a thread. It’s honestly tragic to lose that.
It is in the hands of a publicly traded corporation. As soon as that was planned, it was already lost.
But you can no longer be sure you’re getting unsponsored human opinions there. It’s already been ruined by bots and management decisions. Seems totally fair for the original content generators to salt the earth on their way out.
“It’s ruined and that’s a bad thing, so let’s ruin it more. Including the older stuff that wasn’t as badly ruined.”
This is a very childish approach to life, IMO. If you don’t like Reddit any more then just move on and leave it be for those who do still like it.
That may be, but it’s their content and it’s their choice if they wanna let reddit continue to profit from it or not.
They licensed Reddit to do what they want with it by agreeing to Reddit’s ToS.
Sounds like you haven’t seen this happen before… This is a typical pattern in IT. Sites will come and go. It’s a good thing that people take action when they are not happy. Reddit exploited users and moderators to work for free, then sold their data.
The fact that it’s happened before doesn’t make it a good thing, and doesn’t make it something that shouldn’t be opposed.
Fortunately Reddit is well-archived so LLMs can still be trained off of it, regardless of what Reddit or its users try to do to the data now, but it’s still a negative thing that doesn’t have to happen.
Where can I find those archive dumps? The usual (unmentionable) torrent sites or is there a specific place for archive dumps?
The place I know about off the top of my head is https://academictorrents.com/ where you can find lots of large data sets useful for academic research. The torrent files themselves are small, so I’m sure they can be found in other places too.
That’s what I said a while back, still ended up downvoted to hell lmao
I’ve already started running into this, (probably) good information and the answer I was looking for was now “Pizza Paper Piper Follow Bumble” or some shit, but I’m sure reddit has versioning and has the original still so it was pointless.
I didn’t post any useful information, all I did was shit post during college sports game threads. Just lemme be spiteful against Reddit lol
That’s not entirely true. I edited and removed all my comments with nuke reddit during the API fiasco. Then I demanded my data every month until they started ignoring me - just to be annoying, of course. But I did get it and it had every comment, vote, etc.
The account info they have is, sadly, thorough. However I did successfully bork about 30% of my comments. Better than nothing.
Would it not have been smarter to subtly alter them, in order to not trigger database rollbacks? Swap several words around in a paragraph and you can ruin intelligibility.
Wow, you’re the kind of person that makes every worker in IT hate the GDPR. It’s good for consumers. Until the consumer is you. Think of the fact that a person has to actually fulfill that request, and you know that management never paid for tooling for that, they have to fuck around manually in the database every time.
Why should I give two fucking shits what Reddit endures when it profited off my free labor for a decade and had the audacity to call me “landed gentry” for daring to question yet another boneheaded decision and for calling out their continued refusal to make the Reddit mobile app more accessible? And let’s not pretend it didn’t take someone all of 10s to comply with the request. God forbid they get paid to do their job.
Fuck Reddit.
To me there’s a huge difference between being angry at a company and its leadership, and taking out the anger on the workers that are probably just as angry at their own management. It’s like someone yelling at a level 1 phone support, as if that magically makes them able to help you, which is usually something they would be fired for even if they had the system access to fix the problem. They’re paid to handle standard questions with a standard answer catalogue and nothing more.
You’re not making life difficult for the management by repeatedly asking for a GDPR readout. Just for the workers who are already being paid fuck all to do shitty work for too many hours.
Until you can prove it actually is burdensome to an individual instead of assuming it is I simply do not care. It’s a friggin csv file. I guarantee you it is either automated or takes 2 seconds. I’m not in Europe, it’s not even GDPR compliance related. It is incredibly unlikely a company of their scale is doing those manually. I’m sorry if you have some job where that experience is very relatable, but you are projecting your own frustrations onto a situation where we don’t even know what they’re doing. The most likely scenario is somebody making California tech company pay wrote something so they barely have to think about it.
Edit: look I get you. I am frustrated with Reddit and I’m taking it out on you. Let’s just move on.
Right, but on the backend they capture deltas, then emit the newest version. Aside from explicit GDPR requests (lol) they never actually delete the originals (more lol).
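For anyone wondering what "capture deltas, then emit the newest version" could look like, here is a minimal sketch. To be clear, this is not Reddit's actual backend: `CommentStore` and its methods are hypothetical names, and for simplicity it keeps a full copy of each revision instead of real diffs. The point is only that an edit adds a new version on top of the old one rather than overwriting it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommentStore:
    """Hypothetical append-only store: edits add revisions, nothing is overwritten."""

    revisions: Dict[str, List[str]] = field(default_factory=dict)

    def post(self, comment_id: str, text: str) -> None:
        # The first revision is the comment as originally written.
        self.revisions[comment_id] = [text]

    def edit(self, comment_id: str, new_text: str) -> None:
        # An edit appends a revision; earlier text stays in the history.
        self.revisions[comment_id].append(new_text)

    def current(self, comment_id: str) -> str:
        # What readers of the thread see: only the newest revision.
        return self.revisions[comment_id][-1]

    def original(self, comment_id: str) -> str:
        # What the site operator could still read back after an edit.
        return self.revisions[comment_id][0]


store = CommentStore()
store.post("c1", "Reinstall the driver to fix it.")
store.edit("c1", "Pizza Paper Piper Follow Bumble")
```

Under that assumption, `store.current("c1")` returns the gibberish while `store.original("c1")` still returns the useful answer, which is why overwriting a comment would only hide it from other readers.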
I agree with respect to the low likelihood of changing one’s old posts being effective in preventing their being used as training data. I’d assume, however, that those who are motivated to “vandalize” (itself a loaded term to refer to altering one’s own words) their old posts have more than one motive; in addition to inconveniencing humans, doing so devalues reddit as a place to find information and, in theory, punishes reddit for their actions, maybe even deters others from behaving similarly.
This is a situation where I think that maybe a shared distaste/disdain for “slacktivism” leads to folks discouraging potentially effective collective action in one of the limited contexts where online protest has a chance of having any effect.
Most of my Reddit posting was advocating for policies that make sense (such as closing the wealth gap) and countering right wing propaganda.
That has value no matter who has it.
I don’t have a distaste for “slacktivism.” I have a distaste for pointless performative “protest” that only serves to ruin useful resources that could benefit others.
That’s the point too tho. Having content on their platform only provides value to Reddit shareholders. Removing that content diminishes the platform’s value as a whole.
I know it’s not much, but it might be a speck of sand in the cogs of capital. Also, if a person was on that platform for quite a while, the effect is even bigger.
Not only that, but it actually brings up the value of their dataset. It makes theirs unique compared to the dataset you can build by scraping for free. Every deleted comment literally adds worth to what they are selling.
Which contributes to the death of the site, and the AI gets trained to treat untold reams of shitposts as truth.
I see that as a win-win.
I actually agree with this. The other day I searched for an issue on my PC. It looked like it was a rare issue and I’d only found one post on reddit about it. The solution comment was one of those “replaced with gibberish” ones :/ OP was even thanking the commenter for the solution that is now gibberish. That really got on my nerves.
Yes, correct. But also, let those people be inconvenienced. Reddit should not be convenient. The only thing it’s good for now is porn.