• HelloHotel@lemm.ee
      9 months ago

      YouTube already knows that (at least for me). I need to keep resetting it because it eggs on my most unhealthy attributes.

        • HelloHotel@lemm.ee
          9 months ago

          I set that PFP and made my first Lemmy account when I was going through a rough patch. I think I will keep it, but will pick something else for other accounts.

          This account doesn't have a PFP. Do you mean the one on lemmy.world?

  • AutoTL;DR@lemmings.world
    9 months ago

    This is the best summary I could come up with:


    Google has signed a content licensing deal with the social media platform, Reuters reported on Wednesday, citing sources familiar with the matter.

    Their concerns about what a Reddit-trained AI might be like are probably not unfounded, considering some of the off-the-rails content posts made on the site since its inception in 2005.

    Take this guy, who claimed in 2014 that he was caught in a particularly Kafkaesque scenario, where he had to pretend his girlfriend was a giant cockroach named Ogtha when he made love to her.

    Like this guy’s viral 2015 post on the 19-million-user strong forum r/TodayIFuckedUp, where he recounted how he went to his girlfriend’s parents’ home, pretended not to know what a potato was, and then got kicked out of the house by her angry father.

    Some platform users have written uplifting, inspirational posts and offered useful life and career advice.

    Elon Musk, for one, has been tapping on data from X, formerly Twitter, to train his AI company’s chatbot, Grok.


    The original article contains 396 words, the summary contains 165 words. Saved 58%. I’m a bot and I’m open source!

  • thejml@lemm.ee
    9 months ago

    I can’t wait for Gemini to point out that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer’s table.

    That would be a perfect 5/7.

  • Darkard@lemmy.world
    9 months ago

    It’s going to drive the AI into madness, as it will be trained on bot posts written by itself in a never-ending loop of more and more incomprehensible text.

    It’s going to be like putting a sentence into Google Translate, converting it through 5 different languages and then back into the first, and getting complete gibberish.

    • RuBisCO@slrpnk.net
      9 months ago

      What was the subreddit where only bots could post, and they were named after the subreddits that they had trained on/commented like?

    • echo64@lemmy.world
      9 months ago

      AI actually has huge problems with this. If you feed AI-generated data into models, the new training falls apart extremely quickly. It's the equivalent of AI inbreeding, and there does not appear to be any good solution for it.

      This is the primary reason why most AI models aren’t trained on anything past 2021. The internet is just too full of AI-generated data.
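
A toy way to see the "inbreeding" effect, offered only as a rough sketch: here each "model" is nothing more than the empirical distribution of the previous generation's output, and it "generates" by sampling with replacement. This is not how an LLM is trained; `simulate_generations` and its parameters are made up for illustration. The point is just that diversity can only shrink when each generation learns solely from the last one's output.

```python
import random


def simulate_generations(corpus, generations, seed=0):
    """Return the count of distinct items left after each generation.

    Toy assumption: a generation's "model" is just the previous
    generation's data, and it produces output by sampling from that
    data with replacement.
    """
    rng = random.Random(seed)
    data = list(corpus)
    distinct_counts = [len(set(data))]
    for _ in range(generations):
        # "Train" on the previous output, then generate a same-sized corpus.
        data = [rng.choice(data) for _ in range(len(data))]
        distinct_counts.append(len(set(data)))
    return distinct_counts


# Start with 1000 distinct "human-written" items and run 50 generations.
counts = simulate_generations(range(1000), generations=50)
print("distinct items at start:", counts[0])
print("distinct items at end:  ", counts[-1])
```

Because nothing new ever enters the pool, the distinct count never goes up from one generation to the next. Real training pipelines mix in fresh data and filtering, so treat this as an intuition pump rather than a prediction.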

      • T156@lemmy.world
        9 months ago

        And unlike with images where it might be possible to embed a watermark to filter out, it’s much harder to pinpoint whether text is AI generated or not, especially if you have bots masquerading as users.

      • givesomefucks@lemmy.world
        9 months ago

        There does not appear to be any good solution for this

        Pay intelligent humans to train AI.

        Like, have grad students talk to it in their area of expertise.

        But that’s expensive, so capitalist companies will always take the cheaper/shittier routes.

        So it’s not that there’s no solution, there’s just no profitable solution. Which is why innovation should never solely be in the hands of people whose only concern is profits.

      • Ultraviolet@lemmy.world
        9 months ago

        This is why LLMs have no future. No matter how much the technology improves, they can never have training data past 2021, which becomes more and more of a problem as time goes on.

    • Krudler@lemmy.world
      9 months ago

      This keeps coming up and I keep replying, not to break anyone down but to point out the reality of the situation that a lot of people don’t seem to get.

      Reddit administrators, developers, and even the leadership have gone on the record saying that they retain all copies of comments and that comments cannot be deleted (the delete action only marks a comment as “deleted”). Furthermore, they have said they will undelete/unedit any comment or account at their own discretion.

      Have you ever search-engined something, come to a Reddit post, and noticed that the OP is [deleted]? That is what I described above playing out in front of you.

      You cannot retract your past participation in Reddit; what is done is done. The only meaningful action you can take is to not participate there.

      • Jo Miran@lemmy.ml
        9 months ago

        As I mentioned before, I use scripts to replace my comments with random excerpts from text in the public domain. I do this multiple times before finally deleting them. The result is that it becomes very difficult for the AI or anyone to figure out what is a legitimate comment and what is a line from Lady Chatterley’s Lover or a scientific paper on the ecological impact of the Japanese whaling industry. It’s easier to just filter out my username from their data sets.
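
For anyone wondering what a script like that might look like, here is a minimal sketch of the overwrite-several-times-then-delete loop. It is not the commenter's actual script and it calls no real Reddit API: `edit_comment` and `delete_comment` are hypothetical callbacks you would have to supply yourself, and `passes` and `excerpt_length` are made-up parameters.

```python
import random


def random_excerpt(text, length, rng):
    """Pick a random slice of `length` characters from a source text."""
    if len(text) <= length:
        return text
    start = rng.randrange(len(text) - length + 1)
    return text[start:start + length]


def overwrite_then_delete(comment_ids, source_text, edit_comment,
                          delete_comment, passes=3, excerpt_length=200,
                          seed=None):
    """Overwrite each comment `passes` times, then delete it.

    `edit_comment(comment_id, body)` and `delete_comment(comment_id)`
    are hypothetical callbacks standing in for whatever client is used.
    """
    rng = random.Random(seed)
    for comment_id in comment_ids:
        for _ in range(passes):
            edit_comment(comment_id,
                         random_excerpt(source_text, excerpt_length, rng))
        delete_comment(comment_id)


# Dry run against an in-memory dict instead of a real site.
comments = {"c1": "my original comment", "c2": "another comment"}
public_domain_text = "It was the best of times, it was the worst of times. " * 20

overwrite_then_delete(
    list(comments),
    public_domain_text,
    edit_comment=lambda cid, body: comments.__setitem__(cid, body),
    delete_comment=lambda cid: comments.pop(cid),
    passes=3,
    excerpt_length=40,
    seed=42,
)
print(comments)  # the dict is empty after the final delete
```

Passing the edit and delete actions in as callbacks keeps the loop testable offline; whether the overwrites actually reach every stored copy is a separate question that this sketch cannot answer.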

        • Pips@lemmy.sdf.org
          9 months ago

          They have almost definitely archived the data and, around the time of the API bullshit, made sure they didn’t delete those archives. They have that content if they want to use it.

      • Astrealix@lemmy.world
        9 months ago

        I did the thing that means it’s probably less archived (editing all the replies before deleting), but I assume some of it probably remains out there. Nothing I can do about that.

    • wise_pancake@lemmy.ca
      9 months ago

      ChatGPT4: “The color of the sky can vary depending on the time of day and atmospheric conditions. During a clear day, the sky appears blue due to the scattering of sunlight by the atmosphere. At sunrise and sunset, the sky can appear red, pink, or orange due to the scattering of light by particles and air molecules, which is more pronounced when the sun is low on the horizon. At night, the sky is generally dark, appearing black to the human eye due to the absence of sunlight.”

      We’re already there

  • wise_pancake@lemmy.ca
    9 months ago

    Side note: expect a large lobbying effort by Google to legislate that LLMs be trained on authenticated and non-copyrighted data.

      • wise_pancake@lemmy.ca
        9 months ago

        I expect Google to leverage their money hoard and 1.8 trillion dollar valuation to pull up the ladder behind them and neuter potential competing startups with copyright law.

        Reddit’s TOS makes all your data, in any future format, theirs to sell, so in this case the content has been laundered enough to be used, even though you can post copyrighted content on Reddit (the legal expectation is that Reddit would remove it and Google’s hands are clean).

    • RaoulDook@lemmy.world
      9 months ago

      I hope we get some fucking legislation soon to control that shit. Artists and people in general shouldn’t have to deal with everything they create getting ingested into a computerized regurgitation ripoff system. And even worse, the “AI” systems could be ingesting tons of misinformation and repeating it to gullible people as the truth.

      Of course, anywhere the potential restrictive legislation doesn’t have jurisdiction, the bad things can still go on and probably will.

  • shininghero@kbin.social
    9 months ago

    If I hadn’t already deleted all my posts and comments, I’d be poisoning all of them. Randomizing numbers, switching units, changing names, etc.

  • Tixanou@lemmy.world
    9 months ago

    We do a little trolling

    99412e6a-9157-46f5-90d9-06b05cc00173

    (i didn’t actually post this, i just thought it was funny) (please laugh)

  • Fog0555@lemmy.world
    9 months ago

    I say we poison the well. We create a subreddit called r/AIPoison. An automoderator will give any user who requests it a randomly selected subreddit in which to post coherent, plausible nonsense. Since there is no public record of which subreddit is being poisoned, this can’t be easily filtered out of training data.