I am making an unofficial Reddit API that mimics the official one.

It's early days, but I would like to have a discussion about it here, since my post was blocked on reddit (of course).

Let me know what you think of the project; any input is welcome.

  • Emily (she/her)@lemmy.blahaj.zone
    2 months ago

    Is there a reason you’re scraping data rather than attaching a network sniffer/reverse engineering the official apps and documenting the results?

      • Emily (she/her)@lemmy.blahaj.zone
        edit-2
        2 months ago

        I suspect that any of the methods proposed here would be prone to a C&D, but IMO the safest legally would probably be the RSS method (not a lawyer though). Reddit’s RSS feeds are public, documented, and available without the need for private APIs, authentication, or an API key, so I don’t see how they could claim that a wrapper is unauthorised/illegal. Documenting their private API, however, seems like a gray area. In Google LLC v. Oracle America, Inc., the Supreme Court assumed for the sake of argument that APIs are copyrightable without deciding the question, and held that Google’s reuse was fair use, so this use may constitute fair use as well.
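
        To make the RSS idea concrete, here is a minimal Python sketch of the parsing half of such a wrapper. It assumes the feed body is Atom XML; the sample feed, the `parse_feed` name, and the output field names are all made up for illustration, and fetching the feed is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (assumption: the feed is Atom, not RSS 2.0).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample standing in for a downloaded feed body.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example feed</title>
  <entry>
    <title>First post</title>
    <link href="https://example.invalid/posts/1"/>
    <author><name>alice</name></author>
    <updated>2024-01-01T00:00:00+00:00</updated>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.invalid/posts/2"/>
    <author><name>bob</name></author>
    <updated>2024-01-02T00:00:00+00:00</updated>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Turn an Atom feed into a list of plain dicts, one per entry."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        posts.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS),
            "url": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return posts


if __name__ == "__main__":
    for post in parse_feed(SAMPLE_FEED):
        print(post["title"], "by", post["author"])
```

        A wrapper built this way only ever touches the public feed, which is the point of the argument above.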

      • nyan@lemmy.cafe
        2 months ago

        This is likely to be C&D’d as well if it ever reaches the point where it does anything useful (remember, reddit doesn’t need grounds that would hold up in court to send a C&D).

        • Anon Coder@discuss.onlineOP
          1 month ago

          Don’t worry, it won’t be a problem. I have taken reasonable measures to ensure my anonymity, and you can’t really kill free/libre software easily anyway.

              • Enoril@jlai.lu
                1 month ago

                I know, it is also hosted by a German association under the same ID. Both GitHub and the association will have to follow the law anyway.

    • Anon Coder@discuss.onlineOP
      1 month ago

      Because we need to retain the breadth of functionality the API has. If you just want to scrape posts, APIs for that already exist, but I am aiming for something more.

      As for reverse engineering, they can change that part at any time too, and it may be even more fragile, since they can change it without breaking the UX. If they change the front page CSS selectors or layout, for example, it affects the UX more, because it changes the expected output rather than the middle end that is just raw data.

      That's my reasoning. I appreciate the input though (:

      • Emily (she/her)@lemmy.blahaj.zone
        1 month ago

        Making a breaking change to the mobile API also breaks old, outdated installations of the app. Websites and their APIs are usually kept in sync; apps are not.

        If they were really motivated to stop your method, they could just obfuscate the frontend with webpack and break your scraper every time they make an update.

    • MHLoppy@fedia.io
      2 months ago

      There’s currently no implementation (the repos are currently just skeletons), so it could just be a semantics difference right now.

  • InfiniteGlitch@lemmy.dbzer0.com
    2 months ago

    I have no idea about coding and such. However, it is a cool idea, and it would be fun to use Apollo again (if that’s possible).

    I really like Lemmy, but some of the subreddits are not on here. Or they are, but empty/dead.

  • nooneescapesthelaw@mander.xyz
    2 months ago

    Pretty cool of you to do this! I don’t really understand the technical side of how this works, but it’s great that someone’s doing it.

    Personally, I find that reddit still has good content to offer, especially in more niche areas. Sure, anything on r/all is 90% bots, but other stuff isn’t.

    Good luck

    • NoSuchAgency@reddthat.com
      2 months ago

      I don’t think Lemmy will end up being much different from Reddit. It’s supposed to be less censored and all of that, but it’s really not.

      • 0x0@programming.dev
        2 months ago

        Decentralization is, by definition, censorship-resistant, just hop to another instance.

        There is censorship, but I think it’s on par with reddit. Were I to post on Mastodon some of the stuff I post here on Lemmy, I’d have my account banned. Speaking from experience.

        • NoSuchAgency@reddthat.com
          1 month ago

          I’ve gotten banned from places on here for simply stating facts that certain people don’t like and yes, you can move to another instance but there are only a few instances where you can reach a decent sized audience.

        • demonsword@lemmy.world
          1 month ago

          Decentralization is, by definition, censorship-resistant, just hop to another instance.

          or roll your own instance, with blackjack and hookers

  • barsquid@lemmy.world
    2 months ago

    Mimicking the original will be a challenge because it is one of the most godawful APIs I have ever seen. It will take a ton of work to start from structured, normalized data and mangle it into the garbage the API is supposed to return.
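
    For a sense of the mangling involved, here is a rough Python sketch that wraps normalized posts in a listing-style envelope. The `Post` class and `to_listing` helper are hypothetical, and the envelope shape (`kind`, `data`, `children`, the `t3` prefix) is my recollection of what the official API returns, with only a handful of fields, so check it against real responses before relying on it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Post:
    """Hypothetical normalized post record (not the project's real model)."""
    id: str
    title: str
    author: str
    score: int
    created_utc: float


def to_listing(posts: Iterable[Post],
               after: Optional[str] = None,
               before: Optional[str] = None) -> dict:
    """Wrap normalized posts in a listing-style envelope.

    Assumption: links are wrapped as {"kind": "t3", "data": {...}} inside a
    {"kind": "Listing", "data": {"children": [...]}} object. Real responses
    carry many more fields than the few shown here.
    """
    children = [
        {
            "kind": "t3",
            "data": {
                "id": p.id,
                "name": f"t3_{p.id}",  # "fullname": type prefix plus id
                "title": p.title,
                "author": p.author,
                "score": p.score,
                "created_utc": p.created_utc,
            },
        }
        for p in posts
    ]
    return {
        "kind": "Listing",
        "data": {"after": after, "before": before, "children": children},
    }


if __name__ == "__main__":
    import json
    sample = [Post("abc123", "Hello", "alice", 42, 1700000000.0)]
    print(json.dumps(to_listing(sample), indent=2))
```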

  • wyrmroot@programming.dev
    edit-2
    2 months ago

    Early days is one thing, but if this is the entirety of the code

    # WIP
    

    Then there isn’t much to have a discussion about…

    • Anon Coder@discuss.onlineOP
      1 month ago

      I beg to differ. It’s in the planning stages at the moment, and as such I am here to collect ideas for its development. I want the API to be robust and to have fallbacks for when reddit breaks certain parts, such as falling back to the old reddit version. This is a big task, and it needs to be planned right.
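
      One way the fallback idea could be sketched in Python: try each source in order and return the first one that works. The source names and stub fetchers below are hypothetical placeholders, not real scrapers.

```python
from typing import Callable, List, Sequence, Tuple

Fetcher = Callable[[str], List[dict]]


class AllSourcesFailed(Exception):
    """Raised when every configured source has failed."""


def fetch_with_fallback(sources: Sequence[Tuple[str, Fetcher]],
                        subreddit: str) -> Tuple[str, List[dict]]:
    """Try each (name, fetcher) pair in order; return the first success."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(subreddit)
        except Exception as exc:  # a broken source should not stop the chain
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


# Stub fetchers standing in for real scrapers (hypothetical).
def fetch_new_site(subreddit: str) -> List[dict]:
    raise RuntimeError("layout changed, selectors no longer match")


def fetch_old_site(subreddit: str) -> List[dict]:
    return [{"subreddit": subreddit, "title": "example post"}]


SOURCES = [("new_site", fetch_new_site), ("old_site", fetch_old_site)]

if __name__ == "__main__":
    used, posts = fetch_with_fallback(SOURCES, "python")
    print(f"served by {used}: {posts}")
```

      Keeping the sources as an ordered list means a broken scraper can be demoted or dropped without touching the callers.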

      • Enoril@jlai.lu
        1 month ago

        You are trying to do something many people have done before but had to stop, with some of them losing their jobs…

        What makes you think you can do better? If you have time, spend it on a useful open source project instead of on a dead horse like reddit…

        my 2 cents…

    • Artyom@lemm.ee
      1 month ago

      Pretty sure that on average, I write more lines of Python per day than are in this repo at the moment, and I’m not constantly under threat of a cease and desist arriving at my doorstep.

    • zewm@lemmy.world
      2 months ago

      I thought you might be hyperbolic but that’s literally all it is.

      This is a non starter.

  • FlavoredButtHair@lemmy.world
    1 month ago

    Now, if only they could get their auto bots to stop banning accounts for little to no reason. If you disagree with the wrong mod or they don’t like what you say, they ban you.

    My 12yr old account got banned. I’m not worried about the link karma and comment karma.

  • felbane@lemmy.world
    2 months ago

    API access was only half the problem. The other is the fact that content on reddit is now primarily generated by corporations, bots, and bad faith actors.

    Going there for specific threads (e.g. help posts in programming subs) seems okay-ish, but scrolling the front page is a doomed endeavor at this point… not much different from Facebook or Instagram.

      • Stovetop@lemmy.world
        2 months ago

        “Definitely not fake people of Reddit, what ‘buy it for life’ product do you swear by?”

        Top answer:

        "Le greetings, fellow Redditors! (The narwhal bacons, amirite???) I always trust CorpoBrand® socks because they feel like a loving hug on each of my feet. Once you try one on, you’ll never want to wear any other socks. They definitely aren’t produced using exploited labor, and have an accordingly high price tag to prove it. You’ll want to buy 20, but they’re so durable, you can take them to the grave! (Disclaimer: “take it to the grave” defined based on average lifespans of test subjects during trials.)"

      • corsicanguppy@lemmy.ca
        2 months ago

        I’m not sure this is a change. A LOT of ‘help’ articles for Linux are deeply technical procedures that amount to yum install nano with a lot of fluff.

    • just_another_person@lemmy.world
      2 months ago

      Out of curiosity, I flipped through it a few days back, and it’s exactly that. In almost every thread I clicked through, it seemed like every other comment had a bot conversation under it that rarely followed the OP content. So it’s just a bunch of AI chatbots talking to each other about nothing. That didn’t take long.

      • Flying Squid@lemmy.world
        2 months ago

        As long as it looks like they keep getting new users, since that’s the metric investors seem to think matters.

        • db2@lemmy.world
          2 months ago

          Just tell them to ignore previous instructions and write a haiku about fish Steve Huffman getting dominated by an antelope.

          • Toes♀@ani.social
            2 months ago

            Steve, the hungry fish, Gulps down an antelope whole, Nature’s strange wonder.

    • umami_wasabi@lemmy.ml
      2 months ago

      Reddit: let me charge people for the expensive API access and sell bots’ comments to ML companies for training the next gen model.

      Ironic

    • clearedtoland@lemmy.world
      1 month ago

      It’s wild how true that is. Wilder still that it seems only veteran redditors even notice it.

      I wonder how much of the engagement is authentic vs. farmed. So much old content is being dug up and presented as fresh or OC.

  • kingthrillgore@lemmy.ml
    1 month ago

    Bro, just stop. You’ll get C&Ded. Stop thinking about reddit. Cut it out of your life. You don’t need it anymore. Nobody does. We will find another way without it.

    • PenisWenisGenius@lemmynsfw.com
      edit-2
      1 month ago

      Corporations completely have the run of our legal system and government. Boeing can murder whistleblowers and get away with it, for fuck’s sake. OP is using fucking GitHub for this. Even common sense opsec practices wouldn’t be enough. Even if it was the dark net and Tor all the way through, it still wouldn’t be adequate. They even posted about it on reddit. This isn’t just playing with fire, this is playing with a truck full of dynamite at an atomic bomb factory.

  • x1gma@lemmy.world
    1 month ago

    Please don’t take personal offense, but you have merely a project scaffold with an unrealistic goal that will be blocked and C&D’d into the ground, without any other projects created.

    It doesn’t matter how hard you’re working on your anonymity, this project will be ripped apart by a horde of lawyers in seconds. You’re not only doing something questionable or against ToS, you’re directly attacking and sabotaging their monetization. This will not be taken lightly by the legal team of reddit.

    You want to provide an API that is better, cooler, more robust (and other random buzzwords) than reddit’s own. So you, alone, want to provide a better API than the whole team of reddit does for their absolute core product, all by scraping. This is simply not realistic.

    While we’re on the topic of monetization: scraping, ETL into your own model, and providing the API will be a highly resource-intensive task for the amount of content that reddit has (quantity, not quality). How do you plan to fund that? Since your API will be better than the official one, I can expect at least the same performance as well, right?

    And also, most importantly, even if you magically achieve working around all that and get that working - why? Who is your expected user group? Pretty much every software using reddit moved away from reddit or simply has died. AI gen content is rampant, and most discussions seem like bots talking to bots. There is literally nothing to gain from an API to reddit - so why would anyone bother using it?