The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.

“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”

Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.

“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”

  • Nobody@lemmy.world · 6 months ago

    Billions and billions invested to produce accuracy slightly less than flipping a coin.

    • Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

      Considering that the average person would likely answer more slowly and less accurately (most people know exactly 0 programming languages), being correct almost half of the time in seconds is a pretty impressive performance.

  • Eheran@lemmy.world · 6 months ago

    Still the same shit study that does not even name the version they used…? The one posted here 1 or 2 days ago?

    • ForgottenFlux@lemmy.world (OP) · 6 months ago

      Still the same shit study that does not even name the version they used…?

      The answer to your question would be evident to you if you had taken the time to read what you are deeming “the same shit study.” The study mentions the version used on multiple occasions:

      For each of the 517 SO questions, the first two authors manually used the SO question’s title, body, and tags to form one question prompt and fed that to the free version of ChatGPT, which is based on GPT-3.5.

      Additionally, this work has used the free version of ChatGPT (GPT-3.5) for acquiring the ChatGPT responses for the manual analysis.

      Hence, for this study, we used the free version (GPT-3.5) so that the results benefit the majority of our target populations.

      Please ensure you have read the study before making uninformed remarks.

      The one posted here 1 or 2 days ago?

      I have already checked for duplicates within this community before posting, and the post you are talking about is not present.

      Once again, please ensure your facts are accurate before making incorrect statements.

    • d3Xt3r@lemmy.nz · 6 months ago

      In the footnotes they mention GPT-3.5. Their argument for not testing GPT-4 was that it was paid, so most users would be using 3.5 - which is already factually incorrect, because the new GPT-4o (which they don’t even mention) is now free. Finally, they didn’t mention GPT-4 Turbo either, which is even better at coding than 4.

      • PM_ME_YOUR_ZOD_RUNES@sh.itjust.works · 6 months ago

        Anyone can use GPT-4 for free. Copilot uses GPT-4, and with a Microsoft account you can do up to 30 queries. I’ve used it a lot to create Excel VBA code for work, and it’s pretty good. Much better than GPT-3.5, that’s for sure.

      • catloaf@lemm.ee · 6 months ago

        4 is free for a very small number of queries, then it switches back to 3.5. Or at least that’s what happened to me the other day.

  • corroded@lemmy.world · 6 months ago

    I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It does provide a great starting-off point for writing my own code.

    Some of the issues I’ve found (speaking as a C++ developer) are: variables not declared “const,” extremely inefficient use of data structures, ignored modern language features, ignored parallelism, improper data types, etc.
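
    To make that concrete, here’s a contrived sketch of my own (not actual ChatGPT output; the function names are made up) showing the “works, but isn’t good” pattern:

        #include <cstddef>
        #include <vector>

        // Typical chatbot-style output: compiles and works, but takes the
        // vector by value (a needless copy) and never reserves capacity.
        std::vector<int> evens_naive(std::vector<int> values) {
            std::vector<int> result;
            for (std::size_t i = 0; i < values.size(); i++) {
                if (values[i] % 2 == 0) {
                    result.push_back(values[i]);
                }
            }
            return result;
        }

        // The same function after the kind of cleanup pass I mean: const
        // reference parameter, reserved capacity, range-based for loop.
        std::vector<int> evens_reviewed(const std::vector<int>& values) {
            std::vector<int> result;
            result.reserve(values.size());
            for (const int v : values) {
                if (v % 2 == 0) {
                    result.push_back(v);
                }
            }
            return result;
        }

    Both versions behave identically; only the second would pass review.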

    ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.

    • stufkes@lemmy.world · 6 months ago

      This has been my experience as well. If you already know what you are doing, LLMs can be a great tool. If you are inexperienced, you cannot assess the quality nor the accuracy of the answers, and are using the LLM to replace your own learning.

      I like to draw the parallel to people who have learnt to paint only using digital tools. Their work often has a particular colouring that betrays a lack of understanding of colour theory. Because pipette tools mean that you never have to mix colours, you never have to learn to do so. Painting with physical paint isn’t superior, but it presents a hurdle (mixing paint) that is crucial to learn to overcome. Many digital-only artists will still have learnt on traditional media. Once you have the knowledge, the pipette and colour pickers are just tools, no longer inhibiting anything.

  • 1984@lemmy.today · 6 months ago

    Actually, the 4o version feels worse than 4. I’m getting tons of wrong answers now…

    • AIhasUse@lemmy.world · 6 months ago

      Yeah, it’s not supposed to be better than 4 for logic/reasoning/coding, etc… Its strong points are its natural voice interaction, its ability to react to streaming video, and its fast and efficient inference. The good voice and video are not available to many people yet. It is so efficient that it is going to be available to free users. If you want good reasoning, then you need to stick with 4 for now, or better yet, switch to something like Claude Opus. If you really want strong reasoning abilities, then at this point you need a setup using agents, but that requires some research and understanding.

  • efstajas@lemmy.world · 6 months ago

    Yeah, it’s wrong a lot, but as a developer, damn, it’s useful. I use Gemini for asking questions and Copilot in my IDE personally, and it’s really good at doing mundane text-editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.

    • Jimmyeatsausage@lemmy.world · 6 months ago

      Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially as bad as searching for useful documentation has gotten on Google and DDG.
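
      For instance, here’s the kind of boilerplate test it handles fine (a contrived sketch of my own; trim is a made-up example function, not any real API):

          #include <cassert>
          #include <string>

          // Made-up function under test: strips leading and trailing spaces.
          std::string trim(const std::string& s) {
              const auto first = s.find_first_not_of(' ');
              if (first == std::string::npos) return "";
              const auto last = s.find_last_not_of(' ');
              return s.substr(first, last - first + 1);
          }

          int main() {
              // The mundane-but-necessary cases an assistant will happily enumerate.
              assert(trim("  hello  ") == "hello");
              assert(trim("hello") == "hello");
              assert(trim("   ") == "");
              assert(trim("") == "");
              return 0;
          }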

    • InternetPerson@lemmings.world · 6 months ago

      That’s a good way to use it. Like every technological evolution, it comes with risks and downsides. But if you are aware of that and know how to use it, it can be a useful tool.
      And as always, it only gets better over time. One day we will probably rely more heavily on such AI tools, so it’s a good idea to adapt quickly.

    • person420@lemmynsfw.com · 6 months ago

      I tend to agree, but I’ve found that most LLMs are worse than I am with regex, and that’s quite the achievement considering how bad I am with them.

      • efstajas@lemmy.world · 6 months ago

        Hey, at least we can rest easy knowing that human devs will be needed to write regex for quite a while longer.

        … Wait, I’m horrible at Regex. Oh well.

  • Furbag@lemmy.world · 6 months ago

    People downvote me when I point this out in response to “AI will take our jobs” doomerism.

    • Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

      I mean, AI eventually will take our jobs, and with any luck it’ll be a good thing when that happens. Just because Chat GPT v3 (or w/e) isn’t up to the task doesn’t mean v12 won’t be.

      • reksas@sopuli.xyz · 6 months ago

        It could be a good thing, but the price for that is making being unemployed okay.

      • Furbag@lemmy.world · 6 months ago

        Yes, this is also true. I see things like UBI as an inevitable necessity, because AI and automation in general will eliminate the need for most companies to employ humans. Our capitalistic system is set up such that a person can sell their ability to work and provide value to the owner class. If that dynamic is ever challenged on a fundamental level, it will violently collapse, as people who can’t get jobs because a robot replaced them either reject automation to preserve the status quo or embrace a new dynamic that provides for the population’s basic needs without requiring them to be productive.

        But the way that managers talk about AI makes it sound like the techbros have convinced everybody that AI is far more powerful than it currently is: a glorified chatbot with access to unfiltered Google search results.

      • NoLifeGaming@lemmy.world · 6 months ago

        I’m not so sure about the “it’ll be good” part. I’d like to imagine a world where people don’t have to work because everything is done by robots, but in reality you’ll have some companies making trillions while everyone else goes hungry and becomes poor and homeless.

        • Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

          Yes, that’s exactly the scenario we need to avoid. Automated gay space communism would be ideal, but social democracy might do in a pinch. A sufficiently well-designed tax system coupled with a robust welfare system should make the transition survivable, but the danger with making that our goal is allowing the private firms enough political power that they can reverse the changes.

      • smnwcj@fedia.io · 6 months ago

        This begs some reflection: what is a “job,” functionally? What would be needed for losing it to be good?

        I suspect a system with jobs would not eradicate jobs, just change them.

      • assassin_aragorn@lemmy.world · 6 months ago

        That’s if it’s possible for AI to reach that level. We shouldn’t take for granted that it is.

        I was really humbled when I learned that a cubic mm of human brain matter took over a petabyte to map. It suggests to me that AI is nowhere close to the level you’re describing.

        • Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

          It suggests to me that AI

          This is a fallacy. Specifically, I think you’re committing the informal fallacy of confusing necessary and sufficient conditions. That is to say, we know that if we can reliably simulate a human brain, then we can make an artificial sophont (this is true by mere definition). However, we have no idea what the minimum hardware requirements are for a sufficiently optimized program that runs a sapient mind. Note: I am setting aside what the definition of sapience is, because if you ask 2 different people you’ll get 20 different answers.

          We shouldn’t take for granted it’s possible.

          I’m pulling from a couple decades of philosophy, conservative estimates of the upper limits of what’s possible, and some decently-founded plans on how it’s achievable. Suffice it to say, after immersing myself in these discussions for as long as I have, I’m pretty thoroughly convinced that AI is not only possible but likely.

          The canonical argument goes something like this: if brains are magic, we cannot say if humanlike AI is possible. If brains are not magic, then we know that natural processes can create sapience. Since natural processes can create sapience, it is extraordinarily unlikely that it will prove impossible to create it artificially.

          So with our main premise (AI is possible) cogently established, we need to ask: “since it’s possible, will it be done, and if not, why?” There are a great many advantages to AI, and while there are many risks, the barrier to entry for making progress is shockingly low. We are talking about the potential to create an artificial god, with all the wonders and dangers that implies. It’s like a nuclear weapon if you didn’t need to source the uranium; everyone wants to have one, and no one wants their enemy to decide what it gets used for.

          So everyone has the incentive to build it (it’s really useful), and everyone has a very powerful disincentive against forbidding the research (there’s no way to stop everyone who wants to build it, so a ban would only stop the people who’d listen, and those are the people most likely to make a friendly AI). So what possible scenario would mean strong general AI (let alone the simpler things that’d replace everyone’s jobs) never gets developed? The answers range from total societal collapse to extinction, all of which are worse than a bad transition to full automation.

          So either AI steals everyone’s job or something worse happens.

    • magic_lobster_party@kbin.run · 6 months ago

      Even if AI were able to answer all questions 100% accurately, it wouldn’t mean much either way. Most of programming is making adjustments to old code while ensuring nothing breaks. It’s gonna be a while before AI can do that reliably.

  • Dizzy Devil Ducky@lemm.ee · 6 months ago

    Rookie numbers! If we can’t get those numbers up to at least 75% by next quarter, the whippings will continue until misinformation increases!

  • Vespair@lemm.ee · 6 months ago

    Anyone else tired of these clickbait headlines and studies about LLMs that center on fundamental misunderstandings of how LLMs work, or is it just me?

    “ChatGPT didn’t get a single answer on my algebra exam correct!!” Well yes, because LLMs work on predictive generation, not traditional calculation, so of course they’re not going to do math or anything else with non-language-based patterns properly. That’s what a calculator is for.

    All of these articles are like complaining that a chainsaw is an inefficient tool for driving nails into wood. Yeah; because that’s not the job this tool was made for.

    And it’s so stupid, because there are a ton of legitimate criticisms of AI and the AI rollout to be had; we don’t have to look for disingenuous cases of misuse to critique it.

    • OhNoMoreLemmy@lemmy.ml · 6 months ago

      That would be fine if people weren’t using LLMs to write code or to do schoolwork.

      But they are. So it’s important to write these articles that say “if you keep using a chainsaw to drive nails, here are the limitations you need to be aware of.”

      • Vespair@lemm.ee · 6 months ago

        I see your point and I agree, except that that isn’t what these headlines are saying. Granted, perhaps that’s just standard sensationalism and clickbait rather than anything specific to this issue, but the point remains that while the articles may be as you claim, the headlines are instead presented as “A chainsaw can’t even drive a simple nail into wood without issue, and that’s why you should be angry anytime you hear a chainsaw.” I dunno. I’m just so exhausted.

  • foremanguy@lemmy.ml · 6 months ago

    We’ll have to wait a bit for a truly useful assistant (but maybe something like Copilot or more code-focused AIs are better).

  • OpenStars@discuss.online · 6 months ago

    So it is incorrect and verbose, but also comprehensive and well-articulated, all at the same time?

    Also, “study participants still preferred ChatGPT answers 35% of the time”, meaning that the overwhelming majority (two-thirds) did not prefer the bot answers over the human(e), correct ones, which maybe were not phrased as confidently as they could have been.

    Just say it out loud: ChatGPT is style over substance, aka Fox News. 🦊