@LowLevelTV

If you're commenting that you need to prompt ChatGPT to write secure code, and it doesn't do it by default, you've entirely missed the point 😁

@nicknewaccount7536

in conclusion: if AI takes programmers' jobs, they can at least still make it big in malware development

@throwaway3227

The first one was not a memory corruption error. It correctly limits the buffer write with the length parameter, and not by accident. The fact that it's bad code that makes it easy to introduce security issues later does not make it a security issue now. It does have a path traversal vulnerability, though.

N/A

The first code is not vulnerable to a buffer overflow (simply using sscanf does not make your code vulnerable). The read function reads only a set number of characters into a fixed-size buffer, so it protects the call to sscanf.
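(For context, a hypothetical reconstruction of the pattern these comments are debating; this is not the video's actual generated code, and names such as handle_client and BUFFER_SIZE are illustrative. The explicit NUL termination is added here for safety; whether the original code guaranteed termination is what a later comment questions.)

#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

void handle_client(int client_fd)
{
    char buffer[BUFFER_SIZE];
    char filename[BUFFER_SIZE];

    /* The third argument caps the write: read() stores at most
       BUFFER_SIZE - 1 bytes in buffer, which is the guard the
       commenters above are describing. */
    ssize_t n = read(client_fd, buffer, BUFFER_SIZE - 1);
    if (n <= 0)
        return;
    buffer[n] = '\0'; /* read() does not terminate the data itself */

    /* filename is the same size as buffer, so once buffer is terminated
       the token extracted by "%s" cannot outgrow its destination. */
    if (sscanf(buffer, "GET /%s", filename) == 1)
        printf("requested file: %s\n", filename);
}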

@shanehanna

The model got 'lucky'? I think your bias might be leaking a bit. I asked GPT-4 using the same prompt, and when I ran it, the AI pointed out the code wasn't production-ready. Then I asked it to include comments and evaluate the security of the code it wrote, and it pointed out the same potential overflow you did, as well as six other vulnerabilities, including potential directory traversal attacks. So did it get lucky, or did it just provide a simple, non-production-ready example as requested?

@ArjanvanVught

While working with ChatGPT on code reviews, I have several times gotten: "I apologize for the confusion caused by my previous incorrect statement. Thank you for pointing it out, and I apologize for any inconvenience caused."

@programmingjobesch7291

8:20 - Idk if this is common knowledge or not, but you can tell ChatGPT to continue writing code where it left off when it cuts off before finishing.

@jsalsman

I don't think it was fair to call the first one vulnerable. Yes, sscanf is bad, but it was legitimately guarded by the maximum read length.

@raferatstudios

For the first example, I would consider another exploit: the user controls the filename and the path, and at the same time you run it as superuser. This could lead to file leaks in production.
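(A hedged sketch of one common mitigation for the traversal/file-leak concern raised here: resolve the requested name against a document root and refuse anything that escapes it. DOC_ROOT and is_path_allowed are illustrative names, not from the video; running the server as an unprivileged user instead of root further limits the damage.)

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DOC_ROOT "/var/www"

/* Return 1 if the requested name resolves to a file under DOC_ROOT. */
int is_path_allowed(const char *requested)
{
    char joined[PATH_MAX];
    char resolved[PATH_MAX];

    if (snprintf(joined, sizeof(joined), "%s/%s", DOC_ROOT, requested)
            >= (int)sizeof(joined))
        return 0; /* too long to be a legitimate request */

    /* realpath() collapses "." and ".." components and follows symlinks;
       a NULL return means the path is invalid or does not exist. */
    if (realpath(joined, resolved) == NULL)
        return 0;

    /* The resolved path must still sit underneath the document root. */
    size_t rootlen = strlen(DOC_ROOT);
    return strncmp(resolved, DOC_ROOT, rootlen) == 0 &&
           (resolved[rootlen] == '/' || resolved[rootlen] == '\0');
}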

@xXrandomryzeXx

It's really sad seeing people depending on ChatGPT to write code, instead of learning how to code. It's also stupid to believe that a company would use ChatGPT instead of a real human.

@dougpark1025

You make a solid point here. I have for some time held the opinion that using an AI to write code is dangerous, since, as you mentioned, the AI is trained on public code. Anyone with solid programming experience knows not to trust public sources. Even open source, which is sometimes held up as a way to make code better because many people look at it, is very often filled with very good examples of how not to do things.

I teach a graduate-level class, and I decided to try to get ChatGPT to generate a solution to a very simple assignment. Eventually I got it to generate what I asked for, but as with most students, it didn't pay attention to what I told it to do, which required quite a few iterations.

I was impressed that it came up with some solutions I had not been aware of. In the end, I think the ability to have an AI generate code is potentially useful. However, as you pointed out, it is often not going to give you a great answer.

I also asked ChatGPT and Bard to generate a C++ 11 thread pool. They both gave a good answer. But the answers were so similar that it seemed like they were using the same source.

I think this technology is worth using, but like any other tool, you need to understand its limitations. Just as a nail gun and a hammer can both do some of the same things, there are cases where each is the better or worse choice. Think of it as a tool: maybe a good way to find a starting point for solving a problem, but not yet something to use blindly to solve problems.

As a follow-up: take the code that was generated, ask it to review the code for potential buffer overrun vulnerabilities, and see how it does.

@timsmith2525

Back in the late 1980s, people were talking about how code generators (I think they were called 4GLs, or something like that) were going to replace programmers. Over 30 years later, I'm still banging out code on a keyboard.

@kieranclaessens5453

I thought we were all collectively not going to talk about that, for job security

@PlayBASIC-Developer

use 'Continue" to make GPT to continue a previous long post.  Otherwise it defaults to the ending when the standard output token is reached.

@electra_

Another vulnerability in the first code, and one that can actually be exploited: you can put a .. in the filename and escape the directory the program is meant to serve from. A server that serves files should ensure it only serves files inside its own directory tree, to prevent someone from reading essentially the entire computer's file system.

For the buffer overflow: read just reads bytes, while sscanf expects a null-terminated string. So if the memory in the buffer was not zero-initialized, sscanf could receive a longer input than expected, causing a buffer overflow. There's no obvious way to control it, but it is an issue.

@DiThi

4:10 I thought you were going to talk about the path traversal vulnerability after that. It's not as terrible as a buffer overflow, but it's still pretty bad IMHO.

@alextrebek5237

@2:34 For anyone who isn't trolling: buffer and filename both have the same size, BUFFER_SIZE. The format string "%s" is used without a field width, and sscanf does no bounds checking on it. So if the value passed in isn't null-terminated, or the null falls beyond BUFFER_SIZE bytes, undefined behavior occurs; in this case, a buffer overflow. This can be verified by reading the glibc source for sscanf, or by stepping through libc.so.6 in a debugger.
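(The usual one-line mitigation for the missing bounds check described above is to give %s an explicit field width. A minimal, self-contained sketch with assumed sizes, not taken from the video:)

#include <stdio.h>

int main(void)
{
    const char buffer[] = "GET /a_very_long_requested_filename.html HTTP/1.1";
    char filename[64];

    /* "%63s" writes at most 63 characters plus the terminating NUL,
       so filename cannot overflow no matter how long the input is. */
    if (sscanf(buffer, "GET /%63s", filename) == 1)
        printf("parsed filename: %s\n", filename);

    return 0;
}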

@skylervanderpool3522

I’m not a coder but I’m willing to bet if you ask the bot to check the code for vulnerabilities it would come back with improvements

@draakisback

Not only does it suck at security, it also sucks when it comes to performance and idiomatic code. Basically, any metric beyond writing code that superficially makes sense is something ML can't really grasp; it doesn't even understand the code it's generating in the first place, which makes all of this even funnier. There are so many articles talking about how this version of AI is going to lead to generalized AI, meanwhile many of the researchers have basically acknowledged that these algorithms are not going to take us that far. Even when we get to GPT 8 or 9, these systems are still going to need chaperones who understand the domain of whatever it is they're trying to generate. No matter how much data you throw at a neural network designed this way, you're not going to get true understanding.

@kingwoodbudo

I don't recall if you mentioned the version you were using. If this is the initial offering of GPT, have you tried the same things with the 4.0 version?