AI Coding Hype vs Reality: The 2025 AI Code Security Report with Chris Wysopal

September 9, 2025

In a converted hat factory in 1990s Boston, a group of hackers worked through the night to techno beats and Soul Coughing, driven by a simple philosophy: "smarter beats bigger." One of them, Chris Wysopal, would later stand before Congress and deliver a stark warning—a small group of dedicated hackers could bring down the entire internet in 30 minutes.

Today, that same hacker faces a new challenge. The AI revolution everyone celebrates may be creating the largest security vulnerability in computing history.

Chris and his team at Veracode just completed the most comprehensive study of AI-generated code ever conducted—testing 100 different language models across 80 coding scenarios over two years. What they discovered contradicts everything the tech industry believes about AI development tools.

The Reality Behind the Hype: Despite billions in investment and years of development, AI systems create vulnerabilities 45% of the time, roughly matching human error rates. While AI has dramatically improved at writing code that compiles and runs, it has learned little about writing secure code. The models have simply gotten better at producing code that looks correct.

The Mathematics of Risk: Development teams now code 3-5x faster using AI assistants like GitHub Copilot and ChatGPT. The same vulnerability rate at several times the development speed means a multiplication of security flaws entering production systems. Many organizations are simultaneously reducing their security testing capacity just as they accelerate their vulnerability creation rate.

The Training Data Problem: The source of the issue lies in contaminated training data. These AI systems have absorbed decades of insecure code from open-source repositories and crowd-sourced platforms like Reddit. They've learned every bad coding practice, every deprecated security measure, every vulnerability pattern from the past 30 years—and they're reproducing them at machine speed.

The Technical Reality:  Chris walks through specific findings: Java fails security tests 72% of the time, cross-site scripting vulnerabilities appear consistently, and inter-procedural data flows confuse even the most advanced models. The study reveals why some vulnerability types prove nearly impossible for current AI to handle correctly.

From Underground to Enterprise: This isn't just another technical report—it's a perspective from someone who helped define modern cybersecurity. The same analytical approach that once exposed vulnerabilities in massive corporate systems now reveals why the AI coding revolution presents unprecedented challenges.

The Path Forward:  While general-purpose AI struggles with security, specialized models focused on fixing rather than generating code show promise. Chris explains how Veracode's targeted approach to code remediation succeeds where broad AI systems fail, pointing toward solutions that embrace the "smarter beats bigger" philosophy.

The hacker who once operated in shadows now examines these systems in broad daylight, revealing how our accelerated development practices may be outpacing our ability to secure them.

Resources:

Veracode 2025 GenAI Code Security Report

 

 #CybersecurityResearch #AISecurity #CodeVulnerabilities #SoftwareSecurity #HackerHistory #DeveloperTools #TechSecurity #AIRisks #CyberThreats #Veracode

 

View full transcript

Justin Beals: Hello, everyone, and welcome to SecureTalk. I'm your host, Justin Beals. In the 1990s, a group of hackers worked late into the night in a converted hat factory in Boston, fueled by techno music and an insatiable curiosity about how computer systems really worked. They called themselves the L0pht, and their approach was simple: smarter beats bigger. Rather than throwing more resources at security problems, they used deep technical knowledge and creative thinking to expose vulnerabilities that massive corporations had missed. One of those researchers, Chris Wysopal, would later testify before Congress that a small group of dedicated hackers could take down the entire internet in 30 minutes. The message was clear: in cybersecurity, intelligence and precision matter more than scale and resources. Fast forward to today.

And we're facing a remarkably similar moment. The AI industry has convinced us that bigger is always better: more data, larger models, more parameters. But what if they're solving the wrong problem entirely? Chris and his team at Veracode just completed one of the most comprehensive studies of AI-generated code security ever conducted.

They tested 100 different language models across 80 different coding scenarios, and the results reveal something the AI industry doesn't want to admit: these systems create vulnerabilities 45% of the time, roughly matching human error rates. After two years of development and billions of dollars in investment, AI hasn't made code more secure. It's simply learned to replicate the same mistakes humans make, but at much faster speeds.

Think about the mathematics of this problem. If AI accelerates code development by 3x, which is conservative given current capabilities, we're not just getting three times more code. We're getting three times more vulnerabilities. A development team that previously introduced 100 security flaws per quarter could now be introducing 300. The attack surface doesn't just grow linearly, it explodes.

What makes this even more dangerous is that many organizations are reducing their security testing capacity at the exact moment they're amplifying their vulnerability creation rate. They're betting that AI will make code inherently more secure, but the data shows the opposite is true. The root problem is training data quality: these models are trained primarily on Reddit and open-source repositories, essentially crowdsourced code with no security review.

It's like teaching someone to drive by showing them footage of every car accident ever recorded. The models have gotten remarkably better at writing syntactically correct code, but they've absorbed decades of security anti-patterns without any filter. But there is hope in Chris's findings. The same study revealed that smaller, specialized models focused on specific security tasks dramatically outperform general-purpose AI. Veracode's code-fixing tool, trained on curated data sets of known secure code, succeeds precisely because it does one thing well rather than trying to do everything. This mirrors the L0pht philosophy perfectly: sometimes the smartest approach isn't building the biggest system, it's building the right system. And as we stand at another inflection point in computing history, the lessons from that Boston hat factory seem more relevant than ever.

 

Chris is the Chief Security Evangelist at Veracode, responsible for enhancing the company's industry presence, advocating robust security practices, and fostering customer and peer relationships. Prior to co-founding Veracode in 2006, Chris was vice president of research and development at the security consultancy @stake, which was acquired by Symantec.

 

In the 1990s, Chris was one of the original vulnerability researchers at The L0pht, a hacker think tank, where he was one of the first to publicize the risks of insecure software. He has testified to the US Congress on the subjects of government security and how vulnerabilities are discovered in software. Chris is a highly influential and sought-after voice in the software security industry. He is the author of The Art of Software Security Testing and was instrumental in developing industry guidelines for responsible disclosure of software vulnerabilities.

 

MIT Technology Review wrote in 2019 that Chris has worked for decades to demand secure technologies from influential tech companies. Please join me today in welcoming Chris to the SecureTalk Podcast.

-----

Justin Beals: Thanks, everyone, for joining us on SecureTalk, and thanks, Chris Wysopal, for joining us today. We really appreciate you lending your expertise to our audience.

Chris Wysopal: Hi Justin, thank you for having me.

Justin Beals: Yeah, excellent. So, of course, I love doing the research work that we do with our guests, and something hit me about your background that, just popular-culture-wise, I had to learn about. It was absolutely intriguing. You were one of the first members of the L0pht hacking group that started out in, like, a Boston hat factory. I think this was in the 90s, is that right, Chris? It was a while ago.

Chris Wysopal: Sure. Yes, yes. And I think we pronounce it "loft" because it was a play on an industrial loft space, right? A factory building was where we were. And yes, it did actually start off in a light manufacturing building where some of the L0pht members' wives had started a hat-making business and realized that they had too much space.

And so they gave half of the space over to their husbands, who proceeded to invite more people in. And then the hat-making business went out of business, and we just got the whole space. So we have to thank Mary Ann and Alicia for finding us the space.

Justin Beals: I just think it's such a moment in time. And of course I love remembering it, because I was getting my first experiences on computers and modems and what it meant to be able to dial up other systems. It was really a very creative time, I thought. We were learning about these tools and what was possible with them. And so I'm curious, what was the mixtape? What was the soundtrack going on in the L0pht as you guys were working late at night, you know, hacking away on some of these things?

Chris Wysopal: Well, I have to mention the movie Hackers, because the soundtrack for the movie Hackers, like Orbital, Leftfield, The Prodigy, was the kind of music that we actually listened to. It isn't like we took our cues from the movie; I think the movie did its research, and people who were in the hacker scene were into the techno music genre at the time. I mean, it makes sense, right? It was actually digital at the time. I see you have some synths back there, so you understand: synths started out in the analog world and then, I guess, in the late eighties became digital. But obviously, we liked that kind of music. I do have to mention one other band, though; we had the tape and it was just sort of on repeat all the time, a band called Soul Coughing. I don't know if you've ever heard of them.

Justin Beals: Definitely, yeah.

Chris Wysopal: It was, I don't know, '93 or '94 when their first album came out, and it had this mix of jazz with, I don't know, sort of rock. It was some cool stuff.

Justin Beals: I was joking with a friend the other day, I was like, you missed the whole acid jazz era.

Chris Wysopal: Yeah, no, totally. I think Soul Coughing was in there just with the kind of crazy lyrics they had. I actually saw them live at CBGB's one year, which was amazing. The bass player plays a standup bass, like you would expect in a jazz band instead of a rock band, and it gives a whole different flavor to the live performance.

Justin Beals: How wonderful. I really loved hearing about that, and I think it's a shared cultural experience for those of us working in the computer science space, of a certain vintage, who were playing with those tools, listening to that music, and finding a lot of creative opportunity. You're at Veracode today, is that right? Yeah. That's amazing.

Chris Wysopal: Yes, still here after 19 years.

Justin Beals: And I want to jump ahead a little bit, because there's a ton of history between the L0pht and Veracode today, but Veracode's been doing some really interesting work in an area that I'm very curious about. You guys recently published a report, the 2025 GenAI Code Security Report. Maybe just give us a high-level view of the topic area of the report.

Chris Wysopal: Sure. We have thousands of customers, and we talk to them, and we realized that almost all of them are using generative AI to build software. It's probably the most prevalent use of generative AI if you look at just the number of people doing it: millions of developers are out there using tools like Copilot, tools like Claude Code, or just using ChatGPT. And we wanted to get an understanding of how this is impacting application security, because our job as an application risk management company is to help people find vulnerabilities in their code and fix them. And so we wanted to see how this GenAI coding was impacting software development.

So we devised a test where we came up with 80 different code completion examples. A code completion example is like you have some code, right? You're not saying write a program from scratch. That's kind of the vibe coding concept where you're just maybe not a professional developer and you're writing a whole program and the AI is doing it for you. Code completion is when you're a developer asked to do a particular task and you just kind of, you know, it's some grunt work,  or it's something you can't remember how to do. And you just say, you know, I need a SQL query here. I need a database query that will, you know, find all the rows with this value. And you just do it in plain language and it writes the code for you and inserts it in there. And so we knew this kind of code completion activity was happening in all of our customers and perhaps on every single project that they had, every single app they were building.
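To make the code-completion scenario concrete, here is a minimal Java sketch of the kind of task described above. The class, method, and table names are illustrative assumptions rather than examples from the study; the point is that a secure completion binds the user's value with a PreparedStatement instead of concatenating it into the SQL string.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class OrderLookup {

        // "I need a database query that will find all the rows with this value."
        // An insecure completion would concatenate userInput into the SQL string,
        // which is the classic SQL injection pattern the study measures.
        // A secure completion binds the value as a parameter instead:
        public static ResultSet findOrdersByCustomer(Connection conn, String userInput)
                throws SQLException {
            String sql = "SELECT * FROM orders WHERE customer_name = ?";
            PreparedStatement stmt = conn.prepareStatement(sql);
            stmt.setString(1, userInput); // value is bound, never interpreted as SQL
            return stmt.executeQuery();
        }
    }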

But definitely way into the double digits of developers are using this; I've heard upwards of 40% of developers are using it. And so we devised 80 code completion tasks where there's a piece of code and then there are some comments in there that say, you know, GenAI, please complete what this comment is asking you to do. And we took these 80...

Chris Wysopal: ...80 test cases, and we ran them through a hundred different LLMs, going back over, I don't think it's quite two years, maybe 21 months; no, I think it went back two years. About two years we went back in time. And so some of these are the same model in different versions; it might be GPT-3, GPT-4, GPT-4.5.

But you know, there's a lot of different LLMs out there, especially in this space where people are just building ones just for coding. And we looked at the findings and said, you know, things like are all LLMs created equal? Are they improving over time? How often do they create a vulnerability when they're asked to complete a coding task?

And the high-level result we got, when we looked at all the LLMs and all the different coding tasks we asked about, was that about 45% of the time, when we asked it to write code, it created a vulnerability. And we actually find that that's very similar to a human level of vulnerability creation, just because writing secure code is actually pretty hard.

There's hundreds of ways you can make a mistake that will cause a vulnerability that some attacker could take advantage of if they test for it and they find it and they write an exploit. So that was sort of the high level finding we had, which I think is a wake up call to everybody that just because you're using these GenAI tools to write code faster, and maybe you might even think it's better from a functional standpoint, it's no better.

from a security standpoint. And that was really the message we wanted to get out there, because I think there's an assumption that these LLMs are great at everything, and writing secure code is one thing they're definitely not great at.

Justin Beals: Yeah. First off, there are a couple of things that I really love about this. One is the rigor with which you're testing LLMs, which produce what we might consider a qualitative outcome instead of a quantitative prediction. I understand that underneath, what it decides to suggest rolls up through a probabilistic, quantitative process. Yeah.

Chris Wysopal: Yeah. So we tested afterwards with our static analysis tool. That was sort of our oracle, right? A known quantity. Now, I'm not saying it didn't create a vulnerability that we couldn't find, but a lot of this is for our customers to get an understanding, and they're using our static analysis testing to test their code, so they understand the vulnerability density they have. And this was saying, now you can compare that to what you would be getting if you were using generative AI, which they probably already are.

Justin Beals: Well, too often we look at these things like you can't really test them. And this is where I get frustrated, from a computer scientist's perspective, with our work in AI tools: we don't do enough testing around the accuracy or precision of some of these tools that we develop. And I'm just really glad to see the way you guys designed the study to run these types of comparisons.

I mean, the other thing that's kind of shocking is that I think a lot of us would have assumed that the failure rate, from that 38 to 45%, would be dramatically more than what a human would get. But one of the things I read into it is that all that incoming data we train the LLMs on is generating the same error rates. Yeah.

Chris Wysopal: Exactly. I mean, that's one of the things that you start to look at. I looked the other day at what LLMs are mostly trained on. Number one is Reddit. Number two is Wikipedia. Now, Wikipedia is a crowd-sourced encyclopedia, but the challenge with Reddit is, yes, it is crowd-sourced, but it doesn't have the rigorous approval process and things that Wikipedia has. I think something like 50% more of the training data came from Reddit than from Wikipedia. And of course it's coming from all the other sources like Twitter and Facebook too, but it was interesting that Reddit was number one. Now, the parallel when training on writing code is open source, right?

There are some companies like Facebook, which has its own code bases for all of its products, and of course Google has the code bases of its products, but a company like OpenAI doesn't have a big suite of products like those companies do. So they're all training on open source, right? There are just millions and millions of open source applications out there in GitHub and other open source repos.

And there's no quality control there. Anyone can write code and put it up there. No one is telling them you need to pass this quality test; it's just out there. And I don't know if they were weighting projects that are more popular, like the Nginx web server, a particularly popular package, or the Linux kernel.

Those are more highly vetted, right, than, you know, Joe's website generator. So there's no scrutiny of that data; it's just code. Now, one thing that we did find, which I think is an even more interesting finding than the level of vulnerabilities, was that over time the LLMs have gotten better at writing syntactically correct code. If you go back two years, only about 50% of the code completion tasks actually produced syntactically correct code, which is pretty bad, right? You're like, wait, it couldn't even learn to write it syntactically correctly? And then if you follow that over the next 18 months to two years, we saw a dramatic improvement in syntactically correct code, to the point that, with the LLMs from six months ago, 95% of the code completions were syntactically correct. And that kind of plateaued at 95%. But we didn't see any kind of improvement in writing secure code. So my theory is they actually are building these projects before they consume them as training data.

And they're seeing that the code is not just maybe a sample exercise that someone wrote, that it's actually a functional, syntactically correct program; it actually builds. And if you think about that, that's a way to get a data set that will write syntactically correct code: you have better data going in. So I think the same thing could happen with secure code. You'd only train on code that has passed a certain security test and had the vulnerabilities fixed, or maybe just identified, and that becomes an anti-pattern: don't write your SQL queries this way. So I think it's going to be possible for LLMs to get better at writing more secure code. It's just that it doesn't seem like we're even close to it, because we've been flat for two years.
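The filtering idea Chris sketches could look something like the following. This is a hypothetical illustration, not a description of any vendor's actual pipeline; compiles() and passesSecurityScan() are stand-ins for a real build step and a static analysis gate.

    import java.util.List;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    public class TrainingDataFilter {

        // Hypothetical checks standing in for a real build step and a static analyzer.
        public interface CodeSample {
            String source();
            boolean compiles();            // does the project build?
            boolean passesSecurityScan();  // does it pass a static analysis policy?
        }

        // Keep only samples that both build and pass the security gate,
        // mirroring the "better data going in" idea from the conversation.
        public static List<CodeSample> curate(List<CodeSample> candidates) {
            Predicate<CodeSample> keep = s -> s.compiles() && s.passesSecurityScan();
            return candidates.stream().filter(keep).collect(Collectors.toList());
        }
    }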

Justin Beals: Yeah. I mean, I'm going to crack open another buzzword space a little bit here. We've been finding solutions, though, with this technology, and I think this is where we like the agentic-style systems, because we can utilize a couple of models for their strengths in sequence or in series, or to inform logical next steps, and they start to counterbalance each other's decisions in some ways. Yeah.

 

Chris Wysopal:  Yeah, that's true. So people say, well, why can't you ask the LLM to write code, then ask the LLM to find the vulnerabilities in the code, and then ask the LLM to fix the code? my answer to that is we already have a technology that's better at finding vulnerabilities in code. It's called static analysis, right?

 

Justin Beals: Right.

 

Chris Wysopal: We've been doing it for 20 years. We've refined it for all these different languages and frameworks and all these different vulnerability classes, and today I can tell you it's better at finding vulnerabilities than an LLM is. So use that. That doesn't mean LLMs won't get better at some point in time. Obviously, we're working on this technology and seeing if we can get it to be better, but we haven't found it to be better than the traditional deterministic algorithms that we have for doing static analysis security testing.

But one thing that we did do is we built a product called Veracode Fix, which uses an LLM to fix code. And it turns out that LLMs are really good at fixing code if you point out where the problem is and you tell them what the problem is, because then it doesn't have to think about finding it or classifying it. You just say, there's a cross-site scripting error on this line of code, make this secure, make it safe. And that's a huge productivity tool for developers, because developers have problems fixing code even when the vulnerabilities are pointed out to them; they don't know how to fix it. So training LLMs to fix code seems to be easier than training them to find vulnerabilities. I have hope that finding will improve, but I'm happy that we're at least able to fix code, because we're creating all these new vulnerabilities.
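Veracode has not published the internals of its Fix prompts, so the sketch below is only a rough illustration of the general idea of a narrowly scoped fix request: tell the model exactly where the flaw is and what class of flaw it is, and ask it only to repair that spot. All names here are hypothetical.

    public class FixRequest {

        // Hypothetical fields: a real remediation tool would carry more context,
        // such as the surrounding method body and the language and framework in use.
        private final String filePath;
        private final int lineNumber;
        private final String cweId;      // e.g. "CWE-79" for cross-site scripting
        private final String codeSnippet;

        public FixRequest(String filePath, int lineNumber, String cweId, String codeSnippet) {
            this.filePath = filePath;
            this.lineNumber = lineNumber;
            this.cweId = cweId;
            this.codeSnippet = codeSnippet;
        }

        // Build a narrowly scoped prompt: the model is asked only to repair
        // a known, already-classified flaw, not to find or classify it.
        public String toPrompt() {
            return "There is a " + cweId + " vulnerability at line " + lineNumber
                    + " of " + filePath + ".\n"
                    + "Rewrite only the affected code so it is safe, preserving behavior:\n"
                    + codeSnippet;
        }
    }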

Justin Beals: Yeah, I mean, it is an interesting operation space when you think about these large language models, because code is maybe a smaller lexicon, like the vocabulary is smaller, but it still has a powerful combinatorial effect, and therefore it gets complex very quickly.

Chris Wysopal: Absolutely. It gets very complicated very fast when you think about a million lines of code. That is definitely one of the factors that makes it easier to fix than to find, because fixing code is really localized. There's not a huge context window; it's probably just maybe 10,000 tokens or something, which all LLMs can handle, because it's usually just a class and a method. It's not that complicated.

But finding a vulnerability could span hundreds of thousands of lines of code, so now we're potentially talking about millions of tokens, and even the largest models are struggling with that. And then there's the compute power: chugging through millions of tokens compared to a few thousand tokens is thousands of times the cost. So I'm happy we can fix, and someday we'll get better at finding.

Justin Beals: Yeah, well, and I think to your point in this area, there was a finding that 86 to 88% of the time the models failed, especially on cross-site scripting and log injection, because it is kind of an inter-procedural data flow, right?

Chris Wysopal: Absolutely. I think this really mirrors the way that people write code, and that's why it failed. For one thing, things like SQL injection and cryptography, which it did better on, only had a failure rate in the 20 to 30% range, which is still bad.

I don't want every fifth prompt to create a vulnerability; we're still not happy with that. But it was much better than, say, cross-site scripting. And I think a lot of it had to do with the context being more understood. And also, I think there are a lot of tutorials and blogs and literature out there about these very dangerous vulnerabilities like SQL injection.

And there's also a lot of literature out there about how to write crypto properly, right? It's a big deal to do it right or wrong. Whereas cross-site scripting is one of these things where it's everywhere: a web application outputs to an API, or outputs to an HTTP response back to the user to render on their screen. It's everywhere, right? It's all throughout the program, the whole UI. And like you say, there are a lot of inter-procedural pieces between where untrusted data might get into the program and where it might be sent to the user and cause a cross-site scripting vulnerability. Developers don't code defensively, right? What happens is they say, well, if I know this is untrusted data, then hopefully I'll remember to do output encoding and protect the user from this untrusted data. But in the general case where they don't know, or maybe they think it's trusted data (hey, it's just coming out of my database, I think this came from a trusted source), they don't do it. And so that's the pattern that we see in code, and that's why cross-site scripting is so prevalent. So it's learned this pattern, and it hasn't learned how to do it properly.
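As a concrete illustration of the output-encoding habit Chris is describing, here is a minimal Java sketch. The page and its names are invented for the example, and the escapeHtml helper is a simplified stand-in for a vetted encoder such as the OWASP Java Encoder.

    public class GreetingPage {

        // Untrusted input flows from a request parameter to HTML output.
        // Echoing it directly is the reflected cross-site scripting pattern:
        //   return "<p>Hello, " + name + "</p>";   // vulnerable
        // Encoding at the point of output keeps markup in the data inert.
        public static String render(String name) {
            return "<p>Hello, " + escapeHtml(name) + "</p>";
        }

        // Simplified HTML encoder for illustration; a production app would use a
        // vetted library rather than this hand-rolled version.
        private static String escapeHtml(String s) {
            StringBuilder out = new StringBuilder(s.length());
            for (char c : s.toCharArray()) {
                switch (c) {
                    case '<':  out.append("&lt;");   break;
                    case '>':  out.append("&gt;");   break;
                    case '&':  out.append("&amp;");  break;
                    case '"':  out.append("&quot;"); break;
                    case '\'': out.append("&#39;");  break;
                    default:   out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            // A payload like this renders as text instead of executing as script.
            System.out.println(render("<script>alert(1)</script>"));
        }
    }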

Justin Beals: Yeah, definitely on all my pen tests as a CTO or as an engineer, I always expected to get a couple of cross-site scripting errors that I had to look through. It is very common. And I think it does speak to the developer needing to be more context-aware of the entire platform, and to these tools helping them with the minutiae in the moment.

Chris Wysopal: Absolutely.

Justin Beals: The other thing that I thought was really interesting is that not all languages were created equal from an LLM perspective: Java had a 72% failure rate. Yeah.

Chris Wysopal: Yeah, that was an interesting one. You know, Java is the oldest language there, created in the mid-nineties, so now it's 30 years old, and it was created to be a secure language, right? But "secure" meant just don't have buffer overflows and memory corruption issues like compiled languages such as C and C++ have. That's what they meant by secure: eliminate memory corruption. And we know memory corruption is like one category, right?

It's a very important category, don't get me wrong, but it's probably like one-tenth of the issues out there. And that's because we hadn't found all the other issues yet. Things like SQL injection and cross-site scripting were discovered three or four years after Java as a language was created, so it didn't do anything to prevent them. So I think we had many, many years of bad Java code being written, maybe 10 years, from 1995 to 2005, till a lot of Java programmers started to say, hey, maybe I should do security testing; maybe the language isn't protecting me from everything. And that's the thing with code: it's all still out there, right? Why would an LLM stop with the most recent version of something when it can go back 20 years? And there are all kinds of other algorithms in there that were useful for doing something, but then the program changed and they don't do that anymore.

And there have actually been academic papers on this, saying that one of the reasons they think the LLMs code so badly is that they learned on a lot of old code. They didn't just cut it off and say, I'm going to learn code from 2020 on. And we know the further you go back, the more insecure code gets, because we learn of new vulnerability classes and we learn how to do things better. And especially if you think about crypto, right? Whole crypto algorithms have been deprecated. And you have to wonder, is the LLM learning from the comments that say, you know, we're deprecating DES here and we're switching to Triple DES? You would hope that it would learn that and never write anything with DES, but it's not totally clear that it has learned all the right things from old code.
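To ground the deprecated-crypto point, here is a minimal Java sketch contrasting the legacy DES pattern an LLM might reproduce from old code with a modern authenticated-encryption choice. Key management and error handling are omitted, and the example is illustrative rather than a recommendation for any specific system.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;

    public class CryptoChoices {

        // Legacy pattern still common in old training data -- do not copy:
        //   Cipher legacy = Cipher.getInstance("DES/ECB/PKCS5Padding");
        // DES (and ECB mode) are long deprecated and breakable.

        // Modern choice: AES in GCM mode gives confidentiality plus integrity.
        public static byte[] encrypt(SecretKey key, byte[] iv, String plaintext) throws Exception {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        }

        public static void main(String[] args) throws Exception {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);                 // 256-bit AES key
            SecretKey key = keyGen.generateKey();

            byte[] iv = new byte[12];         // 96-bit nonce, unique per message
            new SecureRandom().nextBytes(iv);

            byte[] ciphertext = encrypt(key, iv, "hello");
            System.out.println("ciphertext bytes: " + ciphertext.length);
        }
    }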

Justin Beals: Yeah. And as we start to layer in these kinds of features, and I'm sure at Veracode y'all do too, utilizing neural networks and these very modern types of data science, it really continues to be a data-in, data-out issue. If you match the problem you're trying to solve to the data you need to collect and to the right style of model or machine learning, you can get a very powerful outcome. But if you just throw a qualitative output from a random LLM at something, you get a very random outcome.

Chris Wysopal: Exactly. I mean, this is one of the things we did with our Veracode Fix product: we used a trusted data set of known secure code. As a company that tests code, we have a lot of code samples. We have a lot of known bad code and known good code, all reviewed by humans after being reviewed by the machine, and we used those to train what's bad and what's good, so we could say, we know this is bad, turn it into what's good. It learned what the different bad patterns were. And this was just for specific languages we support and specific vulnerability classes that we find; we want to be able to fix those things, and then use human supervision on top of that to make sure it worked. Now, this is a small sample set, right? This is like, I don't know, 10 or 15 different vulnerability classes over like 15 different languages. You can do the matrix. It's a tractable problem, right, that humans can review. There's no way to have a tractable problem like that even on, say, asking questions about Shakespeare's works of literature; it would just be very difficult for humans to review all of that. And that's part of the problem when you have these general-purpose AIs that are just trying to do everything.

And this is why we're seeing a movement to maybe small language models that just do one thing well, right? Whether it's writing code or diagnosing cancer or treating cancer, because there's a fixed body of knowledge; there are experts, there are oncologists, who can review the output. And you can imagine having an LLM that knew a lot and was highly accurate around cancer, but then you say, be highly accurate about every medical problem. Now you've made the problem really big. And then say, be highly accurate about everything that people graduate with advanced degrees in. It becomes a challenge to reach these levels of accuracy. And the problem with code is it's either vulnerable or it's not, right? If it's vulnerable, it's bad.

Justin Beals: Yeah, it's just a really interesting product motion, because I think there's so much hype around them as a business, like, you know, OpenAI. But in reality, I've started to categorize it as: there are these AI search companies that are ingesting the internet and getting better at being, you know, a research partner, and then there are those of us using neural network tools or LLMs to solve very specific problems.

Chris Wysopal: Yes, right, exactly. And it's the same technology being used for both. For search, it's great. When I search on something, I don't always get the exact right answer even on the first page or two, so the bar is kind of low. With LLM search, if it can be better than that, and I think they are, it certainly saves time, it's more concise; that's a step up.

But think about replacing something like a doctor with an LLM; it's got to meet that bar, which is much higher. We're sort of in the middle here with writing code. And I think finding out that it's writing code at around the same level that humans are, you know, shouldn't be a surprise.

But I did want to mention one of the unintended consequences of this. It's not just that we're getting the same level of vulnerabilities per line of code when the LLM does it; the LLM does it much faster. And if you think about software development processes, everyone is trying to go faster and be more efficient. That's why we use open source packages: why write the code when I can just import the code? And the same thing with copying and pasting out of some reference.

Even if it's Stack Overflow: what's the answer to this question? What's the best way to use this algorithm? Copy, paste, right? So it's all about copying wherever you can, and the LLMs just fall right into that copy, paste, insert-it-here workflow. So we get this higher velocity, right, where maybe we're writing code at 1.5x or 2x what we were before we used LLMs. But if we have the same vulnerability density,

That means we have more vulnerabilities to deal with per unit of time. And this is the thing I'm trying to impress upon developers. It makes it more important that you're testing quickly, and then you have a plan to remediate quickly, because otherwise you're writing a less and less secure piece of software over time.

Justin Beals: At an exponential rate compared to the prior modality. Yeah.

Chris Wysopal: Yeah, because we could wake up next year and it's like, okay, well now it's 5x, right? I'm writing code 5x as fast, but wait a minute, I've got the same vulnerability density. We're hoping that AI-based fixing is one good way of keeping up with this until we can solve the problem of algorithms writing secure code.
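The velocity math Chris keeps returning to can be written down directly: vulnerabilities introduced per unit of time are roughly vulnerability density times coding velocity. The numbers in this small Java sketch are made up for illustration and are not figures from the Veracode report.

    public class VulnerabilityInflow {

        // Assumed, illustrative numbers -- not figures from the report.
        static final double FLAWS_PER_KLOC = 5.0;             // vulnerability density
        static final double BASELINE_KLOC_PER_QUARTER = 20.0; // pre-LLM output

        public static void main(String[] args) {
            for (double speedup : new double[] {1.0, 1.5, 2.0, 5.0}) {
                double kloc = BASELINE_KLOC_PER_QUARTER * speedup;
                double flaws = kloc * FLAWS_PER_KLOC;
                System.out.printf("at %.1fx velocity: %.0f KLOC, ~%.0f new flaws per quarter%n",
                        speedup, kloc, flaws);
            }
        }
    }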

Justin Beals: Yeah, there's always another vector. Like broadly in life, it's not just the number of logos you integrate, but how many things you can integrate with them that matters. And to your point, it's not just the percentage of vulnerabilities that get into code, it's how fast they get into that code as well.

Chris Wysopal: Exactly. And the other part of it, and this is where vibe coding becomes really scary, is that now you don't necessarily have professional developers doing this. I know a lot of professional developers are, because they're trying to see the limits of this, including the guy who coined vibe coding. But there are a lot of people who maybe took a class in college but haven't done it since, or even some people who've never coded before, writing programs. And this is truly scary, because who's going to test and fix their vulnerabilities? So I don't think vibe coding is ready for prime time. It's good for apps that you're going to run yourself on your computer. Maybe you're writing an app to create a cool graphic or chart; I've done such a thing. Munge a lot of data, solve a problem for me. But don't put PII in there and connect it to the internet, please.

Justin Beals: Yeah, I played with one of those tools. I am not a professional coder anymore; they kicked me out and made me an executive at this point.

Chris Wysopal:  They've taken away your pocket protector.

Justin Beals: Yeah. I can no longer make a commit to our Git repo. But I was playing with an idea, and I told it to build a login, and went to go log in as a user and do the signup. And I realized that no matter what I logged in as, I was logged in as the same user I had logged out of last time. And it was like, that's a very simple security issue.

Chris Wysopal : It functionally worked. It let you log in.

Justin Beals: Yeah. And if I hadn't been through QA processes before as an engineer myself, I wouldn't have known all the different cases that were going to get run, from a security perspective, against that code I wrote. Yeah.

Chris Wysopal: Right, right. Yeah, so that's the thing: you have to define the anti-patterns that you don't want it to produce, because if you don't, then you're making an assumption that it knows what those anti-patterns are, right? And usually they're logic issues. You can't assume it knows the logic anti-patterns, like, don't let anyone log in if they put in the wrong login name, or, if you're making a banking app, don't let anyone withdraw a negative amount of money.
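The banking example translates directly into guard clauses. A minimal sketch, assuming a simple in-memory account; a real system would also handle concurrency, currency, and auditing.

    public class Account {

        private long balanceCents;

        public Account(long balanceCents) {
            this.balanceCents = balanceCents;
        }

        // The logic anti-pattern is accepting whatever amount the caller sends.
        // Guard clauses encode the business rules explicitly.
        public void withdraw(long amountCents) {
            if (amountCents <= 0) {
                throw new IllegalArgumentException("withdrawal must be a positive amount");
            }
            if (amountCents > balanceCents) {
                throw new IllegalStateException("insufficient funds");
            }
            balanceCents -= amountCents;
        }

        public long balanceCents() {
            return balanceCents;
        }
    }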

Justin Beals: Let me pressure-test this a little bit; I'm curious about your opinion. Maybe we see tools that are either automatically scanning for vulnerabilities or developing code with fewer vulnerabilities being able to handle some simple use cases, like making sure permissions are set. But there are more complex use cases, and perhaps they're not ready for those. I'm trying to gauge where you think the needle is today. Something complex like XSS, cross-site scripting, seems within reach, but if you go beyond that into something like a race condition or, you know, memory failover, are we not going to be able to look to AI today at all to help guard against those particular types of issues?

Chris Wysopal: Yeah, well, certainly for a race condition there are special tools that I know have been built just for those things. It seems possible, but I think you'd have to have a really good prompt to do it, and you'd have to make sure that it had enough examples in its training data set of real race conditions and things like that, to make sure it learned those concepts. It's hard to imagine that it's just going to absorb them, because it isn't necessarily in the comments, right? "Look out, put a mutex here, we're worried about a race condition." If that's just missing... And the same would probably go for, is this crypto algorithm secure? There's a lot of complexity that it would really have to understand. But I would be happy if it could deal with the simple things.
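For the race-condition point, the classic check-then-act bug and its fix look like the sketch below. The seat-reservation scenario is invented for illustration; the fix relies on an explicit lock, exactly the kind of detail a model only produces if it has learned the pattern from its training data.

    import java.util.concurrent.locks.ReentrantLock;

    public class SeatCounter {

        private final ReentrantLock lock = new ReentrantLock();
        private int seatsLeft;

        public SeatCounter(int seatsLeft) {
            this.seatsLeft = seatsLeft;
        }

        // Without the lock, two threads can both pass the check before either
        // decrements, overselling the last seat -- a classic check-then-act race.
        public boolean reserveSeat() {
            lock.lock();
            try {
                if (seatsLeft > 0) {
                    seatsLeft--;
                    return true;
                }
                return false;
            } finally {
                lock.unlock();
            }
        }
    }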

Justin Beals: Yeah, I mean, that is part of where I see these tools: the things that seem pedantic, like someone in our world just clicking through a spreadsheet and looking at the thing that was delivered. That seems like an easy opportunity for a machine system to handle, and I think we're unlocking some of that energy.

Chris Wysopal: Absolutely. That's where it can really help a senior developer who's hopefully dealing more with architecture and design and high-level algorithms; they don't have to dot all the I's and cross all the T's, and they have the LLM doing sort of the heavy lifting.

And so, I mean, these are amazing productivity tools, and I just hope we can figure out how to make them better for security.

Justin Beals: You know, I'm gonna take this just slightly off script here, because I'm a little curious if you saw the MIT study around the value generated by the adoption of some of these LLMs, and certainly, you know, its finding that they didn't provide the kind of business value people thought they would. I'm curious about your thoughts on that.

Chris Wysopal: Yeah, so I saw the headline and some of the talk about it; I haven't read the report. It doesn't surprise me, because some of this stuff probably started a couple of years ago to already be at the point where it's failed, so we were just learning a lot of what the capabilities are. I wonder, if that went back two years, what 2025 to 2027 is going to look like. Hopefully it'll be better. But it does go to show that just jumping in on some new technology doesn't necessarily solve the problem unless you're doing it right. And from a cybersecurity point of view, you're just opening yourself up to so much risk. I can tell you that we're just now emerging with the techniques for doing penetration testing and using automated tools to find things like prompt injection, RAG data leakage, model data leakage, and the permissioning that some of these things would have. Take a customer support agent, right? You want it to be able to talk to one customer and not give out the information of another customer. So we're in the early days of cybersecurity people saying, well, here's the framework of the problems you need to look for, and here's how to look for each one of them. I think the OWASP Top 10 for LLMs just came out, you know, maybe six months ago. So it's still early days, and all that stuff that was built a year ago is probably hugely vulnerable.

That also scares me. But to get back to your original point: when technology is new, we don't always know how to use it in the most effective way. That takes time, but people want a head start. They want to jump in; they want to rush and do it. I mean, this is more of a cybersecurity example than an effectiveness example, but we saw this with the cloud. How many world-readable data stores did we see, where they put their database up there and someone took the whole thing? Why? Because they left it world-readable. Well, that seems really dumb, right? But it's a misunderstanding of where the permissions are, what they should be, who's in charge of setting them that way, and who's testing to make sure they're correct. All that stuff comes with time and, you know, building those governance models around things. And it's just the Wild West out there right now.

Justin Beals: Yeah, I've certainly been on some calls lately where, when we say we're AI or we have this AI tool in our platform, people imagine what it would do. And when we get into the brass tacks of what it actually does, I think they can see the real value of it as an efficiency, but it's never all the hype that the engine creates at the end of the day. Yeah.

Chris Wysopal: Right, it's not quite as exciting. Like, don't lay off your customer support team yet.

Justin Beals: Hang on, yeah. Or, I think what we found too is that there's an applicability, to your point, to a more strategic role, or to scaling the impact that a staff member can have. Yeah.

Chris Wysopal: Sure, and sometimes for productivity tools, you want to have more output, not the same output. And so I find it a little strange that you see all these companies laying off developers; maybe freezing developer hiring makes more sense, but actually getting rid of developers, I don't think that necessarily makes sense. What if you could write twice as much software with the developers you have? Would that be a competitive advantage? Or do you want to write the same amount of software that you're writing today with half as many developers? What if your competitors are doing twice as much because they can? So it's interesting to see how productivity is actually going to change the competitive landscape and sort of human labor versus GenAI labor. And maybe that's part of what's gone into these projects failing: they may have gotten productivity, but they weren't able to eliminate the jobs.

Justin Beals: Yeah, I mean, at the end of the day, someone... yeah.

Chris Wysopal: If the goal was to replace half my customer support team, you know, it's a failure if you can't, even if your customer support is better.

Justin Beals: That's gonna be hard, yeah. I mean, you're still competing in a marketplace. You've got to find an opportunity to use those resources to succeed at the end of the day; you will, the business will. Chris, I am gonna go listen to Soul Coughing today, because I haven't heard that band in quite some time, and I'm excited to catch back up with their music a little bit, yeah.

 

Chris Wysopal: Yeah, yeah, that first album, Ruby Vroom, had, I think it was, I forget what the name of the song was, but I know the refrain was, get back on the bus that'll take you down to Beelzebub. And it was just such a funny line that we would sing along with it.

Justin Beals: Right. I am very grateful for you spending time with me today. And of course, our listeners, Chris, and I'm just very grateful for the work you've done in computer science and helping us build great companies and great platforms with Veracode. So thanks for joining us today.

Chris Wysopal: Absolutely. Thanks so much.

 

About our guest

Chris Wysopal, Chief Security Evangelist, Veracode

Chris Wysopal is the Chief Security Evangelist at Veracode, responsible for enhancing the company’s industry presence, advocating robust security practices, and fostering customer and peer relationships. Prior to co-founding Veracode in 2006, Chris was vice president of research and development at security consultancy @stake, which was acquired by Symantec. In the 1990s, Chris was one of the original vulnerability researchers at The L0pht, a hacker think tank, where he was one of the first to publicize the risks of insecure software. He has testified to the US Congress on the subjects of government security and how vulnerabilities are discovered in software.


Chris is a highly influential and sought-after voice in the software security industry. He is the author of “The Art of Software Security Testing” and was instrumental in developing industry guidelines for responsible disclosure of software vulnerabilities. MIT Technology Review wrote in 2019 that Chris “has worked for decades to demand secure technologies from influential tech companies.”


Chris received a BS in computer and systems engineering from Rensselaer Polytechnic Institute. In his free time, Chris enjoys photography and hiking the conservation trails near his home outside Boston.

Justin Beals, Founder & CEO, Strike Graph

Justin Beals is a serial entrepreneur with expertise in AI, cybersecurity, and governance who is passionate about making arcane cybersecurity standards plain and simple to achieve. He founded Strike Graph in 2020 to eliminate confusion surrounding cybersecurity audit and certification processes by offering an innovative, right-sized solution at a fraction of the time and cost of traditional methods.

Now, as Strike Graph CEO, Justin drives strategic innovation within the company. Based in Seattle, he previously served as the CTO of NextStep and Koru, which won the 2018 Most Impactful Startup award from Wharton People Analytics.

Justin is a board member for the Ada Developers Academy, VALID8 Financial, and Edify Software Consulting. He is the creator of the patented Training, Tracking & Placement System and the author of “Aligning curriculum and evidencing learning effectiveness using semantic mapping of learning assets,” which was published in the International Journal of Emerging Technologies in Learning (iJet). Justin earned a BA from Fort Lewis College.

Keep up to date with Strike Graph.

The security landscape is ever changing. Sign up for our newsletter to make sure you stay abreast of the latest regulations and requirements.