Security of AI Generated Code

The presentation explores the security implications of AI-generated code, highlighting that 48% of such code snippets contain vulnerabilities. It posits that AI-assisted coding tools produce different types of vulnerabilities based on their contextual awareness, with more robust models generating more insecure code. The analysis includes empirical studies, vulnerability detection methods, and the significance of context in improving code security.

Uploaded by

Beebo Benzedrine

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views

Security of AI Generated Code

Uploaded by

Beebo Benzedrine

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

Final

Presentation
The Effect of Context on
the Security of AI
Generated Code
Table of contents

01
Background
02
Hypothesis
03
Literature
Review

04
Experimental
05
Analysis
06
Q&A
Approach
Motivation

➔ Growing reliance on AI code generation

tools
➔ Security implications remain
under-explored
➔ 48% of AI-generated code snippets
contain vulnerabilities
➔ Contextual awareness varies across tools
➔ Real-world impact on software security
Initial
Hypothesi
s
Among large language models (LLMs) specialized for
AI code generation, more robust LLMs trained on
larger, more diverse datasets generate code that is
more vulnerable, insecure, and prone to exploitation.
Revised
Hypothesi
s
AI-assisted coding tools (such as GPT-4 and Github
Copilot) produce systematically different types of
security vulnerabilities when generating or completing
code with limited contextual awareness—such as
lacking access to full project scope, dependencies, or
security constraints—versus with broader contextual
integration.
Literature
Review
Empirical Studies on AI Code Security
Benchmarking-Related Studies
Security Attacks on AI Code Tools
Using AI Code Tools for Offensive Security
Empirical Studies

Vulnerability
Decreased caution Prompt variation
types
AI assistance increased 29.8% of Copilot snippets Minor variations in prompts
security vulnerabilities by contained vulnerabilities could significantly impact
20% across 38 CWE categories whether Copilot generated
secure or vulnerable code
Security Attacks & Offensive Uses
➔ AI code tools vulnerable to jailbreaking, prompt injection, and
adversarial inputs
➔ INSEC framework: simple comment-based attacks succeeded in
generating insecure code
➔ Debug mode enabled lowered security controls (Elgedawy et al.,
2024)
➔ Neural Machine Translation models improved with contextual
information for exploit generation
➔ Copilot outperformed CodeWhisperer in security-relevant code
generation
Benchmarking Studies
➔ SecurityEval dataset: 68% of InCoder and 74% of Copilot code
contained vulnerabilities
➔ Automated tools detected significantly fewer vulnerabilities than
manual inspection
➔ CodeLMSec benchmark: LLMs can generate secure code, but
effectiveness depends on context inclusion
➔ Wang et al. (2024): Security context dramatically improved
performance (Claude 3 Opus: 13.83 → 39.89)
➔ DeVAIC tool: 54% of AI-generated Python code contained
vulnerabilities
04
Approach
Process

Step 1 Step 6
Narrow down AI code Compare performance
assistance tools across tools

Analyze types &

Step 2 Step 5
Distinguish varying
frequencies of
context levels
vulnerabilities

Step 3 Step 4
Generate prompts for Test code generation
each context level across tool
Context Levels

File level
Function level Project level

Includes configuration,
Minimal description and dependencies, and
function signature implicit security
Includes preceding code requirements
and relevant comments
Testing & Prompt Design
➔ Security-relevant coding tasks based on CWE categories
➔ Common vulnerability patterns:
◆ Weak randomization (CWE-330)
◆ OS command injection (CWE-78)
◆ SQL injection (CWE-89)
◆ Hard-coded credentials (CWE-798)
◆ Improper input validation (CWE-20)
➔ Natural language prompts or code comments
➔ No explicit security instructions to mimic real-world usage
Vulnerability Detection and
Categorization

Classiﬁcatio
SAST Manual review Incidence
n
SonarQube Validate findings Leverage CWE Tracking multiple
CodeQL and identify logical taxonomy vulnerabilities per
DeVAIC for Python issues snippet
Analysis Metrics
% of vulnerable Vulnerability density
Distribution of CWE
code snippets per (vulnerabilities per
categories
tool and context snippet/100 LOC)
level

Statistical
Benchmark against
Severity metrics comparisons
prior studies
(0-3 scale based on (z-test/chi-square)
(30-40% baseline
tool ratings) between tools and
for function-level)
contexts
Comparing tools

Vulnerability Identify patterns

rates

Vulnerability Cross-tool
types vulnerability
transfer
Signiﬁcance

Context-Security Relationship Providing Guidance

Identification of vulnerability Developers using AI coding tools
patterns specific to context levels Tool designers improving security features
Tool-specific security profiles Security teams developing detection strategies
Thank
you