Lecture 4 - Source Code Analysis

The document discusses techniques for program comprehension through static code analysis, including manual and automated approaches. It covers parsing source code into abstract syntax trees and symbol tables, as well as challenges with grammar availability and irregular code structures. Program slicing is introduced as a technique to extract relevant portions of source code based on variables and program points of interest. Different types of slicing, including static backward slicing, are explained through examples. Data and control dependence graphs are shown as representations of dependencies used in slicing.


SOEN 6431

SOFTWARE MAINTENANCE AND
PROGRAM COMPREHENSION
DR. JUERGEN RILLING

Week 3
Source Code Analysis
Overview
Comprehension: so far we have looked at the general idea behind comprehension, including cognitive models/mental models.

What we are looking at now are techniques that can be used to support program comprehension (code cognition).

02/22/2024 COMP 354 2


Static Code Analysis
MANUAL ANALYSIS


Automated Code Analysis
• Extract source code models from system artefacts
• Query/manipulate the models to infer new knowledge
• Present different views on the results


Source Model Extraction

Derive information from system artifacts
• variable usage, call graphs, file dependencies, database access, …

Challenges
• Accurate & complete results
• Flexible: easy to write and adapt
• Robust: deal with irregularities in input
Parsing of artifacts
• Syntactical analysis
– generate / hand-code / reuse a parser

• Lexical analysis
– tools like Perl, grep, awk or LSME, MultiLex
– generally easier to develop than a full parser
Scanning / Lexical analysis
Break the program down into its smallest meaningful symbols (tokens, atoms)
Tools for this include lex, flex
Tokens include e.g.:
◦ “Reserved words”: do if float while
◦ Special characters: ( { , + - = ! /
◦ Names & numbers: myValue 3.07e02

Start symbol table with new symbols found
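
The scanning step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes, the regular expressions and the helper name tokenize are hypothetical choices made for this sketch, not the behaviour of lex or flex.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustrative only).
RESERVED = {"do", "if", "float", "while"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SPECIAL", r"[(){},+\-=!/;<>*]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source into (kind, text) tokens and start a symbol table."""
    tokens = []
    symbol_table = {}
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "NAME" and text in RESERVED:
            kind = "RESERVED"
        elif kind == "NAME":
            # First sighting of a new name: record it in the symbol table.
            symbol_table.setdefault(text, {"first_offset": match.start()})
        tokens.append((kind, text))
    return tokens, symbol_table


tokens, symbols = tokenize("while (myValue = 3.07e02) do x")
```

Here tokens holds the classified symbols in source order, and symbols contains one entry per newly found name (myValue and x), which later phases could extend with type or scope information.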


Parsing
• Construct a parse tree from symbols
• A pattern-matching problem: if no pattern matches, it’s a syntax error
• yacc, bison are tools for this (generate C code that parses the specified language)
• Language grammar is defined by a set of rules that identify legal (meaningful) combinations of symbols
• Each application of a rule results in a node in the parse tree
• Parser applies these rules repeatedly to the program until the leaves of the parse tree are “atoms”
Output of parsing: Parse tree

Top-down description of program syntax
• Root node is entire program

Constructed by repeated application of rules in Context Free Grammar (CFG)

Leaves are tokens that were identified during lexical analysis
Example: Parsing rules for Pascal
These are like the following:

program → PROGRAM identifier (identifier,more_identifiers) ; block

block → variables BEGIN statement more_statements END

statement → do_statement | if_statement | assignment | …

if_statement → IF logical_expression THEN statement ELSE …
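
To make the idea of "one rule application, one parse tree node" concrete, here is a minimal recursive-descent sketch in Python. It assumes a heavily simplified grammar invented for this illustration (a statement is either an IF ... THEN ... ELSE or an assignment of the form identifier := identifier op identifier); it is not the full Pascal grammar, and all function names are hypothetical.

```python
KEYWORDS = {"IF", "THEN", "ELSE"}


def peek(tokens, pos):
    """Return the token at pos, or None at end of input."""
    return tokens[pos] if pos < len(tokens) else None


def expect(tokens, pos, expected):
    """Consume one specific token or report a syntax error."""
    if peek(tokens, pos) != expected:
        raise SyntaxError(f"expected {expected!r}, found {peek(tokens, pos)!r}")
    return pos + 1


def parse_identifier(tokens, pos):
    tok = peek(tokens, pos)
    if tok is None or tok in KEYWORDS or not tok.isidentifier():
        raise SyntaxError(f"expected identifier, found {tok!r}")
    return ("identifier", tok), pos + 1


def parse_binary(tokens, pos, rule, operators):
    """Shared shape of both expression rules: identifier operator identifier."""
    left, pos = parse_identifier(tokens, pos)
    op = peek(tokens, pos)
    if op not in operators:
        raise SyntaxError(f"expected one of {sorted(operators)}, found {op!r}")
    right, pos = parse_identifier(tokens, pos + 1)
    return (rule, left, op, right), pos


def parse_assignment(tokens, pos):
    target, pos = parse_identifier(tokens, pos)
    pos = expect(tokens, pos, ":=")
    value, pos = parse_binary(tokens, pos, "expression", {"+", "-"})
    return ("assignment", target, ":=", value), pos


def parse_if(tokens, pos):
    pos = expect(tokens, pos, "IF")
    condition, pos = parse_binary(tokens, pos, "logical_expression", {"<", ">"})
    pos = expect(tokens, pos, "THEN")
    then_branch, pos = parse_statement(tokens, pos)
    pos = expect(tokens, pos, "ELSE")
    else_branch, pos = parse_statement(tokens, pos)
    return ("if_statement", "IF", condition, "THEN", then_branch, "ELSE", else_branch), pos


def parse_statement(tokens, pos=0):
    """statement -> if_statement | assignment; returns (parse_tree, next_pos)."""
    if peek(tokens, pos) == "IF":
        child, pos = parse_if(tokens, pos)
    else:
        child, pos = parse_assignment(tokens, pos)
    return ("statement", child), pos


tokens = "IF i > j THEN i := i - j ELSE j := j - i".split()
tree, end = parse_statement(tokens)
```

The root of tree is a statement node, its child is the if_statement node, and the leaves are the tokens themselves. If no rule matches (for example when THEN is missing), the sketch raises SyntaxError.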


Pascal code example
program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln (i)
end.
Semantic analysis

Discovery of meaning in a program using the symbol table
• Do static semantics check
• Simplify the structure of the parse tree (from parse tree to abstract syntax tree (AST))

Static semantics check
• Making sure identifiers are declared before use
• Type checking for assignments and operators
• Checking types and number of parameters to subroutines
• Making sure functions contain return statements
• Making sure there are no repeats among switch statement labels
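
Two of the checks listed above can be sketched as follows. The statement encoding (tuples such as ("declare", names) and ("assign", target, used_names)) is a hypothetical stand-in for a real AST, chosen only to keep the example short.

```python
def check_declared_before_use(statements):
    """Report every identifier that is used before it has been declared."""
    declared = set()  # the symbol table, reduced to a set of names
    errors = []
    for index, stmt in enumerate(statements, start=1):
        kind = stmt[0]
        if kind == "declare":
            declared.update(stmt[1])
        elif kind == "assign":
            _, target, used = stmt
            for name in [target, *used]:
                if name not in declared:
                    errors.append(f"statement {index}: '{name}' used before declaration")
    return errors


def duplicate_case_labels(labels):
    """Return the switch labels that occur more than once, in order of detection."""
    seen, duplicates = set(), []
    for label in labels:
        if label in seen and label not in duplicates:
            duplicates.append(label)
        seen.add(label)
    return duplicates


program = [
    ("declare", ["i", "j"]),
    ("assign", "i", ["i", "j"]),  # i := i - j
    ("assign", "k", ["i"]),       # k := i, but k was never declared
]
errors = check_declared_before_use(program)
repeats = duplicate_case_labels([1, 2, 3, 2, 1, 2])
```

Both checks need only the tree and the symbol table, which is why they can run statically, without executing the program.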
Example: parse tree
Example: AST
Parsing challenges

Grammar challenges
Example: i+++++i;
• Syntax Errors
• Language Dialects
• Local Idioms
• Missing Parts
• Embedded Languages
• Preprocessing

Additional problem: grammar availability
• process languages without grammar (e.g., undisclosed proprietary languages)
• development of full grammar is expensive
Graph formalism is widely used. Example of graph formalism:
• Abstraction of the source-level domain model
• Entity-types: a subset of entity-types in source-code
• Relation-type: an aggregation of one or more relation-types in source-code
Automated parsing and creating source models are cool, but…

what do we analyze in the extracted source models, and how?


Is there maybe something we can learn from other domains?

Solution:
Let’s slice (a pizza)!!
What about a program?
Too large

Solution:
Slicing
Why Program Slicing?
• Program Debugging: that’s how slicing was discovered!
• Testing: reduce the cost of regression testing after modifications (only run those tests that are needed)
• Parallelization: split the program so that it can be executed on several processors, machines, etc.
• Integration: merging two programs A and B that both resulted from modifications to BASE
• Reverse Engineering: comprehending the design by abstracting the design decisions out of the source code
• Software Maintenance: changing source code without unwanted side effects
• Software Quality Assurance: validate interactions between safety-critical components
General Idea of Slicing
Given:
(1) A program
(2) A variable v at some point P in the program

Goal:
Find the part of the program that is responsible for the computation of variable v at point P.

A simple example
Basic Idea
Types of Slicing (Executable)

Static Backward Program Slicing
Static Backward Program Slicing was originally introduced by Weiser in 1982. A static program slice consists of those parts of a program P that could potentially affect the value of a variable v at a point of interest.
Slicing Properties: Static Slicing
◦ Statically available information only
◦ No assumptions made on input
◦ Computed slice is not guaranteed to be accurate (i.e., the minimal slice)
◦ Finding the minimal slice is undecidable (reduction to the halting problem)
◦ Current static methods can only compute approximations
◦ Result may not be useful

Data Dependencies
Program:
main( )
{
2 sum = 0;
3 i = 1;
4 sum = sum + 1;
5 ++ i;
6 cout<< sum;
7 cout<< i;
}

Slice w.r.t. <7, i>:
main( )
{
3 i = 1;
5 ++ i;
7 cout<< i;
}
An Example Program & its slice w.r.t. <7, i>
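
For straight-line code like this example, a backward slice can be computed by walking the statements in reverse and tracking which variables are still relevant. The sketch below is a simplified illustration of that idea, not Weiser's full algorithm (it ignores control flow). The def/use sets are hand-transcribed from the example, and the criterion is assumed to mean "the value of the variable just before the given statement executes".

```python
# Statement number -> (variables defined, variables used),
# transcribed by hand from the example program.
PROGRAM = {
    2: ({"sum"}, set()),    # sum = 0;
    3: ({"i"}, set()),      # i = 1;
    4: ({"sum"}, {"sum"}),  # sum = sum + 1;
    5: ({"i"}, {"i"}),      # ++ i;
    6: (set(), {"sum"}),    # cout << sum;
    7: (set(), {"i"}),      # cout << i;
}


def backward_slice(program, point, variable):
    """Statements that may affect `variable` at `point` (straight-line code only)."""
    relevant = {variable}  # variables whose earlier values still matter
    in_slice = {point}     # the slicing criterion statement itself
    for stmt in sorted(program, reverse=True):
        if stmt >= point:
            continue
        defs, uses = program[stmt]
        if defs & relevant:
            in_slice.add(stmt)
            # The definition is explained; its inputs become relevant instead.
            relevant = (relevant - defs) | uses
    return in_slice


slice_i = backward_slice(PROGRAM, 7, "i")
slice_sum = backward_slice(PROGRAM, 6, "sum")
```

With these inputs, slice_i contains statements 3, 5 and 7, and slice_sum contains statements 2, 4 and 6: the two computations in the example are completely independent of each other.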

Creating a PDG

Data Dependence:
Represents a data flow (definition-use chain).
=> Data dependence between 2 and 7 but not between 2 and 8.

Control Dependence:
The execution of a node depends on the outcome of a predicate node.
=> Control dependence between node 6 and 8, but not between 6 and 15.

1 input (n,a);
2 max := a[1];
3 min := a[1];
4 i := 2;
5 s := 0;
6 while i <= n do
  begin
7   if max < a[i] then
    begin
8     max := a[i];
9     s := max;
    end;
10  if min > a[i] then
    begin
11    min := a[i];
12    s := min;
    end;
13 output (s);
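
Once data and control dependences are collected as edges of a graph, a static backward slice is simply everything that can reach the slice point. The sketch below assumes the loop condition reads i <= n and uses edges derived by hand from the 13 statements as listed; they are an illustration and may not match the edges drawn on the original slide.

```python
from collections import defaultdict

# (source, target): target depends on source. Hand-derived, illustrative only.
DATA_EDGES = [
    (1, 2), (1, 3), (1, 6), (1, 7), (1, 8), (1, 10), (1, 11),  # n and a from input
    (4, 6), (4, 7), (4, 8), (4, 10), (4, 11),                  # i
    (2, 7), (8, 7), (8, 9),                                    # max
    (3, 10), (11, 10), (11, 12),                               # min
    (5, 13), (9, 13), (12, 13),                                # s
]
CONTROL_EDGES = [
    (6, 7), (6, 10),    # loop body is controlled by the while predicate
    (7, 8), (7, 9),     # then-branch of the first if
    (10, 11), (10, 12), # then-branch of the second if
]


def build_pdg(data_edges, control_edges):
    """Combine both edge kinds into one predecessor map: node -> nodes it depends on."""
    predecessors = defaultdict(set)
    for source, target in [*data_edges, *control_edges]:
        predecessors[target].add(source)
    return predecessors


def pdg_backward_slice(predecessors, slice_point):
    """All nodes from which the slice point is reachable, plus the point itself."""
    reached, worklist = {slice_point}, [slice_point]
    while worklist:
        node = worklist.pop()
        for dependency in predecessors.get(node, ()):
            if dependency not in reached:
                reached.add(dependency)
                worklist.append(dependency)
    return reached


PDG = build_pdg(DATA_EDGES, CONTROL_EDGES)
slice_at_9 = pdg_backward_slice(PDG, 9)
slice_at_13 = pdg_backward_slice(PDG, 13)
```

Under these assumed edges, slicing at node 9 drops every statement concerned with min and with the final output, while slicing at node 13 returns the whole program, which shows that a static slice does not always reduce anything.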
Another example for a loop
PDG of the Example Program
[Figure: program dependence graph of the example program, showing control dependence edges, data dependence edges and the slice point]

Static Backward slicing example
Program Dependence Graph (PDG)
A program dependence graph is formed by combining data and control dependencies between nodes.

1 input (n,a);
2 max := a[1];
3 min := a[1];
4 i := 2;
5 s := 0;
6 while i <= n do
  begin
7   if max < a[i] then
    begin
8     max := a[i];
9     s := max;
    end;
10  if min > a[i] then
    begin
11    min := a[i];
12    s := min;
    end;
13 output (s);

Edge types in the figure: Data Dependency, Control Dependency

Any problems within this PDG?
“Controversial” statements:

1. Static forward slicing will always provide a meaningful reduction

2. Can you think of any challenges for static slicing?


Forward Slice (static)

Note: It is not necessarily value preserving, meaning the value of the variable in the slice might not be the same as in the original program.
Slicing – Forward Static

Objective: determine what parts of a program are affected by a modification to the variable specified in the slicing criterion.
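
A forward slice follows dependence edges in the opposite direction to a backward slice: start at the modified statement and collect everything reachable from it. The sketch below reuses the small main() example from the backward slicing slide (def/use sets transcribed by hand) and handles straight-line code only; the function names are hypothetical.

```python
# Statement number -> (variables defined, variables used).
PROGRAM = {
    2: ({"sum"}, set()),    # sum = 0;
    3: ({"i"}, set()),      # i = 1;
    4: ({"sum"}, {"sum"}),  # sum = sum + 1;
    5: ({"i"}, {"i"}),      # ++ i;
    6: (set(), {"sum"}),    # cout << sum;
    7: (set(), {"i"}),      # cout << i;
}


def data_dependence_edges(program):
    """Def-use edges for straight-line code: each use depends on the latest definition."""
    last_definition = {}
    edges = set()
    for stmt in sorted(program):
        defs, uses = program[stmt]
        for variable in uses:
            if variable in last_definition:
                edges.add((last_definition[variable], stmt))
        for variable in defs:
            last_definition[variable] = stmt
    return edges


def forward_slice(edges, start):
    """Statements that may be affected by a change at `start`, including `start`."""
    affected, worklist = {start}, [start]
    while worklist:
        current = worklist.pop()
        for source, target in edges:
            if source == current and target not in affected:
                affected.add(target)
                worklist.append(target)
    return affected


EDGES = data_dependence_edges(PROGRAM)
affected_by_sum = forward_slice(EDGES, 2)
affected_by_increment = forward_slice(EDGES, 5)
```

Changing statement 2 can only affect statements 4 and 6, and changing statement 5 can only affect statement 7, so a maintainer would know which outputs to re-test after each modification.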

Slicing – Forward Static
[Figure slides: forward static slicing examples]
Controversial statement:

Forward slicing provides more meaningful insights compared to backward slicing?

Justify your answer