Tse2022 - Code Cloning in Smart Contracts On The Ethereum Platform - An Extended Replication Study
Abstract—Smart contracts are programs deployed on blockchains that run upon meeting predetermined conditions. Once deployed,
smart contracts are immutable; thus, defects in the deployed code cannot be fixed. As a consequence, software engineering
anti-patterns, such as code cloning, pose a threat to code quality and security if unnoticed before deployment. In this paper, we report
on the cloning practices of the Ethereum blockchain platform by analyzing 33,073 smart contracts amounting to over 4MLOC. Prior
work reported an unusually high 79.2% of code clones in Ethereum smart contracts. We replicate this study at the conceptual level, i.e.,
we answer the same research questions by employing different methods. In particular, we analyze clones at the granularity of functions
instead of code files, thereby providing a more fine-grained estimate of the clone ratio. Furthermore, we analyze more complex clone
types, allowing for a richer analysis of cloning cases. To achieve this finer granularity of cloning analysis, we rely on the NiCad clone
detection tool and extend it with support for Solidity, the programming language of the Ethereum platform. Our analysis shows that
most findings of the original study hold at the finer granularity of our study as well; but also sheds light on some differences, and
contributes new findings. Most notably, we report a 30.13% overall clone ratio, out of which 27.03% are exact duplicates. Our findings
motivate improving the reuse mechanisms of Solidity, and in a broader context, of programming languages used for the development of
smart contracts. Tool builders and language engineers can use this paper in the design and development of such reuse mechanisms.
Business stakeholders can use this paper to better assess the security risks and technical outlooks of blockchain platforms.
© 2022 IEEE. Author pre-print copy. The final publication is available online at: https://dx.doi.org/10.1109/TSE.2022.3207428.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. XX, NO. X, MONTH 20XX 2
Prior work by Kondo et al. [13] reported an unusually high 79.2% proportion of code clones on the Ethereum platform. Our work is an extended conceptual replication of [13], that is, we (i) pose the same research questions; but (ii) use different methods to answer them; and by that, (iii) refine and extend the findings of the original study.2

Specifically, in our study, we analyze code cloning practices at the level of function blocks, as opposed to the file-level analysis of the original study. To achieve this finer granularity of cloning analysis, we opt for the NiCad clone detection tool [14] instead of Deckard [15], which was used in the original study, and extend it to support Solidity, the programming language of the Ethereum platform. NiCad has been frequently used for clone detection tasks in conventional software systems. It has been thoroughly analyzed and benchmarked in previous studies to identify optimal configuration settings for detecting clones [16]. We also extend the scope of potential clone types to better identify near-miss (Type-3) clones, i.e., clones with modifications such as changed, added, or removed statements [17]. We assess the ratio of clones in the code base by removing clone duplicates, i.e., clones that have been identified multiple times as instances of different clone types. This allows for a better understanding of the types of cloning-related issues in Solidity smart contracts [18]. This step is explained in detail in Section 5.2.1.

To the best of our knowledge, this paper is the first to explore cloning in Solidity smart contracts at this finer granularity and with an awareness of these types of clones.

Results. We corroborate many findings of Kondo et al. [13], but observe some important differences as well. Most importantly, we observe that the clone ratio decreases from 79.2% to 30.13% at the finer level of granularity of functions. Moreover, we observe that the vast majority of clones (90%) are Type-1 clones (i.e., exact replicas). This 90% proportion among the clone types tends to be steady over an extended period of time, while the total number of clones increases; i.e., smart contract development practices heavily rely on copy-and-paste mechanisms. Tool builders and language engineers can use these results to improve reuse mechanisms in smart contract programming languages, including, but not limited to Solidity. Business stakeholders can use this paper to better assess the security risks and technical outlooks of blockchain platforms.

Fostering replication. The replication package containing the data and analysis scripts of our study is publicly available for independent verification or replication.3

Footnote 2: In the remainder of this paper, we refer to [13] as the original study.
Footnote 3: https://zenodo.org/record/6975351

2 BACKGROUND

Clone detection aims to identify repeated code. Clones are identified based on a similarity relation between their two respective code fragments. A clone fragment is a sequence of contiguous lines of code that is similar to another, non-overlapping sequence of contiguous lines of code. Clones with similar properties form a clone pair, and when there are many similar clones, they form a clone class (also referred to as clone group or clone cluster) [16].

Clone granularity can be either free or fixed. Free granularity clone detection considers the source code as a whole and does not make use of syntactic boundaries, such as functions, blocks, or statements [10]. Fixed granularity, however, incorporates such syntactic units. As such, fixed granularity provides a more precise estimate of clone ratio, and is more useful than free granularity in the eventual refactoring of the duplicated code [19]. Furthermore, clone detectors of free granularity produce a higher number of false positives [11], [20], which are code fragments that have been cloned with a purpose, such as getter/setter methods in Java code. In this paper, we use a fixed granularity at the function level.

Syntactic clones are identified based on textual program code, while the identification of semantic clones requires an analysis of the behavior of the units of code [21]. In this paper, we focus on syntactic clones, which are further divided into three types. Type-1 clone fragments are exactly identical except for variations in whitespace, layout, and comments. For example, Listings 1 and 2 would be Type-1 clones of each other, were their respective source code on Lines 5 and 7 identical. Type-2 clone fragments include Type-1 clones, but allow for differences in identifiers, literals, and data types. For example, Listings 1 and 2 would be Type-2 clones of each other, were the respective assigned values on Lines 5 and 7 identical. Type-3 clone fragments include Type-2 clones, but allow code fragments to differ in complete lines of code, thereby capturing clones with entire lines added or removed. The number of lines to be tolerated is defined by the dissimilarity threshold, in ratio with the overall code block. In our experiments, we set the dissimilarity threshold to 0.3, which classifies clones as Type-3 if at least 70% of the normalized subsequences match. Accordingly, Listings 1 and 2 are Type-3 clones. They differ on two out of twenty lines, i.e., a 2/20 = 0.1 dissimilarity or 90% similarity, which exceeds the threshold of 70%. Identifiers in Type-2 and Type-3 clones are normalized by performing a renaming strategy. The two most common renaming strategies are blind renaming, where all identifiers are replaced with the same key; and consistent renaming, where identifiers are given a unique key. For example, the line int sum = 0 is changed to x x = 0 by blind renaming, and to x1 x2 = 0 by consistent renaming. Line 5 in Listings 1 and 2 is changed to x = "MT" and x = "NEM", respectively, by both blind and consistent renaming. Were the variables named differently, e.g., symbol = "MT" in Listing 1 and sym = "NEM" in Listing 2, blind renaming would still change them to x = "MT" and x = "NEM"; however, consistent renaming would change them to x1 = "MT" and x2 = "NEM".

In this paper, we identify Type-1, Type-2, and Type-3 clones. The latter two types are further refined into blindly and consistently renamed clones (Type-2b, Type-2c; and Type-3b, Type-3c; respectively). Semantic clones (Type-4) are beyond the scope of this paper.

Smart contracts are programs that can be reliably executed by a network of anonymous distributed nodes without the need for a centralized trusted authority. The collection of these nodes forms a distributed computing platform called a blockchain [22], upon which smart contracts are executed. The name blockchain reflects the fact that transactions (i.e., actions initiated by an externally-owned account, such as
Listing 1: The MT.sol smart contract.
 1  contract MT is ERC20Interface, SafeMath {
 2    ...
 3    constructor(string memory _name) public {
 4      name = _name;
 5      symbol = "MT";
 6      decimals = 18;
 7      totalSupply = 500000000000000000000000000;
 8      balanceOf[msg.sender] = totalSupply;
 9    }
10
11    function transfer(address _to, uint256 _value) public returns (bool success) {
12      require(_to != address(0));
13      require(balanceOf[msg.sender] >= _value);
14      require(balanceOf[_to] + _value >= balanceOf[_to]);
15      balanceOf[msg.sender] = SafeMath.safeSub(balanceOf[msg.sender], _value);
16      balanceOf[_to] = SafeMath.safeAdd(balanceOf[_to], _value);
17      emit Transfer(msg.sender, _to, _value);
18      return true;
19    }
20    ...
21  }

Listing 2: The NEM.sol smart contract.
 1  contract NEM is ERC20Interface, SafeMath {
 2    ...
 3    constructor(string memory _name) public {
 4      name = _name;
 5      symbol = "NEM";
 6      decimals = 18;
 7      totalSupply = 860000000000000000000000000;
 8      balanceOf[msg.sender] = totalSupply;
 9    }
10
11    function transfer(address _to, uint256 _value) public returns (bool success) {
12      require(_to != address(0));
13      require(balanceOf[msg.sender] >= _value);
14      require(balanceOf[_to] + _value >= balanceOf[_to]);
15      balanceOf[msg.sender] = SafeMath.safeSub(balanceOf[msg.sender], _value);
16      balanceOf[_to] = SafeMath.safeAdd(balanceOf[_to], _value);
17      emit Transfer(msg.sender, _to, _value);
18      return true;
19    }
20    ...
21  }
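The two renaming strategies described in Section 2 can be illustrated with a minimal Python sketch. It is not NiCad's implementation, which works on TXL parse trees; the regex tokenizer and the function names blind_rename and consistent_rename are assumptions made for this illustration. Following the paper's example, every word token (including type names such as int) is treated as an identifier, and string literals are left untouched.

```python
import re

# Hypothetical tokenizer: string literals, word tokens, numbers, other symbols.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|\S')
IDENT = re.compile(r"[A-Za-z_]\w*")


def blind_rename(line: str) -> str:
    """Replace every identifier with the same key 'x'."""
    tokens = TOKEN.findall(line)
    return " ".join("x" if IDENT.fullmatch(t) else t for t in tokens)


def consistent_rename(line: str) -> str:
    """Give each distinct identifier its own key x1, x2, ... in order of first use."""
    keys = {}
    renamed = []
    for token in TOKEN.findall(line):
        if IDENT.fullmatch(token):
            # Reuse the key if this identifier was seen before.
            keys.setdefault(token, f"x{len(keys) + 1}")
            renamed.append(keys[token])
        else:
            renamed.append(token)
    return " ".join(renamed)


print(blind_rename("int sum = 0"))       # x x = 0
print(consistent_rename("int sum = 0"))  # x1 x2 = 0
print(blind_rename('symbol = "MT"'))     # x = "MT"
```

Under blind renaming, Line 5 of both listings normalizes to the same left-hand side regardless of how the variable is named, which is why blind renaming accepts more fragments as clones than consistent renaming does.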
a human) within this network are stored in a chain of immutable blocks. One commonly used platform is Ethereum [12]. Solidity is an object-oriented and statically-typed programming language designed for developing smart contracts, influenced by C++, Python, and ECMAScript.

Listings 1 and 2 show code snippets from the MonPayToken (MT.sol)4 and NEM token5 smart contracts written in Solidity. Both smart contracts create a custom token that can be treated as a virtual currency. The listings show that the contracts are identical apart from their symbols and the total supply of tokens. Both of the smart contracts use the SafeMath contract and ERC-20 interface6 to implement the Token functionality. There are 20 instances of the same smart contract being repeated with small changes in our corpus. Such repetitions pose a threat to the platform, as vulnerabilities in any of these base smart contracts would potentially affect a large number of smart contracts in production.

Footnote 4: etherscan.io/token/0xa0b469450e78b3a85d828d454696f8e4bd420038
Footnote 5: etherscan.io/token/0xc14db8e15690c28752dbda133f51821402d29f29
Footnote 6: https://eips.ethereum.org/EIPS/eip-20
Footnote 7: https://github.com/OpenZeppelin/openzeppelin-contracts
Footnote 8: https://etherscan.io

3 SUMMARY OF THE ORIGINAL STUDY

Kondo et al. [13] report (i) the amount of cloned Solidity smart contracts on the Ethereum platform; (ii) the characteristics of clones; and (iii) the overlaps of clones with code blocks of smart contract libraries (e.g., OpenZeppelin). The authors analyzed 33,073 smart contracts amounting to about 4 MLOC, and 13 releases of OpenZeppelin7 to answer three research questions.

3.1 Research questions and major findings

The research questions and key observations from the original study are the following.

RQ1. How frequently are verified contracts cloned?
79.2% of the studied contracts are clones. In particular: 16.7% of the studied contracts are Type-1 clones; 43.3% of the studied contracts are Type-2 clones. Type-3 clones were considered out of the scope due to their detection still being actively researched.

RQ2. What are the characteristics of clusters of similar verified contracts?
The original study reports on three inferred characteristics: (i) category, (ii) activity concentration, and (iii) authorship. In particular: (i) 9 out of the top-10 largest clusters are token managers; (ii) transaction activity tends to be concentrated on a few contracts; and (iii) contracts in a cluster tend to be created by many authors.

RQ3. How frequently code blocks of verified contracts are identical to those from OpenZeppelin?
About one-third of all 165,005 code blocks extracted from verified contracts are identical to OpenZeppelin code blocks. 36.3% of the verified contracts include at least one code block that is identical to an OpenZeppelin code block. 50% of the code blocks from 26.3% of the verified contracts are identical to OpenZeppelin code blocks. The ERC-20 OpenZeppelin category is the most frequently reused category, containing code blocks to support the implementation of token contracts that comply with the ERC-20 standard. SafeMath.sol is the most frequently reused OpenZeppelin code file, containing functions that perform mathematical operations efficiently and safely.

3.2 Approach

Clone granularity and detection tool. Deckard [15], a free granularity clone detector was used to detect clones between Solidity code files.

Clone types considered. Type-1 and Type-2 clones were considered as part of RQ1.

Corpus. The corpus consists of 4,004,543 lines of code, extracted from 33,073 verified smart contracts. The files were retrieved from Etherscan8 in July 2018. Etherscan is an analytics platform for the Ethereum blockchain that analyzes each block on Ethereum and provides insights on each deployed contract. The existence of source code on Etherscan indicates that the source code in Solidity provided by Etherscan matches the bytecode deployed to Ethereum,
and therefore, it is considered verified. Thus, the corpus contains only verified contracts. Verified smart contracts publish their flattened version on Etherscan. This flattened version of the source code is referred to as the code file of a verified contract. No restriction on the transaction number on the contracts was imposed. The corpus was compared with 13 releases of OpenZeppelin in RQ3, released between 2016-11-24 and 2018-08-10, with continuous growth in the size of 1–5 KLOC over time.

4 STUDY DESIGN

In this section, we discuss the design of our replication study, following the guidelines of Carver [23].

4.1 Type of replication

We have carried out a conceptual replication study [24]. That is, we test the same research questions on the same corpus, but use different measures and techniques.

4.2 Motivation for replication

Our work is motivated by the high clone ratio in Solidity smart contracts reported by the original study being significantly higher than clone ratios in traditional software systems. The figures are suggestive of systemic issues in the design and methodology of engineering Solidity smart contracts. Other work [25], [26] confirms this unusually high rate of clones. Such unusual figures have to be verified by independent studies, especially because (i) the cost of performing a transaction or executing a smart contract is proportional to its size,9 and thus, minimizing the size of smart contracts can result in direct cost reduction; and (ii) the majority of smart contracts are deployed in financial applications, and thus, vulnerabilities might have serious financial repercussions [27]. Furthermore, the approach of the original study is often prone to false positives due to the free clone granularity it relies on (see Section 2). Therefore, we set out to replicate the analyses of the original study using a fixed granularity at the function level. We conjecture that this new viewpoint from which cloning can be observed also enhances the applicability of the results in refactoring processes aiming to eliminate duplicated code.

4.3 Level of interaction with the original researchers

Ours is an external replication, i.e., the original researchers were not involved in the replication [23]. The interaction with the original researchers was restricted to inquiring about the study's data and receiving the data package along with technical pointers regarding its structure.

[…] study (Deckard [15]), we rely on the NiCad clone detection tool [14]. NiCad does not support Solidity out-of-the-box. Therefore, we contribute a custom Solidity grammar,10 which makes our analysis and other future work possible.

Clone types considered. In addition to the Type-1 and Type-2 clones that the original study reports on, we also include Type-3 clones in our scope. Furthermore, to refine our reporting, we (i) split Type-2 and Type-3 clones into subtypes based on the renaming strategy that has been applied in the specific clone detecting case; and (ii) provide a systematic process to remove duplicated clones.

5 EXPERIMENTAL SETUP

In this section, we present our experimental setup. As shown in Figure 1, our study is composed of three phases.

5.1 Tool configuration and clone detection

In this phase, we select the clone detector for our study and configure it (Section 5.1.1), develop a grammar to support clone detection in Solidity smart contracts (Section 5.1.2), carry out the clone detection (Section 5.1.3), and download the releases of OpenZeppelin to be analyzed (Section 5.1.4).

5.1.1 Tool selection and configuration

We set out to select a clone detection tool that was (i) freely available and (ii) customizable. While there exists a long list of freely available clone detection tools [16], we found NiCad to be easily customizable for our purposes. NiCad is a text-based clone detection tool that was primarily designed to detect near-miss clones. It has been widely used for clone detection studies, thanks to its high precision and high recall for detecting near-miss clones [16], [28]. Following the suggestions of Wang et al. [29] and the settings used by Hasanain et al. [3], we set the granularity threshold to 10 LOC and the dissimilarity threshold for Type-3 clones to 0.3. These are also the default settings of NiCad.

5.1.2 Grammar development

To conduct our experiment, we extended NiCad with a grammar to enable the parsing of Solidity source code. Our grammar10 is inspired by the grammar for Solidity available in ANTLR.11 In order to extract a parse tree, NiCad expects a context-free grammar for the source-code language to be provided in a TXL grammar format [30]. TXL is a programming language for rule-based transformations. The TXL grammar not only provides the correct input for parsing, but also provides special markers, such as indent, exdent, and newlines for pretty-printing the source code.
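The two thresholds chosen in Section 5.1.1 can be read as an acceptance test on a pair of normalized, pretty-printed fragments. The following Python sketch is one plausible reading of that test, not NiCad's actual code: it assumes a line-wise longest-common-subsequence comparison, and the names lcs_length, dissimilarity, and is_near_miss_clone are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def dissimilarity(a, b):
    """Share of lines not covered by the common subsequence (0.0 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    common = lcs_length(a, b)
    return (longest - common) / longest


def is_near_miss_clone(a, b, min_lines=10, max_dissimilarity=0.3):
    """Apply the granularity (10 LOC) and dissimilarity (0.3) thresholds."""
    if min(len(a), len(b)) < min_lines:
        return False
    return dissimilarity(a, b) <= max_dissimilarity


# Two hypothetical 20-line fragments that differ on two lines, as in Listings 1 and 2.
left = [f"statement {i};" for i in range(20)]
right = list(left)
right[4] = 'symbol = "NEM";'
right[6] = "totalSupply = 860000000000000000000000000;"
print(dissimilarity(left, right))       # 0.1
print(is_near_miss_clone(left, right))  # True
```

With these settings, a pair is accepted only if both fragments span at least ten lines and at least 70% of their lines line up, which matches the 2/20 = 0.1 example given for the listings.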
[Figure 1: Overview of the three phases of the study. Recoverable step labels: tool selection and configuration, grammar development, clone detection, OpenZeppelin code analysis; duplicate removal, metadata extraction, data preprocessing; analysis, comparison, reporting.]
[…] irrelevant blocks, performs normalizations, and transforms the parse tree back to source code. We developed pretty-printers for the grammar to ensure that all functions are evaluated consistently. The basic rules of pretty-printing are the following: (i) function signatures appear on a single line; (ii) block parentheses follow the ECMAScript standard; and (iii) every complete statement appears on its own line. A block of at least ten lines of normalized source code is considered for regular clones because, according to previous studies, this is the best threshold value for the NiCad tool to detect code clones from Java and C source code [29]. Most studies of clone detection consider code clones of fewer than five LOC to be false positives [20] or micro-clones [31], [32], [33], an entirely different type of copy-pasted artifact.

Flexible pretty-printing and normalization. In addition to pretty-printing, NiCad is capable of context-sensitive normalizations, i.e., normalization based on the context of the code fragment. For this initial exploration, we use the default normalization settings of NiCad. NiCad detects clone pairs in this step by performing a line-wise comparison of the normalized code snippets.

Clone clustering. Finally, NiCad conducts a basic cluster analysis of the clones identified to combine similar clone fragments into the same clone cluster. Clones in the same cluster belong to the same clone class.

Corpus. We use the corpus of the original study,12 described in detail in Section 3.2. The corpus contains 33,073 verified smart contracts, amounting to 4,004,543 lines of code. By using verified contracts deployed to Ethereum, we can be sure that the corpus is representative of code in production.

Footnote 12: https://github.com/SAILResearch/suppmaterial-18-masanari-smart_contract_cloning

5.1.4 OpenZeppelin code analysis

We download the contracts of the twelve releases of OpenZeppelin that were analyzed by Kondo et al. [13]. We use NiCad to extract contract and function blocks from the corpus as well as from the OpenZeppelin releases. As explained in Section 5.1.3, the extraction of contracts by NiCad normalizes the source code within each code block. Then, we calculate unique hashes for every code block extracted from OpenZeppelin releases. We will compare these hashes with the hashes extracted from the corpus.

5.2 Metadata extraction and preprocessing

In this phase, we preprocess the clone detection results by removing duplicated clones (Section 5.2.1), extracting metadata (Section 5.2.2), and preparing the data (Section 5.2.3) for the subsequent analysis.

5.2.1 Duplicate removal

Due to the overlapping definition of clone types, some clones might belong to multiple clone classes conforming to different types [18]. For example, if two code fragments are identical, they will also be identical after a blind renaming procedure is performed on them. Consequently, the class of Type-2 clone instances that have been obtained by blind renaming will contain fragments that are also within the class of Type-1 clone instances. (See Section 2.) We refer to this implied containment relationship between clone classes as the strictness of a clone class. Class C_t of clone instances of type t is stricter than C_t′ if each clone that belongs to C_t also belongs to C_t′. It is directly implied by this definition that C_t ⊆ C_t′. We construct the classes of our approach based on (i) the type of contained clone instances, and (ii) the renaming procedure. For simplicity, we refer to these classes by their type and by appending c (consistent) or b (blind) to the name, depending on the renaming procedure that was applied while extracting the clones. Equation 1 defines the relations between the resulting clone classes.

Type-1 ⊆ Type-2c ⊆ Type-2b ⊆ Type-3 ⊆ Type-3c ⊆ Type-3b    (1)

We use this hierarchy to remove clone duplicates. The process iterates through the sets from the strictest to the weakest, and excludes every clone present in the current set from the sets that are weaker than the current set. That is, first, we exclude every Type-1 clone from classes Type-2c, Type-2b, etc.; then, we exclude every Type-2c clone from classes Type-2b, Type-3, etc.; and as the last step, we exclude every Type-3c clone from class Type-3b.

5.2.2 Metadata extraction

For the 33,073 verified smart contracts in the corpus, we have collected additional metainformation from the Etherscan8 analytics platform. We collect two types of information: Creation dates to answer the clone evolution aspect of RQ1, and Author information to answer the authorship aspect of RQ2. Both are extracted from the transaction log of contracts. In about 3.5% of contracts, the creation date was not available from Etherscan. Those cases are excluded from the analyses. We also calculate the length of files in this phase for further analysis.
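The duplicate removal of Section 5.2.1 amounts to a set difference applied along the strictness hierarchy of Equation 1. The Python sketch below illustrates the idea under stated assumptions: clones are represented by hypothetical string identifiers, and remove_duplicates is an illustrative name, not a function of the replication package.

```python
# Clone classes ordered from strictest to weakest, following Equation 1.
STRICTNESS_ORDER = ["Type-1", "Type-2c", "Type-2b", "Type-3", "Type-3c", "Type-3b"]


def remove_duplicates(classes):
    """Keep each clone only in the strictest class in which it was detected."""
    seen = set()   # clones already claimed by a stricter class
    result = {}
    for name in STRICTNESS_ORDER:
        members = set(classes.get(name, ()))
        result[name] = members - seen
        seen |= members
    return result


# Hypothetical detection output: every weaker class re-reports the stricter ones.
detected = {
    "Type-1": {"f1", "f2"},
    "Type-2c": {"f1", "f2", "f3"},
    "Type-2b": {"f1", "f2", "f3", "f4"},
    "Type-3": {"f1", "f2", "f3", "f4", "f5"},
    "Type-3c": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "Type-3b": {"f1", "f2", "f3", "f4", "f5", "f6", "f7"},
}
unique = remove_duplicates(detected)
for name in STRICTNESS_ORDER:
    print(name, sorted(unique[name]))
```

After this step the classes are disjoint, so their sizes can be summed to obtain an overall clone count without counting any fragment twice.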
[Figure 2: Relationship between the proportions of clones and contracts with two characteristic values highlighted. Axes: cumulative % of clusters (x) and cumulative % of clones (y).]

To allow for fast analysis in the subsequent phase, we take care of the computation-intensive tasks of (i) merging metainformation with the data obtained from the clone analysis, and (ii) preprocessing the merged data in various ways. For example, we calculate quarterly figures for RQ1 and calculate Gini-coefficients for RQ2. The preprocessing scripts are available from the replication package.3

5.3 Analysis and reporting

In this phase, we analyze the data (Section 5.3.1) and carry out the comparison with the original study (Section 5.3.2). Finally, we report our findings (Section 5.3.3).
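The two preprocessing computations named above, quarterly figures and Gini-coefficients, can be sketched in a few lines of Python. This is an illustration only and not the code of the replication package; the function names quarter_label and gini are assumptions, and the Gini-coefficient uses the standard formula over sorted values.

```python
from datetime import date


def quarter_label(day: date) -> str:
    """Label a creation date by two-digit year and quarter, e.g. '18/2'."""
    return f"{day.year % 100:02d}/{(day.month - 1) // 3 + 1}"


def gini(values) -> float:
    """Gini-coefficient of non-negative values: 0 = equal, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(quarter_label(date(2018, 5, 1)))  # 18/2
print(gini([5] * 10))                   # 0.0
# Nine contracts with one transaction each and one contract with 250:
print(round(gini([1] * 9 + [250]), 3))  # 0.865
```

The last call uses the distribution that the discussion of RQ2 gives as an intuition for a Gini-coefficient of about 0.86.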
[Figure 3: Evolution of clone numbers and percentages. Panel (b): Number of newly created non-Type-1 clones, per quarter from 15/3 to 18/2 (series: type-3, other).]

Table 2 (rows 10 to 20 only; rows 1 to 9 are not recoverable from the extraction).
#   Function          Count  Category
10  sell              97     Token
11  purchase          81     Token
12  decreaseApproval  78     Token
13  mint              72     Token
14  claimTokens       69     Token
15  finalize          69     Helper
16  refund            64     Token
17  deploy            61     Token
18  tokensOfOwner     57     Token
19  callback          54     Oracle
20  investInternal    46     Token

[…] might be feasible. Furthermore, the analysis of clone clusters suggests that there are hotspots of cloned source code that should be the primary targets of refactoring. Refactorings related to inheritance (such as class and method extraction, method pull up, and push down) could be of particular utility. While inheritance is a supported language feature in Solidity, it is apparently underutilized, as evidenced by the high proportion of clones despite the immutability of the deployed code and demonstrated in Listings 1-2. This might indicate a need for better tool assistance in recognizing abstraction/inheritance opportunities.

6.2 RQ2: What are the characteristics of clusters of similar verified contracts?

6.2.1 Approach

We extract the function identifier within each clone fragment using a custom regular expression. For Type-1 and Type-3 clone clusters with no renaming, the function identifiers are the same for each clone fragment within a clone cluster. Thus, there is one function identifier per clone cluster. For the rest of the clone types, the unique function identifiers are extracted from all clone fragments within a clone cluster.

Bartoletti and Pompianu [1] studied 811 smart contracts written in Solidity and categorized them by the design patterns that they apply. We use the same categorization applied at the function level. In addition, we add a new category of Helper functions, which broadly includes all functions not categorized elsewhere. Below, we describe the three categories that are most relevant to our study.

Token. The code clones in these patterns are used for the distribution of tokens or fungible goods to users. Token is an abstract concept that can represent anything that is countable and transferable, e.g., shares in a company, outcomes of an event, etc. DigixGold13 is an instance of the Token pattern, which tracks the ownership of a fixed amount of gold by using Tokens. A subset of token managers are authorization contracts. The function of the code clones in this category is to evaluate the authorization of the caller. With any transaction, one needs to check whether the parties invoking the transactions are permitted to do so.

Footnote 13: https://digix.global/#/

Oracle. Some smart contracts require data from outside the scope of the blockchain, e.g., to determine the latest posted exchange rate for the US dollar. For this purpose, a collection of special smart contracts, called Oracles, have been created for the Ethereum platform. Oracle smart contracts provide hooks to the outside world, which allows an external service to update the state of the Oracle. This allows Oracles to act as stable interfaces between the outside world and other smart contracts. In practice, a non-Oracle smart contract queries the Oracle smart contract, instead of querying an external service. On the other hand, an external service sends a transaction to the Oracle when an update to the data encapsulated by the Oracle is requested.

Helper functions. This category includes functions that serve as wrappers around existing functions. Smart contract-specific functions, such as initialize and migrate, also belong to this category. The category can be considered as a collection of functions not categorized elsewhere.

6.2.2 Findings

Cloned functionality. Table 2 lists the 20 functions of the code base that contain the most clones. 17 of the 20 functions are Token-related, i.e., they provide functionality for the management and provision of tokens, such as buy, sell, withdraw, refund, etc. These functions have the same intent as the transfer and transferFrom functions. Another group of functions, including createTokens, mint, and deploy, are all variants of a mechanism for increasing the supply of Tokens available for a smart contract.

The second most frequent category is Helpers. Crowdsale is a common Helper function that is used to set the initial conditions for carrying out a crowdsale operation. Crowdsale is a method of flash sale where a number of tokens are allocated to be sold within a time window. The presence of this function is not unexpected, as a large number of
smart contracts are written to conduct Initial Coin Offerings for raising capital for projects. Similarly, finalize is another Helper function, with the purpose of terminating the crowdsale initiated by the previous function.

One of the top 20 functions belongs to the Oracle category, specifically, the callback function. As the name suggests, the purpose of this function is to serve as a callback function to be invoked when an external query is completed. In our example, the most common calls are made to the Oracle smart contract. As explained in Section 6.2.1, an Oracle serves as a doorway between the blockchain and the external world. Therefore, the purpose of queries to Oracles is to access external data and resources.

[Figure 4: Clone clusters (with at least 10 clones) and the activity of the related contracts. Panel (a): Gini-coefficient of clone clusters (with at least 10 clones), for type-1, type-2c, type-3, type-3b, and Overall. Panel (b): Relative rank of the most active contract (axis: rank percentage).]

Table 3: Gini-coefficients.
Clone type  Gini-coefficient
Type-1      0.86
Type-2c     0.73
Type-3      0.86
Type-3b     0.77
Overall     0.86

Activity. Following the original study, we measured activity in terms of the number of transactions that are related to contracts. First, we observe that activity tends to be concentrated on a few contracts. We use the Gini-coefficient [37] as the measure of inequality among transactions related to clone clusters. A Gini-coefficient of 0 indicates no inequality among values, while a value of 1 indicates maximal inequality. To obtain meaningful results, we investigate clusters with at least ten clones. As Table 3 and Figure 4a show, the overall Gini-coefficient of the clone clusters is 0.86. Second, the medians […]

[Figure 5: caption not recoverable from the extraction. Surviving axis labels: Entropy, Number of contracts.]

Authorship. Contracts in a clone cluster tend to be created by many authors. We measure this observation by the normalized Shannon-entropy [38] within a cluster. Maximum entropy (1.0) is measured for distributions with elements of uniform probability, i.e., in clone clusters with contracts that have equal transactions. The less uniform the probabilities of elements in a distribution, the lower the entropy. To obtain meaningful results, we once again investigate clusters with at least ten clones. As Figure 5 shows, the median entropy in our sample is 0.7, while the median normalized cluster size is close to zero (0.058). This means the average cluster is relatively small compared to the largest clusters while showing high entropy, i.e., a large variance in the authors. The darker area in the bottom-left corner shows that the majority of clone clusters have high entropy.

6.2.3 Discussion

Out of the functionality that is subject to frequent cloning, token management contracts, including authorization, pose the most pressing issue. A detailed look at the cloned functions reveals that basic transaction functions such as transfer and createTokens are among the most frequently cloned. Providing a library of secure transfer primitives could simplify the development of such functionality. From a language design point of view, declarative and verifiable language constructs have been identified as potential enablers to a more secure design of smart contracts [39]. The benefits of such techniques have been demonstrated in blockchain languages, such as Pact and Liquidity.

The relatively high Gini-coefficients suggest that activity within a cluster tends to focus on a small number of contracts. The overall Gini-coefficient of 0.86 is roughly equivalent to a cluster of ten contracts with nine contracts having one transaction, and one contract having 250 trans-
in Figure 4b show that in 50% of the cases, the most active actions. Vulnerabilities in such frequently used contracts
contract was created before 66.7% of contracts in the same are more likely to be identified by malicious attackers. The
clone cluster. (The proportion above the median line in the effects of such vulnerabilities, in turn, are amplified by code
cloning as the same vulnerabilities can be anticipated in the contracts of the same clone cluster.

The high entropy in authorship suggests that cloning is a widespread phenomenon on Ethereum. Such community-wide bad practices are often addressed by guidelines published by community leaders, such as the Python Enhancement Proposal (PEP) 8 style guidelines for Python [40]. However, such general rules cannot be enforced in a computer-automated fashion, and a better solution could be establishing community-specific DevOps processes that include the usage of quality gates enforced by code quality tools that evaluate contracts that are ready to be deployed.

6.3 RQ3: How frequently are code blocks of verified contracts identical to those from OpenZeppelin?

6.3.1 Approach
To answer the research question, we identify the code blocks present in OpenZeppelin releases that are also present in the corpus. We do so by (i) extracting code blocks from OpenZeppelin, (ii) calculating their hashes (as explained in Section 5.1.4), and (iii) comparing those hashes with the hashes calculated for the code blocks of the corpus.

6.3.2 Findings
Table 4 shows the 10 most commonly cloned functions from OpenZeppelin, along with their category, the respective number of clones, and the proportion of these clones in the overall set of OpenZeppelin (OZ) clones.

Clone proportion. Of all verified contracts, 21.79% have functions identical to those of OpenZeppelin. As seen in Table 4, the three most cloned functions encompass 73% of all clones from OpenZeppelin.

Functionality. Most functions have been defined in the StandardToken OpenZeppelin contract. Six of the ten most cloned functions (Table 4) and fourteen of the hundred most cloned functions belong to this category. Other frequently encountered categories are SafeMath (10) and VestedToken (8). The most frequently cloned functionality is related to transfers, accounting for over 50% of cloned functionality.

6.3.3 Discussion
OpenZeppelin serves as a frequent source of code cloning on the Ethereum blockchain platform. The high volume of cloning from OpenZeppelin suggests that mechanisms for reusing functionality from libraries such as OpenZeppelin could reduce the number of clones, and improve the maintainability of the overall code base. This, in turn, could improve the extra-functional properties of Ethereum, such as security, reliability, and integrity. The functionality cloned from OpenZeppelin tends to concentrate on transfer-related functionality, and mostly from the StandardToken contract.

7 COMPARISON WITH THE ORIGINAL STUDY
In this section, we provide an overview of how the findings in Section 6 align with the results of the original study of Kondo et al. [13]. The mapping between the two studies is shown in Table 5. For the sake of compactness, we have presented our results in slightly different groups of findings. Below we give a detailed explanation.

7.1 RQ1
We have observed the most important difference between our study and the original study while analyzing RQ1.

Clone ratio. The overall proportion of clones that we detect (30.13%) is considerably smaller than the proportion observed in the original study (79.2%). This difference is due to three factors. First, our analysis is performed at the function-level, which is a finer granularity and provides a larger sample of code units that are subject to cloning. Second, we count every identified clone once, as explained in Section 5.1.3. Third, due to the normalization, the clones that we identify are mainly exact copies, further reducing the number of instances of less strict clone clusters. However, as a common treatment in near-miss clone detectors, we normalize the text of the function blocks using standard ECMA formatting, as explained in Section 5.1, reducing the number of false-negatives, and consequently, potentially increasing the number of identified clones.

We corroborate the high ratio of Type-1 clones but observe a much larger proportion of Type-1 clones among all clones. Our experiments show 89.7% of all clones are of Type-1, as opposed to the 21.1% (16.7% overall) reported by the original study. This can be explained by removing Type-1 clones from Type-2 and Type-3 clusters.

The substantial difference between our results and the original study shows that while 79.2% of contract files might be affected by cloning practices, it is typically only a subset of the encoded functionality that is actually cloned.

Clone clusters. The original study found that 20% of clusters encompass 68% of clones. These figures are nearly
identical to those that we observe: 20% of all clusters encompass 71.9% of all clones; and half of the clones can be found in just 2.07% of clusters. We conclude that our results at a finer level of granularity corroborate the findings of the original study at a coarser level of granularity.

Table 5: Mapping the findings of the current study to the observations of the original study.
Finding (current study)   Observations (original study)   Comparison
RQ1
  Clone ratio             Observations 1, 4, 5            Refined – different results
  Clone clusters          Observation 2                   Corroborated
  Clone evolution         Observation 3                   Corroborated & refined
RQ2
  Cloned functionality    Observations 6, 7               Corroborated – minor differences
  Activity                Observations 8, 9               Corroborated – minor differences
  Authorship              Observations 10, 11             Corroborated
RQ3
  Clone proportion        Observations 12, 13             Refined – different results
  Functionality           Observation 14                  Refined – different results

Clone evolution. Since the original study also observed an increasing trend, we conclude that our results corroborate the findings of the original study. However, we point out that different types of clones evolve at different paces. Specifically, the amount of newly created Type-1 clones is the predominant factor behind the increasing trend.

7.2 RQ2
We observed minor differences in RQ2 in terms of the cloned functionality (due to the different levels of granularity of the two studies), and the activity of cloned contracts.

Cloned functionality. The original study reports that nine of the ten most populous clusters are related to Token management. Our finer-grained results also show that nine of the top ten clusters are indeed Token management functions. Moreover, 17 of the top 20 are Token management functions. Unlike the original study, we find that the other top clusters were Helper and Oracle functions rather than Token Lockers. The Token Locker category of the original study covers three specific functionalities: lock(), lockOver() and release(). At the finer level of granularity of functions, however, these functionalities prove to be less frequently cloned than at the contract level.

Activity. The original study also observes that transactions tend to be concentrated on a few contracts, and reports an overall Gini-coefficient of 0.817. We enhance the prior observations by adding that Type-2c clones show a lower Gini-coefficient (0.73). We report slightly different figures regarding the relative creation date of the most active contracts. In 50% of cases, the most active contract of a cluster was created before 74.7% of the other contracts in the same cluster according to the original study, and before 63.4% according to our finer-grained study. However, this difference is minor.

Authorship. We observe numbers that are almost identical to the ones reported by the original study. This means the authors of cloned smart contract files are the same as the authors of cloned functions. Therefore, identifying developers who are responsible for clones can be achieved either at the function or the contract level with similar results, with potentially different runtime performance.

7.3 RQ3
We observed relevant differences both in the detected clone proportion and the cloned functionality. These differences are due to the finer granularity of our analysis.

Clone proportion. The original study reported that 36.3% of verified contracts have at least one code block identical to an OpenZeppelin code block. Our finer-grained results show that this proportion decreases to 21.79% when analyzing code similarity at the level of functions with at least 10 LOC; increases to 47.21% when analyzing code similarity at the level of functions with at least 5 LOC; and increases to 64.59% when not considering a minimum function length. These proportions are on par with, and in some cases exceed, the 6–50% cloning rate reported from traditional engineering domains [2], [3], suggesting security risks.

Functionality. The original study reported that ERC20 is the most frequently cloned category from OpenZeppelin, and that ERC20 is more frequently cloned than its concrete implementation, StandardToken. However, our finer-grained analysis shows that the StandardToken implementation of ERC20 is more frequently cloned than ERC20. This is not unexpected because ERC20 is an interface and as such, it only defines function signatures but no bodies. While ERC20 might be the most cloned contract block, it is the concrete implementations of ERC20 that contribute the most cloned function blocks.

8 RELATED WORK
In this section, we briefly review the related work.

8.1 Empirical studies on smart contracts
Bartoletti and Pompianu [1] conducted a study to analyze top blockchain platforms and their usages. They analyzed 834 smart contracts written for the Ethereum and Bitcoin technologies, and grouped the contracts by application domain and design patterns that were applied. We use the same design patterns as the basis for assigning commonly cloned functions into different categories.

Durieux et al. [41] studied nine automated analysis tools for Solidity. Automated analysis tools can aid developers in meeting required functional and extra-functional qualitative measures, resulting in better performing, safer and more reliable code. The authors conclude that state-of-the-art analysis tools fall short in detecting numerous classes of vulnerabilities, identifying only 40% of the vulnerabilities in a testing corpus, and produce a large number of false positives. These results are corroborated by Ghaleb et al. [42] who investigated the effectiveness of static analysis tools for Solidity smart contracts using bug injection. These results provide evidence that code clones are hotspots of software issues because they facilitate the spread of faulty code, code smells, and anti-patterns. The ineffectiveness of analysis
Table 6: Studies on clone detection in Solidity contracts.
Table 7: Studies on cloning in conventional software.
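To make the two cluster statistics of Section 6.2.2 concrete, the following minimal Python sketch computes a Gini-coefficient over per-contract transaction counts and a normalized Shannon-entropy over per-contract author labels. It is an illustration under stated assumptions, not the scripts used in the study: the function names gini and normalized_entropy are hypothetical, the Gini-coefficient uses the plain rank-weighted formula without a small-sample correction, and the entropy is normalized by the logarithm of the number of distinct authors, which is only one possible normalization.

```python
import math
from collections import Counter


def gini(values):
    """Gini-coefficient of non-negative values (0 = no inequality).

    Assumption: plain rank-weighted formula, no small-sample correction.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def normalized_entropy(labels):
    """Shannon entropy of the label distribution, scaled to [0, 1].

    Assumption: normalized by log(number of distinct labels).
    """
    counts = Counter(labels)
    k = len(counts)
    if k <= 1:
        return 0.0
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


# Example from Section 6.2.3: nine contracts with one transaction each
# and one contract with 250 transactions.
example_transactions = [1] * 9 + [250]
print(round(gini(example_transactions), 3))  # 0.865
```

For the ten-contract example of Section 6.2.3, the sketch yields a value of about 0.865, which is consistent with the reported overall Gini-coefficient of 0.86. A cluster whose contracts are spread evenly over its authors yields a normalized entropy of 1.0, and the value decreases as a few authors dominate the cluster.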
broader definition of the categories, there is always room for interpretation when conducting the classification. To address this potential threat, we made the list of categorized functions available to public scrutiny (footnote 10).

External validity. Our study has sampled only verified smart contracts deployed on the Ethereum platform, a subset of all smart contracts deployed on the platform. Thus, there are no guarantees on the safe generalization of our findings to all smart contracts written in Solidity. The same reasoning applies to generalizing our findings to other blockchain platforms. However, the goal of these experiments was not to provide a general theory for all Ethereum smart contracts, but to extract initial and high-level insights from existing smart contracts in order to raise awareness about the highly vulnerable state of systems relying on immutable code. We are still reasonably confident that many of our insights translate well to other platforms relying on immutable source code. An external threat to validity w.r.t. the original study we could not mitigate is the number of transactions used in the analysis of RQ2, as explained in Section 6.2.2.

Limitations. Due to the limited parsing support for Solidity (especially compared to that for mainstream languages, such as Java and C++), we have developed a custom parser using the TXL grammar [30]. Since this is the first version of the parser, bugs and other shortcomings are possible. Although we have not experienced such issues during our experiments, we have made the parser available to public scrutiny (footnote 10). Nevertheless, as a sign of maturity, the Solidity parsers and normalizers developed for our experiments have become part of NiCad starting with its v6.2 release.

10 CONCLUSION
In this paper, we reported the results of our study on source code cloning practices on the Ethereum blockchain platform. By analyzing 33,073 Solidity smart contracts, we found that 30.13% of the source code is cloned. Our work is an extended conceptual replication of the study of Kondo et al. [13] who reported a substantially higher clone ratio of 79.2%. The main difference between the two studies is the level of granularity at which clones are analyzed. Our analysis was carried out at the level of functions, while the original analysis was carried out at the level of whole source files. To achieve this finer granularity of cloning analysis, we extended the NiCad clone detection tool to support Solidity, the programming language of the Ethereum platform. Our study reports a lower bound of cloning on the blockchain platform. This lower bound is still on par with the 6–50% rate of cloning reported from traditional software engineering domains [2], [3], suggesting potential risks of reduced security, reliability, and performance of the overall software system.

An important takeaway of our study is that these problems could be effectively addressed by refactoring. The majority of clones, about 90%, are of Type-1, i.e., exact replicas, and such clones have been shown to be easier to refactor [9]. Moreover, as shown by our cluster analysis, cloned functions tend to form hotspots in the source code: half of the clones can be found in just about 2% of clusters. Such clusters should be the prime candidates for refactoring. Unfortunately, the lack of coordination between developers renders such efforts particularly challenging. Thus, we anticipate automated audit mechanisms to appear in the integration (pre-deployment) phase of blockchain DevOps processes [51]. Solutions such as the NiCad-based tool presented in this paper could serve as a machinery to generate refactoring recommendations to reduce the clone ratio in the code to be deployed, thereby improving the overall code quality of the platform. Furthermore, we foresee the emergence of quality control as a service, provided by platform agents in exchange for compensation that is proportional to their computation investment.

Future work should focus on extending the scope of the current study to smart contract programming languages of other platforms, such as Script, the language of Bitcoin. Opportunities in adapting traditional software engineering lifecycle models to the particularities of smart contract development should be considered as well.

REFERENCES
[1] M. Bartoletti and L. Pompianu, “An empirical analysis of smart contracts: platforms, applications, and design patterns,” in International conf. on financial cryptography and data security. Springer, 2017, pp. 494–509.
[2] B. Laguë et al., “Assessing the Benefits of Incorporating Function Clone Detection in a Development Process,” in International Conference on Software Maintenance. IEEE, 1997, pp. 314–321.
[3] W. Hasanain et al., “An analysis of complex industrial test code using clone analysis,” in International Conference on Software Quality, Reliability and Security. IEEE, 2018, pp. 482–489.
[4] C. J. Kapser and M. W. Godfrey, ““Cloning considered harmful” considered harmful: patterns of cloning in software,” Empirical Software Engineering, vol. 13, no. 6, pp. 645–692, 2008.
[5] C. K. Roy et al., “The vision of software clone management: Past, present, and future,” in 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering. IEEE, 2014, pp. 18–33.
[6] R. Koschke, “Frontiers of software clone management,” in 2008 Frontiers of Software Maintenance. IEEE, 2008, pp. 119–128.
[7] D. Chatterji et al., “Effects of cloned code on software maintainability: A replicated developer study,” in Working Conference on Reverse Engineering. IEEE, 2013, pp. 112–121.
[8] E. Jürgens, F. Deissenboeck, and B. Hummel, “Code similarities beyond copy & paste,” in European Conference on Software Maintenance and Reengineering. IEEE, 2010, pp. 78–87.
[9] N. Tsantalis, D. Mazinanian, and G. P. Krishnan, “Assessing the refactorability of software clones,” IEEE Trans. Software Eng., vol. 41, no. 11, pp. 1055–1090, 2015.
[10] C. Roy and J. Cordy, “A survey on software clone detection research,” Ontario, Canada, Tech. Rep. 2007-541, 2007.
[11] B. van Bladel and S. Demeyer, “Clone Detection in Test Code: An Empirical Evaluation,” in International Conference on Software Analysis, Evolution and Reengineering. IEEE, 2020, pp. 492–500.
[12] C. Dannen, Introducing Ethereum and solidity. Springer, 2017.
[13] M. Kondo et al., “Code cloning in smart contracts: a case study on verified contracts from the Ethereum blockchain platform,” Empirical Software Engineering, vol. 25, no. 6, pp. 4617–4675, 2020.
[14] C. K. Roy and J. R. Cordy, “NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization,” in Int. Conf. on Program Comprehension. IEEE, 2008, pp. 172–181.
[15] L. Jiang et al., “DECKARD: scalable and accurate tree-based detection of code clones,” in International Conference on Software Engineering. IEEE, 2007, pp. 96–105.
[16] C. K. Roy, J. R. Cordy, and R. Koschke, “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Sci. Comput. Program., vol. 74, no. 7, pp. 470–495, 2009.
[17] R. K. Saha et al., “Understanding the evolution of type-3 clones: an exploratory study,” in Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE, 2013, pp. 139–148.
[18] W. O. A. Hasanain, “Analysis and maintainability of complex industry test code using clone detection,” Ph.D. dissertation, Carleton University, 2020.
[19] Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue, “Gemini: Maintenance Support Environment Based on Code Clone Analysis,” in International Software Metrics Symposium. IEEE, 2002, pp. 67–76.
[20] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo, “Comparison and Evaluation of Clone Detection Tools,” IEEE Trans. Softw. Eng., vol. 33, no. 9, pp. 577–591, 2007.
[21] A. Kumar et al., “A systematic review of semantic clone detection techniques in software systems,” IOP Conference Series: Materials Science and Engineering, vol. 1022, p. 11, 2021.
[22] M. Swan, Blockchain: Blueprint for a new economy. O’Reilly, 2015.
[23] J. C. Carver, “Towards reporting guidelines for experimental replications: A proposal,” in 1st International Workshop on Replication in Empirical Software Engineering Research, vol. 1, 2010, pp. 1–4.
[24] A. R. Dennis and J. S. Valacich, “A replication manifesto,” AIS Transactions on Replication Research, vol. 1, no. 1, p. 1, 2015.
[25] Z. Gao et al., “SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding,” in Int. Conference on Software Maintenance and Evolution. IEEE, 2019, pp. 394–397.
[26] H. Liu et al., “Enabling clone detection for ethereum via smart contract birthmarks,” in Proceedings of the 27th International Conference on Program Comprehension. IEEE / ACM, 2019, pp. 105–115.
[27] M. I. Mehar et al., “Understanding a Revolutionary and Flawed Grand Experiment in Blockchain: The DAO Attack,” J. Cases Inf. Technol., vol. 21, no. 1, pp. 19–32, 2019.
[28] C. K. Roy and J. R. Cordy, “Towards a mutation-based automatic framework for evaluating code clone detection tools,” in Canadian Conf. on Comp. Science & Software Eng., ser. ACM International Conference Proceeding Series, vol. 290. ACM, 2008, pp. 137–140.
[29] T. Wang et al., “Searching for better configurations: a rigorous approach to clone evaluation,” in European Software Engineering Conference. ACM, 2013, pp. 455–465.
[30] J. R. Cordy, C. D. Halpern-Hamu, and E. Promislow, “TXL: A rapid prototyping system for programming language dialects,” Comput. Lang., vol. 16, no. 1, pp. 97–107, 1991.
[31] M. Beller, A. Zaidman, and A. N. Karpov, “The last line effect,” in Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension. IEEE, 2015, pp. 240–243.
[32] R. van Tonder and C. L. Goues, “Defending against the attack of the micro-clones,” in 24th IEEE International Conference on Program Comprehension. IEEE, 2016, pp. 1–4.
[33] M. Mondal, C. K. Roy, and K. A. Schneider, “Micro-clones in evolving software,” in 25th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 2018, pp. 50–60.
[34] D. S. Cruzes and T. Dybå, “Research synthesis in software engineering: A tertiary study,” Information and Software Technology, vol. 53, no. 5, pp. 440–455, 2011.
[35] I. D. Baxter et al., “Clone detection using abstract syntax trees,” in Int. Conference on Software Maintenance. IEEE, 1998, pp. 368–377.
[36] S. Palladino, “The parity wallet hack explained,” OpenZeppelin, 2017.
[37] R. Dorfman, “A formula for the gini coefficient,” The review of economics and statistics, pp. 146–149, 1979.
[38] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[39] R. M. Parizi, Amritraj, and A. Dehghantanha, “Smart contract programming languages on blockchains: An empirical evaluation of usability and security,” in Blockchain - ICBC 2018 - First International Conference, ser. Lecture Notes in Computer Science, vol. 10974. Springer, 2018, pp. 75–91.
[40] G. Van Rossum, B. Warsaw, and N. Coghlan, “PEP 8: style guide for Python code,” Python.org, vol. 1565, 2001.
[41] T. Durieux, J. F. Ferreira, R. Abreu, and P. Cruz, “Empirical review of automated analysis tools on 47,587 Ethereum smart contracts,” in Int. Conference on Software Engineering. ACM, 2020, pp. 530–541.
[42] A. Ghaleb et al., “How effective are smart contract analysis tools? Evaluating smart contract static analysis tools using bug injection,” in Int. Symp. on Software Testing and Analysis. ACM, 2020, pp. 415–427.
[43] X. Li et al., “A survey on the security of blockchain systems,” Future Gener. Comput. Syst., vol. 107, pp. 841–853, 2020.
[44] S. Rouhani and R. Deters, “Security, performance, and applications of smart contracts: A systematic survey,” IEEE Access, vol. 7, p. 20, 2019.
[45] M. Scherer, “Performance and scalability of blockchain networks and smart contracts,” Master’s thesis, Umea Uni., Sweden, 2017.
[46] R. Belchior, A. Vasconcelos, S. Guerreiro, and M. Correia, “A survey on blockchain interoperability: Past, present, and future trends,” ACM Comput. Surv., vol. 54, no. 8, pp. 168:1–168:41, 2022.
[47] P. Treleaven, R. G. Brown, and D. Yang, “Blockchain technology in finance,” Computer, vol. 50, no. 9, pp. 14–17, 2017.
[48] C. Alexopoulos et al., “Benefits and Obstacles of Blockchain Applications in e-Government,” in Hawaii International Conference on System Sciences. ScholarSpace, 2019, pp. 1–10.
[49] C. C. Agbo et al., “Blockchain technology in healthcare: A systematic review,” Healthcare, no. 2, 2019.
[50] E. Tüzün and E. Er, “A case study on applying clone technology to an industrial application framework,” 2012 6th International Workshop on Software Clones, IWSC 2012 - Proceedings, 06 2012.
[51] M. Wöhrer and U. Zdun, “Devops for ethereum blockchain smart contracts,” in 2021 IEEE Intl. Conference on Blockchain, Blockchain 2021, Melbourne, Australia, 2021. IEEE, 2021, pp. 244–251.

Faizan Khan is a software engineer at Plotly working on data-visualization libraries. He completed his Master's at the Department of Electrical and Computer Engineering at McGill University. His research interests include programming languages and program synthesis.

Istvan David is a postdoctoral researcher at the University of Montreal, Canada. He received his PhD in Computer Science from the University of Antwerp, Belgium. His research interests include model-driven engineering of complex heterogeneous systems, and software quality improvement through automation. He is active outside of academia as well, especially in innovation consulting. Contact: https://istvandavid.com.

Daniel Varro is a full professor at McGill University. He serves on the editorial board of the Software and Systems Modeling and Journal of Object Technology periodicals, and served as a program co-chair of the MODELS 2021, SLE 2016, ICMT 2014, and FASE 2013 conferences. He is a co-founder of the VIATRA open source software framework as well as IncQuery Labs, a technology-intensive company.

Shane McIntosh is an associate professor at the University of Waterloo. Previously, he was an assistant professor at McGill University. He received his Ph.D. from Queen’s University. In his research, Shane uses empirical methods to study software build systems, release engineering, and software quality: http://shanemcintosh.org/.