2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
DOI: 10.1109/ICSE-SEIP58684.2023.00020
Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

Jiyang Zhang* (The University of Texas at Austin) [email protected]
Chandra Maddila* (Microsoft Research) [email protected]
Ram Bairi (Microsoft Research) [email protected]
Christian Bird (Microsoft Research) [email protected]
Ujjwal Raizada (Microsoft Research) [email protected]
Apoorva Agrawal (Microsoft Research) [email protected]
Yamini Jhawar (Microsoft Research) [email protected]
Kim Herzig (Microsoft) [email protected]
Arie van Deursen (Delft University of Technology) [email protected]

* Work performed while at Microsoft Research; equal contribution.

Abstract—Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise and understanding can slow development or result in more defects. To date, most reviewer recommendation systems rely primarily on historical file change and review information; those who changed or reviewed a file in the past are the best positioned to review in the future.
We posit that while these approaches are able to identify and suggest qualified reviewers, they may be blind to reviewers who have the needed expertise and have simply never interacted with the changed files before. Fortunately, at Microsoft, we have a wealth of work artifacts across many repositories that can yield valuable information about our developers. To address the aforementioned problem, we present CORAL, a novel approach to reviewer recommendation that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests (PRs), work items, etc.) and their relationships in modern source code management systems. We employ a graph convolutional neural network on this graph and train it on two and a half years of history on 332 repositories within Microsoft. We show that CORAL is able to model the manual history of reviewer selection remarkably well. Further, based on an extensive user study, we demonstrate that this approach identifies relevant and qualified reviewers who traditional reviewer recommenders miss, and that these developers desire to be included in the review process. Finally, we find that "classical" reviewer recommendation systems perform better on smaller (in terms of developers) software projects while CORAL excels on larger projects, suggesting that there is "no one model to rule them all."

I. INTRODUCTION

Code review (also known as pull request review) has become an integral process in software development, both in industrial and open source development [1], [2], [3], and all code hosting systems support it. Code reviews facilitate knowledge transfer, help to identify potential issues in code, and promote discussion of alternative solutions [4]. Modern code review is characterized by asynchronous review of changes to the software system, facilitated by automated tools and infrastructure [4].

As code review inherently requires expertise and prior knowledge, many studies have noted the importance of identifying the "right" reviewers, which can lead to faster turnaround, more useful feedback, and ultimately higher code quality [5], [6]. Selecting the wrong reviewer slows down development at best and can lead to post-deployment issues. In response to this finding, a vibrant line of code reviewer recommendation research has emerged, to great success [7], [8], [9], [10], [11], [12], [13], [14]. Some of these have, in fact, even been put into practice in industry [15].

All reviewer recommender approaches that we are aware of rely on historical information of changes and reviews. The principle underlying these is that the best reviewers of a change are those who have previously authored or reviewed the files involved in the review. While recommenders that leverage this idea have proven to be valid and successful, we posit that they may be blind to qualified reviewers who may have never interacted with these files in the past, especially as the number of developers in a project grows.
We note that there is a wealth of additional recorded information in software repositories that can be leveraged to improve reviewer recommendation and address this weakness. Specifically, we assert that incorporating information around interactions between code contributors as well as the semantics of code changes and their descriptions can help identify the best reviewers. As one intuitive example, if a set of existing pull requests (PRs) are determined to be semantically similar to a new incoming pull request, then reviewers who contributed meaningfully to the former may likely be good candidates to review the latter, even if the reviews do not share common files or components. To leverage this idea, we construct a socio-technical graph on top of the repository information, comprising files, authors, reviewers, pull requests, and work items, along with the relationships that connect them. Prior work has shown that code review is a social process in addition to a technical one [16], [17]. As such, our primary belief is that this heterogeneous graph captures both and can be used to address various software engineering tasks, with code reviewer recommendation being the first that we address.

Learning on such a graph poses a challenge. Fortunately, the area of machine learning has advanced by leaps and bounds in the years since reviewer recommendation became a recognized, important research problem. Neural approaches give us tools to deal with the relational information found in software repositories and make inferences about who is best able to review a change [18].

Based on these observations and ideas, we introduce CORAL, a novel approach for identifying the best reviewers for a code change. We train a graph convolutional neural network on this socio-technical graph and use it to recommend reviewers for future pull requests. Compared to the existing state of the art, this approach works quite well.

To test our hypotheses, we build a socio-technical graph of the entities and their relationships in 332 software projects over a two and a half year period at Microsoft. We show that a neural network trained on this graph is able to model review history surprisingly well. We perform a large scale user study of CORAL by contacting those potential reviewers recommended by our neural approach that the "classical" baseline (in production) approach did not identify because they did not previously interact with the files in the pull requests. Their responses reveal that there is a large population of developers who not only are qualified to contribute to these code reviews, but who desire to be involved as well. We also investigate in what contexts CORAL works best and find that it performs better than the baseline in large (in terms of developers) software projects, but the baseline excels in small projects, indicating that there is "no one model to rule them all." Finally, through an ablation study of CORAL, we demonstrate that while both files and their natural language text in the graph are important, there is a tremendous performance boost when they are used together.

We make the following contributions in this paper:
1. We present a general socio-technical graph based on the entities and interactions in modern source code repository systems.
2. We introduce CORAL, a novel code reviewer recommendation approach that leverages graph convolutional neural networks on the socio-technical repository graph.
3. We evaluate our approach through retrospective analyses, a large scale user study, and an ablation study to show that CORAL improves on the state-of-the-art deployed approaches on a broad scale of historical reviews, and we also conduct a user study by running our system on real-time PRs.

II. RELATED WORK

There have been many approaches to the code reviewer recommendation problem. We survey a broad set of studies and approaches here and refer the reader to the work of Çetin et al. [19] for a more comprehensive survey of existing work.

The first reviewer recommendation system we are aware of was introduced by Balachandran et al. [20]. They used authorship of the changed lines in a code review (using git blame) to identify who had worked on that code before and suggested a ranked list of this set as potential reviewers. Lipcak and Rossi [7] performed a large scale (293,000 pull requests) study of reviewer recommendation systems. They found that no single recommender works best for all projects, further supporting our assertion that there is no "one recommender to rule them all." Thongtanunam et al. [14] proposed REVFINDER, a reviewer recommender based on file locations. REVFINDER is able to recommend reviewers for new files based on reviews of files that have similar paths in the filesystem. The approach was evaluated on over 40,000 code reviews across three OSS projects, and recalls a correct reviewer in the top 10 recommendations 79% of the time on average. Sülün et al. [13] construct an artifact graph similar to our socio-technical graph and recommend potential reviewers based on paths through this graph from people to the artifact under review. Lee et al. [12] build a graph of developers and files with edges indicating that a developer committed to a file or that one file is close to another file in the Java namespace tree. They use a random walk approach on this graph to recommend reviewers.

Yu et al. [10], [11] recommend reviewers for a pull request by examining other pull requests whose terms have high textual similarity (cosine similarity in a term vector space), the comment network of other developers who have commented on the author's pull requests in the past, and prior social interactions of developers with the author on GitHub. Jiang et al. [9] examine the impact of various attributes of a pull request on a reviewer recommender, including file similarity, PR text similarity, social relations, "activeness", and time difference. They find that adding measures of activeness to prior models increases performance considerably. Ouni et al. [8] used a genetic search-based approach to find the optimal set of reviewers based on their expertise on the files involved and their previous collaboration with the change author. Zanjani et al. [21] train a model of expertise based on author interactions with files and a time decay to provide ranked lists of potential reviewers for a given change set. Rahman et al. [22] propose CORRECT, an approach to recommend reviewers based on their history across all of GitHub as well as their experience with certain specialized technologies associated with a pull request.
Fig. 1: CORAL architecture

Dougan et al. [23] investigated the problem of ground truth in reviewer recommendation systems. They point out that many tools are trained and evaluated on historical code reviews and rely on an (often unstated) assumption that the selected reviewers were the correct reviewers. They find that using history as the ground truth is inherently flawed.

III. SOCIO-TECHNICAL GRAPH

The CORAL system contains three main building blocks (as shown in Figure 1):
1) Building the socio-technical graph.
2) Performing graph representation learning to generate node embeddings.
3) Performing inductive inference to predict reviewers for new pull requests.
In this section, we describe the process of building the socio-technical graph from entities (developers, repositories, files, pull requests, work items) and their relationships in modern source code management systems, shown as step 1 in Figure 1.

A. Socio-technical Graph

The Socio-technical graph consists of nodes, which represent the people and the artifacts, and edges, which represent the relationships or interactions that exist between the nodes. Figure 2 shows the nodes and the edges along with their properties. The Socio-technical graph (STG) has two fundamental elements.

Fig. 2: Socio-technical graph

Nodes There are six types of nodes in the Socio-technical graph. They are pull request, work item, author, reviewer, file, and repository.
Edges There are seven types of edges in the Socio-technical graph, as listed below.
creates: created between an author node and a pull request node.
reviews: created between a reviewer node and a pull request node.
contains: created between a repository node and a pull request node if the repository contains the pull request.
changes: created between a pull request node and a file node if the pull request changes the file.
linkedTo: created between a pull request node and a work item node if the pull request is linked to the work item.
commentsOn: created between a pull request node and a reviewer node if the reviewer places code review comments.
parentOf: created between a work item node and another work item node if there exists a parent-child relationship between them.
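To make the graph structure concrete, the following is a minimal sketch (not the authors' implementation) of how such a typed, heterogeneous socio-technical graph could be assembled with the networkx library. The node and edge types mirror the list above; the identifiers (PR 1234, alice, bob, the file path, the repository name) are purely illustrative.

```python
import networkx as nx

# A directed multigraph lets us keep several typed edges between the same pair of nodes.
stg = nx.MultiDiGraph()

# Typed nodes: pull request, work item, author, reviewer, file, repository.
stg.add_node("pr:1234", kind="pull_request", title="Fix retry logic in uploader")
stg.add_node("user:alice", kind="author")
stg.add_node("user:bob", kind="reviewer")
stg.add_node("file:src/uploader.py", kind="file")
stg.add_node("repo:contoso-service", kind="repository")
stg.add_node("wi:789", kind="work_item")

# Typed edges mirroring the relationships described above.
stg.add_edge("user:alice", "pr:1234", relation="creates")
stg.add_edge("user:bob", "pr:1234", relation="reviews")
stg.add_edge("repo:contoso-service", "pr:1234", relation="contains")
stg.add_edge("pr:1234", "file:src/uploader.py", relation="changes")
stg.add_edge("pr:1234", "wi:789", relation="linkedTo")
stg.add_edge("pr:1234", "user:bob", relation="commentsOn")

# Example query: which users have a "reviews" edge into a given pull request?
reviewers = [u for u, v, d in stg.in_edges("pr:1234", data=True)
             if d["relation"] == "reviews"]
print(reviewers)  # ['user:bob']
```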
B. Augmented Socio-technical Graph

To include semantic information, we expand the Socio-technical graph to have text tokens represented as nodes. This has two benefits:
1) Map users to concepts (word tokens): this helps in building a knowledge base from users (authors, reviewers) to concepts. For example, if a user is authoring or reviewing pull requests which contain a token, a second order relationship will be established from that user to the token.
2) Bring semantically similar tokens together: as we are establishing edges between words that appear together, we capture the semantic similarity between the words.
We perform the four steps explained below to construct the augmented Socio-technical graph (ASTG):
• Tokenize the text (title and description) of each pull request, work item, and the names of the source code files edited in those pull requests.
• Filter the stop words by implementing a block list [24].
• All the text nodes that appear in a pull request title or description and work item title or description are linked to the respective pull requests. All the text nodes that appear in a file name are linked to the file nodes.
• Text nodes are linked to each other based on their co-occurrence in the pull request corpus. Pointwise Mutual Information (PMI) [25] is a common measure of the strength of association between two terms.

  \mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}    (1)

The formula is based on maximum likelihood estimates: when we know the number of observations for token x, o(x), the number of observations for token y, o(y), and the size of the corpus N, the probabilities for the tokens x and y, and for the co-occurrence of x and y, are calculated by:

  p(x) = \frac{o(x)}{N}, \quad p(y) = \frac{o(y)}{N}, \quad p(x, y) = \frac{o(x, y)}{N}    (2)

The term p(x, y) is the probability of observing x and y together.
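As an illustration of how word-word edges could be weighted, here is a small, self-contained sketch of the PMI computation in Equations 1 and 2. The five-token co-occurrence window follows Table II (relation 5); the toy corpus and the helper name pmi_scores are invented for the example and are not part of the paper's pipeline.

```python
import math
from collections import Counter

def pmi_scores(documents, window=5):
    """Compute PMI for token pairs that co-occur within a fixed window."""
    token_counts = Counter()
    pair_counts = Counter()
    n = 0  # corpus size N (number of observed tokens)
    for doc in documents:
        tokens = doc.lower().split()
        token_counts.update(tokens)
        n += len(tokens)
        for i, left in enumerate(tokens):
            # pair the token with the following tokens inside the window
            for right in tokens[i + 1 : i + window]:
                pair_counts[tuple(sorted((left, right)))] += 1
    scores = {}
    for (x, y), o_xy in pair_counts.items():
        p_x = token_counts[x] / n
        p_y = token_counts[y] / n
        p_xy = o_xy / n
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))  # Equation 1
    return scores

# Toy pull request titles (purely illustrative).
titles = [
    "fix null reference in upload retry logic",
    "add retry logic for upload timeouts",
]
for pair, score in sorted(pmi_scores(titles).items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```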
C. Scale

The Socio-technical graph is built using the software development activity data from 332 repositories. We ingest data starting from 1st January, 2019, or from when the first pull request is created in a repository (whichever is older). The graph is refreshed three times a day. During the refresh we perform two operations:
Insert: Ingest new pull requests, work items, and code review information, across all the 332 repositories, by creating corresponding nodes, edges, and properties.
Update: Update the word tokens connected to nodes, if there are changes. We also update the edges between nodes to reflect the changes in the source data.
The Socio-technical graph contains 5,858,834 nodes and 23,803,053 edges. Detailed statistics of node and edge types can be found in Table I.

TABLE I: Distribution of node and edge types in the Socio-technical graph

Element type   Label          Count
Node           pull request   1,342,821
Node           work item      542,866
Node           file           2,809,805
Node           author         18,001
Node           reviewer       30,585
Node           text           1,104,427
Node (total)                  5,858,834
Edge           creates        1,342,821
Edge           reviews        7,066,703
Edge           contains       1,342,821
Edge           changes        12,595,859
Edge           parent of      148,422
Edge           linked to      1,252,901
Edge           comments on    53,506
Edge (total)                  23,803,053

IV. REVIEWER RECOMMENDATION VIA GRAPH NEURAL NETWORKS

Reviewing a pull request is a collaborative effort. Good reviewers are expected to write good code review comments that help improve the quality of the code and thus shape a good product. In order to achieve this, a good reviewer needs to be 1) familiar with the feature that is implemented in the pull request, 2) experienced in working with the source code and the files that are modified by the pull request, 3) a good collaborator with others in the team, and 4) actively involved in creating and reviewing related pull requests in the repository. Hence a machine learning algorithm that recommends reviewers for a pull request needs to model these complex interaction patterns to produce a good recommendation. Feature learning via embedding generation has shown good promise in the literature in capturing complex patterns from data [26], [27], [28], [29]. Hence in this work we propose to pose the reviewer recommendation problem as ranking reviewers using similarity scores between the users and the pull requests in the embedding space. In the rest of this section we give details on learning embeddings for pull requests and users along with other entities (such as files, word tokens, etc.), and on scoring top reviewers for a new pull request using the learned embeddings.

The socio-technical graph shown in Figure 2 has the essential ingredients to model the characteristics of a good reviewer: 1) the user - pull request - token path in the graph associates a user with a set of words that characterize the user's familiarity with one or more topics; 2) the user - pull request - file path associates a user with a set of files that the user authors or reviews; 3) the user - pull request - user path characterizes the collaboration between people in a project; 4) the pull request - user - pull request path characterizes users working on related pull requests. Essentially, by envisioning software development activity as an interaction graph of various entities, we are able to capture interesting and complex relations and patterns in the system. We aim to encode all these complex interactions into entity embeddings using a Graph Neural Network (GNN) [18]. These embeddings are then used as features to predict the most relevant reviewers for a pull request. In Figure 1 this is depicted as steps 2 and 3.

A. Graph Neural Network Architecture
Graph Convolutional Network (GCN) [30] (which is a form of GNN) has shown great success in the machine learning community for capturing complex relations and interaction patterns in a graph through node embedding learning. In GCN, for each node, we aggregate the feature information from all its neighbors and, of course, the feature of the node itself. During this aggregation, neighbors are weighted as per the edge (relation) weight. A common approach that has been used effectively in the literature is to weigh the edges using a symmetric-normalization approach. Here we normalize the edge weight by the degrees of both the nodes connected by the edge. The aggregated feature values are then transformed and fed to the next layer. This procedure is repeated for every node in the graph. Mathematically it can be represented as follows:

  h_u^{(k)} = \sigma\Big( \sum_{v \in N(u) \cup \{u\}} \frac{W^{(k-1)} h_v^{(k-1)}}{\sqrt{|N(u)|}\,\sqrt{|N(v)|}} \Big)    (3)

where h_u^{(k)} is the embedding of node u in the k-th layer; h^{(0)} is the initial set of node features, which can be set to one-hot vectors if no other features are available; N(u) is the set of neighbors of node u; W^{(k)} is the feature transformation weights for the k-th step (learned via training); and \sigma is the activation function (such as ReLU [31]). Note that symmetric normalization is achieved by dividing by \sqrt{|N(u)|}\,\sqrt{|N(v)|}.

GCN learns node embeddings from a homogeneous graph with the same node types and relations. However, the pull request graph in Figure 2 is a heterogeneous graph with different node types and different relation types between them. In this case, inspired by RGCN [30], for each node, we aggregate the feature information separately for each type of relation. Mathematically it can be represented as follows:

  h_u^{(k)} = \sigma\Big( \sum_{r \in R} \sum_{v \in N_r(u)} \frac{W_r^{(k-1)} h_v^{(k-1)}}{\sqrt{|N_r(u)|}\,\sqrt{|N_r(v)|}} + W_0^{(k-1)} h_u^{(k-1)} \Big)    (4)

where R is the set of relations, N_r(u) is the set of neighbors of u having relation r, W_r^{(k)} is the relation-specific feature transformation weights for the k-th layer, and W_0^{(k)} is the feature transformation weights for the self node.
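The relation-wise aggregation of Equation 4 can be sketched in a few lines of numpy. The snippet below is a single dense layer written only to illustrate the computation; it is not the authors' model, and the relation names, dimensions, and random inputs are invented for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def rgcn_layer(h, adjacency_by_relation, weights_by_relation, w_self):
    """One heterogeneous GCN layer in the spirit of Equation 4.

    h: (num_nodes, d_in) input embeddings h^(k-1).
    adjacency_by_relation: dict relation -> (num_nodes, num_nodes) 0/1 matrix.
    weights_by_relation: dict relation -> (d_in, d_out) relation-specific weights W_r.
    w_self: (d_in, d_out) weights W_0 for the self embedding.
    """
    out = h @ w_self  # self term: W_0 h_u
    for rel, adj in adjacency_by_relation.items():
        deg = adj.sum(axis=1)                 # |N_r(u)| for every node u
        norm = np.sqrt(np.outer(deg, deg))    # sqrt(|N_r(u)|) * sqrt(|N_r(v)|)
        norm[norm == 0] = 1.0                 # guard against isolated nodes
        out += (adj / norm) @ (h @ weights_by_relation[rel])
    return relu(out)

# Tiny random example: 4 nodes, 2 relations, 8-dim input, 4-dim output.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 8))
adj = {"PullRequest-User": rng.integers(0, 2, size=(4, 4)),
       "PullRequest-File": rng.integers(0, 2, size=(4, 4))}
weights = {rel: rng.normal(size=(8, 4)) for rel in adj}
w_self = rng.normal(size=(8, 4))
print(rgcn_layer(h0, adj, weights, w_self).shape)  # (4, 4)
```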
The set of relations R captures the semantic relatedness of different types of nodes in the graph. This is generally determined by domain knowledge. For CORAL we identified a set of useful relations as listed in Table II.

TABLE II: Relations (R) used for generating embeddings

  Relation               Semantic Description
1 PullRequest - User     Captures the author or reviewer relationship between a pull request and a user
2 PullRequest - File     Captures the related file modification needed for a pull request
3 PullRequest - Word     Captures the semantic description of a pull request through the words
4 File - Word            Captures the semantic description of a file
5 Word - Word            Captures the related words in a window of size 5 in a sentence (in the pull request title/description)

In our experiments, we use a 2-layer GCN network, i.e., we set k = 2 in Equation 4. With this, GCN can capture second order relations such as User-User, File-File, User-File, User-Word, etc., which we believe are useful in capturing interesting dependencies between various entities, such as related files, related users, files authored/modified by users, words associated with users, etc. While setting k to a higher value can fold in longer-distance relations, it is not clear whether that helps or brings more noise. We leave that exploration to our future work.

B. Training the Model

To learn the parameters of the model (i.e., W_r^{(·)} and W_0^{(·)}) we pose it as a Link Prediction problem. Here, we set the probability of existence of a link/edge between two nodes u and v as proportional to the dot product between their embeddings derived from the 2-layer GCN. In particular, we set the link probability equal to \sigma(z_u^{\top} z_v). Here, \sigma denotes the logistic function, and z_u, z_v denote the embeddings of nodes u, v respectively (i.e., z_u = h_u^{(2)}, z_v = h_v^{(2)} from Equation 4). This probability is high when the nodes u and v are connected in the graph, and it is low when the nodes u and v are not connected in the graph. Accordingly, we prepare a training data set D containing records of triplets (u, v, y), where (u, v) are the edges in the graph and y ∈ {0, 1} denotes the presence or absence of an edge between u and v. Since there can be a very large number of node pairs (u, v) where u and v are not connected, we employ random sampling to select a sizable number of such pairs. The training objective is to minimize the cross-entropy loss L in Equation 5.

  L = -\frac{1}{|D|} \sum_{(u,v,y) \in D} \Big[ y \log \sigma(z_u^{\top} z_v) + (1-y) \log\big(1 - \sigma(z_u^{\top} z_v)\big) \Big]    (5)

Minimizing the above loss enforces the dot product of the embeddings of the nodes u, v to attain a high value when they are connected by an edge in the graph (i.e., when y = 1), and a low dot product value when they are not connected in the graph (i.e., when y = 0). The parameters of the model are updated as the training progresses to minimize the above loss. We stop training when the loss function stops decreasing (or the decrease becomes negligible).
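To make the training objective concrete, here is a short numpy sketch of the cross-entropy loss of Equation 5 over (u, v, y) triplets, operating on precomputed node embeddings. It is an illustration under stated assumptions, not the production training loop; the node indices and embedding sizes are invented.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def link_prediction_loss(z, pairs, labels):
    """Cross-entropy loss of Equation 5.

    z: (num_nodes, d) node embeddings from the 2-layer GCN.
    pairs: (m, 2) array of node index pairs (u, v).
    labels: (m,) array with y = 1 for observed edges, 0 for sampled non-edges.
    """
    scores = sigmoid(np.sum(z[pairs[:, 0]] * z[pairs[:, 1]], axis=1))  # sigma(z_u . z_v)
    eps = 1e-12  # numerical safety for the logarithms
    return -np.mean(labels * np.log(scores + eps)
                    + (1 - labels) * np.log(1 - scores + eps))

# Toy example: 6 nodes, two observed edges plus two randomly sampled non-edges.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
pairs = np.array([[0, 1], [2, 3],      # positive (connected) pairs
                  [0, 4], [1, 5]])     # negative (sampled) pairs
labels = np.array([1.0, 1.0, 0.0, 0.0])
print(round(link_prediction_loss(z, pairs, labels), 3))
```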
C. Inductive Recommendation for New Pull Requests

GCN by design is a transductive model. That is, it can generate embeddings only for the nodes that are present in the graph during the training. It cannot generate embeddings for new nodes without adding those nodes to the graph and retraining. On the other hand, inductive models can infer embeddings for new nodes that were unseen during the training by applying the learned model to the new nodes. Since CORAL is a GCN-based model, we will not have an embedding for the new pull request u' at inference time. We need to derive the embedding for u' on-the-fly by applying Equation 4. The challenge in deriving the embedding is in getting the correct self embedding for u'. That is, as per Equation 4, to generate h_{u'}^{(2)}, we need trained W_0^{(0)} and W_0^{(1)}, which are not available for the new nodes. Hence we approximate the embedding of the new node by ignoring its self embedding part in Equation 4, which leads to the following approximation:

  z_{u'} = h_{u'}^{(2)} = \sigma\Big( \sum_{r \in R} \sum_{v \in N_r(u')} \frac{W_r^{(1)} h_v^{(1)}}{\sqrt{|N_r(u')|}\,\sqrt{|N_r(v)|}} \Big)    (6)

Here, z_{u'} is the embedding of the new pull request u', R is the set of relations involving the pull request node (i.e., PullRequest-User, PullRequest-File, and PullRequest-Word), W_r^{(1)} are the trained model weights from the 2nd layer of the GCN, and h_v^{(1)} are the embeddings coming out of the first layer of the GCN.

After obtaining the embedding of the new pull request as per Equation 6, we can get the top k reviewers for it by finding the top k closest users in the embedding space. That is,

  \mathrm{reviewers}_k(u') = \operatorname*{argmax}_{v_1..v_k} \; z_{u'}^{\top} z_{v_i}    (7)

where z_{v_i} is the embedding of the user v_i.

Since our training objective enforces a high z_{u'}^{\top} z_v score when the likelihood of an edge (u, v) is high, Equation 7 finds the users who are most likely to be associated with the pull request, as reviewers. Finding the top k reviewers in this way using their embeddings allows us to naturally make use of the complex relationships that are encoded in those embeddings to capture a user's relatedness to the pull request.
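Putting Equations 6 and 7 together, the sketch below shows how the embedding of an unseen pull request could be approximated from its already-embedded neighbors (files, words, author) and how reviewers could then be ranked by dot product. It is an illustrative rendering rather than the production code: the dimensions, relation names, and the simple 1/|N_r| normalization (the paper uses symmetric normalization) are assumptions made for brevity.

```python
import numpy as np

def embed_new_pr(neighbor_embeddings_by_relation, weights_by_relation):
    """Approximate z_u' for a new pull request (Equation 6, self term dropped)."""
    total = None
    for rel, h_neigh in neighbor_embeddings_by_relation.items():
        # h_neigh: (n_r, d) first-layer embeddings of the PR's neighbors under relation rel.
        agg = (h_neigh @ weights_by_relation[rel]).sum(axis=0) / h_neigh.shape[0]
        total = agg if total is None else total + agg
    return np.maximum(0.0, total)  # sigma = ReLU, as in the layer definition

def top_k_reviewers(z_pr, user_embeddings, user_ids, k=3):
    """Rank candidate reviewers by z_pr . z_user (Equation 7)."""
    scores = user_embeddings @ z_pr
    order = np.argsort(-scores)[:k]
    return [(user_ids[i], float(scores[i])) for i in order]

# Toy example with made-up dimensions and identifiers.
rng = np.random.default_rng(1)
d_in, d_out = 8, 4
weights = {"PullRequest-File": rng.normal(size=(d_in, d_out)),
           "PullRequest-Word": rng.normal(size=(d_in, d_out))}
neighbors = {"PullRequest-File": rng.normal(size=(2, d_in)),   # two touched files
             "PullRequest-Word": rng.normal(size=(5, d_in))}   # five title tokens
z_new_pr = embed_new_pr(neighbors, weights)
users = rng.normal(size=(10, d_out))
print(top_k_reviewers(z_new_pr, users, [f"user{i}" for i in range(10)], k=3))
```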
V. EXPERIMENTS

To assess the value of CORAL empirically, we pose three research questions:
RQ1 How well does CORAL model the review history?
RQ2 Under what circumstances does CORAL perform better than a Rule-based Model (and vice versa)?
RQ3 What are developers' perceptions about CORAL?

The vast majority of code reviewer recommendation approaches are evaluated by comparing recommendations from the tool with historical code reviews and examining how often the recommended reviewers were the actual reviewers [23]. In line with this accepted practice, RQ1 asks how often the network is able to recommend the reviewers that the authors added. However, as Dougan et al. point out, there is an underlying (and often unstated) assumption that these are the correct reviewers* [23]. To address this flawed assumption and pursue a more complete ground truth, we reach out to the reviewers recommended by CORAL that were not recommended by a rule-based model. The results of this developer study help address RQ2 and RQ3.
* We would point out that if this assumption were correct, then there would be no need for a recommender in the first place!

For the purpose of conducting the experiments and comparative studies, we use a rule-based model built based on the heuristics proposed by Zanjani et al. [21], which demonstrated that considering the history of source code files edited in a pull request in terms of authorship and reviewership is an effective way to recommend peer reviewers for a code change. This model is currently deployed in production at Microsoft. This gives us an opportunity to conduct comparative studies by observing the recommendations made by CORAL and the telemetry generated from the production deployment.

A. Methodology

1) Retrospective Evaluation: To address RQ1, we construct a dataset of 254K code reviews, i.e. pull request–reviewer pairs, starting from 2019 to evaluate CORAL. To keep training and validation cases separate, these nodes and their edges are not present in the graph during model training. We use the following metrics, which are the most common measures for evaluating reviewer recommender approaches [14], [8], [13], [21], [20], [22]:

Accuracy We measure the percentage of pull requests from test data for which CORAL is able to recommend at least one reviewer correctly and report the percentage for the top 1, 3, 5, and 7 reviewers suggested by the model. Specifically, given a set of pull requests P, the top-k accuracy can be calculated using Equation 8. The isCorrect(p, Top-k) function returns a value of 1 if at least one of the top-k recommended reviewers actually reviewed the pull request p, and returns a value of 0 otherwise.

  \text{Top-}k\ \text{accuracy}(P) = \frac{\sum_{p \in P} \mathrm{isCorrect}(p, \text{Top-}k)}{|P|}    (8)

Mean reciprocal rank (MRR) This metric is used extensively in recommender systems to assess whether the correct recommendation is made at the top of a ranked list [32]. MRR is calculated using Equation 9, where rank(candidates(p)) returns the rank of the first correct reviewer in the recommendation list candidates(p).

  \mathrm{MRR} = \frac{1}{|P|} \sum_{p \in P} \frac{1}{\mathrm{rank}(\mathrm{candidates}(p))}    (9)
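Both metrics are straightforward to compute. The snippet below is a small illustrative implementation of top-k accuracy (Equation 8) and MRR (Equation 9) over ranked recommendation lists; the reviewer names and outcomes are invented, not data from the study.

```python
def top_k_accuracy(recommendations, actual_reviewers, k):
    """Fraction of pull requests with at least one actual reviewer in the top k (Eq. 8)."""
    hits = sum(1 for recs, actual in zip(recommendations, actual_reviewers)
               if set(recs[:k]) & set(actual))
    return hits / len(recommendations)

def mean_reciprocal_rank(recommendations, actual_reviewers):
    """Mean reciprocal rank of the first correct reviewer (Eq. 9)."""
    total = 0.0
    for recs, actual in zip(recommendations, actual_reviewers):
        for rank, candidate in enumerate(recs, start=1):
            if candidate in actual:
                total += 1.0 / rank
                break
    return total / len(recommendations)

# Invented example: ranked recommendations vs. the reviewers the authors actually invited.
recs = [["alice", "bob", "carol"], ["dave", "erin", "frank"], ["gina", "hank", "ivan"]]
actual = [{"bob"}, {"erin"}, {"zoe"}]
print(top_k_accuracy(recs, actual, k=3))   # 2 of 3 pull requests have a hit
print(mean_reciprocal_rank(recs, actual))  # (1/2 + 1/2 + 0) / 3
```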
2) User Study: To address RQ2 and RQ3, we conduct a user study by reaching out to reviewers recommended by CORAL to see if they would be qualified to review the pull requests.

Sampling We select 500 recent pull requests from the test data set of 254K pull requests and randomly pick one of the top 2 recommendations by CORAL as the recommended reviewer to reach out to. Note that the pull requests selected had not been recommended by the rule-based model and each recommended reviewer appears at most once. The pull requests are collected from repositories having different numbers of developers using stratified random sampling following the distribution in Table III. The categories are defined by number of developers as follows: Large (> 100 developers), Medium (between 25 and 100 developers), Small (< 25 developers).

TABLE III: Pull request distribution across dimensions

Repo size (# of developers)   # data
Large                         220
Medium                        200
Small                         80

Questionnaire We perform the user study by posing a set of questions on what actions a reviewer might take when they were recommended for a specific pull request:
1) Is this pull request relevant to you (as of the PR creation date, state)?
   A - Not relevant
   B - Relevant, I'd like to be informed about the pull request.
   C - Relevant, I'd take some action and/or I'd comment on the pull request.
2) If possible, could you please explain the reason behind your choice?

We avoid intruding in the actual work flow, yet still maintain an adequate level of realism by working with actual pull requests, thus balancing realism and control in our study [33]. Note that, with 287 responses, this is one of the largest field studies conducted to understand the effects of an automated reviewer recommender system.

We divided the questionnaire among 4 people to conduct the user studies. The interviewers did not know these reviewers, nor had they worked with them before. The teams working on the systems under study are organizationally far away from the interviewers. Therefore, they do not have any direct influence on the study participants. The interview format is semi-structured, where users are free to bring up their own ideas and free to express their opinions about the recommendations. We use question (2) to collect user feedback and analyze it to generate insights about the perceptions of the developers about automated reviewer recommendation systems (RQ3), namely, the factors that influence people to not lean towards using an automated reviewer recommendation system.

3) Comparing with Rule-based Model: To compare CORAL with the rule-based model, we select another 500 recent pull requests from the set of pull requests on which the rule-based model (currently deployed in production) has made recommendations, following the same distribution as the pull requests selected for evaluating CORAL (Table III). We then collect from telemetry the recommendations made by the rule-based model and the subsequent actions performed by the recommended reviewers (changing the status of the pull request, adding a code review comment, or both) for the selected pull requests. The telemetry yields two benefits: 1. it helps us gather user feedback without doing another large-scale user study, as the telemetry captures the user actions already; 2. it avoids the probable study participants having to take one more survey (and saves time and frustration), because they already indicated their preferences on the pull request when it was active and when they were added as reviewers.

An important point to keep in mind is that the rule-based model adds recommended reviewers directly to the pull requests. This increases the probability of them taking an action even if they may not be an appropriate developer to conduct the review. The reason for this is that the reviewers are being selected and their assignment to the PR is public (everyone, including their managers, can see who is reviewing the pull request) [5]. If they do not respond, it might look like they are blocking the pull request progression. In contrast, CORAL's recommendations are validated through user studies, which are conducted in a private 1-1 setting where participants likely feel more comfortable indicating that they are not appropriate for the review. Reviewers can be open about their decisions in the user studies. Therefore, CORAL might be at a slight disadvantage.

TABLE IV: Link prediction accuracy and MRR

Metric     k=1    k=3    k=5    k=7
Accuracy   0.50   0.73   0.78   0.80
MRR        0.49   0.61   0.68   0.72

TABLE V: Comparative user study precision across dimensions. RM is Rule-based Model. The differences between the two models with the same Greek letter suffix (and only those pairs) are not statistically significant.

Repo size (# of developers)   RM      CORAL
Large                         0.19    0.37
Medium                        0.31α   0.36α
Small                         0.35β   0.23β

B. Results

1) How well does CORAL model the review history?: To answer RQ1, we examine who the pull request author invited to review a change and then check to see if CORAL recommended the same reviewers. In this context, the "correct" recommendation is defined as the recommended reviewer being invited to the pull request. While the author's actions may not actually reflect the ground truth of who is best able to review the change, most prior work in code reviewer recommendation evaluates recommenders in this way (see [23] for a thorough discussion of this) and so we follow suit here. Table IV shows the accuracy and MRR for CORAL across all 254K (pull request–reviewer) pairs. In 73% of the pull requests, CORAL is able to replicate the human authors' behavior in picking the reviewers in the top 3 recommendations, which validates that CORAL matches the history of reviewer selection quite well.*
* Note that by design the rule-based model always includes the author-invited people to review, so we do not evaluate the rule-based model in this way.
2) RQ2: Under what circumstances does CORAL perform better than a rule-based model (and vice versa)?: In Table V, we show the recommendation precision of the rule-based model and CORAL. Specifically, on the sampled data for the CORAL model and the rule-based model, precision is calculated as the percentage of the recommended reviewers who are willing to engage in reviewing the pull requests. For the rule-based model, reviewers who either change the status of the pull request or add a code review comment are considered engaged. For CORAL, reviewers who say that the pull request is relevant and that they would take some action are considered engaged.

Generally, there is "no model to rule them all". Neither of the models performs consistently better than the other on pull requests from repositories of all categories. As shown in Table V, CORAL performs better on pull requests from large and medium repositories while the rule-based model does well on pull requests from small repositories. However, when we statistically tested for differences, Fisher exact tests [34] only showed a statistically significant difference between the two approaches for large repositories (p = 0.013).
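For readers unfamiliar with the test, each comparison boils down to a 2x2 contingency table of engaged versus not-engaged reviewers for the two models within one repository-size category. The sketch below shows how such a test could be run with scipy; the counts are invented for illustration and are not the study's actual data.

```python
from scipy.stats import fisher_exact

# Hypothetical counts for one repository-size category:
# rows = model (rule-based, CORAL), columns = (engaged, not engaged).
table = [[42, 178],   # rule-based: 42 of 220 recommended reviewers engaged
         [81, 139]]   # CORAL:      81 of 220 recommended reviewers engaged
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```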
One observation that may explain this result is that, due to their size, large software projects dominate the graph. Thus, CORAL is trained on many more pull requests from large projects than from smaller projects. If the mechanisms, factors, behaviors, etc., for reviewer selection are different in smaller projects than in large ones, then the model is likely to learn those used in larger projects. This hypothesis could be confirmed by splitting the training data by project size and training multiple models. However, as reviewer recommendation is most important in projects with many developers and that appears to be where CORAL excels, we do not pursue this line of inquiry.

We have observed that in small repositories, usually with few developers, one or two experienced developers are more likely to take the responsibility of reviewing pull requests, which accounts for the high accuracy of the rule-based model. However, this phenomenon in which a small number of experienced people in a particular repository are assigned the lion's share of reviews is problematic, and heuristics have been used to "share the load" [15]. As the socio-technical graph contains historical information about a developer across many repositories, and PRs from different repositories may be semantically related, CORAL is able to leverage more information per developer and per PR, which may avoid this problem.

The following feedback received from the user study (question (2)) also demonstrates that CORAL identifies relevant and qualified reviewers who traditional reviewer recommenders miss:
"This PR is created in a repository on which our service has a dependency on. I would love to review these PRs. In fact, I am thinking of asking x on these PRs going forward."
"I never reviewed y's PRs. I work with her on the same project and know what she is doing. I am happy to provide any feedback (of course if she'd like :))"
"The content of the PR might impact another repository that I have ownership of because we use some of the components in that lib. Based on that I would say it is a relevant PR and I will not mind reviewing it."

TABLE VI: Distribution of qualitative user study responses.

Category                                    # of responses (%)
I will review this pull request             170 (59.23%)
I'd like to be added to this pull request   24 (8.36%)
This pull request is not relevant to me     93 (32.40%)

3) RQ3: What are developers' perceptions about an automated reviewer recommendation model?: We show the distribution of user study responses in Table VI. Out of the 500 user study messages we sent, 287 users responded. 67.6% of the users give positive feedback saying that the given pull request is relevant to them to some degree. Of these, 8.36% of the users say they would like to be informed about the given pull request, and 59.23% of the users say that they would take some action and/or leave a comment on the pull request. 32.4% of the users give negative feedback saying that the given pull request is not relevant.

To understand the reasons that users do not like CORAL's recommendations, we analyze the negative feedback (comments/anecdotes from the developers) and classify it into 3 categories, with the distribution shown in Table VII. To offer an impression, we show some typical negative quotes that we received from users.

91.03% of the negative feedback we received said that the pull request is no longer relevant to them; 69.23% of them said it is because they started to work in a different area, and 21.79% of them mentioned that they do not work in this repository because of switching groups or transferring teams: "Not relevant since I no longer work on the team that manages this service." 6.41% of the users mentioned that they are actually never involved in code review: "I'm a PM. I'm less interested in PR in general. Only when I'm needed by the devs and then they mention me there." Two users said that the pull requests we provided do not need to be reviewed: "Let me explain. This is an automated commit that updates the version number of the product as part of the nightly build. It pretty much happens every night. So it doesn't need reviewer like a traditional pull request would."

From users' negative feedback, we learn that in order to improve CORAL we need to include several extra factors. First, our socio-technical graph should take people movement into consideration and update the graph dynamically, namely identifying inactive users and removing edges or decaying the weight on the edges between user nodes and repository nodes. Second, CORAL should include and learn the job role for every user in the socio-technical graph through node embeddings, such as SDE or PM, so that it can filter out irrelevant users and suggest reviewers more precisely. Third, before running CORAL, some heuristic rules can be designed to filter out automated and deprecated pull requests.
TABLE VII: Users' Negative Feedback Categories.

Category   Feedback                                         # of feedback (%)
I          This pull request is no longer relevant to me    71 (91.03%)
II         Never participate in code review                 5 (6.41%)
III        Pull request does not need reviewer              2 (2.56%)

Besides the negative feedback, we also receive a lot of positive comments from users:
"The recommendation makes a lot of sense since I primarily contributed to that repository for a few years. However, a recent re-org means I no longer work on that repository."
"I am lead of this area and would like to review these kinds of PRs which are likely fixing some regressions."
These validate our claim that CORAL does consider the interactions between users and files, and that the recommendations are understandable by humans. Since CORAL is trained and evaluated on historical pull requests starting from 2019, it is hard to reconstruct the situation in which the pull requests were created, and many users complain that it is difficult to recall the context of the pull requests, thus putting CORAL at a disadvantage. We expect it will have better performance in actual production use.

4) Ablation Study: To evaluate the contribution of each of the entities in CORAL, we perform an ablation study, with results shown in Table VIII. Specifically, we first remove the entities from the socio-technical graph and training data, and then retrain the graph convolutional neural network. We find that ablating each entity deteriorates performance across metrics. After removing word entities and file entities from the graph, i.e. when the socio-technical graph only contains user and pull request entities, the model can hardly recommend correct reviewers. By comparing (1) and (2), and (1) and (3), we demonstrate the importance of the semantic information and the file change history introduced by file entities in recommending reviewers, and that file entities give more value than words. Looking at (3) and (4), we observe a boost in performance when adding semantic information on top of the file change and review activities, which underlines our claim that incorporating information around interactions between code contributors as well as the semantics of code changes and their descriptions can help identify the best reviewers.

TABLE VIII: Link prediction accuracy and MRR for various configurations of parameters

                         Accuracy                     MRR
Models                   k=1   k=3   k=5   k=7        k=1   k=3   k=5   k=7
(1) No words or files    0.02  0.08  0.13  0.16       0.01  0.04  0.05  0.06
(2) Words only           0.21  0.30  0.32  0.34       0.21  0.25  0.26  0.32
(3) Files only           0.29  0.69  0.73  0.76       0.29  0.48  0.49  0.50
(4) Words + Files        0.49  0.73  0.77  0.80       0.49  0.61  0.68  0.72

VI. THREATS AND LIMITATIONS

As part of our study, we reached out to people who were not invited to a review but that CORAL recommended as potential reviewers. It is possible that their responses to our solicitations differed from what they may have actually done if they were unaware that their actions/responses were being observed (the so-called Hawthorne Effect [35]). Microsoft has tens of thousands of developers and we were careful not to include any repositories or participants that we have interacted with before or might have a conflict of interest with us. Nonetheless, there is a chance that respondents may be positive about the system because they wanted to make the interviewers happy.

The Socio-technical graph contains information about who was added as a reviewer on a PR, but it does not explain why that person was added or if they were added as the result of a reviewer recommendation tool. Thus, in our evaluation of how well CORAL is able to recommend reviewers that were historically added to reviews, it is unclear how much of the history comes from the rule-based recommender and how much from authors without the aid of a recommender.

When looking at repository history, the initial recommendation by the rule-based model is based on files involved in the initial review, while CORAL includes files and descriptions in the review's final state. If the description or the set of files was modified, then CORAL may have a different set of information available to it than it would have had had it been used at the time of PR creation.

In our evaluation of CORAL, we use a training set of PRs to train the model and keep a hold-out set for evaluation. These datasets are disjoint, but they are not temporally divided. In an ideal setting, all training PRs would precede all evaluation PRs in time and we would evaluate our approach by looking at CORAL's recommendation for the next unseen PR (ordered by time), then add that PR to the Socio-technical graph, then retrain the model on the updated graph for the following PR, and repeat until all PRs in the evaluation set were exhausted. This form of evaluation proved too costly and time consuming to conduct and so we used a random split of training and testing data sets.

We sampled the 500 PRs from the population using a random selection approach. We selected the sample size in an effort to avoid bias and confounding factors in the sample, but we cannot guarantee that this data set is free from noise, bias, etc.
VII. FUTURE WORK

In this work we showed that a simple GCN-style model is able to capture complex interaction patterns between various entities in the code review ecosystem and can be used to predict relevant reviewers for pull requests effectively. While this method is very promising on large repositories, we believe that the method can be improved to make good recommendations on other repositories too by training repository-type-specific models. In this work we mainly focused on using the interaction graph of various entities (pull requests, users, files, words, etc.) to learn complex features through embeddings. We neither captured any node-specific features (e.g., user-specific features, file-specific features, etc.) nor any edge-specific features (e.g., how long ago a user authored/modified files, whether two users belong to the same org or not, etc.). Incorporating such features may help the model learn even more complex patterns from the data and further improve the recommendation accuracy. Furthermore, we believe that a detailed study of the effect of model hyper-parameters (such as embedding dimension, number of GCN layers, different activation functions, etc.) on the recommendation accuracy will be a very useful result. We intend to explore these directions in our future work.

The techniques explained in this paper and the CORAL system are generic enough to be applied to any dataset that follows a git-based development model. Therefore, we see opportunities for implementing CORAL for source control systems like GitHub and GitLab.

VIII. CONCLUSION

In this work, we seek to leverage additional recorded information in software repositories to improve reviewer recommendation and address the weakness of approaches that rely only on the historical information of changes and reviews. To that end we propose CORAL, a novel graph-based machine learning model that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests, work items, etc.) and their relationships in modern source code management systems. We train a Graph Convolutional Network (GCN) on this graph to learn to recommend code reviewers for pull requests.

Our retrospective results show that in 73% of the pull requests, CORAL is able to replicate the human pull request authors' behavior in the top 3 recommendations, and it performs better than the rule-based model in production on pull requests in large repositories by 94.7%. A large-scale user study with 500 developers showed 67.6% positive feedback and relevance in suggesting the correct code reviewers for pull requests.

Our results open new possibilities for incorporating the rich set of information available in software repositories and the interactions that exist between various actors and entities to develop code reviewer recommendation models. We believe the techniques and the system have wider applicability, ranging from individual organizations to large open source projects. Beyond code reviewer recommendation, future research could also target other recommendation scenarios in source code repositories that could aid software developers leveraging the Socio-technical graphs.

IX. DATA AVAILABILITY

We are unfortunately unable to make the data involved in this study publicly available as it contains personally identifiable information as well as confidential information. Access to the data for this study was made under condition of confidentiality from Microsoft and we cannot share it while remaining compliant with the General Data Protection Regulation (GDPR) [36].

REFERENCES

[1] G. Gousios, M. Pinzger, and A. v. Deursen, "An exploratory study of the pull-based software development model," in International Conference on Software Engineering, 2014, pp. 345–355.
[2] P. Rigby, B. Cleary, F. Painchaud, M.-A. Storey, and D. German, "Contemporary peer review in action: Lessons from open source development," IEEE Software, vol. 29, pp. 56–61, 2012.
[3] P. C. Rigby and C. Bird, "Convergent contemporary software peer review practices," in International Symposium on the Foundations of Software Engineering, 2013, pp. 202–212.
[4] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in International Conference on Software Engineering, 2013, pp. 712–721.
[5] P. C. Rigby and M.-A. Storey, "Understanding broadcast based peer review on open source software projects," in International Conference on Software Engineering, 2011, pp. 541–550.
[6] A. Bosu, M. Greiler, and C. Bird, "Characteristics of useful code reviews: An empirical study at Microsoft," in International Working Conference on Mining Software Repositories, 2015, pp. 146–156.
[7] J. Lipcak and B. Rossi, "A large-scale study on source code reviewer recommendation," in Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2018, pp. 378–387.
[8] A. Ouni, R. G. Kula, and K. Inoue, "Search-based peer reviewers recommendation in modern code review," in IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016, pp. 367–377.
[9] J. Jiang, Y. Yang, J. He, X. Blanc, and L. Zhang, "Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development," Information and Software Technology, vol. 84, pp. 48–62, 2017.
[10] Y. Yu, H. Wang, G. Yin, and C. X. Ling, "Reviewer recommender of pull-requests in GitHub," in IEEE International Conference on Software Maintenance and Evolution, 2014, pp. 609–612.
[11] Y. Yu, H. Wang, G. Yin, and T. Wang, "Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?" Information and Software Technology, vol. 74, pp. 204–218, 2016.
[12] J. B. Lee, A. Ihara, A. Monden, and K.-i. Matsumoto, "Patch reviewer recommendation in OSS projects," in Asia-Pacific Software Engineering Conference (APSEC), vol. 2, 2013, pp. 1–6.
[13] E. Sülün, E. Tüzün, and U. Doğrusöz, "Reviewer recommendation using software artifact traceability graphs," in International Conference on Predictive Models and Data Analytics in Software Engineering, 2019, pp. 66–75.
[14] P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto, "Who should review my code? A file location-based code-reviewer recommendation approach for modern code review," in International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2015, pp. 141–150.
[15] S. Asthana, R. Kumar, R. Bhagwan, C. Bird, C. Bansal, C. Maddila, S. Mehta, and B. Ashok, "WhoDo: Automating reviewer suggestions at scale," in Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 937–945.
[16] O. Kononenko, O. Baysal, L. Guerrouj, Y. Cao, and M. W. Godfrey, "Investigating code review quality: Do people and participation matter?" in IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015, pp. 111–120.
[17] A. Bosu and J. C. Carver, "Impact of peer code review on peer impression formation: A survey," in International Symposium on Empirical Software Engineering and Measurement, 2013, pp. 133–142.
[18] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, pp. 4–24, 2020.
[19] H. A. Çetin, E. Doğan, and E. Tüzün, "A review of code reviewer recommendation studies: Challenges and future directions," Science of Computer Programming, p. 102652, 2021.
[20] V. Balachandran, "Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation," in International Conference on Software Engineering, 2013, pp. 931–940.
[21] M. B. Zanjani, H. Kagdi, and C. Bird, "Automatically recommending peer reviewers in modern code review," IEEE Transactions on Software Engineering, vol. 42, pp. 530–543, 2015.
[22] M. M. Rahman, C. K. Roy, and J. A. Collins, "CORRECT: Code reviewer recommendation in GitHub based on cross-project and technology experience," in International Conference on Software Engineering Companion, 2016, pp. 222–231.
[23] E. Doğan, E. Tüzün, K. A. Tecimer, and H. A. Güvenir, "Investigating the validity of ground truth in code reviewer recommendation studies," in International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019, pp. 1–6.
[24] "English stop words," accessed 2021. [Online]. Available: https://gist.github.com/sebleier/554280
[25] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press, 1999. [Online]. Available: http://nlp.stanford.edu/fsnlp/
[26] P. D. Hoff, A. E. Raftery, and M. S. Handcock, "Latent space approaches to social network analysis," Journal of the American Statistical Association, vol. 97, pp. 1090–1098, 2002.
[27] W. L. Hamilton, R. Ying, and J. Leskovec, "Representation learning on graphs: Methods and applications," IEEE Data Eng. Bull., vol. 40, pp. 52–74, 2017.
[28] H. Chen, B. Perozzi, R. Al-Rfou, and S. Skiena, "A tutorial on network embeddings," 2018.
[29] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications," AI Open, vol. 1, pp. 57–81, 2020.
[30] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. v. d. Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in European Semantic Web Conference, 2018, pp. 593–607.
[31] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning, 2010, pp. 807–814.
[32] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. USA: Cambridge University Press, 2008.
[33] K.-J. Stol and B. Fitzgerald, "The ABC of software engineering research," ACM Trans. Softw. Eng. Methodol., vol. 27, 2018.
[34] A. Agresti, Categorical Data Analysis. John Wiley & Sons, 2003, vol. 482.
[35] J. G. Adair, "The Hawthorne effect: A reconsideration of the methodological artifact," Journal of Applied Psychology, vol. 69, p. 334, 1984.
[36] General Data Protection Regulation. European Commission. [Online]. Available: https://gdpr-info.eu/