Functional Connectome Fingerprinting: Identifying Individuals Based on Patterns of Brain Connectivity
Author manuscript
Nat Neurosci. Author manuscript; available in PMC 2016 September 01.
Abstract
While fMRI studies typically collapse data from many subjects, brain functional organization
varies between individuals. Here, we establish that this individual variability is both robust and
reliable, using data from the Human Connectome Project to demonstrate that functional
connectivity profiles act as a “fingerprint” that can accurately identify subjects from a large group.
Identification was successful across scan sessions and even between task and rest conditions,
indicating that an individual’s connectivity profile is intrinsic, and can be used to distinguish that
individual regardless of how the brain is engaged during imaging. Characteristic connectivity
patterns were distributed throughout the brain, but notably, the frontoparietal network emerged as
most distinctive. Furthermore, we show that connectivity profiles predict levels of fluid
intelligence; the same networks that were most discriminating of individuals were also most
predictive of cognitive behavior. Results indicate the potential to draw inferences about single
subjects based on functional connectivity fMRI.
Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research,
subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
Introduction
We are all unique individuals. Nevertheless, human neuroimaging studies have traditionally
collapsed data from many subjects to draw inferences about general patterns of brain activity
that are common across people. Studies that contrast two populations—such as patients and
healthy controls—typically ignore the considerable heterogeneity within each group.
subjects8. Many of the early analyses of HCP data have focused on elucidating the general
blueprint for brain connectivity that is shared across people. Yet despite the gross
similarities, there is reason to believe that a substantial portion of the brain connectome is
unique to each individual9.
session. We show that identification is successful between rest sessions, task sessions and
even across rest and task. Results indicate that while changes in brain state may modulate
connectivity patterns to some degree, an individual’s underlying intrinsic functional
architecture is reliable enough across sessions and distinct enough from that of other
individuals to identify him or her from the group regardless of how the brain is engaged
during imaging.
Results
Data for this study consisted of scans from 126 subjects provided in the Q2 data release of
the Human Connectome Project8. Each subject was scanned over a period of two days. Here,
we used data from six separate imaging conditions: two rest sessions (one on each of the two
days) and four task sessions (working memory, emotion, motor and language; two on each
day). Functional connectivity was assessed using a functional brain atlas10 consisting of 268
nodes covering the whole brain; this atlas was defined based on a separate population of
healthy subjects (see Online Methods for Yale dataset description). The Pearson correlation
coefficient between the timecourses of each possible pair of nodes was calculated and used
to construct 268×268 symmetrical connectivity matrices where each element represents a
connection strength, or edge, between two nodes. This was done for each subject for each
condition separately, such that each subject had a total of six matrices reflecting connectivity
patterns during each of the different scan sessions.
Identification was performed across pairs of scans consisting of one “target” and one
“database” session, with the requirement that the target and database sessions be taken from
different days: for example, day 1 rest matrices were used as the target and compared to a
database of day 2 rest matrices (see Fig. 1a for a schematic). In an iterative process, one
individual’s connectivity matrix was selected from the target set and compared against each
of the connectivity matrices in the database to find the matrix that was maximally similar
(Fig. 1b). Similarity was defined as the Pearson correlation coefficient between vectors of
edge values taken from the target matrix and each of the database matrices. Once an identity
had been predicted, the true identity of the target matrix was decoded and that iteration was
scored 1 if the predicted identity matched the true identity, 0 if it did not. Within a target-
database pair, each individual target connectivity matrix was tested against the database in
an independent trial.
Identification was tested across all the various pairs of scan sessions (Fig. 1a; nine pairs,
each with two possible configurations created by exchanging the roles of target and database
session). In each case, the success rate was measured as the percentage of subjects whose
identity was correctly predicted out of the total number of subjects.
Given that the 126 identification trials are not independent from one another, we performed
non-parametric permutation testing to assess the statistical significance of these results (see
Online Methods). Across 1,000 iterations, the highest success rate achieved was 6/126, or
roughly 5%. Thus the p value associated with obtaining at least 68 correct identifications
(the minimum rate achieved above) is effectively 0 (p < 0.001, given 1,000 permutations).
Network-based identification
We next tested identification accuracy based on each of eight specific functional networks to
test the hypothesis that certain brain networks contribute more to individual subject
discriminability than others. These networks were derived from the same set of healthy
subjects used to define the 268-node atlas (Fig. 1c). Two networks emerged as the most
successful in individual subject identification; these were the medial frontal (network 1) and
frontoparietal (network 2) networks, both composed of higher-order association cortices in
the frontal, parietal and temporal lobes. A combination of these two networks was also
tested to determine if this combination might afford even higher predictive power than each
network on its own. Figure 2a shows identification rates based on each network separately,
the combination of networks 1 and 2 as well as the whole brain for each of the nine database
and target pairs. We also highlight identification based on the combination of networks 1 and
2, referred to for convenience as the frontoparietal networks, in Fig. 2b. Frontoparietal-based
identification was extremely high between Rest1 and Rest2 (98–99%). Accuracy dropped
slightly when identification was performed between rest and task, or between two task
conditions, but remained highly significant (80–90% for most condition pairs). The
Restricting analysis to the two rest sessions, DP and Φ were calculated for all edges in the
brain, and edges in the 99.5th percentile of DP and Φ are visualized in Figure 3a. The majority
of high-DP edges are in the frontal, temporal, and parietal lobes and involve nodes in the
frontoparietal networks (1 and 2) or default mode (network 3). (This result was stable across
percentile thresholds; see Supplementary Table 1.) This data-driven mathematical definition
of characteristic edges recapitulates the results of our network-based analysis, showing that
in general, connections involving higher-order association cortices are the most
discriminating of subjects. Approximately 28% of the edges with high DP were within and
between the two frontoparietal networks. Another 48% were edges linking these two
networks to other networks (Fig. 3a, top right), suggesting that levels of interaction and
integration between the frontoparietal networks and the rest of the brain are also highly
discriminating.
(284), or the emotion scan (176). To investigate the effect of the number of time points on
identification power, we performed frontoparietal-based identification between the two rest
sessions while varying the number of time points used to calculate connectivity matrices
between 100 and 1,100. Results indicate that longer timecourses are preferable in preserving
individual characteristics in connectivity profiles (Fig. 3b), and that temporal variability in
the connectivity profiles degrades identification based on shorter timecourses, especially
those under approximately 500 time points.
al.’s network scheme15 (where networks 3, 4 and 6 represent the frontoparietal networks).
Between the two rest sessions, the identification rate based on this atlas was about 89%
using the whole brain, and about 75% using the frontoparietal networks (Fig. 4a). Reduction
in identification accuracy compared to our 268-node functional parcellation and
corresponding network definitions, especially in the case of frontoparietal-based
identification, suggests that a relatively high-resolution parcellation contributes to the
detection of individual variability and boosts identification rate.
requires the diagonal elements to be the largest. The comparison shows that using the
FreeSurfer and Yeo scheme, the raw coefficients are larger for both diagonal ($t_{250}$ = −4.3, p
< 10e−5) and off-diagonal ($t_{31,750}$ = −18.0, p < 10e−72) elements (Fig. 4b, bottom), which is
uninformative in explaining the difference in identification accuracy. To control for equal
global distribution of correlation coefficients, we normalized both matrices (Fig. 4c). After
normalization, there was no difference between schemes in off-diagonal elements ($t_{31,750}$ =
0.27, p = 0.79); however, using our parcellation and network scheme, the diagonal elements
were significantly larger than using the FreeSurfer and Yeo scheme ($t_{250}$ = 14.6, p < 10e−35),
underpinning the increase in identification rate. In Fig. 4c (top), the diagonal line is visually
more prominent in the matrix on the left compared to the matrix on the right.
from anatomy should remain largely static, and should not be modulated by task conditions.
To confirm that identification power came from true differences in functional connectivity
rather than anatomic idiosyncrasies, we recalculated connectivity matrices using different
smoothing kernels for the BOLD data (4 mm, 6 mm and 8 mm). With larger smoothing
kernels, the registration advantages for the same brain compared to a different brain should
be vastly reduced or eliminated. Yet we saw only a very slight drop in identification power:
based on Rest1/Rest2 pairs, identification using the frontoparietal networks remained above
96% for all three smoothing levels (see Supplementary Table 2).
accuracy observed.
observed gF scores across all subjects. Data from the day 1 rest session was used here.
than the negative model (z = 2.15, two-tailed p = 0.03). Prediction was significant across all
three feature-selection thresholds tested: all r > 0.29, p ≤ 0.001 for the positive tail; all r >
0.22, p ≤ 0.01 for the negative tail. (Note, though, that the predicted range is narrower than
the observed range in Fig. 5a; thus the model is most successful at generating predictions of
gF level relative to other subjects.)
Due to the nature of our cross-validated approach, a slightly different set of edges was
selected in each iteration. To explore which networks contributed the most predictive power,
for each of the eight networks, we calculated the average number of within-network edges
selected across all iterations and normalized by the total number of within-network edges (to
control for differences in overall network sizes). Networks 1 and 2 contributed the highest
fraction of edges to the positive model, while network 3 contributed the highest fraction of
edges to the negative model; this was consistent across statistical thresholds for feature
selection (Fig. 5b). Thus, edges that show a consistent positive correlation with gF are
disproportionately located in the frontoparietal networks, and edges that show a consistent
negative correlation with gF are mostly in the default-mode network.
As a second-pass analysis, we directly tested whether predictive power varied across the
different networks; specifically, whether the networks that performed best for identification
(the frontoparietal networks) also performed best for predicting cognitive behavior. To do
this, we repeated the leave-one-subject-out cross-validated procedure described above, this
time restricting the feature selection step to features (within-network edges) from each of the
eight networks in turn. We also tested a combination of networks 1 and 2. Thus, nine sets of
predicted gF scores were generated. Each of these was correlated with observed gF scores to
assess the predictive power of each network or network combination. Results are shown in
Figures 5c and 5d.
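The leave-one-subject-out procedure can be illustrated with a minimal Python sketch on synthetic data. Several details here are assumptions made for illustration and are not taken from the text: features are selected with a correlation cutoff (the hypothetical parameter `r_thresh`) rather than a p-value threshold, only the positive tail is modeled, and the predictive model is a straight-line fit to the summed strength of the selected edges. The function name `loo_predict` is also hypothetical.

```python
import numpy as np


def loo_predict(edges, scores, r_thresh=0.3):
    """Leave-one-subject-out prediction from positively correlated edges.

    edges:  (n_subjects, n_edges) array of edge strengths
    scores: (n_subjects,) array of behavioral scores (e.g. gF)
    """
    n = len(scores)
    predicted = np.full(n, np.nan)
    for test in range(n):
        train = np.arange(n) != test
        x, y = edges[train], scores[train]
        # Feature selection uses the training subjects only.
        xz = (x - x.mean(axis=0)) / x.std(axis=0)
        yz = (y - y.mean()) / y.std()
        r = xz.T @ yz / len(y)  # Pearson r between each edge and the score
        selected = r > r_thresh  # assumption: correlation cutoff, positive tail
        if not selected.any():
            continue
        # Assumed model: linear fit of score on summed selected-edge strength.
        strength = x[:, selected].sum(axis=1)
        slope, intercept = np.polyfit(strength, y, 1)
        predicted[test] = slope * edges[test, selected].sum() + intercept
    return predicted


# Synthetic example: two informative edges among 40.
rng = np.random.default_rng(0)
edges = rng.standard_normal((60, 40))
scores = 2.0 * edges[:, 0] + edges[:, 1] + 0.1 * rng.standard_normal(60)
predicted = loo_predict(edges, scores)
r_pred = np.corrcoef(predicted, scores)[0, 1]
print(f"predicted vs. observed r = {r_pred:.2f}")
```

Restricting `edges` to the columns belonging to one network before calling the function would mimic the network-wise comparison described above.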
As hypothesized, predictive power based on the positive features was highest for the
frontoparietal networks (at a feature-selection threshold of p < 0.01: r = 0.42, p < 10e−6 and r
= 0.39, p < 10e−5 for networks 1 and 2 respectively, r = 0.50, p < 10e−9 for the
combination of the two networks; this pattern was consistent across different statistical
thresholds used for feature selection). The subcortical-cerebellar network (network 4) also
had significant predictive power at a feature-selection threshold of p < 0.01 (r = 0.22, p =
0.01, though this result was less consistent across feature-selection thresholds). Based on the
negative features, only the default mode (network 3) had significant predictive power (r =
0.35, p < 10e−5 at p < 0.01; this result was consistent across feature-selection thresholds).
These results reinforce the functional relevance of our identification analyses, in that the
networks most discriminating of individuals are also the most relevant to individual
differences in behavior. Crucially, the relationship between connectivity and cognitive ability
is sufficiently robust to generalize to previously unseen subjects.
Discussion
Here we show that an individual’s functional brain connectivity profile is both unique and
reliable, analogous to a fingerprint. We demonstrate that it is possible, with near-perfect
accuracy, to identify individuals from a large group of subjects based solely on their
That the frontoparietal networks were most distinguishing of individuals—and the most
predictive of behavior—is consistent with the role these networks play in cognition. Nodes
in these networks tend to act as flexible hubs, switching connectivity patterns according to
task demands34. Additionally, broadly distributed across-network connectivity has been
reported in these same regions35, suggesting a role in large-scale coordination of brain
activity. Although the frontoparietal network is particularly active in tasks requiring a high
degree of cognitive control, here we show that it can identify individuals regardless of
whether the data is collected during task or at rest. Training cross-subject classifiers based
on frontoparietal connectivity to predict which task a subject is performing yields
classification accuracy that is statistically significant but still quite low34; the present
findings of high inter-individual variability may help explain these results. In light of this,
future work might use within-subject classification to explore how the frontoparietal
networks reorganize according to task demands in individual subjects.
Similarly, the frontoparietal networks emerged as most predictive of gF, which is consistent
with previous reports that structural and functional properties of these networks relate to
intelligence36–38. Also of interest, aberrant functional connectivity in the frontoparietal
networks has been linked to a variety of neuropsychiatric illnesses39,40. The work presented
here, while focused on healthy subjects, suggests that sensitivity may be compromised in
studies of disease if inferences are drawn only at the group level. New insights into
neuropsychiatric illnesses may be gained from an approach that links individual functional
connectivity profiles to a spectrum of behavioral and symptom measures rather than a single
diagnosis41,42.
Additional considerations
Note that the discriminating power of connectivity profiles here is a result of integrating over
a relatively long period of time (i.e., runs that last several minutes). There is a growing body
of literature43 showing that in the resting state, functional connectivity is dynamic and varies
considerably over short periods of time (i.e., intervals less than one minute). Future work
may seek to characterize individuals based on properties of these dynamic fluctuations;
however, the current results indicate that single measures of time-averaged functional
connectivity, based on relatively long scan sessions, provide meaningful information about
individuals.
Note also that the cross-session identification performed here was between sessions
separated by a single day; it remains unclear to what degree individual connectivity profiles
are consistent across the lifespan. Cross-sectional studies have shown changes in functional
connectivity with age44–46, but future work should employ longitudinal designs to test the
stability or evolution of the functional connectivity “fingerprint” over the course of months
or years rather than days.
From a methodological perspective, the scale of the node atlas appears to influence
identification accuracy. The parcellation used in our primary analysis consisted of 268 nodes
across the whole brain, with each node optimized to contain voxels with similar resting-state
timecourses10. This number is consistent with the range postulated by other groups47,48, but
represents a more fine-grained scheme than other atlases such as the automatic anatomic
labeling atlas (90 nodes)49 or the FreeSurfer atlas (68 nodes). In a comparison of
identification rates, network definitions based on our high-resolution parcellation
outperformed networks based on the FreeSurfer atlas; the coarser node size of the latter
likely diminishes accuracy by averaging out individual variability.
Conclusion
Together, these advances suggest that analysis of individual fMRI data is possible, and
indeed, desirable. Given this foundation, human neuroimaging studies have an opportunity
to move beyond population-level inferences, in which general networks are derived from the
whole sample, to inferences about single subjects, examining how individuals’ networks are
functionally organized in unique ways and relating this functional organization to behavioral
phenotypes in both health and disease.
Online Methods
Subject information
The primary dataset used in this work is from the Human Connectome Project (HCP). A
second dataset, acquired at Yale, was used for node and network definitions. These two
datasets are described in turn below.
HCP data—We used the Q2 HCP data release, which was all the HCP data publicly
available at the time that this project was begun. The full Q2 release contains 142 healthy
subjects; we restricted our analysis to subjects for whom all six fMRI sessions were
available (n = 126; 40 males, age 22 to 35). This represents a relatively large sample size
compared to most neuroimaging studies and has the advantage of being an open-source
dataset, facilitating replication and extension of this work by other researchers. Note that
most subjects have at least one blood relative in the group, with many sets of twins. A more
heterogeneous population should make the identification problem easier, and therefore the
high accuracy rate observed here, despite the homogeneity of the sample, underscores the
power of functional connectivity-based identification.
working memory, motor, language, and emotion. The HCP scanning protocol was approved
by the local Institutional Review Board at Washington University in St. Louis. For all
sessions, data from both the left-right (LR) and right-left (RL) phase-encoding runs were
used to calculate connectivity matrices. Full details on the HCP dataset can be found
elsewhere8.
resolution = 1 mm3) and a two-dimensional anatomical T1 image (TR = 285 ms, TE = 2.61
ms, resolution = 2.5 mm3) were acquired for registration purposes. All participants provided
written informed consent in accordance with a protocol approved by the Human Research
Protection Program of Yale University.
Preprocessing
The HCP minimal preprocessing pipeline was used50 for the HCP dataset. This pipeline
includes artifact removal, motion correction, and registration to standard space. For the Yale
dataset, images were motion corrected using SPM8 and were warped to common space
using a series of linear and non-linear transformations as previously described10.
For both the HCP and Yale datasets, standard preprocessing procedures were applied to the
fMRI data, including removing linear components related to the six motion parameters (Yale
data) or 12 motion parameters (HCP data; these include first derivatives, given as
Movement_Regressors_dt.txt), regressing the mean time courses of the white matter and
cerebro-spinal fluid as well as the global signal, removing the linear trend, and low-pass
filtering. For the HCP dataset, we investigated a range of spatial smoothing Gaussian kernel
sizes—from no smoothing to a full-width half-max (FWHM) of 4 mm, 6 mm or 8 mm—and
found that smoothing level had essentially no effect on identification accuracy (see
Supplementary Table 2); thus results based on data with no spatial smoothing are presented
in the main text. (Note that our node-based analysis, in contrast to a voxel-wise analysis,
contains a considerable degree of inherent smoothing because the timecourses of many
contiguous voxels are averaged into a single node.)
Image preprocessing and calculation of connectivity matrices were done using BioImage
Suite software51. Pearson correlation coefficients between pairs of node timecourses were
calculated and normalized to z scores using the Fisher transformation, resulting in a 268 ×
268 symmetric connectivity matrix for each session for each subject. Connectivity matrices
were not thresholded or binarized in any way.
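As a rough illustration of this step, the following Python sketch builds a Fisher-transformed connectivity matrix from node timecourses using NumPy. It is not the BioImage Suite implementation. The function name `connectivity_matrix` and the synthetic input are hypothetical, and zeroing the diagonal before the transform is an assumption made here to keep the self-correlations finite.

```python
import numpy as np


def connectivity_matrix(timecourses):
    """Turn an (n_frames, n_nodes) array of node timecourses into a Fisher-z matrix."""
    r = np.corrcoef(timecourses, rowvar=False)  # node-by-node Pearson correlations
    np.fill_diagonal(r, 0.0)  # assumption: zero the diagonal so arctanh stays finite
    return np.arctanh(r)  # Fisher transformation


# Synthetic stand-in for one session: 1,200 frames, 268 nodes.
rng = np.random.default_rng(0)
conn = connectivity_matrix(rng.standard_normal((1200, 268)))
print(conn.shape)
```

The resulting matrix is symmetric, so only the upper triangle carries unique edge values.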
separately into a target number of 150 regions. The final parcellation was examined to
ensure each node contained a reasonable number of voxels. Note that this single whole-brain
parcellation atlas was defined in MNI space, and was applied to all subjects in the HCP
dataset via traditional registration techniques. The parcellation image is publicly available on
the BioImage Suite NITRC page (https://www.nitrc.org/frs/?group_id=51).
In addition to parcellating the brain into 268 functionally coherent nodes, we further
clustered these nodes into large-scale networks. To define the networks, the same group-wise
spectral clustering algorithm was applied to connectivity matrices from the 45 Yale subjects
to group the 268 nodes into eight networks10. The eight networks were evaluated and
compared visually to existing definitions of resting-state networks published by other
groups15,25. Despite the fact that we included subcortical regions and cerebellum, whereas
other definitions excluded these regions, our network configuration matched well with these
other network definitions. Our eight clusters represent approximately the following
networks: 1) medial frontal, 2) frontoparietal, 3) default mode, 4) subcortical/cerebellum, 5)
motor, 6) visual I, 7) visual II, 8) visual association (see Fig. 1c).
Identification analysis
Fig. 1b illustrates the prediction procedure. First, a database was created that consisted of all
the individual subjects’ connectivity matrices from a single condition, D = [Xi, i = 1, …,
126], where Xi is a 268 × 268 correlation matrix and the subscript i denotes subject. In the
identification step, the identity of the target matrix was predicted using a correlation matrix
obtained from a different session. To predict the subject identity, the similarity between the
current target matrix Yi and every matrix in D was computed, and the predicted identity
was that with the maximal similarity score. Similarity was defined as the Pearson correlation
between two vectors of edge values taken from the target matrix and each of the database
matrices. Note that we performed prediction with replacement, such that the algorithm was
not forced to predict a unique subject on each iteration within a condition.
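The identification step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the original code: the function names `edge_vectors` and `identify` are hypothetical, edge vectors are taken from the upper triangle of each matrix, the two sessions are assumed to list subjects in the same order, and the data are synthetic (a stable subject-specific pattern plus session noise).

```python
import numpy as np


def edge_vectors(mats):
    """Stack the upper-triangle edges of each symmetric matrix into rows."""
    mats = np.asarray(mats)
    iu = np.triu_indices(mats.shape[1], k=1)
    return mats[:, iu[0], iu[1]]


def identify(target_mats, database_mats):
    """Predict, for each target matrix, the most similar database matrix."""
    t = edge_vectors(target_mats)
    d = edge_vectors(database_mats)
    # z-score each edge vector so that the scaled dot product is Pearson r
    t = (t - t.mean(axis=1, keepdims=True)) / t.std(axis=1, keepdims=True)
    d = (d - d.mean(axis=1, keepdims=True)) / d.std(axis=1, keepdims=True)
    similarity = t @ d.T / t.shape[1]
    predicted = similarity.argmax(axis=1)  # ties and repeats are allowed
    success_rate = float(np.mean(predicted == np.arange(len(t))))
    return predicted, success_rate


rng = np.random.default_rng(0)
n_subjects, n_nodes = 20, 30


def random_symmetric():
    a = rng.standard_normal((n_nodes, n_nodes))
    return (a + a.T) / 2


traits = [random_symmetric() for _ in range(n_subjects)]
day1 = [s + 0.5 * random_symmetric() for s in traits]
day2 = [s + 0.5 * random_symmetric() for s in traits]
predicted, success_rate = identify(day1, day2)
print(success_rate)
```

Because `argmax` is taken independently for each target, the same database subject can be predicted more than once, which matches prediction with replacement.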
serve as the database set, and a second condition from day 2 to serve as the target set. Next,
subject identity was permuted—such that each subject in the target set was assigned a
“correct” identity corresponding to a different subject in the database set—and identification
performed. Then the roles of database and target sets were reversed. This procedure was
repeated 1,000 times.
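One way to sketch this permutation scheme in Python is shown below. It is a simplified illustration rather than the original analysis: the function names `permutation_null` and `empirical_p` are hypothetical, the predictions are assumed to be already computed as database indices, and the reversal of database and target roles is omitted. Shuffled labels are redrawn until no subject keeps their own identity, reflecting the requirement that each "correct" identity belong to a different subject.

```python
import numpy as np


def permutation_null(predicted, n_perm=1000, seed=0):
    """Count 'correct' identifications after shuffling the true labels."""
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted)
    n = predicted.size
    true_ids = np.arange(n)
    null_counts = np.empty(n_perm, dtype=int)
    for k in range(n_perm):
        shuffled = rng.permutation(n)
        # Redraw until no subject keeps their own label (requires n >= 2).
        while np.any(shuffled == true_ids):
            shuffled = rng.permutation(n)
        null_counts[k] = int(np.sum(predicted == shuffled))
    return null_counts


def empirical_p(observed_correct, null_counts):
    """Fraction of permutations scoring at least as well as the observed count."""
    return float(np.mean(null_counts >= observed_correct))


# A hypothetical perfect identifier over 126 subjects.
null_counts = permutation_null(np.arange(126), n_perm=200)
print(null_counts.max(), empirical_p(126, null_counts))
```

With a finite number of permutations, an empirical p value of zero is best read as p < 1/n_perm.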
individual identification than other edges. When performing subject identification, the
Pearson correlation coefficients were computed between the target connectivity pattern and
all connectivity patterns in the database, and the subject identity was chosen to be the one
that had the largest correlation coefficient. Computationally, the Pearson correlation of two
vectors is the sum of element-wise products, given that the two vectors are z-score
normalized (zero-mean with unit standard deviation). Therefore, this score can be broken
down to quantify the individual amount contributed from each entry in the vector, where
some edges contribute positively to the total coefficient and others contribute negatively.
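This decomposition can be checked numerically. In the Python sketch below, each vector is z-scored and additionally divided by the square root of its length, so that the element-wise products sum exactly to the Pearson coefficient; this extra scaling and the helper name `unit_zscore` are assumptions introduced for the illustration, and the vectors are synthetic.

```python
import numpy as np


def unit_zscore(v):
    """z-score a vector, then scale it to unit length."""
    z = (v - v.mean()) / v.std()
    return z / np.sqrt(v.size)


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.6 * x + 0.8 * rng.standard_normal(500)

contributions = unit_zscore(x) * unit_zscore(y)  # one term per edge
r_from_sum = contributions.sum()
r_direct = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_from_sum, r_direct))
print((contributions > 0).sum(), (contributions < 0).sum())
```

Positive entries of `contributions` raise the overall correlation and negative entries lower it, which is the property exploited in the edge-wise analysis.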
Given two sets of connectivity matrices $[X_i^{R1}]$ and $[X_i^{R2}]$ obtained from Rest1 and Rest2 runs
after z-score normalization, we computed the corresponding edge-wise product vector (φi),

$$\varphi_i(e) = X_i^{R1}(e)\, X_i^{R2}(e), \quad e = 1, \ldots, M,$$

where i indexes subject, e indexes edge and M is the total number of edges in the selected
network (or the whole brain). The sum of φi over all edges, ∑eφi(e), is the correlation between
$X_i^{R1}$ and $X_i^{R2}$. The group consistency measure (Φ) was computed as the mean of φi across
all subjects. Large positive entries in Φ are edges that are consistent both within a subject
and across the group.
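In the same way, we can calculate φij between patterns from different subjects, e.g.:

$$\varphi_{ij}(e) = X_i^{R1}(e)\, X_j^{R2}(e), \quad i \neq j,$$

where $X_i^{R1}$ and $X_j^{R2}$ are the normalized Rest1 and Rest2 edge vectors of subjects i and j.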
It is possible that an edge e is equally correlated within the same subject and between
different subjects. In other words, φij(e) (when subject subscript is not matched) and φii(e)
(when subject subscript is matched) are of similar value. Therefore, such an edge will not
contribute to distinguishing an individual from the other subjects.
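For an edge to be truly helpful in individual identification, the following property must hold,

$$\varphi_{ii}(e) > \varphi_{ij}(e) \quad \text{and} \quad \varphi_{ii}(e) > \varphi_{ji}(e), \quad i \neq j.$$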
In this way, the particular edge contributes to maximize the correlation between connectivity
patterns from matched subjects. To quantify the differential power of an edge for the purpose
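of subject identification, we computed an empirical probability, Pi(e),

$$P_i(e) = \frac{\left|\{j : \varphi_{ij}(e) > \varphi_{ii}(e)\}\right| + \left|\{j : \varphi_{ji}(e) > \varphi_{ii}(e)\}\right|}{2(N-1)},$$

where N is the number of subjects and j ranges over all subjects other than i.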
We defined Pi(e) in this way so that it can be interpreted similarly to the p value in a
standard statistical test. The smaller the Pi(e) value, the greater the differential power the edge has
to identify a single subject i. The overall differential power of an edge across all subjects is
then defined by the differential power measure (DP),

$$\mathrm{DP}(e) = \sum_{i} \left\{ -\ln P_i(e) \right\}.$$
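A minimal Python sketch of a differential power calculation is given below. It rests on assumptions that should be checked against the formal definitions: Pi(e) is taken to be the fraction of the 2(N−1) cross-subject products that exceed the within-subject product for edge e, and DP(e) is taken to be the sum of −ln Pi(e) over subjects. The `floor` parameter is a hypothetical guard against taking the logarithm of zero, and the function name `differential_power` is illustrative.

```python
import numpy as np


def differential_power(x1, x2, floor=None):
    """Edge-wise differential power from two sessions.

    x1, x2: (n_subjects, n_edges) arrays of normalized edge vectors,
            with matching subject order across sessions.
    """
    n, m = x1.shape
    if floor is None:
        floor = 0.5 / (2 * (n - 1))  # assumption: half of the smallest nonzero P
    dp = np.zeros(m)
    for i in range(n):
        within = x1[i] * x2[i]  # phi_ii(e)
        others = np.arange(n) != i
        phi_ij = x1[i] * x2[others]  # subject i in session 1 vs. others in session 2
        phi_ji = x1[others] * x2[i]  # others in session 1 vs. subject i in session 2
        count = (phi_ij > within).sum(axis=0) + (phi_ji > within).sum(axis=0)
        p = np.maximum(count / (2 * (n - 1)), floor)
        dp += -np.log(p)
    return dp


# Two subjects, two edges: edge 0 separates the subjects better than edge 1.
x = np.array([[1.0, 1.0], [-1.0, 2.0]])
dp = differential_power(x, x)
print(dp)
```

Looping over subjects keeps memory use proportional to one subject-by-edge array, which matters when the number of edges is large.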
1,100 in increments of 100. For each number of time points n, identification accuracy was
tested for each of 500 randomizations; these randomizations were generated by choosing
among 50 starting points for each of the two rest runs and using n brain volumes beginning
with that starting point to calculate matrices. Results based on a database and target of Rest1
and Rest2, respectively, are shown in Fig. 3b; results based on reversing the database and
target were extremely similar.
cognitive task. Inputting additional information about the difference between task and rest
could potentially improve identification performance. Therefore, we created a database that
included a connectivity matrix obtained from a resting-state session and another matrix
obtained from a task session acquired on the same day. For
identification, we always required that the target matrix be obtained on a different day. To
predict the subject identity, we projected the current target matrix Yi to the subspace
spanned by each database subject's pair of rest and task matrices, to obtain a projection Ỹi, and then computed the similarity
between Ỹi and Yi to find the best match. Results are shown in Fig. 3c.
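The projection step might look like the following Python sketch, which uses a least-squares fit to project the target edge vector onto the two-dimensional subspace spanned by a database subject's rest and task edge vectors. The least-squares formulation, the function name `subspace_identify`, and the synthetic data are assumptions for illustration; matrices are assumed to be already vectorized.

```python
import numpy as np


def subspace_identify(target_vec, rest_vecs, task_vecs):
    """Pick the database subject whose rest/task subspace best explains the target."""
    scores = []
    for rest, task in zip(rest_vecs, task_vecs):
        basis = np.column_stack([rest, task])  # (n_edges, 2)
        coef, *_ = np.linalg.lstsq(basis, target_vec, rcond=None)
        projection = basis @ coef  # target projected onto span{rest, task}
        scores.append(np.corrcoef(projection, target_vec)[0, 1])
    scores = np.array(scores)
    return int(scores.argmax()), scores


rng = np.random.default_rng(0)
n_subjects, n_edges = 5, 200
rest = rng.standard_normal((n_subjects, n_edges))
task = rng.standard_normal((n_subjects, n_edges))
# Hypothetical target: a mixture of subject 3's rest and task patterns plus noise.
target = 0.6 * rest[3] + 0.4 * task[3] + 0.1 * rng.standard_normal(n_edges)
best, scores = subspace_identify(target, rest, task)
print(best, scores.round(2))
```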
the given node. Because the 68-node parcellation is created independently from the seven
networks, their boundaries do not align in a one-to-one manner. Therefore, we also allowed
a secondary network association for a node when the number of voxels in this network
ranked second and the proportion to the total number of voxels in the node exceeded 30
percent. All 68 nodes have at least one primary network association. When defining a sub-
matrix that represents a single network, we included nodes for which the target network was
either the primary or secondary label.
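The labeling rule can be expressed as a short Python sketch. It assumes that the primary label is the network containing the most voxels of the node, which is how the rule is read here; the function name `network_labels` and the example voxel counts are hypothetical.

```python
import numpy as np


def network_labels(voxel_counts, secondary_threshold=0.30):
    """Return (primary, secondary) network indices for one node.

    voxel_counts: number of the node's voxels falling in each network.
    secondary is None unless the second-ranked network holds more than
    `secondary_threshold` of the node's voxels.
    """
    counts = np.asarray(voxel_counts, dtype=float)
    order = np.argsort(counts)[::-1]  # networks ranked by voxel count
    primary = int(order[0])
    secondary = None
    if counts.size > 1 and counts[order[1]] / counts.sum() > secondary_threshold:
        secondary = int(order[1])
    return primary, secondary


print(network_labels([5, 55, 40]))   # second-ranked network holds 40% of voxels
print(network_labels([10, 70, 20]))  # second-ranked network holds only 20%
```

A network's sub-matrix would then include every node for which that network appears as either label.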
Comparisons of identification accuracy between the Shen and FreeSurfer and Yeo schemes
are shown in Fig. 4.
In order to rule out the possibility that successful identification was driven simply by
characteristic movement patterns leading to predictable motion artifacts, we performed
prediction using motion estimates only. HCP data collection provides an estimate of frame-
to-frame displacement for each run (Movement_RelativeRMS.txt). This data was used to
generate a discrete motion distribution vector from each of the six Relative RMS vectors
(from the six conditions, using data from the left-right phase encoding run). We computed
the mean and standard deviation of the Relative RMS across all conditions and subjects. We
then specified 60 bins that spanned three standard deviations below and above the grand
mean, and the motion distribution vectors were calculated accordingly. The motion
distribution vectors were then used in the same way as the correlation matrices for individual
identity prediction purposes.
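The construction of a motion distribution vector can be sketched with a histogram, as in the Python example below. The details are assumptions: the bins are equally spaced, values falling outside three standard deviations of the grand mean are dropped, and counts are normalized by the number of frames. The function name `motion_distribution` and the synthetic displacement values are hypothetical.

```python
import numpy as np


def motion_distribution(rel_rms, grand_mean, grand_sd, n_bins=60):
    """Histogram of frame-to-frame displacement over a fixed, shared set of bins."""
    rel_rms = np.asarray(rel_rms, dtype=float)
    edges = np.linspace(grand_mean - 3 * grand_sd,
                        grand_mean + 3 * grand_sd,
                        n_bins + 1)
    counts, _ = np.histogram(rel_rms, bins=edges)  # out-of-range frames are dropped
    return counts / max(rel_rms.size, 1)  # assumption: normalize by frame count


# Synthetic displacement traces for 3 subjects, 400 frames each.
rng = np.random.default_rng(0)
runs = np.abs(rng.normal(0.1, 0.02, size=(3, 400)))
grand_mean, grand_sd = runs.mean(), runs.std()
vectors = np.array([motion_distribution(r, grand_mean, grand_sd) for r in runs])
print(vectors.shape)
```

Because every run shares the same bin edges, the resulting vectors can be compared across subjects and sessions with the same correlation-based matching used for connectivity profiles.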
While connectivity calculations are based on functional BOLD data, subtle effects of
anatomic variability could potentially confer a preference between the same subject on two
different days when it comes to applying the 268-node parcellation, defined in standard
space, to each individual subject. To help rule out confounds of anatomy introduced at the
registration step, we recalculated connectivity matrices based on BOLD data smoothed with
three different kernel sizes (4 mm, 6 mm and 8 mm, keeping all other preprocessing steps
the same), and re-performed the identification analysis; at higher levels of smoothing, any
registration advantage for the same brain relative to a different brain should be eliminated or
vastly reduced. The resulting identification accuracies are presented in Supplementary Table
2.
We also performed a second analysis to help rule out the possibility that identification is
driven mainly by anatomic rather than functional differences between subjects. Rather than
connectivity between pairs of nodes, we tested whether individuals could be identified based
simply on a measure of BOLD variance in each node (calculation described below). In
theory, while BOLD variance likely reflects baseline metabolic function to a substantial
degree, it could also be influenced by anatomic factors such as partial volume effects
introduced by the gray/white-matter segmentation and/or differing numbers of gray-matter
voxels per node due to underlying variation in regional tissue volumes and gyral folding
patterns. This analysis helps to address these potential confounds. BOLD variance was
calculated and identification performed as follows:
1. Within each node, we computed the mean BOLD signal in each frame.
This yields an N×268 matrix of node-wise mean BOLD intensities for
each subject for each condition, where N is the number of frames (1,200
for the resting-state runs and fewer for the task runs). (This is identical to
the first step in calculating connectivity matrices. Note that the mean
across the time dimension is zero because of the drift removal and band-
pass steps.)
2. For each node, using its N×1 timecourse vector, we compute its variance
as follows:
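A minimal sketch of steps 1 and 2, under stated assumptions: the array layout, the function names, and the integer voxel labelling are hypothetical, and the variance normalization (dividing by N rather than N−1) is a guess rather than something the text confirms.

```python
import numpy as np

def node_timecourses(bold, labels, n_nodes=268):
    """Average voxel signals within each node for every frame.

    bold:   (N, V) array, N frames by V voxels.
    labels: (V,) integer array giving each voxel's node, 1..n_nodes (0 = unlabeled).
    Returns an (N, n_nodes) array of node-wise mean BOLD intensities.
    """
    out = np.full((bold.shape[0], n_nodes), np.nan)
    for node in range(1, n_nodes + 1):
        mask = labels == node
        if mask.any():
            out[:, node - 1] = bold[:, mask].mean(axis=1)
    return out

def node_variance(timecourses):
    """Variance over frames for each node; ddof=0 (divide by N) is an assumption."""
    return timecourses.var(axis=0, ddof=0)
```

The output of `node_variance` is one 268-element vector per subject and condition.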
Behavioral prediction
In the HCP protocol, fluid intelligence (gF) was assessed using a form of Raven’s
progressive matrices with 24 items²³ (scores are integers indicating number of correct items;
mean = 16.8, s.d. = 4.7, median = 18, mode = 20, range 5–24; HCP: PMAT24_A_CR).
individuals with both high head motion and low gF score. Thus, for purposes of the
behavioral analysis, we excluded subjects with particularly high motion during the Rest1
run; specifically, eight subjects with > 0.14 frame-to-frame head motion estimate (averaged
across both day 1 rest runs; HCP: Movement_RelativeRMS_mean) were excluded. There
was no correlation between head motion and gF in the remaining set of n = 118 subjects (r =
−0.05, p = 0.55).
Leave-one-subject-out cross-validation was used for the prediction analysis. In this iterative
analysis, features are selected and a predictive model is built based on n−1 subjects (the
training set) and the model is then tested on the remaining subject (the test set). Each subject
is left out once. Each iteration consisted of 1) feature selection, 2) model building, and 3)
prediction, described in turn below.
In the feature-selection step, Pearson correlation was performed between each edge in the
connectivity matrices and gF score across subjects in the training set. (Note that it is not
necessary to correct for multiple comparisons in this step because the nature of a predictive
analysis includes a built-in guard against false positives: if the proportion of false positives
in the feature-selection step is high, the model should not generalize well to independent
data.) Based on the signs of the resulting correlation values, edges were separated into two
tails: those positively correlated with gF and those inversely correlated with gF. Edges were
then thresholded based on the statistical significance of their correlation with gF, resulting in
two sets of features (positive and negative). Because the choice of statistical threshold at this
step is somewhat arbitrary, a range of thresholds was tested — p < 0.01, 0.05, 0.10 — to
ensure that results were consistent.
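The feature-selection step can be sketched as below. This is an illustrative implementation under assumptions: each subject's connectivity matrix is taken to be vectorized into one row of `edges`, the function name is hypothetical, and significance is computed as a two-tailed p-value from the t distribution with n−2 degrees of freedom.

```python
import numpy as np
from scipy import stats

def select_edges(edges, gf, p_thresh=0.01):
    """Correlate every edge with gF across training subjects and threshold.

    edges: (n_subjects, n_edges) array, one row of edge values per subject.
    gf:    (n_subjects,) array of fluid-intelligence scores.
    Returns boolean masks (positive, negative), each of length n_edges.
    """
    n = edges.shape[0]
    # Vectorized Pearson r: z-score both sides, then average the products.
    ez = (edges - edges.mean(axis=0)) / edges.std(axis=0)
    gz = (gf - gf.mean()) / gf.std()
    r = np.clip(ez.T @ gz / n, -1.0, 1.0)

    # Two-tailed p-value from the t distribution with n - 2 degrees of freedom.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = r * np.sqrt((n - 2) / (1.0 - r**2))
    p = 2.0 * stats.t.sf(np.abs(t), df=n - 2)

    significant = p < p_thresh
    return significant & (r > 0), significant & (r < 0)
```

Calling the function at `p_thresh` values of 0.01, 0.05 and 0.10 gives the range of thresholds described above.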
strength,” by summing values of all edges in each feature set in individual connectivity
matrices. In graph-theoretic terms, this statistic can be thought of as a type of weighted
degree for each feature network⁵². This summary statistic can be represented as follows:
\[ \text{strength}^{(+)}(s) = \sum_{i,j} c_{ij}\, m^{(+)}_{ij}, \qquad \text{strength}^{(-)}(s) = \sum_{i,j} c_{ij}\, m^{(-)}_{ij} \]
where c is an individual s’s connectivity matrix, and m(+) and m(-) are binary matrices
indexing the edges (i,j) that are significantly positively or negatively correlated with gF,
respectively.
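As a small illustration, network strength for one subject is a masked sum over the connectivity matrix. The function name is hypothetical, and counting each undirected edge once (upper triangle only) is an assumption; summing over the full symmetric matrix would simply double the value for a symmetric mask.

```python
import numpy as np

def network_strength(c, m):
    """Sum the edges of connectivity matrix c that are flagged in binary mask m."""
    c = np.asarray(c, dtype=float)
    m = np.asarray(m, dtype=bool)
    # Use the upper triangle (excluding the diagonal) so each edge counts once.
    iu = np.triu_indices_from(c, k=1)
    return float(c[iu][m[iu]].sum())
```

The same function serves both tails: pass the positive-feature mask for one model and the negative-feature mask for the other.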
After obtaining network strength for each subject in the training set, simple linear regression
was used to model the relationship between network strength (the explanatory variable) and
gF (the dependent variable). Two models were built: one based on strength in the positive-
feature network, and a second based on strength in the negative-feature network. Each model
—positive and negative—consisted of a first-degree polynomial that fit the training data best
in a least-squares sense:
Finally, in the prediction step, positive and negative network strengths from the excluded
subject were calculated and input into each of the two respective models to generate
predicted gF scores for that subject.
These three steps were repeated iteratively such that each subject was excluded once.
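The full leave-one-subject-out loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name and the vectorized `edges` layout are assumptions, and the first-degree least-squares fit is done here with `np.polyfit`.

```python
import numpy as np
from scipy import stats

def loocv_predict(edges, gf, p_thresh=0.01):
    """Leave-one-subject-out prediction of gF from network strength.

    edges: (n_subjects, n_edges) array of edge values (vectorized matrices).
    gf:    (n_subjects,) array of observed scores.
    Returns (pred_pos, pred_neg); entries stay NaN when no edge was selected.
    """
    n = len(gf)
    pred_pos = np.full(n, np.nan)
    pred_neg = np.full(n, np.nan)
    for test in range(n):
        train = np.arange(n) != test
        x, y = edges[train], gf[train]

        # 1) Feature selection, using the training subjects only.
        r = np.empty(x.shape[1])
        p = np.empty(x.shape[1])
        for j in range(x.shape[1]):
            r[j], p[j] = stats.pearsonr(x[:, j], y)
        masks = {"pos": (p < p_thresh) & (r > 0), "neg": (p < p_thresh) & (r < 0)}

        for name, out in (("pos", pred_pos), ("neg", pred_neg)):
            mask = masks[name]
            if not mask.any():
                continue  # no features survived; leave the prediction as NaN
            # 2) Model building: strength is the sum of selected edges, followed
            #    by a least-squares line gF ~ slope * strength + intercept.
            strength = x[:, mask].sum(axis=1)
            slope, intercept = np.polyfit(strength, y, deg=1)
            # 3) Prediction for the held-out subject.
            out[test] = slope * edges[test, mask].sum() + intercept
    return pred_pos, pred_neg
```

Predictive power can then be assessed by correlating each returned vector with the observed gF scores. Because selection is repeated inside every fold, the held-out subject never influences which edges enter the model.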
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Acknowledgments
Data were provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators:
David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the
NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington
University. This work was also supported by NIH EB009666 (RTC), T32 DA022975 (DS), and the NSF Graduate
Research Fellowship Program (ESF and MDR).
References
1. Mangin JF, Rivière D, Cachia A, Duchesnay E, Cointepas Y, et al. A framework to study the cortical
folding patterns. Neuroimage. 2004; 23(Suppl 1):S129–S138. [PubMed: 15501082]
2. Amunts K, Malikovic A, Mohlberg H, Schormann T, Zilles K. Brodmann's areas 17 and 18 brought
into stereotaxic space-where and how variable? Neuroimage. 2000; 11:66–84. [PubMed: 10686118]
3. Bürgel U, Amunts K, Hoemke L, Mohlberg H, Gilsbach JM, et al. White matter fiber tracts of the
human brain: three-dimensional mapping at microscopic resolution, topography and intersubject
variability. Neuroimage. 2006; 29:1092–1105. [PubMed: 16236527]
4. Grabner RH, Ansari D, Reishofer G, Stern E, Ebner F, et al. Individual differences in mathematical
competence predict parietal brain activation during mental calculation. Neuroimage. 2007; 38:346–
356. [PubMed: 17851092]
5. Newman SD, Carpenter PA, Varma S, Just MA. Frontal and parietal participation in problem solving
in the Tower of London: fMRI and computational modeling of planning and high-level perception.
9. Barch DM, Burgess GC, Harms MP, Petersen SE, Schlaggar BL, et al. Function in the human
connectome: Task-fMRI and individual differences in behavior. Neuroimage. 2013; 80:169–189.
[PubMed: 23684877]
10. Shen X, Tokoglu F, Papademetris X, Constable R. Groupwise whole-brain parcellation from
resting-state fMRI data for network node identification. Neuroimage. 2013; 82:403–415.
[PubMed: 23747961]
11. Bianciardi M, Fukunaga M, van Gelderen P, Horovitz SG, de Zwart JA, et al. Modulation of
spontaneous fMRI activity in human visual cortex by behavioral state. Neuroimage. 2009; 45:160–
168. [PubMed: 19028588]
12. Jiang T, He Y, Zang Y, Weng X. Modulation of functional connectivity during the resting state and
the motor task. Hum Brain Mapp. 2004; 22:63–71. [PubMed: 15083527]
13. Stevens WD, Buckner RL, Schacter DL. Correlated low-frequency BOLD fluctuations in the
resting human brain are modulated by recent experience in category-preferential visual regions.
Cereb Cortex. 2010; 20:1997–2006. [PubMed: 20026486]
14. Fischl B, van der Kouwe A, Destrieux C, Halgren E, Ségonne F, et al. Automatically parcellating
the human cerebral cortex. Cereb Cortex. 2004; 14:11–22. [PubMed: 14654453]
15. Buckner RL, Krienen FM, Castellanos A, Diaz JC, Yeo BT. The organization of the human
cerebellum estimated by intrinsic functional connectivity. J. Neurophysiol. 2011; 106:2322–2345.
[PubMed: 21795627]
16. Van Dijk KR, Sabuncu MR, Buckner RL. The influence of head motion on intrinsic functional
connectivity MRI. Neuroimage. 2012; 59:431–438. [PubMed: 21810475]
17. Cattell, RB. Intelligence: Its Structure, Growth and Action.
Amsterdam, Netherlands: Elsevier; 1987.
18. Deary IJ, Whalley LJ, Lemmon H, Crawford JR, Starr JM. The Stability of Individual Differences
in Mental Ability from Childhood to Old Age: Follow-up of the 1932 Scottish Mental Survey.
Intelligence. 2000; 28:49–55.
19. Colom R, Flores-Mendoza CE. Intelligence predicts scholastic achievement irrespective of SES
factors: Evidence from Brazil. Intelligence. 2007; 35:243–251.
20. Strenze T. Intelligence and socioeconomic success: A meta-analytic review of longitudinal
research. Intelligence. 2007; 35:401–426.
21. Gottfredson LS. Intelligence: is it the epidemiologists' elusive "fundamental cause" of social class
inequalities in health? J. Pers. Soc. Psychol. 2004; 86:174. [PubMed: 14717635]
22. Chandola T, Deary I, Blane D, Batty G. Childhood IQ in relation to obesity and weight gain in
adult life: the National Child Development (1958) Study. Int. J. Obes. 2006; 30:1422–1432.
23. Bilker WB, Hansen JA, Brensinger CM, Richard J, Gur RE, et al. Development of abbreviated
nine-item forms of the Raven’s Standard Progressive Matrices Test. Assessment. 2012
1073191112446655.
24. Cole MW, Bassett DS, Power JD, Braver TS, Petersen SE. Intrinsic and task-evoked network
architectures of the human brain. Neuron. 2014; 83:238–251. [PubMed: 24991964]
25. Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, et al. Correspondence of the brain's functional
architecture during activation and rest. Proc Natl Acad Sci U S A. 2009; 106:13040–13045.
[PubMed: 19620724]
26. Martuzzi R, Ramani R, Qiu M, Rajeevan N, Constable RT. Functional connectivity and alterations
in baseline brain state in humans. Neuroimage. 2010; 49:823–834. [PubMed: 19631277]
27. Laumann TO, Gordon EM, Adeyemo B, Snyder AZ, Joo SJ, et al. Functional System and Areal
31. Zilles K, Armstrong E, Schleicher A, Kretschmann HJ. The human pattern of gyrification in the
cerebral cortex. Anat Embryol (Berl). 1988; 179:173–179. [PubMed: 3232854]
32. Hill J, Dierker D, Neil J, Inder T, Knutsen A, et al. A surface-based analysis of hemispheric
asymmetries and folding of cerebral cortex in term-born human infants. J Neurosci. 2010;
30:2268–2276. [PubMed: 20147553]
33. Miranda-Dominguez O, Mills BD, Carpenter SD, Grant KA, Kroenke CD, et al. Connectotyping:
Model Based Fingerprinting of the Functional Connectome. 2014
34. Cole MW, Reynolds JR, Power JD, Repovs G, Anticevic A, et al. Multi-task connectivity reveals
flexible hubs for adaptive task control. Nat. Neurosci. 2013; 16:1348–1355. [PubMed: 23892552]
35. Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, et al. Functional network organization of
the human brain. Neuron. 2011; 72:665–678. [PubMed: 22099467]
36. Cole MW, Yarkoni T, Repovš G, Anticevic A, Braver TS. Global connectivity of prefrontal cortex
predicts cognitive control and intelligence. J. Neurosci. 2012; 32:8988–8999. [PubMed:
22745498]
37. Choi YY, Shamosh NA, Cho SH, DeYoung CG, Lee MJ, et al. Multiple bases of human
intelligence revealed by cortical thickness and neural activation. J. Neurosci. 2008; 28:10323–
23707587]
44. Hampson M, Tokoglu F, Shen X, Scheinost D, Papademetris X, et al. Intrinsic brain connectivity
related to age in young and middle aged adults. PLoS One. 2012; 7:e44067. [PubMed: 22984460]
45. Meunier D, Achard S, Morcom A, Bullmore E. Age-related changes in modular organization of
human brain functional networks. Neuroimage. 2009; 44:715–723. [PubMed: 19027073]
46. Scheinost D, Finn ES, Tokoglu F, Shen X, Papademetris X, et al. Sex differences in normal age
trajectories of functional brain networks. Hum. Brain Mapp. 2014
47. Craddock RC, James GA, Holtzheimer PE, Hu XP, Mayberg HS. A whole brain fMRI atlas
generated via spatially constrained spectral clustering. Hum. Brain Mapp. 2012; 33:1914–1928.
[PubMed: 21769991]
48. Van Essen DC, Glasser MF, Dierker DL, Harwell J, Coalson T. Parcellations and hemispheric
asymmetries of human cerebral cortex analyzed on surface-based atlases. Cereb Cortex. 2012;
22:2241–2262. [PubMed: 22047963]
49. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, et al. Automated
anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI
References
8. Van Essen DC, Smith SM, Barch DM, Behrens TE, Yacoub E, et al. The WU-Minn human
connectome project: an overview. Neuroimage. 2013; 80:62–79. [PubMed: 23684880]
50. Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, et al. The minimal preprocessing
pipelines for the Human Connectome Project. Neuroimage. 2013; 80:105–124. [PubMed:
23668970]
51. Joshi A, Scheinost D, Okuda H, Belhachemi D, Murphy I, et al. Unified framework for
development, deployment and robust testing of neuroimaging algorithms. Neuroinformatics. 2011;
9:69–84. [PubMed: 21249532]
52. Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations.
the database. The predicted identity (ID*) is the one with the highest correlation coefficient.
c) Node and network definitions. We used a 268-node functional atlas defined on an
independent dataset of healthy control subjects using a group-wise spectral clustering
algorithm. Nodes were further grouped into eight networks using the same clustering
algorithm, and these networks were named according to their correspondence to other
existing resting-state network definitions.
posterior (bottom of circle), and split into left and right hemispheres; lines indicate edges. In
the colored matrices (right), the same data are plotted as percentage of edges within and
between each pair of networks; a darkly shaded cell indicates a relative over-representation
of that network pair in the DP (top) or Φ (bottom) masks. PFC, prefrontal; Mot, motor; Ins,
insula; Par, parietal; Tem, temporal; Occ, occipital; Lim, limbic (including cingulate cortex,
amygdala and hippocampus); Cer, cerebellum; Sub, subcortical (including thalamus and
striatum); Bsm, brainstem; L, left hemisphere; R, right hemisphere. b) Longer timeseries
improve identification accuracy. To control for the fact that task sessions contained fewer
time points than rest sessions, we recalculated rest connectivity matrices using truncated
timeseries containing between 100 and 1,100 time points. Results shown are from 500
randomizations using Rest1 and Rest2 as the database and target sessions, respectively. Box
represents median with 25th and 75th percentiles; whiskers represent range. c) Use of a
two-matrix database improves identification rate relative to a single matrix (task or rest). Dots
and error bars represent mean and range of identification rate across all possible database
and target pairs, where the target matrix was always from a task session and the database
consisted of a rest-task pair (n = 8 combinations), task only (n = 8) or rest only (n = 4). *p <
0.01, Mann-Whitney U test.
are higher using the FS+Yeo scheme, for both diagonal elements (n = 126) and off-diagonal
elements (n = 15,876). **p < 10^−5, two-tailed t-test. c) Cross-subject correlation matrices
after z score normalization (top; scale bar indicates z score). The global difference in
correlation values is eliminated since there is no significant difference in the off-diagonal z
scores. However, correlations between diagonal elements are significantly higher using the
Shen scheme than the FS+Yeo scheme (bottom; error bars represent ± s.d.), which helps
account for the increase in identification accuracy using the Shen scheme. **p < 10^−5,
two-tailed t-test.
for best-fit line, used to assess predictive power of the model. b) Mean fraction of within-
network edges selected in the whole-brain positive-feature (left, red) and negative-feature
(right, blue) models, shown at a range of statistical thresholds for feature selection. Y-axis
indicates mean fraction of edges selected across all LOOCV iterations; x-axis indicates
network label (see Fig. 1c). c) Results from a LOOCV analysis in which feature selection
was restricted to within-network edges in the frontoparietal networks (1 and 2), at a feature-
selection threshold of p < 0.01. As in (a), each dot is one subject; gray area represents 95%
confidence interval for best-fit line. d) Results from nine separate LOOCV analyses in
which feature selection was restricted to within-network edges in each of the eight networks
plus a combination of networks 1 and 2. Y-axis indicates correlation between predicted and
observed gF scores; x-axis indicates network label. Asterisks indicate correlations significant
at p < 0.05 (uncorrected). Results based on a range of feature-selection thresholds (p-values)
are shown to demonstrate consistency across thresholds. Note that for some networks, no
features passed the statistical thresholding step, and thus it was not possible to generate
predictions; this is reflected by missing bars.